Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Quick overview of Python and Scala

Foundational Concepts (Theory):

  • Architecture
  • RDD (Resilient Distributed Dataset)
  • Transformations and Actions
  • Stages, Tasks, and Dependencies
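The lazy-transformation / eager-action distinction covered in this module can be previewed with a plain-Python analogy using lazy iterators — a conceptual sketch only, not the Spark API itself:

```python
# Plain-Python analogy: "transformations" build a lazy pipeline,
# an "action" forces the whole pipeline to execute.
data = range(1, 6)

# Transformations: map and filter return lazy iterators; nothing runs yet.
doubled = map(lambda x: x * 2, data)      # like rdd.map(...)
big = filter(lambda x: x > 4, doubled)    # like rdd.filter(...)

# Action: materializing the result finally executes the pipeline.
result = list(big)                        # like rdd.collect()
print(result)  # [6, 8, 10]
```

In Spark, the same idea applies at cluster scale: transformations only record lineage, and an action triggers the job that is then broken into stages and tasks.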

Mastering the Basics with Databricks (Hands-on Workshop):

  • RDD API exercises
  • Essential action and transformation functions
  • PairRDD
  • Join operations
  • Caching strategies
  • DataFrame API exercises
  • SparkSQL
  • DataFrame operations: select, filter, group, sort
  • UDF (User Defined Function)
  • Introduction to the DataSet API
  • Streaming

Understanding Cloud Deployment with AWS (Hands-on Workshop):

  • Fundamentals of AWS Glue
  • Differences between AWS EMR and AWS Glue
  • Sample jobs across both environments
  • Analysis of pros and cons

Additional Topics:

  • Introduction to Apache Airflow orchestration

Requirements

  • Programming skills (preferably in Python or Scala)
  • Basic knowledge of SQL

Duration: 21 Hours
