Hadoop with Python Training Course

Course Code

hadooppython

Duration

28 hours (usually 4 days including breaks)

Requirements

  • Experience with Python programming
  • Basic familiarity with Hadoop

Overview

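Hadoop is a widely used Big Data processing framework. Python is a high-level programming language known for its clear syntax and code readability.

In this instructor-led, hands-on training, participants learn how to work with Hadoop, MapReduce, Pig, and Spark using Python, stepping through multiple examples and use cases.

By the end of this training, participants will be able to:

  • Understand the basic concepts behind Hadoop, MapReduce, Pig, and Spark
  • Use Python with the Hadoop Distributed File System (HDFS), MapReduce, Pig, and Spark
  • Use Snakebite to access HDFS programmatically from within Python
  • Use mrjob to write MapReduce jobs in Python
  • Write Spark programs in Python
  • Extend the functionality of Pig using Python UDFs
  • Manage MapReduce jobs and Pig scripts using Luigi

Audience

  • Developers
  • IT professionals

Format of the Course

  • Part lecture, part discussion, exercises, and heavy hands-on practice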

Course Outline

Introduction

Understanding Hadoop's Architecture and Key Concepts

Understanding the Hadoop Distributed File System (HDFS)

  • Overview of HDFS and its Architectural Design
  • Interacting with HDFS
  • Performing Basic File Operations on HDFS
  • Overview of HDFS Command Reference
  • Overview of Snakebite
  • Installing Snakebite
  • Using the Snakebite Client Library
  • Using the CLI Client

Learning the MapReduce Programming Model with Python

  • Overview of the MapReduce Programming Model
  • Understanding Data Flow in the MapReduce Framework
    • Map
    • Shuffle and Sort
    • Reduce
  • Using the Hadoop Streaming Utility
    • Understanding How the Hadoop Streaming Utility Works
    • Demo: Implementing the WordCount Application in Python
  • Using the mrjob Library
    • Overview of mrjob
    • Installing mrjob
    • Demo: Implementing the WordCount Algorithm Using mrjob
    • Understanding How a MapReduce Job Written with the mrjob Library Works
    • Executing a MapReduce Application with mrjob
    • Hands-on: Computing Top Salaries Using mrjob
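
The Map, Shuffle and Sort, and Reduce stages listed above can be sketched in plain Python before moving to a cluster. This is a minimal local illustration, not course material: the function names (`mapper`, `shuffle_and_sort`, `reducer`, `word_count`) are hypothetical, and everything runs in a single process rather than on Hadoop.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort phase: order pairs by key so equal keys are adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs):
    """Reduce phase: sum the counts of each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run the three phases in order and return a dict of word counts."""
    return dict(reducer(shuffle_and_sort(mapper(lines))))


if __name__ == "__main__":
    sample = ["the quick fox", "The lazy dog the end"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

With Hadoop Streaming, the mapper and reducer would typically be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, and the framework itself performs the sort between them.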

Learning Pig with Python

  • Overview of Pig
  • Demo: Implementing the WordCount Algorithm in Pig
  • Configuring and Running Pig Scripts and Pig Statements
    • Using the Pig Execution Modes
    • Using the Pig Interactive Mode
    • Using the Pig Batch Mode
  • Understanding the Basic Concepts of the Pig Latin Language
    • Using Statements
    • Loading Data
    • Transforming Data
    • Storing Data
  • Extending Pig's Functionality with Python UDFs
    • Registering a Python UDF File
    • Demo: A Simple Python UDF
    • Demo: String Manipulation Using Python UDF
    • Hands-on: Calculating the 10 Most Recent Movies Using Python UDF
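
As a rough idea of what a Python UDF file for Pig can look like, the sketch below defines two string helpers. The file name `movie_udfs.py`, both function names, and the "Title (YYYY)" input format are assumptions made for illustration. When a file is registered with Pig's Jython engine, the `outputSchema` decorator is expected to be available without an import; the fallback at the top exists only so the file can also be run and tested outside Pig.

```python
import re

# Hypothetical registration in a Pig script (file name is an assumption):
#   REGISTER 'movie_udfs.py' USING jython AS movie_udfs;
try:
    outputSchema  # expected to be provided by Pig's Jython script engine
except NameError:
    def outputSchema(schema):
        """Stand-in decorator so the file also runs outside Pig."""
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("title:chararray")
def normalize_title(title):
    """Collapse repeated whitespace and title-case a movie title."""
    if title is None:
        return None
    return " ".join(title.split()).title()


@outputSchema("year:int")
def release_year(title):
    """Extract a trailing four-digit year such as '(1995)', or return None."""
    if title is None:
        return None
    match = re.search(r"\((\d{4})\)\s*$", title)
    return int(match.group(1)) if match else None
```

Inside a Pig script, such functions would be called through the alias chosen at registration, for example `movie_udfs.release_year(title)` within a `FOREACH ... GENERATE` statement.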

Using Spark and PySpark

  • Overview of Spark
  • Demo: Implementing the WordCount Algorithm in PySpark
  • Overview of PySpark
    • Using an Interactive Shell
    • Implementing Self-Contained Applications
  • Working with Resilient Distributed Datasets (RDDs)
    • Creating RDDs from a Python Collection
    • Creating RDDs from Files
    • Implementing RDD Transformations
    • Implementing RDD Actions
  • Hands-on: Implementing a Text Search Program for Movie Titles with PySpark
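
The distinction between RDD transformations (lazy) and RDD actions (which trigger computation) can be modeled in a few lines of plain Python. The `LazyDataset` class and the `search_titles` helper below are hypothetical stand-ins written for this illustration, not the PySpark API, although the method names `map`, `filter`, `collect`, and `count` mirror real RDD methods.

```python
class LazyDataset:
    """Hypothetical RDD stand-in: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    # Transformations: return a new dataset and compute nothing yet.
    def map(self, func):
        return LazyDataset(self._source, self._steps + (("map", func),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _iterate(self):
        """Apply the recorded steps, in order, to the source items."""
        items = iter(self._source)
        for kind, func in self._steps:
            items = map(func, items) if kind == "map" else filter(func, items)
        return items

    # Actions: run the recorded pipeline and return a concrete result.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())


def search_titles(titles, term):
    """Build a lazy, case-insensitive search for `term` over movie titles."""
    needle = term.lower()
    return LazyDataset(titles).filter(lambda title: needle in title.lower())


if __name__ == "__main__":
    movies = ["Toy Story", "The Matrix", "Story of Us"]
    print(search_titles(movies, "story").map(str.upper).collect())
```

In PySpark the same shape would start from a real RDD (for instance one created from a Python collection or a file) instead of `LazyDataset`, and the work would be distributed across the cluster rather than run in one process.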

Managing Workflow with Python

  • Overview of Apache Oozie and Luigi
  • Installing Luigi
  • Understanding Luigi Workflow Concepts
    • Tasks
    • Targets
    • Parameters
  • Demo: Examining a Workflow that Implements the WordCount Algorithm
  • Working with Hadoop Workflows that Control MapReduce and Pig Jobs
    • Using Luigi's Configuration Files
    • Working with MapReduce in Luigi
    • Working with Pig in Luigi

Summary and Conclusion
