Hadoop for Developers (4 days) Training Course

Course Code

hadoopdev

Duration

28 hours (usually 4 days including breaks)

Requirements

  • comfortable with the Java programming language (most programming exercises are in Java)
  • comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi / nano)

Lab environment

Zero Install: There is no need to install Hadoop software on students' machines. A working Hadoop cluster will be provided for students.

Students will need the following:

  • an SSH client (Linux and Mac already have SSH clients; for Windows, PuTTY is recommended)
  • a browser to access the cluster (Firefox is recommended)

Overview

Apache Hadoop is the most widely used framework for processing Big Data on clusters of servers. This course introduces developers to the various components of the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive, and HBase).

    Course Outline

    Section 1: Introduction to Hadoop

    • Hadoop history, concepts
    • ecosystem
    • distributions
    • high level architecture
    • Hadoop myths
    • Hadoop challenges
    • hardware / software
    • lab : first look at Hadoop

    Section 2: HDFS

    • Design and architecture
    • concepts (horizontal scaling, replication, data locality, rack awareness)
    • daemons : NameNode, Secondary NameNode, DataNode
    • communications / heartbeats
    • data integrity
    • read / write path
    • NameNode High Availability (HA), Federation
    • labs : Interacting with HDFS

    Section 3: MapReduce

    • concepts and architecture
    • daemons (MRv1) : JobTracker / TaskTracker
    • phases : driver, mapper, shuffle/sort, reducer
    • MapReduce Version 1 and Version 2 (YARN)
    • Internals of MapReduce
    • Introduction to Java MapReduce programs
    • labs : Running a sample MapReduce program

    Section 4: Pig

    • Pig vs Java MapReduce
    • Pig job flow
    • Pig Latin language
    • ETL with Pig
    • Transformations & Joins
    • User defined functions (UDF)
    • labs : writing Pig scripts to analyze data

    Section 5: Hive

    • architecture and design
    • data types
    • SQL support in Hive
    • Creating Hive tables and querying
    • partitions
    • joins
    • text processing
    • labs : various labs on processing data with Hive

    Section 6: HBase

    • concepts and architecture
    • HBase vs RDBMS vs Cassandra
    • HBase Java API
    • Time series data on HBase
    • schema design
    • labs : interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
