Course Outline
1: HDFS (17%)
- Explain the roles of the HDFS daemons.
- Describe the standard operation of an Apache Hadoop cluster, covering both data storage and processing.
- Recognize the features of current computing systems that drive the need for platforms like Apache Hadoop.
- Outline the primary objectives of HDFS design.
- In given scenarios, identify appropriate use cases for HDFS Federation.
- Identify the components and daemons involved in an HDFS HA-Quorum cluster.
- Analyze the role of HDFS security, specifically regarding Kerberos.
- Determine the most suitable data serialization method for specific scenarios.
- Describe the HDFS file read and write paths.
- Identify the commands used to manipulate files in the Hadoop File System Shell.
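The write path and the storage side of an HDFS cluster both depend on where the replicas of each block land. The sketch below models the default rack-aware placement rule for a replication factor of 3: the first replica goes on the writer's node (or a random node if the client runs outside the cluster), the second on a node in a different rack, and the third on a different node in that same remote rack. It is a simplified illustration, not the NameNode's code: the topology dictionary and the function name are hypothetical, and the real policy also weighs node load, free space, and excluded nodes.

```python
import random

def place_replicas(writer_node, topology, replication=3, rng=None):
    """Choose `replication` distinct DataNodes for one block (simplified sketch).

    topology: hypothetical dict mapping rack name -> list of DataNode names.
    writer_node: the client's node, or None if the client runs outside the cluster.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    rng = rng or random.Random()
    node_to_rack = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = sorted(node_to_rack)

    # Replica 1: the writer's own node if it is a DataNode, otherwise a random node.
    first = writer_node if writer_node in node_to_rack else rng.choice(all_nodes)
    chosen = [first]

    # Replica 2: a node on a different (remote) rack, if one exists.
    remote_racks = sorted(r for r, nodes in topology.items()
                          if nodes and r != node_to_rack[first])
    if replication >= 2 and remote_racks:
        second_rack = rng.choice(remote_racks)
        chosen.append(rng.choice(topology[second_rack]))

        # Replica 3: a different node on the same rack as replica 2.
        if replication >= 3:
            same_rack = [n for n in topology[second_rack] if n not in chosen]
            if same_rack:
                chosen.append(rng.choice(same_rack))

    # Any replicas still missing (higher replication, one-node racks, a single
    # rack): fill with random unused nodes.
    unused = [n for n in all_nodes if n not in chosen]
    rng.shuffle(unused)
    chosen.extend(unused[: replication - len(chosen)])
    return chosen
```

Placing two of the three replicas on one remote rack limits cross-rack write traffic while still surviving the loss of an entire rack.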
2: YARN and MapReduce version 2 (MRv2) (17%)
- Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects its configuration.
- Learn how to deploy MapReduce v2 (MRv2 / YARN), including all associated YARN daemons.
- Understand the fundamental design strategy behind MapReduce v2 (MRv2).
- Understand how YARN manages resource allocation.
- Trace the workflow of a MapReduce job running on YARN.
- Determine the necessary file changes and procedures to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.
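A common first exercise in YARN resource allocation is working out how many containers fit on a NodeManager. The sketch below assumes a memory-only model in which each request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb and checked against yarn.scheduler.maximum-allocation-mb, with the node offering yarn.nodemanager.resource.memory-mb to YARN. The class and function names are hypothetical, CPU vcores are ignored, and the exact rounding rules differ between schedulers and Hadoop versions, so treat it as an approximation.

```python
from dataclasses import dataclass

@dataclass
class YarnMemoryConfig:
    """Memory settings in MB; each field mirrors the yarn-site.xml property noted."""
    node_memory_mb: int      # yarn.nodemanager.resource.memory-mb
    min_allocation_mb: int   # yarn.scheduler.minimum-allocation-mb
    max_allocation_mb: int   # yarn.scheduler.maximum-allocation-mb

def normalize_request(request_mb: int, cfg: YarnMemoryConfig) -> int:
    """Round a container request up to a multiple of the minimum allocation."""
    if request_mb > cfg.max_allocation_mb:
        # This sketch rejects oversize requests rather than silently capping them.
        raise ValueError(f"{request_mb} MB exceeds the maximum allocation")
    units = max(1, -(-request_mb // cfg.min_allocation_mb))  # ceiling division
    return units * cfg.min_allocation_mb

def containers_per_node(request_mb: int, cfg: YarnMemoryConfig) -> int:
    """How many containers of this size fit into one NodeManager's memory."""
    return cfg.node_memory_mb // normalize_request(request_mb, cfg)
```

For example, with 24576 MB offered to YARN and a 1024 MB minimum allocation, a 1500 MB request is rounded up to 2048 MB, so 12 such containers fit on the node rather than the 16 that naive division suggests.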
3: Hadoop Cluster Planning (16%)
- Identify key considerations when selecting hardware and operating systems for hosting an Apache Hadoop cluster.
- Analyze options for selecting an operating system.
- Understand kernel tuning and disk swapping mechanisms.
- In given scenarios and workload patterns, identify the appropriate hardware configuration.
- In given scenarios, determine the ecosystem components required for the cluster to meet SLA requirements.
- Cluster sizing: Given a scenario and execution frequency, identify workload specifics, including CPU, memory, storage, and disk I/O requirements.
- Disk sizing and configuration, including JBOD versus RAID, SANs, virtualization, and cluster disk sizing requirements.
- Network topologies: Understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario.
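The storage part of cluster sizing is mostly arithmetic: logical data volume, times the replication factor, plus headroom for intermediate data, divided by the usable disk per DataNode. The sketch below assumes the default replication factor of 3 (dfs.replication) and a 25% allowance for temporary and intermediate data; that allowance, the function names, and the JBOD assumption that every data disk counts fully toward HDFS are illustrative choices, not fixed rules. It ignores OS and log disks and any non-DFS space reserved with dfs.datanode.du.reserved.

```python
import math

def raw_storage_tb(current_tb, daily_ingest_tb, retention_days,
                   replication=3, temp_fraction=0.25):
    """Estimate raw disk needed for HDFS data plus scratch space.

    replication: HDFS replication factor (dfs.replication, default 3).
    temp_fraction: assumed extra share reserved for intermediate/temporary data.
    """
    logical_tb = current_tb + daily_ingest_tb * retention_days
    return logical_tb * replication * (1 + temp_fraction)

def datanodes_needed(raw_tb, disks_per_node, disk_tb):
    """Round up to whole DataNodes (JBOD: every data disk counts as HDFS space)."""
    return math.ceil(raw_tb / (disks_per_node * disk_tb))
```

With 100 TB today, 1 TB ingested per day, and one year of retention, the estimate is 465 TB of logical data, about 1743.75 TB raw, which needs 37 DataNodes with twelve 4 TB disks each.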
4: Hadoop Cluster Installation and Administration (25%)
- In given scenarios, identify how the cluster handles disk and machine failures.
- Analyze logging configurations and log file formats.
- Understand the basics of Hadoop metrics and cluster health monitoring.
- Identify the functions and purposes of available cluster monitoring tools.
- Install ecosystem components in CDH 5, including (but not limited to) Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
- Identify the functions and purposes of available tools for managing the Apache Hadoop file system.
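How the cluster reacts to machine and disk failures is governed by a few HDFS settings. The sketch below computes how long the NameNode waits after a DataNode's last heartbeat before declaring it dead, using the formula 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval, and models the disk-failure rule that a DataNode keeps running only while its failed volumes do not exceed dfs.datanode.failed.volumes.tolerated. The function names are hypothetical; the defaults shown are the stock values (300000 ms, 3 s, and 0).

```python
def dead_node_timeout_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Time after the last heartbeat before the NameNode marks a DataNode dead.

    Defaults mirror dfs.namenode.heartbeat.recheck-interval (milliseconds)
    and dfs.heartbeat.interval (seconds).
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s

def datanode_keeps_running(failed_volumes, volumes_tolerated=0):
    """volumes_tolerated mirrors dfs.datanode.failed.volumes.tolerated (default 0)."""
    return failed_volumes <= volumes_tolerated
```

With the defaults, the timeout is 630000 ms (10.5 minutes), after which the NameNode starts re-replicating that node's blocks elsewhere; with a tolerance of 0, a single failed data disk shuts the DataNode down.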
5: Resource Management (10%)
- Understand the overall design goals of each Hadoop scheduler.
- In given scenarios, determine how the FIFO Scheduler allocates cluster resources.
- In given scenarios, determine how the Fair Scheduler allocates cluster resources under YARN.
- In given scenarios, determine how the Capacity Scheduler allocates cluster resources.
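The contrast between the FIFO and Fair Scheduler design goals can be seen in a single-resource toy model. FIFO serves applications strictly in submission order, while the sketch of fair sharing below uses weighted max-min fairness computed by water-filling: every application gets an equal (or weighted) share, and any share an application does not need is redistributed to the others. This is a simplified model, not the Fair Scheduler's implementation, which works over hierarchical queues with minimum shares and preemption; the Capacity Scheduler is not modeled, and all names are hypothetical. Weights are assumed to be positive.

```python
def fifo_allocation(capacity, demands):
    """demands: list of (app, demand) pairs in submission order."""
    allocation, remaining = {}, capacity
    for app, demand in demands:
        granted = min(demand, remaining)  # earlier apps are served first
        allocation[app] = granted
        remaining -= granted
    return allocation

def fair_allocation(capacity, demands, weights=None):
    """Weighted max-min fair shares computed by water-filling."""
    weights = weights or {}
    allocation = {app: 0.0 for app, _ in demands}
    unmet = {app: float(d) for app, d in demands if d > 0}
    remaining = float(capacity)
    while unmet and remaining > 1e-9:
        total_weight = sum(weights.get(a, 1.0) for a in unmet)
        per_weight = remaining / total_weight
        # Apps whose whole remaining demand fits within their share this round.
        satisfied = [a for a, d in unmet.items()
                     if d <= per_weight * weights.get(a, 1.0)]
        if not satisfied:
            # Everyone wants more than their share: split what is left by weight.
            for a in unmet:
                allocation[a] += per_weight * weights.get(a, 1.0)
            break
        for a in satisfied:
            need = unmet.pop(a)
            allocation[a] += need
            remaining -= need
    return allocation
```

With 100 units of capacity and demands of 20, 50, and 60, FIFO grants 20, 50, and 30 (the last application absorbs the shortfall), whereas the fair model grants 20, 40, and 40.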
6: Monitoring and Logging (15%)
- Understand the functions and features of Hadoop’s metric collection capabilities.
- Analyze the NameNode and JobTracker Web UIs.
- Understand how to monitor cluster daemons.
- Identify and monitor CPU usage on master nodes.
- Describe how to monitor swap space and memory allocation on all nodes.
- Identify how to view and manage Hadoop’s log files.
- Interpret log files.
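Interpreting daemon logs usually starts with filtering by severity. The sketch below assumes log lines follow the %d{ISO8601} %p %c: %m%n layout commonly configured for the daemons' rolling file appender in Hadoop's stock log4j.properties (timestamp, level, logger class, message); check your own cluster's log4j.properties before relying on it. The function name is hypothetical. Lines that do not match, such as stack-trace continuations, are skipped.

```python
import re
from collections import Counter

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger.Class: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[^:]+): (?P<msg>.*)$"
)

def summarize(lines):
    """Count lines per level and collect WARN/ERROR/FATAL entries."""
    levels = Counter()
    problems = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # e.g. stack-trace continuation lines
        levels[m["level"]] += 1
        if m["level"] in ("WARN", "ERROR", "FATAL"):
            problems.append((m["ts"], m["logger"], m["msg"]))
    return levels, problems
```

Because the message group takes everything after the first ": " following the logger name, messages that themselves contain colons are kept intact.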
Requirements
- Foundational skills in Linux administration
- Basic programming proficiency
35 Hours
Testimonials (3)
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczatka
Course - Administrator Training for Apache Hadoop
I genuinely enjoyed the trainer's extensive expertise.
Grzegorz Gorski
Course - Administrator Training for Apache Hadoop
I mostly liked that the trainer gave real-life examples.