Data Stage Development

Data Stage Development Course Content

Overview of the course:

The Data Stage Development Program is a one-stop course that introduces you to the domain of Data Stage development and gives you the technical know-how to work in it. By the end of the course you will be able to earn a Data Stage developer credential, and you will be capable of handling terabyte-scale data and analyzing it successfully using MapReduce.

Big Data

- What is Big Data

- Dimensions of Big Data

- Big Data in Advertising

- Big Data in Banking

- Big Data in Telecom

- Big Data in eCommerce

- Big Data in Healthcare

- Big Data in Defense

- Processing options of Big Data

- Data Stage as an option

Data Stage

- What is Data Stage

- How Data Stage Works

- MapReduce

- How Data Stage has an edge

Data Stage Ecosystem

- Pig

- Hive

- Flume

Data Stage Hands On

- Setting up Data Stage on a single-node cluster

- Running HDFS commands

- Running your MapReduce program

- Running Sqoop Import and Sqoop Export

- Creating Hive tables directly from Sqoop

- Creating Hive tables

- Querying Hive tables

- Running an Oozie workflow

- Analyzing Twitter data using Flume

Multinode Setup

- Setting up a multinode cluster on Amazon EC2

- Setting up a multinode cluster on the classroom machines

- Setting up Cloudera Manager on the cloud

- Setting up Cloudera Manager on local setup

Cluster Capacity Planning

Level 1: Mini Project

Level 1: Evaluation Test (50 marks)

Advanced MapReduce

- MapReduce Code Walkthrough

- ToolRunner

- MRUnit

- Combiner

- Partitioner

- Setup and Cleanup methods

- Using Java API to access HDFS

- Map-side joins

- Reduce-side joins

- Input Types in MapReduce

- Output Types in MapReduce

- Custom Input Data Types

- Custom Output Data Types

- Multiple-reducer MapReduce programs

- Zero Reducer Mapper Program
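
To give a feel for how the mapper, combiner, partitioner and reducer topics above fit together, here is a minimal sketch in plain Python. It is only an in-memory simulation of a word count and does not use any cluster API; the function names (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`, `run_map_only`) and the character-sum partitioning rule are assumptions made for illustration, not part of the course material.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one mapper's output locally to reduce shuffle traffic."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Choose a reducer for a key (a simple character sum stands in for a hash)."""
    return sum(ord(ch) for ch in key) % num_reducers


def reducer(key, values):
    """Sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Simulate map -> combine -> partition -> reduce in memory."""
    # One bucket per reducer, each mapping key -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in combiner(mapper(line)):
            buckets[partitioner(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:
        for key in sorted(bucket):  # reducers see their keys in sorted order
            word, total = reducer(key, bucket[key])
            result[word] = total
    return result


def run_map_only(lines):
    """Zero-reducer job: mapper output is kept as-is, with no shuffle or reduce."""
    return [pair for line in lines for pair in mapper(line)]


if __name__ == "__main__":
    sample = ["big data big cluster", "data node data"]
    print(run_job(sample))
    print(run_map_only(sample))
```

Note that the combiner only merges counts within a single input line here, standing in for one mapper's local output; the reducer still has to add up partial counts coming from different lines.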

Advanced MapReduce Hands On