Big Data Courses Syllabus
Hadoop is an open-source software framework for storing and processing big data across clusters of machines. It is built on the Java platform.
Big Data Course Content
Introduction to Big Data & Hadoop
- Big Data & Hadoop Introduction
- What is Hadoop?
- Why use Hadoop, and who uses it?
- What is the history of Hadoop?
- What are the different components of Hadoop?
- Detailed information on HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, Zookeeper and so on…
- What is the scope of Hadoop in industry?
Deep Dive into HDFS (Storing the Data)
- HDFS Introduction
- Design of HDFS
- Role of HDFS in Hadoop
- HDFS Features
- Introduction to Hadoop Daemons and Their Functionality
- Name Node
- Secondary Name Node
- Job Tracker
- Data Node
- Task Tracker
- Anatomy of File Write
- Anatomy of File Read
- Network Topology
- Nodes
- Racks
- Data Center
- Parallel Copying using DistCp
- Basic Configuration for HDFS
- Data Organization
- Blocks and Replication
- Heartbeat Signal
- How to Store the Data into HDFS
- How to Read the Data from HDFS
- Accessing HDFS (Introduction of Basic UNIX commands)
- CLI commands
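
The block and replication topics above can be illustrated with a small sketch. It is not HDFS code: the function names `split_into_blocks` and `place_replicas` are hypothetical, the round-robin placement is a simplification that ignores the rack-aware placement HDFS actually performs, and the block size and replication factor are illustrative values, since both are configurable per cluster.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes.
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, data_nodes, replication):
    """Assign each block to `replication` distinct nodes (simplified round-robin)."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)  # illustrative sizes
layout = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
```

In this sketch a 300 MB file yields two full 128 MB blocks plus one 44 MB block, and every block is held by three different nodes, so losing a single node leaves two copies of each block.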
MapReduce using Java (Processing the Data)
- Introduction to MapReduce
- MapReduce Architecture
- Data flow in MapReduce
- Splits
- Mapper
- Partitioning
- Sort and shuffle
- Combiner
- Reducer
- Understanding the Difference Between Block and InputSplit
- Role of RecordReader
- Basic Configuration of MapReduce
- MapReduce life cycle
- Driver Code
- Mapper and Reducer
- How MapReduce Works
- Writing and Executing the Basic MapReduce Program using Java
- Submission & Initialization of a MapReduce Job
- File Input/Output Formats in MapReduce Jobs
- Text Input Format
- Key Value Input Format
- Sequence File Input Format
- NLine Input Format
- Joins
- Map-side Joins
- Reduce-side Joins
- Word Count Example
- Partitioner MapReduce Program
- Side Data Distribution
- Distributed Cache (with Program)
- Counters (with Program)
- Types of Counters
- Task Counters
- Job Counters
- User Defined Counters
- Propagation of Counters
- Job Scheduling
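
The MapReduce data flow listed above (mapper, partitioning, sort and shuffle, reducer) can be sketched with the word count example in plain Python. This is a single-process illustration, not Hadoop code: the function names are hypothetical, and the CRC32-based partitioner is an assumption chosen only because it is deterministic across runs.

```python
import zlib
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Pick the reducer for a key; a deterministic hash keeps runs repeatable."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Group values by key inside each partition and sort the keys."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return [sorted(p.items()) for p in partitions]


def reducer(key, values):
    """Sum all counts collected for one word."""
    return key, sum(values)


def run_word_count(lines, num_reducers=2):
    """Run map, shuffle/sort, and reduce in sequence and merge the results."""
    mapped = [pair for line in lines for pair in mapper(line)]
    counts = {}
    for part in shuffle_and_sort(mapped, num_reducers):
        for key, values in part:
            word, total = reducer(key, values)
            counts[word] = total
    return counts


result = run_word_count(["big data big", "data hadoop"])
```

Because every occurrence of a word is routed to the same partition, each reducer sees all the counts for its keys, and the merged result does not depend on the number of reducers.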
Apache PIG
- Introduction to Apache PIG
- Introduction to PIG Data Flow Engine
- MapReduce vs. PIG in detail
- When should PIG be used?
- Data Types in PIG
- Basic PIG programming
- Modes of Execution in PIG
- Local Mode
- Execution Mechanisms
- Grunt Shell
- Script
- Embedded
- Operators/Transformations in PIG
- PIG UDFs with Program
- Word Count Example in PIG
- Differences between MapReduce and PIG
Apache SQOOP
- Introduction to SQOOP
- Use of SQOOP
- Connect to MySQL database
- SQOOP commands
- Import
- Export
- Eval
- Joins in SQOOP
- Export to MySQL
- Export to HBase
Apache OOZIE
- Introduction to OOZIE
- Use of OOZIE
- Where to use OOZIE?
Apache HIVE
- Introduction to HIVE
- HIVE Meta Store
- HIVE Architecture
- Tables in HIVE
- Managed Tables
- External Tables
- Hive Data Types
- Primitive Types
- Partition
- Joins in HIVE
- HIVE UDFs and UDAFs with Programs
- Word Count Example
MongoDB
- What is MongoDB?
- Where to use MongoDB?
- Configuration on Windows
- Inserting data into MongoDB
- Reading data from MongoDB
Apache HBase
- Introduction to HBASE
- Basic Configurations of HBASE
- Fundamentals of HBase
- What is NoSQL?
- HBase Data Model
- Table and Row
- Column Family and Column Qualifier
- Categories of NoSQL Databases
- Key-Value Database
- Document Database
- Column Family Database
- HBASE Architecture
- HMaster
- Region Servers
- Regions
- MemStore
- How HBASE differs from RDBMS
- HDFS vs. HBase
- Client-side buffering or bulk uploads
- HBase Designing Tables
- HBase Operations
- Get
- Scan
- Put
- Delete
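
The HBase data model and the four basic operations listed above can be pictured with an in-memory toy table. This is a minimal sketch and not the HBase client API: the `ToyTable` class and its method signatures are hypothetical, and cell versions, timestamps, and region splitting are left out.

```python
class ToyTable:
    """Rows keyed by row key; cells addressed by (column family, qualifier)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        self.rows = {}  # row key -> {(family, qualifier): value}

    def put(self, row_key, family, qualifier, value):
        """Write one cell; qualifiers can be added freely within a family."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value

    def get(self, row_key):
        """Return all cells of a single row (empty dict if the row is absent)."""
        return dict(self.rows.get(row_key, {}))

    def scan(self, start_row="", stop_row=None):
        """Yield rows in sorted row-key order within [start_row, stop_row)."""
        for key in sorted(self.rows):
            if key >= start_row and (stop_row is None or key < stop_row):
                yield key, dict(self.rows[key])

    def delete(self, row_key):
        """Remove a whole row; deleting a missing row is a no-op."""
        self.rows.pop(row_key, None)


users = ToyTable(["info"])
users.put("user2", "info", "name", "Bob")
users.put("user1", "info", "name", "Alice")
users.put("user1", "info", "city", "Pune")
```

Here `get("user1")` returns both cells of that row, while `scan()` walks the rows in row-key order, which is why row-key design matters when planning range reads.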
Cluster Setup
- Downloading and installing Ubuntu 12.x
- Installing Java
- Installing Hadoop
- Creating Cluster
- Increasing and Decreasing the Cluster Size
- Monitoring the Cluster Health
- Starting and Stopping the Nodes
Apache Zookeeper
- Introduction to Zookeeper
- Data Model
- Operations
Apache Flume
- Introduction to Flume
- Uses of Flume
- Flume Architecture
- Flume Master
- Flume Collectors
- Flume Agents