Big Data - Spark + HADOOP + HIVE + SQOOP - Live Training

Class: Big Data - Spark + HADOOP + HIVE + SQOOP (FREE for recent college graduates)
Start Date: Jan 30
Status: Open
Price: $1,500.00

Big Data - Spark + HADOOP + HIVE + SQOOP - Online Training

Class: Big Data - Spark + HADOOP + HIVE + SQOOP (FREE for recent college graduates)
Start Date: Jan 30
Status: Open
Price: $1,500.00

TO RESERVE YOUR SPOT:

Call: (540) 449-5501
E-mail: vijay@vxltraining.com

Pay $500 immediately to reserve your spot; the balance is due after the first day of class.

All of our credit card payments are processed by PayPal and are 100% secure.

After you make your payment, you MUST EMAIL US with:

1) Name 2) Phone Number and 3) The course you have signed up for.

Please select the class and option, then click the Buy Now button to pay for your class.

Curriculum

HADOOP + HIVE + SQOOP

Total Training Duration: 30-40 working hours

Post-Training Technical Support (including profile preparation): 60 working hours

Hands-on Projects: 3

 

COURSE CONTENT

 

HADOOP BASICS

The Motivation for Hadoop

Problems with traditional large-scale systems
Data storage literature survey
Data processing literature survey
Network Constraints
Requirements for a new approach

Hadoop: Basic Concepts

What is Hadoop?
The Hadoop Distributed File System
How Hadoop MapReduce Works
Anatomy of a Hadoop Cluster


HDFS (Hadoop Distributed File System)

Blocks and Splits

Input Splits
HDFS Splits
Data Replication

Hadoop Rack Awareness
Data high availability
Cluster architecture and block placement
CASE STUDIES

 

Programming Practices & Performance Tuning

 

Pseudo-distributed Mode

Fully distributed mode

Hadoop Development

Writing a MapReduce Program

Examining a Sample MapReduce Program with several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop Streaming API
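
As a rough preview of the Mapper, Reducer, and Driver roles listed above, here is a minimal word-count sketch in plain Python. It only imitates the map, shuffle/sort, and reduce phases on one machine; it is not Hadoop's Java API or an actual Hadoop Streaming job, and the names `mapper`, `reducer`, and `run_job` are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(pairs):
    """Sum the counts for each word; expects pairs already sorted by key."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_job(lines):
    """Local stand-in for the driver: map, sort (shuffle), then reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["big data big cluster", "data moves to the cluster"]
    print(run_job(sample))
```

On a real cluster the sort step is performed by the framework between the map and reduce phases; here `sorted` plays that role so the reducer sees all pairs for one word together.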

Performing several Hadoop jobs

The configure and close Methods
Sequence Files
Record Reader
Record Writer
Role of Reporter
Output Collector
Counters
Directly Accessing HDFS
ToolRunner
Using The Distributed Cache

Several MapReduce Jobs (In Detail)

MOST EFFECTIVE SEARCH USING MAPREDUCE
GENERATING THE RECOMMENDATIONS USING MAPREDUCE
PROCESSING THE LOG FILES USING MAPREDUCE
Identity Mapper
Identity Reducer
Exploring well known problems using MapReduce applications

Advanced MapReduce Programming

The Secondary Sort
Customized Input Formats and Output Formats
Joins in MapReduce

Tuning for Performance in MapReduce

Reducing network traffic with combiner
Partitions
Reducing the amount of input data
Using Compression
Reusing the JVM
Running with speculative execution
Other Performance Aspects

HADOOP ANALYST

Hive

Hive concepts
Hive architecture
Install and configure Hive on a cluster
Different types of tables in Hive
Hive library functions
Buckets
Partitions
File formats
Joins in Hive

Sqoop

Install and configure Sqoop on a cluster
Connecting to RDBMS
Installing MySQL
Import data from Oracle/MySQL to Hive
Export data to Oracle/MySQL
Internal mechanism of import/export

SPARK INTRODUCTION WITH EXAMPLES

POC AND PROJECTS

APACHE SPARK

Total Training Duration: 30-40 working hours

Post-Training Technical Support (including profile preparation): 60 working hours

Hands-on Projects: 2

 

COURSE CONTENT

 

 

❖ Spark – Introduction

❖ Spark – Ecosystem Components

❖ Spark – Terminologies & Concepts

❖ Spark – Install

❖ Spark – Install multi node Cluster

❖ Spark – Shell Commands

❖ Spark – Create Project in Eclipse

❖ Spark – SparkContext

❖ Spark – RDD

❖ Spark – Ways to Create RDD

❖ Spark – RDD Persistence & Caching

❖ Spark – RDD Features

❖ Spark – RDD Limitations

❖ Spark – Transformations & Actions

❖ Spark – Map vs FlatMap

❖ Spark – In-Memory Computation

❖ Spark – Lazy Evaluation

❖ Spark – Fault Tolerance

❖ Spark – Directed Acyclic Graph

❖ Spark – Cluster Managers

❖ Spark – How it Works

❖ Spark – Why You Must Learn It

❖ Spark – Hadoop Compatibility

❖ Spark – Performance Tuning

❖ Spark – Limitations & Drawbacks

❖ Spark – Best Spark & Scala Books

❖ Spark SQL – Introduction

❖ Spark SQL – DataFrame

❖ Spark SQL – Optimization
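
To give a feel for the "Map vs FlatMap" and "Lazy Evaluation" topics in the list above, here is a small sketch in plain Python. It is a stand-in for the idea only and does not use Spark: `rdd_map` and `rdd_flat_map` are hypothetical helper names that mimic what an RDD's `map` and `flatMap` transformations do with each record.

```python
from itertools import chain


def rdd_map(func, records):
    """One output element per input element, produced lazily."""
    return (func(record) for record in records)


def rdd_flat_map(func, records):
    """Each input may yield zero or more outputs, flattened into one stream."""
    return chain.from_iterable(func(record) for record in records)


lines = ["spark is fast", "", "rdd"]

# Nothing is computed until list() consumes the generators,
# loosely mirroring how transformations wait for an action.
mapped = list(rdd_map(str.split, lines))
flat_mapped = list(rdd_flat_map(str.split, lines))

print(mapped)       # [['spark', 'is', 'fast'], [], ['rdd']]
print(flat_mapped)  # ['spark', 'is', 'fast', 'rdd']
```

The map version keeps one (possibly empty) list per input line, while the flatMap version merges all words into a single sequence and drops the empty line entirely.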

 

❖ RDD vs DataFrame vs Dataset

❖ Spark Streaming – Introduction

❖ Spark Streaming – DStream

❖ Spark Streaming – Transformations

❖ Spark Streaming – Checkpointing

❖ Spark Streaming vs Apache Storm

❖ Spark vs Hadoop MapReduce

 

❖ Spark Interview Questions – I

❖ Spark Interview Questions – II

❖ Spark Interview Questions – III

 

Scala

 

❖ Scala – Introduction

❖ Scala – Features

❖ Scala – Control Structures

❖ Scala – Tuples

❖ Scala – Partial Functions

 

Class Location

38345 W 10 Mile Rd

Ste 215

Farmington Hills MI 48335

Some classes will also be held online.

Office Location

38345 W 10 Mile Rd

Ste 215

Farmington Hills MI 48335

FAQ

Coming Soon

 
