Spark is a unified framework for big data analytics that gives developers a single integrated API, so data scientists, analysts, and engineers can work on their separate tasks with the same tools. It supports a wide range of popular languages, including Python, R, SQL, Java, and Scala. The main aim of this Apache Spark course is to give data scientists, data analysts, and software developers hands-on experience building real-time data-stream analysis and large-scale machine-learning solutions.
Apache Spark Training Objectives
Apache Spark Architecture
How to use Spark with Scala
How to deploy Spark projects to the cloud
Machine Learning with Spark
Pre-requisites of the Course
Basic knowledge of object-oriented programming is enough
Knowledge of Scala will be an added advantage
Basic knowledge of databases and SQL queries will also be an added advantage for this course
Who Should Do the Course
Developers, Architects, IT Professionals
Software Engineers, Data scientists, and Analysts
Apache Spark Course Content
Batch and Real-Time Analytics with Apache Spark
SCALA (Object Oriented and Functional Programming)
Getting started With Scala
Scala Background, Scala vs Java, and Basics
Interactive Scala – REPL, data types, variables, expressions, simple functions
Running the program with Scala Compiler
Explore the type lattice and use type inference
Define Methods and Pattern Matching
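The method definitions and pattern matching covered above can be sketched as follows (a minimal example; the `describe` function and its cases are illustrative, not part of the course material):

```scala
object PatternMatchDemo {
  // A simple method definition with an explicit return type
  def square(x: Int): Int = x * x

  // Pattern matching on both values and types
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case n: Int    => s"int: ${square(n)}"
    case s: String => s"string of length ${s.length}"
    case _         => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(0))        // zero
    println(describe(3))        // int: 9
    println(describe("spark"))  // string of length 5
  }
}
```

The same expressions can be tried interactively in the REPL, which prints the inferred type of each result.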
Scala Environment Set up
Scala set up on Windows and UNIX
Functional Programming
What is Functional Programming?
Differences between OOP and FP
Collections (Very Important for Spark)
Iterating, mapping, filtering, and counting
Regular expressions and matching with them
Maps, Sets, groupBy, Options, flatten, flatMap
Word count, IO operations, file access, flatMap
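The word-count exercise listed above combines most of these collection operations (flatMap, map, filter, groupBy). A plain-Scala sketch, the same shape the Spark RDD version later follows:

```scala
object WordCount {
  // Count occurrences of each word across a sequence of lines
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .map(_.toLowerCase)         // normalize case
      .filter(_.nonEmpty)         // drop empty tokens
      .groupBy(identity)          // group identical words together
      .map { case (word, occs) => (word, occs.size) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("to be or not to be"))
    println(counts("to"))  // 2
  }
}
```

For the IO-operations part of the exercise, `scala.io.Source.fromFile("input.txt").getLines()` produces the lines to feed into `wordCount`.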
Object-Oriented Programming
Classes and Properties
Objects, Packaging, and Imports
Traits
Objects, classes, inheritance, Lists with multiple related types, apply
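The topics above (traits, classes, inheritance, lists with multiple related types, and `apply`) fit together in a short example like this (the `Shape` hierarchy is illustrative only):

```scala
// A trait defines a common interface for related classes
trait Shape {
  def area: Double
}

class Circle(r: Double) extends Shape {
  def area: Double = math.Pi * r * r
}

class Rectangle(w: Double, h: Double) extends Shape {
  def area: Double = w * h
}

// A companion object with apply lets callers write Shape(...) without `new`
object Shape {
  def apply(kind: String, dims: Double*): Shape = kind match {
    case "circle"    => new Circle(dims(0))
    case "rectangle" => new Rectangle(dims(0), dims(1))
  }
}

object ShapesDemo {
  def main(args: Array[String]): Unit = {
    // A List holding multiple related types through their shared trait
    val shapes: List[Shape] = List(Shape("circle", 1.0), Shape("rectangle", 2.0, 3.0))
    shapes.foreach(s => println(s.area))
  }
}
```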
Integrations
What is SBT?
Integration of Scala in Eclipse IDE
Integration of SBT with Eclipse
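An SBT project for this course is driven by a `build.sbt` file. A minimal sketch (the Scala and Spark version numbers here are illustrative assumptions, not the ones prescribed by the course):

```scala
// build.sbt -- minimal sketch; versions are illustrative assumptions
name := "spark-course"
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.0",
  "org.apache.spark" %% "spark-sql"  % "3.5.0"
)
```

With this file in place, `sbt compile` builds the project, and the Eclipse integration covered above imports it as an IDE project.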
SPARK CORE
Batch versus real-time data processing
Introduction to Spark, Spark versus Hadoop
The architecture of Spark
Coding Spark jobs in Scala
Exploring the Spark shell and creating a SparkContext
RDD Programming
Operations on RDD
Transformations
Actions
Loading Data and Saving Data
Key-Value Pair RDDs
Broadcast variables
Persistence
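The RDD topics above (loading and saving data, transformations, actions, key-value pair RDDs, broadcast variables, persistence) fit together roughly as follows. This is a sketch that assumes Spark is on the classpath and an `input.txt` file exists; it is not runnable standalone:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddBasics {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-basics").setMaster("local[*]"))

    val lines  = sc.textFile("input.txt")          // loading data
    val words  = lines.flatMap(_.split("\\s+"))    // transformation (lazy)
    val pairs  = words.map(w => (w, 1))            // key-value pair RDD
    val counts = pairs.reduceByKey(_ + _)          // transformation on pairs
    counts.persist(StorageLevel.MEMORY_ONLY)       // persistence for reuse

    val stop = sc.broadcast(Set("the", "a"))       // broadcast variable
    val kept = counts.filter { case (w, _) => !stop.value.contains(w) }

    println(kept.count())                          // action: triggers execution
    kept.saveAsTextFile("output")                  // saving data

    sc.stop()
  }
}
```

Note that transformations build a lazy lineage; nothing runs until an action such as `count()` or `saveAsTextFile()` is called.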
Configuring and running the Spark cluster
Exploring a multi-node Spark cluster
Cluster management
Submitting Spark jobs and running in the cluster mode
Developing Spark applications in Eclipse
Tuning and Debugging Spark
CASSANDRA (NoSQL DATABASE)
Learning Cassandra
Getting started with architecture
Installing Cassandra
Communicating with Cassandra
Creating a database
Create a table
Inserting Data
Modelling Data
Creating a Web Application
Updating and Deleting Data
Spark Integration with NoSQL (CASSANDRA) and Amazon EC2
Introduction to Spark and Cassandra Connectors
Setting up Spark with Cassandra
Creating a SparkContext to connect to Cassandra
Creating Spark RDDs on the Cassandra database
Performing Transformations and Actions on the Cassandra RDD
Running a Spark application in Eclipse to access data in Cassandra
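The connector workflow above can be sketched like this. It assumes the spark-cassandra-connector dependency and a Cassandra node on localhost; the keyspace, table, and column names are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._  // adds cassandraTable to SparkContext

object CassandraDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spark-cassandra")
      .setMaster("local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Create a Spark RDD over a (hypothetical) Cassandra table
    val rows = sc.cassandraTable("shop", "orders")

    // Transformations and an action on the Cassandra RDD
    val totals = rows
      .map(row => (row.getString("customer"), row.getDouble("amount")))
      .reduceByKey(_ + _)
    totals.collect().foreach(println)

    sc.stop()
  }
}
```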
Introduction to Amazon Web Services
Building 4 Node Spark Multi-Node Cluster in Amazon Web Services
Deploying in Production with Mesos and YARN
Spark Streaming
Introduction to Spark Streaming
Architecture of Spark Streaming
Processing Distributed Log Files in Real Time
Discretized streams (DStreams)
Applying Transformations and Actions on Streaming Data
Integration with Flume and Kafka
Integration with Cassandra
Monitoring streaming jobs
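A streaming word count ties the DStream topics above together. A sketch assuming Spark Streaming is on the classpath and something (e.g. `nc -lk 9999`) is writing to the socket; the host and port are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-wc").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

    // A DStream of text lines from a TCP source
    val lines = ssc.socketTextStream("localhost", 9999)

    // Transformations on the discretized stream, batch by batch
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)

    counts.print()          // output action applied to each batch

    ssc.start()             // start receiving and processing
    ssc.awaitTermination()
  }
}
```

The same `flatMap`/`map`/`reduceByKey` pipeline from batch RDDs applies here, just re-run on each micro-batch interval.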
Spark SQL
Introduction to Apache Spark SQL
The SQL context
Importing and saving data
Processing text, JSON, and Parquet files
DataFrames
User-defined functions (UDFs)
Using Hive
Local Hive Metastore server
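The Spark SQL topics above (importing and saving data, DataFrames, UDFs, Hive support) can be sketched as follows. It assumes Spark SQL is on the classpath; the `people.json` file, its columns, and the query are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-demo")
      .master("local[*]")
      .enableHiveSupport()   // uses a local Hive metastore when configured
      .getOrCreate()
    import spark.implicits._

    // Importing data: JSON (text and Parquet work the same way)
    val people = spark.read.json("people.json")

    // A user-defined function usable in DataFrame expressions
    val shout = udf((s: String) => s.toUpperCase)
    people.select(shout($"name")).show()

    // SQL over a temporary view
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    // Saving data in Parquet format
    people.write.parquet("people.parquet")

    spark.stop()
  }
}
```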
Spark MLlib
Introduction to Machine Learning
Types of Machine Learning
Introduction to Apache Spark MLlib Algorithms
Machine Learning Data Types and Working with MLlib
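The MLlib data types and a classification algorithm come together in a sketch like this. It assumes the RDD-based MLlib API is on the classpath; the two-point training set is a toy illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

object MllibDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("mllib-demo").setMaster("local[*]"))

    // MLlib data types: dense Vectors wrapped in LabeledPoints (toy data)
    val data = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(2.0, 0.0))
    ))

    // Supervised learning: fit a logistic-regression classifier
    val model = new LogisticRegressionWithLBFGS().run(data)
    println(model.predict(Vectors.dense(2.0, 0.1)))

    sc.stop()
  }
}
```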