Spark is a unified framework for big data analytics that gives developers a single integrated API, so data scientists, analysts, and engineers can work on their separate tasks with the same tools. It supports a wide range of popular languages, including Python, R, SQL, Java, and Scala. The main aim of this Apache Spark course is to give data scientists, data analysts, and software developers hands-on experience building real-time data-stream analysis and large-scale machine-learning solutions.
Apache Spark Training Objectives
Apache Spark Architecture
How to use Spark with Scala
How to deploy Spark projects to the cloud
Machine Learning with Spark
Pre-requisites of the Course
Basic knowledge of object-oriented programming is enough
Knowledge of Scala will be an added advantage
Basic knowledge of databases and SQL queries will also be an added advantage for this course
Who Should Do the Course
Developers, Architects, IT Professionals
Software Engineers, Data scientists, and Analysts
Apache Spark Course Content
Batch and Real-Time Analytics with Apache Spark
SCALA (Object Oriented and Functional Programming)
Getting started With Scala
Scala Background, Scala vs Java, and Basics
Interactive Scala – REPL, data types, variables, expressions, simple functions
Running the program with Scala Compiler
Explore the type lattice and use type inference
Define Methods and Pattern Matching
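The method definitions and pattern matching covered above can be sketched as follows (a minimal example; the `describe` function and its cases are illustrative, not part of the course material):

```scala
object PatternMatchDemo {
  // A simple method definition with an explicit return type
  def square(x: Int): Int = x * x

  // Pattern matching on both values and types
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case n: Int    => s"int: ${square(n)}"
    case s: String => s"string of length ${s.length}"
    case _         => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(0))        // zero
    println(describe(3))        // int: 9
    println(describe("spark"))  // string of length 5
  }
}
```

The same expressions can be tried interactively in the REPL, which prints the inferred type of each result.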
Scala Environment Set up
Scala set up on Windows and UNIX
Functional Programming
What is Functional Programming?
Differences between OOP and FP
Collections (Very Important for Spark)
Iterating, mapping, filtering, and counting
Regular expressions and matching with them
Maps, Sets, groupBy, Options, flatten, flatMap
Word count, IO operations, file access, flatMap
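The word-count exercise listed above combines most of these collection operations (flatMap, map, filter, groupBy). A plain-Scala sketch, the same shape the Spark RDD version later follows:

```scala
object WordCount {
  // Count occurrences of each word across a sequence of lines
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .map(_.toLowerCase)         // normalize case
      .filter(_.nonEmpty)         // drop empty tokens
      .groupBy(identity)          // group identical words together
      .map { case (word, occs) => (word, occs.size) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("to be or not to be"))
    println(counts("to"))  // 2
  }
}
```

For the IO-operations part of the exercise, `scala.io.Source.fromFile("input.txt").getLines()` produces the lines to feed into `wordCount`.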
Object-Oriented Programming
Classes and Properties
Objects, Packaging, and Imports
Traits
Objects, classes, inheritance, Lists with multiple related types, apply
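The topics above (traits, classes, inheritance, lists with multiple related types, and `apply`) fit together in a short example like this (the `Shape` hierarchy is illustrative only):

```scala
// A trait defines a common interface for related classes
trait Shape {
  def area: Double
}

class Circle(r: Double) extends Shape {
  def area: Double = math.Pi * r * r
}

class Rectangle(w: Double, h: Double) extends Shape {
  def area: Double = w * h
}

// A companion object with apply lets callers write Shape(...) without `new`
object Shape {
  def apply(kind: String, dims: Double*): Shape = kind match {
    case "circle"    => new Circle(dims(0))
    case "rectangle" => new Rectangle(dims(0), dims(1))
  }
}

object ShapesDemo {
  def main(args: Array[String]): Unit = {
    // A List holding multiple related types through their shared trait
    val shapes: List[Shape] = List(Shape("circle", 1.0), Shape("rectangle", 2.0, 3.0))
    shapes.foreach(s => println(s.area))
  }
}
```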
Integrations
What is SBT?
Integration of Scala in Eclipse IDE
Integration of SBT with Eclipse
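An SBT project for this course is driven by a `build.sbt` file. A minimal sketch (the Scala and Spark version numbers here are illustrative assumptions, not the ones prescribed by the course):

```scala
// build.sbt -- minimal sketch; versions are illustrative assumptions
name := "spark-course"
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.5.0",
  "org.apache.spark" %% "spark-sql"  % "3.5.0"
)
```

With this file in place, `sbt compile` builds the project, and the Eclipse integration covered above imports it as an IDE project.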
SPARK CORE
Batch versus real-time data processing
Introduction to Spark, Spark versus Hadoop
The architecture of Spark
Coding Spark jobs in Scala
Exploring the Spark shell and creating a SparkContext
RDD Programming
Operations on RDD
Transformations
Actions
Loading Data and Saving Data
Key-Value Pair RDDs
Broadcast variables
Persistence
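The RDD topics above (loading and saving data, transformations, actions, key-value pair RDDs, broadcast variables, persistence) fit together roughly as follows. This is a sketch that assumes Spark is on the classpath and an `input.txt` file exists; it is not runnable standalone:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddBasics {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-basics").setMaster("local[*]"))

    val lines  = sc.textFile("input.txt")          // loading data
    val words  = lines.flatMap(_.split("\\s+"))    // transformation (lazy)
    val pairs  = words.map(w => (w, 1))            // key-value pair RDD
    val counts = pairs.reduceByKey(_ + _)          // transformation on pairs
    counts.persist(StorageLevel.MEMORY_ONLY)       // persistence for reuse

    val stop = sc.broadcast(Set("the", "a"))       // broadcast variable
    val kept = counts.filter { case (w, _) => !stop.value.contains(w) }

    println(kept.count())                          // action: triggers execution
    kept.saveAsTextFile("output")                  // saving data

    sc.stop()
  }
}
```

Note that transformations build a lazy lineage; nothing runs until an action such as `count()` or `saveAsTextFile()` is called.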
Configuring and running the Spark cluster
Exploring a multi-node Spark cluster
Cluster management
Submitting Spark jobs and running in the cluster mode
Developing Spark applications in Eclipse
Tuning and Debugging Spark
CASSANDRA (NoSQL DATABASE)
Learning Cassandra
Getting started with architecture
Installing Cassandra
Communicating with Cassandra
Creating a database
Create a table
Inserting Data
Modelling Data
Creating a Web Application
Updating and Deleting Data
Spark Integration with NoSQL (CASSANDRA) and Amazon EC2
Introduction to Spark and Cassandra Connectors
Setting up Spark with Cassandra
Creating a SparkContext to connect to Cassandra
Creating Spark RDDs on the Cassandra database
Performing Transformations and Actions on the Cassandra RDD
Running a Spark application in Eclipse to access data in Cassandra
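The connector workflow above can be sketched like this. It assumes the spark-cassandra-connector dependency and a Cassandra node on localhost; the keyspace, table, and column names are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._  // adds cassandraTable to SparkContext

object CassandraDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spark-cassandra")
      .setMaster("local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Create a Spark RDD over a (hypothetical) Cassandra table
    val rows = sc.cassandraTable("shop", "orders")

    // Transformations and an action on the Cassandra RDD
    val totals = rows
      .map(row => (row.getString("customer"), row.getDouble("amount")))
      .reduceByKey(_ + _)
    totals.collect().foreach(println)

    sc.stop()
  }
}
```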
Introduction to Amazon Web Services
Building 4 Node Spark Multi-Node Cluster in Amazon Web Services
Deploying in Production with Mesos and YARN
Spark Streaming
Introduction to Spark Streaming
Architecture of Spark Streaming
Processing Distributed Log Files in Real Time
Discretized streams (DStreams)
Applying Transformations and Actions on Streaming Data
Integration with Flume and Kafka
Integration with Cassandra
Monitoring streaming jobs
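A streaming word count ties the DStream topics above together. A sketch assuming Spark Streaming is on the classpath and something (e.g. `nc -lk 9999`) is writing to the socket; the host and port are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-wc").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

    // A DStream of text lines from a TCP source
    val lines = ssc.socketTextStream("localhost", 9999)

    // Transformations on the discretized stream, batch by batch
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)

    counts.print()          // output action applied to each batch

    ssc.start()             // start receiving and processing
    ssc.awaitTermination()
  }
}
```

The same `flatMap`/`map`/`reduceByKey` pipeline from batch RDDs applies here, just re-run on each micro-batch interval.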
Spark SQL
Introduction to Apache Spark SQL
The SQL context
Importing and saving data
Processing text, JSON, and Parquet files
DataFrames
User-defined functions (UDFs)
Using Hive
Local Hive Metastore server
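The Spark SQL topics above (importing and saving data, DataFrames, UDFs, Hive support) can be sketched as follows. It assumes Spark SQL is on the classpath; the `people.json` file, its columns, and the query are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-demo")
      .master("local[*]")
      .enableHiveSupport()   // uses a local Hive metastore when configured
      .getOrCreate()
    import spark.implicits._

    // Importing data: JSON (text and Parquet work the same way)
    val people = spark.read.json("people.json")

    // A user-defined function usable in DataFrame expressions
    val shout = udf((s: String) => s.toUpperCase)
    people.select(shout($"name")).show()

    // SQL over a temporary view
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    // Saving data in Parquet format
    people.write.parquet("people.parquet")

    spark.stop()
  }
}
```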
Spark MLlib
Introduction to Machine Learning
Types of Machine Learning
Introduction to Apache Spark MLlib Algorithms
Machine Learning Data Types and Working with MLlib
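The MLlib data types and a classification algorithm come together in a sketch like this. It assumes the RDD-based MLlib API is on the classpath; the two-point training set is a toy illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

object MllibDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("mllib-demo").setMaster("local[*]"))

    // MLlib data types: dense Vectors wrapped in LabeledPoints (toy data)
    val data = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(2.0, 0.0))
    ))

    // Supervised learning: fit a logistic-regression classifier
    val model = new LogisticRegressionWithLBFGS().run(data)
    println(model.predict(Vectors.dense(2.0, 0.1)))

    sc.stop()
  }
}
```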