Introduction to BigData Analytics with Apache Spark Part 1

By Fadi Maalouli and R.H.

Spark Overview

Apache Spark, an open source cluster computing system, is growing fast. It has a growing ecosystem of libraries and frameworks that enable advanced data analytics. Apache Spark's rapid success is due to its power and ease of use: it is more productive to program and typically runs faster than traditional MapReduce-based big data analytics. Apache Spark provides in-memory, distributed computing, and it has APIs in Java, Scala, Python, and R. The Spark ecosystem is shown below.

Spark Tutorial: Spark SQL from Java and Python with Cassandra


February 27, 2017

Analytics with Apache Spark Tutorial Part 2: Spark SQL

Using Spark SQL from Python and Java

Combining Cassandra and Spark

By Fadi Maalouli and R.H.

Spark, a powerful tool for real-time analytics, has become very popular. In the first part of this series we introduced Spark, covered its history, and explained RDDs (Resilient Distributed Datasets, the abstraction Spark uses to partition data across the cluster). We also covered the Apache Spark ecosystem.
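To make the idea of partitioning concrete, here is a toy, pure-Python model of how a partitioned dataset can be processed. It is not Spark's API: the helpers `partition` and `sum_of_squares` are hypothetical names used only for illustration. Each partition is transformed independently (as separate executors would do in a cluster), and the per-partition results are then combined into a single value.

```python
from functools import reduce
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(data: Sequence[T], num_partitions: int) -> List[List[T]]:
    """Split data into num_partitions contiguous, roughly equal chunks.

    Hypothetical stand-in for how an RDD's elements are spread across the cluster.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    size, extra = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional element
        end = start + size + (1 if i < extra else 0)
        parts.append(list(data[start:end]))
        start = end
    return parts


def sum_of_squares(data: Sequence[int], num_partitions: int) -> int:
    parts = partition(data, num_partitions)
    # "map" step: each partition is processed on its own (in Spark, in parallel)
    partial_sums = [sum(x * x for x in part) for part in parts]
    # "reduce" step: combine the per-partition results into one value
    return reduce(lambda a, b: a + b, partial_sums, 0)


if __name__ == "__main__":
    nums = list(range(1, 11))
    print(partition(nums, 3))       # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
    print(sum_of_squares(nums, 3))  # 385
```

The result does not depend on the number of partitions, which is what lets Spark split work freely across machines. In PySpark the comparable steps would be `sc.parallelize(nums, 3)` followed by `map` and `reduce`, although the exact way Spark assigns elements to partitions may differ from this sketch.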

Spark Tutorial: Spark Streaming with Kafka and MLlib


February 27, 2017

In this part of Spark’s tutorial (part 3), we will introduce two important components of Spark’s Ecosystem: Spark Streaming and MLlib.


Spark Streaming

By Fadi Maalouli and R.H.

Spark Streaming is a real-time processing tool that has a high-level API, is fault tolerant, and integrates easily with Spark SQL DataFrames and GraphX.

At a high level, Spark Streaming works by running receivers that ingest data from sources such as S3, Cassandra, or Kafka. The incoming data is divided into blocks, and these blocks are pushed into Spark, which processes them as RDDs to produce your results. The following diagram illustrates the process:
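The receiver, blocks, and RDD flow can also be approximated with a small pure-Python simulation. This is a conceptual model only, not Spark Streaming's API: the helpers `receive_into_blocks`, `blocks_into_batches`, and `word_counts_per_batch` are hypothetical, and the 200 ms block interval and 1000 ms batch interval are illustrative values. Records carry an arrival timestamp, are grouped into blocks, and the blocks falling in each batch interval are processed together as one dataset (standing in for the RDD Spark would build).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (arrival time in milliseconds, payload text)


def receive_into_blocks(records: List[Record], block_interval_ms: int) -> Dict[int, List[str]]:
    """Hypothetical receiver: group incoming records into blocks keyed by block start time."""
    blocks: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in records:
        blocks[(ts // block_interval_ms) * block_interval_ms].append(payload)
    return dict(blocks)


def blocks_into_batches(blocks: Dict[int, List[str]], batch_interval_ms: int) -> Dict[int, List[List[str]]]:
    """Collect the blocks within each batch interval; each batch stands in for one RDD.

    Assumes batch_interval_ms is a multiple of the block interval.
    """
    batches: Dict[int, List[List[str]]] = defaultdict(list)
    for start in sorted(blocks):
        batches[(start // batch_interval_ms) * batch_interval_ms].append(blocks[start])
    return dict(batches)


def word_counts_per_batch(records: List[Record], block_interval_ms: int = 200,
                          batch_interval_ms: int = 1000) -> Dict[int, Dict[str, int]]:
    """Run a word count on every batch, the way a streaming job processes each RDD in turn."""
    blocks = receive_into_blocks(records, block_interval_ms)
    batches = blocks_into_batches(blocks, batch_interval_ms)
    results: Dict[int, Dict[str, int]] = {}
    for batch_start, batch_blocks in sorted(batches.items()):
        counts: Counter = Counter()
        # Each block plays the role of one partition of the batch's dataset
        for block in batch_blocks:
            for line in block:
                counts.update(line.split())
        results[batch_start] = dict(counts)
    return results


if __name__ == "__main__":
    stream = [(50, "spark streaming"), (250, "spark kafka"), (1100, "kafka kafka"), (1900, "mllib")]
    for start, counts in word_counts_per_batch(stream).items():
        print(start, counts)
```

Here the first two records land in different blocks but the same one-second batch, so their words are counted together; the last two records form the next batch.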
