Spark – Mir Saman Tajbakhsh

Apache + Yarn + Spark: Play with Twitter data!

By Mir Saman, April 23, 2017

In this tutorial I want to show how to use Apache Spark on Ubuntu machines to develop big data analysis apps.

First of all, a small and quick introduction to the Hadoop + Spark environment. Hadoop makes it possible to use many computers together as one cluster. The work can be storing files in the cluster (HDFS, the Hadoop Distributed File System), storing a database in the cluster (Apache HBase), or running software on the cluster (MapReduce, Spark).
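To make the "run software in cluster" idea a bit more concrete, here is a minimal sketch of the map, shuffle, and reduce steps behind a word count. It is plain Python on a single machine, not Hadoop or Spark code. The sample sentences, the `partitions` list, and the function names are all made up for illustration. Each inner list stands in for the data one cluster node would hold.

```python
from collections import defaultdict

# Hypothetical sample data: each inner list plays the role of the lines stored on one node.
partitions = [
    ["spark runs on yarn", "hadoop stores files"],
    ["spark reads files from hdfs"],
]

def map_phase(lines):
    """Emit a (word, 1) pair for every word in one partition."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(pairs):
    """Group the emitted counts by word, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# On a real cluster the map step would run on every node in parallel; here it is a simple loop.
mapped = [pair for part in partitions for pair in map_phase(part)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts["spark"], word_counts["files"])
```

In a real cluster the framework takes care of splitting the data, running the map step next to where the data is stored, and moving the grouped pairs between machines. The sketch only shows the shape of the computation.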