In this tutorial I want to show how to use Apache Spark on Ubuntu machines to develop big data analysis applications.
First, a quick introduction to the Hadoop + Spark environment. Hadoop makes it possible to work with many computers as one cluster. That work can be storing files in the cluster (HDFS, the Hadoop Distributed File System), storing a database in the cluster (Apache HBase), or running software in the cluster (MapReduce, Spark).
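To give a feel for the kind of software that runs on such a cluster, here is a minimal sketch of the map, shuffle, and reduce steps behind a word count. It is plain Python on a single machine, not the Hadoop or Spark API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three steps in order on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["spark on yarn", "spark on ubuntu"]  # hypothetical input
    print(word_count(sample))
```

On a real cluster the input lines would come from files in HDFS and each step would be spread over many machines, but the flow of data is the same idea.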