In this tutorial I want to show how to use Apache Spark on Ubuntu machines to develop big data analysis apps.

First of all, a short introduction to the Hadoop + Spark environment. Hadoop makes it possible to work with many computers as a cluster. That work can be storing files across the cluster (HDFS, the Hadoop Distributed File System), storing a database across the cluster (Apache HBase), or running software on the cluster (MapReduce, Spark).
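To give a feel for the MapReduce style of work mentioned above, here is a minimal single-machine sketch in plain Python. It is not Hadoop or Spark code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["spark on yarn", "Spark on Ubuntu"]))
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is the part a cluster framework takes care of.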

Continue reading Apache + Yarn + Spark: Play with Twitter data!


I like to play with VirtualBox, so in this post I will show you how to use VirtualBox to install Kali Linux on a USB disk that can both run inside VirtualBox and boot directly.


Every change you make from VirtualBox (or from a direct boot) is stored on the USB disk and is available when you boot directly (or from VirtualBox). This is great when you want to work alongside another OS in VirtualBox, and just as great when you want to boot the physical computer directly.

Continue reading Installing Kali Linux on USB with VirtualBox



Recently I’ve become interested in the bytecode structure of Java and Dalvik, and I’ve found some useful tools for playing with them.

Destination Byte Code

Java bytecode is simple to reverse engineer because it is not compiled to machine code ahead of time: the JVM executes the bytecode at run time. As a result, Java code is cross-platform, but it generally runs slower than directly compiled machine code (for example, C++ compiled with gcc).

Continue reading Reversing Java: Part I


In the previous post, I described the LDA process and how it can be applied to documents.

In this post I will explain how the probabilities can be estimated using collapsed Gibbs sampling.

Let’s start with the LDA probabilistic graphical model.

Latent Dirichlet Allocation

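Here w is a word sampled from a document, z is the topic assigned to that word in document d, θ is the topic distribution of d (drawn from a Dirichlet), φ is the word distribution of each topic (also drawn from a Dirichlet), and α and β are the hyperparameters of those two Dirichlets. More info about hyperparameters can be found here.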

The only known variables are α, β, and w; all the others (z, θ, and φ) are unknown. Based on the LDA graph we have:

p(w, z, θ, φ | α, β) = p(φ | β) p(θ | α) p(z | θ) p(w | φ_z)

The right-hand side of this joint probability follows from the probabilistic graphical model, in which each variable depends only on its parent nodes.
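The factorization can be checked numerically with a small sketch in plain Python. It draws φ, θ, z, and w by following the graph from parents to children, then adds up the log of each factor. This is only an illustration under some assumptions that are not in the post: the Dirichlets are symmetric (a single α and a single β), and the names `sample_dirichlet`, `log_dirichlet_pdf`, `generate`, and `log_joint` are hypothetical helpers.

```python
import math
import random


def sample_dirichlet(concentration, rng):
    """Draw a probability vector from a Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def log_dirichlet_pdf(x, concentration):
    """Log density of Dirichlet(concentration) at the probability vector x."""
    log_norm = math.lgamma(sum(concentration)) - sum(
        math.lgamma(a) for a in concentration
    )
    return log_norm + sum(
        (a - 1.0) * math.log(xi) for a, xi in zip(concentration, x)
    )


def generate(num_topics, vocab_size, doc_lengths, alpha, beta, rng):
    """Sample phi, theta, z, w in parent-to-child order (symmetric priors assumed)."""
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    theta, z, w = [], [], []
    for length in doc_lengths:
        theta_d = sample_dirichlet([alpha] * num_topics, rng)
        z_d = rng.choices(range(num_topics), weights=theta_d, k=length)
        w_d = [rng.choices(range(vocab_size), weights=phi[k])[0] for k in z_d]
        theta.append(theta_d)
        z.append(z_d)
        w.append(w_d)
    return phi, theta, z, w


def log_joint(w, z, theta, phi, alpha, beta):
    """log p(w, z, theta, phi | alpha, beta), accumulated factor by factor."""
    num_topics, vocab_size = len(phi), len(phi[0])
    total = sum(log_dirichlet_pdf(p, [beta] * vocab_size) for p in phi)  # p(phi | beta)
    for theta_d, z_d, w_d in zip(theta, z, w):
        total += log_dirichlet_pdf(theta_d, [alpha] * num_topics)  # p(theta | alpha)
        for topic, word in zip(z_d, w_d):
            total += math.log(theta_d[topic])    # p(z | theta)
            total += math.log(phi[topic][word])  # p(w | phi_z)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    phi, theta, z, w = generate(3, 5, [4, 6], alpha=2.0, beta=2.0, rng=rng)
    print(log_joint(w, z, theta, phi, alpha=2.0, beta=2.0))
```

Each term added in `log_joint` looks only at a variable and its parents, which is exactly what the right-hand side of the equation says. In practice only w, α, and β would be observed; the sketch samples the hidden variables just so there is something to evaluate.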

Continue reading Mathematical LDA
