You can support the Urmia Lake campaign by signing the following petition.
Recently I’ve become interested in the bytecode structure of Java and Dalvik, and I’ve found some useful tools for playing with them.
Java bytecode is simple to reverse engineer because it is not compiled to machine code ahead of time; the JVM executes it (by interpreting or JIT-compiling it) at run time. This makes Java code cross-platform, but it can run with more delay than directly compiled machine code (for example C++ compiled with gcc).
In the previous post (Latent Dirichlet Allocation), I described the LDA process and how it can be applied to documents.
In this post I will explain how the probabilities can be estimated using collapsed Gibbs sampling.
Let’s start with the LDA probabilistic graphical model.
Here w is a word sampled from the document, z is the topic assigned to that word in document d, θ is the topic distribution of d, φ is the word distribution of each topic, and α and β are the hyperparameters of the Dirichlet priors on θ and φ. More info about the hyperparameters can be found at this link.
The only known variables are α, β, and w; all others (z, θ, and φ) are unknown. Based on the LDA graph we have:
p(w, z, θ, φ | α, β) = p(φ | β) p(θ | α) p(z | θ) p(w | φ_z)
The right side of the above joint probability follows from the probabilistic graphical model, in which each variable depends only on its parent nodes.
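To make the factorization concrete, here is a minimal Python sketch that evaluates the log of the right-hand side for a single document. The function names (`log_dirichlet_pdf`, `log_joint`), the symmetric scalar hyperparameters, and the array layout are assumptions made for illustration, not something defined in the post.

```python
import math
import numpy as np

def log_dirichlet_pdf(x, concentration):
    """Log density at x of a symmetric Dirichlet (assumed: one scalar concentration)."""
    k = len(x)
    log_norm = math.lgamma(k * concentration) - k * math.lgamma(concentration)
    return log_norm + (concentration - 1.0) * float(np.sum(np.log(x)))

def log_joint(words, topics, theta, phi, alpha, beta):
    """log p(w, z, theta, phi | alpha, beta) for one document.

    words  : word ids w, one per token
    topics : topic ids z, one per token
    theta  : topic distribution of the document, shape (K,)
    phi    : word distribution of every topic, shape (K, V)
    """
    words = np.asarray(words)
    topics = np.asarray(topics)
    lp = sum(log_dirichlet_pdf(phi_k, beta) for phi_k in phi)  # p(phi | beta)
    lp += log_dirichlet_pdf(theta, alpha)                       # p(theta | alpha)
    lp += float(np.sum(np.log(theta[topics])))                  # p(z | theta)
    lp += float(np.sum(np.log(phi[topics, words])))             # p(w | phi_z)
    return lp

# Toy values: 2 topics, 2 vocabulary words, 2 tokens.
theta = np.array([0.5, 0.5])
phi = np.array([[0.5, 0.5], [0.5, 0.5]])
value = log_joint([0, 1], [0, 1], theta, phi, alpha=1.0, beta=1.0)
```

Each term in `log_joint` corresponds to one factor of the equation, which is why the parents in the graph (θ for z, φ and z for w) are the only things each term looks at.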
In a general view, LDA is an unsupervised method for clustering documents. It models (cleaned) documents as bags of words. It also assumes that each word (and each document) has a mixture model of topics, i.e. each word (and document) may belong to each of the topics with some probability. It takes the number of clusters (topics) in the corpus as input, then simply assigns each word in each document a random topic. Then tries for
That was a very general description of LDA.
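The random initialization described above can be sketched in a few lines of Python. The resampling step in `gibbs_sweep` is the commonly used collapsed Gibbs update for LDA with symmetric priors; it is an assumption added here for illustration and is not taken from this post. All names (`init_topics`, `gibbs_sweep`, the toy corpus) are hypothetical.

```python
import numpy as np

def init_topics(docs, num_topics, vocab_size, rng):
    """Assign every token a random topic and build the count tables."""
    doc_topic = np.zeros((len(docs), num_topics), dtype=int)    # tokens of doc d in topic k
    topic_word = np.zeros((num_topics, vocab_size), dtype=int)  # times word w has topic k
    assignments = []
    for d, doc in enumerate(docs):
        z = rng.integers(num_topics, size=len(doc))  # uniform random topics
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
    return assignments, doc_topic, topic_word

def gibbs_sweep(docs, assignments, doc_topic, topic_word, alpha, beta, rng):
    """One pass over all tokens (assumed standard collapsed Gibbs update)."""
    num_topics, vocab_size = topic_word.shape
    topic_total = topic_word.sum(axis=1)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = assignments[d][i]
            # Remove the current token from the counts.
            doc_topic[d, k] -= 1
            topic_word[k, w] -= 1
            topic_total[k] -= 1
            # Unnormalized probability of each topic for this token.
            weights = (doc_topic[d] + alpha) * (topic_word[:, w] + beta)
            weights = weights / (topic_total + vocab_size * beta)
            k = rng.choice(num_topics, p=weights / weights.sum())
            # Put the token back under its newly sampled topic.
            assignments[d][i] = k
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1

# Toy corpus: documents are lists of word ids from a vocabulary of size 5.
rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [3, 4, 3], [0, 2, 4]]
assignments, doc_topic, topic_word = init_topics(docs, 2, 5, rng)
for _ in range(50):
    gibbs_sweep(docs, assignments, doc_topic, topic_word, 0.1, 0.01, rng)
```

Keeping the two count tables in sync with the assignments is the main bookkeeping cost; the sweep never needs θ or φ explicitly, which is what "collapsed" refers to.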
After dealing with the old site, I’ve decided to move my blog to this site. 🙂
The old UUTElgg is closed.