Mathematical LDA

Mathematical LDA

In the previous post (Latent Dirichlet Allocation),  I’ve described the LDA process and how it can be applied on documents.

In this post I will explain how the probabilities can be estimated using collapsed Gibbs sampling.

Lets start with the LDA Probabilistic Graph Model.


Where W is the sampled word from document, Z is the topic assigned by Document (d), θ is the Dirichlet distribution of d, α and β are the input of Dirichlets. More info about hyperparameters can be found this link.

So the only known variables are α, β, and w. All others (z, θ, and φ) are unknown. So based on the LDA graph we have:

p(w, z, θ, φ | α, β) = p(φ|β) p(θ|α) p(z|θ) p(w|φz)

The right side of the above conditional probability can be reached by the probabilistic graph model where each variable only depends on its parent nodes.


Latent Dirichlet Allocation

What is Latent Dirichlet Allocation?

In a general view, LDA is an unsupervised method for clustering documents. It models (purified) documents as bag of words. Also it assumes each word (and document) has a mixture model of topics i.e. each word (and document) may belongs to each of the topics by a probability. It takes number of clusters in the corpus as input then, simply assigns each word in each document a random topic. Then tries for

It was a very general description of LDA.