I recently found a GitHub repository named word_cloud that uses Python to create beautiful word clouds! Here is the story …
Tag clouds, or as we usually call them, word clouds, are an interesting type of text visualization: the more often a word occurs, the bigger it is drawn. The words, small or big, can be arranged in a circle or any other desired shape, and they can be colored in different ways. The tool I found, word_cloud, is a Python library for creating beautiful word clouds from text.
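To make the "more frequent means bigger" idea concrete, here is a minimal pure-Python sketch that counts words and maps each count linearly onto a font-size range. The `font_sizes` name and the 10 to 80 size range are illustrative assumptions, not part of word_cloud. The real library also removes stopwords, avoids overlapping words, and exposes a `relative_scaling` setting, so its sizes will differ.

```python
import re
from collections import Counter


def font_sizes(text, min_size=10, max_size=80):
    """Map each word to a font size proportional to how often it occurs.

    Hypothetical helper: a simple linear scaling, not word_cloud's algorithm.
    """
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; others scale with their count,
    # but never drop below min_size.
    return {
        word: max(min_size, round(max_size * count / top))
        for word, count in counts.items()
    }


sizes = font_sizes("Harry and Ron and Hermione and Harry")
print(sizes)  # {'harry': 53, 'and': 80, 'ron': 27, 'hermione': 27}
```

Note that "and" comes out largest here, which is exactly why word cloud tools filter out common stopwords before sizing.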
Installation
The installation steps are simple. I’ve copied these commands from the main GitHub repository.
pip install wordcloud
Alternatively, you can install it with Anaconda:
conda install -c conda-forge wordcloud
After installing it, I looked through the different shapes the tool can create. One of them was very interesting: the parrot!
I checked the repo and found the parrot.py file.
Word Clouding Harry Potter
I simply reused the script, changing only the input text file and the input image. I found this vectorized Hogwarts logo on this site.
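Swapping the input image works because word_cloud shapes a cloud through the `mask` argument of `WordCloud`: according to the library's documentation, pure white pixels in the mask are masked out, and words are drawn only on the remaining pixels. The sketch below illustrates just that rule in plain Python on a tiny grayscale grid. The `drawable_cells` helper and the toy grid are hypothetical stand-ins for a real image such as the Hogwarts logo, which in practice would be loaded with Pillow and passed to `WordCloud` as a NumPy array.

```python
def drawable_cells(mask):
    """Return the (row, col) positions where words may be placed.

    Mirrors word_cloud's documented mask convention: pure white (255)
    pixels are masked out, every other pixel is free to draw on.
    `mask` is a hypothetical grayscale image given as a list of rows.
    """
    return {
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value != 255
    }


# A toy 3x4 "logo": 0 is a dark shape pixel, 255 is white background.
logo = [
    [255, 0, 0, 255],
    [0, 0, 0, 0],
    [255, 0, 0, 255],
]
print(len(drawable_cells(logo)))  # 8
```

With the real library, the corresponding step is roughly `WordCloud(mask=mask_array).generate(text)` followed by `to_file(...)`; parrot.py's exact settings and recoloring are not reproduced here.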
Next, I needed the Harry Potter texts, which I found in a GitHub NLP repository named nlp. It contained four books of the Harry Potter franchise: Harry Potter and the Sorcerer’s Stone, Harry Potter and the Chamber of Secrets, Harry Potter and the Prisoner of Azkaban, and Harry Potter and the Goblet of Fire.
For the fourth book, the script raised an encoding error while reading the input file, so I used unicode_escape instead of utf-8 as the encoding.
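A minimal sketch of the same workaround, assuming the file is read as raw bytes, is to try UTF-8 first and fall back to `unicode_escape` only when decoding fails. The `decode_book` and `read_book` names are hypothetical. `unicode_escape` decodes ordinary bytes as Latin-1, so stray non-UTF-8 bytes no longer raise, but it can still fail on a malformed backslash escape and may turn literal backslash sequences into real characters. Decoding as UTF-8 with `errors="replace"` is a gentler alternative if that matters.

```python
def decode_book(raw):
    """Decode a book's bytes, trying UTF-8 first and unicode_escape second.

    Hypothetical helper; the original script only swapped the encoding name.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # unicode_escape maps ordinary bytes to code points 0-255 (Latin-1),
        # so bytes that are invalid UTF-8 no longer raise an error here.
        return raw.decode("unicode_escape")


def read_book(path):
    """Read a whole text file as bytes and decode it with decode_book."""
    with open(path, "rb") as f:
        return decode_book(f.read())
```

For example, `decode_book(b"caf\xe9")` is not valid UTF-8, so it falls back and returns `"café"`.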
The Result
Here are the four word clouds, one per book. I also removed the central H from the vectorized Hogwarts logo and regenerated the word clouds. Here are those results.
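If the letter is part of the filled shape, removing it from the mask amounts to turning those pixels white, because word_cloud treats white mask pixels as masked out. Below is a minimal sketch of doing that in code rather than in an image editor, assuming a grayscale mask stored as a list of rows and a hand-picked rectangle around the letter. The `erase_region` name and the coordinates are purely illustrative.

```python
def erase_region(mask, top, left, bottom, right):
    """Return a copy of `mask` with a rectangle painted pure white (255).

    White pixels are masked out by word_cloud, so no words are drawn there.
    Bounds are half-open: rows top..bottom-1, columns left..right-1.
    """
    erased = [list(row) for row in mask]  # copy so the input is untouched
    for r in range(top, bottom):
        for c in range(left, right):
            erased[r][c] = 255
    return erased


# Toy 5x5 all-dark logo; blank out a 3x3 block in the middle.
logo = [[0] * 5 for _ in range(5)]
no_h = erase_region(logo, 1, 1, 4, 4)
print(sum(value == 255 for row in no_h for value in row))  # 9
```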