Knowing every reachable address on the web is the first step in building a search engine (SE), which lets users find the material they are interested in. In the normal Domain Name System, each top-level domain (TLD) provider can sell or release the list of all its domains. For example, the .com TLD can sell or release every domain that ends with “.com”. The problem is more complicated in Tor (or other hidden service providers). In this post I will talk about my tool, Onion Harvester, and about how to find the starting points for crawling hidden services.
I have investigated how to find all onion addresses. My question on the Tor Stack Exchange site can be found at this link. In conclusion, there are two ways to find all onion addresses, which are the starting points for crawling and building a search engine:
- Run a Hidden Service Directory (HSDir). A hidden service publishes its address through 6 of the 9 HSDirs so that users who try to connect can find it.
- Brute-force the whole address space, which takes time that grows exponentially with the address length.
The first method fails because the 9 HSDirs are controlled by the Tor network itself. You can check the status of the 9 HSDirs at this link. Detailed information about the 9 HSDirs can be found at this link. Therefore you cannot add yourself as an HSDir without verification by the Tor developers. In addition, as defalt answered my question on the Tor Stack Exchange site:
From another point of view, Tor is open source, so you can run your own Tor network and add your own HSDirs to it. But then users would have to use your network instead of Tor!
Check this link to find out how to run your own Tor network.
In general, though, you can harvest the whole onion address space on specific ports to check whether a service is open. I have developed a small multithreaded Java application named “Onion Harvester” to harvest onion addresses. It generates onion addresses from “aaaaaaaaaaaaaaaa.onion” to “7777777777777777.onion”. The addresses are base-32: 16 characters drawn from the letters a-z and the digits 2-7 (that is, no 0, 1, 8, or 9). In total there are 32^16 = 2^80 (about 1.2 × 10^24) addresses!
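To make the enumeration concrete, here is a minimal sketch of how a counter can be mapped to an onion address and back. It is not the actual Onion Harvester code: the class name `OnionAddressSpace` and its methods are hypothetical, and the alphabet order (letters first, then 2-7) is an assumption taken from the start and end addresses mentioned above.

```java
import java.math.BigInteger;

/** Hypothetical helper for illustration; not taken from the Onion Harvester source. */
final class OnionAddressSpace {
    // Assumed ordering: 'a' is digit 0 and '7' is digit 31.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";
    static final int LENGTH = 16;
    static final BigInteger BASE = BigInteger.valueOf(32);
    // 32^16 = 2^80 possible 16-character labels.
    static final BigInteger TOTAL = BASE.pow(LENGTH);

    /** Maps an index in [0, TOTAL) to the corresponding address. */
    static String encode(BigInteger index) {
        if (index.signum() < 0 || index.compareTo(TOTAL) >= 0) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        char[] label = new char[LENGTH];
        BigInteger rest = index;
        // Fill the label from the least significant character to the most significant.
        for (int i = LENGTH - 1; i >= 0; i--) {
            BigInteger[] quotientAndRemainder = rest.divideAndRemainder(BASE);
            label[i] = ALPHABET.charAt(quotientAndRemainder[1].intValue());
            rest = quotientAndRemainder[0];
        }
        return new String(label) + ".onion";
    }

    /** Maps an address (with or without the ".onion" suffix) back to its index. */
    static BigInteger decode(String address) {
        String label = address.endsWith(".onion")
                ? address.substring(0, address.length() - ".onion".length())
                : address;
        if (label.length() != LENGTH) {
            throw new IllegalArgumentException("expected " + LENGTH + " characters: " + label);
        }
        BigInteger index = BigInteger.ZERO;
        for (char c : label.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-32 character: " + c);
            }
            index = index.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println("total addresses: " + TOTAL);
        System.out.println("first: " + encode(BigInteger.ZERO));
        System.out.println("last:  " + encode(TOTAL.subtract(BigInteger.ONE)));
        // Stepping to the next address is just index + 1.
        System.out.println("next:  " + encode(decode("aaaaaaaaaaaaaaaz.onion").add(BigInteger.ONE)));
    }
}
```

With this mapping, splitting the work between threads amounts to handing each thread a range of indices. It also shows why brute force is so expensive: at a hypothetical rate of 1,000 probes per second, 2^80 addresses would take roughly 3.8 × 10^13 years to cover.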
Although it is a small program, I’ve added some flexibility to it:
- If the program receives an exit signal, it stores the next onion address, from which the scan should resume.
- I’ve added a switch (--start) to resume the scanning.
- I’ve added a configurable local Tor SOCKS5 address. By default, the Tor Browser Bundle uses 127.0.0.1:9150 as its SOCKS5 proxy, and the tor binary on Linux uses 127.0.0.1:9050.
- The project is open source and you can find it in the repository on Mr Tajbakhsh’s GitHub account.
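The features in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the repository: the class `HarvesterSketch`, its methods, the state file name `harvester.state`, and the use of the first command-line argument as the start address are all hypothetical. It uses the standard `java.net.Proxy` SOCKS support and a JVM shutdown hook.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; names and file layout are not from Onion Harvester. */
final class HarvesterSketch {

    /** Returns true if a TCP connection to onionHost:port succeeds through the SOCKS proxy. */
    static boolean isOpen(String proxyHost, int proxyPort, String onionHost, int port, int timeoutMs) {
        Proxy proxy = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(proxyHost, proxyPort));
        try (Socket socket = new Socket(proxy)) {
            // Unresolved address: the host name is handed to the proxy, so Tor resolves the .onion name.
            socket.connect(InetSocketAddress.createUnresolved(onionHost, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or proxy not reachable
        }
    }

    /** Stores the address the next run should start from. */
    static void saveNext(Path stateFile, String nextAddress) throws IOException {
        Files.write(stateFile, nextAddress.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the stored start address, or returns the fallback if nothing was stored. */
    static String loadStart(Path stateFile, String fallback) throws IOException {
        if (!Files.exists(stateFile)) {
            return fallback;
        }
        String saved = new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8).trim();
        return saved.isEmpty() ? fallback : saved;
    }

    public static void main(String[] args) throws IOException {
        Path stateFile = Paths.get("harvester.state"); // assumed file name
        // Assumption: an explicit start address on the command line wins over the saved one.
        String start = args.length > 0 ? args[0] : loadStart(stateFile, "aaaaaaaaaaaaaaaa.onion");
        AtomicReference<String> next = new AtomicReference<>(start);

        // On Ctrl+C or a normal exit, remember where to resume.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                saveNext(stateFile, next.get());
            } catch (IOException e) {
                System.err.println("could not save state: " + e.getMessage());
            }
        }));

        // A single probe for brevity; a real scan would loop and update 'next' after each address.
        boolean open = isOpen("127.0.0.1", 9050, start, 80, 30_000);
        System.out.println(start + " port 80 open: " + open);
    }
}
```

Change the proxy port to 9150 when going through the Tor Browser Bundle instead of the standalone tor binary. Keeping the next address in an `AtomicReference` lets worker threads update it while the shutdown hook reads it safely.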
Onion Harvester in Action:
You may fork or use the project to help me build the onion database. If you want to contribute, contact me at saman [@] mstajbakhsh [.] ir