Onion Harvester: First step to TOR Search Engines

Knowing every reachable web address is the first step toward building a search engine (SE). With an SE, people can search the web for the material they want. In the normal Domain Name System, each TLD (Top Level Domain) provider can sell or release a list of all its domains. For example, the .com TLD provider can sell or release every domain that ends with “.com”. The problem is more complicated in TOR (or other hidden service providers). In this post I will talk about my tool, Onion Harvester, and about how to find the starting points from which hidden services can be crawled.

TOR Network

I have investigated how to find all onion addresses. My question on the TOR Stack Exchange site can be found at this link. In conclusion, there are two ways to find all the onion addresses, which are the starting points for crawling and building a search engine.

  1. Run a Hidden Service Directory (HSDir). A hidden service publishes its address through 6 of the 9 HSDirs so that users who try to connect can find it.
  2. Brute force the whole address space, which takes time that grows exponentially with the address length.

The first method fails because the 9 HSDirs are controlled by the TOR network itself. You can check the status of the 9 HSDirs at this link, and detailed information about them can be found at this link. Therefore you cannot add yourself as an HSDir without verification by the TOR developers. In addition, as defalt answered my question on the TOR Stack Exchange site:

“Harvesting onion addresses has been fixed in Next Generation Tor Onion Services so you can’t fetch list of running onion services by hosting your own HSDir anymore.”

From another point of view, TOR is open source, so you can run your own TOR network and add your own HSDirs to it. But then users would have to use your network instead of TOR!

My Own Tor Network

Check this link to find out how to run your own TOR network.

In general, though, you can sweep the whole onion address space on specific ports and check whether a service is open. I have developed a small multithreaded Java application named “Onion Harvester” to harvest onion addresses. It generates onion addresses from “aaaaaaaaaaaaaaaa.onion” to “7777777777777777.onion”. The addresses are base-32: 16 characters drawn from the 26 letters and the digits 2 to 7 (0, 1, 8 and 9 are not used). In total there are (26 + 6)^16 = 1208925819614629174706176 addresses!
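To make the enumeration concrete, here is a minimal sketch of how the address space can be treated as a 16-digit base-32 counter. The class and method names are hypothetical and are not taken from Onion Harvester's source, and the probe rate passed to `yearsToScan` is an assumed figure used only to show why an exhaustive sweep is impractical.

```java
import java.math.BigInteger;

// Illustrative sketch only: names are hypothetical, not Onion Harvester's code.
final class OnionAddressSpace {
    // Base-32 alphabet: a-z followed by 2-7 (0, 1, 8 and 9 are not used).
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";
    static final int LENGTH = 16;
    static final BigInteger BASE = BigInteger.valueOf(ALPHABET.length());
    // 32^16 candidate addresses.
    static final BigInteger TOTAL = BASE.pow(LENGTH);

    // Maps a counter value in [0, TOTAL) to its 16-character onion address.
    static String addressAt(BigInteger index) {
        if (index.signum() < 0 || index.compareTo(TOTAL) >= 0) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        char[] label = new char[LENGTH];
        BigInteger rest = index;
        // Fill from the last character backwards, one base-32 digit at a time.
        for (int i = LENGTH - 1; i >= 0; i--) {
            BigInteger[] qr = rest.divideAndRemainder(BASE);
            label[i] = ALPHABET.charAt(qr[1].intValue());
            rest = qr[0];
        }
        return new String(label) + ".onion";
    }

    // Inverse of addressAt: turns an address back into its counter value.
    static BigInteger indexOf(String address) {
        String label = address.endsWith(".onion")
                ? address.substring(0, address.length() - ".onion".length())
                : address;
        if (label.length() != LENGTH) {
            throw new IllegalArgumentException("expected 16 characters: " + address);
        }
        BigInteger index = BigInteger.ZERO;
        for (char c : label.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-32 character: " + c);
            }
            index = index.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        return index;
    }

    // Whole years needed to probe every address at an assumed constant rate.
    static BigInteger yearsToScan(long probesPerSecond) {
        BigInteger perYear = BigInteger.valueOf(probesPerSecond)
                .multiply(BigInteger.valueOf(31_536_000L)); // seconds in 365 days
        return TOTAL.divide(perYear);
    }

    public static void main(String[] args) {
        System.out.println(TOTAL);
        System.out.println(addressAt(BigInteger.ZERO));
        System.out.println(addressAt(TOTAL.subtract(BigInteger.ONE)));
        System.out.println(yearsToScan(1_000_000L)); // assumed rate: 1M probes/s
    }
}
```

Because every address corresponds to one counter value, a range of counter values can be handed to each worker thread, and a saved address can be converted back to a counter with `indexOf` to continue a scan.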

Although it is a small program, I’ve added some flexibility to it:

  1. If the program receives an exit signal, it stores the next onion address, from which the scan should resume.
  2. I’ve added a switch (–start) to resume the scanning.
  3. I’ve added a configurable local TOR SOCKS5 address, because the Tor Bundle and the tor binary on Linux use different default SOCKS5 proxy addresses.
  4. The project is open source and you can find it in the repository on Mr Tajbakhsh’s GitHub account.
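The first three points can be sketched roughly as follows. This is not Onion Harvester's actual code: the class name, the checkpoint file name and the proxy address `127.0.0.1:9050` are assumptions for illustration, so substitute whatever address your local TOR client listens on. The sketch saves the next address from a shutdown hook, reloads it on start, and probes a port through a SOCKS proxy using Java's standard `java.net.Proxy` and `Socket` classes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only: names and defaults are hypothetical.
final class HarvesterSketch {
    // Writes the address the next run should start from.
    static void saveCheckpoint(Path file, String nextAddress) throws IOException {
        Files.write(file, nextAddress.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the saved address, or returns the fallback if nothing was saved.
    static String loadCheckpoint(Path file, String fallback) throws IOException {
        if (!Files.exists(file)) {
            return fallback;
        }
        String saved = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return saved.isEmpty() ? fallback : saved;
    }

    // Builds a SOCKS proxy descriptor for a local TOR client.
    static Proxy socksProxy(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(host, port));
    }

    // Returns true if a TCP connection to onionHost:port succeeds via the proxy.
    static boolean isOpen(Proxy proxy, String onionHost, int port, int timeoutMs) {
        try (Socket socket = new Socket(proxy)) {
            // Unresolved address: the .onion name is resolved by the proxy,
            // not by the local DNS resolver.
            socket.connect(InetSocketAddress.createUnresolved(onionHost, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path checkpoint = Paths.get("harvester.checkpoint"); // hypothetical file name
        AtomicReference<String> next =
                new AtomicReference<>(loadCheckpoint(checkpoint, "aaaaaaaaaaaaaaaa.onion"));

        // On an exit signal (e.g. Ctrl+C), persist where the scan should resume.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                saveCheckpoint(checkpoint, next.get());
            } catch (IOException e) {
                System.err.println("could not save checkpoint: " + e.getMessage());
            }
        }));

        Proxy tor = socksProxy("127.0.0.1", 9050); // assumed proxy address
        String address = next.get();
        System.out.println(address + " port 80 open: " + isOpen(tor, address, 80, 10_000));
    }
}
```

In a real scanner the worker threads would update `next` as they advance, so the shutdown hook always writes the most recent position.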

Onion Harvester in Action:

You may fork or use the project to help me build the onion database. If you want to contribute, contact me at saman [@] mstajbakhsh [.] ir
