Short Bytes: The Dark Web is the part of the World Wide Web that sits on networks which use the public Internet but require specific software and authorization to access. Its websites are not indexed by search engines and are not easily reachable by the public, even though they run over public networks.
As part of their research, researchers Moore and Rid analyzed 5,205 live websites belonging to the Dark Web over five weeks. They were able to classify 2,723 of those websites by content. Of the classified sites, 1,547 hosted illicit material, which comes to around 57 percent.
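The 57 percent figure is a share of the classified sites, not of all live sites. A quick check of the arithmetic, using only the counts quoted above (the variable and function names are ours, chosen for illustration):

```python
# Counts reported in the article.
live_sites = 5205        # live Dark Web sites analyzed
classified_sites = 2723  # sites that could be classified by content
illicit_sites = 1547     # classified sites hosting illicit material


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


illicit_of_classified = percent(illicit_sites, classified_sites)
illicit_of_live = percent(illicit_sites, live_sites)
classified_of_live = percent(classified_sites, live_sites)

print(f"illicit / classified: {illicit_of_classified:.1f}%")
print(f"illicit / live:       {illicit_of_live:.1f}%")
print(f"classified / live:    {classified_of_live:.1f}%")
```

Measured against all 5,205 live sites instead, the illicit share drops to just under 30 percent, because roughly half of the live sites could not be classified at all.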
To gather information about the Dark Web, Moore and Rid used a Python-based web crawler to cycle through known hidden services. Using this crawler, they were also able to catch links to other Dark Web sites. The contents of those sites were ripped and then classified into different categories.
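The article does not describe the crawler's internals, so the following is only a minimal sketch of the general idea: start from known addresses, download each page, and queue any newly discovered .onion links. All names here (`LinkExtractor`, `extract_onion_links`, `crawl`, `FAKE_WEB`) and the example addresses are hypothetical. The `fetch` function is passed in by the caller; a real crawl would have to route requests through Tor, which is not shown, so the sketch uses an in-memory stand-in instead.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_onion_links(base_url, html):
    """Return absolute links on the page whose host ends in .onion."""
    parser = LinkExtractor()
    parser.feed(html)
    found = []
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative links
        host = urlparse(absolute).hostname or ""
        if host.endswith(".onion"):
            found.append(absolute)
    return found


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from seed URLs; fetch(url) returns HTML or None."""
    queue = deque(seeds)
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # site is down or unreachable
        pages[url] = html
        for link in extract_onion_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory "network" so the sketch runs without Tor.
FAKE_WEB = {
    "http://seedexample.onion/": (
        '<a href="http://otherexample.onion/">other</a> '
        '<a href="/about">about</a> '
        '<a href="https://example.com/">clear web</a>'
    ),
    "http://otherexample.onion/": '<a href="http://seedexample.onion/">back</a>',
    "http://seedexample.onion/about": "<p>No links here.</p>",
}

pages = crawl(["http://seedexample.onion/"], FAKE_WEB.get)
print(sorted(pages))
```

Keeping a `seen` set stops the crawler from looping when sites link back to each other, and the `max_pages` cap is an arbitrary safety limit rather than anything taken from the study.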
The classification was based on an algorithm that had been taught to split the content into various themes. At first, Moore manually categorized 600 documents under different headings such as “drugs”, “porn”, “social”, “financial”, and a number of others. If a page didn’t display any content at all, or had fewer than 50 words, it was placed into the “none” category.
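The “none” rule is simple enough to express directly. This is a sketch under one assumption the article does not spell out: words are counted by splitting the page text on whitespace. The names `MIN_WORDS` and `is_none_category` are ours.

```python
MIN_WORDS = 50  # threshold stated in the article


def is_none_category(page_text):
    """True when a page has no content or fewer than MIN_WORDS words.

    Assumption: a "word" is any whitespace-separated token.
    """
    return len(page_text.split()) < MIN_WORDS


print(is_none_category(""))                 # empty page
print(is_none_category("coming soon"))      # two words
print(is_none_category("word " * 50))       # exactly 50 words
```

Pages caught by this check can be set aside before any content-based classification is attempted.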
Once the system had learned the classification from these examples, the rest of the data was classified automatically by the algorithm.
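The article does not name the algorithm, so as an illustration of "learn from hand-labeled pages, then label the rest", here is a tiny naive Bayes text classifier. Naive Bayes is our stand-in, not necessarily what the researchers used, and the class name, training sentences, and test phrases are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented stand-ins for the manually categorized documents.
training = [
    ("bitcoin wallet escrow payment exchange", "financial"),
    ("bitcoin mixer payment fees", "financial"),
    ("forum chat members discussion board", "social"),
    ("chat room friends discussion", "social"),
]

model = NaiveBayes()
model.train(training)
print(model.predict("anonymous bitcoin payment escrow"))
print(model.predict("discussion forum chat"))
```

With only four training sentences this is a toy; the point is the workflow, in which a small hand-labeled set trains the model and the model then assigns categories to pages nobody has read.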
In a nutshell, this latest research provides a deeper basis for discussion around hidden Tor services and the encryption software used to access the Dark Web.