New AI DarkBERT is trained on the Dark Web

Researchers have created a new AI called DarkBERT, trained almost exclusively on the Dark Web. While not intended for malicious use, it has faced a lot of seedy sites during its training.

The latest AI to be developed has a unique twist on its training. DarkBERT is built upon the BERT framework, developed by Google. Unlike a conversational chatbot such as Google Bard, BERT is used to analyze text and produce answers based on a particular data set.


Researchers have created DarkBERT to help better sift through the dark web, in hopes of improving cybersecurity around it. They fed it a mass of data over the course of nearly 16 days, split across two sets.

One was “raw” – an unedited quantity of data – and the other, “preprocessed”, with certain aspects of what can be found on the dark web edited out. This includes things like “victim organization name, descriptions of leaked data, and threat statements with sample data”.


They also ensured that images were excluded, in case any of them were illicit or illegal:


“… our automated web crawler takes the approach of removing any non-text media and only stores raw text data. By doing so, we do not expose ourselves to any sensitive media that is potentially illegal.”
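To give a rough idea of what "only stores raw text data" could look like in practice, here is a minimal sketch in Python. It is not the researchers' crawler, which the paper does not publish; the names `TextOnlyExtractor`, `extract_text` and `SKIPPED_TAGS` are made up for this illustration, and it only uses Python's built-in HTML parser to keep the text of a page while ignoring images, video and other non-text elements.

```python
from html.parser import HTMLParser

# Hypothetical choice for this sketch: tags whose contents are dropped entirely.
SKIPPED_TAGS = {"script", "style", "video", "audio", "object"}


class TextOnlyExtractor(HTMLParser):
    """Collects text from a page; media tags such as <img> contribute nothing."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside any skipped tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML page with all non-text media left out."""
    parser = TextOnlyExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Because the image or video files are never downloaded or stored in an approach like this, whoever runs the crawler only ever handles text.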

DarkBERT AI has seen the depths of the dark web

The paper goes into detail on how much data the researchers fed DarkBERT, including a table that lists every category the crawled sites were filed under. Unsurprisingly, over 1,000 pages were filed under adult entertainment.


Most of the research was done by crawling with Tor, the most popular browser for accessing the deep or dark web. As these websites aren’t on the “surface web”, you need the browser to access “onion links”. As the research paper also points out, the vast majority of these pages now return error codes or are useless pages with minimal information on them.

There are currently no plans to release DarkBERT to the public, and the research places heavy emphasis on the fact that the data set won’t be released publicly either, due to the nature of the dark web’s materials. However, a request for access can be made for academic purposes.

