Harnessing Expertise in Big Data, Data Science and AI
Resecurity announced that its six-year effort to create the first comprehensive index of the dark web is on track for completion by 2020. The project, which began in stealth mode in 2014 and dives deeper into the hidden recesses of the internet than any previous undertaking, is expected to yield multiple petabytes of data that law enforcement agencies and cybersecurity clients can use to thwart cybercrime and investigate threat actors.
The dark web comprises the trackless parts of the internet that traditional search engines do not index and where individuals mask their identities with powerful anonymization tools. It is home to innumerable hidden marketplaces, communities, groups and forums used to traffic in illegal goods and services, from child pornography, drugs and weapons to tools for distributing malware and ransomware, and even stolen data that can affect national security.
Cybercrime originating from the dark web affects nearly every industry and costs the global economy as much as $600 billion a year, about 0.8 percent of global GDP. Security industry experts project that companies around the world could incur more than $5 trillion in costs and lost revenue from cyberattacks over the next five years.
“By definition,” said Gene Yoo, chief executive officer of Resecurity, “the dark web is a completely uncontrolled and unregulated cross-border ecosystem. It poses a problem whose scale is growing rapidly due to cross-border legislation as well as technical barriers affecting law enforcement. This gives cybercriminals enough freedom to perform illegal activities in cyberspace that affect all elements of our society.”
Power of Big Data, Data Science and AI
Resecurity research estimates that the total volume of dark web data in circulation is measured in petabytes (PB), even before accounting for data replicated on dark web mirror sites. The data is also highly dynamic: many underground communities appear suddenly, vanish just as suddenly, and later resurface.
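To make the point about volatility concrete, here is a minimal Python sketch, assuming a crawler that produces periodic snapshots of which community identifiers (for example, onion addresses) were reachable. The snapshot format and the function name `classify_transitions` are hypothetical illustrations, not part of any published Resecurity product or API.

```python
from typing import Dict, List, Set


def classify_transitions(snapshots: List[Set[str]]) -> Dict[str, List[str]]:
    """Record appear/disappear/reappear events per community across crawl snapshots.

    Hypothetical helper: each snapshot is the set of community IDs reachable
    during one crawl, listed in chronological order.
    """
    events: Dict[str, List[str]] = {}
    seen_before: Set[str] = set()  # every ID observed in any earlier snapshot
    previous: Set[str] = set()     # IDs observed in the immediately preceding snapshot
    for current in snapshots:
        # Present now but absent last time: either brand new or returning.
        for cid in current - previous:
            event = "reappeared" if cid in seen_before else "appeared"
            events.setdefault(cid, []).append(event)
        # Present last time but absent now.
        for cid in previous - current:
            events.setdefault(cid, []).append("disappeared")
        seen_before |= current
        previous = current
    return events


history = classify_transitions([{"a", "b"}, {"a"}, {"a", "b", "c"}])
print(history["b"])
```

Here community `b` is reported as appeared, then disappeared, then reappeared, which is the kind of churn that makes one-off scans of the dark web go stale quickly.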
“There have been attempts in the past to scan the dark web, but the tools that were available when those efforts were undertaken were extremely primitive,” added Yoo. “They generated a lot of false positives and noise — and not a lot of truly actionable intelligence. To deliver the maximum visibility into the dark web, to get to the point where we can associate a particular threat actor with his real identity, we need to apply the power of data science and big data, which is exactly what we’re doing at Resecurity.”
To analyze that much dynamic data and retain it for retrospective analysis, Resecurity relies on a range of advanced big data, data science and AI technologies:
- It has trained machine learning models and artificial intelligence engines to recognize relevant content by category and to mine meaningful information about threat actors and their operations in near real time.
- It captures and processes millions of dark web postings every day, including textual, graphical and binary information (containing attachments and other important artifacts), as well as metadata associated with the postings and the posting sources.
- Cyberthreat intelligence analysts and cybercrime investigators working with the data captured by Resecurity can view information by thematic niche, community size (total number of actors and published postings), update and activity dynamics, risk perspective and more, helping them decide which key threat sources to prioritize for more systematic monitoring.
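As a rough illustration of ranking sources by community size and activity, the sketch below assumes postings have already been reduced to hypothetical `Posting` records (community, actor handle, timestamp). The function `rank_communities`, the seven-day window and the sort order (recent activity first, then distinct actors, then total postings) are all assumptions chosen for illustration; Resecurity has not published its prioritization criteria.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Posting:
    community: str       # hypothetical forum/marketplace identifier
    actor: str           # pseudonymous handle of the poster
    posted_at: datetime  # time the posting was published


def rank_communities(
    postings: List[Posting], now: datetime, window_days: int = 7
) -> List[Tuple[str, int, int, int]]:
    """Return (community, distinct_actors, total_postings, recent_postings), highest priority first."""
    actors: Dict[str, Set[str]] = defaultdict(set)
    totals: Dict[str, int] = defaultdict(int)
    recent: Dict[str, int] = defaultdict(int)
    cutoff = now - timedelta(days=window_days)
    for p in postings:
        actors[p.community].add(p.actor)
        totals[p.community] += 1
        if p.posted_at >= cutoff:  # counts toward the activity window
            recent[p.community] += 1
    rows = [(c, len(actors[c]), totals[c], recent[c]) for c in totals]
    # Assumed ordering: recent activity, then community size, then volume, then name for ties.
    rows.sort(key=lambda r: (-r[3], -r[1], -r[2], r[0]))
    return rows


now = datetime(2019, 6, 1)
sample = [
    Posting("forumA", "x", now - timedelta(days=1)),
    Posting("forumA", "y", now - timedelta(days=2)),
    Posting("forumB", "z", now - timedelta(days=30)),
    Posting("forumB", "z", now - timedelta(days=40)),
    Posting("forumB", "w", now - timedelta(days=3)),
]
print(rank_communities(sample, now))
```

In this toy data `forumB` has more postings overall, but `forumA` ranks first because more of its activity falls inside the recent window; a real system would likely weight these signals differently and add category and risk information from its classifiers.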
“We have built what is probably the largest dark web monitoring platform ever developed,” said Mr. Yoo. “It’s storing massive amounts of data, all of which can be acted upon, which is key to facilitating the investigation of complex cybercrime cases.”