Yuriy Arbitman
Data Science is about magic. Before the Generative AI revolution began with the explosion of ChatGPT, only data scientists believed it; now everybody thinks so. I have been lucky to work in the industry for the past 20+ years, first as a developer and researcher, then as a manager, and more recently as a data scientist.
As a data scientist at Imperva, I apply magic to cybersecurity challenges. We use Large Language Models, logistic regression, clustering, and whatever else it takes to protect the good guys from the bad ones. The former happen to be our customers.
I hold an M.Sc. in Computer Science from the Weizmann Institute in Israel.
Sessions
Every day, hundreds of new data sources on security vulnerabilities (CVEs) appear on the web. These include articles, vulnerability databases, code repositories, forums, and chats, each containing only a handful of details. Security operators have to invest a lot of effort to find out:
- Is the published information new or already known?
- What is the applicability? Does the attack target a specific consumer device (e.g. printer), is it about a specific OS (e.g. Windows), is it a local or remote attack?
- What details are provided in the description? Is it a “news-type” article providing essentially a headline, or a “blog-type” article providing technical details that can be used to reconstruct the attack and protect against it?
Once these questions are answered, and provided that we have a new and informative description of the vulnerability, the security operator can finally work on protection measures. In the context of a Web Application Firewall (WAF), this means crafting a dedicated rule that detects and potentially blocks the malicious traffic without affecting benign traffic.
In this talk we present a machine learning pipeline that uses state-of-the-art Large Language Models (LLMs) to automate the above tasks. This makes it possible to:
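To make the goal of such a rule concrete, here is a minimal sketch in Python. It is not taken from any real WAF product or rule syntax: the attack type (path traversal), the pattern `TRAVERSAL`, and the function `rule_matches` are all assumptions chosen for illustration. The point is only that a rule should fire on malicious requests and stay silent on benign ones.

```python
import re
from urllib.parse import unquote

# Hypothetical rule: flag a ".." path segment in the request path.
TRAVERSAL = re.compile(r"(^|/)\.\.(/|$)")


def rule_matches(path: str) -> bool:
    """Return True when the URL-decoded path contains a '..' path segment."""
    return bool(TRAVERSAL.search(unquote(path)))


# Illustrative samples, invented for this sketch.
malicious = ["/download/../../etc/passwd", "/img/%2e%2e/%2e%2e/secret"]
benign = ["/docs/release..notes.html", "/blog/2024/06/index.html"]

for path in malicious + benign:
    print(path, "->", "block" if rule_matches(path) else "allow")
```

Decoding the path before matching matters here: without it, the percent-encoded sample would slip past the pattern, while matching on whole segments keeps a file name that merely contains two dots from being blocked.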
- Reduce time-to-mitigation
- Reduce human costs by saving time required from highly skilled individuals
Our pipeline consists of several building blocks:
- Text extraction (including image-to-text and video-to-text capabilities)
- Classification tasks:
-- Is the article informative?
-- Does the article describe a web attack?
- Generation tasks: given a detailed description of an attack, transform it into a WAF rule that conforms to a given syntax
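The building blocks above can be sketched as a small gated pipeline. This is an illustrative outline under stated assumptions, not the implementation presented in the talk: the `LLM` callable type, the prompts, `TriageResult`, `process_article`, and the stand-in `fake_llm` are all hypothetical, and text extraction is assumed to have already happened.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical type: any function that maps a prompt to the model's text reply.
LLM = Callable[[str], str]


@dataclass
class TriageResult:
    informative: bool
    web_attack: bool
    rule: Optional[str]  # None when the article is filtered out


def ask_yes_no(llm: LLM, question: str, text: str) -> bool:
    """Classification step: ask the model a yes/no question about the text."""
    reply = llm(f"{question} Answer YES or NO.\n\n{text}")
    return reply.strip().upper().startswith("YES")


def process_article(llm: LLM, text: str, rule_syntax: str) -> TriageResult:
    """Run both classification gates, then generate a rule only if both pass."""
    if not ask_yes_no(llm, "Is this article informative?", text):
        return TriageResult(False, False, None)
    if not ask_yes_no(llm, "Does this article describe a web attack?", text):
        return TriageResult(True, False, None)
    rule = llm(
        f"Write a WAF rule in this syntax:\n{rule_syntax}\n\n"
        f"Attack description:\n{text}"
    )
    return TriageResult(True, True, rule.strip())


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (demo only)."""
    if "Answer YES or NO" in prompt:
        return "YES" if "payload" in prompt else "NO"
    return "block if path contains '../'"


detailed = process_article(fake_llm, "The payload uses ../ in the path.", "block if <cond>")
headline = process_article(fake_llm, "New CVE announced today.", "block if <cond>")
print(detailed)
print(headline)
```

Running the cheap classification gates first means the more expensive generation step is only invoked for articles that are both informative and about web attacks.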
In this talk we describe the challenges of this exciting problem and show a stack of solutions that can be applied to a wide range of products on the market.