BalCCon2k24

Intro to Natural Language Processing - text mining for cybersecurity
2024-09-21, 13:15–15:15 (Europe/Belgrade), Pupin

Natural Language Processing (NLP) has become increasingly important for cybersecurity threat intelligence and incident response. It enables more accurate and nuanced analysis of potential threats through linguistic techniques. Among other applications, NLP allows faster categorization of threats by their nature (for example, phishing schemes or anomalous behavior), so that responses can be prioritized accordingly. It can also support content prediction for analysts and power information extraction tools. We will cover two text-mining techniques that we believe are a good starting point in NLP for analysts and incident responders: sentiment analysis and Named Entity Recognition (NER). Sentiment analysis reveals underlying emotions or biases in social media content that may be linked to malicious activity. NER identifies key information such as IP addresses, domains, and user details, which is essential for correlating incidents across data sources.
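As a rough illustration of the kind of output NER produces in this setting, the sketch below pulls IPv4 addresses and domain names out of free text. It is a rule-based stand-in built on Python's standard `re` and `ipaddress` modules, not the model-based NER covered in the workshop. The patterns and the `extract_indicators` function name are assumptions for illustration. A learned NER model uses surrounding context and can tag entities such as user names, which fixed patterns like these cannot reliably do.

```python
import ipaddress
import re

# Illustrative patterns (assumptions): dotted quads, and hostnames ending in an
# alphabetic TLD. They match surface forms only, so filenames such as
# "report.pdf" would also be reported as domains.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)


def extract_indicators(text: str) -> dict[str, list[str]]:
    """Return IPv4 addresses and domain names found in `text`, in order of appearance."""
    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        ips.append(candidate)
    domains = [d.lower() for d in DOMAIN_RE.findall(text)]
    return {"ip": ips, "domain": domains}


if __name__ == "__main__":
    sample = "Failed login for alice from 203.0.113.7; beacon to evil-login.example.com"
    print(extract_indicators(sample))
    # {'ip': ['203.0.113.7'], 'domain': ['evil-login.example.com']}
```

Extracted indicators like these are what make it possible to join records from different sources (for example, firewall logs and social media posts) on a shared IP address or domain.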

The workshop is fully hands-on, with as many exercises and tests as possible. You will be provided with a complete development environment containing everything needed for the workshop, including all deep learning and NLP tools. You will build two NLP pipelines step by step to practice these techniques on real data. By the end of the workshop, you should have a good understanding of foundational NLP tasks and be ready to apply your new skills to your own data. Prerequisites: familiarity with Python programming is expected.


Program:

  • What natural language processing is, and what classification tasks are.

Hands-on:
- Load and explore your data
- Text preprocessing
- Load a model in the code environment
- Build a classifier step by step with a pre-trained model
- Run a classification task: sentiment analysis
- Apply the same pipeline to NER
- Interpret and discuss the NER results

  • Discussion: how to apply NLP to cybersecurity problems; the place and role of NLP within multimodal models; the limits of today's language models.
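The hands-on sequence above (preprocess the text, classify it, then reuse the same pipeline shape) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the `POSITIVE` and `NEGATIVE` word lists and the `preprocess`, `classify_sentiment`, and `run_pipeline` names are hypothetical, and a simple lexicon count stands in for the pre-trained model used in the workshop.

```python
import re
import string

# Hypothetical toy lexicons; in the workshop a pre-trained model plays this role.
POSITIVE = {"great", "secure", "fixed", "thanks", "good"}
NEGATIVE = {"hacked", "leak", "breach", "angry", "bad", "attack"}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs, strip punctuation, and split on whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs before stripping punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def classify_sentiment(tokens: list[str]) -> str:
    """Label tokens positive/negative/neutral by counting lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run_pipeline(posts: list[str]) -> list[tuple[str, str]]:
    """Apply preprocessing then classification to each post."""
    return [(post, classify_sentiment(preprocess(post))) for post in posts]


if __name__ == "__main__":
    for post, label in run_pipeline([
        "We got hacked, another data breach!",
        "Patch fixed the bug, thanks!",
    ]):
        print(label, "|", post)
```

Keeping preprocessing and classification as separate functions means the classifier can be swapped (for a pre-trained sentiment model, or an entity tagger for NER) without touching the rest of the pipeline.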

Pauline is the founder of Cubessa. The human is at the center of her work: her focus gravitates towards offensive cybersecurity, artificial intelligence, programming culture, cognition, and the human element of cybersecurity. She has a diverse background spanning linguistics, criminology, cybersecurity, computer engineering, and education. By blending approaches from the humanities with deep technical insight, she offers a unique lens on cyber threats and their evolution. After several years as a Threat Analyst, she now works on AI development and training, aiming to bridge the gap between human understanding and technology. She is also a French vice-champion in para-climbing and the founder of the DEFCON group Paris.
