BalCCon2k24

NLP deep-dive: Transformers for Text Mining and Generation in Cybersecurity
2024-09-22, 12:00–14:00 (Europe/Belgrade), Pupin

Natural Language Processing (NLP) has become increasingly important for cybersecurity threat intelligence and response. Advanced linguistic techniques enable more accurate and nuanced analysis of potential threats. Among other applications, NLP allows quicker categorization of threats by their nature (such as phishing schemes or anomalous behaviors) and helps prioritize responses accordingly. It can also support content prediction for analysts and provide powerful information extraction tools. We will cover two text-mining techniques that we believe are a good starting point in NLP for analysts and incident responders: sentiment analysis and Named Entity Recognition (NER). Sentiment analysis reveals underlying emotions or biases in social media content potentially linked to malicious activities, while NER identifies critical information such as IP addresses, domains, and user details, which is essential for correlating incidents across different data sources.
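To make the NER idea concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public dslim/bert-base-NER checkpoint (both are assumptions; the abstract does not name a model). General-purpose NER checkpoints typically tag persons, organizations, and locations rather than network indicators, so the sketch pairs the model with simple regular expressions for IPv4 addresses and domains. The regexes are a swapped-in, non-transformer technique, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumptions, not exhaustive): IPv4 and simple domains.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def extract_indicators(text):
    """Return de-duplicated IPv4 addresses and domains found by the regexes."""
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "domain": sorted({d.lower() for d in DOMAIN_RE.findall(text)}),
    }


def group_entities(entities, min_score=0.8):
    """Group aggregated NER output by entity type, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


def extract_named_entities(text, model_name="dslim/bert-base-NER"):
    """Run a transformer NER model; downloads the checkpoint on first use."""
    from transformers import pipeline  # lazy import: only needed for the model call

    ner = pipeline(
        "token-classification", model=model_name, aggregation_strategy="simple"
    )
    return group_entities(ner(text))
```

Calling `extract_indicators(report)` and `extract_named_entities(report)` on the same incident report would give two dictionaries that can be merged before correlating with other data sources. The 0.8 score cut-off is an arbitrary starting value, not a recommendation.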

The workshop provides a hands-on, iterative deep dive into transformer-based NLP techniques and their applications in text mining and generation for cybersecurity threat intelligence and response strategies. It is aimed at people who already have some experience with natural language processing, deep learning, or LLMs (used directly or through a front end, e.g., LM Studio) and who want to deepen their understanding and skills.


Program:

  • Quick introduction to Transformers and the best current models
  • Hands-on:
    - Text preprocessing and tokenization
    - Transformer-based sentiment analysis
      • Choose and load a pre-trained model
      • Build an NLP pipeline step by step using the transformers library
      • Run the sentiment analysis task on an imported dataset
    - Adapt the same pipeline to Named Entity Recognition (NER) and text generation tasks
      • Going deeper: results interpretation
      • Going deeper: compare basic and light models (e.g., BART, T5, Llama)
  • Discussion: Applications in Cybersecurity
    - Apply transformer-based NLP techniques to cybersecurity problems (e.g., threat intelligence, incident response)
    - Limitations and future directions of transformer-based NLP in cybersecurity
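
The hands-on sentiment analysis steps in the program (preprocess, load a pre-trained model, build a pipeline, run it on a dataset) might look roughly like the sketch below. It assumes the Hugging Face transformers library and the distilbert-base-uncased-finetuned-sst-2-english checkpoint, neither of which is confirmed as the workshop's actual choice. The helper names, the 0.9 threshold, and the triage step are hypothetical illustrations.

```python
import re


def clean_text(text):
    """Minimal preprocessing: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def load_sentiment_pipeline(
    model_name="distilbert-base-uncased-finetuned-sst-2-english",
):
    """Load a pre-trained sentiment model; downloads the checkpoint on first use."""
    from transformers import pipeline  # lazy import: only needed for the model call

    return pipeline("sentiment-analysis", model=model_name)


def score_texts(classifier, texts):
    """Run the classifier on cleaned texts and keep each text with its result."""
    cleaned = [clean_text(t) for t in texts]
    # truncation=True keeps long posts within the model's maximum input length.
    results = classifier(cleaned, truncation=True)
    return [
        {"text": t, "label": r["label"], "score": r["score"]}
        for t, r in zip(cleaned, results)
    ]


def flag_negative(scored, threshold=0.9):
    """Hypothetical triage step: keep confidently negative items for review."""
    return [
        row
        for row in scored
        if row["label"] == "NEGATIVE" and row["score"] >= threshold
    ]
```

A typical call would be `flag_negative(score_texts(load_sentiment_pipeline(), posts))`, where `posts` is a list of strings from the imported dataset. Because `score_texts` only needs a callable that returns label and score dictionaries, swapping in a different checkpoint or a lighter model means changing only the `model_name` argument.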

By the end of this workshop, you will have a deeper understanding of transformer-based NLP techniques and their applications in text mining and generation for cybersecurity. You will be able to work directly with the code and apply your new skills to real-world problems.

Pauline is the founder of Cubessa. People are at the center of her work. Her focus gravitates towards offensive cybersecurity, artificial intelligence, programming culture, cognition, and the human element of cybersecurity. She has a diverse background, with experience in linguistics, criminology, cybersecurity, computer engineering, and education. By blending approaches from the humanities with deep technical insight, she provides a unique lens on cyber threats and their evolution. After several years as a Threat Analyst, she now works on AI development and training, aiming to bridge the gap between human understanding and technology. She is also a French para-climbing vice-champion and the founder of the DEFCON group Paris.
