Past AI for Cybersecurity Research Lunches
This page archives the past schedule of the AI for Cybersecurity Research Lunch.
Schedule History
Week 1: Welcome to a new semester
Date: 01/22/2025
Speakers: Texas A&M University
Abstract:
Hooray!
Week 2: Preserving Privacy in Chatbot Interactions: A Hybrid Anonymization and Reversible Transformation Framework
Date: 01/29/2025
Speakers: Shuning Gu, PhD Student @ Texas A&M University
Abstract: AI-based chatbots like ChatGPT are widely used but present privacy risks, as users may accidentally share sensitive information. Existing approaches, such as generalization, differential privacy, and federated learning, often involve trade-offs that reduce either the accuracy or the usability of the system. To address these challenges, we propose a privacy-preserving framework that combines local LLMs with deterministic rule-based methods to detect and anonymize sensitive data entirely on-device, reducing the risk of data leakage while maintaining functionality. We introduce a benchmark dataset with thousands of annotated test cases to evaluate privacy-preserving methods. We further demonstrate the practical application of this framework through a browser extension that anonymizes user input in real time, providing secure chatbot interactions. This work provides a scientifically grounded approach to improving privacy in AI-driven systems.
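As a rough illustration of the "reversible transformation" idea, here is a toy rule-based anonymizer with a placeholder-to-original mapping. The regex patterns and placeholder scheme are our own assumptions for illustration, not the framework's actual implementation.

```python
import re

# Hypothetical detection rules; a real system would cover many more
# categories (names, addresses, IDs, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def anonymize(text):
    """Replace sensitive spans with placeholders; return text + mapping."""
    mapping = {}
    counter = 0
    for label, pattern in PATTERNS.items():
        def repl(m):
            nonlocal counter
            placeholder = f"<{label}_{counter}>"
            mapping[placeholder] = m.group(0)
            counter += 1
            return placeholder
        text = pattern.sub(repl, text)
    return text, mapping

def deanonymize(text, mapping):
    """Reverse the transformation using the locally stored mapping."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

masked, mapping = anonymize("Contact me at jane@example.com or 555-123-4567.")
assert "jane@example.com" not in masked
assert deanonymize(masked, mapping) == "Contact me at jane@example.com or 555-123-4567."
```

Because the mapping never leaves the device, the remote chatbot only ever sees placeholders, while the user's view can be restored locally.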
Deep-Learning Side Channel Attacks
Date: 02/05/2025
Speakers: Mabon Ninan, PhD Student @ Texas A&M University
Speaker Bio: Mabon Ninan is a first-year PhD student working with Dr. Botacin; his research focuses on machine learning for malware detection. This talk draws on his previous research at the University of Cincinnati, where he focused on leveraging deep learning techniques to analyze and exploit side-channel attacks.
Abstract: A side-channel attack exploits information leaks such as power consumption, timing variations, or electromagnetic emissions to infer sensitive data from a system. In recent years, researchers have demonstrated that deep learning can offer significant advantages over traditional statistical-based attacks. However, further studies have shown that these attacks are highly sensitive to various discrepancies often overlooked that are crucial for understanding their effectiveness in real-world scenarios. This talk takes a deep dive into both software and physical discrepancies, demonstrating their impact on side-channel attacks and their applicability in practical settings. With a key focus on portability, the talk aims to explore strategies for developing more robust and transferable attacks.
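As background, the "traditional statistical-based attacks" mentioned above can be illustrated with a toy correlation power analysis over simulated traces. This is a textbook sketch, not the speaker's deep-learning pipeline; the leakage model (Hamming weight of plaintext XOR key, plus Gaussian noise) is a common simplifying assumption.

```python
import random

def hw(x: int) -> int:
    """Hamming weight: number of set bits."""
    return bin(x).count("1")

random.seed(0)
SECRET_KEY = 0x3C
plaintexts = [random.randrange(256) for _ in range(500)]
# Simulated power traces: Hamming-weight leakage plus measurement noise.
traces = [hw(p ^ SECRET_KEY) + random.gauss(0, 0.5) for p in plaintexts]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Rank every key guess by how well its predicted leakage correlates with
# the observed traces; the correct key should correlate best.
best_guess = max(
    range(256),
    key=lambda k: pearson([hw(p ^ k) for p in plaintexts], traces),
)
assert best_guess == SECRET_KEY
```

Deep-learning attacks replace the fixed leakage model with a learned one, which is part of why they are sensitive to the device and measurement discrepancies the talk examines.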
Vulnerable and Non-vulnerable Code Creation Using Large Language Models
Date: 02/12/2025
Speakers: Bryson Brown
Speaker Bio: Bryson Brown graduated from the Air Force Academy with a bachelor's degree in Cyber Science and is working on his master's in Computer Science. He is currently researching the application of LLMs to creating datasets for code vulnerability detection.
Time: 12:00pm - 1:00pm
Location: PETR 214
Abstract: Vulnerabilities have long plagued software and are the root cause of many cyber attacks. Finding software vulnerabilities using artificial intelligence has been of great interest to researchers and businesses alike, but much of the research in the field has been hamstrung by a lack of quality data. Most existing datasets are constructed from previously written human code that is then labeled. This paper focuses on the viability of using Large Language Models (LLMs) to create vulnerable code for a suitable dataset. Our LLM prompts successfully created 20,000 vulnerable and non-vulnerable C code samples across 13 Common Weakness Enumeration (CWE) categories with a 98% label accuracy rate. Preliminary results show that LLMs fine-tuned on this dataset struggle to transfer to other datasets.
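The pairing strategy described in the abstract can be sketched as a prompt-construction step: for each CWE, request a vulnerable sample and a fixed counterpart, yielding labeled pairs. The prompt wording and CWE choice below are hypothetical illustrations, not the authors' actual prompts.

```python
# Build a (vulnerable, non-vulnerable) prompt pair for one CWE category.
# Sending these to an LLM and storing both responses with labels is the
# dataset-construction idea, sketched here without any model call.
def build_prompts(cwe_id: str, language: str = "C"):
    vulnerable = (
        f"Write a short {language} function containing a {cwe_id} "
        f"vulnerability. Return only code."
    )
    safe = (
        f"Write the same {language} function with the {cwe_id} "
        f"vulnerability fixed. Return only code."
    )
    return vulnerable, safe

vuln_prompt, safe_prompt = build_prompts("CWE-787")  # hypothetical example CWE
assert "CWE-787" in vuln_prompt
assert "vulnerability fixed" in safe_prompt
```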
Please feel free to join us at 12:00pm every Wednesday. If you want to schedule a talk, email Ze Sheng at zesheng@tamu.edu.
Cross-Regional Malware Detection via Model Distilling and Federated Learning
Date: 02/19/2025
Speakers: Dr. Marcus Botacin
Speaker Bio: Marcus Botacin is an assistant professor in the computer science and engineering department at Texas A&M University. He holds a Ph.D. in Computer Science (Federal University of Paraná, Brazil, 2021), a master’s in computer science (University of Campinas, Brazil, 2017) and a bachelor’s in computer engineering (University of Campinas, Brazil, 2015). Botacin’s main research interests are malware analysis and reverse engineering. Botacin’s research has been published in major scientific venues (e.g., ACM Transactions and USENIX Security). Botacin has spoken at academic, industry, and hacking conferences (e.g., USENIX Enigma and HackInTheBox).
Time: 12:00pm - 1:00pm
Location: PETR 214
Abstract: Machine Learning (ML) is a key part of modern malware detection pipelines, but its application is not straightforward. It involves multiple practical challenges that are frequently unaddressed in the literature. A key challenge is the heterogeneity of scenarios: Antivirus (AV) companies, for instance, operate under different performance constraints in the backend and at the endpoint, and with a diversity of datasets according to the country they operate in. In the presented paper, we evaluate the impact of these heterogeneous aspects by developing a classification pipeline for three datasets of 10K malware samples each, collected by an AV company in the USA, Brazil, and Japan over the same period. We characterize the different requirements for these datasets and show that a different number of features is required to reach the optimal detection rate in each scenario. We show that a global model combining the three datasets improves detection on all three individual datasets. We propose using Federated Learning (FL) to build the global model and a distilling process to generate the local versions. We order the samples temporally to show that although retraining upon concept drift detection helps recover the detection rate, only an FL approach can increase it.
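The global-model construction can be illustrated with the textbook Federated Averaging step, where parameters trained locally per region are averaged, weighted by dataset size. This is a minimal stand-in sketch of the FL aggregation idea, not the paper's actual pipeline; the parameter vectors are made-up numbers.

```python
def fed_avg(local_weights, sizes):
    """Weighted average of per-client parameter vectors (FedAvg step)."""
    total = sum(sizes)
    dim = len(local_weights[0])
    return [
        sum(w[i] * n for w, n in zip(local_weights, sizes)) / total
        for i in range(dim)
    ]

# Hypothetical 2-parameter local models for three regional datasets.
usa, brazil, japan = [0.2, 0.8], [0.4, 0.6], [0.3, 0.9]
global_model = fed_avg([usa, brazil, japan], sizes=[10_000, 10_000, 10_000])

# Equal dataset sizes reduce to a plain average of each parameter.
assert abs(global_model[0] - 0.3) < 1e-9
assert abs(global_model[1] - 23 / 30) < 1e-9
```

The key property, as in the paper's setting, is that only model parameters are exchanged: each region's raw samples stay with the local AV deployment.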
Understanding How Inconsistencies in ENS Normalization Facilitate Homoglyph Attacks
Date: 02/26/2025
Speakers: Jianwei Huang
Speaker Bio: Jianwei is a Ph.D. student in the SUCCESS Lab in the Department of Computer Science and Engineering at Texas A&M University. His research interests focus on system security and web security. He is also active in finding vulnerabilities in various applications.
Time: 12:00pm - 1:00pm
Location: PETR 214
Abstract: In recent years, the Ethereum Name Service (ENS) has garnered significant attention within the community for enabling the use of Unicode in domain names, thereby facilitating the inclusion of a wide array of character sets such as Greek, Cyrillic, Arabic, and Chinese. While this feature enhances the versatility and global accessibility of domain names, it concurrently introduces a substantial security vulnerability due to the presence of homoglyphs—characters that are visually similar to others across Unicode and ASCII sets. These similarities can be exploited in homoglyph attacks, posing a distinct threat to domain name integrity. This study investigates the prevalence and security implications of homoglyph domains within the ENS ecosystem, revealing that these domains present a more pronounced security concern compared to their counterparts in the traditional Domain Name System (DNS). Despite community efforts to counteract this issue through a normalization process prior to domain resolution, our analysis uncovers significant discrepancies in how the normalization processes are applied across various applications. This inconsistency could result in the same domain name being resolved to different addresses in different applications, underscoring a critical vulnerability. To systematically evaluate this inconsistency, we designed a tool for detecting application-level discrepancies in the domain normalization process without requiring access to the application’s source code. Our evaluation of hundreds of real-world Web3 applications identifies widespread deviations from established homoglyph mitigation practices, with more than 60% of digital wallets and 80% of dApps (decentralized applications) unable to produce consistent ENS resolution results, potentially impacting millions of users. This analysis underscores the urgent need for a standardized implementation of normalization processes to safeguard the integrity and security of ENS domains.
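The homoglyph problem can be seen in a few lines of Python. The Cyrillic letter "а" (U+0430) renders identically to the Latin "a" (U+0061), yet the two strings are different identifiers; the domain names below are made-up examples, not cases from the study.

```python
import unicodedata

latin = "paypal.eth"
spoof = "p\u0430yp\u0430l.eth"  # Cyrillic "а" in place of Latin "a"

assert latin != spoof  # visually identical, programmatically distinct
assert unicodedata.name("\u0430") == "CYRILLIC SMALL LETTER A"

# Generic Unicode NFKC normalization does not unify cross-script
# homoglyphs, which is why ENS relies on its own normalization process --
# and why applications that implement that process inconsistently can
# resolve the same rendered name to different addresses.
assert unicodedata.normalize("NFKC", spoof) != latin
```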
Fuzzing Complex Software with Structured Inputs Using LLMs
Date: 03/05/2025
Speakers: Zhicheng Chen
Time: 12:00pm - 1:00pm
Location: PETR 214
Abstract: Recent advances have explored the use of large language models (LLMs) for fuzz driver generation (e.g., PromptFuzz) and commit-based fuzzing (e.g., WAFLGo). However, both approaches have significant limitations. PromptFuzz performs fuzzing by combining APIs but cannot effectively fuzz parts affected by commit changes. Additionally, PromptFuzz does not address the challenge of generating drivers for complex programs like Nginx. WAFLGo, on the other hand, supports commit-based fuzzing but performs poorly on inputs with strict structural requirements. This talk will analyze the strengths and weaknesses of these approaches and present preliminary experimental results using Nginx, demonstrating how controlling mutation regions can significantly improve the efficiency of fuzzing highly structured inputs.
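The region-controlled mutation idea can be sketched as follows: restrict byte mutations to designated regions so the rest of the structured input stays parseable. The toy config input and region choice are illustrative assumptions, not PromptFuzz's or WAFLGo's actual machinery.

```python
import random

def mutate_region(data: bytes, start: int, end: int, rng: random.Random) -> bytes:
    """Randomize bytes only inside [start, end); leave structure intact."""
    buf = bytearray(data)
    for _ in range(3):
        pos = rng.randrange(start, end)
        buf[pos] = rng.randrange(256)
    return bytes(buf)

seed = b'server { listen 8080; }'
# Mutate only the port digits (bytes 16-19), preserving the keywords and
# braces that a structure-aware parser expects to see.
rng = random.Random(1)
mutated = mutate_region(seed, 16, 20, rng)

assert mutated[:16] == seed[:16]  # prefix (keywords) untouched
assert mutated[20:] == seed[20:]  # suffix (delimiters) untouched
```

An unconstrained mutator would quickly break the surrounding syntax and have most inputs rejected by the parser before reaching deeper code, which is the inefficiency the talk targets.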
LLMPirate: LLMs for Black-box Hardware IP Piracy
Date: 03/19/2025
Speakers: Matthew DeLorenzo
Time: 12:00pm - 1:00pm
Location: PETR 214
Abstract: The rapid advancement of large language models (LLMs) has made it possible to analyze and generate code nearly instantaneously, leading researchers and companies to integrate LLMs across the hardware design and verification process. However, LLMs can also enable new attack scenarios within hardware development. One such threat not yet explored is intellectual property (IP) piracy, in which LLMs may be used to rewrite hardware designs to evade piracy detection. To explore this threat, we propose LLMPirate, the first LLM-based technique able to generate pirated variations of circuit designs that successfully evade detection on 100% of tested circuits across multiple state-of-the-art piracy detection tools, and even capable of pirating full processor designs.