In July 2025, during an experiment at Replit, a platform that offers AI-powered coding tools, an artificial intelligence (AI) coding assistant deleted a live production database containing the records of nearly 2,500 executives and companies. The assistant ignored explicit instructions, ran unauthorized commands and even fabricated fake user profiles to cover up the damage.
“As AI becomes increasingly embedded in everything from food ordering to healthcare, the consequences of its malfunctions — whether erroneous decisions, data breaches or catastrophic failures — are growing rapidly in number and severity. AI adoption is soaring, and so is the need for rigorous oversight, fault localization and code repair, work that is vital to our safety,” says Mohammad Wardat, Ph.D., assistant professor of computer science and engineering. Dr. Wardat is on a mission to make artificial intelligence more transparent and trustworthy for the next era of software.
Dr. Wardat’s knowledge of AI, deep learning (DL) and large language models (LLMs) makes him acutely aware that even one flaw could mean far more than a software crash; it could upend lives. That is why his multifaceted research aims to engineer AI systems that heal themselves, giving developers new tools to detect faults before they cause harm.
One of Dr. Wardat’s recent projects is Theia, an approach for localizing structural bugs in DL programs. DL models often harbor hidden flaws that surface only after training, leaving developers to hunt for a root cause long after the damage is done. To catch such bugs earlier, Dr. Wardat and his collaborators from Tulane University and Polytechnique Montréal designed Theia to exploit the characteristics of the training dataset for early-stage bug detection during development.
“Unlike previous tools, Theia considers the training dataset’s characteristics to automatically detect bugs in DL programs built with two DL libraries, Keras and PyTorch, and does so at the beginning of the training process. It identifies 57 out of 75 structural bugs in 40 real-world buggy programs, outperforming previous tools by a significant margin. It also alerts the developer with informative messages containing the bug’s location and actionable fixes, helping them improve the structure of the model,” Dr. Wardat explains.
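Theia’s internals are not reproduced here, but the class of bug it targets, and the spirit of a dataset-aware check, can be sketched in a few lines. In this minimal, hypothetical Keras example, the output layer contradicts the labels, the kind of structural mismatch a dataset-aware tool can flag before training begins (the model and the check are illustrative, not Theia’s actual code):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy training set: a 10-class problem with integer labels 0..9.
x_train = np.random.rand(100, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=(100,))

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    # Structural bug: a single sigmoid unit, as if this were binary
    # classification, although the labels span ten classes.
    layers.Dense(1, activation="sigmoid"),
])

# Dataset-aware check in the spirit of early-stage detection:
# compare the output layer against the label distribution
# before any training time is spent.
n_classes = len(np.unique(y_train))
out_units = model.layers[-1].units
if out_units != n_classes:
    print(f"Possible structural bug: output layer has {out_units} "
          f"unit(s), but the training labels contain {n_classes} classes.")
```

Running such a check before model.fit() is called spares the developer the training cycles that would otherwise pass before the flaw ever surfaced.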
To tackle another major issue in DL, separating and independently testing data and models, Dr. Wardat co-authored the “Mock Deep Testing” methodology. Traditionally, data preparation and model design are tightly coupled, so flaws in one stage cascade into the other and complicate bug detection. By introducing synthetic “mocks” that emulate real data and realistic model behaviors, the team created KUnit, a testing framework for the popular Keras library. In empirical experiments, this approach pinpointed 10 major problems in data preparation and 53 model-design faults during development.
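KUnit’s actual interface is not shown in this article, but the mocking idea can be illustrated with plain unittest: generate synthetic data that mimics the real dataset’s shapes and label range, then exercise the model design in isolation from the real (possibly flawed) data-preparation pipeline. The test below is a sketch of that idea, not KUnit’s API:

```python
import unittest
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_dim: int, n_classes: int) -> keras.Model:
    """The model design under test, kept separate from data preparation."""
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy")
    return model

class TestModelDesignInIsolation(unittest.TestCase):
    def test_one_training_step_on_mock_data(self):
        # Mock data emulates the real dataset's shape and label range,
        # so a design flaw cannot be masked by, or blamed on, the
        # data-preparation stage.
        x_mock = np.random.rand(8, 20).astype("float32")
        y_mock = np.random.randint(0, 4, size=(8,))
        model = build_model(input_dim=20, n_classes=4)
        history = model.fit(x_mock, y_mock, epochs=1, verbose=0)
        # A NaN loss after one step points at the model design itself.
        self.assertFalse(np.isnan(history.history["loss"][0]))

if __name__ == "__main__":
    unittest.main()
```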
“Both Theia and KUnit enable early bug discovery, bringing pragmatic rigor to a field notorious for late-breaking and expensive failures. Our recently proposed LocatorGraph (LG) takes it even further by identifying the root causes of faults in sequence-based models (SBMs),” Dr. Wardat says.
SBMs differ from traditional software primarily in their black-box, data-driven nature and their sequential dependencies. Existing debugging methods often require specialized expertise and do not apply directly to SBMs’ unique structures and requirements. LG, a graph neural network-based framework, transforms traditional debugging by converting code into TraceGraphs. The framework does not just flag where faults tend to occur; it maps code execution as a structural graph, helping isolate the true origins of failures with unprecedented precision. In evaluation, it reaches an area under the curve (AUC) of 89.46% at identifying faulty components and an F1 score of 81.34%, balancing fault detection against false alarms.
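LocatorGraph’s TraceGraphs and its graph neural network are far richer than anything that fits here, but the underlying representation can be hinted at: an execution recorded as attributed nodes connected by data-flow edges, over which a trained model (replaced below by a hand-written stand-in rule) scores fault candidates. Every name, shape and attribute in this sketch is illustrative:

```python
# Hypothetical trace of a sequence model's execution: each node is a
# recorded operation with attributes; edges follow the data flow.
trace = [
    ("embedding", {"out_shape": (32, 100, 64)}),
    ("lstm",      {"out_shape": (32, 100, 128)}),
    ("dropout",   {"rate": 1.0}),   # suspicious: zeroes every activation
    ("dense",     {"out_shape": (32, 10)}),
]

# Build edges between consecutive operations (sequential data flow).
edges = [(trace[i][0], trace[i + 1][0]) for i in range(len(trace) - 1)]
print("TraceGraph edges:", edges)

# A graph neural network would assign each node a fault score;
# a toy rule stands in for it here and points at the root cause.
for name, attrs in trace:
    if attrs.get("rate") == 1.0:
        print(f"Fault candidate at node '{name}': a dropout rate of 1.0 "
              f"discards all activations, so the model cannot learn.")
```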
Among Dr. Wardat’s other accomplishments is RAGFix, a tool that enhances LLM-based code repair by harnessing retrieval-augmented generation alongside community knowledge from sources like Stack Overflow. His work on “TransBug” pushes the frontier of bug detection by applying transformer models to diagnose faults within deep neural networks.
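RAGFix’s exact pipeline is not described in this article, but the retrieval-augmented idea is easy to sketch: given an error, retrieve the most relevant community post and fold its advice into the repair prompt sent to the LLM. The tiny corpus and overlap-based scoring below are purely illustrative; a real system would use an indexed corpus and learned embeddings:

```python
# Toy stand-in for community knowledge (e.g., Stack Overflow posts).
corpus = [
    {"title": "Keras loss becomes NaN with high learning rate",
     "fix": "Lower the learning rate or clip gradients."},
    {"title": "Shape mismatch between Dense output and labels",
     "fix": "Set the output layer's units to the number of classes."},
]

def retrieve(error_msg: str, k: int = 1):
    """Rank posts by naive word overlap with the error message."""
    words = set(error_msg.lower().split())
    scored = sorted(
        corpus,
        key=lambda post: len(words & set(post["title"].lower().split())),
        reverse=True,
    )
    return scored[:k]

error = "shape mismatch: dense output (1,) vs labels with 10 classes"
best = retrieve(error)[0]
prompt = (f"Buggy program error: {error}\n"
          f"Related community fix: {best['fix']}\n"
          f"Propose a corrected code snippet.")
print(prompt)  # In a full pipeline, this prompt goes to the LLM.
```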
At Oakland University’s Laboratory for Software Innovation, Dr. Wardat’s research group of two Ph.D. students is extending software reliability research to autonomous and LLM-driven systems. Muhammad Anas Raza’s project on the autonomous software lifecycle builds directly on Dr. Wardat’s foundational work in AI-driven fault localization and repair, exploring how LLMs can autonomously detect, explain and correct code errors throughout the software lifecycle.
Niful Islam’s research, LLMs as Both Bug Hunters and Bug Sources, investigates these powerful models’ dual nature: their ability to identify software vulnerabilities while, at times, introducing new ones. This work extends Dr. Wardat’s research on the reliability of AI-assisted coding, aiming to create frameworks that evaluate and mitigate the risks introduced by AI-generated code.
Dr. Wardat’s research has attracted funding from several sources, including grants from the National Science Foundation and internal support from OU. These projects focus on developing transformer models to identify and repair faults in convolutional neural networks, as well as expanding the capabilities of AI in cybersecurity, autonomous driving and edge computing.
On a larger scale, Dr. Wardat’s research is about deepening the trust between humans and intelligent machines. It is about the evolving partnership between computer scientists and increasingly autonomous AI systems. With every model debugged and each fault mapped, Dr. Wardat is crafting software that is not just intelligent, but trustworthy as well.
For more information, please visit Dr. Wardat’s website: https://wardat.github.io