Microsoft has quietly rolled out Project Ire, an autonomous AI agent that blends large language models with classic reverse-engineering tools like angr and Ghidra to detect and classify malware with impressive precision. In early trials on Windows driver samples, the prototype achieved up to 98% precision and 83% recall, enabling it to make the first AI-authored decision strong enough for Microsoft Defender to automatically block an advanced persistent threat (APT) strain. In broader tests on nearly 4,000 previously unclassified files, roughly 90% of the files Project Ire flagged were genuinely malicious, with a false-positive rate of only about 2–4%, though it still missed a sizable share of threats overall. The system logs a clear “chain of evidence” for each verdict, supporting analyst oversight. Developed jointly by Microsoft Research, Defender Research, and the Discovery & Quantum team, Project Ire is slated for deeper integration into the Defender suite as a “Binary Analyzer,” aiming to speed detection and reduce analyst fatigue while scaling security defenses.
Sources: IT Pro, Hacker News, Axios
Key Takeaways
– AI-driven automation is shifting cybersecurity paradigms: Project Ire can autonomously reverse-engineer malware samples and take blocking decisions without human input, a step-change in defensive capability.
– High precision, variable recall: Initial tests show excellent precision (up to 98%) and low false positives (2–4%), but recall (the share of all threats it actually catches) dropped sharply on the harder sample set, highlighting where human vigilance still matters.
– Analyst-friendly transparency: Each decision from Project Ire includes an evidence trail, reinforcing human oversight and reducing risk of unchecked automated actions.
In-Depth
Microsoft’s quietly powerful new prototype, Project Ire, is subtly changing the rules of the malware detection game.
Think of it as a hybrid genius: part large-language-model wizard, part old-school reverse-engineer. It digs into unknown software with tools like angr and Ghidra, recreates control-flow graphs, and reasons through behavior—effectively performing what used to be a labor-intensive human task.
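Microsoft hasn’t published Project Ire’s internal pipeline, but the control-flow-graph recovery it describes is exactly the kind of step angr exposes out of the box. The sketch below is a minimal illustration of that step under stated assumptions, not Project Ire’s actual code; the file name suspicious_driver.sys is a placeholder.

```python
import angr

# Load the binary under analysis; auto_load_libs=False keeps the analysis
# focused on the sample itself rather than pulling in system libraries.
# "suspicious_driver.sys" is a placeholder path, not a real sample.
proj = angr.Project("suspicious_driver.sys", auto_load_libs=False)

# CFGFast statically recovers a control-flow graph and a function list,
# the same structures a human reverse-engineer would inspect in Ghidra.
cfg = proj.analyses.CFGFast()

print(f"Recovered {len(cfg.kb.functions)} functions "
      f"and {len(cfg.graph.nodes())} basic blocks")

# Imported APIs are often the first behavioral clue (file, registry,
# network, or process-manipulation calls).
for name in sorted(proj.loader.main_object.imports):
    print("imports:", name)
```

In a system like the one described, the recovered graph and import list would then be summarized and handed to the language model for behavioral reasoning.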
Results? In one public dataset of Windows drivers, Project Ire nailed a 98% precision rate with 83% recall, showcasing just how sharp its instincts are when it flags something malicious. In tougher testing conditions—nearly 4,000 hard-to-classify files—it still delivered 90% precision with a 2–4% false-positive rate, even if it caught only one in four of the threats present. That means fewer distractions for human analysts, who are often buried under endless scans and alerts.
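To untangle the two sets of numbers: precision is the share of flagged files that are truly malicious, recall is the share of all malicious files that get flagged, and the false-positive rate is how often benign files are wrongly flagged. The counts below are purely hypothetical, chosen only to reproduce the rough proportions reported for the harder test set.

```python
# Hypothetical confusion-matrix counts for illustration only; Microsoft has
# not published the raw numbers behind the ~4,000-file evaluation.
tp = 90    # malicious files correctly flagged
fp = 10    # benign files wrongly flagged
fn = 270   # malicious files missed
tn = 400   # benign files correctly left alone

precision = tp / (tp + fp)   # share of flags that are real threats
recall    = tp / (tp + fn)   # share of all threats that get caught
fp_rate   = fp / (fp + tn)   # share of benign files wrongly flagged

print(f"precision: {precision:.0%}")  # 90% -> nine in ten flags are genuine
print(f"recall:    {recall:.0%}")     # 25% -> about one in four threats caught
print(f"FP rate:   {fp_rate:.1%}")    # ~2.4% -> within the reported 2–4% band
```

The point the arithmetic makes is that a system can raise very few false alarms while still leaving most of the hunting to human analysts.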
And it’s not just fast: Project Ire includes an auditable chain of evidence that lays out exactly how it reached each verdict, putting human oversight firmly in the driver’s seat. Built through a cross-team effort spanning Microsoft Research, Defender Research, and Discovery & Quantum, Project Ire is slated to be folded deeper into the Defender platform as a “Binary Analyzer.” With this, Microsoft could speed detection, scale defenses, and lighten analyst workloads while still keeping that essential human check in place.