Malware analysis and reverse engineering

Summer 2026 Internship — Project Track 1

Malware Copilot: AI-Assisted Reverse Engineering Lab

Build, improve, and benchmark Model Context Protocol tools for AI-driven reverse engineering and malware analysis workflows.


Track Focus

AI agents for malware analysis

Reverse engineering, MCP server design, benchmarking, and real-world binary analysis using AI agent harnesses.

Project Sponsor

This track is sponsored by an industry expert bringing real-world malware analysis experience.

Industry Sponsor

Caleb Fenton

Co-Founder & CEO, Delphos Labs

Caleb brings deep expertise in malware analysis, reverse engineering, and security tooling. He is the driving force behind this track’s focus on AI-assisted binary analysis and MCP tool design.

About This Track

Explore how AI agents and MCP tooling can make real-world malware analysis faster, more accurate, and cheaper.

Existing Ghidra MCP integrations are useful but limited. This project will explore how to make them more effective for real-world reverse engineering workflows. Interns may either build a new MCP server from scratch or improve existing open-source MCP tools by consolidating tool interfaces, improving tool descriptions, reducing token usage, and optimizing the agent’s ability to analyze binaries efficiently.

Interns will work with a range of binary samples, including simple learning exercises, harder malware samples, supply chain attack examples such as the xz-utils backdoor, and advanced benchmark sets such as BinaryAudit. The goal is to understand how different tool designs, model choices, and agent harnesses affect analysis quality, speed, cost, and reliability.

What You’ll Do

Hands-on engineering across agent tooling, binary analysis, and benchmarking.

Agent Harnesses

Experiment with adding MCP servers to agent harnesses such as Claude Code, Codex, OpenCode, or a custom terminal-based interface.

Evaluate how tool surface area, tool descriptions, and server implementation choices affect performance. Compare many narrow tools versus fewer general tools, test local models against hosted models, or rewrite a Python MCP server in Rust for speed and reliability.
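
As a rough illustration of the "many narrow tools versus fewer general tools" comparison, the sketch below estimates how many prompt tokens two alternative tool surfaces would occupy. The tool names and schemas are hypothetical and are not taken from any existing Ghidra MCP server, and the 4-characters-per-token rule is a crude assumption standing in for a real tokenizer.

```python
import json

# Hypothetical tool schemas for illustration only; they do not describe
# any real Ghidra MCP server.
NARROW_TOOLS = [
    {"name": "list_functions",
     "description": "List all functions in the loaded binary.",
     "parameters": {}},
    {"name": "list_strings",
     "description": "List all defined strings in the loaded binary.",
     "parameters": {}},
    {"name": "list_imports",
     "description": "List all imported symbols in the loaded binary.",
     "parameters": {}},
    {"name": "list_exports",
     "description": "List all exported symbols in the loaded binary.",
     "parameters": {}},
]

GENERAL_TOOLS = [
    {"name": "list_items",
     "description": "List items of one kind in the loaded binary.",
     "parameters": {
         "kind": {"type": "string",
                  "enum": ["functions", "strings", "imports", "exports"]},
     }},
]


def estimate_tokens(tools):
    """Approximate token count of the serialized tool list.

    Assumes roughly 4 characters per token; a real experiment would use
    the tokenizer of the model under test.
    """
    text = json.dumps(tools, separators=(",", ":"))
    return (len(text) + 3) // 4  # ceiling division


def compare(narrow, general):
    """Return the estimated token footprint of each surface and the difference."""
    n = estimate_tokens(narrow)
    g = estimate_tokens(general)
    return {"narrow_tokens": n, "general_tokens": g, "saved": n - g}


if __name__ == "__main__":
    print(compare(NARROW_TOOLS, GENERAL_TOOLS))
```

A smaller footprint is only one side of the tradeoff: a consolidated tool may be harder for a model to call correctly, which is why token estimates like this would need to be paired with measured task accuracy.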

Test-and-Measure Engineering

A major part of this project is rigorous benchmarking and evaluation.

Help define benchmark samples, run repeatable evaluations, compare models and tool configurations, and document what works best. Explore integrations with tools such as Ghidra, Mandiant Speakeasy, or open-source sandboxing systems.
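
One way to keep evaluations repeatable is to derive a fixed seed for every trial and to aggregate results per configuration. The sketch below is a minimal outline under that assumption; `stub_agent`, the configuration names, and the sample names are hypothetical placeholders, and the stub returns random values rather than real analysis results.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    config: str    # tool/model configuration under test
    sample: str    # benchmark sample identifier
    correct: bool  # whether the agent's answer matched ground truth
    tokens: int    # tokens consumed during the run


def stub_agent(config, sample, rng):
    """Placeholder for a real agent run (hypothetical).

    Returns random values so the harness can be exercised; a real
    implementation would launch the agent and score its answer.
    """
    return rng.random() < 0.5, rng.randint(1_000, 5_000)


def run_benchmark(configs, samples, runs_per_pair, seed, agent=stub_agent):
    """Run every (config, sample) pair several times with per-trial seeds."""
    trials = []
    for config in configs:
        for sample in samples:
            for run in range(runs_per_pair):
                # A per-trial seed makes each trial reproducible on its own.
                rng = random.Random(f"{seed}:{config}:{sample}:{run}")
                correct, tokens = agent(config, sample, rng)
                trials.append(Trial(config, sample, correct, tokens))
    return trials


def summarize(trials):
    """Aggregate run count, accuracy, and mean token usage per configuration."""
    by_config = {}
    for t in trials:
        by_config.setdefault(t.config, []).append(t)
    return {
        config: {
            "runs": len(ts),
            "accuracy": sum(t.correct for t in ts) / len(ts),
            "mean_tokens": statistics.mean(t.tokens for t in ts),
        }
        for config, ts in by_config.items()
    }


if __name__ == "__main__":
    results = run_benchmark(["config-a", "config-b"],
                            ["sample-1", "sample-2"],
                            runs_per_pair=3, seed=2026)
    for config, stats in summarize(results).items():
        print(config, stats)
```

Hosted models are not fully deterministic even with fixed seeds on the harness side, so repeated runs per pair and a recorded seed help separate real differences between configurations from run-to-run noise.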

Potential Deliverables

Concrete, shareable artifacts that demonstrate your work and contributions.

A working MCP server for malware analysis or reverse engineering.
Improved Ghidra MCP tooling with cleaner, better-described tools.
Integration with an agent harness such as Claude Code, Codex, OpenCode, or a custom TUI.
Benchmark results comparing different models, tool designs, and MCP configurations.
Experiments on reducing token usage and improving tool-selection accuracy.
Optional integrations with Speakeasy, sandbox tooling, or additional binary-analysis systems.
Final documentation explaining architecture, tradeoffs, lessons learned, and future improvements.
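
For the tool-selection experiments listed above, one simple metric is the fraction of tasks where the agent's first tool call matches the tool a human analyst would have chosen. The sketch below assumes a hypothetical log format of (expected tool, chosen tool) pairs; the tool names are illustrative only.

```python
from collections import Counter


def tool_selection_accuracy(records):
    """Fraction of records where the chosen tool equals the expected tool.

    `records` is an iterable of (expected_tool, chosen_tool) pairs,
    a hypothetical log format used here for illustration.
    """
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    hits = sum(1 for expected, chosen in records if expected == chosen)
    return hits / len(records)


def confusions(records):
    """Count each (expected, chosen) mismatch to show which tools get confused."""
    return Counter((e, c) for e, c in records if e != c)


if __name__ == "__main__":
    log = [
        ("decompile_function", "decompile_function"),
        ("list_strings", "list_imports"),
        ("list_imports", "list_imports"),
        ("list_strings", "list_imports"),
    ]
    print(tool_selection_accuracy(log))
    print(confusions(log).most_common())
```

Mismatch counts point at pairs of tools whose descriptions may be too similar, which is a natural place to start rewording or consolidating.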

Skills You’ll Learn

Gain hands-on experience across malware analysis, AI tooling, and engineering measurement.

Reverse Engineering

Malware analysis fundamentals and Ghidra automation for binary-analysis workflows.

MCP & AI Agents

How MCP servers work, how agents use tools, and how to design effective tool descriptions.

Benchmarking

Evaluation design, measurement methodology, and the tradeoffs between performance and cost for AI agents.

Tool Design

Optimizing prompts and tool descriptions, reducing token usage, and improving tool-selection accuracy.

Secure Practices

Safe handling of malware samples in controlled environments and responsible research methods.