To make my life a bit easier, I built deep-research-mcp, a small Python agent that exposes several “deep research” backends through a single Model Context Protocol server (Anthropic, 2024). It allows Claude Code, Codex, Gemini CLI, or any MCP client to fire off long-running research tasks against whichever backend the user prefers.
What the server exposes
The MCP server exposes three tools: `deep_research`, `research_with_context`, and `research_status`. The first kicks off a task; the second resumes one after an optional clarification round; the third polls. Everything else – provider selection, timeouts, clarification models, system prompts – lives in `~/.deep_research`.
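The three-tool lifecycle can be sketched as plain Python against an in-memory task store – the function names mirror the MCP tools, but the signatures, fields, and store are all illustrative assumptions, not the server's actual implementation:

```python
# Hypothetical sketch of the deep_research / research_with_context /
# research_status lifecycle; the task store and field names are assumptions.
import uuid

TASKS: dict[str, dict] = {}

def deep_research(query: str) -> dict:
    """Kick off a research task (may pause for a clarification round)."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"query": query, "status": "pending", "report": None}
    return {"task_id": task_id, "status": "pending"}

def research_with_context(task_id: str, answers: str) -> dict:
    """Resume a task after the optional clarification round."""
    task = TASKS[task_id]
    task["query"] = f"{task['query']}\n\nClarifications: {answers}"
    task["status"] = "running"
    return {"task_id": task_id, "status": "running"}

def research_status(task_id: str) -> dict:
    """Poll a task; the report appears once the backend finishes."""
    task = TASKS[task_id]
    return {"task_id": task_id, "status": task["status"], "report": task["report"]}
```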
Architecture
DeepResearchAgent optionally runs a clarification pass (triage the query, ask follow-up questions, enrich it), optionally rewrites the query into a longer research brief, and then delegates to one of four backends behind a common interface. All backends return the same normalised record – report, citations, reasoning steps, task id, execution time – so the MCP tools and the CLI do not care which one ran.
Backends
- OpenAI Responses API (default). Uses a Deep Research model such as `o4-mini-deep-research` with the built-in `web_search_preview` and `code_interpreter` tools, background mode, and polling. Follows the reference pattern in OpenAI’s cookbook (OpenAI, 2025a; OpenAI, 2025b).
- OpenAI Chat Completions. Same backend file, different code path (`api_style = "chat_completions"`). No built-in tools, no polling – just a blocking chat call. This is the escape hatch for any OpenAI-compatible endpoint: Perplexity’s Sonar Deep Research (Perplexity, 2025), Groq, Together, Ollama, vLLM, or `llama.cpp`’s `llama-server`.
- Gemini Deep Research. Implemented against the Interactions API, with Google Search and URL context as built-in tools (Google, 2024).
- DR-Tulu. Allen AI’s open research agent, called over its `/chat` endpoint; the client-side integration is intentionally thin (AI2, 2025).
- Open Deep Research. A self-contained smolagents stack with a text browser and search tools, using LiteLLM as the model layer (Roucher et al., 2025).
Why MCP?
Plugging a deep-research tool into Claude Code takes one line:
```shell
claude mcp add deep-research -- uv run --directory /path/to/deep-research-mcp deep-research-mcp
```
Stdio for local spawning, Streamable HTTP on `/mcp` for a shared server. The repository also ships a CLI (`cli/deep-research-cli.py`) and a Textual TUI (`cli/deep-research-tui.py`) that can either run the agent directly or act as MCP clients against the HTTP endpoint – useful for debugging providers without involving the model in the loop.
Clarification
Optional, disabled by default in my setup and probably going away at some point. Three small chat models – a triage model, a clarifier, and an instruction builder – decide whether a query is underspecified, ask follow-up questions, and merge the answers into a longer brief before the expensive provider call. The pattern is adapted from OpenAI’s cookbook. On local endpoints (Ollama, llama-server), clarification needs a model with reasonable structured-output behaviour; very small models trip the triage step.
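The triage → clarify → merge pass can be sketched as three stubbed functions – each stub stands in for a real chat-completions call; the heuristics and wording below are placeholders, not the actual prompts:

```python
# Sketch of the three-model clarification pass; each function body is a
# stub standing in for a chat-model call, so the logic is illustrative only.
def triage(query: str) -> bool:
    """Triage model: decide whether the query is underspecified."""
    return len(query.split()) < 5  # stub heuristic in place of a model call

def clarify(query: str) -> list[str]:
    """Clarifier: produce follow-up questions (stubbed)."""
    return [f"What time range should '{query}' cover?"]

def build_brief(query: str, answers: dict[str, str]) -> str:
    """Instruction builder: merge answers into a longer brief (stubbed)."""
    merged = "; ".join(f"{q} -> {a}" for q, a in answers.items())
    return f"{query}\n\nConstraints: {merged}"

def clarification_pass(query: str, ask) -> str:
    """Run the full pass; `ask` relays each question to the user."""
    if not triage(query):
        return query  # well-specified queries go straight to the provider
    answers = {q: ask(q) for q in clarify(query)}
    return build_brief(query, answers)
```

In the real pipeline each stub is a separate (configurable) chat model, which is why local endpoints need models with reasonable structured-output behaviour.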
Notes
- Long-running tasks can take hours. Client timeouts matter more than the agent’s own – raise `MCP_TOOL_TIMEOUT` (Claude Code) or `tool_timeout_sec` (Codex) and poll `research_status` rather than blocking a single tool call.
- Clarification and instruction building remain OpenAI-compatible even when the main provider is Gemini or DR-Tulu; set `clarification_base_url` separately.
- The repo ships a Claude Code skill and a Codex skill so the assistant knows how to use the tools without re-reading the README.
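A client-side polling loop with an overall deadline can look like this – `research_status` is passed in as a callable standing in for the MCP tool of the same name, and the timing parameters are arbitrary:

```python
# Hedged sketch of polling research_status instead of blocking one long
# tool call; the callable, intervals, and status strings are assumptions.
import time

def wait_for_report(research_status, task_id: str,
                    poll_every_s: float = 30.0,
                    give_up_after_s: float = 4 * 3600) -> str:
    """Poll until the task completes, with a client-side deadline."""
    deadline = time.monotonic() + give_up_after_s
    while time.monotonic() < deadline:
        status = research_status(task_id)
        if status["status"] == "completed":
            return status["report"]
        if status["status"] == "failed":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_every_s)
    raise TimeoutError(f"task {task_id} still running after {give_up_after_s}s")
```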
Code, tests, and setup: github.com/pminervini/deep-research-mcp.