mutmut-mcp: Tests, CI, and Survivor Prioritization

mutmut-mcp wraps the mutmut mutation testing tool so an agent can run mutation tests and act on the results without you having to step in. I wrote about it in the practical MCP use post, where I described the basic workflow: the agent runs mutmut, looks at surviving mutants, writes tests to kill them, validates, and repeats.

That workflow still works well. What didn’t work well was the lack of tests on the MCP server itself. A testing tool with no tests. I know.

What Changed

The server now has 23 tests covering command construction, error handling, virtual environment support, and the survivor prioritization logic. These are mostly mock-based since you don’t want to actually run mutation testing in a unit test, but they verify that the right commands are being built and the right error messages come back.
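
To give a feel for what a mock-based test of this kind looks like, here is a minimal sketch. The helper name, its venv_path parameter, the bin/mutmut layout, and the error string are all assumptions made for illustration, not the server's actual code. The only real command used is mutmut run.

```python
import subprocess
from pathlib import Path
from unittest import mock


def run_mutmut(venv_path=None):
    """Hypothetical helper: run `mutmut run`, optionally from a virtualenv."""
    # Assumption: a POSIX-style venv layout with the binary under bin/.
    exe = str(Path(venv_path) / "bin" / "mutmut") if venv_path else "mutmut"
    try:
        proc = subprocess.run([exe, "run"], capture_output=True, text=True)
    except FileNotFoundError:
        # Return a readable message instead of raising, so the caller sees it.
        return f"Error: could not find {exe}"
    return proc.stdout


def test_uses_venv_binary():
    # Patch subprocess.run so no mutation testing actually happens.
    with mock.patch("subprocess.run") as fake_run:
        fake_run.return_value = mock.Mock(stdout="ok", returncode=0)
        assert run_mutmut(venv_path="/tmp/venv") == "ok"
        cmd = fake_run.call_args.args[0]
        assert cmd == [str(Path("/tmp/venv") / "bin" / "mutmut"), "run"]


def test_missing_binary_returns_error():
    # Simulate mutmut not being installed.
    with mock.patch("subprocess.run", side_effect=FileNotFoundError):
        assert run_mutmut().startswith("Error:")
```

The tests only check the command that was built and the message that came back, which is the whole point: nothing slow runs, but a regression in command construction still gets caught.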

The project also got GitHub Actions CI, pre-commit hooks, ruff for linting and formatting, and uvx support:

uvx --from git+https://github.com/wdm0006/mutmut-mcp mutmut-mcp

Survivor Prioritization

The feature I still like most about this server is prioritize_survivors. When mutmut finds surviving mutants, you can end up staring at a long list wondering which ones actually matter. This tool scores each survivor by likely materiality using a simple heuristic: mutations in core logic get flagged as high priority, while changes to logging, debug statements, or print calls get deprioritized.

It’s not a sophisticated analysis, but it’s enough to focus your attention. When you’ve got 40 survivors and half of them are in logging setup, knowing you can skip those and focus on the actual business logic saves real time.
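
A heuristic in this spirit could be sketched roughly as follows. The patterns, the function name, and the (mutant_id, mutated_line) input shape are assumptions for illustration. The real prioritize_survivors tool may score things differently.

```python
import re

# Assumed markers of "low materiality" lines: print calls, logging, debug.
LOW_PRIORITY = re.compile(r"\bprint\(|\blogg(?:ing|er)\.|\bdebug\b")


def prioritize(survivors):
    """Order (mutant_id, mutated_line) pairs so likely core logic comes first.

    sorted() is stable, so survivors keep their original order within
    each priority group.
    """
    def is_low_priority(item):
        _, line = item
        return bool(LOW_PRIORITY.search(line))

    # False sorts before True, so high-priority survivors lead the list.
    return sorted(survivors, key=is_low_priority)


survivors = [
    ("1", "logger.debug('starting batch')"),
    ("2", "if count >= limit:"),
    ("3", "print(total)"),
    ("4", "return price - discount"),
]
ranked_ids = [mutant_id for mutant_id, _ in prioritize(survivors)]
```

With the sample input above, the comparison and the arithmetic line move to the front and the logging and print lines drop to the back.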

The Agentic Loop

The real power here is letting an agent drive the whole thing. The workflow looks like:

  1. Agent calls run_mutmut on your module
  2. Agent calls show_survivors to see what lived
  3. Agent calls prioritize_survivors to rank them
  4. Agent writes new tests targeting the high-priority survivors
  5. Agent runs the test suite to make sure the new tests pass
  6. Agent calls rerun_mutmut_on_survivor to verify the mutants are now killed
  7. Repeat until the high-priority survivors are gone
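
Written as code, the steps above might look something like this sketch. The four tool names come from the list. Everything else is an assumption: the tools object, the argument and return shapes, the write_tests and run_test_suite callables, and the max_rounds cap. In practice the agent makes these calls over MCP rather than through a Python object.

```python
def kill_survivors(tools, write_tests, run_test_suite, max_rounds=5):
    """Drive the loop until no survivors remain or max_rounds is reached.

    `tools` is any object exposing the four tool names as methods.
    Returns the number of rounds that were needed.
    """
    tools.run_mutmut()
    for round_no in range(1, max_rounds + 1):
        survivors = tools.show_survivors()  # assumed: a list of mutant ids
        if not survivors:
            return round_no - 1
        ranked = tools.prioritize_survivors(survivors)
        write_tests(ranked)  # the agent writes tests for the ranked mutants
        if not run_test_suite():
            # New tests must pass on unmutated code before they count.
            raise RuntimeError("new tests fail on the unmutated code")
        for mutant_id in ranked:
            tools.rerun_mutmut_on_survivor(mutant_id)
    return max_rounds
```

The cap on rounds is there because some mutants are equivalent to the original code and can never be killed, so an unbounded loop could spin forever.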

I’ve had this loop running hands-off on real codebases and it genuinely improves test quality. The surviving mutants often point at things you wouldn’t think to test: off-by-one errors, wrong comparison operators, negated conditions.
