By Kirill Safonov, Andrey Breslav

CodeSpeak can improve test coverage in your project

⚠️ CodeSpeak is in Alpha Preview: many things are rough around the edges. Please use at your own risk and report any issues to our Discord. Thank you!

Today we release CodeSpeak 0.3.x. Please find the full release notes at the end of this post.

The key feature in this release is automated test coverage improvement. TL;DR: you can run codespeak coverage and CodeSpeak will run your tests, measure coverage, and add tests to bring it as high as possible.

Why test coverage matters

It wouldn't be much of an exaggeration to say that AI code generation is only as good as the test suite that verifies the changes. The power of coding agents lies in more than just generating correct code from scratch (sometimes): the much more impressive thing they do is finding and correcting their own mistakes. And the better the test suite, the more bugs it can catch, and therefore the better the results that AI code generators can deliver.

While CodeSpeak is not a chat-based tool, it of course uses the best agentic code-generation technology under the hood, and therefore benefits from good tests as much as any other agentic coding tool.

What is code coverage

What can we measure about the quality of a test suite? Structural metrics like the number of tests aren't very informative in most cases. One important aspect that can be captured by a metric is code coverage, i.e. what percentage of the code the suite actually tests. Most of the time this is measured as the percentage of all lines of code that were run by the test suite. Granted, some lines are not executable: comments, some declarations, etc.; these are usually excluded from the calculation.

For example:

if temperature > ALARM_THRESHOLD:
    indicator_color = RED
else:
    indicator_color = GREEN

A good test suite will run both code paths, with temperature > ALARM_THRESHOLD and with temperature <= ALARM_THRESHOLD, and get 100% line coverage on this code. If we forget to test, say, the case of temperature <= ALARM_THRESHOLD, the coverage will be 66.7% (2 out of 3 executable lines; else itself is not executable), and that is how we know the suite can be improved.
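To make this concrete, here is a minimal self-contained sketch: the snippet above wrapped in a function, plus two pytest-style tests that together execute both branches. The function and constant values are illustrative, not taken from any real project.

```python
# Illustrative module: the snippet above, wrapped in a function.
ALARM_THRESHOLD = 100
RED, GREEN = "red", "green"

def indicator_color(temperature):
    if temperature > ALARM_THRESHOLD:
        return RED
    else:
        return GREEN

# Together these two tests run both branches: 100% line coverage.
# With only test_alarm, the GREEN line never executes, so only
# 2 of 3 executable lines are hit: roughly 66.7% coverage.
def test_alarm():
    assert indicator_color(150) == RED

def test_normal():
    assert indicator_color(50) == GREEN
```

Running pytest --cov against such a module is exactly the kind of measurement that codespeak coverage automates.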

The codespeak coverage command described below finds such gaps in test suites and adds missing tests until it reaches the desired coverage level (which may actually be lower than 100%, depending on the nature of your codebase).

How to use codespeak coverage

To illustrate the usage of codespeak coverage, we'll use our clone of microsoft/MarkItDown, the anything-to-markdown converter (⭐ 84.9K on GitHub); see our previous blog post on using Mixed mode.

Project setup

Install CodeSpeak first (if you already have it, just run uv tool upgrade codespeak-cli).

Prerequisites

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Now, restart your terminal or run source ~/.bashrc (or source ~/.zshrc, depending on which shell you use).

Make sure uv is available:

uv --version

Get an Anthropic API key

CodeSpeak uses BYOK (Bring Your Own Key). Please get an API key at:

Configure the ANTHROPIC_API_KEY environment variable:

  • either just 📋 paste your key when CodeSpeak asks you to (this will create a .env.local file in your project dir),
  • or export ANTHROPIC_API_KEY=...

Install CodeSpeak

To install CodeSpeak with uv:

uv tool install codespeak-cli

Log in with Google or email/password:

codespeak login

Now, let's clone CodeSpeak's fork of the MarkItDown repository. This repository already has a CodeSpeak project initialized.

git clone https://github.com/codespeak-dev/markitdown markitdown-codespeak && cd markitdown-codespeak

Initialise the environment and install dependencies:

uv venv --python=3.12 .venv
source .venv/bin/activate
uv pip install hatch

Improve coverage

Now, let's make CodeSpeak bring Python test coverage to 100%:

codespeak coverage --target 100 --max-iterations 5

The build will fail with the following message:

A placeholder for the test runner command was added to codespeak.json for spec
'packages/markitdown/src/markitdown/converters/eml_converter.cs.md'.
Please fill it in with the actual command, or run 'codespeak coverage --auto-configure --spec <spec>'
to auto-detect it. Use {tests_report_file} placeholder for the test results output in pytest-json-report format
and {tests_coverage_report_file} placeholder for the test coverage results in pytest-cov JSON format.

Wait, what has just happened? In order to run tests with coverage, CodeSpeak needs to know how to do it. CodeSpeak found no registered command that runs relevant tests for the managed Python code. So, it added a placeholder in codespeak.json which now needs to be filled. To help you, CodeSpeak can automatically detect this command and put it in the file for you. Let's try that!
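For context, the {tests_coverage_report_file} placeholder receives a report in coverage.py's JSON format, which is what pytest-cov produces with --cov-report=json:... If you ever want to inspect such a report yourself, here is a minimal sketch (the function name is ours; the report path is whatever your runner wrote):

```python
import json

def read_coverage_percent(report_path):
    """Return the overall covered percentage from a coverage.py /
    pytest-cov JSON report; the suite-wide summary lives under "totals"."""
    with open(report_path) as f:
        report = json.load(f)
    return report["totals"]["percent_covered"]
```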

codespeak coverage --auto-configure

CodeSpeak analyzed the project and came up with a meaningful command, which it put in codespeak.json:

Auto-configured test runner for spec 'packages/markitdown/src/markitdown/converters/eml_converter.cs.md': cd packages/markitdown && hatch run pytest tests/ --json-report
--json-report-file={tests_report_file} --cov=src/markitdown --cov-branch --cov-report=json:{tests_coverage_report_file} --tb=short

So now, we're good to go and run our original command!

codespeak coverage --target 100 --max-iterations 5

CodeSpeak will now execute up to 5 iterations, trying to bring test coverage for the managed Python code to 100%. During the initial test run, it shows the command it uses. It also reports the test coverage the project had initially and after each iteration:

╭─────────────────────────────────────────────────────────────────────── CodeSpeak Progress ───────────────────────────────────────────────────────────────────────╮
│ ● Improving coverage (2m 24s)                                                                                                                                    │
│ ╰─ ✓ Analyzing testable Python files (6.6s)                                                                                                                      │
│    ╰─ ✓ _eml_converter.py (6.6s)                                                                                                                                 │
│ ╰─ ● Running and validating tests (initial run) (1m 44s)                                                                                                         │
│    Using command: cd packages/markitdown && hatch run pytest tests/ --json-report                                                                                │
│ --json-report-file=/Users/ks/projects/codespeak-blog-post-2026-03-01/markitdown-codespeak/.codespeak/ignored/tests_report.json --cov=src/markitdown --cov-branch │
│ --cov-report=json:/Users/ks/projects/codespeak-blog-post-2026-03-01/markitdown-codespeak/.codespeak/ignored/coverage_report.json --tb=short                      │
╰────────────────────────────────────────────────────────────────────── 🚧 Alpha Preview 🚧 ───────────────────────────────────────────────────────────────────────╯
Initial state: ran 227 tests, observed 6 test failures. Coverage: 84%

Note that there are 6 pre-existing test failures: some tests try to download test data from the original GitHub repo, which the fork does not include. During test coverage improvement, all pre-existing test failures are ignored. For this project, you can disable the failing tests by adding GITHUB_ACTIONS=1 to the test runner command in codespeak.json.

Verify the results

After the build completes, let's look at the full output:

╭─────────────────────────────────────────────────────────────────────── CodeSpeak Progress ───────────────────────────────────────────────────────────────────────╮
│ ✓ Improving coverage (8m 44s)                                                                                                                                    │
│ ╰─ ✓ Analyzing testable Python files (6.6s)                                                                                                                      │
│    ╰─ ✓ _eml_converter.py (6.6s)                                                                                                                                 │
│ ╰─ ✓ Running and validating tests (initial run) (1m 57s)                                                                                                         │
│ ╰─ ✓ Improving coverage (6m 7s)                                                                                                                                  │
│    ╰─ ✓ Collect context & plan work (43.9s)                                                                                                                      │
│    ╰─ ✓ Create test file for EML converter missing coverage (26.5s)                                                                                              │
│    ╰─ ✓ Write test for _get_body_and_attachments covering nested message with payload as Message (0.0s)                                                          │
│    ╰─ ✓ Write test for _human_readable_size covering KB, MB, and GB ranges (0.0s)                                                                                │
│    ╰─ ✓ Write test for EmlConverter.accepts covering rejection case (0.0s)                                                                                       │
│    ╰─ ✓ Run validate_tests to check coverage (4m 45s)                                                                                                            │
│       ╰─ ✓ Running and validating tests (after iteration 1) (2m 5s)                                                                                              │
│       ╰─ ✓ Running and validating tests (after iteration 2) (1m 55s)                                                                                             │
╰────────────────────────────────────────────────────────────────────── 🚧 Alpha Preview 🚧 ───────────────────────────────────────────────────────────────────────╯
Initial state: ran 227 tests, observed 6 test failures. Coverage: 84%
Iteration 1: ran 231 tests, observed 8 test failures. Coverage: 88%
Iteration 2: ran 231 tests, observed 6 test failures. Coverage: 100%
Reached target coverage.

Done!

CodeSpeak added 4 tests and achieved 100% coverage 🎉 Now this test suite can catch more bugs and support better agentic code generation.

The Road Ahead

This early version of codespeak coverage is the first step on our journey of perfecting test suites with CodeSpeak. Generating reliable code is crucial for our mission, and we'll keep improving the toolchain to add more capabilities in this area.

A few things we are planning to do in the future:

  • support more languages (the current version only supports Python),
  • branch coverage and other more sophisticated metrics,
  • mutation testing,
  • better CI/in-cloud support for test improvements.
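On the second bullet: branch coverage is stricter than line coverage, and the difference shows up whenever a condition and its body share a line. A hypothetical sketch (ours, not CodeSpeak output):

```python
def clamp(x, limit):
    # A single test such as clamp(15, 10) executes every line here,
    # giving 100% line coverage. But the implicit "condition is False"
    # branch is never taken, so branch coverage is only 50%.
    if x > limit: x = limit
    return x
```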

Full Changelog since 0.3.1

New

  • Added codespeak coverage command to automatically improve test coverage for Python code, including auto-detection of your project's test runner configuration.
  • codespeak takeover no longer requires specs to be pre-configured.
  • Further improved build cancellation speed when using the MCP server integration.

Bug fixes

  • Fixed "prompt is too long" errors that could occur in large mixed mode projects.
  • Fixed the current Python environment leaking into child processes, which could cause dependency conflicts during builds.
  • Improved error reporting when external API calls fail during a build.
  • Cleaned up build progress output to reduce visual clutter.
