CodeSpeak can improve test coverage in your project
Today we release CodeSpeak 0.3.x. Please find the full release notes at the end of this post.
The key feature in this release is automated test coverage improvement. TL;DR: you can run codespeak coverage and CodeSpeak will run your tests, measure coverage, and add tests to bring it as high as possible.
Why test coverage matters
It wouldn't be much of an exaggeration to say that AI code generation is only as good as the test suite that verifies its changes. The power of coding agents lies in more than just generating correct code from scratch (sometimes); the much more impressive thing they do is find and correct their own mistakes. The better the test suite, the more bugs it can catch, and therefore the better the results AI code generators can deliver.
While CodeSpeak is not a chat-based tool, it of course uses the best agentic code generating technology under the hood, and therefore benefits from good tests as much as any other agentic coding tool.
What is code coverage
What can we measure about the quality of a test suite? Structural metrics like the number of tests aren't very informative in most cases. One important aspect that can be captured by a metric is code coverage, i.e. what percentage of the code the suite is actually testing. Most of the time this is measured as the percentage of all lines of code that have been run by the test suite. Granted, some lines are not executable: comments, some declarations, etc. These are usually excluded from the calculation.
For example:
if temperature > ALARM_THRESHOLD:
    indicator_color = RED
else:
    indicator_color = GREEN

A good test suite will run both code paths: with temperature > ALARM_THRESHOLD and with temperature <= ALARM_THRESHOLD, and get 100% line coverage on this code. If we forget to test, for example, the case of temperature <= ALARM_THRESHOLD, the coverage will be 66.7% (2 out of 3 executable lines; else itself is not executable), and this is how we know that the suite can be improved.
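To make the gap concrete, here is a minimal pytest-style sketch: the snippet above is wrapped in a function (the function name and constant values are invented for illustration), and one test exercises each branch, which is exactly what full line coverage requires.

```python
ALARM_THRESHOLD = 100
RED, GREEN = "red", "green"

def indicator_color(temperature: int) -> str:
    # Same branching logic as the snippet above, wrapped in a
    # function so a test suite can drive both paths.
    if temperature > ALARM_THRESHOLD:
        return RED
    else:
        return GREEN

def test_above_threshold():
    assert indicator_color(150) == RED

def test_at_or_below_threshold():
    # Without this test, the `else` branch is never executed
    # and line coverage stays below 100%.
    assert indicator_color(100) == GREEN
```

With only the first test, a coverage tool reports the GREEN line as missed; adding the second test brings this code to 100% line coverage.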
The codespeak coverage command described below finds such gaps in test suites and adds missing tests until it reaches the desired coverage level (which may actually be lower than 100%, depending on the nature of your codebase).
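Conceptually, this is a measure-and-improve loop. A rough sketch of the idea, with invented stand-ins for CodeSpeak's internals (this is not the real implementation):

```python
def improve_coverage(measure, add_tests, target: float = 100.0,
                     max_iterations: int = 5) -> float:
    """Sketch of a coverage-improvement loop.

    `measure` runs the test suite and returns the current coverage
    percentage; `add_tests` asks the code generator to write tests
    for uncovered lines. Both are hypothetical placeholders.
    """
    coverage = measure()  # initial run establishes the baseline
    for _ in range(max_iterations):
        if coverage >= target:
            break  # reached target coverage
        add_tests()
        coverage = measure()  # re-run the suite after each iteration
    return coverage
```

The loop stops either when the target is reached or when the iteration budget runs out, which is why both --target and --max-iterations appear in the command below.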
How to use codespeak coverage
To illustrate the usage of codespeak coverage, we'll use our clone of microsoft/MarkItDown, the anything-to-markdown converter (⭐ 84.9K on GitHub); see a previous blog post on using Mixed mode.
Project setup
Install CodeSpeak first (if you already have it, run uv tool upgrade codespeak-cli instead).
Prerequisites
Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

Now, restart your terminal or run source ~/.bashrc (or source ~/.zshrc, depending on which shell you use).
Make sure uv is available:
uv --version

Get an Anthropic API key
CodeSpeak uses BYOK (Bring Your Own Key). Please get an API key at:
Configure the ANTHROPIC_API_KEY variable:
- either paste your key when CodeSpeak asks you to (this will create an .env.local file in your project dir),
- or export it in your shell:

export ANTHROPIC_API_KEY=...
Install CodeSpeak
To install CodeSpeak with uv:
uv tool install codespeak-cli

Log in with Google or email/password:
codespeak login

Now, let's clone CodeSpeak's fork of the MarkItDown repository. This repository already has a CodeSpeak project initialized.
git clone https://github.com/codespeak-dev/markitdown markitdown-codespeak && cd markitdown-codespeak

Initialise the environment and install dependencies:
uv venv --python=3.12 .venv
source .venv/bin/activate
uv pip install hatch

Improve coverage
Now, let's make CodeSpeak bring Python test coverage to 100%:
codespeak coverage --target 100 --max-iterations 5

The build will fail with the following message:
A placeholder for the test runner command was added to codespeak.json for spec
'packages/markitdown/src/markitdown/converters/eml_converter.cs.md'.
Please fill it in with the actual command, or run 'codespeak coverage --auto-configure --spec <spec>'
to auto-detect it. Use {tests_report_file} placeholder for the test results output in pytest-json-report format
and {tests_coverage_report_file} placeholder for the test coverage results in pytest-cov JSON format.
Wait, what just happened? In order to run tests with coverage, CodeSpeak needs to know how to do it. It found no registered command that runs the relevant tests for the managed Python code, so it added a placeholder to codespeak.json, which now needs to be filled in. To help you, CodeSpeak can detect this command automatically and put it in the file for you. Let's try that!
codespeak coverage --auto-configure

CodeSpeak analyzed the project and came up with a meaningful command, which it put in codespeak.json:
Auto-configured test runner for spec 'packages/markitdown/src/markitdown/converters/eml_converter.cs.md': cd packages/markitdown && hatch run pytest tests/ --json-report
--json-report-file={tests_report_file} --cov=src/markitdown --cov-branch --cov-report=json:{tests_coverage_report_file} --tb=short
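The {tests_coverage_report_file} placeholder receives the JSON report that --cov-report=json writes, i.e. coverage.py's JSON format. As a minimal sketch of what a tool can read out of it, here is how the overall percentage is extracted (the sample data in the usage note below is made up; the totals.percent_covered field is standard coverage.py JSON):

```python
import json

def overall_coverage(report_path: str) -> float:
    """Read the overall line-coverage percentage from a coverage.py
    JSON report, such as the file pytest-cov writes for --cov-report=json."""
    with open(report_path) as f:
        report = json.load(f)
    # "totals" summarizes all measured files; per-file details live
    # under the "files" key of the same report.
    return report["totals"]["percent_covered"]
```

For example, a report whose totals show 210 covered lines out of 250 statements carries "percent_covered": 84.0, matching the percentages printed in the progress output below.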
So now, we're good to go and run our original command!
codespeak coverage --target 100 --max-iterations 5

CodeSpeak will now execute up to 5 iterations trying to bring test coverage for the managed Python code to 100%. During the initial test run, it will show you the command it uses. It will also report the test coverage the project had initially and after each iteration:
CodeSpeak Progress
  ✓ Improving coverage (2m 24s)
    ✓ Analyzing testable Python files (6.6s)
      ✓ _eml_converter.py (6.6s)
    ✓ Running and validating tests (initial run) (1m 44s)
        Using command: cd packages/markitdown && hatch run pytest tests/ --json-report
        --json-report-file=/Users/ks/projects/codespeak-blog-post-2026-03-01/markitdown-codespeak/.codespeak/ignored/tests_report.json --cov=src/markitdown --cov-branch
        --cov-report=json:/Users/ks/projects/codespeak-blog-post-2026-03-01/markitdown-codespeak/.codespeak/ignored/coverage_report.json --tb=short
(Alpha Preview)
Initial state: ran 227 tests, observed 6 test failures. Coverage: 84%
Note there are 6 pre-existing test failures: some tests try to download test data from the original GitHub repo, which the fork does not include. During test coverage improvement, all pre-existing test failures are ignored. For this project, you can disable the failing tests by adding GITHUB_ACTIONS=1 to the test runner command in codespeak.json.
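The GITHUB_ACTIONS=1 trick works because tests that need the network are typically gated on an environment variable. A stdlib-only illustration of the pattern (the helper below is invented for this example, not MarkItDown's actual test code):

```python
import os

def network_tests_enabled(environ=os.environ) -> bool:
    # Tests that download fixtures from GitHub skip themselves when the
    # suite believes it runs in CI; GITHUB_ACTIONS=1 triggers that path.
    return environ.get("GITHUB_ACTIONS") != "1"

# In a pytest suite this usually backs a skip marker, e.g.:
#   pytestmark = pytest.mark.skipif(not network_tests_enabled(),
#                                   reason="network fixtures unavailable")
```

Prepending GITHUB_ACTIONS=1 to the test runner command sets the variable for the whole test process, so every network-dependent test is skipped instead of failing.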
Verify the results
After the build completes, let's look at the full output:
CodeSpeak Progress
  ✓ Improving coverage (8m 44s)
    ✓ Analyzing testable Python files (6.6s)
      ✓ _eml_converter.py (6.6s)
    ✓ Running and validating tests (initial run) (1m 57s)
    ✓ Improving coverage (6m 7s)
      ✓ Collect context & plan work (43.9s)
      ✓ Create test file for EML converter missing coverage (26.5s)
        ✓ Write test for _get_body_and_attachments covering nested message with payload as Message (0.0s)
        ✓ Write test for _human_readable_size covering KB, MB, and GB ranges (0.0s)
        ✓ Write test for EmlConverter.accepts covering rejection case (0.0s)
      ✓ Run validate_tests to check coverage (4m 45s)
    ✓ Running and validating tests (after iteration 1) (2m 5s)
    ✓ Running and validating tests (after iteration 2) (1m 55s)
(Alpha Preview)
Initial state: ran 227 tests, observed 6 test failures. Coverage: 84%
Iteration 1: ran 231 tests, observed 8 test failures. Coverage: 88%
Iteration 2: ran 231 tests, observed 6 test failures. Coverage: 100%
Reached target coverage.
Done!
CodeSpeak added 4 tests and achieved 100% coverage. Now this test suite can catch more bugs and support better agentic code generation.
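For a flavor of what these gap-filling tests look like, here is a hypothetical stand-in for a _human_readable_size-style helper together with a test that exercises every unit range. Neither the helper nor the test is MarkItDown's actual code; they only illustrate the kind of boundary-driven tests the progress log above describes.

```python
def human_readable_size(num_bytes: float) -> str:
    """Hypothetical stand-in for a _human_readable_size helper."""
    for unit in ("B", "KB", "MB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} GB"

def test_covers_all_unit_ranges():
    # One assertion per unit range executes every branch of the helper,
    # which is what bringing it to 100% line coverage requires.
    assert human_readable_size(512) == "512.0 B"
    assert human_readable_size(2048) == "2.0 KB"
    assert human_readable_size(5 * 1024**2) == "5.0 MB"
    assert human_readable_size(3 * 1024**3) == "3.0 GB"
```

A single "happy path" assertion would leave the larger unit branches unexecuted; the coverage report is what points a generator (or a human) at the missing ranges.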
The Road Ahead
This early version of codespeak coverage is the first step on our journey of perfecting test suites with CodeSpeak. Generating reliable code is crucial for our mission, and we'll keep improving the toolchain to add more capabilities in this area.
A few things we are planning to do in the future:
- support more languages (the current version only supports Python),
- branch coverage and other more sophisticated metrics,
- mutation testing,
- better CI/in-cloud support for test improvements.
Full Changelog since 0.3.1
New
- Added codespeak coverage command to automatically improve test coverage for Python code, including auto-detection of your project's test runner configuration.
- codespeak takeover no longer requires specs to be pre-configured.
- Further improved build cancellation speed when using the MCP server integration.
Bug fixes
- Fixed "prompt is too long" errors that could occur in large mixed mode projects.
- Fixed the current Python environment leaking into child processes, which could cause dependency conflicts during builds.
- Improved error reporting when external API calls fail during a build.
- Cleaned up build progress output to reduce visual clutter.
See Also
- First glimpse of codespeak takeover: Transition from Code to Specs in Real Projects
- New features: Extract a spec from existing code, improvements to Mixed Mode and error handling