How Fast Is It? Benchmarking Your Code with Pytest-Benchmark
We’ve covered testing functionality (pytest), ensuring test coverage (coverage.py), testing across environments (tox), and isolating code with mocking. Now, let’s talk about speed. While correctness is usually the primary goal, the performance of your library code can be critical, especially if it’s intended for use in loops, data processing pipelines, or other potentially hot code paths.
But how do you reliably measure how fast a piece of your code runs? Simply timing it once with time.time() isn’t very accurate due to variations in system load, caching, and other factors. You need a more systematic approach – you need benchmarking.
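For contrast, this is roughly what the naive one-shot approach looks like, using only the standard library (a sketch for illustration; time.perf_counter is used here rather than time.time because it is the usual choice for measuring short intervals, but a single sample like this is still just one noisy data point):

# naive_timing.py -- a single, unrepeated measurement (illustrative only)
import time

from my_library.utils import add_numbers

start = time.perf_counter()
add_numbers(2, 3)
elapsed = time.perf_counter() - start  # one noisy sample, skewed by system load, caching, etc.
print(f"took {elapsed:.9f} seconds")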
Why Benchmark Your Library Code?
- Identify Performance Regressions: Just like functional tests catch bugs, benchmarks catch unexpected slowdowns introduced during development or refactoring.
- Compare Implementations: Objectively measure whether a new algorithm or approach is actually faster than the old one.
- Optimize Key Code Paths: Focus optimization efforts on parts of your library where performance truly matters, backed by data.
- Provide Performance Guarantees (with caution): While absolute times vary wildly between machines, benchmarks can give a relative sense of performance characteristics.
Introducing pytest-benchmark
pytest-benchmark is a pytest plugin designed to make benchmarking code snippets straightforward. It integrates directly into your test suite and provides a fixture that handles the complexities of running your code multiple times, collecting timing data, and performing statistical analysis to give you reliable results.
How it works:
- Fixture: You request the benchmark fixture in your test function.
- Execution: You call the benchmark fixture, passing the function (or callable) you want to benchmark, along with any arguments.
- Repetition & Timing: The plugin runs your callable many times in a loop, measuring the execution time accurately.
- Statistical Analysis: It calculates statistics like minimum, maximum, mean, median, and standard deviation of the execution times.
- Reporting: Results are displayed in the pytest output and can be saved and compared across runs.
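If you want explicit control over the repetition step, the same fixture exposes a pedantic mode. Here is a minimal sketch; the rounds, iterations, and warmup_rounds parameters follow the names in the pytest-benchmark documentation, and the specific counts are illustrative, so check them against your installed version:

# tests/test_pedantic_benchmark.py
from my_library.utils import add_numbers

def test_add_numbers_pedantic(benchmark):
    """Benchmark add_numbers with explicitly chosen rounds and iterations."""
    result = benchmark.pedantic(
        add_numbers,
        args=(2, 3),
        rounds=100,        # number of measured rounds
        iterations=1000,   # calls per round
        warmup_rounds=10,  # unmeasured warmup rounds before timing starts
    )
    assert result == 5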
Getting Started with pytest-benchmark
1. Installation:
pip install pytest-benchmark
2. Writing a Benchmark Test:
Let’s revisit our simple add_numbers function from the first post and benchmark it.
# tests/test_utils_benchmark.py
import pytest

from my_library.utils import add_numbers

# Note: no need to import pytest_benchmark itself -- just request the fixture.

def test_add_numbers_benchmark(benchmark):
    """Benchmark the add_numbers function."""
    # The benchmark fixture takes the function to call
    # and any positional or keyword arguments.
    result = benchmark(add_numbers, 2, 3)

    # You can still assert the correctness of the result.
    assert result == 5


# Example with a lambda for slightly more complex setup.
def test_add_numbers_floats_benchmark(benchmark):
    a, b = 1.5, 2.5
    result = benchmark(lambda: add_numbers(a, b))
    assert result == 4.0
3. Running and Interpreting the Output:
Simply run pytest as usual:
pytest tests/test_utils_benchmark.py
The output will include a table summarizing the benchmark results:
------------------------------------------------- benchmark: 1 tests -------------------------------------------------
Name (time in us)                Min       Max      Mean    StdDev    Median       IQR  Outliers    Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------
test_add_numbers_benchmark    0.1234    0.5678    0.1500    0.0123    0.1450    0.0200       1;1      1000        1000
-----------------------------------------------------------------------------------------------------------------------
- Min, Max, Mean, Median: Statistical measures of the time taken per call (microseconds, us, in this example).
- StdDev, IQR: Measures of the variability or spread of the timings.
- Rounds, Iterations: How many measurement rounds were run, and how many calls were made within each round.
The key takeaway is usually the Mean or Median time, which gives you a central estimate of how long the operation takes on your machine.
Comparing Benchmarks
A major strength of pytest-benchmark is its ability to save results and compare them between runs. This is invaluable for detecting performance regressions.
- Saving: pytest --benchmark-save=mybaseline
- Comparing: pytest --benchmark-compare=mybaseline
If subsequent runs are significantly slower (or faster) than the saved baseline, pytest-benchmark will flag them in the output.
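In a CI pipeline you can go a step further and make regressions fail the build. A rough sketch, assuming your installed version supports the plugin’s --benchmark-compare-fail option (the 0001 run identifier and the 10% mean threshold below are purely illustrative):

# Save a baseline once (runs are stored under .benchmarks/ with numeric prefixes)
pytest --benchmark-save=mybaseline

# Later: compare against saved run 0001 and fail if the mean is more than 10% slower
pytest --benchmark-compare=0001 --benchmark-compare-fail=mean:10%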
When to Use Benchmarking vs. Profiling
pytest-benchmark is excellent for measuring the overall execution time of relatively small, self-contained functions or code snippets. It tells you how fast something is.
However, if a benchmark shows something is slow, pytest-benchmark doesn’t tell you why it’s slow. For that, you need profiling tools, which analyze where time is being spent within your code (e.g., which lines or function calls are taking the most time). We’ll cover profiling tools like pyinstrument and memory profiling with memray next in the series.
Integrating benchmarks into your test suite with pytest-benchmark provides an easy way to monitor the performance of critical parts of your Python library. It helps prevent accidental slowdowns and provides objective data when comparing different implementations.
Next, we’ll dive deeper into performance analysis with profiling, starting with pyinstrument to understand where the time is going.