How Fast Is It? Benchmarking Your Code with Pytest-Benchmark

We’ve covered testing functionality (pytest), ensuring test coverage (coverage.py), testing across environments (tox), and isolating code with mocking. Now, let’s talk about speed. While correctness is usually the primary goal, the performance of your library code can be critical, especially if it’s intended for use in loops, data processing pipelines, or other potentially hot code paths.

But how do you reliably measure how fast a piece of your code runs? Timing it once with time.time() isn't very accurate: system load, CPU frequency scaling, caching, and timer resolution all add noise to a single measurement. You need a more systematic approach: benchmarking.
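
To see why a single measurement is misleading, here is a quick standard-library-only sketch (do_work is just a placeholder for your own code) that times the same call a handful of times:

# naive_timing.py -- illustrative only; do_work stands in for your own code
import time

def do_work():
    return sum(i * i for i in range(10_000))

# Time the same call several times with a simple wall-clock approach.
for _ in range(5):
    start = time.perf_counter()
    do_work()
    elapsed = time.perf_counter() - start
    print(f"{elapsed * 1e6:.1f} us")

The printed times typically vary noticeably from one run to the next; pytest-benchmark smooths this out by running many iterations and reporting statistics rather than a single number.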

Why Benchmark Your Library Code?

  • Identify Performance Regressions: Just like functional tests catch bugs, benchmarks catch unexpected slowdowns introduced during development or refactoring.
  • Compare Implementations: Objectively measure whether a new algorithm or approach is actually faster than the old one.
  • Optimize Key Code Paths: Focus optimization efforts on parts of your library where performance truly matters, backed by data.
  • Provide Performance Guarantees (with caution): While absolute times vary wildly between machines, benchmarks can give a relative sense of performance characteristics.

Introducing pytest-benchmark

pytest-benchmark is a pytest plugin designed to make benchmarking code snippets straightforward. It integrates directly into your test suite and provides a fixture that handles the complexities of running your code multiple times, collecting timing data, and performing statistical analysis to give you reliable results.

How it works:

  1. Fixture: You request the benchmark fixture in your test function.
  2. Execution: You call the benchmark fixture, passing the function (or callable) you want to benchmark, along with any arguments.
  3. Repetition & Timing: The plugin calibrates how many iterations are needed for a reliable measurement, then runs your callable many times in timed rounds using a high-resolution clock.
  4. Statistical Analysis: It calculates statistics like minimum, maximum, mean, median, and standard deviation of the execution times.
  5. Reporting: Results are displayed in the pytest output, and can be saved and compared across runs.

Getting Started with pytest-benchmark

1. Installation:

pip install pytest-benchmark

2. Writing a Benchmark Test:

Let’s revisit our simple add_numbers function from the first post and benchmark it.

# tests/test_utils_benchmark.py
import pytest
from my_library.utils import add_numbers

# Note: No need to import pytest_benchmark, just request the fixture

def test_add_numbers_benchmark(benchmark):
    """Benchmark the add_numbers function."""
    # The benchmark fixture takes the function to call
    # and any positional or keyword arguments.
    result = benchmark(add_numbers, 2, 3)

    # You can still assert the correctness of the result
    assert result == 5

# Example with a lambda for slightly more complex setup
# (note: the lambda's own call overhead is included in the measurement)
def test_add_numbers_floats_benchmark(benchmark):
    a, b = 1.5, 2.5
    result = benchmark(lambda: add_numbers(a, b))
    assert result == 4.0
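
If you want finer control than the default calibration, pytest-benchmark also provides benchmark.pedantic, which lets you set the rounds, iterations, and warmup explicitly. A minimal sketch, reusing the same add_numbers function:

def test_add_numbers_pedantic(benchmark):
    """Benchmark with explicit rounds/iterations instead of auto-calibration."""
    result = benchmark.pedantic(
        add_numbers,
        args=(2, 3),
        rounds=200,        # number of timed measurement rounds
        iterations=100,    # calls per round
        warmup_rounds=10,  # untimed warmup rounds before measuring
    )
    assert result == 5

For most tests the plain benchmark(...) call is enough; pedantic mode is mainly useful when you need reproducible round/iteration counts across runs.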

3. Running and Interpreting the Output:

Simply run pytest as usual:

pytest tests/test_utils_benchmark.py

The output will include a table summarizing the benchmark results (the numbers below are illustrative):

-------------------------------- benchmark: 1 tests --------------------------------
Name (time in us)                   Min      Max     Mean   StdDev  Median     IQR  Outliers Rounds Iterations
-------------------------------------------------------------------------------------------------------------------
test_add_numbers_benchmark       0.1234   0.5678   0.1500   0.0123  0.1450  0.0200       1;1   1000       1000
-------------------------------------------------------------------------------------------------------------------
  • Min, Max, Mean, Median: Statistical measures of the time taken per call (shown here in microseconds, us; the plugin picks a suitable unit automatically).
  • StdDev, IQR: Measures of the variability or spread of the timings.
  • Rounds, Iterations: Rounds is the number of timed measurement rounds; Iterations is the number of calls made per round.

The key takeaway is usually the Mean or Median time (the Median is more robust to outliers), which gives you a central estimate of how long one call takes on your machine.
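
You can also tailor the report to what you care about. For example, the plugin accepts sorting and column-selection options on the command line (shown here as I'd typically use them; run pytest --help and check the benchmark section for the options available in your version):

pytest tests/test_utils_benchmark.py --benchmark-sort=mean
pytest tests/test_utils_benchmark.py --benchmark-columns=min,mean,stddev,rounds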

Comparing Benchmarks

A major strength of pytest-benchmark is its ability to save results and compare them between runs. This is invaluable for detecting performance regressions.

  • Saving: pytest --benchmark-save=mybaseline
  • Comparing: pytest --benchmark-compare=mybaseline

If subsequent runs are significantly slower (or faster) than the saved baseline, pytest-benchmark will flag them in the output.
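
In CI you can go a step further and make the comparison fail the run when a benchmark regresses past a threshold you choose. A sketch of such an invocation (the 10% threshold is just an example):

pytest --benchmark-compare=mybaseline --benchmark-compare-fail=mean:10%

This turns your benchmarks into a lightweight performance regression gate that runs alongside your functional tests.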

When to Use Benchmarking vs. Profiling

pytest-benchmark is excellent for measuring the overall execution time of relatively small, self-contained functions or code snippets. It tells you how fast something is.

However, if a benchmark shows something is slow, pytest-benchmark doesn’t tell you why it’s slow. For that, you need profiling tools, which analyze where time is being spent within your code (e.g., which lines or function calls are taking the most time). We’ll cover profiling tools like pyinstrument and memory profiling with memray next in the series.

Integrating benchmarks into your test suite with pytest-benchmark provides an easy way to monitor the performance of critical parts of your Python library. It helps prevent accidental slowdowns and provides objective data when comparing different implementations.

Next, we’ll dive deeper into performance analysis with profiling, starting with pyinstrument to understand where the time is going.
