Finding the Slowdown: Profiling Python Code with Pyinstrument
In our last post, we used pytest-benchmark to measure the overall speed of our library’s functions. Benchmarking is great for spotting that something is slow, or that a change introduced a performance regression. But it doesn’t tell you why it’s slow.
Is one specific line of code the culprit? Is it spending too much time in a helper function? Is an underlying library call taking longer than expected? To answer these questions, we need to move beyond benchmarking and into the realm of profiling.
Benchmarking vs. Profiling: What’s the Difference?
- Benchmarking: Measures the total time taken to execute a specific piece of code (e.g., a function call). Answers: “How fast is this?”
- Profiling: Analyzes the execution time within a piece of code, breaking down time spent by function calls, lines, or other units. Answers: “Where is the time being spent?”
Think of it like diagnosing a slow car. Benchmarking tells you the 0-60mph time. Profiling pops the hood and tells you if the problem is the engine, the transmission, or maybe a dragging brake.
Why Profile Your Library?
- Pinpoint Bottlenecks: Identify the exact functions or lines of code consuming the most CPU time.
- Understand Execution Flow: See the call stack and how time is distributed across different parts of your code.
- Targeted Optimization: Focus your optimization efforts where they will have the most impact, rather than guessing.
- Verify Assumptions: Confirm whether the parts of the code you think are slow actually are the bottlenecks.
Introducing Pyinstrument: A Clearer View
Python has a built-in profiler, cProfile, but its output can sometimes be dense and hard to interpret quickly. Pyinstrument is a popular alternative statistical profiler that aims to provide clearer, more actionable insights with lower overhead.
How it works (Statistical Profiling):
Instead of meticulously tracking every single function call and exit (as cProfile, a tracing profiler, does), pyinstrument periodically interrupts the program (e.g., every millisecond) and records the entire call stack at that moment. By analyzing the frequency with which functions appear at different levels of the stack, it builds a statistical picture of where time is being spent. This approach typically has less performance overhead than tracing every call, making it suitable for profiling more complex applications without slowing them down drastically.
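To make the sampling idea concrete, here is a toy stack sampler built only from the standard library. This is a minimal sketch of the technique, not how pyinstrument is actually implemented: a background thread periodically grabs the main thread’s call stack via `sys._current_frames()` and tallies which functions appear. The function names (`busy_work`, etc.) are hypothetical.

```python
import sys
import threading
import time
from collections import Counter

def sample_stacks(target_tid, interval, samples, stop_event):
    # Periodically capture the target thread's current call stack.
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_tid)
        if frame is not None:
            stack = []
            while frame:
                stack.append(frame.f_code.co_name)
                frame = frame.f_back
            samples.append(tuple(reversed(stack)))
        time.sleep(interval)

def busy_work(n):
    # Hypothetical CPU-bound workload we want to "profile".
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_statistically():
    samples = []
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.main_thread().ident, 0.001, samples, stop),
    )
    sampler.start()
    busy_work(3_000_000)
    stop.set()
    sampler.join()
    # Functions that appear in many samples are where time is being spent.
    return Counter(name for stack in samples for name in stack)
```

Functions that dominate the sample counts are, statistically, where the program spends its time. Real profilers like pyinstrument are far more careful about timing, overhead, and frame bookkeeping, but the principle is the same.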
Using Pyinstrument
1. Installation:
```shell
pip install pyinstrument
```
2. Command-Line Usage:
The simplest way to profile a script is from the command line:
```shell
pyinstrument my_slow_script.py --arg1 --arg2
```
Pyinstrument will run your script and then print a hierarchical report to the console, showing the total time and the time spent within each function call.
3. Programmatic Usage (Context Manager):
For more fine-grained control, especially within a test or specific code block, you can use the Profiler context manager:
```python
from pyinstrument import Profiler

from my_library.complex_module import process_data


def run_complex_processing():
    # Setup data...
    data = list(range(10000))
    options = {'mode': 'detailed'}

    print("Starting profiling...")
    profiler = Profiler()
    with profiler:
        # Code to profile
        result = process_data(data, options)
    print("Profiling finished.")

    # Print the console report
    print(profiler.output_text(unicode=True, color=True))

    # Or open/save an HTML report
    # profiler.open_in_browser()
    # with open("profile_report.html", "w") as f:
    #     f.write(profiler.output_html())


if __name__ == "__main__":
    run_complex_processing()
```
4. Interpreting the Output:
The output (console or HTML) shows a tree structure representing the call stack. Each line typically includes:
- Time Spent: The total time spent in this function and any functions it called.
- Percentage: The percentage of the total profiling time spent here.
- Self Time: (Often in HTML/detailed views) The time spent purely within this function’s own code, excluding time spent in functions it called.
- Function Name & Location: The function name and the file/line number where it’s defined.
The tree is usually ordered with the biggest time consumers at the top, making it easy to spot the main bottlenecks.
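The distinction between total time and self time can be illustrated with a toy measurement (hypothetical function names, standard-library timing only, not pyinstrument’s actual accounting):

```python
import time

def helper():
    # Work done in a callee: counts toward the caller's *total*
    # time, but not the caller's *self* time.
    time.sleep(0.05)

def process():
    start = time.perf_counter()

    own_start = time.perf_counter()
    sum(i * i for i in range(200_000))  # process()'s own work (self time)
    own_time = time.perf_counter() - own_start

    helper()  # a profiler attributes this to helper()'s self time

    total_time = time.perf_counter() - start
    return total_time, own_time

total_time, own_time = process()
# total_time includes helper()'s sleep; own_time (self time) does not.
```

A function with a large total time but tiny self time is usually just a pass-through; the real bottleneck is further down the tree.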
Pyinstrument vs. Other Profilers
- vs. cProfile: Pyinstrument often has lower overhead and produces (arguably) more readable output out of the box, focusing on wall-clock time. cProfile is built-in, deterministic (tracing), and provides detailed call counts, but can have higher overhead and requires tools like snakeviz or pstats to visualize nicely.
- vs. line profilers (line_profiler): Line profilers tell you the time spent on each individual line within specific functions (which you must mark explicitly). Pyinstrument works at the function-call level. They are often used together: Pyinstrument finds the slow function, then line_profiler finds the slow lines within that function. As long as your McCabe complexity is sane, function-level profiling should provide plenty of detail to track down bottlenecks.
When benchmarks flag a performance issue, a profiler like pyinstrument is your next step. Its clear output helps you understand where your code is spending its time, allowing you to make informed decisions about optimization. Remember: profile first, then optimize!
While pyinstrument excels at CPU time profiling, what about memory usage? Next up, we’ll explore memray, a powerful memory profiler for Python.