Where Did All the RAM Go? Memory Profiling with Memray

We’ve benchmarked our code’s speed (pytest-benchmark) and profiled where it spends CPU time (pyinstrument). But there’s another critical performance dimension: memory usage. A library that consumes excessive amounts of RAM can slow down entire applications, lead to crashes in memory-constrained environments, or indicate subtle bugs like memory leaks.

Especially for libraries handling data or running for extended periods, understanding memory allocation patterns is crucial. Just like with CPU time, we need specialized tools to see what’s happening under the hood.

Why Memory Profile Your Library?

  • Detect Memory Leaks: Identify objects that are allocated but never released, causing memory usage to grow indefinitely over time.
  • Find Memory Bloat: Pinpoint functions or data structures responsible for allocating unexpectedly large amounts of memory.
  • Optimize Memory Footprint: Understand allocation patterns to potentially use more memory-efficient data structures or algorithms.
  • Ensure Stability: Prevent out-of-memory errors in user applications, particularly in resource-limited settings (like containers or embedded systems).

Introducing Memray: A Modern Memory Profiler

Memray is a powerful memory profiler for Python applications developed by Bloomberg. It goes beyond some older tools by tracking every memory allocation and deallocation within the Python interpreter and in C/C++/Rust extension modules that your library might use. This provides a comprehensive view of memory usage.

Key Features:

  • Tracks Native Code: Profiles memory used by C extensions (e.g., from NumPy, Pandas, or your own compiled code), which many other Python profilers miss.
  • Detailed Reporting: Generates various reports, including flame graphs, which are excellent for visualizing where allocations occur.
  • Relatively Low Overhead: Memory profiling inherently adds some overhead, but Memray is engineered to keep it low.
  • Leak Detection: Includes features specifically designed to help track down memory leaks.
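To make the commands below concrete, here is a minimal toy script (call it your_script.py; the name and workload are invented for illustration) with a deliberate allocation hotspot of the kind a profiler like Memray would highlight:

```python
# your_script.py - a toy workload with an obvious allocation hotspot.

def build_rows(n):
    # Allocates n list objects, each holding 100 integers - the bulk of
    # this script's memory usage happens here.
    return [[i] * 100 for i in range(n)]

def summarize(rows):
    # Cheap on both CPU and memory; the input it receives dominates.
    return sum(row[0] for row in rows)

def main():
    rows = build_rows(50_000)
    return summarize(rows)

if __name__ == "__main__":
    print(main())
```

In a flame graph of this script, build_rows would show up as the wide bar, even though summarize does the arithmetic.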

Using Memray

1. Installation:

# Memray itself
pip install memray

# Optional: For rich terminal reports (recommended)
pip install rich 

2. Basic Usage (Command Line):

The most common way to use Memray is to run your Python script through it:

memray run your_script.py [script args...]

This runs your_script.py under Memray's tracker (Memray's own options, such as -o to pick the output file name, go before the script name). When the script finishes, Memray leaves a capture file (e.g., memray-your_script.py.<PID>.bin) in the current directory containing all the allocation data.
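Besides the CLI, Memray can also be driven from Python through its Tracker context manager, which records every allocation made inside the with block into a capture file. A minimal sketch (the workload and file name are invented, and the ImportError fallback is only there so the snippet still runs where Memray isn't installed):

```python
import contextlib
import os
import tempfile

def allocate_blocks(n):
    # The workload whose allocations we want to capture.
    return [bytearray(1024) for _ in range(n)]

# Write the capture file into a fresh temp directory; Memray refuses to
# overwrite an existing capture file.
capture_path = os.path.join(tempfile.mkdtemp(), "blocks.bin")

try:
    from memray import Tracker  # requires `pip install memray`
    tracker = Tracker(capture_path)
except ImportError:
    # Fallback so the sketch runs even without Memray installed.
    tracker = contextlib.nullcontext()

with tracker:
    blocks = allocate_blocks(1_000)

print(len(blocks), "blocks allocated")
```

The resulting capture file is analyzed with the same memray flamegraph / memray table subcommands as one produced by memray run.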

3. Generating Reports:

You then use memray subcommands to generate reports from this capture file. Some common ones are:

  • Flame Graph (HTML): Excellent for visualizing allocation hotspots.

    memray flamegraph memray-your_script.py.*.bin -o report.html

    Open report.html in your browser. Wider bars in the flame graph indicate functions that allocated more memory (directly or indirectly).

  • Table Report (HTML): Generates a sortable table of allocations ranked by memory allocated.

    memray table memray-your_script.py.*.bin

  • Summary Report (Console): Provides high-level statistics.

    memray summary memray-your_script.py.*.bin

  • Stats Report (Console): Detailed statistics about allocations, such as total counts and a histogram of allocation sizes.

    memray stats memray-your_script.py.*.bin

4. Tracking Live Usage:

Memray can also watch allocations as they happen. For a script you launch yourself, live mode shows a terminal UI that updates in real time; Memray can also attach to an already-running process by PID:

# Watch a script's allocations live in the terminal
memray run --live your_script.py

# Or attach to an already-running Python process (use -o <file> to write
# a capture file you can turn into reports later)
memray attach <PID>

5. Finding Leaks:

Memray can help identify leaks by showing allocations that were still live when tracking ended. Passing --leaks to the flame graph reporter (memray flamegraph --leaks <capture>.bin) focuses the report on exactly that memory; look for functions that allocate memory which persists unexpectedly. For meaningful leak reports, the Memray documentation recommends running the profiled program with PYTHONMALLOC=malloc, because Python's small-object allocator (pymalloc) holds freed blocks in pools and can mask true deallocations.

Interpreting Memory Profiles

When looking at Memray reports:

  • Focus on High Allocations: Identify functions or code paths responsible for the largest amounts of memory allocation (often shown as total bytes or number of allocations).
  • Look for Unexpected Persistence: In leak analysis, pay attention to objects allocated within functions that you expected to be temporary.
  • Correlate with Code: Trace the high-memory functions back to your source code. Could a list be growing unbound? Are large data structures being held longer than necessary? Are files not being closed properly?
  • Consider Native Code: Remember that Memray tracks C extensions, so high memory usage might originate inside a dependency like NumPy or Pandas rather than in your own code; the fix is then often in how you call those libraries.
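As an example of "a list growing unbound" and its fix, compare an eager and a lazy version of the same computation (function names invented). Both return the same result, but the lazy one never materializes the intermediate list, so its peak memory stays flat:

```python
def totals_eager(n):
    # Materializes every square in a list before summing - transient bloat
    # that a flame graph would show as a wide allocation bar here.
    squares = [i * i for i in range(n)]
    return sum(squares)

def totals_lazy(n):
    # Same result, but the generator yields one value at a time, so peak
    # memory stays flat no matter how large n gets.
    return sum(i * i for i in range(n))
```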

Memory Profiling vs. CPU Profiling

CPU profiling (pyinstrument, cProfile) tells you where time is spent. Memory profiling (memray) tells you where memory is allocated. A function might be very fast (low CPU time) but allocate huge amounts of memory, or vice-versa. You often need both types of profiling to get a complete performance picture.
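A toy contrast (names and workloads invented): one function is cheap in CPU time but makes a single large allocation, while the other burns CPU while allocating almost nothing. A CPU profiler would flag the second; a memory profiler would flag the first.

```python
def fast_but_hungry(n):
    # Low CPU time, high peak memory: one big n-byte buffer.
    return len(bytes(n))

def slow_but_lean(n):
    # High CPU time, tiny memory: the same count reached one step at a time.
    total = 0
    for _ in range(n):
        total += 1
    return total
```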

Understanding and managing memory usage is a critical aspect of developing robust and performant Python libraries. Memray provides powerful tools to inspect allocations, identify hotspots, and hunt down leaks, helping you deliver libraries that are not only functional but also efficient in their resource consumption.

This concludes our initial series on testing Python libraries! We’ve covered functional testing, coverage, multi-environment checks, mocking, and performance/memory analysis. Implementing these practices will significantly boost the quality and reliability of your projects.
