Secure Coding Practices for Python Library Developers
“It’s just a simple utility function,” I thought to myself as I reviewed the code. “What could go wrong?” Then I saw it: the function was blindly executing shell commands based on user input. One carefully crafted argument later, and someone could have turned our helpful library into their personal backdoor.
When you’re building a Python library, you’re not just writing code—you’re creating something that other developers will trust and incorporate into their systems. That trust comes with responsibility. In this post, I’ll share the security practices I’ve learned that help me sleep better at night knowing I’m not accidentally creating vulnerabilities for my users.
Trust No One: The Art of Input Validation
Remember that scene in The Matrix where Neo learns not to trust what he sees? That’s how you should approach any input coming into your library. Whether it’s a function argument, a file’s contents, or a network response—treat it all as potentially malicious. Here’s how I approach this:
```python
import os
import re
from typing import Optional

def process_user_data(input_string: str) -> str:
    # First, validate the type
    if not isinstance(input_string, str):
        raise TypeError("Input must be a string")

    # Then, check length constraints
    if len(input_string) > 1000:  # Reasonable limit for our use case
        raise ValueError("Input string too long")

    # Finally, validate the format (e.g., only alphanumeric)
    if not input_string.isalnum():
        raise ValueError("Input must be alphanumeric")

    # Now we can safely process the input
    return input_string.lower()

def validate_email(email: str) -> bool:
    """Validate email using a simple regex pattern."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

def sanitize_filename(filename: str) -> Optional[str]:
    """Sanitize a filename to prevent path traversal."""
    # Remove any directory components
    clean_name = os.path.basename(filename)
    # Only allow certain characters
    if not re.match(r'^[a-zA-Z0-9._-]+$', clean_name):
        return None
    return clean_name
```
I’ve learned to prefer allowlisting (specifying what’s allowed) over denylisting (trying to block bad stuff). Why? Because I’m not clever enough to think of all the ways someone might try to break my code, but I can definitely specify what valid input looks like.
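To make that concrete, here's a minimal sketch (the function names are my own, not from any particular library) of the same check written both ways. Notice how the denylist quietly waves through a payload its author didn't think of:

```python
import re

# Denylist: tries to enumerate the bad stuff -- and inevitably misses some
def looks_safe_denylist(value: str) -> bool:
    blocked = [";", "&&", "|", "../"]
    return not any(token in value for token in blocked)

# Allowlist: states exactly what valid input looks like
def looks_safe_allowlist(value: str) -> bool:
    return bool(re.fullmatch(r"[a-z0-9_-]{1,64}", value))

# A shell command substitution: none of the denylisted tokens appear in it,
# so the denylist passes it -- but it fails the allowlist immediately
payload = "$(rm -rf /)"
```

The asymmetry is the whole point: the denylist has to be right about every attack ever invented, while the allowlist only has to be right about your own valid inputs.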
The Principle of Least Privilege: Less is More
Think of permissions like superpowers—cool to have, but with great power comes great responsibility. I try to follow this simple rule: if my code doesn’t absolutely need a permission to function, it shouldn’t have it.
Here’s an example:
```python
import glob
import os

# Before: Reading the entire config directory
def load_config():
    return glob.glob("/etc/myapp/**/*")

# After: Reading only what we need
def load_config():
    config_file = "/etc/myapp/config.yaml"
    if not os.path.exists(config_file):
        raise FileNotFoundError(f"Config file not found: {config_file}")
    return [config_file]
```
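The same thinking applies to what your library creates, not just what it reads. Here's a sketch of my own (assuming a POSIX system) for writing a file that only the owning user can touch:

```python
import os

def write_private_file(path: str, data: str) -> None:
    # O_EXCL makes creation fail if the file already exists (which defeats
    # symlink tricks), and mode 0o600 grants read/write to the owner only
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(data)
```

A plain `open(path, "w")` would have created the file with whatever the process umask allows, which often means group- or world-readable.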
Graceful Error Handling: The Art of Saying Nothing
One of my favorite security principles is that good error messages are like good waiters—helpful when needed but not revealing what’s happening in the kitchen. Here’s how I structure error handling:
```python
import logging

logger = logging.getLogger(__name__)

try:
    result = process_sensitive_data(user_input)
except ValueError as e:
    # Log the detailed error for debugging
    logger.debug(f"Validation error: {e}", exc_info=True)
    # But only show a generic message to the user
    raise ValueError("Invalid input provided") from None
except Exception:
    # Log any unexpected errors
    logger.error("Unexpected error", exc_info=True)
    # Generic message for users
    raise RuntimeError("An unexpected error occurred") from None
```
But take a close look at this: where do these logs go, and who can read them? A detailed error dumped into a world-readable log file is its own information leak.
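One answer is to make sure secrets never reach the logs in the first place. Here's a sketch of one way to do that with a `logging.Filter` (the regex and logger name are my own illustration, and a real pattern would need to cover your actual secret formats):

```python
import logging
import re

class RedactSecrets(logging.Filter):
    """Mask anything that looks like an API key before the record is emitted."""
    PATTERN = re.compile(r"(api[_-]?key\s*[=:]\s*)\S+", re.IGNORECASE)

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = self.PATTERN.sub(r"\1[REDACTED]", str(record.msg))
        return True  # keep the record, just scrubbed

logger = logging.getLogger("mylib")
logger.addFilter(RedactSecrets())
```

Because the filter sits on the logger, every handler downstream sees only the scrubbed message.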
Dependencies: Your Code’s Extended Family
Your library is only as secure as its weakest dependency. I learned this the hard way when a tiny utility package we used turned out to have a critical vulnerability. Now I follow these rules:
- Keep a lean dependency list—every package you add is code you’re trusting
- Run `pip-audit` regularly to check for known vulnerabilities
- Set up Dependabot to automatically update dependencies
Here’s what my typical GitHub Actions workflow looks like:
```yaml
name: Security Checks
on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - name: Install dependencies
        run: pip install pip-audit bandit
      - name: Check dependencies
        run: pip-audit
      - name: Run security linter
        run: bandit -r .
```
The Standard Library: Friend or Foe?
Even Python’s standard library can be a source of security issues if used carelessly. Here are some patterns I’ve learned to avoid:
```python
# DON'T: Unsafe deserialization
import pickle
data = pickle.loads(user_input)  # Could execute arbitrary code!

# DO: Use safer alternatives
import json
data = json.loads(user_input)  # Much safer!

# DON'T: Shell injection vulnerability
import subprocess
subprocess.run(f"git clone {user_input}", shell=True)  # Dangerous!

# DO: Use safer parameter passing
subprocess.run(["git", "clone", user_input])  # Much better!
```
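If you genuinely can't avoid `shell=True` (say, you need a pipeline), the standard library's `shlex.quote` turns an untrusted value into a single, inert shell word:

```python
import shlex

untrusted = "repo.git; rm -rf /"
# quote() wraps anything containing shell metacharacters in single quotes,
# so the shell sees one harmless argument instead of a second command
command = f"git clone {shlex.quote(untrusted)}"
# command == "git clone 'repo.git; rm -rf /'"
```

The argument-list form above is still the better default; quoting is the fallback for the rare case where a shell is truly required.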
Testing for Security: Beyond “Does It Work?”
Security testing is different from regular testing. Instead of asking “Does this work?”, we need to ask “How could this break?” Here are some toy tests that validate protection against specific attacks:

```python
import pytest

def test_path_traversal_attempt():
    with pytest.raises(ValueError):
        process_filename("../../../etc/passwd")  # Should reject path traversal

def test_command_injection_attempt():
    with pytest.raises(ValueError):
        process_command("echo 'hello'; rm -rf /")  # Should reject command injection
```
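Beyond single cases, I like throwing a small corpus of known attack payloads at validators. Here's a sketch against the `sanitize_filename` helper from earlier (repeated here so the example is self-contained):

```python
import os
import re

def sanitize_filename(filename):
    """The allowlist-based sanitizer from earlier in the post."""
    clean_name = os.path.basename(filename)
    if not re.match(r'^[a-zA-Z0-9._-]+$', clean_name):
        return None
    return clean_name

ATTACK_PAYLOADS = [
    "../../../etc/passwd",        # classic traversal
    "..\\..\\windows\\system32",  # Windows-style separators
    "file\x00.txt",               # embedded NUL byte
    "name; rm -rf /",             # shell metacharacters
]

def test_sanitizer_neutralizes_payloads():
    for payload in ATTACK_PAYLOADS:
        result = sanitize_filename(payload)
        # Either rejected outright, or reduced to a harmless basename
        assert result is None or re.fullmatch(r'[a-zA-Z0-9._-]+', result)
```

A corpus like this also doubles as regression protection: when a new bypass is reported, the payload goes into the list and stays there forever.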
Secure Configuration Management: The Safe Way to Handle Secrets
One area that bit me recently was configuration management. We had a library that needed API keys for optional features, and initially, we just had users pass them directly as function arguments. Bad idea! Here’s how we fixed it:
```python
import json
import os
from functools import wraps
from typing import Optional

class SecureConfig:
    def __init__(self, config_path: Optional[str] = None):
        self._config = {}
        if config_path:
            self._load_from_file(config_path)
        self._load_from_env()

    def _load_from_file(self, path: str) -> None:
        """Load config from file, with proper permissions checking."""
        if not os.path.exists(path):
            raise FileNotFoundError(f"Config file not found: {path}")

        # Check file permissions (Unix-like systems)
        if os.name == 'posix':
            stat = os.stat(path)
            if stat.st_mode & 0o077:  # Check if group/others have any access
                raise PermissionError("Config file has unsafe permissions")

        with open(path) as f:
            self._config.update(json.load(f))

    def _load_from_env(self) -> None:
        """Load config from environment variables."""
        for key in os.environ:
            if key.startswith('MYLIB_'):
                self._config[key[6:].lower()] = os.environ[key]

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Get a config value safely."""
        return self._config.get(key, default)

def requires_api_key(f):
    """Decorator to check for required API key."""
    @wraps(f)
    def wrapper(self, *args, **kwargs):
        if not self.config.get('api_key'):
            raise ValueError("API key required for this operation")
        return f(self, *args, **kwargs)
    return wrapper

class MyLibrary:
    def __init__(self, config: Optional[SecureConfig] = None):
        self.config = config or SecureConfig()

    @requires_api_key
    def do_api_operation(self, data: str) -> str:
        # Use self.config.get('api_key') safely here
        return "Result"
```
This approach gives us several security benefits:
- API keys can be loaded from environment variables (12-factor app style)
- Config files are checked for proper permissions
- Sensitive operations are clearly marked with decorators
- Users have flexibility in how they provide credentials
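The environment-variable piece of this is small enough to show in isolation. Here's a sketch of the same prefix pattern that `_load_from_env` uses above (the `MYLIB_` prefix is illustrative, not a real package's convention):

```python
import os

def load_prefixed_env(prefix: str = "MYLIB_") -> dict:
    """Collect settings from environment variables carrying a library prefix."""
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }
```

Because the keys live in the process environment rather than in source, they never end up in version control, and they can't leak through tracebacks that print function arguments.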
A Case Study: The Path Not Taken
Let me share an example incident that these practices helped prevent. We have a function that generated temporary files based on user input:
```python
# The original code (DON'T DO THIS!)
def create_temp_file(filename: str) -> str:
    path = f"/tmp/{filename}"
    with open(path, 'w') as f:
        f.write("some data")
    return path

# What could go wrong? Well...
create_temp_file("../../../etc/passwd")  # Oops! Path traversal
create_temp_file("file; rm -rf /")  # Double oops! A booby trap if this path ever reaches a shell
```
Thanks to our input validation practices, we caught this during code review. Here’s how we fixed it:
```python
import os
import tempfile
from typing import Tuple

def create_temp_file(filename: str) -> Tuple[str, str]:
    """Create a temporary file safely.

    Returns:
        Tuple[str, str]: (directory path, filename)
    """
    # Sanitize the filename
    clean_name = sanitize_filename(filename)
    if not clean_name:
        raise ValueError("Invalid filename")

    # Create a secure temporary directory
    temp_dir = tempfile.mkdtemp()

    # Create the file inside our controlled directory
    path = os.path.join(temp_dir, clean_name)
    with open(path, 'w') as f:
        f.write("some data")

    return temp_dir, clean_name
```
This version:
- Sanitizes the filename
- Uses `tempfile` for secure temporary directory creation
- Properly joins paths to prevent directory traversal
- Returns both the directory and filename for proper cleanup
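When the temporary file is only needed for the duration of one operation, a context manager removes even the cleanup burden. This is a variant I'd reach for (my own sketch, not the library's actual code):

```python
import os
import tempfile

def demo_scoped_temp_file() -> tuple:
    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, "note.txt")
        with open(path, "w") as f:
            f.write("some data")
        content = open(path).read()
    # Directory and file are gone the moment the block exits,
    # even if an exception had been raised inside it
    return content, os.path.exists(temp_dir)
```

The trade-off is lifetime: `mkdtemp` suits the return-a-path API above, while `TemporaryDirectory` suits work that finishes inside one call.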
A team using our library later told us they had been targeted by an attacker who tried path traversal attacks. Thanks to these practices, the attempts failed.
The Bottom Line
Security isn’t a feature you can add later—it’s a mindset you need from the start. Every time I write a new function, I try to think: “How could this be misused?” It might seem paranoid, but in the world of security, a little paranoia is a good thing.
Remember: your code might run in contexts you never imagined, processing input you never expected. By following these practices, you’re not just protecting your code—you’re protecting every system that will ever use it.
And if you’re ever tempted to skip security practices because “it’s just a small library,” remember my story from the beginning. In security, there’s no such thing as “just” anything.