PyGeoHash Gets Type Hints: A Journey into Modern Python
I’ve been on a bit of a maintenance kick lately with PyGeoHash, and the latest update brings something I’m particularly excited about: comprehensive type hints and a new types module. If you’ve been following along, you know I’ve been working on modernizing PyGeoHash, and this is another big step in that direction.
What’s New?
The latest PR adds two major features to PyGeoHash:
- Type Hints Throughout: Every function and class now has proper type annotations, making it crystal clear what goes in and what comes out.
- A New Types Module: A dedicated module for type-related functionality, including pandas integration and validation helpers.
But why should you care? Well, if you’re like me and spend way too much time trying to figure out what type of object you’re supposed to pass to a function, this update is for you.
The Joy of Type Hints
If you’re new to type hints in Python (or maybe you’ve been avoiding them like I used to), here’s the deal: they’re like little documentation snippets right in your code that tell you exactly what kind of data you’re working with. Instead of:
def encode(latitude, longitude, precision=12):
    # What type is latitude? float? int? str? 🤷
    pass
You now get:
def encode(latitude: float, longitude: float, precision: int = 12) -> str:
    # Ah, now I know exactly what I'm dealing with! 🎉
    pass
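Annotations are also readable at runtime, which is part of what lets editors and documentation tools make use of them. Here is a small sketch using only the standard library. The encode below is a stand-in with the same annotated signature as above; its body is a placeholder, not PyGeoHash's implementation.

```python
import inspect
from typing import get_type_hints


def encode(latitude: float, longitude: float, precision: int = 12) -> str:
    """Stand-in with the annotated signature shown above; the body is a placeholder."""
    return ""


# get_type_hints resolves the annotations into a name -> type mapping
hints = get_type_hints(encode)
print(hints)

# inspect.signature renders the full signature, defaults included
signature_text = str(inspect.signature(encode))
print(signature_text)
```

Nothing here enforces the types when the function is called; the hints are metadata that tools like Mypy and your editor read.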
The New Types Module
One of the coolest parts of this update is the new types module. It’s not just about type hints – it’s about making your life easier when working with PyGeoHash data. Want to validate that your data is actually a valid geohash? We’ve got you covered. Working with pandas and want to make sure your Series of geohashes is properly typed? Yep, that too.
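To give a feel for what validating a geohash involves, here is a minimal standalone sketch. The name is_valid_geohash is hypothetical and this is not PyGeoHash's own validator. It only checks that a value is a non-empty string drawn from the standard geohash base32 alphabet (the digits plus lowercase letters, without a, i, l, and o).

```python
# Standard geohash base32 alphabet: digits plus lowercase letters except a, i, l, o
GEOHASH_ALPHABET = frozenset("0123456789bcdefghjkmnpqrstuvwxyz")


def is_valid_geohash(value: object) -> bool:
    """Return True if value is a non-empty string of geohash base32 characters."""
    if not isinstance(value, str) or not value:
        return False
    return all(char in GEOHASH_ALPHABET for char in value)


# Check a valid geohash, a string with excluded letters, an empty string, and a non-string
candidates = ["dqcjq", "hello", "", 123]
results = {repr(candidate): is_valid_geohash(candidate) for candidate in candidates}
print(results)
```

This sketch is case-sensitive and ignores length limits. A real validator may be stricter or more lenient on both points.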
Pandas Integration Examples
Let’s look at some practical examples of how you can use the new pandas integration. First, import what you need:
# Postpone annotation evaluation so subscripted annotations such as
# pd.Series[GeoHashType] are never evaluated at runtime
from __future__ import annotations

import pandas as pd
from pygeohash.types import GeoHashType, validate_geohash_series

# Create a sample DataFrame with geohash data
df = pd.DataFrame({
    'location': ['dqcjq', 'dqcjw', 'dqcjx'],  # These are valid geohashes
    'value': [1, 2, 3]
})

# Type your Series as a geohash series
df['location'] = df['location'].astype(GeoHashType)
# Now your IDE knows this is a series of geohashes!
# You'll get proper autocomplete and type checking

# Validate a series of geohashes
valid_mask = validate_geohash_series(df['location'])
print(f"All geohashes valid? {valid_mask.all()}")

# You can also use it in type hints for your own functions
def process_locations(locations: pd.Series[GeoHashType]) -> pd.Series[float]:
    # Your IDE will now know this is a series of geohashes
    # and warn you if you try to pass something else
    return locations.str.len().astype(float)
This is particularly useful when you’re working with large datasets and want to ensure data quality. The type system will catch issues like:
# This will raise a validation error - 'hello' isn't a valid geohash
# (the geohash alphabet omits the letters a, i, l, and o)
df.loc[0, 'location'] = 'hello'

# This will give you a type error at development time
def bad_function(locations: pd.Series[GeoHashType]) -> None:
    return locations.mean()  # IDE/mypy will warn: can't take mean of geohashes!
Tools of the Trade: Ruff and Mypy
This update wouldn’t have been possible without two amazing tools:
Ruff: The Speed Demon
Ruff is like having a very fast, very thorough code reviewer who never gets tired. It’s a Python linter written in Rust that’s ridiculously fast and catches a ton of potential issues. In this update, we used it to ensure our type hints were consistent and properly formatted.
Mypy: The Type Checker
Mypy is like having a friend who’s really, really good at spotting when you’re trying to add a string to an integer. It’s a static type checker that makes sure all our type hints actually make sense and catch real issues.
Common Errors Mypy Catches
Let me share some real-world examples of the kinds of errors Mypy helps catch before they become runtime bugs. These are the kinds of issues that might slip through code review but will fail in production:
from pygeohash import encode, decode
from pygeohash.types import GeoHashType

# Error: Argument 1 to "encode" has incompatible type "str"; expected "float"
def bad_encoding(location: str) -> str:
    return encode(location, 45.0)  # Oops, passed a string instead of float!

# Error: Incompatible return value type (got "Tuple[float, float]"; expected "str")
def wrong_return_type(lat: float, lon: float) -> str:
    return decode('dqcjq')  # decode returns a tuple, but we promised a string!

# Error: List item 0 has incompatible type "int"; expected "str"
def invalid_geohash_list() -> list[str]:
    return [123, 'dqcjq', 'dqcjw']  # Mixed types in list

# Error: Incompatible types in assignment
def type_confusion() -> None:
    precision: int = 'high'  # Can't assign string to int!

# Error: "GeoHashType" has no attribute "split"
def undefined_method(geohash: GeoHashType) -> list[str]:
    return geohash.split(',')  # GeoHashType doesn't have this method!
These might seem obvious when you’re looking right at them, but they’re the kind of bugs that love to sneak into large codebases. With Mypy watching our back, these errors get caught immediately during development instead of surprising us in production.
What I particularly love about Mypy is how it catches errors in ways that unit tests might miss. You’d need a pretty comprehensive test suite to catch all these edge cases, but Mypy spots them automatically just by analyzing the code.
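As a small illustration of that gap, consider the hypothetical helper below (it is not part of PyGeoHash). A test that only exercises the default path passes, and the bug in the rarely used branch only surfaces once that branch actually runs. A static checker such as Mypy should flag the string-plus-int concatenation without executing anything.

```python
def describe_precision(precision: int, verbose: bool = False) -> str:
    """Hypothetical helper with a type bug hidden in the verbose branch."""
    if verbose:
        # Bug: concatenating str and int; a static type checker should flag this line
        return "precision: " + precision
    return str(precision)


# The common path works, so a test covering only this call passes
print(describe_precision(5))

# The verbose path fails only once it is actually executed
try:
    describe_precision(5, verbose=True)
except TypeError as exc:
    print(f"TypeError: {exc}")
```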
Why This Matters
You might be thinking, “Will, it’s just some extra syntax in the code, what’s the big deal?” But here’s why this update is actually pretty important:
- Better IDE Support: Your code editor can now give you much better autocomplete suggestions and catch type-related errors before you even run the code.
- Self-Documenting Code: The type hints serve as documentation that never gets out of date because it’s right there in the code.
- Fewer Runtime Errors: By catching type-related issues early, we can prevent a whole class of bugs from making it to production.
Type Hints for C Extensions
One of the trickier parts of this update was adding type information for our C extension module. If you’re not familiar with C extensions in Python, they’re modules written in C for performance reasons. The challenge is that Mypy can’t peek inside C code to figure out types, so we need to tell it what’s going on using stub files.
A stub file (.pyi) is like a type-only version of your Python module. Think of it as a contract that says “here’s what this module promises to do” without actually implementing anything. Here’s a simplified version of what our stub file looks like:
# pygeohash/_geohash.pyi
def encode_c(latitude: float, longitude: float, precision: int) -> str: ...
def decode_c(geohash: str) -> tuple[float, float]: ...
def decode_extent_c(geohash: str) -> tuple[float, float, float, float]: ...
The ... syntax is stub file shorthand for “the implementation is elsewhere”. The stub tells Mypy everything it needs to know about our C functions without having to understand the C code itself.
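Because a stub is syntactically ordinary Python, the standard library can parse it. The sketch below is illustrative only: the stub_signatures helper is hypothetical, and the stub text is copied from the simplified example above. It lists each declared function with its parameter names, which is the kind of information a type checker reads from the file.

```python
import ast

# Stub text copied from the simplified example; a real project would read the .pyi file
STUB_SOURCE = """
def encode_c(latitude: float, longitude: float, precision: int) -> str: ...
def decode_c(geohash: str) -> tuple[float, float]: ...
def decode_extent_c(geohash: str) -> tuple[float, float, float, float]: ...
"""


def stub_signatures(source: str) -> dict[str, list[str]]:
    """Map each top-level function declared in a stub to its parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


signatures = stub_signatures(STUB_SOURCE)
for name, params in signatures.items():
    print(f"{name}({', '.join(params)})")
```

For a real project, Mypy ships a stubtest tool that compares a stub against the runtime module, which is a more thorough check than this sketch.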
Why Stub Files Matter
Stub files are particularly important for PyGeoHash because:
- Performance with Safety: We get the speed of C with the safety of Python’s type system
- IDE Support: Your editor can provide autocomplete and type checking even for C functions
- Documentation: The stub file serves as a clear interface definition for our C extension
If you’re maintaining a Python library with C extensions, stub files are your friend. They bridge the gap between the low-level C code and Python’s type system, giving users a seamless experience regardless of implementation details.
Looking Forward
This update is part of a larger effort to make PyGeoHash more maintainable, more reliable, and more enjoyable to use. Type hints might seem like a small thing, but they’re a fundamental building block for modern Python development.
If you’re using PyGeoHash, update to the latest version to get all these goodies. And if you’re working on your own Python library, I highly recommend taking the time to add type hints – your future self (and your users) will thank you.
Want to see the nitty-gritty details? Check out the PR on GitHub. And as always, if you run into any issues or have suggestions for improvements, the issue tracker is open!