Geohash: When Clever Isn't Always Smart

Geohash: The Good, The Bad, and The Jagged Edges

Geohash is one of those beautifully simple ideas that feels just right when you first encounter it. Take a messy pair of latitude and longitude coordinates, run them through a clever algorithm, and poof – you get a short, shareable string that represents a rectangular area on the map. Indexing spatial data suddenly seems easier, proximity queries look doable with simple string prefixes, and you feel like a geo-wizard.

It’s used everywhere, from databases like Elasticsearch and Redis for geo-queries to various location-based services. And for good reason! It can be genuinely useful.

But like many elegant abstractions, the devil is in the details, and the smooth surface of Geohash hides some surprisingly jagged edges. Relying on it too heavily without understanding its limitations can lead you down some frustrating paths. Let’s talk about where Geohash stumbles and where people often trip up using it.

The Pitfall: Proximity Isn’t Guaranteed by Prefix

The most common misconception is that two locations with similar Geohash prefixes are always close together. While often true, it’s dangerously unreliable, especially near the boundaries of Geohash cells.

Imagine two points right next to each other, but one falls just inside one Geohash rectangle, and the other falls just outside into a neighboring one. Their Geohash strings might look completely different, even at shorter lengths.

Let’s see this with pygeohash. We’ll take two points very close together, straddling the prime meridian (0° longitude), which is a cell boundary at every Geohash precision.

import pygeohash as pgh

# Point 1: Slightly west of the prime meridian
lat1, lon1 = 52.5, -0.00001
hash1 = pgh.encode(lat1, lon1, precision=12)

# Point 2: Slightly east of the prime meridian
lat2, lon2 = 52.5, 0.00001
hash2 = pgh.encode(lat2, lon2, precision=12)

print(f"Point 1: ({lat1}, {lon1}) -> Geohash: {hash1}")
print(f"Point 2: ({lat2}, {lon2}) -> Geohash: {hash2}")

# Count how many leading characters the two hashes share
common = 0
for a, b in zip(hash1, hash2):
    if a != b:
        break
    common += 1
print(f"Common prefix length: {common}")

Running this, you’ll likely see something like:

Point 1: (52.5, -1e-05) -> Geohash: gcrfxvrfxv2d
Point 2: (52.5, 1e-05) -> Geohash: u1248j248jr6
Common prefix length: 0

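These points are about 1.4 meters apart, yet their Geohashes share no common prefix! Why? Because Geohash builds its string by repeatedly halving the longitude and latitude ranges and interleaving the resulting bits, starting with longitude. The very first bit records whether the point lies west or east of 0° longitude, so two points on opposite sides of the prime meridian differ from the first character onward. The same thing happens, at longer prefix lengths, at every other cell boundary; the prime meridian and the equator are just the most extreme cases.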
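To make the interleaving concrete, here is a minimal pure-Python sketch of the encoding step. It is an illustration only, not pygeohash’s actual implementation, and the name `encode_sketch` is made up for this post.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def encode_sketch(lat, lon, precision=12):
    """Encode a point by halving the lon/lat ranges, one bit per halving."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, nbits = 0, 0
    use_lon = True  # bits alternate between the axes, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid  # keep the eastern half
            else:
                lon_hi = mid  # keep the western half
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid  # keep the northern half
            else:
                lat_hi = mid  # keep the southern half
        value = (value << 1) | bit
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)

# The first bit is "west or east of 0 degrees", so the first character differs.
print(encode_sketch(52.5, -0.00001, precision=1))  # g
print(encode_sketch(52.5, 0.00001, precision=1))   # u
```

Because the hemisphere bit comes first, no amount of extra precision can bring these two strings back together.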

The Trap: If you build a “nearby locations” feature solely by querying for locations sharing a Geohash prefix with your target point, you will miss valid neighbors that happen to fall across a cell boundary.

The Fix: You can’t just query the target point’s Geohash prefix. You must also calculate the Geohashes of its 8 neighbors (N, NE, E, SE, S, SW, W, NW) and query for points matching any of those prefixes. pygeohash even has a helper for this:

# Center hash from Point 1
center_hash = pgh.encode(lat1, lon1, precision=5) # Using a shorter precision for demonstration

# Get neighbors
neighbors = pgh.neighbors(center_hash)

print(f"Center hash: {center_hash}")
print(f"Neighbors: {neighbors}")
print(f"Total hashes to query: {[center_hash] + neighbors}")

This gives you a set of 9 hashes (the center plus its neighbors) to cover the area properly and avoid the edge-case blindness.
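If you want to see what a neighbor helper does under the hood, or your library lacks one, the idea can be sketched without any dependency: decode the cell’s bounding box, step one cell width or height away from its center in each of the 8 directions, and re-encode. The function names below (`encode`, `bounds`, `neighbors_sketch`) are hypothetical and written for this post; they are not part of pygeohash.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode(lat, lon, precision):
    """Plain geohash encoding by interval halving (longitude bit first)."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    out, value = [], 0
    for i in range(precision * 5):
        rng, coord = (lon_rng, lon) if i % 2 == 0 else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coord >= mid else 0
        rng[1 - bit] = mid  # bit 1 raises the lower bound, bit 0 lowers the upper
        value = (value << 1) | bit
        if i % 5 == 4:
            out.append(BASE32[value])
            value = 0
    return "".join(out)

def bounds(gh):
    """Return (lat_min, lat_max, lon_min, lon_max) of a geohash cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    i = 0
    for ch in gh:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (idx >> shift) & 1
            rng = lon_rng if i % 2 == 0 else lat_rng
            rng[1 - bit] = (rng[0] + rng[1]) / 2
            i += 1
    return lat_rng[0], lat_rng[1], lon_rng[0], lon_rng[1]

def neighbors_sketch(gh):
    """Return the geohashes of the cells surrounding `gh` (up to 8)."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounds(gh)
    lat_c, lon_c = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    dlat, dlon = lat_hi - lat_lo, lon_hi - lon_lo
    result = []
    for dy in (1, 0, -1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the center cell itself
            lat = lat_c + dy * dlat
            if not -90.0 <= lat <= 90.0:
                continue  # there is no cell beyond the poles
            # wrap longitude across the antimeridian (+/-180 degrees)
            lon = (lon_c + dx * dlon + 180.0) % 360.0 - 180.0
            result.append(encode(lat, lon, len(gh)))
    return result

center = encode(52.5, -0.00001, 5)
to_query = [center] + neighbors_sketch(center)
print(to_query)
```

The cell just east of the meridian, the one that holds Point 2, now shows up in the query list even though its hash shares no prefix with the center.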

The Pitfall: Cells Aren’t Uniformly Sized (or Shaped)

Geohash partitions the world into rectangles. We think of grids as uniform, but projecting a sphere onto flat rectangles gets messy, especially away from the equator.

Geohash cells vary in size and aspect ratio depending on latitude:

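  • Near the Equator: Cells are square in degrees at odd hash lengths and twice as wide as they are tall at even lengths (a 6-character cell is roughly 1.2 km × 0.6 km).
  • Near the Poles: Cells keep the same size in degrees but shrink east-west on the ground, so they become tall and skinny slivers.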

This means a Geohash of a certain length (say, 6 characters) represents a much wider area (east-west) near the equator than it does near the poles.

# decode_exactly returns (lat, lon, lat_err, lon_err): the cell center plus
# the half-height and half-width of the cell in degrees.

# Equator example
hash_equator = pgh.encode(0, 0, precision=6)
_, _, _, lon_err_equator = pgh.decode_exactly(hash_equator)
width_equator = 2 * lon_err_equator  # full east-west width in degrees

# Arctic example
hash_arctic = pgh.encode(80, 0, precision=6)
_, _, _, lon_err_arctic = pgh.decode_exactly(hash_arctic)
width_arctic = 2 * lon_err_arctic

print(f"Geohash: {hash_equator} (Equator)")
print(f"Approximate Width (degrees longitude): {width_equator:.6f}")

print(f"\nGeohash: {hash_arctic} (Arctic)")
print(f"Approximate Width (degrees longitude): {width_arctic:.6f}")
# Note: Actual kilometer width also depends on latitude!

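You’ll see the width in degrees is identical for both cells (about 0.011° at precision 6), but remember that a degree of longitude shrinks with the cosine of the latitude, so it covers far less distance near the poles than near the equator.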

The Trap: Assuming a fixed Geohash precision corresponds to a fixed distance radius is incorrect. A precision=6 cell is roughly 1.2 km wide at the equator but only about 210 m wide at 80° latitude, while its north-south height stays near 0.6 km. Calculating distances based solely on Geohash precision without considering latitude is inaccurate.
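Those figures come from simple arithmetic, sketched below under a spherical-Earth assumption (about 111.32 km per degree of latitude). The helper name `cell_size_km` is made up for illustration; treat the results as estimates, not survey-grade numbers.

```python
import math

KM_PER_DEG = 111.32  # approximate length of one degree of latitude

def cell_size_km(precision, lat):
    """Approximate (width_km, height_km) of a geohash cell at a latitude."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit at odd lengths
    lat_bits = total_bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    # a degree of longitude shrinks with the cosine of the latitude
    width_km = width_deg * KM_PER_DEG * math.cos(math.radians(lat))
    height_km = height_deg * KM_PER_DEG
    return width_km, height_km

for lat in (0, 45, 80):
    w, h = cell_size_km(6, lat)
    print(f"precision=6 at {lat:>2} deg latitude: {w:.3f} km wide, {h:.3f} km tall")
```

The height column stays put while the width column collapses as you move north, which is exactly why a hash length makes a poor stand-in for a radius.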

The Fix: Always calculate actual great-circle distances (e.g., using the Haversine formula) between points after retrieving candidates via Geohash prefix (and neighbor) lookups. Geohash is for candidate selection, not precise distance calculation or filtering.
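A minimal sketch of that second pass might look like the following. The candidate list is made up, standing in for whatever your 9-cell prefix query returns, and `filter_by_radius` is a hypothetical helper name. The Haversine formula assumes a spherical Earth, which is usually fine at these scales.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def filter_by_radius(center, candidates, radius_km):
    """Keep candidates within radius_km of center, nearest first."""
    lat0, lon0 = center
    hits = []
    for name, lat, lon in candidates:
        distance = haversine_km(lat0, lon0, lat, lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])

# Made-up candidates, as if returned by a center-plus-neighbors prefix query
candidates = [
    ("across the meridian", 52.5, 0.00001),
    ("same precision-5 cell, far corner", 52.51, -0.04),
    ("next town over", 52.6, 0.2),
]

nearby = filter_by_radius((52.5, -0.00001), candidates, radius_km=1.0)
for name, distance in nearby:
    print(f"{name}: {distance * 1000:.1f} m")
```

Only the point across the meridian survives the 1 km filter, even though it lives in a different cell, while the candidate sharing the center’s cell is dropped.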

The Pitfall: Precision vs. Accuracy

A longer Geohash string gives you a smaller bounding box, increasing “precision.” But this doesn’t inherently increase the accuracy of your proximity analysis. As we saw, two points very close together can have wildly different hashes. Conversely, two points at opposite corners of the same cell share that cell’s entire hash yet can be much farther apart than two points sitting on either side of a cell boundary.

The Trap: Thinking precision=12 (a common maximum in Geohash libraries) gives you centimeter-level useful proximity. It gives you a tiny bounding box, a few centimeters across, yes, but the boundary problem still exists, and points within that box can still be further apart than points just outside it.

The Fix: Use Geohash precision appropriate for your initial filtering scale. Need city-level candidates? Use a shorter precision. Need street-level? Use a longer one. But always, always follow up with actual distance calculations on the candidate set retrieved using the Geohash (and its neighbors). Don’t rely on the hash itself for the final distance check.

When Is Geohash Inappropriate?

Given these limitations, Geohash might be the wrong tool, or at least insufficient on its own, when:

  1. High-Precision Proximity is Critical: If you need guaranteed retrieval of all points within a precise radius (e.g., 50 meters) without fail, relying solely on Geohash prefix queries (even with neighbors) can be risky. The 3×3 block of cells only covers the whole query circle when each cell is at least as wide and as tall as the radius, and the block’s corners always return points that lie outside the circle. You might need spatial indexes designed for true radial queries (like R-trees, k-d trees) or perform a broader Geohash query and then very accurate post-filtering.
  2. Uniform Area Coverage is Needed: If your analysis assumes each spatial bin covers a roughly equal area (e.g., density mapping), Geohash is unsuitable due to the latitude-dependent distortion. You’d need equal-area grid systems instead.
  3. You Forget the “Check Neighbors” Rule: Seriously, if you’re doing proximity queries and only checking the central Geohash prefix, you will have bugs.
  4. You Use Precision as a Proxy for Distance: Don’t filter results based on “they must share a prefix of length 8.” Filter based on calculated distance.
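For the first case in the list above, one way to reduce the risk is to pick the hash length from the radius instead of guessing. The sketch below chooses the longest precision whose cells are still at least as wide and as tall as the radius, so the center cell plus its 8 neighbors cover the whole circle. It assumes a spherical Earth and uses the query latitude for every cell in the block, so treat it as an approximation; `cell_size_km` and `precision_for_radius` are hypothetical helper names.

```python
import math

KM_PER_DEG = 111.32  # approximate km per degree of latitude (spherical Earth)

def cell_size_km(precision, lat):
    """Approximate (width_km, height_km) of a geohash cell at a latitude."""
    bits = 5 * precision
    lon_bits, lat_bits = (bits + 1) // 2, bits // 2
    width = 360.0 / 2 ** lon_bits * KM_PER_DEG * math.cos(math.radians(lat))
    height = 180.0 / 2 ** lat_bits * KM_PER_DEG
    return width, height

def precision_for_radius(radius_km, lat, max_precision=12):
    """Longest precision whose 3x3 cell block covers a circle of radius_km."""
    for precision in range(max_precision, 0, -1):
        width, height = cell_size_km(precision, lat)
        if width >= radius_km and height >= radius_km:
            return precision
    return None  # the radius is larger than a 1-character cell at this latitude

print(precision_for_radius(0.05, 0))   # 7 for a 50 m radius at the equator
print(precision_for_radius(0.05, 80))  # 6 for the same radius at 80 degrees north
```

The same 50 m radius needs a shorter hash at high latitude because the cells there are much narrower. Whatever precision you land on, the candidates still need a real distance check afterwards.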

Conclusion: Use It Wisely

Geohash is a fantastic tool for coarse-grained spatial indexing and candidate selection. It’s great for quickly narrowing down potential points of interest in a large dataset before applying more computationally expensive distance calculations.

But it’s not a magic bullet for all geospatial problems. Understand its rectangular, non-uniform nature, remember the boundary problem (and the neighbor solution!), and never confuse Geohash precision with actual geographic distance or proximity guarantees. Use it as a powerful first-pass filter, but always refine with real distance math. Happy (geo)hashing!
