2020-03-29

Airbnb-like marker culling for PostGIS

553 words - 2 minutes

When placing markers on maps there is a delicate balance between too many markers and too clumsy UX.

The most popular solution to this is marker clustering, where overlapping or visually clustered markers are collapsed into single nodes indicating the number of nodes in said cluster. Clicking the cluster node then opens up the cluster, possibly zooming in to the map to reveal the nodes underneath.

However, clustering is not a perfect solution. By showing the user too many options (or markers), you’re also subjecting them to a choice overload. Additionally, clustering the nodes is visibly hiding relevant information from the user, resulting in a subconscious annoyance and constant zooming back and forth.

Airbnb had a clever solution to this: selectively remove markers to keep the number of visible markers pleasantly low.

I’m calling the trick to achieve this marker culling (after face culling)

Marker culling in PostGIS

Here’s the full query, assuming we have a table places containing lonlat entries (of st_point type), we want to cluster the places into 10 clusters, and we only select top 5 places from each cluster.

WITH sparse_places AS (
  SELECT
    lonlat, id, COUNT(*) OVER() as count
  FROM places
), clustered AS (
  SELECT
    sparse_places.id,
    ST_ClusterKMeans(lonlat::geometry, LEAST(count::integer, 10)) OVER() AS cid
  FROM sparse_places
), culling_candidates AS (
  SELECT
    clustered.id, ROW_NUMBER() OVER (PARTITION BY cid)
  FROM clustered
) SELECT culling_candidates.id
FROM culling_candidates
WHERE row_number <= 5;

The SQL statement is split into multiple Common Table Expressions:

sparce_places

Pick the columns we will be using (lonlat for culling, id for results, count as a limit for kmeans later)

clustered

Do the actual clustering using PostGIS’ ST_ClusterKMeans. In k-means clustering you provide the number of wanted clusters and each point is converged to a cluster index. Importantly, the number of clusters provided to the function must be lower or equal to the number of input rows, which is why we had to take the count in sparce_places.

culling_candidates

Finally, we run a window function over the clusters assigning an index to each place within each cluster. If you want to implement ordered culling (e.g. prefer places with lower distance or cost to others when culling), you can add an ORDER BY clause within the OVER().

main query

Finally, the main query returns places from culling_candidates with row_number less than 5 (namely, five or less places from each cluster).

Here’s roughly what the clustering looks like