Indexing geo-data 2: simple benchmark
After my last post, I decided to do some benchmarking. For this benchmark I used the US data from Geonames.org. I inserted all the data (1,886,420 records) and searched for a big area around New York (between 41.3665028663272, -72.41912841796875 and 40.113789191575236, -75.83038330078125). We expect to get 38,259 records back for this query.
Test 1: Selecting on longitude, latitude
SELECT SQL_NO_CACHE lat,lng FROM geotest WHERE lat < 41.3665028663272 AND lng < -72.41912841796875 AND lat > 40.113789191575236 AND lng > -75.83038330078125;
Time taken without index: 1.73s. With a B-Tree index on latitude: 0.72s.
Test 2: Using spatial extensions and POINT field
SET @rect = 'POLYGON((41.3665028663272 -72.41912841796875,41.3665028663272 -75.83038330078125,40.113789191575236 -75.83038330078125,40.113789191575236 -72.41912841796875,41.3665028663272 -72.41912841796875))';
SELECT SQL_NO_CACHE astext(location) from geotest where intersects(location,GeomFromText(@rect));
Time taken without index: 9.52s. With a spatial index: 0.73s.
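The rectangle in the query above is written out by hand, and it's easy to forget that the ring has to end on the point it started with. As a rough sketch, a small helper can build the same kind of string from the two corners of the bounding box. The function name bbox_to_wkt is made up for this example, and it assumes the same "lat lng" coordinate order used in the query above.

```python
def bbox_to_wkt(lat_max, lng_max, lat_min, lng_min):
    """Build a closed rectangular POLYGON string from two bounding-box corners.

    Hypothetical helper: coordinates are written as "lat lng", matching the
    order used in the hand-written query, not any particular standard.
    """
    corners = [
        (lat_max, lng_max),
        (lat_max, lng_min),
        (lat_min, lng_min),
        (lat_min, lng_max),
        (lat_max, lng_max),  # repeat the first corner to close the ring
    ]
    ring = ",".join(f"{lat} {lng}" for lat, lng in corners)
    return f"POLYGON(({ring}))"


if __name__ == "__main__":
    # The bounding box used in this benchmark.
    print(bbox_to_wkt(41.3665028663272, -72.41912841796875,
                      40.113789191575236, -75.83038330078125))
```

The resulting string can be passed to GeomFromText in place of the literal assigned to @rect.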
Test 3: Using morton number
SELECT SQL_NO_CACHE lat,lng FROM geotest WHERE morton > 3667198027933142835 AND morton < 3671111582099533095 AND lat < 41.3665028663272 AND lng < -72.41912841796875 AND lat > 40.113789191575236 AND lng > -75.83038330078125;
Time taken without index: 0.78s. With an index on morton: 0.65s.
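To illustrate why the query filters on both the morton range and the lat/lng bounds, here is a minimal sketch of a morton (Z-order) encoding. The scaling to 32-bit integers and the function names (spread_bits, scale, morton, morton_range) are assumptions made for this example; they are not necessarily the encoding that produced the numbers in the query above, so the values will differ.

```python
def spread_bits(v):
    """Spread the low 32 bits of v onto the even bit positions of a 64-bit int."""
    v &= 0xFFFFFFFF
    v = (v | (v << 16)) & 0x0000FFFF0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0F
    v = (v | (v << 2)) & 0x3333333333333333
    v = (v | (v << 1)) & 0x5555555555555555
    return v


def scale(value, lo, hi):
    """Map value in [lo, hi] to an unsigned 32-bit integer (assumed scaling)."""
    return int((value - lo) / (hi - lo) * 0xFFFFFFFF)


def morton(lat, lng):
    """Interleave scaled longitude (even bits) and latitude (odd bits)."""
    x = scale(lng, -180.0, 180.0)
    y = scale(lat, -90.0, 90.0)
    return spread_bits(x) | (spread_bits(y) << 1)


def morton_range(lat_min, lng_min, lat_max, lng_max):
    """Return (low, high) morton values for the corners of a bounding box.

    Every point inside the box falls within this range, but so do some
    points outside it, which is why the lat/lng filter is still needed.
    """
    return morton(lat_min, lng_min), morton(lat_max, lng_max)


if __name__ == "__main__":
    low, high = morton_range(40.113789191575236, -75.83038330078125,
                             41.3665028663272, -72.41912841796875)
    print(low, high)
```

With this particular scaling the values use all 64 bits, so the column would have to be an unsigned BIGINT.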
In the table below 'small' is around Times Square, 'medium' is New York City and 'large' is about two thirds of the US. I didn't bother doing all benchmarks for the ones I knew were slower; those are marked with a dash.
|method|small|medium|large|
|index on latitude|-|0.72s|-|
|using point field|-|9.52s|-|
|using point field + spatial index|0.00s|0.73s|18.82s|
|using morton number|-|0.78s|-|
|index on morton|0.00s|0.65s|3.23s|
So it seems like using the morton number is a bit faster than using the spatial index, but for this query there's not a huge difference considering the relatively large dataset. Using the spatial index has a number of benefits, the biggest being that you can easily run much more complex queries (polygons and such). The major benefit of the morton number approach is that it becomes significantly faster as your dataset grows, and that you're able to use InnoDB, which can perform much better if you're expecting a lot of updates.
Early update: my coworker Kevin mentions the spatial queries are likely slowed down because 'astext' is called for every row. I'll have to do these again with separate lat/lng fields.
Update 2: Adding a lat and lng field and selecting on those is actually even slower (consistently 0.91s).
Update 3: With a smaller result set the spatial index and the morton index are both pegged at 0.00s. With a much larger result set (a big chunk of the US) I got 18.82s for the spatial index, and 3.23s for the morton index.
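To put the indexed timings side by side, the short calculation below divides the spatial index time by the morton index time for the medium and large queries. The numbers are the ones measured in this post; the dictionary layout and names are just for illustration.

```python
# Indexed timings in seconds, taken from the benchmarks in this post.
timings = {
    "medium": {"spatial": 0.73, "morton": 0.65},
    "large": {"spatial": 18.82, "morton": 3.23},
}

# Ratio of spatial index time to morton index time per query size.
ratios = {size: t["spatial"] / t["morton"] for size, t in timings.items()}

for size, ratio in ratios.items():
    print(f"{size}: morton index is {ratio:.1f}x faster")
# medium: morton index is 1.1x faster
# large: morton index is 5.8x faster
```

The small query is left out because both indexes came in at 0.00s there.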