Indexing geo-data 3: In practice

Since my last post I've found out that using the 'morton' number to index spatial number is also referred to as the Z-order.

To index using this order, you can use this stored function:

CREATE FUNCTION getGeoMorton(lat DOUBLE, lng DOUBLE) RETURNS BIGINT UNSIGNED DETERMINISTIC 
BEGIN

  -- 11930464 is round(maximum value of a 32bit integer / 360 degrees) 
  
  DECLARE bit, morton, pos BIGINT UNSIGNED DEFAULT 0;  
 
  SET @lat = CAST((lat + 90) * 11930464 AS UNSIGNED);
  SET @lng = CAST((lng + 180) * 11930464 AS UNSIGNED);
  SET bit = 1;
  
  WHILE bit <= @lat || bit <= @lng DO 
    
    IF(bit & @lat) THEN SET morton = morton | ( 1 << (2 * pos + 1)); END IF;
    IF(bit & @lng) THEN SET morton = morton | ( 1 << (2 * pos)); END IF;
  
    SET pos = pos + 1;
    
    SET bit = 1 << pos;
  
  END WHILE; 
  
  RETURN morton;
END;

Some caveats

  • Since the function is using floating-point numbers, there will be rounding errors. These are generally very small, and for us well within the acceptable margin of error.
  • More significantly, this function assumes euclidean geometry (i.e.: a flat surface). The earth obviously isn't, so as you get closer to the poles you might get back results from outside your rectangle, or miss results from within.
  • It's not recommended to use this function directly in your WHERE clause. Even though the function is marked deterministic (i.e.: it will always yield the same results for the same arguments), the MySQL query optimizer currently ignored that modifier. So, set it in a temporary variable first (SET @number =) and use the variable in your where clause.
  • Don't forget dealing with queries spanning over the -180°, 180° longitude line.
  • Also store your actual longitude and latitude values in the DB. This function is not intended to be exact.
  • When you do your queries, select both on the morton number, and longitude and latitude.

A better way to do it?

In my research I've found the Hilbert Curve to be an even better algorithm, but haven't yet gone through the effort of trying to express it in SQL.

Web mentions

Comments

  • Roland Bouman

    Roland Bouman

    Hi! I just read your posts on geo indexing, and from there i ended up reading wikipedia on the matter. Interesting stuff, thanks! I just want to point out there is an error in your MySQL getGeoMorton function. The line RETURN @lng; Should not be there. I also think there are a number of things that could be improved in your implementation. For example, there is no need to use the user-defined varibles (the @ variables). You will find that these are quite a bit slower than proper local variables. There are more performance optimizations possible (and with MySQL stored functions, you need all the speed you can get). Here's what I came up with: DELIMITER go DROP FUNCTION IF EXISTS f_morton; go CREATE FUNCTION f_morton( p_longitude DOUBLE , p_latitude DOUBLE ) RETURNS BIGINT UNSIGNED DETERMINISTIC BEGIN DECLARE v_bit BIGINT UNSIGNED DEFAULT 1; DECLARE v_shift BIGINT UNSIGNED DEFAULT 1; DECLARE v_morton BIGINT UNSIGNED DEFAULT 0; DECLARE v_latitude BIGINT UNSIGNED DEFAULT CAST((p_latitude + 90) * 11930464 AS UNSIGNED); DECLARE v_longitude BIGINT UNSIGNED DEFAULT CAST((p_longitude + 180) * 11930464 AS UNSIGNED); WHILE v_bit <= v_latitude || v_bit <= v_longitude DO IF v_bit & v_longitude THEN SET v_morton := v_morton | v_shift; END IF; SET v_shift := v_shift << 1; IF v_bit & v_latitude THEN SET v_morton := v_morton | v_shift; END IF; SET v_shift := v_shift << 1; SET v_bit := v_bit << 1; END WHILE; RETURN v_morton; END; go DELIMITER ; Some testing indicated this is about 20-25% faster. I hope it helps! Cheers, Roland
  • Evert

    Evert

    Dropped that return line. I must have left it in after debugging; Thanks for that Roland!
  • Evert

    Evert

    jbland, From what I read there is that geohashing uses the exact same concept for encoding lat/long and shares the same limitations. The difference is that I encode to a 64bit integer, geohashing encodes this number further as a string. Using the integer will give you better performance. Evert
  • jbland

    jbland

    Evert, im thinking of going with morton numbers to help with clustering, but im unsure about how to dial up/down precision... any pointers
  • Evert

    Evert

    It's not really easy to calculate range, what you could do is simply drop bits off the bottom, like: morton XOR 0x1 to drop the last bit, morton XOR 0x2 for the 2 last bits, etc.. That way you might not get the best possible clusters, but it could definitely push you in the right direction.
  • kellogs

    <p>Very good stuff, especially since you have had a "real-life" set of data to test against. Will attempt an Hilbert curve implementation and share my experience on SO.</p>
  • James Peterson

    <p>Hello, thank you very much for this! Here is some extra resources for PostgreSQL Users:</p><p><a href="http://www.spatialdbadvisor.com/postgis_tips_tricks/169/morton-key-function-for-postgresql-postgis" rel="nofollow noopener" title="http://www.spatialdbadvisor.com/postgis_tips_tricks/169/morton-key-function-for-postgresql-postgis">http://www.spatialdbadvisor...</a></p><p>I made some tweaks to the above link to support large inputs (latitude/longitude coordinates in my case) to support precision up to 4 decimal places. As <a href="http://spatialdbadvisor.com" rel="nofollow noopener" title="spatialdbadvisor.com">spatialdbadvisor.com</a>'s script threw "int out of range" errors.</p><p>See edit here:</p><p><a href="http://pastebin.com/zzSU4X7J" rel="nofollow noopener" title="http://pastebin.com/zzSU4X7J">http://pastebin.com/zzSU4X7J</a></p><p>And further proofs on random set of coordinates:</p><p>Sample size 10,000: <a href="http://oi62.tinypic.com/2448pvk.jpg" rel="nofollow noopener" title="http://oi62.tinypic.com/2448pvk.jpg">http://oi62.tinypic.com/244...</a></p><p>Sample size 1,000: <a href="http://oi60.tinypic.com/r0pp5i.jpg" rel="nofollow noopener" title="http://oi60.tinypic.com/r0pp5i.jpg">http://oi60.tinypic.com/r0p...</a></p><p>Sample size 100: <a href="http://oi58.tinypic.com/10py0pf.jpg" rel="nofollow noopener" title="http://oi58.tinypic.com/10py0pf.jpg">http://oi58.tinypic.com/10p...</a></p><p>I use the procedure like so to improve record locality on hard disk:</p><p>---</p><p>alter my_table add column morton_col int8;</p><p>update my_table set morton_col = morton(cast(trunc(least(longitude_col+180,360)*10000) as int8), cast(trunc(least(latitude_col+90,180)*10000) as int8));</p><p>create index idx_morton_col on my_table(morton_col) with (fillfactor=100);</p><p>cluster my_table using idx_morton_col;</p><p>---</p><p>The multiplication by 10,000 ensures that the coordinate is represented to a precision of 4 decimal places when encoded as a morton value (which is going to represent an accuracy of around 11.1 meters even at the equator - more than precise enough!)</p>
  • SpatialDBAdvisor

    <p>The morton key implementation on my website is an almost direct translation of the original David Mark algorithm. That algorithm was implemented in C and used unsigned 32 bit integers.</p><p>I have used the morton implementation on my site a lot on a lot of data and have not come across the overflow errors. I am not saying that they won't exist for I have never tried creating morton keys for lat/long data, or the merits of treating unequal lat/long ranges as one would planar data. But let's assume that it is OK to do so, what data range is being indexed (is it the whole world)? How big are the Col/Row grids (4 decimal digits of a degree); How many grids are to be created in x and y direction? Does the morton grid start with 0 0 as the lower left corner? Whether the Row/Col ranges are equal (a recommended requirement for a morton key)?</p><p>The only comment I have is why the introduction of the (Oracle implementation) Left_Shift function? Postgresql's "&lt;&lt;" operator implements a bitwise shift left:</p><p>select 2::int8 &lt;&lt; 2 as result;<br>result<br>bigint<br> 8</p><p>Which is same as:</p><p>select left_shift(2,2) as result;result<br>bigint<br>8</p><p>I will update the parameters to int8 in the function on my site.</p><p>Thanks<br>Simon</p>
  • SpatialDBAdvisor

    <p>Morton use is subject to the restrictions I record on my site:</p><p>"There is a restriction to how a Morton space curve can be applied: the grid must be regular. This means that while the X and Y dimensions of the grid may be different (ie giving a rectangle not a square) the number of grids either side must be the *same* (ie a 100x100 grid not a 150x100 grid). Another requirement that affects the generation of a correct space curve is that the (row,column) index values must always be the same. So, in a 100x100 grid the lower left grid should be referenced as (0,0) or (1234,1234) and not (0,1) or (1234, 3456)."</p>
  • SpatialDBAdvisor

    <p>Here is David Mark's implementation of a Hilbert Curve in C</p><p>#define XMAX 255<br>#define YMAX 255<br>#define LEVEL 8<br>unsigned int xy_to_hilbert (x,y)<br>unsigned int x,y;<br>/**************************************************************/<br>/* */<br>/* converts xy to Hilbert-Peano key */<br>/* */<br>/* Written: David M. Mark, CSIRO-DIT CSIS, Sept 1988 */<br>/* */<br>/**************************************************************/<br>/**************************************************************/<br>/* first, an 'almost dummy' procedure to add parameters to */<br>/* a simple call */<br>/**************************************************************/<br>{<br>unsigned int order[4];<br>unsigned int xy_to_hlbt ();<br>order[0]=0; order[1]=1; order[2]=3; order[3]=2;<br>return (xy_to_hlbt (x,y,XMAX,YMAX,order,LEVEL));<br>}</p><p>/**************************************************************/<br>/* and next, the actual recursive procedure, renamed */<br>/* xy_to_hlbt() */<br>/**************************************************************/<br>unsigned int xy_to_hlbt (x,y,xmax,ymax,order,level)<br>unsigned int x,y,xmax,ymax,order[],level;<br>{<br>unsigned int half_side,add_key,child,temp,q,i;<br>if (level==0)<br>return (0);<br>else<br>/* determine q, the Morton quad containing x,y */<br>{<br>half_side = 01&lt;&lt;(level-1);<br>if (x &gt; (xmax-half_side))<br>if (y &gt; (ymax-half_side)) q=3;<br>else q=1;<br>elseif (y &gt; (ymax-half_side)) q=2;<br>else q=0;<br>/* now determine which 'child' of the parent quad the current x,y falls in */<br>for (i=0; i&lt;4; i++)<br>if (order[i]==q) child=i;<br>/* add_key is the contribution to the key of the current bit_pair */<br>add_key = child&lt;&lt;(2*(level-1));<br>/* modify xmax and xmax to those of the current q */<br>if ((q==0)||(q==2)) xmax -= half_side;<br>if ((q==0)||(q==1)) ymax -= half_side;<br>/* now, if we are in the first (0) or last (3) child<br> of the parent quad, we must change the order of the<br> children for the next level of recursion */<br>if (child==0)<br>{<br>temp = order[1];<br>order[1] = order[3];<br>order[3] = temp;<br>}<br>if (child==3)<br>{<br>temp = order[0];<br>order[0] = order[2];<br>order[2] = temp;<br>}<br>/* finally, make a recursive call, and add 'add_key' to the result */<br>return ((xy_to_hlbt(x,y,xmax,ymax,order,(level-1))) + add_key);<br>}<br>}</p><p>regards<br>Simon</p>