Efficient batch operations for Shapely

I began exposing some of the new features of GEOS 3.1 in Shapely today. Geometries can be tuned up, or "prepared" ala SQL, for efficient batch operations. For example consider this set of random polygons scattered around (0.5, 0.0) and their intersection with a triangle:

>>> from shapely.geometry import Point, Polygon
>>> from random import random
>>> spots = [Point(random()*2.0-0.5, random()*2.0-1.0).buffer(0.1) for i in xrange(200)]
>>> triangle = Polygon(((0.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
>>> x = [s for s in spots if triangle.intersects(s)]
>>> len(x)
67

Without preparing the triangle for batch intersection (or using an index), it takes about 12 ms to get the 67 intersecting spots:

>>> import timeit
>>> t = timeit.Timer('x = [s for s in spots if triangle.intersects(s)]',
...                  setup='from __main__ import spots, triangle')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
11917.51 usec/pass

A prepared triangle finds the same 67 in just a little more than 1/4 of the time:

>>> from shapely.prepared import prep
>>> pt = prep(triangle)
>>> t = timeit.Timer('x = [s for s in spots if pt.intersects(s)]',
...                  setup='from __main__ import spots, pt')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
3145.49 usec/pass

Update (2009-01-26): let's try an example from the real world. The boundary of Larimer County, Colorado, as represented in 2000 Census data, is a polygon with a 370 vertex exterior ring:

>>> from osgeo import ogr
>>> ds = ogr.Open('/Users/seang/data/census/co99_d00.shp')
>>> counties = ds.GetLayer(0)
>>> co069 = counties.GetFeature(1151)
>>> g = co069.geometry()
>>> larimer = loads(g.ExportToWkt())
>>> # nb: OGR properties and methods usually can't be safely chained
>>> len(larimer.exterior.coords)
370

Scatter 200 spots around the neighborhood of Larimer County:

>>> b = larimer.bounds
>>> w = b[2] - b[0]
>>> h = b[3] - b[1]
>>> cx = b[0] + w/2.0
>>> cy = b[1] + h/2.0
>>> from random import random
>>> from shapely.geometry import Point
>>> def spotter():
...     x = (random()*2.0-1.0)*w + cx
...     y = (random()*2.0-1.0)*h + cy
...     return Point(x, y).buffer(0.05)
...
>>> spots = [spotter() for i in xrange(200)]
>>> x = [s for s in spots if larimer.intersects(s)]
>>> len(x)
48

48 spots intersect the county, and it takes about 22 ms to find them using the unoptimized method:

>>> import timeit
>>> t = timeit.Timer('x = [s for s in spots if larimer.intersects(s)]',
...                  setup='from __main__ import spots, larimer')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
21728.65 usec/pass

It only takes about 1.5 ms using a prepared geometry, a speed-up of about 14X:

>>> from shapely.prepared import prep
>>> pt = prep(larimer)
>>> t = timeit.Timer('x = [s for s in spots if pt.intersects(s)]',
...                  setup='from __main__ import spots, pt')
>>> print "%.2f usec/pass" % (1000000 * t.timeit(number=100)/100)
1570.41 usec/pass

Comments

Re: Efficient batch operations for Shapely

Author: Paul Ramsey

I've amazed that preparing a triangle makes a damn bit of difference. The difference is probably in the extra short-circuit tests in the prepared operation rather than the indexed shape (since a triangle has so few edges to index). Try this with a more complex polygon to really see the differences. It's the difference between N and log(N) on the number of edges, so increasing the polygon complexity is where things will soar.

Re: Efficient batch operations for Shapely

Author: Sean

I was a bit surprised too. See the update for an example of the kind of speed-up you were thinking of.

Re: Efficient batch operations for Shapely

Author: Paul Ramsey

That makes a nice difference !