Rasterio 0.34

Last fall Even Rouault announced that GDAL 2.1 would have a new Amazon S3 virtual file system. Extending GDAL's capability to make HTTP byte range requests to AWS's HTTPS + XML S3 API, Even has made it possible to efficiently access partial content of S3 objects using certain formats like GeoTIFF. In other words, metadata of a GeoTIFF on S3 or overviews stored as sub-images can be accessed without retrieving the bulk of its image data. Génial!
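
To make the byte range idea concrete, here is a minimal sketch of what such a request looks like at the HTTP level. It only builds the request and never sends it. The URL and the `range_request` helper are hypothetical, chosen for illustration, and the sketch leaves out the AWS request signing that GDAL performs for non-public objects.

```python
from urllib.request import Request

# Hypothetical object URL, for illustration only.
URL = "https://example-bucket.s3.amazonaws.com/path/to/image.tif"


def range_request(url, start, length):
    """Build (but do not send) a GET request for `length` bytes from offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    req = Request(url)
    req.add_header("Range", f"bytes={start}-{end}")
    return req


# Ask for only the first 16 KiB of the object.
req = range_request(URL, 0, 16 * 1024)
print(req.get_header("Range"))  # bytes=0-16383
```

A server that honors the header answers with 206 Partial Content and only the requested bytes, which is what lets a reader pick out a TIFF header or an overview without downloading the whole file.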

With help from Even, Rob Emanuele, and Matt Perry, Rasterio 0.34 has a handy abstraction for this feature. Rasterio uses s3:// URIs instead of GDAL's /vsis3/ paths because URIs are how we identify resources on the web and because s3:// is the scheme, unregistered though it is, used by the AWS Command Line Interface. The same URIs you use with the AWS CLI

$ aws s3 ls s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF
2015-03-14 17:20:01   51099231 LC81390452014295LGN00_B1.TIF
2015-03-14 17:20:30    6626356 LC81390452014295LGN00_B1.TIF.ovr

can also be used with Rasterio:

$ rio info s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF --indent 2
{
  "nodata": null,
  "dtype": "uint16",
  "crs": "EPSG:32645",
  "bounds": [
    381885.0,
    2279085.0,
    610515.0,
    2512815.0
  ],
  "count": 1,
  "blockxsize": 512,
  "driver": "GTiff",
  "transform": [
    30.0,
    0.0,
    381885.0,
    0.0,
    -30.0,
    2512815.0
  ],
  "blockysize": 512,
  "tiled": true,
  "lnglat": [
    86.96327090815723,
    21.666821827007748
  ],
  "shape": [
    7791,
    7621
  ],
  "compress": "deflate",
  "res": [
    30.0,
    30.0
  ],
  "width": 7621,
  "height": 7791,
  "interleave": "band"
}
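
For readers who think in GDAL's /vsis3/ paths, the relationship between the two spellings can be sketched in a few lines. The `s3_uri_to_vsis3` helper below is hypothetical and is not Rasterio's own code; it only assumes the `/vsis3/<bucket>/<key>` layout that GDAL uses.

```python
from urllib.parse import urlparse


def s3_uri_to_vsis3(uri):
    """Translate s3://bucket/key into GDAL's /vsis3/bucket/key form.

    Hypothetical helper for illustration; not part of Rasterio's API.
    """
    parts = urlparse(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The URI authority is the bucket name, the URI path is the object key.
    return f"/vsis3/{parts.netloc}{parts.path}"


uri = "s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF"
print(s3_uri_to_vsis3(uri))
# /vsis3/landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF
```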

Rasterio gets its credentials in the same manner as the AWS CLI (see Configuring the AWS Command Line Interface). If you're already using the AWS CLI, no extra configuration is needed to start using Rasterio on S3 raster datasets.
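
The same URIs should work from Python as well as from the rio command. The following is a rough sketch rather than a tested recipe: it assumes a Rasterio build linked against GDAL 2.1 or later, working AWS credentials, and network access. The `describe` and `describe_s3` helpers are names made up for this example.

```python
def describe(src):
    """Summarize an object exposing Rasterio's basic dataset attributes."""
    return {
        "driver": src.driver,
        "count": src.count,
        "width": src.width,
        "height": src.height,
    }


def describe_s3(uri):
    """Open a dataset by s3:// URI and summarize it (hypothetical helper)."""
    # Imported here so the sketch can be read without Rasterio installed.
    import rasterio

    with rasterio.open(uri) as src:
        return describe(src)


# Usage (requires network access and AWS credentials):
# describe_s3("s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF")
```

If the object is still available, the result would be expected to carry the same driver, count, width, and height that rio info reports.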

A close read of the GDAL debug logs shows that only 16384 bytes of this 51 MB TIFF are fetched in order to get the metadata printed above. That's an efficiency of roughly 3000:1.
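
The ratio is easy to check from the numbers already on this page: the object size comes from the aws s3 ls listing and the fetched byte count from the GDAL debug logs. The variable names below are just for illustration.

```python
object_size = 51099231  # bytes, from the `aws s3 ls` listing
fetched = 16384         # bytes, from the GDAL debug logs

ratio = object_size / fetched
percent = 100 * fetched / object_size

print(round(ratio))       # 3119
print(f"{percent:.3f}%")  # 0.032%
```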

The S3 virtual file system is only available in Rasterio if it is built against GDAL version 2.1.0dev or later. The Mac OS X wheels for Rasterio 0.34 on PyPI contain GDAL version 2.1.0dev and are probably the easiest way to try this new feature.

Share and enjoy!