Mock is Magic

I'm sprinting with my teammates over an occasionally spotty internet connection. We're developing a module that takes some directory names, archives the directories, uploads the archive to S3, and then cleans up temporary files. Testing this by actually posting data to S3 is slow, leaves debris, and is almost pointless: we're using boto3 and boto3 is solid. We've only ever found one new boto bug at Mapbox, and that involved very large streaming uploads. My teammates and I only need to test that we're making a proper archive, using the boto3 API properly, and cleaning up afterwards. Whether or not data lands on S3 isn't important for these tests. Python's mock module is one of many Python tools for faking components during testing. If you're not already using it to create fake boto components (and AWS services), this post will get you started down the right path.

Here's the function to test, with the code I actually want to be testing glossed over. This post is about boxing out the things you don't want to test. The boto3 module is the thing we avoid testing by using mock objects.

import os

import boto3

def archive_and_upload(*args, **kwargs):
    """Archive data and upload to S3"""
    # A bunch of code makes a zip file in a `tmp` directory.

    boto3.resource('s3').Object('mybucket', 'mykey').upload_file(
        os.path.join(tmp, zip_file))

    # A bunch of code now cleans up temporary resources.

Now, in the test function that we're discovering and running with pytest, we create a fake boto3 API using mock.patch.

from unittest.mock import patch

from mymodule import archive_and_upload

@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload(*args, **kwargs)

While the test runs, boto3 in the module is replaced by an instance of unittest.mock.MagicMock. The patch decorator also passes that same mock object to the test function as an argument, which we name boto3, so that we can inspect it within the test. These mock objects have almost incredible properties. Substitute one for the boto3 module and you get a fairly complete API, in the sense that all the methods and properties seem to be there.

>>> from unittest.mock import MagicMock
>>> boto3 = MagicMock()
>>> boto3.resource('s3')
<MagicMock name='mock.resource()' id='4327834232'>
>>> boto3.resource('s3').Object('mybucket', 'mykey')
<MagicMock name='mock.resource().Object()' id='4327879960'>

It does almost nothing, of course, but that's fine for these tests. One thing that the mock objects do to help with testing is record how they are accessed or called. We can assert that certain calls were made with certain arguments.

from unittest.mock import patch

from mymodule import archive_and_upload

@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload(*args, **kwargs)

    boto3.resource().Object().upload_file.assert_called_with('/tmp/test.zip')

Asserting that the mock file uploader was called with the correct argument is, in this case, preferable to posting data to S3. It's fast. It leaves no artifacts to remove.
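The call recording described above can be tried without boto3 or pytest installed. This is a minimal sketch, not the module under test: upload_archive is a hypothetical stand-in for the upload step of archive_and_upload, and a bare MagicMock plays the role that patch('mymodule.boto3') would supply. It reaches child mocks through return_value, which avoids recording extra calls while making assertions.

```python
from unittest.mock import MagicMock


def upload_archive(boto3, path):
    """Hypothetical stand-in for the upload step inside archive_and_upload."""
    boto3.resource('s3').Object('mybucket', 'mykey').upload_file(path)


# A bare MagicMock plays the role that patch('mymodule.boto3') would supply.
fake_boto3 = MagicMock()
upload_archive(fake_boto3, '/tmp/test.zip')

# return_value is the mock that a call yields. Reaching it this way does not
# record another call, unlike writing fake_boto3.resource() in the test.
s3_object = fake_boto3.resource.return_value.Object.return_value

# Each link in the chain recorded exactly one call with these arguments.
fake_boto3.resource.assert_called_once_with('s3')
fake_boto3.resource.return_value.Object.assert_called_once_with(
    'mybucket', 'mykey')
s3_object.upload_file.assert_called_once_with('/tmp/test.zip')
```

Using assert_called_once_with instead of assert_called_with is a slightly stricter choice: it also fails if the uploader was called more than once.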

Look out for bugs like the one in the test below. Can you find it?

from unittest.mock import patch

from mymodule import archive_and_upload

@patch('mymodule.boto3')
def test_archive_and_upload(boto3):
    """Data is archived, uploaded, and the floor is swept"""
    archive_and_upload(*args, **kwargs)

    boto3.resource().Object().upload_file().assert_called_with('/tmp/test.zip')

Because all mock methods and properties yield more mocks, it can be hard to figure out why boto3.resource().Object().upload_file() is never called with the expected arguments, even when you're certain the arguments are right. Extra, and bogus, parentheses after upload_file cost me 15 minutes of head scratching earlier this morning.
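Here is a small sketch of why the extra parentheses fail, again using a bare MagicMock in place of the patched boto3 module (the variable names are mine, for illustration only). Calling upload_file() in the assertion yields upload_file.return_value, a different mock that the code under test never called, so the assertion on it raises AssertionError.

```python
from unittest.mock import MagicMock

boto3 = MagicMock()

# What the code under test does.
boto3.resource('s3').Object('mybucket', 'mykey').upload_file('/tmp/test.zip')

upload_file = boto3.resource.return_value.Object.return_value.upload_file

# Correct: assert on the upload_file mock itself.
upload_file.assert_called_with('/tmp/test.zip')

# Buggy: upload_file() returns a new child mock that was never called,
# so asserting on it fails no matter what arguments we expect.
try:
    upload_file().assert_called_with('/tmp/test.zip')
except AssertionError:
    bogus_failed = True
else:
    bogus_failed = False

print(bogus_failed)
```

Note that the buggy line also records a second, argument-free call on upload_file, which is one more reason to keep stray parentheses out of assertions.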

Minutemen Live at Brett's Party

I can't get over the amazing video time capsule that is Minutemen playing in a backyard in Rancho Palos Verdes in June 1985. From the uploader:

It was 1985. Brett & my bdays are in June. We had both just graduated from college. We had a birthday/graduation party at his mom's house on the outskirts of San Pedro. The Minutemen played. It was a great day.

Many of the 6258 views of this video are mine.

Mercantile 0.11.0

As I mentioned the other day, my team is in Fort Collins this week for a sprint. Damon Burgett and I were sitting together working yesterday and he said, "I keep looking for the inverse of the xy function in mercantile, but it's never there." It's true, the module was missing a function to convert web mercator x and y to longitude and latitude. He wrote it, along with some tests that numbers round trip properly through xy and lnglat, then made a pull request. I merged it, tagged it, and pushed to GitHub. A minute later Travis-CI had uploaded mercantile 0.11.0 to the Python Package Index and we were pulling it back into our sprint work through an updated pip requirements file. I love how frictionless Python development and packaging can be now.
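For readers curious what such a round trip looks like, here is a minimal sketch using the standard spherical web mercator formulas. It is not mercantile's code: mercator_xy and mercator_lnglat are hypothetical names chosen for illustration, and the sample coordinates are only a rough location for Fort Collins.

```python
import math

# Radius of the sphere used by web mercator (EPSG:3857), in meters.
R = 6378137.0


def mercator_xy(lng, lat):
    """Hypothetical forward transform: degrees to web mercator meters."""
    x = R * math.radians(lng)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def mercator_lnglat(x, y):
    """Hypothetical inverse transform: web mercator meters to degrees."""
    lng = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lng, lat


# Round trip a point near Fort Collins and compare with a small tolerance.
lng, lat = -105.08, 40.59
lng2, lat2 = mercator_lnglat(*mercator_xy(lng, lat))
assert math.isclose(lng, lng2, abs_tol=1e-9)
assert math.isclose(lat, lat2, abs_tol=1e-9)
```

Comparing with a tolerance rather than exact equality matters here, because the logarithm and exponential do not invert each other exactly in floating point.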

I've got limited time for Rasterio and GDAL issues this week. I can catch up a bit in the evenings, but I must prioritize getting extra rest before Saturday's race. Apologies if I don't respond until Monday.

One More Week of Running

Next week is the 18th and final week of my training for the Blue Sky Marathon, an all-dirt, 90% singletrack course in the foothills west of Fort Collins. The race is on Saturday. I'm aiming to finish in less than 5:30, a 12 minute-per-mile pace.

I've run more miles in training than I did for my previous marathon: over 500 by Saturday morning. I've run over 20 miles three times and a little less than 20 on the race course once. I've run more hills than I did in training for the Trail Quillan in March. Barring an accident, I'm going to finish. With luck, I may finish with a respectable time.

The weather forecast for this week is beautiful: sunny, dry, and mild. The latest forecast discussion from the Denver/Boulder NWS office has a change coming for Saturday: a high of about 60 and a chance of showers. I'd rather have dry and 70, but am relieved that the trails will be dry and snow gear won't be needed.

This is the one-year anniversary of my first race in France: the Trail des Calades. The 4th edition of the race was run earlier today. I saw photos online and it looked like a great day in Saint-Jean-de-Cuculles.

Because of my participation in the Blue Sky Marathon, I won't be at the State of the Map in Boulder on Saturday like much of Mapbox and many of the other mapping folks in Fort Collins. I will, however, get to see my own Mapbox team: they're coming here to work with Matt Perry and me all week. It's the first time we've come together as a team outside of DC or San Francisco, and my first chance to hang in person with Vincent Sarago! I heard some French on the A trail today while I was doing my last longish training run and did a double take, but it wasn't Vincent.

Geodata in the Cloud

This past week there was a flurry of blog posts about deploying and accessing geospatial data in "the cloud." Yes, I'm still putting scare quotes around "the cloud" in 2017.

I wrote a post specifically about Rasterio and datasets on S3 last December. Shortly before that, Chris Henrick wrote a great post about preparing data to be deployed on S3 for use with GDAL and Rasterio.

As far as I can tell, the birth announcement for geodata in the cloud came 7 years ago in "VSI Curl Support" by Christopher Schmidt.

It's remarkable that the authors of HTTP/1.1 foresaw this kind of application in 1999: https://tools.ietf.org/html/rfc2616#section-14.16.

Current status

I've been neglecting my open source projects this summer due to time constraints. At Mapbox we've been onboarding a new President and COO and a raft of new employees, plus a manager for my team. These are hugely positive developments, but they have also been a big lift. Moving back to Colorado and bringing our lives here out of suspension has similarly taken all of my personal time. I've been the blocker for new releases of Fiona, Rasterio, and Shapely all summer long and have been feeling rather guilty about it.

Things are looking up now. My kids are back in school and have seen their doctor and dentist. Their schedule of soccer practices and other activities for the season is getting settled quickly. I'm resuming weekly yoga and gym workouts along with my existing running schedule. I like having a weekly routine; it helps me stay relaxed and gives me time for personal projects like writing and computering.

Open source continues to be a big part of my job and as my share of the onboarding lift eases I've been able to increase my time on Fiona, Rasterio, and Shapely. I released the long overdue Shapely 1.6.0 and have supported the GeoPandas team on getting 0.3.0 out. Fiona 1.7.9 was the first bug fix release of that project since June and I'm happy to have that in user hands. This week I'm working on coding and writing about the upcoming Rasterio release. I feel like I'm doing a good job as an open source maintainer and mentor again and am excited about what I'll be able to do in the next few months.

None of these projects would be viable without the help of other developers. To my open source collaborators: thanks for hanging in there and being patient with me!

Blue Sky Marathon Recon

Last Sunday I ran the first half of the Blue Sky Marathon route to refamiliarize myself with the trails, gauge my fitness, and try out some new gear.

https://c1.staticflickr.com/5/4375/36359335070_71d42ccd7a_b.jpg

Towers Trail

I don't yet feel completely acclimated to living at 5000 feet. The first mile of Towers Trail (cutting across the image above) at Horsetooth Mountain Park has an average grade of 12% and I had to fast walk much of it to keep my heart rate down.

https://c1.staticflickr.com/5/4344/36359338080_9a8524a798_b.jpg

Horsetooth Rock from Carey Springs Trail

At the top the trail is gentler and rolling and I had no difficulties. Towers Trail is a 4WD road, but 90% of the race route is singletrack. The rock and dirt are rose-colored granite on the heights and brick-red sandstone below.

I enjoyed trading my large CamelBak pack for a smaller and lighter vest. The marathon has 7 stations and I'm certain that I can get along with less than a liter of water between them. My new shorts with built-in boxer briefs are perfectly comfortable. I tried on a pair of Salomon Speedcross 4 shoes at the store to see what the hype was about but didn't buy them. I liked the fit, but I don't think I'm going to be running any local races that are steep enough or sloppy enough to warrant that kind of traction. They'd be great for French-style trails, no doubt about it. My NB Leadville shoes are a good match for the relatively tame Blue Sky and Horsetooth trails.

The drive to the trailhead reminded me of some similarities between Fort Collins and Montpellier. Each city sits between plain and mountains. Fort Collins is located where the Great Plains of North America meet the Rocky Mountains, and Montpellier is between the Languedoc coastal plain and the Massif Central. Each city has its own local peak: Horsetooth Rock above Fort Collins and the Pic Saint-Loup at Montpellier. I'd love to host folks from Montpellier here and ask if they have any of the same impressions.

Shapely 1.6.0

On behalf of the Shapely project, I'm pleased to announce a new minor release.

Shapely 1.6.0 adds several new attributes to existing geometry classes and new split() and polylabel() functions to the shapely.ops module. Exceptions have been consolidated in a shapely.errors module and logging practices have been improved. Shapely's optional features depending on Numpy are now gathered into a requirements set named "vectorized" and these may be installed by running pip install shapely[vectorized].

Much of the work on 1.6.0 was aimed at improving the project's build and packaging scripts and minimizing runtime requirements. Shapely now vendorizes packaging to use during builds only and never attempts to invoke the geos-config utility during import of the module.

Another big change for the project is that the documentation and manual are now hosted at Read the Docs: https://shapely.readthedocs.io/en/latest/.

Thank you all for using, promoting, and contributing (48 of us now!) to the Shapely project. The full change log can be found here.

Share and enjoy.

There and Back Again

My kids and I left our rental house in Montpellier for the last time at 5:00 a.m. on Tuesday and arrived in Fort Collins, Colorado, on Tuesday evening, a little over 21 hours later. We found our house and garden in great condition and found that friends had kindly done a little shopping for us. Ruth and our dog were scheduled to come in on Wednesday, but heat and other snafus delayed them until Saturday. She left for Seattle today and I'm solo parenting again all this week before I go to San Francisco for work next week. It's a little chaotic here with work, camp, birthdays, dentist appointments, and other deferred business, but less so than in the week before our flight. Thanks in part to earlier-than-usual rising, I haven't fallen too far behind in my running and did 20 miles this weekend. I'm relieved to be here and am happy to see friends, run along the river, ride my bike, and go out for real tacos with my kids.