code.flickr.com

The Code Behind Geofences

We launched geofences earlier this week. I want to give you a glimpse into some of the code that runs the feature.

Geofences are defined by three things: a coordinate pair, a radius, and a privacy level. We store the coordinates (ie, decimal latitude/longitude) in our database as integers, the radius as an integer representing meters, and the privacy level as an integer that maps to a constant in our code.

When an image is geotagged, we have to determine which geofences the point falls within. We do some complex privacy calculations that Trevor talks about in his introductory post.

First, we fetch all of your geofences from the database and then loop through each one, determining if the photo falls in any geofences. We limit the number of geofences you can create to ten, so running a point through 10 calculations is not that expensive.

We use the great-circle distance formula to figure out how far the image is from the center of a geofence. If this distance is less than the radius of the geofence, the geofence applies.

MySQL is a Great Hammer

When you create a geofence and want to apply it to existing photos, the backfill is a more involved process. In order to grab just the photos we care about for a geofence, we have to limit the number of photos that we select due to performance.

As mentioned, we store the latitude and longitude as integers in MySQL. Since radial queries are next to impossible with this setup, we do a first pass with a bounding box formula that encompasses the geofence and then use our great-circle distance formula to cull the photos that don’t fall in the geofence.

The bounding box formula we use takes into consideration geographic gotchas, like the poles and the 180 degree discontinuity (ie, the International Date Line). Since a geofence could overlap the International Date Line, we have to modify the DB query in doing the longitude part of our query since doing a simple BETWEEN won’t work. When the bounding box doesn’t cross the IDL, we can do “SELECT id FROM Table WHERE longitude BETWEEN $lon1 and $lon2” and when it does we do a “SELECT id FROM Table WHERE longitude < $west_lon AND longitude > $east_lon”.

In short, we use a query that MySQL is a good at it to limit our initial dataset and then use a little post-processesing of the data to get the points we actually care about for the geofence.

After we have this bundle of photos that the geofence backfill applies to, we then run our normal geoprivacy calculations, lowering the privacy if the combination of fences calls for it.

Concurrency, or the world keeps turning

The introduction of the geofences backfill pane also introduced some concurrency issues, which were fun to deal with.

The contract that we wanted to create for the user was that the calculated privacy that they saw in the preview pane for the backfill would exactly match the result of running the backfill. This seemed intuitive but the implementation was tricky.

There are two things that can affect a photo’s geoprivacy: the user’s default privacy and the geofences that exist. These things can change through time, that is between the time that a user hits ‘apply’ on the preview pane and when the backfill finishes running.

Whenever you deal with mutable state through the lifetime of a process, you have to change how you treat this state. Who can modify it? What can they modify? And what processes see this updated object?

The easiest way to deal with mutable state is to make it immutable. We do this by storing the state of the user’s geoprivacy world (that is, the default geoprivacy and the geofences) alongside the backfill task, so that when the task runs it uses this state instead of querying the DB, which is mutable, possibly having changed since the user hit the ‘apply’ button.

Storing this state ensures that regardless of how the user modifies their geoprivacy settings, the result of the backfill will match exactly what they saw in the preview pane, providing a consistent view of the world.

We also added the restriction of only allowing one backfill task per user to be running at a time, which simplified our bookeeping, our mental model of the problem, and the user’s expecation of what happens.

Lessons Learned

Geo, by itself, is not easy. The math is complex (especially for someone like me a few years out of school). Privacy is even harder. Since the project evolved quite a bit while we were building it, having a large amount of automated testing around the privacy rules gave us the confidence that we could go forward with new privacy demands without introducing bad stuff into code that was “done”. We learned to give ourselves plenty of time to get all this stuff right, since geo has so many edge cases and violating privacy is an absolute no-no.

During the project, we had to keep balance between our users’ privacy and their expectations, all while keeping a complex and new feature understandable and even fun to play with. The only way we could do this was being open and honest between engineering, design and the product team about what we were building. This was a total team effort from Flickr, and we’re very proud of the end result and the control that it gives our users over their presence on the Internet.

Also, if you’re interested in working on fun projects like this one, we’re hiring!