5 questions for Matt Biddulph

Matt Biddulph

Matt Biddulph takes lovely photos. And apparently when he isn’t taking photos he find times to be the founder and CTO of Dopplr. Which makes us happy, because we’re big Dopplr fans. If Flickr is the site where photos become social objects, Dopplr is that site for travel. Except with personal informatics, and positionally dependent rainbows. (we have a well documented weakness for rainbows)

Dopplr is consistently one of the most interesting projects tying together the loosely joined datastreams into a cohesive whole, and Matt’s meditations on how to do that are not to be missed.

Dopplr personal informatics coffeecup

1. What are you currently building that integrates with Flickr, or a past favorite that you think is cool, neat, popular and worth telling folks about? Or both.

Matt: I’m the CTO at Dopplr, and we use Flickr all over our site to integrate our users’ travel pictures and illustrate places in our “Social Atlas”. We’re particularly happy with Flickr’s recent integration of Dopplr places into their automatic machine tag handling

This year we’ve been radically upgrading our city information pages (e.g. http://dplr.it/san-francisco) and Flickr photos have been an important part of that. Designer Matt Jones came up with a combination of rich information visualisation and beautiful images, combining sparklines with photos of cities that we source from Flickr’s geotagged Creative-Commons licensed contributions.

The Flickr API is key to how these pages are made. Although ‘interestingness’ is a powerful sorting metric, we found that just asking for the most interesting picture of a city didn’t always give us the right aesthetic (just as Dan Catt’s pandas occasionally show a preference for bikinis). Instead, our developer Tom Insam built a workflow tool that shows site admins a list of the top pictures per place. It displays a preview of a sample data visualisation overlay, and limits the pictures to those licensed for derivative commercial works.

An admin can quickly work through our most popular cities, choosing pictures and a few design settings (such as whether to use black or white text for best contrast). Once a month, a script runs through our chosen images, regenerating this month’s data viz and removing any photos whose license has changed since our last check. This happens more often than we expected.

2. What are the best tricks or tips you’ve learned working with the Flickr API?

Matt: To make our city page Flickr integration possible, we had to correlate our city database with Flickr’s geotagging data. Our data is sourced from GeoNames and augmented by us and our users. Luckily Flickr provides a handy reverse-geocoder that maps a latitude/longitude point to an identifier. As a bonus, Flickr now uses the same “WOE IDs” as Yahoo’s Geo APIs, meaning that this correlation between our IDs and theirs can be used to lookup other rich geo data elsewhere on our site. We also started supporting WOE IDs in our own API in return, and we were very happy when Flickr started linking to our city pages from their own place pages.

Another tip for integrating Flickr API calls into web pages is to consider whether you can achieve a feature with purely client-side code. Rather than have our webserver tied up waiting for an HTTP request to return from Flickr, in certain places we serve up jQuery code to perform Flickr queries from the clientside, inserting the results into the page via ajax.

3. As a Flickr developer what would you like to see Flickr do more of and why?

Matt: I’m in complete agreement with Kellan — that realtime APIs are going to be big. For example, I want the ability to register an interest in something (“the latest CC-licensed photos of London”) and get the data pushed to me whenever new results are ready. This might be implemented using Web Hooks, like when github makes an HTTP POST for you whenever you push into a repository. It might be over XMPP, as used by Fire Eagle, when services register for publish-subscribe notifications of new location pings. The choice of technology is less important than the new possibilities (and design challenges) of the realtime web.

4. What excites you about Flickr and hacking? What do you think you’ll build next or would like someone else to build so you don’t have to?

Matt: The best thing about Flickr for hacking is the sheer amount of data that the community collects. The recent release of shapefiles derived from tags and geotagging is very exciting and has huge possibilities for geo mashups. I’m looking forward to firing up a Maps From Scratch instance and figuring out how to combine it with data from our service. There are some screenshots of my early experiments using Flickr’s clustr tool on our Social Atlas data at http://www.flickr.com/photos/mbiddulph/tags/clustr/

5. Besides your own, what Flickr projects and hacks do you use on a regular basis? Who should we interview next?

Matt: I nominate Dave Beckett, author of the C library and commandline tool Flickcurl because I use the commandline tool as a quick interface to Flickr data all the time. As a devotee of dynamic languages, I’m interested to hear about what it’s like to program the web in C.

horse=yes

a map Jason Bourne could eat cake on

Yes, I got a bit emotional at the third OpenStreetMap conference, held in the CCC, Amsterdam last weekend — mainly because this globe we are on is the only one we know — we really are mapping our universe, doing it our way. Creating the world we want to live in. I thought it worth while to say “Thanks” to some people. Being British, the feeling of being a bit foolish stopped me from being too effusive!

Tim Waters

A couple weeks ago I had the pleasure of attending, and the privilege of speaking at, the State of the Map conference in Amsterdam. I told the story of how we came to use Open Street Maps (OSM), how it works on the backend and talked a little bit about what we’d like to do next: Moving beyond “bags of tiles”, a better way to keep up to date with changes to the OSM database and, for good measure, a little bit of tree-hugging at the end.

Most of all, though, I wanted to take the opportunity to thank the OSM community. To thank them for making Flickr, the thing that we care about and work on all day, better. To thank them for proving the nay-sayers wrong.

To say that OSM started with an audacious plan (to map the entire world by “hand” one neighbourhood and one person at a time) would be an understatement. You would have been forgiven, at the time, for laughing.

And yet, in a few short years they are well on their way having nurtured both a community of users and an infrastructure of tools that makes it hard to ever imagine a world without Open Street Maps. In the U.K. alone, as Muki Haklay demonstrated, they have produced a free and open dataset whose coverage and fidelity rivals those created by the Ordinance Survey with its government funding and 250-year head start.

That is really exciting both because of the opportunities that such a rich and comprehensive dataset provide but also because it proves what is possible. The Internets are still a pretty great place that way.

mugs

photo by russelldavies

There were too many excellent talks to list them all, but here’s a short (ish) list that betrays some of my interests and biases:

  • Harry Wood’s talk on tagging in OSM. I actually missed this talk and after seeing the slides I am doubly disappointed. Open Street Map is not just the raw geographic data that people collect but also all the metadata that is used to describe it. OSM uses a simple tagging system for recording “map features” and Harry’s talk on managing the chaos, navigating the disputes and juggling the possibilities looked like it was really interesting.

    (The title of this post is, in fact, a gentle poke at the black sheep of the OSM tagging world. There really are map features tagged “horse=yes” which is mostly hilarious until you remember how much has been accomplished with a framework that allows for tags like that.)

  • The Sunday afternoon maps-and-history love-fest that included Frankie Roberto’s “Mapping History on Open Street Map“, Tim Water’s “Open Historical Maps” and the Dutch Nationaal Archief’s presenting “MapIt 1418“, a project to allow users to add suggested locations for their photos in the Flickr Commons!

    Tim’s been doing work for The New York Public Library, another Flickr Commons member, and MapWarper (the code that powers the NYPL’s historical map rectifier) is an open source project and available on GitHub.

  • Mikel Maron’s “Free and Open Palestine” (the slides are here but you should really watch the video) which is an amazing story of collecting map data in the West Bank and Gaza.

    Mikel was also instrumental in creating a scholarship program to pay the travel and lodging expenses for 15 members from the OSM community, from all over the world, to attend the conference. Because he’s kind of awesome, that way.

But that’s just me. I’d encourage you to spend some time poking around all the other presentations that are available online:

Despite the “bag of tiles” approach for using OSM on Flickr getting a bit old it still works so as of right here, right now:

We’ve also refreshed the tiles for Beijing and Tehran where, I’m told, the OSM community has added twice as much data since we first started showing (OSM) maps a month ago!

If it sometimes seems like we’re doing all of this in a bit of an ad hoc fashion that’s because we (mostly) are. How and when and where are all details we need to work out going forward but, in the meantime, we have map tiles where there were none before so it can’t be all bad.

Finally, because the actual decision to attend the conference was so last minute I did not get the memo to all presenters to include a funny picture of SteveC (one of the original founders of Open Street Maps) in their slides.

To make up for that omission, I leave you now with the one-and-only Steve Coast.

Steve *loves* Yahoo

photo by Andy Hume

Matthias Gansrigler: On flickery, and making your API apps fly

We meant to take a short break in our developer interview series. Well we ended up taking a very long short break. Sorry about that. Thankfully we’re coming back with a really great interview from the developer behind flickery, a full featured, and fast Flickr desktop client. And he is talking about super important topics: optimization, performance and caching.

Part detective story, part day-in-the-life-of-a-working developer, Matthias’ nuts and bolts discussion of how to take full advantage of the Flickr API without slowing down your app is a must read.

Hello there.

My name is Matthias Gansrigler, I’m the founder of Eternal Storms Software, responsible for donationware like GimmeSomeTune or PresentYourApps for Mac.

My newest creation and the reason I’m writing here, however – and first shareware application, might I add – is flickery, a Flickr desktop client for Mac OS X 10.5 Leopard!

With flickery you can easily upload your photos to Flickr, manage your sets and favorites, view your contacts’ photos, search for photos in Flickr’s data, comment, view the most “interesting” pictures and much more. All in one tiny, yet powerful application.

I was kindly invited by the great guys behind Flickr to write about my experiences optimizing calls to the Flickr API.

But first, a little background story:

I was born in 1986 in… well, uhm, that’s maybe a little too far back. Let’s start in 2008. That’s more like it.

In February 2008, I began flirting with the idea of writing a full-featured desktop client for Flickr for Mac. But really, I just wanted to toy around with the Flickr API since I’d seen so many web-applications making use of it and had heard good things of the API in general.

“like a kid in a toys-store”

Soon thereafter, I had tremendous results in the shortest amount of time. I felt like a kid in a toys-store – completely overwhelmed (and way in over my head, as it later turned out). I did not worry very much about how many calls I made to the API. My main interest was in getting a product ready. So it took me about three months (until mid-May 2008) to ship my first public beta. I had implemented lots of things, like commenting on photos, adding photos to favorites, different views of photos, etc. It was a great first public beta, feature-wise. But under the hood, it wasn’t that great. It was sluggish, slow, leaked tons of memory and made an awful lot of calls to the Flickr API. And when I say an awful lot, I mean a hell of a lot. Flickr’s servers were smoking (well, I doubt that, because I think then I would have heard from their lawyers).

“30 QPS”

With just about 1,000 registered users, flickery made more than 30 QPS (queries per second) to the Flickr API. 30 QPS. Can you imagine? How come, you might ask. Just try it, it’s easy (no, seriously, don’t!). I made calls for everything. I even made calls in advance. But let me elaborate.

Take flickery’s main screen. You can see 30 items in the thumbnail view. For every item in that list, I made a getInfo() and getSizes() call for later use (should the user want to view the photo bigger or view it’s description). Additionally, whenever a user selected one of the items, flickery would call out to the API again asking if it was allowed to download that photo. So that’s 1 call for the 30 photos, 30 calls for getInfo and 30 calls for getSizes. That alone is 61 calls for basically ONE user-interaction (clicking on Newest). That’s not right. Should a user go through the entire list and select every item, that’s 30 more calls (totalling at 91). That’s even not righter.

If you want to get your API key shut down (which is what happened to that first public beta of flickery) – that’s absolutely the way to do it. No doubt about it.

Obviously, the really hard work started here – retaining the same feature-set, the same interface and the same comfort, yet making far, far less calls to the API. What also starts here is the really technical gibberish. So get out your nerd-english dictionaries and let’s get to it.

“really hard work”

I basically had all the features in there for a 1.0 release. But I couldn’t release it until it made a proper amount of calls to the API. So where I started, obviously, was at removing the “in-advance” calls to the API, like getInfo and getSizes. They just don’t get called in advance anymore, only on demand. So if a user selects a photo and clicks on “More info”, then the getInfo call is made. No sooner. I chose a different approach for getSizes, which I eliminated entirely (except when downloading a photo). The photo gets loaded in thumbnail-size (which is always there). A double click on the photo loads the medium sized image, which is always there, as well. When you want to view the photo in fullscreen mode, there’s two possibilities. If the call to the API returned a o_dims attribute, we load the original size (since that’s returned in the one call to the API retrieving all the 30 photos). Should there be no such attribute, I just load the medium size and am done with it. So for viewing photos, whatever size, there’s no extra calls made. So from 61 calls, we trimmed that down to 1 call to retrieve the 30 photos. Also, when an item is selected, there is no call made anymore. Only when the user selects “Download”, flickery asks if it is allowed to download that photo and if so, makes a getSizes call to download the largest possible version.

“retrieving the photolist”

Now, for retrieving the photolist. In the first public beta, I made one call for every 30 images I wanted to retrieve. That’s kind of logical. However, that would mean that we would have to make 16 calls to view 480 items. We can trim that down. Flickr allows you to retrieve 500 items at once with each call. So that’s basically what I did. I wrote my own system that internally loaded 480 items, but only returned the 30 items of the page that were requested in the interface. So for viewing 480 photos, we now only need one call, which is like one 16th of the calls we made before. It has a little downside, though – the initial call takes a little longer for results to come back (since it is fetching more items at once from Flickr’s servers). But on the other hand, once those 480 items are retrieved, subsequent paging through those 16 pages is instant.

Adding photos to favorites, sets and groups was the next step. Let’s stay at favorites, because basically, it’s the same procedure for all three of them. In the first public beta, a user could add one photo to their favorites over and over again, and flickery would dutifully send that call to the API. Of course, completely unnecessarily. So I just cached what was added to the favorites and just checked if we already added the photo to the favorites, which, as it turned out, saved lots of highly unnecessary calls.

I also took the liberty to implement some of Flickr’s rules right into flickery, meaning the following: If you’re not the admin of a group, you can only remove your own photos from it. If you’re the admin, you can remove all of them. So instead of flickery making a call to the API when the user wants to remove a photo and they’re not the admin, the application itself tells them they can’t do that. A lot of calls are saved this way.

“caching”

Another big step in making less calls to the API was caching, naturally. In the first public beta, nothing except the frob for talking to the API was saved over application launches. So every time the user launched the application, the authorization token, the Newest photos, the user’s sets and contacts were loaded. All that has changed and they are now saved over restarts for different time intervals. Newest photos for one hour, sets and contacts forever (albeit, the user can reload photosets groups and contacts (once an hour) should they ever be out of sync). The same goes for the contents of photosets, groups, favorites and own photos. To remove some other calls, if there’s a cache already in place for, say, a photoset, I don’t delete that cache should a new photo be added to it, but update it. So, the call is made to the API to add the photo to the set, obviously, but to then view the set (with the newly added photo in it), I don’t reload the photoset but just update the existing cache. Another example here is comments. When adding your own comment to a photo, flickery doesn’t reload the entire comment-list, but just adds the new comment to the already existing cache.

“extras”

Specifying options in the “extras” parameter can save you lots of calls as well. (e.d. see our post on the standard photos response for more on extras) Instead of having to make another call to retrieve certain information about an item, you can have all kinds of information returned in the initial response. Say, you want the owner’s name of items. Instead of having to call flickr.photos.getInfo on every item, you just specify the owner_name option in the “extras” parameter and immediately have the owner’s name for every item of your query. This is really handy and saves lots of time for the users and lots of calls for you.

As for the Flickr API – it’s really great and easy to use. However, there are some smaller things I’d love to see:
Being able to specify more than 1 photoid for adding to / removing from favorites, photosets and groups – resulting in far less calls to the API, the upload response returning not only the photoid for the uploaded image, but also the farm and server, official support for collections, adding / removing contacts, support for retrieving videos, maybe even some calls for flickrMail. Some more “extras” options would be great, too, like can_download, for example, which is currently only retrievable through flickr.photos.getInfo.

Other than that, the Flickr API let’s you really do everything you could imagine. There’s almost no restriction to what you can do with it!

Closing, I’d just like to thank the wizards at Flickr for the opportunity to write about optimizing here and working so closely with me on flickery. Here’s to more great applications using the Flickr API!

extra:extra=extra

Arm Horns (The Hair Web)

This is Eric. We loves him!

Internally, the nomenclature for tags goes something like this: There are “raw” tags (the actual tag you enter on a photo), “clean” tags (the tag that you see in a URL), “machine tags” (things like upcoming:event=2413636) and machine tag “extras”.

Machine tag “extras” are what we call the entire process of using a machine tag as a kind of foreign key to access data stored on another website. Small pieces (of data) loosely joined (by the Internets).

For example if you tagged a photo with upcoming:event=2413636 that would cause a robot squirrel on the Flickr servers to call the robot squirrels running the Upcoming API and ask for the name of the Upcoming event with ID 2413636.

Upcoming then answers back and says: That event was called “Flickr Turns 5.25″ and we store the title in our database. The next time you load that photo we’ll show a little Upcoming icon and the name of the event in the sidebar.

To date, we’ve only had machine tags “extras” available for upcoming:event= and lastfm:event= tags but starting today we’re adding support for three new projects: Dopplr, Open Plaques and the Open Library.

dopplr:(eat|stay|explore)=

Dopplr is a social travel site which recently launched a Social Atlas to allow their users to create and share lists of interesting places, in the cities they know about, of where to eat and stay and poke around during their visit.

“Over time, we can anonymise and aggregate all the recommendations that have been added to Dopplr. This is the Social Atlas itself, something that’s greater than the sum of its parts: a kind of world map representing the combined wisdom of smart travellers. It’s early days still, but we are very excited by its potential.”

Which is pretty exciting, especially when you think about how many pictures of delicious food people upload to Flickr!

Dopplr/Flickr machine-tagging

photo by moleitau

You can add Social Atlas machine tags to your photos by tagging them with either "dopplr:eat=", "dopplr:stay=" or "dopplr:explore=" followed by the short-code for that place.

For example, dopplr:eat=tp71.

Dopplr's closed the loop: Machine-tagged flickr pix on their 'Social Atlas'

photo by moleitau

As an added bonus every single page in the Dopplr Social Atlas displays the complete machine tag you need to tag your photos with so you can just copy and paste the tag from one page into the other and your photos will be updated like magic!

openplaques:id=

The Open Plaques website is a community-run website set up to catalogue and document the many blue plaques that are hung across the UK to commemorate persons and famous events.

Frankie Roberto, one of the people behind the project has written often about the project, and the motivations behind it so rather than try to paraphrase I will just quote him (at length):

“With these in mind, I was thinking how this kind of ‘mobile learning’ might apply to the heritage sector, and as you might have guessed from the title, thought of blue plaques. You see them everywhere — especially when sat on the top deck of a double decker bus in London — and yet the plaques themselves never seem that revealing. You’ve often never heard of the person named, or perhaps only vaguely, and the only clue you’re given is something like “scientist and electrical engineer” (Sir Ambrose Fleming) or “landscape gardener” (Charles Bridgeman). I always want to know more. Who are these people, what’s the story about them, and why are they considered important enough for their home to be commemorated?”

Getting information about blue plaques on your mobile phone…

Picture 7

“The final step towards making this more compelling was to add some photographs. Here, Flickr came to our rescue. There was already a ‘blue plaques’ group, which contained hundreds of photos. To link them together, I used special tags called ‘machine tags’, which are like normal tags except that they contain some slightly more structured data. It’s very simple though — each plaque on the Open Plaques website has an ID number (which can be found at the end of the URL), and the corresponding machine tag for that plaque is openplaques:id=999 (where 999 is the ID number). Another script then uses the Flickr API to find all the photos tagged with a relevant machine tag, checks to see if they are Creative Commons licenced, and then to displays them on the Open Plaques website, with a credit and a link back to the Flickr photo page.”

Open Plaques project update

So, we did the same! If you have an openplaques:id= machine tag on your photo then we’ll try to look up and display the inscription for that plaque.

You can add Open Plaques machine tags to your photos by tagging them with "openplaques:id=" followed by the numeric ID for a specific plaque.

For example, openplaques:id=1633.

openlibrary:id=

The Open Library is a part of the Internet Archive whose mission is to create a “web page for every book ever published.” To do that they’re hoping that anyone and everyone will participate and help by adding information they have a published work or a particular edition.

“After almost fifty years of computerizing everything, we’re realising now that the stories have gone, and we need them back — the handicraft, the boutique, the beauty, the dragons, the colour of stories. I’m reminded of the gorgeous mysterious early maps of the Australian coast. The explorer only got so far, and the cartographer could only draw so much. Much more exciting than boring old satellite, top-of-a-pin’s-head accuracy! I love the idea of trying to catch some of these dog-eared tales within Open Library.”

George Oates

As it happens Flickr users have created over 900 groups about book covers and a casual search for the phrase (“book covers”) returns 98, 000 photos!

Back in July of 2007 Johnson Cameraface uploaded a photo of the cover of ROBOTS Spaceships & Other Tin Toys”. Two years later, George asked if it would be alright to use the photo to update the Open Library record for the book, and added an openlibrary machine tag along the way.

Now, starting today, the photo page now displays the title of the book and links back to the Open Library!

ol

This makes me happy.

You can add Open Library machine tags to your photos by tagging them with "openlibrary:id=" followed by the unique identifier for that book.

For example, openlibrary:id=OL5853184M.

It’s worth noting that the unique identifiers for Open Library books are sometimes a bit of a treasure hunt; they are the letters and numbers that come after openlibrary.org/b/ and before the book title in the URL for that book. Like this:

http://openlibrary.org/b/OL5853184M/Soviet-science-fiction

But wait! There’s more!!

Approaching zero

Did I mention that we have over one million photos tagged with Last.fm event machine tags? That makes it kind of hard to know when new machine tags have been added because lopping over all those tags just to find recent ones is expensive and time-consuming.

To help address this problem we’ve add a shiney new API method called:

flickr.machinetags.getRecentValues

This does pretty much what it sounds like. Given a namespace or a predicate (or both) and a Unix timestamp, the method returns the values for those machine tags that have been added since the date specified.

Enjoy!