Introducing yakbak: Record and playback HTTP interactions in NodeJS

Did you know that the new Front End of www.flickr.com is one big Flickr API client? Writing a client for an existing API or service can be a lot of fun, but decoupling and testing that client can be quite tricky. There are many different approaches to taking the backing service out of the equation when it comes to writing tests for client code. Today we’ll discuss the pros and cons of some of these approaches, describe how the Flickr Front End team tests service-dependent libraries, and introduce you to our new NodeJS HTTP playback module: yakbak!

Scenario: Testing a Flickr API Client

Let’s jump into some code, shall we? Suppose we’re testing a (very, very simple) photo search API client:

https://gist.github.com/jeremyruppel/fd25c723a5962a49936f174d765aa11a
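The gist isn’t reproduced inline, so here’s a rough stand-in for the kind of client and test we mean. The module layout, the search wrapper, and the environment-variable API key are illustrative assumptions, not the gist’s actual code.

// client.js -- a sketch of a very, very simple photo search client
var request = require('superagent');

exports.search = function (text, done) {
  request
    .get('https://api.flickr.com/services/rest')
    .query({
      method: 'flickr.photos.search',
      api_key: process.env.FLICKR_API_KEY, // assumption: key supplied via env
      text: text,
      format: 'json',
      nojsoncallback: 1
    })
    .end(done); // superagent invokes done(err, res)
};

// test.js -- note that this hits the live Flickr API on every run
var assert = require('assert');
var client = require('./client');

describe('client.search', function () {
  it('responds with a 200', function (done) {
    client.search('kittens', function (err, res) {
      assert.ifError(err);
      assert.equal(res.status, 200);
      done();
    });
  });
});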

Currently, this code will make an HTTP request to the Flickr API on every test run. This is less than desirable for several reasons:

  • UGC is unpredictable. In this test, we’re asserting that the response code is an HTTP 200, but obviously our client code needs to provide the response data to be useful. It’s impossible to write a meaningful and predictable test against live content.
  • Traffic is unpredictable. This photos search API call usually takes ~150ms for simple queries, but a more complex query or a call during peak traffic may take longer.
  • Downtime is unpredictable. Every service has downtime (the term is “four nines,” not “one hundred percent” for a reason), and if your service is down, your client tests will fail.
  • Networks are unpredictable. Have you ever tried coding on a plane? Enough said.

We want our test suite to be consistent, predictable, and fast. We’re also only trying to test our client code, not the API. Let’s take a look at some ways to replace the API with a control, allowing us to predictably test the client code.

Approach 1: Stub the HTTP client methods

We’re using superagent as our HTTP client, so we could use a mocking library like sinon to stub out superagent’s Request methods:

https://gist.github.com/jeremyruppel/8b837f439663db325aaa2437a2259934
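Again standing in for the gist, here’s a sketch of the stubbed test; the canned response is whatever the test author dreams up.

// a sketch of stubbing superagent's Request#end with sinon so that
// no HTTP request is ever made; the faked response is entirely made up
var assert = require('assert');
var sinon = require('sinon');
var request = require('superagent');
var client = require('./client');

describe('client.search', function () {
  beforeEach(function () {
    // every Request#end now yields our fake response immediately
    sinon.stub(request.Request.prototype, 'end')
      .yields(null, { status: 200 });
  });

  afterEach(function () {
    request.Request.prototype.end.restore();
  });

  it('responds with a 200', function (done) {
    client.search('kittens', function (err, res) {
      assert.ifError(err);
      assert.equal(res.status, 200);
      done();
    });
  });
});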

With these changes, we never actually make an HTTP request to the API during a test run. Now our test is predictable, controlled, and it runs crazy fast. However, this approach has some major drawbacks:

  • Tightly coupled with superagent. We’re all up in the client’s implementation details here, so if superagent ever changes their API, we’ll need to correct our tests to match. Likewise, if we ever want to use a different HTTP client, we’ll need to correct our tests as well.
  • Difficult to specify the full HTTP response. Here we’re only specifying the statusCode; what about when we need to specify the body or the headers? Talk about verbose.
  • Not necessarily accurate. We’re trusting the test author to provide a fake response that matches what the actual server would send back. What happens if the API changes the response schema? Some unhappy developer will have to manually update the tests to match reality (probably an intern, let’s be honest).

We’ve at least managed to replace the service with a control in our tests, but we can do (slightly) better.

Approach 2: Mock the NodeJS HTTP module

Every NodeJS HTTP client will eventually delegate to the standard NodeJS http module to perform the network request. This means we can intercept the request at a low level by using a tool like nock:

https://gist.github.com/jeremyruppel/d92a62400f635b42249adc041cdecc96
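A sketch of the same test with nock standing in for the wire; the intercepted path and the permissive query matcher are assumptions about what the client sends.

// a sketch of intercepting the request at the http-module level with nock
var assert = require('assert');
var nock = require('nock');
var client = require('./client');

describe('client.search', function () {
  beforeEach(function () {
    nock('https://api.flickr.com')
      .get('/services/rest')
      .query(true) // match any query string; a stricter test would spell it out
      .reply(200, { stat: 'ok' });
  });

  afterEach(function () {
    nock.cleanAll();
  });

  it('responds with a 200', function (done) {
    client.search('kittens', function (err, res) {
      assert.ifError(err);
      assert.equal(res.status, 200);
      done();
    });
  });
});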

Great! We’re no longer stubbing out superagent and we can still control the HTTP response. This avoids the HTTP client coupling from the previous step, but still has many similar drawbacks:

  • We’re still completely implementation-dependent. If we want to pass a new query string parameter to our service, for example, we’ll also need to add it to the test so that nock will match the request.
  • It’s still laborious to specify the response headers, body, etc.
  • It’s still difficult to make sure the response body always matches reality.

At this point, it’s worth noting that none of these bullet points were an issue back when we were actually making the HTTP request. So, let’s do exactly that (once!).

Approach 3: Record and playback the HTTP interaction

The Ruby community created the excellent VCR gem for recording and replaying HTTP interactions during tests. Recorded HTTP requests exist as “tapes”, which are just files with some sort of format describing the interaction. The basic workflow goes like this:

  1. The client makes an actual HTTP request.
  2. VCR sits in front of the system’s HTTP library and intercepts the request.
  3. If VCR has a tape matching the request, it simply replays the response to the client.
  4. Otherwise, VCR lets the HTTP request through to the service, records the interaction to a new tape on disk and plays it back to the client.

Introducing yakbak

Today we’re open-sourcing yakbak, our take on recording and playing back HTTP interactions in NodeJS. Here’s what our tests look like with a yakbak proxy:

https://gist.github.com/jeremyruppel/7050b34342a10d8e3dd8bc2dba0d50c0

Here we’ve created a standard NodeJS http.Server with our proxy middleware. We’ve also configured our client to point to the proxy server instead of the origin service. Look, no implementation details!
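A hedged reconstruction of that setup (the port and tape directory are arbitrary choices for the sketch):

// yakbak(host, opts) returns a plain request handler, so the proxy
// is just a standard http.Server
var http = require('http');
var yakbak = require('yakbak');

var proxy = http.createServer(yakbak('https://api.flickr.com', {
  dirname: __dirname + '/tapes' // recorded tapes are written here
}));

before(function (done) {
  proxy.listen(4567, done); // assumption: the client under test points at localhost:4567
});

after(function (done) {
  proxy.close(done);
});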

yakbak tries to do things The Node Way™ wherever possible. For example, each yakbak “tape” is actually its own module that simply exports an http.Server handler, which allows us to do some really cool things: since the tape’s hash is based solely on the incoming request, it’s trivial to edit the response however we like, or to create a server that always responds a certain way. We’re also kicking around a handful of enhancements that should make yakbak an even more powerful development tool.
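To make that concrete, here’s the shape of a tape, written by hand for illustration (real tapes are generated, and named after the request hash):

// tapes/<request-hash>.js -- a recorded tape is just a module that
// exports an ordinary http.Server handler replaying the saved response
module.exports = function (req, res) {
  res.statusCode = 200;
  res.setHeader('content-type', 'application/json; charset=utf-8');
  res.end('{"stat":"ok","photos":{"page":1,"pages":1,"photo":[]}}');
};

Because a tape is just code, tweaking a response for an edge-case test is an ordinary edit.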

Thanks to yakbak, we’ve been writing fast, consistent, and reliable tests for our HTTP clients and applications. Want to give it a spin? Check it out today: https://github.com/flickr/yakbak

P.S. We’re hiring!

Do you love development tooling and helping keep teams on the latest and greatest technology? Or maybe you just want to help build the best home for your photos on the entire internet? We’re hiring Front End Ops and tons of other great positions. We’d love to hear from you!

The Data Freshener

 

Change

You may have noticed some changes in Flickr a couple months back. Like, half the site changed. 95% even, by some metrics. Some say CHANGE IT BACK! while others welcome change. Whatever your thoughts, the changes are here, and they mean things. For example, they mean new visual design and better usability. They mean a faster site. Unfortunately, up until recently, they also meant more stale data. Yuck.


Why? What? Well…here’s the deal. We have a new-ish frontend stack we’ve been using for the past couple years now. It’s an isomorphic single-page application, runs on node.js, and is generally awesome. We call it Reboot.


In the World of Reboot, we treat data with kid gloves. We <3 data. We never want to give it up, never want to let it down. Once we pull data from our APIs, we store the fetched data in your browser so that we don’t have to fetch it again the next time it’s needed. This means faster page loads and faster navigation, and less API traffic (and thus a more stable and scalable API). The data cached in your browser exists as long as the current Reboot session — until you refresh or leave Reboot for a non-Rebooted page.

However, this also meant that data could become stale. You change the date taken of your photo, someone else adds a comment, you navigate to a page with cached data…and you don’t see the changes. Wat? Yeah. So, this was not a huge problem until we moved lots of pages onto Reboot in the beginning of May. From that point forward, most Flickr user sessions have spent their entirety on Reboot, feeding off the same stale loaves of cached data.

The thinking (design / prototypes)

We considered a number of possibilities for freshening up data during a user session. A brief history of the strategies we sampled, and their results:

1. Refresh on update


The first stab focused on updating data locally after it was changed by the user. Most of our simpler use cases already updated as expected, but some trickier cases with indirect relationships did not. For example, changing the date taken of a photo updated the data model for the photo, but deleting a photo did not necessarily ensure the photo was removed from all the cached albums, groups, and galleries to which it belonged. (Note that the photo was removed correctly from the backend, just not from the cached representation of those entities on the client.)

Cleaning up these relationships using change events between models helped, but didn’t solve all our problems. When someone outside of the local session (read: another user) changed data, it would not reflect in the current session. The only way to catch changes from outside the current session was to be more aggressive about evicting models.

2. Nuclear option

The pendulum swung all the way in the other direction — instead of surgical removal of data models we knew to be out-of-date, what would happen if we removed all cached data on every navigation? This prototype was quick to build, and incredibly destructive. By doing this, all our cached data always remained as fresh as could be, but we essentially reverted to Web 1.0 — with the exception of the Reboot framework, everything was reloaded on every page.

Not surprisingly, this blew up API traffic (locally only! did not unleash that disaster at scale), and inflated page load times like a Jeff Koons sculpture. It did give us some baseline timing metrics we could point to as worst-case scenarios, however. The next step was to swing the pendulum back toward the middle — to a carefully-knitted solution that would preserve fast page loads and navigation, while ensuring the freshest data we could serve up.

3. Refetch on navigate


At this point, our challenge was to find a solution that would keep navigation fast, API traffic slim, and pick up all changes to session data, whether local or remote. We ended up with a solution we call “refetching”: evicting and requesting new data models as they are needed by the application. But when? We could refetch periodically or on a user action; we determined that the best trigger was navigation between sections of the site, at which point cached models become eligible for refetching. This proved to be the happiest medium between speed and freshness.

A high-level outline of how the refetching strategy works:

  • The user loads a page; data are requested from the API, and models are cached. As new models are created, they’re marked as being fresh.
  • The user navigates to another site section (e.g. Photostream → Search); all freshness marks are removed from all models. They’re now all eligible for refetching.
  • As Reboot builds the new page, it requests data models from the cache. Since they no longer have their seal of freshness, they are refetched, and marked as fresh once retrieved and cached.

One important note — refetching is not triggered on browser back/forward navigation. Users expect near-immediate navigation, thanks to browser caching, when navigating to already-viewed content. Therefore, we refetch only when the user clicks a link to navigate to a new site section.
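A minimal sketch of the idea (not Reboot’s actual code; every name here is invented for illustration):

function ModelCache(fetchFn) {
  this.fetchFn = fetchFn;  // function (id, done) that actually hits the API
  this.models = {};        // id -> cached model
  this.fresh = {};         // id -> true while the model still has its freshness mark
}

// called when the user clicks into a different site section
// (and deliberately NOT on browser back/forward navigation)
ModelCache.prototype.onNavigate = function () {
  this.fresh = {}; // clear every freshness mark; everything is now refetchable
};

ModelCache.prototype.get = function (id, done) {
  var self = this;
  if (self.models[id] && self.fresh[id]) {
    return done(null, self.models[id]); // fresh hit: no API traffic
  }
  self.fetchFn(id, function (err, model) {
    if (err) return done(err);
    self.models[id] = model; // upsert into the cache
    self.fresh[id] = true;   // mark fresh until the next navigation
    done(null, model);
  });
};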

4. Miscellany

There were a couple other options we considered and rejected from the start, but they’re worth mentioning here.

One was a TTL (time-to-live) algorithm, commonly used in caching applications. TTL algorithms expire data and evict it from the cache a set amount of time after it’s written or last updated. The arbitrary nature of TTL would mean that users would sometimes have fresh data and sometimes stale; it would be fresh more often than without any solution, but freshness would vary arbitrarily and would not result in much of an improvement in user experience.

The other was to write an algorithm that tracks the amount of time since a data model was last accessed, and refetch when it grows too old. While this sounded interesting at first, it has the same flaw as a standard TTL algorithm — freshness becomes arbitrary. It’s also more complex to implement, and might end up not being worth the complexity.

The doing (implementation)

So that was it! Refetch on navigate, all done. Right? …Of course not. With the general strategy in place, the devil started sneaking around in all the details. Some of the highlights:

Exemptions

Evicting on every navigation proved to be not the best idea. For example, in Reboot we often preload photo metadata models on pages with lists of photos, in order to make navigation into the photo page snappy. The refetch setup therefore has an exemption config that allows us to easily retain models when navigating into, away from, or between specific site sections.

Child models

We often have parent-child associations between data models. For example, the data model for a photo has a reference to a data model for the author of the photo. When the photo model is refetched, the person model must be refetched as well. This means the function doing the eviction and refetching has to recurse through all child models.

Collections

An issue similar to child models above, but more complex, is the case of a model containing a list of other models. For example, the data model for a person’s photostream contains a list of photo models.

What made this particularly tricky is pagination and filtering — say you load the first 2 pages of your photostream, set your view filter to private, jump to page 5, switch the view to “Date taken”, and navigate away and back to your photostream…imagine the mess of different models with partially-loaded collections. Evicting one parent model, and its children, might evict photo models from the collection within another, without properly refetching. The solution here actually lay in the controller responsible for fetching pages: if a requested page of models is not already completely in-cache, a refetch will always happen to ensure we have all the data, in its freshest state.

Refetch only once per page view

Critical to the refetch-on-navigation strategy is to refetch only once per navigation. This was not too difficult, but essential to get right. We accomplish this by adding a flag when a model is initially fetched and upserted into the cache. When navigating to a new, non-exempt site section, all those flags are cleared, and any model requested by the new page will be refetched. When refetched, the model is again upserted into the cache and marked as fresh, until the next navigation.

But did it fresh?


With the thinking and the doing out of the way, it was time to push all this to production. Because these changes are essentially pulling the rug out from underneath the data layer on every navigation, we had to tread very carefully in order to prevent any negative impact to the end user experience.

We did very thorough manual and automated testing across all of Reboot. We left the feature turned on for staff users for a while, to be able to respond to any bug reports. Finally, the time came to test on Real People. There were three things we needed to keep an eye on: errors (of course), impact on page navigation timing, and API traffic. Since refetching implies more requests for data, we needed to be sure that we were keeping the user experience smooth and fast, and also that we weren’t blowing up our data centers.

In order to get a good read on these things, though, we had to go all in. Letting in just a small percentage of users would not give reliable numbers for timing or traffic impacts, due to the noise inherent in relatively small sample sizes. So, we did something unusual: we turned on refetching for all users for a short period of time. We flipped on refetching and kept an eagle eye on our stats for 2 hours, then reverted; then, we took a careful look at the aggregated data to see how the experiment went.

Surprisingly, the impact on both timing and traffic was relatively low. After some thought, we decided this is most likely because the changes disproportionately impact people on long sessions, say a Flickr tab open for hours or days. Most people don’t hang around that long; they come, they go. Also, the photo page represents north of 90% of our page views, and is exempt from refetching (see Exemptions above).

So where did we end up? A negligible bump in navigation timing and API traffic, and fresher data for all. Perhaps an anticlimactic resolution, but the story we’ve heard today outlines a serious consideration for anyone building an application with a data caching layer: keep in mind from the beginning how you plan to deal with stale data, but in a way that keeps all the other benefits of a single-page application.


Group APIs

With over 1.5 million groups, there’s no doubt that they are an important part of Flickr. Today, we’re releasing a few new ways to interact with groups using our API.

Group Membership


We are adding two new methods to manage group membership through the API.

flickr.groups.join to join a group. Before calling this method, check if the group has rules using flickr.groups.getInfo. The user needs to agree to the rules before being able to join the group. Pass the accept_rules argument if the user accepted the rules.

flickr.groups.leave to leave a group. The user’s photos can also be deleted when leaving the group by passing the delete_photos argument.
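For illustration, the membership flow might look like the following, assuming a hypothetical signedCall(method, args, callback) helper that signs and sends authenticated calls to the Flickr REST endpoint; that helper is not a real Flickr library.

// a sketch of the membership flow with the hypothetical signedCall helper
signedCall('flickr.groups.getInfo', { group_id: 'GROUP_ID' }, function (err, info) {
  if (err) throw err;
  // if the group has rules, show them and collect the user's agreement first
  signedCall('flickr.groups.join', {
    group_id: 'GROUP_ID',
    accept_rules: 1 // pass only if the user actually accepted the rules
  }, function (err) {
    if (err) throw err;
  });
});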

Group Discussions


We are also opening up group discussions in the API. You can now fetch a list of discussion topics for a group using flickr.groups.discuss.topics.getList, with sticky topics first, then regular topics sorted from newest to oldest.

<rsp stat="ok">
    <topics group_id="46744914@N00" iconserver="1" iconfarm="1" name="Tell a story in 5 frames (Visual story telling)" members="12428" privacy="3" lang="en-us" ispoolmoderated="1" total="4621" page="1" per_page="2" pages="2310">
        <topic id="72157625038324579" subject="A long time ago in a galaxy far, far away..." author="53930889@N04" authorname="Smallportfolio_jm08" role="member" iconserver="5169" iconfarm="6" count_replies="8" can_edit="0" can_delete="0" can_reply="0" is_sticky="0" is_locked="" datecreate="1287070965" datelastpost="1336905518">
            <message> ... </message>
        </topic>
    </topics>
</rsp>

flickr.groups.discuss.topics.add to post a new topic to a group, passing a subject and the message content.

Additionally, you can fetch a list of replies for a topic using flickr.groups.discuss.replies.getList, which includes the information for the topic along with all the replies, sorted from oldest to newest.

<rsp stat="ok">
    <replies>
        <topic topic_id="72157625038324579" subject="A long time ago in a galaxy far, far away..." group_id="46744914@N00" iconserver="1" iconfarm="1" name="Tell a story in 5 frames (Visual story telling)" author="53930889@N04" authorname="Smallportfolio_jm08" role="member" author_iconserver="5169" author_iconfarm="6" can_edit="0" can_delete="0" can_reply="0" is_sticky="0" is_locked="" datecreate="1287070965" datelastpost="1336905518" total="8" page="1" per_page="3" pages="2">
            <message> ... </message>
        </topic>
        <reply id="72157625163054214" author="41380738@N05" authorname="BlueRidgeKitties" role="member" iconserver="2459" iconfarm="3" can_edit="0" can_delete="0" datecreate="1287071539" lastedit="0">
            <message> ... </message>
        </reply>
    </replies>
</rsp>

flickr.groups.discuss.replies.add to post a reply to a topic, passing the message content.

flickr.groups.discuss.replies.edit to edit a reply, passing the updated message.

flickr.groups.discuss.replies.delete to delete a reply.

You can only edit and delete replies when authorized as the owner of the reply. For now, it is not possible to edit or delete a topic through the API.
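Putting the discussion methods together, again with the hypothetical signedCall helper from the membership sketch (the parsed response shape is an assumption):

// a sketch: read the latest topics, then reply to the first one
signedCall('flickr.groups.discuss.topics.getList',
  { group_id: 'GROUP_ID', per_page: 5 },
  function (err, rsp) {
    if (err) throw err;
    var topic = rsp.topics.topic[0]; // assumption about the parsed shape
    signedCall('flickr.groups.discuss.replies.add',
      { topic_id: topic.id, message: 'Great five-frame story!' },
      function (err) {
        if (err) throw err;
      });
  });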

If you have any questions, comments, concerns, or just want to chat about these methods or anything else related to the API, please join the Flickr Developer mailing list.

Photos from fofurasfelinas and larissa_allen.

Don’t be so PuSHy

You know three things that would be cool?

  • the ability to subscribe to the output of a Flickr API call in a feed aggregator
  • the ability to get the results of Flickr API calls as…

Oh wait. That was a while ago. Wouldn’t it be great if you didn’t have to poll our API over and over just to see if photos were there, only to find out you waited too long since last time and now because of the results size limits you can’t get them all and you have to figure out how many you missed and then you have to make another call with the right offset to get those results but of course in between then and now the result set changed a bit so you aren’t sure if you really got them all and…

Wouldn’t it be great if Flickr had something that could just PuSH photos to you as they appeared, kind of like this?

Introducing the new (and experimental) flickr.push API methods. These allow you to subscribe to your contacts’ new uploads and updates, and to your contacts’ favorites. Let’s dive right in and see exactly how it all works:

"there is a virtuous circle in this ecosystem"

The 20,000 ft overview is basically this:

  1. You make an API call to Flickr asking to subscribe to one of several different photo feeds, providing a callback URL in the arguments.
  2. A little verification dance ensues during which we make a request to your callback URL. If you respond appropriately we’re all good and from then on…
  3. Live(-ish) updates are POSTed from Flickr to your callback URL in Atom 1.0 format (a minimal subscriber endpoint is sketched just below).
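Here’s a minimal sketch of the subscriber’s side: a plain NodeJS server that answers the verification GET and accepts the Atom POSTs (the port and logging are arbitrary):

var http = require('http');
var url = require('url');

http.createServer(function (req, res) {
  if (req.method === 'GET') {
    // the verification dance: echo hub.challenge back with a 200
    var query = url.parse(req.url, true).query;
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(query['hub.challenge'] || '');
  } else if (req.method === 'POST') {
    // live(-ish) updates arrive here as Atom 1.0 documents
    var body = '';
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      console.log('received %d bytes of Atom', body.length); // parse photos out here
      res.writeHead(200);
      res.end();
    });
  } else {
    res.writeHead(405);
    res.end();
  }
}).listen(8080); // this URL is what you pass as the callback argument when subscribing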

Subscribing

The subscription system is based as closely as possible on Google’s Pubsubhubbub protocol, with a few wrinkles. One is that Flickr acts as the hub and the publisher all rolled into one. We’re obviously not really “publishing” separate feeds of every single user’s contacts’ photos and faves to a central hub somewhere; we only create the feeds on demand when someone subscribes to them. So we couldn’t, for example, publish them all to a 3rd-party hub like Superfeedr. But the whole pubsubhubbub metaphor still works pretty well.

Another difference is that the subscription happens via an authenticated API call and not an HTTP POST; hopefully the reasons for this are obvious. We’ll get into them in detail a little bit later. But even though the mechanism for the subscription request is different we’ve tried to follow the protocol as closely as possible and keep the parameters the same. The Google PubSubHubbub Core 0.3 section on how the subscription flow works is a good place to start, and the rest of this post assumes you’ve read that and more or less understand what the interactions should be between hub and subscriber. Done? OK, here are the methods:

flickr.push.getTopics

This method just tells you what you can subscribe to. It returns something like this:

<rsp stat="ok">
  <topics>
    <topic name="contacts_photos" />
    <topic name="contacts_faves" />
  </topics>
</rsp>

Yeah, yeah, you already get that part. You can currently subscribe to contacts’ photos (new uploads and updates) or contacts’ faves. So subscribe already!

flickr.push.subscribe

This method (which requires an authentication token with read permissions) takes almost exactly the same arguments as a “proper” PubSubHubbub subscribe HTTP request would. Wee differences:

topic – unlike the topic argument in the HTTP version (which is a URL), this is just one of the topic types returned by flickr.push.getTopics.

secret – currently not supported, so this parameter is omitted.

callback – this must be unique, i.e. you can’t use the same URL for more than one subscription.

Everything else works as you would expect – verification (either synchronous or asynchronous), the hub challenge string, subscription expiration/refreshing, unsubscribing (with flickr.push.unsubscribe) etc. Which brings us to:

flickr.push.getSubscriptions

This method also requires an authentication token with read permissions, and returns a list of subscriptions for the authenticated user, like so:

<rsp stat="ok">
  <subscriptions>
    <subscription topic="contacts_photos" callback="http://example.com/contacts_photos_endpoint?user=12345" pending="0" date_create="1309293755" lease_seconds="0" expiry="1309380155" verify_attempts="0" />
    <subscription topic="contacts_faves" callback="http://example.com/contacts_faves_endpoint?user=12345" pending="0" date_create="1309293785" lease_seconds="0" expiry="1309380185" verify_attempts="0" />
  </subscriptions>
</rsp>

Oh yeah, the docs:

flickr.push.subscribe
flickr.push.unsubscribe
flickr.push.getTopics
flickr.push.getSubscriptions

Feeds

The format of the feed that gets posted to your endpoint is currently limited to Atom 1.0, i.e. exactly what you’d get from something like

http://api.flickr.com/services/feeds/photos_public.gne?id=_YOUR_NSID_HERE_&format=atom

For the contacts_faves topic type it’s the same thing but with the addition of the atom:contributor element to indicate the user who faved the photo.

And that’s about it. Questions?

Privacy and Restrictions

The astute observer may notice that not all photos are being sent in the PuSH feeds. Since this is a new (and experimental) feature for Flickr, we’ve basically turned all of the privacy/safety restrictions on it up to 11, at least to start with. PuSH feeds currently only contain images that have public visibility and safe adultness level. In addition, users with their “Who can access your original image files” option set to anything other than “anyone” and users who are opted out of the API will not have their photos included in PuSH feeds.

While this may be a bit restrictive (for example, since the API call is authenticated and the photos are coming from your contacts, technically you should be allowed to see contacts-only or friends/family photos for contacts that allow it), we feel that since this is a new thing it’s better to start conservative and see how the feature is being used. It’s possible that we may relax some of these restrictions in the future, but for now a PuSH feed is essentially what a signed-out user could get just by grabbing the RSS feeds from various people’s photostreams.

You will also notice that for now we’ve limited the feature to pro account holders only.

So… What?


So what can you do with it? There’s the obvious: any web application which currently does some kind of polling of the Flickr API to get photos for its users can potentially be altered to receive the push feeds instead. More timely updates, cheaper/simpler for the application and as it turns out cheaper for Flickr, too – it’s often easier on our servers to push out events shortly after they happen and we’ve got them (often fresh in our cache) than it is to go and dig them up when they’re asked for some time later.

Surprise!

Some of the more interesting things that we hope these API methods will enable revolve around the more real-time nature of the events they expose. As an example of what’s possible in this space, Aaron Cope has created a little application he calls “Pua”. It’s a wonderfully simple way to surf Flickr without having to do much of anything; Pua takes you on a ride through your contacts’ photos and favorites, as they happen. Have a read about exactly what it is, why it’s called Pua and why he made it. If you ask nicely maybe Pua will give you an invite code.

Later

Hopefully there will be much more to come. Finer-grained controls on the subscriptions (safety levels, visibility levels, restricting to just new uploads or only certain types of updates, lightweight JSON feeds, etc.), new types of subscriptions (photos of your friends/family, photos from a particular location, photos having a particular tag, something to do with galleries…), and maybe some other stuff we haven’t thought of yet. Hey, wouldn’t it be cool if you didn’t need to run a web server on the other end to be the endpoint of the feeds?

Let us know what you’d like to see! What works, what doesn’t, what we got wrong and how to make it more useful to the people who want to Build Stuff (that’s you).

Fine Print

"there is a virtuous circle in this ecosystem"

The Flickr PuSH feeds are part of the Flickr API, and thus fall under the API Terms of Service Agreement. This means all the usual things about respecting photo owners’ copyrights, and also the other good bits about API abuse. In other words, don’t try to subscribe to all of Flickr. Trust me, we’ll notice.

Flickr now Supports OAuth 1.0a

We’re happy to announce that Flickr now supports OAuth! This is an open standard for authentication, which is now fully supported by the Flickr API. You can get started by going to our OAuth documentation. As part of this announcement, we would also like to note that the old Flickr authentication is now deprecated, and is expected to be disabled in early 2012.


I’m Guarding the Door by Frenck’s Photography

OAuth is very similar to the old Flickr auth in a lot of ways. You start by getting a request token (frob in the old flow), redirecting the user to the authentication page, and then getting a token which can be used to make authenticated requests. With proper OAuth support, though, you will be able to use one of the many libraries available in a variety of languages to get started.

In addition to this, we have streamlined the authentication process across desktop, mobile and web, and have simplified the user experience by removing the anti-phishing step for the Desktop flow, which is no longer necessary.

Currently, we only support OAuth 1.0a, but we have plans to eventually support OAuth 2.0. The decision was based on the fact that OAuth 2.0 is still an evolving specification and is rapidly changing.

We wanted to make the transition to OAuth seamless to the user, so we created a method to exchange an old auth token for an OAuth token. The application simply has to make an authenticated request to flickr.auth.oauth.getAccessToken, which returns an OAuth auth token and signature for that user, tied to your application. The exchange is meant to be final, so the old authentication token is scheduled to expire 24 hours after this API method is called.
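For the curious, here is roughly what an OAuth 1.0a library does under the hood when signing that exchange call with HMAC-SHA1, using only NodeJS’s standard library. The placeholder variables are your app’s credentials, and the empty token secret is an assumption; consult the OAuth documentation linked above for the authoritative details.

var crypto = require('crypto');
var https = require('https');

var CONSUMER_KEY = 'your-api-key';       // placeholder
var CONSUMER_SECRET = 'your-api-secret'; // placeholder
var OLD_AUTH_TOKEN = 'old-auth-token';   // placeholder: the token to exchange
var TOKEN_SECRET = '';                   // assumption: no token secret for the exchange

// RFC 3986 percent-encoding, as OAuth 1.0a requires
function enc(s) {
  return encodeURIComponent(s).replace(/[!'()*]/g, function (c) {
    return '%' + c.charCodeAt(0).toString(16).toUpperCase();
  });
}

var endpoint = 'https://api.flickr.com/services/rest';
var params = {
  method: 'flickr.auth.oauth.getAccessToken',
  oauth_consumer_key: CONSUMER_KEY,
  oauth_token: OLD_AUTH_TOKEN,
  oauth_nonce: crypto.randomBytes(8).toString('hex'),
  oauth_timestamp: Math.floor(Date.now() / 1000),
  oauth_signature_method: 'HMAC-SHA1',
  oauth_version: '1.0'
};

// signature base string: METHOD & encoded URL & encoded, sorted parameter string
var normalized = Object.keys(params).sort().map(function (k) {
  return k + '=' + enc(String(params[k]));
}).join('&');
var base = ['GET', enc(endpoint), enc(normalized)].join('&');
var key = enc(CONSUMER_SECRET) + '&' + enc(TOKEN_SECRET);
params.oauth_signature = crypto.createHmac('sha1', key).update(base).digest('base64');

var query = Object.keys(params).map(function (k) {
  return k + '=' + enc(String(params[k]));
}).join('&');

https.get(endpoint + '?' + query, function (res) {
  // on success the body carries the user's new OAuth token and token secret
});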

Now, it’s your turn! Go read our OAuth documentation if you already have an application, or visit our developer guide for more information on how to get started. If you experience any problems, or have any questions or suggestions regarding our OAuth implementation, please post to our developer mailing list.

Refreshing The API Explorer

Most people know that Flickr has an API. As it wouldn’t be much use without documentation, we have that, too. (There’s even a list of methods and information about each available via the API itself.) What if I told you there was also a way to experiment with it from the comfort of your browser, no coding required?


Sink Explorer by Zabowski

Well, that’s what Flickr’s API Explorer offers. It’s an easy way to customise requests by filling in simple form fields, whether the method requires authentication or not, and to see the responses that are returned. It’s great for one-off prototype scripts where you quickly want to find some data, for seeing whether a method does what you think it does, or to sanity-check some code that’s not doing the right thing.

It’s been around for years, but nobody ever seems to have made much of a fuss about it. (The only mention I can find on this blog is an interview with a certain API developer singing its praises.) However, it’s needed a little attention to bring it up to date, and so I made some time to teach it a few new tricks.

Firstly, it now offers a choice of output response. While it doesn’t offer every format that the API does, the three (and a bit) available – the default XML, JSONP (or raw JSON), and PHP serialized data – should cover a lot of ground. Secondly, the Explorer pages now have proper URLs, so it’s possible to link to the API method for fetching the list of pandas, for example. Finally, for the most popular of those response types – XML and JSON(P) – responses are now pretty-printed and syntax highlighted, as are the examples in the API documentation pages for each method. That is to say, the returned values are indented and have line breaks, while the name, attributes and quoted values of the elements are coloured appropriately.

Now that you know that it exists, and that it’s all freshened up with spiffy features, why not go and play around? Have fun!

Galleries APIs


We love galleries. After all, without galleries how would you find your giant sea bugs?

This post is to quickly announce we’ve added galleries to the API.

A Rose+GUID by Any Other Name…

Galleries in the API use “compound-ids”. Like tags. An example gallery compound id might look like 9634-72157621980433950. Unlike photos, you can’t simply grab the last number off a gallery URL and stick it into the API. Yeah, I’m not thrilled about it either, but there are good (read boring) reasons why it works that way.

So when an API method says it takes a gallery_id, we’re talking about the compound-id.

You can however use the flickr.urls.lookupGallery method to go from gallery url to gallery_id. Pass the method the URL for the gallery, and we’ll give you back the gallery info blob.

You can also get gallery IDs from flickr.galleries.getList and flickr.galleries.getInfo.
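For example, resolving a gallery URL to its compound-id might look like this (an unsigned REST call; the parsed JSON shape is an assumption based on the info blob below):

var https = require('https');
var qs = require('querystring');

var query = qs.stringify({
  method: 'flickr.urls.lookupGallery',
  api_key: process.env.FLICKR_API_KEY, // assumption: key supplied via env
  url: 'http://www.flickr.com/photos/straup/galleries/72157617483228192',
  format: 'json',
  nojsoncallback: 1
});

https.get('https://api.flickr.com/services/rest/?' + query, function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    var gallery = JSON.parse(body).gallery; // assumption about the parsed shape
    console.log(gallery.id); // the compound-id, e.g. 6065-72157617483228192
  });
});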

Behold, a gallery info blob:

<gallery id="6065-72157617483228192" 
    url="http://www.flickr.com/photos/straup/galleries/72157617483228192" 
    owner="35034348999@N01" 
    primary_photo_id="292882708" primary_photo_server="112" primary_photo_farm="1" primary_photo_secret="7f29861bc4" 
    date_create="1241028772" date_update="1270111667" count_photos="17" count_videos="0" >
 <title>Cat Pictures I've Sent To Kevin Collins</title>
 <description>dive dive dive</description>
 </gallery>

The primary_photo_* attributes refer to the “cover photo” for the gallery. The owner is the Flickr user_id (aka NSID) of the member who created the gallery. The id is that compound-id we talked about.

Lists of Galleries

You can fetch all of a member’s galleries using flickr.galleries.getList, sorted from newest to oldest, returning a list of gallery info blobs.

Or you can fetch all the galleries a given photo is in with flickr.galleries.getListForPhoto.

A Bag of Photos

Perhaps most interesting, flickr.galleries.getPhotos will return a list of all the photos for a given gallery. It’s a standard photo response, with a twist.

<photos page="1" pages="1" perpage="500" total="15"> 
   <photo id="2935475111" owner="8147452@N05" secret="e20746148b" server="3068" farm="4" title="Day off from the Death Star." ispublic="1" isfriend="0" isfamily="0" is_primary="1" has_comment="1">
        <comment>best cat picture ever!</comment>
   </photo>
   <photo id="3078977730" owner="68779755@N00" secret="dba9d8105e" server="3229" farm="4" title="&quot;We could stuff it with Kleenex...&quot;" ispublic="1" isfriend="0" isfamily="0" is_primary="0" has_comment="0" /> 
   <photo id="3212123792" owner="10983978@N03" secret="4231501383" server="3391" farm="4" title="1-19-09: Some People Just Don't Get It" ispublic="1" isfriend="0" isfamily="0" is_primary="0" has_comment="0" /> 
     ....
</photos>

In addition to standard photo response attributes, there is also a has_comment attribute which signals whether the gallery creator added a comment about why she included the photo, and whether the child comment element is present. Also is_primary, when set to 1, indicates this is the gallery’s “cover photo”.

CRUD

flickr.galleries.create creates a gallery, with a title, description, and optional primary photo, and will return a gallery element with the compound-id and the URL of the gallery.

<gallery id="50736-72157623680420409" url="http://www.flickr.com/photos/kellan/galleries/72157623680420409" />

flickr.galleries.editMeta is simply for updating the title and description. flickr.galleries.editPhoto confusingly doesn’t edit a photo, but rather the comment about a photo in a gallery.

Of course the money is all in flickr.galleries.addPhoto which allows you to actually build a gallery of photos.

Nota bene: Remember, only public-safe photos can be added to galleries.
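Sketching the whole flow with the same hypothetical signedCall helper as in earlier posts, with photo ids borrowed from the sample response above:

// a sketch of building a gallery; signedCall is a hypothetical
// authenticated-REST helper, not a real Flickr library
signedCall('flickr.galleries.create', {
  title: 'Cat Pictures',
  description: 'dive dive dive',
  primary_photo_id: '2935475111' // optional cover photo
}, function (err, rsp) {
  if (err) throw err;
  // rsp.gallery.id is the compound-id (shape assumed from the XML above)
  signedCall('flickr.galleries.addPhoto', {
    gallery_id: rsp.gallery.id,
    photo_id: '3078977730',
    comment: 'best cat picture ever!' // optional note (argument name assumed)
  }, function (err) {
    if (err) throw err;
  });
});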

The Curated Life

We’ve also added the ability to restrict searches to only photos in galleries, with the in_gallery argument to flickr.photos.search.

So whether you’re interested in kittens deemed cute enough for galleries, hand-selected pink photos, Flickr Commons photos in galleries, or simply photos taken near you (assuming you’re in Brooklyn) in galleries, it’s all available.

Twitter in the API

Ever since we launched our Flickr2Twitter beta, developers have been requesting new API methods, so they can support Flickr as a photo sharing option in their Twitter clients.

I’ve got good news, and bad news.

The bad news is we don’t have any new APIs to offer you.

The good news is we shipped our “Twitter APIs” nearly five years ago.

Let me explain.

Working with Blogs (including Twitter)

For as long as anyone can remember, we’ve supported the option of posting to external blogs directly from Flickr. Once you’ve configured a blogging service it becomes available in the “Blog This” drop down, as an option for Upload by Email, and, of course, in the API.

You and I might have serious philosophical questions about whether Twitter is a blogging service, but our web servers are more pragmatic. To them, the Twitter integration is just a new blogging service.

Configuring a blogging service

The first step for a member wishing to blog (or tweet) via Flickr is to configure an external blog. The only way to do this is on flickr.com, generally from the Add a blog page.

Twitter is a bit special (or rather a preview of things to come) as we’ve given it its own service page. Directing users of your app to the Flickr2Twitter page is probably the best way to get them “tweet ready”.

All set?

From here on out, you’ll need your user to have authorized you to access their Flickr account. (Find out more about FlickrAuth)

With a signed call to flickr.blogs.getList() you can get a list of all the blogging services a member has configured. Alternately you can pass in a service id (e.g. Twitter) to scope the list of blogs to the service you’re interested in. The response looks something like:

<blogs>
  <blog id="7214" name="Code Flickr" service="MetaWeblogAPI" needspassword="0" url="http://code.flickr.com/blog/"/>
  <blog id="7215" name="Twitter: kellan" service="Twitter" needspassword="0" url="http://twitter.com/kellan"/>
  <blog id="72157" name="Twitter: Flickr" service="Twitter" needspassword="0" url="http://twitter.com/flickr"/>
</blogs>

This account has three blogs configured: a WordPress blog and two Twitter accounts. Each one has a unique id. Additionally, needspassword="0" means we have credentials for these blogs stored server side and you don’t need to prompt your user to log in to their blog.

If you passed in Twitter as the service, and instead of the above you got something like:

<blogs/>

Then your user hasn’t configured any blogs for that service.

The Easy Option: Upload a photo to Flickr, post to Twitter via Flickr

If your application has been authorized to upload photos on your user’s behalf, and you’ve made sure they have a Twitter blog configured with Flickr, then the easiest solution is to use Flickr as a passthru service.

Once you’ve successfully uploaded a photo you’ll get an API response like <photoid>1234</photoid>. (Find out more about uploading and asynchronous uploading).

Pass the blog id from the <blogs> list above, and the photoid from the upload response, to flickr.blogs.postPhoto(). If you’re posting to Twitter, the title argument is optional and the description argument is ignored. (By default the title of the photo is the body of the tweet; alternately, pass a different status update in the title field.)

Or instead of passing a blog id, you can pass a service id (i.e. Twitter) and the photo (and blog post) will be sent to the first matching blog of that service. If we don’t find a blog matching that service, you’ll get a “Blog not found.” error.

Assuming your API call to flickr.blogs.postPhoto() is well formed, Flickr will turn around and post your user’s tweet to Twitter, including a short flic.kr url linking back to their photo.
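End to end, the passthru flow is roughly this; both upload() and signedCall() are hypothetical wrappers (not real Flickr libraries), and the blog id comes from the <blogs> list above.

// a sketch of the passthru flow with hypothetical helpers
upload('/path/to/photo.jpg', function (err, photoId) {
  if (err) throw err;
  signedCall('flickr.blogs.postPhoto', {
    blog_id: 7215,           // the Twitter "blog" id from flickr.blogs.getList
    photo_id: photoId,
    title: 'Kittens at dawn' // becomes the body of the tweet
  }, function (err) {
    if (err) throw err;
    // Flickr posts the tweet itself, appending a short flic.kr link
  });
});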

The Established Option: Upload a photo to Flickr, post to Twitter any which way you can

If you’re looking to integrate Flickr photos into an existing Twitter application you might already have a preferred method for posting to Twitter.

After you’ve successfully uploaded a photo and received the photoid follow these instructions for manufacturing a short url using the flic.kr domain.

Unlike most URL shortening schemes, every photo on Flickr already has a short URL associated with it. They follow the form:

http://flic.kr/p/{base58-photo-id}
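In code, manufacturing a short URL is just a base58 encoding of the photo id. The alphabet below is the commonly documented flic.kr one, which drops 0, O, I and l to avoid ambiguity; treat it as an assumption and check the instructions linked above.

// the commonly documented flic.kr base58 alphabet: no 0, O, I or l
var ALPHABET = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';

function base58(num) {
  var encoded = '';
  while (num >= 1) {
    encoded = ALPHABET[num % 58] + encoded;
    num = Math.floor(num / 58);
  }
  return encoded || '1';
}

// e.g. a short URL for an example photo id
console.log('http://flic.kr/p/' + base58(2935475111));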

By the way, you shouldn’t feel constrained to only use short urls on Twitter. They work equally well for a diverse range of applications including fortune cookies.

Thumbnails

If you want to display a thumbnail of a photo, you’ll need to make an API call to one of the methods that returns the photo’s secret. Either flickr.photos.getSizes() or flickr.photos.getInfo() will do. Read up on constructing Flickr URLs.
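Once you have the farm, server, id and secret, assembling a thumbnail source URL is simple string concatenation. The format below follows the URL-construction docs mentioned above as of this writing; treat the details as an assumption.

// build a 75x75 square thumbnail URL from fields returned by
// flickr.photos.getInfo ("_s" selects the small square size)
function thumbnailUrl(photo) {
  return 'http://farm' + photo.farm + '.static.flickr.com/' +
    photo.server + '/' + photo.id + '_' + photo.secret + '_s.jpg';
}

console.log(thumbnailUrl({ farm: 4, server: '3068', id: '2935475111', secret: 'e20746148b' }));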

Follow Along

My favorite new game has been watching the flows of shared Flickr photos as they appear on Twitter.

Happy photo sharing!

What Would Brooklyn Do?

The other day, Mike Ellis posted a really lovely interview with Shelley Bernstein and Paul Beaudoin about the release of the Brooklyn Museum’s Collections API.

One passage that I thought was worth calling out, and which I’ve copied verbatim below, is Shelley’s answer to the question “Why did you decide to build an API?”

First, practical… in the past we’d been asked to be a part of larger projects where institutions were trying to aggregate data across many collections (like d*hub). At the time, we couldn’t justify allocating the time to provide data sets which would become stale as fast as we could turn over the data. By developing the API, we can create this one thing that will work for many people so it no longer become a project every time we are asked to take part.

Second, community… the developer community is not one we’d worked with before. We’d recently had exposure to the indicommons community at the Flickr Commons and had seen developers like David Wilkinson do some great things with our data there. It’s been a very positive experience and one we wanted to carry forward (emphasis mine) into our Collection, not just the materials we are posting to The Commons.

Third, community+practical… I think we needed to recognize that ideas about our data can come from anywhere, and encourage outside partnerships. We should recognize that programmers from outside the organization will have skills and ideas that we don’t have internally and encourage everyone to use them with our data if they want to. When they do, we want to make sure we get them the credit they deserve by pointing our visitors to their sites so they get some exposure for their efforts.

The only thing I would add is: What she said!

Tags in Space

A lot of you enjoyed our post (“Found in Space”) on the amazing astrometry.net project, and there have been some interesting followups.

A mysterious figure known only as “jim” paired up astronomy photos from Flickr with Google Sky. (You’re going to need the Google Earth plug-in for your browser — just follow the instructions on that page if you don’t have it.) In his technical writeup, “jim” explains how he used the Yahoo Query Language (YQL) to fetch the data. YQL is similar to the existing Flickr APIs, but it’s a query language like SQL rather than a set of REST-ish APIs. And both of those are really just ways to get data out of Flickr’s machine tag system, specifically the astro:* namespace. It’s turtles all the way down.

Who else is using astrotags? The British Royal Observatory in Greenwich is sponsoring a contest to determine the Astronomy Photographer of the Year and the whole thing is based on a Flickr group and extensive use of Flickr’s APIs. The integration is so seamless — galleries of photos and discussions are surfaced on their site as well as ours — you might as well consider Flickr to be their “backend” server. But they’ve also added much, such as great documentation about how to astrotag your photos as well as a concise explanation about how Astrometry.net identifies your photo, even among millions of known stars. (The sci-fi website io9 interviewed Fiona Romeo of the Royal Observatory about the contest; check it out.)

It’s dizzying how many services have been combined here — Astrometry.net grew out of research at the University of Toronto, web mashups use Google Sky for visualization in context, Yahoo infrastructure delivers and transforms data, the Royal Observatory at Greenwich provides leadership and expertise, and then little old Flickr acts as a data repository and social hub. And let’s not forget you, the Flickr community, and your inexhaustible creativity — which is the reason why all this can even come together.

All this was done with pretty light coordination and few people at Flickr were even aware what was going on until recently. I have no idea what the future is for APIs and a web of services loosely joined, but I hope we get to see more and more of this sort of thing.