I See Smart People, AKA: We Do Stuff …

Over on our sister photo arty blog you could easily imagine reading phrases like “One of the amazing things about working at Flickr is the vast amount of incredible photography it exposes you to“, or some such. Hah! Those arty types!

Over here, I’d like to post the flip side … about how one of the amazing things about working at Flickr, is the awesome people I get to work with.

Huddle

Take for example …

Ross Harmes

… people often think I’m joking when we’re sitting in a meeting, discussing how we should standardise our front-end coding conventions or some such and I say we should just “ask Ross”.

But! BUT!! but, Ross wrote a fricking book about JavaScript; Pro JavaScript Design Patterns and sits 3 desks away, it’s faster (and more amusing (to me)) to shout out a question than it is to flick through the index of the book. It’s like having the talking Kindle version, but with a much more natural voice!

You can also find out more at jsdesignpatterns.com

Anyway …

The reason why I wanted to post this, is that recently an awful lot of my co-workers have been doing stuff! So with an eye to that …

The Lovely Kellan Elliott-McCrea

Flashing quickly across the radar last couple of weeks is/was a bunch of discussion about URL shorteners, sort of starting with Josh over here: on url shorteners, and David with The Security Implications of URL Shortening Services. With other discussions popping up here and there and sorta everywhere.

And so, with this bit of philosophy from “Stinky” Willison (more on Simon later) …

Twitter / kellan: “You can build prototypes ...

… Kellan built this (maybe even during meetings); RevCanonical: url shortening that doesn’t hurt the internet, you can read more about it in his blog post URL Shortening Hinting [Note: includes mention of Flickr], where the comments are worth price of admission alone, also, features hamster photos.

If you want to join in, you can read more over on the official RevCanonical blog and get, fork or whatever it is people do with code on github. And I’m sure we’ll have more news about RevCanonical here soon :)

But not content with starting that wildfire, Kellan has also been does his bit to help OpenStreetMap, by walking around with a GPS unit and, I think this part is important, drinking beer

Neocartography

… shown here featuring our good friend Mikel Maron, remember we use OpenSteetMap on Flickr when our own maps are a little sparse. More on maps later!

Meanwhile …

The Mighty John Allspaw

Mr Allspaw is our wonderful Ops guy, here he is …

I have a working iSight now.

… smoldering.

As well as smoldering he also gave a talk at the Web 2.0 Expo called Operational Efficiency Hacks the other week. If you’re into that type of thing and missed it, which you probably did, here are his slides

… and his follow up post adds a little further reading. If you like this kinda of thing you should probably subscribe to his blog where he posts really interesting Flickr related stuff, and infuriatingly enough *not* here on this blog, /me sulk.

On the subject of Allspaw (and as we’ve already mentioned one book), I was pretty sure I’d mentioned his book before: The Art of Capacity Planning: Scaling Web Resources … if building big things on the web is your kinda thing, but apparently I haven’t, so …

According to a reviewer on Amazon “John’s examples are just like Charlie’s from the TV show Numb3rs”, having never watched Numb3rs I can only assume that’s a probably a bad thing (kinda like Scully from X-Files explaining science) but gave it 4 stars anyway :) on those grounds alone you should buy it …

The Art of Capacity Planning: Scaling Web Resources

Oh and don’t forget the WebOps Visualizations Pool on flickr, that John often posts to when things suddenly get much better or worse ;) if you enjoy graphs like this …

The Obama Drop

Getting back to the front-end for a second …

Scott Schiller, Fish enthusiast!

You’ve heard of Muxtape right? That website that plays MP3s, how’d you think it does that? Or the cool javascript/audio stuff that Jacob Seidelin does over at Nihilogic; JavaScript + Canvas + SM2 + MilkDrop = JuicyDrop & Music visualization with Canvas and SoundManager2?

Screenshot shown here (warning: link to noisy thing) …

A [ Radiohead / JavaScript / Boids / Canvas / SM2 ] mashup by Jacob Seidelin

Well they both use SoundManager 2 written by our very own Scott Schiller.

It’s an extensive and easy to use javascript thingy … wait, Scott says it better … “SoundManager 2 wraps and extends the Flash Sound API, and exposes it to Javascript. (The flash portion is hidden, transparent to both developers and end users.)” … which basically means that if you like JavaScript, messing with audio but hate, I mean, dislike working with Flash, it can save you a lot of pain. Here’s one of the demos Scott put together …

And if you think that all looks awesome, remember that Scott is one of our fantastic front-end guys, bringing all that good js magic to Flickr! Apart from the music part, well unless we one day decide to add music and customisable backgrounds to flickr [1].

Aaron Straup Cope

Aaron covered this only the other day: The Only Question Left Is, but has been doing an awful lot with generating shapefiles recently. I just wanted to add my take to it, because even I have trouble keeping up.

Basically what I want, is to be able to send something-somewhere a list of latitudes and longitudes and it return me the “shape” that those points make. This could be anything, the locations of geotagged Squirrels, or even something useful, well kinda like this from Matt Biddulph (him wot of Dopplr) …

London dopplr places, filtered to only places my social network has been to, clustrd
London dopplr places, filtered to only places my social network has been to, clustrd

… which maps out something of interest to him, where his “social network” go/eat etc. in London. Which may be different from mine, or could even have some overlap, thus answering the time old question; which pub should we all go to for lunch?

It’s not quite at the point where you can do it without having to put a little effort in, but I keep prodding Aaron because I want it now! But if you’re the type that does enjoy putting the effort in then you can again do the GitHub dance here: ws-clustr and py-wsclustr (Python bindings for spinning up and using an EC2 instance running ws-clustr).

Once more, more on maps later.

Daniel Bogan – Setup Man

Bogan is virtual, and only exists in the internets, as can be seen here …

Denied

… kind of like Max Headroom, but with worse resolution. Which I think makes Flickr the first interweb company you have a real AI working on the code, not that pretend AI stuff!

Last year he has a bash at helping to put a little context around us delicate flower developers, with a quick run-down of the setup we each used with Trickr, or Humanising the Developers (Part 1) & Trickr, or Humanising the Developers (Part 2). Based on an old project of his called “The Setup” where he used to interview various Internet Famous people (when there wasn’t so damn many of them) about their Setup.

Recently he’s reprised that task, with, wait for it … The Setup, where our very own Bogan asks such leading lights as John Gruber, Steph Thirion, Jonathan Coulton & Gina Trapani. I have no idea how he finds the time!

In turn you can read a quick interview with Bogan over at indicommons.

Rev Dan Catt — errr, me!

Meanwhile, if you’re reading this blog, you’ve probably already seen this, I’ve been trying my hand at using Processing to visualize 24 hours of geotagged photos on Flickr …

… which I managed by following the instructions here Processing, JSON & The New York Times to get Processing to consume our very own Flickr API in JSON format. Which in turn started me off prodding at the Processing group

Flickr: The Processing.org Pool

and Visualization groups on Flickr …

Flickr: The Cool Data Visualization Techniques - Information Visualization Pool

Pulling it all together

So that’s what some of us are up-to, and going back to the start, I’m amazed and all the stuff that goes on, brilliant minds and all that.

In my head, this is what ties it all together, hang on, here we go …

Kellan’s been walking around with a GPS unit (along with 1000s of others), adding to the OpenStreetMap (OSM) dataset, we (Flickr) sometimes use that dataset, but also … Matt Jones (also him wot of Dopplr) made this …

My first Cloudmade map style: "Lynchian_Mid"

… using Cloudmade, who in turn use OSM data to allow people to easily style up and use maps. Now, I’m sure Mr Jones, wont mind me saying that he’s not a coder, infact here he is; Matt Jones – Design Husband …

Matt Jones - design husband

… but a non-programmer can now easily make maps, as demonstrated above and described here: My first Cloudmade map style: “Lynchian_Mid”.

Then using our wonderfully public Shape Files API (flickr.places.getShapeHistory which yes you can get in JSON format for using with Processing or JavaScript) overlay boundaries. Even, when it’s easier (for the non-programmer) plot outlines and shapes, based on the code Aaron is working on, onto those maps.

More on making maps here: maps from scratch.

But where could you get such useful data for plotting or visualising, well obviously there’s our API, which is where the senseable team at MIT got the data for their Los ojos del mundo (the world’s eyes) project, again using Processing …


(un)photographed Spain from senseablecity on Vimeo.

But also, let us recall “Stinky” Willison, one time employee of Flickr, who now works at The Gruadian. They have a geocoding project, that allows you, if you so wished to place their stories on a map … http://guardian.apimaps.org/search.html … which uses Mapping from Cloudmade, map data from OpenStreetMap, location search from our very own API, and stories from their own API. Which in turn allows you to plot their stories on your own maps, phew!

More about the Guardian Open Platform.

The Guardian Open Platform | guardian.co.uk

You can also read about their Data Store, which gives you access to a load of easy to use data just ripe for visualizing…

… be that with Processing, Flash or JavaScript (following the advice in Ross’s book), and even with photos from Flickr and Audio driven with Scott’s SoundManager2, in “Shapes” powered by Aaron, and preserved with short URLs that’ll stick around, thanks Kellan :) and you can scale it if you need to by following John’s insights.

And that’s just what we do when we’re not working on Flickr.

Photos by Ross Harmes, Kellan, jspaw, jesse robbins, Matt Biddulph, waferbaby, moleitau and dan taylor.


[1]never going to happen.

The Only Question Left Is

photo by Shawn Allen

At the Emerging Technology conference this year Stamen Design’s Michal Migurski and Shawn Allen led an afternoon workshop called “Maps from Scratch: Online Maps from the Ground Up” where people made digital maps from, well… scratch.

If you’ve never heard of Stamen they’ve been doing some of the most exciting work around the idea of “custom cartography” including: Cabspotting, Oakland Crimespotting and Old Oakland Maps, work for the London Olympics, and designing custom map tiles for CloudMade. (Stamen also built the recently launched Flickr Clock :-)

All of this is interesting in its own right; proof that there is still a lot of room in which to imagine maps beyond so-called red-dot fever. All of this is extra interesting in light of Apple’s recent announcement to allow developers to define their own map tiles in the next iPhone OS release. All of this super-duper interesting because it is work produced by a team of less than 10 people.

The tools, and increasingly the data, to build the maps we want are bubbling up and becoming easier and more accessible to more people every day. Easier, anyway.

“One of the things that made this tutorial especially interesting for us was our use of Amazon’s EC2 service, the “Elastic Compute Cloud” that provides billed-by-the-hour virtual servers with speedy internet connections and a wide variety of operating system and configuration options. Each participant received a login to a freshly-made EC2 instance (a single server) with code and lesson data already in-place. We walked through the five stages of the tutorial with the group coding along and making their own maps, starting from incomplete initial files and progressing through added layers of complexity.

“Probably the biggest hassle with open source geospatial software is getting the full stack installed and set up, so we’ve gone ahead and made the AMI (Amazon Machine Image, a template for a virtual server) available publicly for anyone to use, along with notes on the process we used to create it.”

Michal Migurski

The Maps From Scratch (MFS) AMI may not be a Leveraged Turn Key Synergistic Do-What-I-Mean Solutions Platform but, really, anything that dulls the hassle and cost of setting up specialized software is a great big step in the right direction. I mention all of this because Clustr, the command-line application we use to derive shapefiles from geotagged photos, has recently been added to the list of tools bundled with the MFS AMI.

Specifically: ami-4d769124.

We’re super excited about this because it means that Clustr is that much easier for people to use. We expressly chose to make Clustr an open-source project to share some of the tools we’ve developed with the community but it has also always had a relatively high barrier to entry. Building and configuring a Unix machine is often more that most people are interested in, let alone compiling big and complicated maths libraries from scratch. Clustr on EC2 is not a magic pony factory but hopefully it will make the application a little friendlier.

Shapes

Creating and configuring an EC2 account is too involved for this post but there are lots of good resources out there, starting with Amazon’s own documentation. When I’m stuck I usually refer back to Paul Stamatiou’s How To: Getting Started with Amazon EC2.

Assuming that you familiar using Unix command line tools, let’s also assume that you have gotten all your ducks in a row and are ready to fire up the MFS AMI:

your-computer> ec2-run-instances ami-4d769124 -k example-keypair

your-computer> ec2-describe-instances

At which point, you’ll see something like this:

INSTANCE i-xxxxxxxx ami-4d769124 ec2-xxxxx.amazonaws.com blah blah blah

i-xxxxxxxx is the unique identifier of your current EC2 session. You will need this to tell Amazon to shut down the server and stop billing you for its use.

ec2-xxxxx.amazonaws.com is the address of your EC2 server on the Internets.

Once you have that information, you can start using Clustr. First, log in and create a new folder where you’ll save your shapefile:

your-computer> ssh -i example-rsa-key root@ec2-xxxxx.amazonaws.com

ec2-xxxxx.amazonaws.com> mkdir /root/clustr-test

The MFS AMI comes complete with a series of sample “points” files to render. We’ll start with the list of all the geotagged photos uploaded to Flickr on March 24:

ec2-xxxxx.amazonaws.com> /usr/bin/clustr -v -a 0.001 
   /root/clustr/start/points-2009-03-24.txt 
   /root/clustr-test/clustr-test.shp

By default Clustr generates a series of files named clustr (dot shp, dot dbf and dot shx because shapefiles are funny that way) in the current working directory. You can specify an alternate name by passing a fully qualified path as the last argument to Clustr. When run in verbose mode (that’s the -v flag) you’ll see something like this:

Reading points from input.
Got 44410 points for tag '20090324'.
799 component(s) found for alpha value 0.001.
- 23 vertices, area: 86.7491, perimeter: 71.9647
- 32 vertices, area: 1171.51, perimeter: 41.3095
- 8 vertices, area: 18.5112, perimeter: 0.529504
- 12 vertices, area: 1484.81, perimeter: 10.8544
...
Writing 505 polygons to shapefile.

Yay!

ec2-xxxxx.amazonaws.com> ls -la /root/clustr-test
total 172
drwxr-xr-x 2 root root  4096 2009-04-07 03:14 .
drwxr-xr-x 5 root root  4096 2009-04-07 02:22 ..
-rw-r--r-- 1 root root 52208 2009-04-07 03:14 clustr-test.dbf
-rw-r--r-- 1 root root 97388 2009-04-07 03:14 clustr-test.shp
-rw-r--r-- 1 root root  4140 2009-04-07 03:14 clustr-test.shx

Now copy the shapefiles back to your computer and terminate your EC2 instance (or you might be surprised when you get your next billing statement from Amazon).

ec2-xxxxx.amazonaws.com> scp -r /root/clustr-test 
   you@your-computer:/path/to/your/desktop/

ec2-xxxxx.amazonaws.com> exit

your-computer> ec2-terminate-instances i-xxxxxxxxx

I created this image (using the open source QGIS application) for all those points by running Clustr multiple times with alpha numbers ranging from 0.05 to 603:

SHAPEZ (2009-03-24)

Here’s another version rendered using the nik2img application and a custom style sheet, both included with the MFS distribution:

clustr

Here’s one of all the geotagged photos tagged “route66” (with alpha numbers ranging from 0.001 to 0.5):

tag=route66, alpha=(0.001 - 0.5)

Apologies and big sloppy kisses to Stamen’s own Mappr (first released in 2005).

Or tagged “caltrain“, the commuter train that runs between San Francisco and San Jose:

tag=caltrain, alpha=0.001

Meanwhile, Matt Biddulph at Dopplr has been generating a series of visualizations depicting the shape of where to eat, stay and explore for the cities in their Places database. This is what London looks like:

Or: “London dopplr places, filtered to only places my social network has been to, clustrd“.

One of the things I like the most about Clustr is that it will generate shape(file)s for any old list of geographic coordinates. Now that most of the hassle of setting up Clustr has been (mostly) removed, the only question left is: What do you want to render?

“They do not detail locations in space but histories of movement that constitute space.”

Rob Kitchin, Chris Perkins

If you’re like me you’re probably thinking something like “Wouldn’t it be nice if I could just POST a points file to a webservice running on the AMI and have it return a compressed shapefile?” It sure would so I wrote a quick and dirty version (not included in the MFS AMI; you’ll need to do that yourself) in PHP but if there are any Apache hackers in the house who want to make a zippy C version that would be even Moar Awesome ™.

If you don’t want to use the MFS AMI and would rather just install Clustr on your own machine instance, here are the steps I went through to get it work on a Debian 5.0 (Lenny) AMI; presumably the steps are basically the same for any Linux flavoured operating system:

$> apt-get update
$> apt-get install libcgal-dev
$> apt-get install libgdal1-dev
$> apt-get install subversion

$> svn co http://code.flickr.com/svn/trunk/clustr/
$> cd clustr
$> make
$> cp clustr /usr/bin/

$> clustr -h

clustr 0.2 - construct polygons from tagged points
written by Schuyler Erle

(c) 2007-2008 Yahoo!, Inc.

Usage: clustr [-a <n>] [-p] [-v] <input> <output>
   -h, -?      this help message
   -v          be verbose (default: off)
   -a <n>      set alpha value (default: use "optimal" value)
   -p          output points to shapefile, instead of polygons

If <input> is missing or given as "-", stdin is used.
If <output> is missing, output is written to clustr.shp.
Input file should be formatted as: <tag> <lon> <lat>n
Tags must not contain spaces.
        

Just like that!

photo by Timo Arnall

How the contact cache was won

You say ‘cash’, I say ‘kaysh’

Flickr has a lot of users. A lot. And most of those users have contacts, family, friends; somewhere between none and a bajillion. Or tens of thousands, anyhow. That’s a lot of relationships flying hither and yon, meaning we can’t just cache this stuff on the fly whenever the need strikes us. And strike it did.

Thus, Bo Selecta.

This project was designed to grab up a person’s contacts from anywhere in Flickrspace, and it had to be usable in bits of the site we hadn’t even designed yet. But it also had to not suck, and it had to be fast.

Supafast.

Luckily for us, we have at our disposal a shipping crate in the basement full of terribly clever little robots wearing suitable, fleshy attire and having names like Ross and Paul and Cal.

Walking into the river

As Rossbot has already covered, we spent a lot of time back-and-forthing on how we’d seed this aggregated cache all over the damned place without compromising on speed or our own general sexual attractiveness. Plus, I just wanted to use big words like ‘aggregated’ and ‘seed’.

As I’ve already mentioned above, making this magic happen at request time was not an option, so we turned to our (somewhat) trusty offline tasks system. These tasks munge and purge and generally do all sorts of wonderful data manipulation on boxes separate to the main site, in a generally orderly fashion, and do it in the background.

Offline tasks do it in the background

First up, we needed to work out what data we’d actually want to cache, which ended up being a minimal chunk useful enough for Rossbot to do whatever it is he does with Javascript that makes the ladies throw their panties on stage, and not a single byte more. We ended up with something that looks like this:

You got me.

Oh, you’re a clever one. That’s actually a picture of a fish. We really ended up with something like this:

NSIDaemail@address.comacharacter_nameareal nameaicon serveraicon farmapath aliasais_friendais_familyamagic_dust

Thus, we’re generating a bunch of contact data separated by designated control characters, and ultimately stored in a TEXT field in a database. The first time your cache is built, we actually walk your entire contact list and generate one of these chunks for each person you’re affiliated with. On subsequent updates, we use a bit of regular expression hoohah pixie dust to only change the necessary details from individuals, and write those changes back to the DB.

Big ups to Mylesbot for his help with making these tasks as efficient and as well-oiled as he is.

Speaking of updates, clearly we have to make sure we catch any changes you or your contacts make, so we have various spots around the site that fire off these offline tasks – when you update your various profile details, when you pick a named URL on Flickr for the first time, or when change your relationship with someone.

These updates have been carefully honed to work in the context of what’s changing – again, to squeeze out as much speed as we can. F’instance, there’s no need for us to tell all of your contacts that your relationship with SexyBabe43 has progressed to ‘Friend’. Unless that’s your sort of thing, but really, let’s leave that as an exercise for the reader.

All of this attention to detail has ultimately helped us eck out as much speed as possible. Seeing a theme here? So any time you’re sending a Flickrmail, searching for a contact or sharing a photo, think of the robots, and smile that secret little smile of yours, knowingly.

Tags in Space

A lot of you enjoyed our post (“Found in Space”) on the amazing astrometry.net project, and there have been some interesting followups.

A mysterious figure known only as “jim” paired up astronomy photos from Flickr with Google Sky. (You’re going to need the Google Earth plug-in for your browser — just follow the instructions on that page if you don’t have it.) In his technical writeup, “jim” explains how he used the Yahoo Query Language (YQL) to fetch the data. YQL is similar to the existing Flickr APIs, but it’s a query language like SQL rather than a set of REST-ish APIs. And both of those are really just ways to get data out of Flickr’s machine tag system, specifically the astro:* namespace. It’s turtles all the way down.

Who else is using astrotags? The British Royal Observatory in Greenwich is sponsoring a contest to determine the Astronomy Photographer of the Year and the whole thing is based on a Flickr group and extensive use of Flickr’s APIs. The integration is so seamless — galleries of photos and discussions are surfaced on their site as well as ours — you might as well consider Flickr to be their “backend” server. But they’ve also added much, such as great documentation about how to astrotag your photos as well as a concise explanation about how Astrometry.net identifies your photo, even among millions of known stars. (The sci-fi website io9 interviewed Fiona Romeo of the Royal Observatory about the contest; check it out.)

It’s dizzying how many services have been combined here — Astrometry.net grew out of research at the University of Toronto, web mashups use Google Sky for visualization in context, Yahoo infrastructure delivers and transforms data, the Royal Observatory at Greenwich provides leadership and expertise, and then little old Flickr acts as a data repository and social hub. And let’s not forget you, the Flickr community, and your inexhaustible creativity — which is the reason why all this can even come together.

All this was done with pretty light coordination and few people at Flickr were even aware what was going on until recently. I have no idea what the future is for APIs and a web of services loosely joined, but I hope we get to see more and more of this sort of thing.

Building Fast Client-side Searches

Yesterday we released a new people selector widget (which we’ve been calling Bo Selecta internally). This widget downloads a list of all of your contacts, in JavaScript, in under 200ms (this is true even for members with 10,000+ contacts). In order to get this level of performance, we had to completely rethink how we send data from the server to the client.

Server Side: Cache Everything

To make this data available quickly from the server, we maintain and update a per-member cache in our database, where we store each member’s contact list in a text blob — this way it’s a single quick DB query to retrieve it. We can format this blob in any way we want: XML, JSON, etc. Whenever a member updates their information, we update the cache for all of their contacts. Since a single member who changes their contact information can require updating the contacts cache for hundreds or even thousands of other members, we rely upon prioritized tasks in our offline queue system.

Testing the Performance of Different Data Formats

Despite the fact that our backend system can deliver the contact list data very quickly, we still don’t want to unnecessarily fetch it for each page load. This means that we need to defer loading until it’s needed, and that we have to be able to request, download, and parse the contact list in the amount of time it takes a member to go from hovering over a text field to typing a name.

With this goal in mind, we started testing various data formats, and recording the average amount of time it took to download and parse each one. We started with Ajax and XML; this proved to be the slowest by far, so much so that the larger test cases wouldn’t even run to completion (the tags used to create the XML structure also added a lot of weight to the filesize). It appeared that using XML was out of the question.

BoSelectaJsonGoodFunTimes: eval() is Slow

DJ Bo Selecta on the decks

Next we tried using Ajax to fetch the list in the JSON format (and having eval() parse it). This was a major improvement, both in terms of filesize across the wire and parse time.

While all of our tests ran to completion (even the 10,000 contacts case), parse time per contact was not the same for each case; it geometrically increased as we increased the number of contacts, up to the point where the 10,000 contact case took over 80 seconds to parse — 400 times slower than our goal of 200ms. It seemed that JavaScript had a problem manipulating and eval()ing very large strings, so this approach wasn’t going to work either.

Contacts File Size (KB) Parse Time (ms) File Size per Contact (KB) Parse Time per Contact (ms)
10,617 1536 81312 0.14 7.66
4,878 681 18842 0.14 3.86
2,979 393 6987 0.13 2.35
1,914 263 3381 0.14 1.77
1,363 177 1837 0.13 1.35
798 109 852 0.14 1.07
644 86 611 0.13 0.95
325 44 252 0.14 0.78
260 36 205 0.14 0.79
165 24 111 0.15 0.67

JSON and Dynamic Script Tags: Fast but Insecure

Working with the theory that large string manipulation was the problem with the last approach, we switched from using Ajax to instead fetching the data using a dynamically generated script tag. This means that the contact data was never treated as string, and was instead executed as soon as it was downloaded, just like any other JavaScript file. The difference in performance was shocking: 89ms to parse 10,000 contacts (a reduction of 3 orders of magnitude), while the smallest case of 172 contacts only took 6ms. The parse time per contact actually decreased the larger the list became. This approach looked perfect, except for one thing: in order for this JSON to be executed, we had to wrap it in a callback method. Since it’s executable code, any website in the world could use the same approach to download a Flickr member’s contact list. This was a deal breaker.

Contacts File Size (KB) Parse Time (ms) File Size per Contact (KB) Parse Time per Contact (ms)
10,709 1105 89 0.10 0.01
4,877 508 41 0.10 0.01
2,979 308 26 0.10 0.01
1,915 197 19 0.10 0.01
1,363 140 15 0.10 0.01
800 83 11 0.10 0.01
644 67 9 0.10 0.01
325 35 8 0.11 0.02
260 27 7 0.10 0.03
172 18 6 0.10 0.03

Going Custom

Custom Ride

Having set the performance bar pretty high with the last approach, we dove into custom data formats. The challenge would be to create a format that we could parse ourselves, using JavaScript’s String and RegExp methods, that would also match the speed of JSON executed natively. This would allow us to use Ajax again, but keep the data restricted to our domain.

Since we had already discovered that some methods of string manipulation didn’t perform well on large strings, we restricted ourselves to a method that we knew to be fast: split(). We used control characters to delimit each contact, and a different control character to delimit the fields within each contact. This allowed us to parse the string into contact objects with one split, then loop through that array and split again on each string.

that.contacts = o.responseText.split("\c");

for (var n = 0, len = that.contacts.length, contactSplit; n < len; n++) {

	contactSplit = that.contacts[n].split("\a");

	that.contacts[n] = {};
	that.contacts[n].n = contactSplit[0];
	that.contacts[n].e = contactSplit[1];
	that.contacts[n].u = contactSplit[2];
	that.contacts[n].r = contactSplit[3];
	that.contacts[n].s = contactSplit[4];
	that.contacts[n].f = contactSplit[5];
	that.contacts[n].a = contactSplit[6];
	that.contacts[n].d = contactSplit[7];
	that.contacts[n].y = contactSplit[8];
}

Though this technique sounds like it would be slow, it actually performed on par with native JSON parsing (it was a little faster for cases containing less than 1000 contacts, and a little slower for those over 1000). It also had the smallest filesize: 80% the size of the JSON data for the same number of contacts. This is the format that we ended up using.

Contacts File Size (KB) Parse Time (ms) File Size per Contact (KB) Parse Time per Contact (ms)
10,741 818 173 0.08 0.02
4,877 375 50 0.08 0.01
2,979 208 34 0.07 0.01
1,916 144 21 0.08 0.01
1,363 93 16 0.07 0.01
800 58 10 0.07 0.01
644 46 8 0.07 0.01
325 24 4 0.07 0.01
260 14 3 0.05 0.01
160 13 3 0.08 0.02

Searching

Ben to the Rescue

Now that we have a giant array of contacts in JavaScript, we needed a way to search through them and select one. For this, we used YUI’s excellent AutoComplete widget. To get the data into the widget, we created a DataSource object that would execute a function to get results. This function simply looped through our contact array and matched the given query against four different properties of each contact, using a regular expression (RegExp objects turned out to be extremely well-suited for this, with the average search time for the 10,000 contacts case coming in under 38ms). After the results were collected, the AutoComplete widget took care of everything else, including caching the results.

There was one optimization we made to our AutoComplete configuration that was particularly effective. Regardless of how much we optimized our search method, we could never get results to return in less than 200ms (even for trivially small numbers of contacts). After a lot of profiling and hair pulling, we found the queryDelay setting. This is set to 200ms by default, and artificially delays every search in order to reduce UI flicker for quick typists. After setting that to 0, we found our search times improved dramatically.

The End Result

Head over to your Contact List page and give it a whirl. We are also using the Bo Selecta with FlickrMail and the Share This widget on each photo page.

Moar Panda: Is a Firehose of Snowflakes a Nor’easter?

Kellan in his blog post over here: Is a Firehose of Snowflakes a Nor’easter? correctly points that that I may have been slightly cagey about the awesomeness of what we actually put out yesterday, by saying …

“But because the documentation is quirky, I think people missed the significance. These are Flickr real time data APIs.”

My excuse is that’s my natural instinct to sneak new features past the lawyers, by dressing them up in a non-serious fashion :)

But in reality, this is just an excuse to post the following photo …

Magic Panda Bus

… see there I go again. Offsetting the important with light humor, go read Kellan’s post now for he is smart and knows what he’s talking about.

Now stop trying to get on explore & enjoy yourself

Photo by Las Tonterias points out

Panda Tuesday; The History of the Panda, New APIs, Explore and You

The New APIs

I’ll cut straight to the chase on this one, we’ve just launched two new API methods;

flickr.panda.getPhotos
flickr.panda.getList

The first call (flickr.panda.getPhotos) returns you a list of photos that the mystical Flickr Pandas are currently interested in, in the following format …

<photos interval="60000" lastupdate="1235765058272" total="120" panda="ling ling">
    <photo title="Shorebirds at Pillar Point" id="3313428913" secret="2cd3cb44cb"
        server="3609" farm="4" owner="72442527@N00" ownername="Pat Ulrich"/>
    <photo title="Battle of the sky" id="3313713993" secret="3f7f51500f"
        server="3382" farm="4" owner="10459691@N05" ownername="Sven Ericsson"/>
    <!-- and so on -->
</photos>

When calling this API method please ensure that your code uses the lastupdate and interval attributes to determine when to request new photos. lastupdate is a Unix timestamp indicating when the list of photos was generated and interval is the number of seconds to wait before polling the Flickr API again.

The second call (flickr.panda.getList) tells you which of the Mystical Flickr Pandas are around to request photos from.

Ling Ling and Hsing Hsing both return photos they are currently interested in, both have slightly different tastes in photos depending on their mood. The (currently) third Panda Wang Wang returns photos that have recently been geotagged, not quite realtime but close.

Important! No-one fully understands the whims and fancies of the Pandas at any moment in time, these APIs give you a direct dump of what they are looking at (filtered for safe and public photos only) at this very moment. We cannot promise that this will not include ladies (or men) in bikinis, or possibly even ladies not in bikinis. You should keep this in mind while planning for your target audience, I’ll touch on this a little bit more in the Implementation Hints section.

You Wanted the Best, You Got the Best

Philosophy

We’re always toying around with different ways to find and surface “interesting” photos. One method that people are generally familiar with is the Explore page, of which there’s always a lot of discussion. A good starting place for more Explore information is here with more general discussion in the Secrets Of Explore group.

But behind Explore is an amount of “Secret Sauce” a lot of which focuses on activity around a media object. The Explore page is generally a daily experience, although things often pop in and out during the day.

What we are doing with the Pandas is exposing a more real time experience, letting them each focus on various aspects of that activity, the first two Ling Ling and Hsing Hsing are fairly introspective and have different views of Flickr, but both focus on what they’ve noticed going on recently. The third Wang Wang is more a panda of the world and concentrates solely on geotagging. If you wanted to show photos being geotagged nearly as they happen, Wang Wang is the way to go.

From a numbers point of view, Explore shows 500 photos from the last 7 days … a Panda can get through around 150,000 photos per day.

A couple of caveats; Just because a Panda notices a photo, doesn’t mean that photo will make it into Explore, or even be destined to be considered by the Magic Donkey for Explore. Conversely, because a photo has made it into Explore doesn’t necessarily mean that a Panda will have noticed it. Also, because of the complexity and amount of stuff going on, no-one really knows exactly what Ling Ling and Hsing Hsing will find interesting, but that won’t stop us from tweaking the underlining code now and then.

Also, you may have noticed that this is one of the less serious APIs, it’s both experimental and there to hopefully encourage fun and play. Please respect the Pandas and don’t abuse them, help us protect this endangered species and keep them alive.

Implementation Hints

So, why are we doing all this messing around with “Pandas”?

Well, mainly because we want to give developers another stream of photos with which to play with. For example you could build your own Flickr Zeitgeist Badge or slideshow or tool to just show stuff that’s going on over at Flickr.

We built this …

Flickr Panda!

… (many people wished we hadn’t) … the Rainbow Vomiting Panda of Awesomeness as an experiment (which used Ling Ling fwiw). Since then we’ve been tweaking the backend, and now finally released the API so you can all have a go.

It’s a stream of, on average, more interesting photos then you’d generally get from polling Everyone’s photos. The quality is pretty good, the best thing to do is watch The Panda for a while and figure out if a) you want to build something with a live stream of photos b) you can build something more better than a vomiting panda (which lets face it, it pretty hard to top!).

This is what I learnt while building it, it may help you too …

When you get a packet of photos back from a panda, its around 60 seconds worth of stuff that the Panda found interesting. The packet is sorted by what the Panda found most interesting in that timeframe to least. Because people can often find ladies in bikinis interesting (for some reason) there can sometimes be some activity around ladies in bikinis that the Pandas may pick-up on.

However, Pandas tend not to find ladies in bikinis that interesting (which may have something to do with why they’re in such peril) and when those photos appear they tend to happen in lower half of the packet of photos.

So in the case of the Vomiting Panda above, I just threw away the second half of each packet, rather than go through all of the photos before asking for the next packet. I found this to be a good thing to do anyway, even though there’s always a lot of amazing stuff going on on Flickr the tail end of some packet wasn’t always quite as amazing as I wanted. You may want to do things differently, but that’s just what I found while working with the APIs.

Send us stuff

Anyway, on that note, at some point I’ll try to do a round up of stuff people have built on these APIs. If you want to post what you build to the Flickr Hacks group and/or FlickrMail me I’ll start to build up a list and take it from there.

Photos by ohad* and psd.

Videos in the Flickr API: Part Deux

We hope it’s obvious that long photos are taken extremely seriously here at FlickrHQ. To that end, it’s important that videos be first class citizens in our api. And with today’s launch of video for free users and the HD output, we have some important new api changes to share with you. So if you’re an uploader developer, a mobile application developer, or you make those things that are changing the way that people consume their tv and other media (nudge-nudge Boxee), take a look at these new additions to the api and make our long photos feel right at home in your apps.

With the addition of our HD output, all videos are now transcoded into 2 or 3 MP4s. Since these are in a widely-supported format, we thought it would be interesting to make them available to third-party developers. flickr.photos.getSizes will give you semi-permanent urls to the 700k site output, the mobile-optimized output, the 2mbps HD output (if available), and the original video (if the call is authenticated by the video owner and the owner is a pro member). For example:

<rsp stat="ok">
<sizes canblog="1" canprint="1" candownload="1">
<size label="Square" width="75" height="75" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26_s.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/sq/" media="photo"/>
<size label="Thumbnail" width="100" height="56" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26_t.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/t/" media="photo"/>
<size label="Small" width="240" height="135" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26_m.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/s/" media="photo"/>
<size label="Medium" width="500" height="281" source="http://farm4.static.flickr.com/3436/3232057393_815b1c5d26.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/m/" media="photo"/>
<size label="Original" width="1280" height="720" source="http://farm4.static.flickr.com/3436/3232057393_204af3bcff_o.jpg" url="http://www.flickr.com/photos/mylesdgrant/3232057393/sizes/o/" media="photo"/>
<size label="Video Player" width="640" height="360" source="http://www.flickr.com/apps/video/stewart.swf?v=1233362721&photo_id=3232057393&photo_secret=815b1c5d26" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="Site MP4" width="640" height="360" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/site/815b1c5d26/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="Mobile MP4" width="480" height="360" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/mobile/815b1c5d26/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="HD MP4" width="1280" height="720" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/hd/815b1c5d26/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
<size label="Video Original" width="1280" height="720" source="http://www.flickr.com/photos/mylesdgrant/3232057393/play/orig/123fakest/" url="http://www.flickr.com/photos/mylesdgrant/3232057393/" media="video"/>
</sizes>
</rsp>

Use this if you’re lazy. If you’re not lazy and want to be one of the cool kids, you can construct the urls to these outputs yourself, assuming you have the nsid or custom url of the owner, the video id, and the secret (or original secret). Requesting the HD output for a video that doesn’t have it (because it’s less than 720 pixels high, or the owner is a free member), will deliver a 404. An example url:

http://www.flickr.com/photos/{user-id|custom-url}/{photo-id}/play/{site|mobile|hd|orig}/{secret|originalsecret}/

If you’re developing uploading tools for free users, flickr.people.getUploadStatus has new attributes to determine whether a user is allowed to upload any more videos or not. If the user is a free user, the output of this method will now look something like this:

<rsp stat="ok">
<user id="88251462@N00" ispro="0">
<username>Dev Myles</username>
<bandwidth max="107373568" used="0" maxbytes="104857600" usedbytes="0" remainingbytes="104857600" maxkb="104857" usedkb="0" remainingkb="104857" unlimited="0"/>
<filesize max="10485760" maxbytes="10485760" maxkb="10240" maxmb="10"/>
<sets created="7" remaining="lots"/>
<videosize maxbytes="157286400" maxkb="153600" maxmb="150"/>
<videos uploaded="0" remaining="2"/>
</user>
</rsp>

Note the new <videos> element in the response, which will let you know how many more videos this user can upload this month.

And as always, if you feel like we’re missing something, please let us know.

YUI Blog: Improving The Flickr Upload Exprience With YUI Uploader

water pipe

Visual analogy of simultaneous file uploading. Also, internet/pipe joke goes here.

As a site which has many nifty JavaScript-driven features, Flickr makes good use of the Yahoo! User Interface library for much of its JavaScript DOM, Event handling and Ajax functionality.

One of the fancier widgets we’ve implemented is a flashy browser-based Web Uploadr which uses the YUI Uploader component (a combination of JavaScript and Flash) which allows for faster batch uploads, progress reporting, a nicer UI and overall improved user experience.

Head over to the YUI Blog and check out how Flickr uses YUI Uploader to provide a faster, shinier upload experience.