It seems that using queuing systems in web apps is the new hotness. While the basic idea itself certainly isn’t new, its application to modern, large, scalable sites seems to be. At the very least, it’s something worth talking about, so here’s how Flickr does it, to the tune of 11 million tasks a day.
But first, a use case! Every time you upload a photo to Flickr, we need to tell three different classes of people about it: 1) You, 2) Your contacts, 3) Everyone else. In a basic application, we’d have a table of your photos and a table of the people who call you a contact. Every time someone checks the latest photos from their contacts, a simple join of those two tables works fine. If you’re browsing around everyone’s photos or doing a search, you just check the single photos table.
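If you’re wondering what that naive version looks like, it’s just a join. The table and column names below are made up for illustration, not our actual schema:

```php
<?php
// Hypothetical schema: photos(id, owner_id, uploaded_at), contacts(user_id, contact_id).
// One join answers "latest photos from my contacts" -- fine for a small app,
// painful once both tables hold hundreds of millions of rows.
$db = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');
$viewerId = 12345;   // whoever is asking for the latest from their contacts

$stmt = $db->prepare(
    'SELECT p.*
       FROM photos p
       JOIN contacts c ON c.contact_id = p.owner_id
      WHERE c.user_id = :viewer
      ORDER BY p.uploaded_at DESC
      LIMIT 20'
);
$stmt->execute([':viewer' => $viewerId]);
$latestFromContacts = $stmt->fetchAll(PDO::FETCH_ASSOC);
```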
Obviously not going to fly here.
For scale, Flickr separates these three lookups into three different places. When you upload that photo, we immediately tell you about it and get out of your way so you can go off and marvel at it. We don’t make you wait while we tell your contacts about it, insert it into the search system, notify third-party partners, etc. Instead, we insert several jobs into our queuing system to do these steps "later". In practice, every one of these actions takes place within 15 seconds of your upload, while you’re free to do something else.
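To make that concrete, here’s a rough sketch of the enqueue side. The task names, helper functions, and columns below are placeholders rather than our actual code; the point is the shape of the pattern: do the bare minimum synchronously, and turn everything else into rows for the queue to chew on.

```php
<?php
// Hypothetical enqueue helper: one row per task in a tasks table.
function queue_insert(PDO $db, string $taskType, array $args, int $priority = 0): void
{
    $stmt = $db->prepare(
        "INSERT INTO tasks (queue_id, task_type, args, status, priority, retries, created_at, updated_at)
         VALUES (:queue, :type, :args, 'pending', :priority, 0, NOW(), NOW())"
    );
    $stmt->execute([
        ':queue'    => rand(0, 9),          // spread tasks across the parallel queues at random
        ':type'     => $taskType,
        ':args'     => json_encode($args),  // arguments travel with the task
        ':priority' => $priority,
    ]);
}

// At upload time: the only synchronous work is writing the photo itself
// (store_photo() is a stand-in for that); everything else becomes a queued task.
$photoId = store_photo($db, $userId, $uploadedFile);
queue_insert($db, 'notify_contacts',  ['photo_id' => $photoId]);
queue_insert($db, 'index_for_search', ['photo_id' => $photoId]);
queue_insert($db, 'notify_partners',  ['photo_id' => $photoId]);
// ...and the uploader gets their page back right away.
```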
Currently, we have ten job queues running in parallel. At creation, each job is randomly assigned a queue and inserted, usually at the end of the queue, though certain high-priority tasks can jump to the front. Each queue has a master process running on dedicated hardware. The master is responsible for fetching jobs from the queue, marking them as in-process, handing them to local children to do the work, and marking them as complete when the child is finished. Each task has an ID, task type, arguments, status, create/update dates, retry count, and priority. Tasks detect their own errors and either halt or put themselves back in the queue for retry later, when hopefully things have gotten better. Tasks also lock the objects they run on (like accounts or groups) to make sure other tasks don’t stomp on their data. Masters keep an in-memory copy of the queue for speed, but it’s backed by MySQL for persistent storage, so we never lose a task.
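Here’s a heavily simplified, single-process sketch of that claim/run/complete cycle against a hypothetical tasks table. The real masters keep the queue in memory and fork children to do the work; none of the names below are our actual code.

```php
<?php
const MAX_RETRIES = 5;   // made-up cap for the sketch

function run_master(PDO $db, int $queueId, array $handlers): void
{
    while (true) {
        // Claim the highest-priority pending task for this queue. With one master
        // per queue, this simple SELECT-then-UPDATE claim is safe; multiple masters
        // on the same queue would need an atomic claim.
        $stmt = $db->prepare(
            "SELECT * FROM tasks
              WHERE queue_id = :q AND status = 'pending'
              ORDER BY priority DESC, id ASC
              LIMIT 1"
        );
        $stmt->execute([':q' => $queueId]);
        $task = $stmt->fetch(PDO::FETCH_ASSOC);

        if (!$task) {
            sleep(1);        // nothing to do; poll again shortly
            continue;
        }

        set_status($db, $task['id'], 'in_process');

        try {
            $handler = $handlers[$task['task_type']];
            $handler(json_decode($task['args'], true));   // a real master would hand this to a forked child
            set_status($db, $task['id'], 'complete');
        } catch (Exception $e) {
            // On failure, either put the task back for a later retry or give up and halt it.
            if ($task['retries'] < MAX_RETRIES) {
                $db->prepare("UPDATE tasks
                                 SET status = 'pending', retries = retries + 1, updated_at = NOW()
                               WHERE id = :id")
                   ->execute([':id' => $task['id']]);
            } else {
                set_status($db, $task['id'], 'failed');
            }
        }
    }
}

function set_status(PDO $db, int $taskId, string $status): void
{
    $db->prepare("UPDATE tasks SET status = :s, updated_at = NOW() WHERE id = :id")
       ->execute([':s' => $status, ':id' => $taskId]);
}
```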
There are plenty of popular off-the-shelf messaging servers out there. We wrote our own, of course. This isn’t just because we think we’re hot shit; there are also practical reasons. The primary one was that we didn’t want yet another piece of complicated architecture to maintain. The other was that, for consistency and maintainability, the queue needed to run on the exact same code as the rest of the site (yes, that means our queuing system is written in PHP). This makes bug fixes and hacking easier, since we only need to do them in one place. So every time we deploy, the masters detect it, tell all their children to cleanly finish up what they’re doing, and restart themselves to take advantage of the latest and greatest.
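One plausible version of that restart dance looks something like the sketch below; the revision file, paths, and helpers are invented for the example, not our actual setup.

```php
<?php
// Assume the deploy script rewrites a revision marker file. The master remembers the
// revision it started with, notices when it changes, drains its children, and then
// re-execs itself so everything runs the newest code.
$startRevision = trim(file_get_contents('/var/app/REVISION'));   // hypothetical path
$childPids     = [];   // PIDs of forked worker children, filled in when the master forks them

function deploy_happened(string $startRevision): bool
{
    return trim(file_get_contents('/var/app/REVISION')) !== $startRevision;
}

// Somewhere inside the master's main loop:
if (deploy_happened($startRevision)) {
    foreach ($childPids as $pid) {
        posix_kill($pid, SIGTERM);          // children trap this and finish their current task
    }
    foreach ($childPids as $pid) {
        pcntl_waitpid($pid, $status);       // wait for each child to exit cleanly
    }
    pcntl_exec('/usr/bin/php', [__FILE__]); // replace ourselves with the freshly deployed code
}
```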
Our queue isn’t just for boring old notifications, privacy changes, deletes, and denormalization calculations. The system also makes it easy for us to write backfills that automatically parallelize, detect errors, retry, and back off from overloaded servers. As an essential part of the move-data-to-another-cluster pattern and the omg-we’ve-been-writing-the-wrong-value-there bug fix, backfills are quickly becoming the most common use of our queue.
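To sketch that backfill pattern, using the queue_insert() helper from the earlier sketch: split the keyspace into chunks, enqueue one task per chunk, and have each chunk put itself back on the queue at low priority when the target cluster looks busy. The chunk size, task name, and load check are all made up for illustration.

```php
<?php
const CHUNK_SIZE = 1000;   // invented chunk size

// Kick off the backfill: one queued task per range of photo ids. The ten queues
// and their children give us parallelism for free.
function start_backfill(PDO $db, int $maxPhotoId): void
{
    for ($start = 1; $start <= $maxPhotoId; $start += CHUNK_SIZE) {
        queue_insert($db, 'backfill_chunk', [
            'first_id' => $start,
            'last_id'  => min($start + CHUNK_SIZE - 1, $maxPhotoId),
        ]);
    }
}

// The handler a child runs for each 'backfill_chunk' task.
function backfill_chunk(PDO $db, array $args): void
{
    if (target_cluster_overloaded()) {              // hypothetical load check
        // Back off: re-enqueue the chunk at low priority and stop for now.
        queue_insert($db, 'backfill_chunk', $args, -1);
        return;
    }

    $stmt = $db->prepare('SELECT * FROM photos WHERE id BETWEEN :first AND :last');
    $stmt->execute([':first' => $args['first_id'], ':last' => $args['last_id']]);
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        rewrite_row($row);                          // hypothetical per-row fix-up
    }
}
```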
The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front and rely on our queuing system to do the rest.