Powering Flickr’s Magic view by fusing bulk and real-time compute

Try it for yourself!

You can try out Flickr’s Magic View on your own photos here, and you can download a working code sample of the simplified lambda architecture here: https://github.com/yahoo/simplified-lambda

Introduction

In this post we’re going to talk about how we came up with a novel revision of the Lambda Architecture for fusing large-scale bulk compute with streaming compute to power Flickr’s Magic View. We were able to create a responsive, real time database operating at a scale of tens of billions of records, with tens to hundreds of millions of records updated per day. We turned to Yahoo’s Hadoop stack to find a way to build this at the massive scale we needed.

Magic View

Figure 1. Magic View in action

Motivation: the Magic View

Flickr’s Magic View takes the hassle out of organizing your photos by applying our computer-vision technology to automatically recognize objects or styles in your photos and present them to you in the Camera Roll’s scrolling view. This all happens in real time as soon as a photo is uploaded, it is categorized and placed into the Magic View.

Aggregating computer vision tags

When a photo is uploaded, it is processed by a computer vision pipeline to generate a set of computer vision tags, which are text labels of the contents of the image. We already had an existing architecture for stream computation of tags on upload, but to implement the Magic View, we needed to maintain per-user reverse indexes and some aggregations of the tags. And we needed to make sure all the data was consistent if a photo was added, removed or updated these indexes and aggregations would have to be updated to reflect this. Finally, we needed to initialize the system with tags for 12 billion photos and videos and run periodic backfills (every time we improved our computer vision algorithms and to cover cases where the stream compute missed images).

The Problem

We initially computed a snapshot of the Magic View indexes and aggregations using map-reduce (via Apache Oozie and Apache Pig), and we were happy with the quick turnaround time (about 7 hours). We considered updating Magic View as a daily batch job, but soon realized this would not give our users the responsive, “live” experience we wanted. So, we built a streaming data layer using Apache Storm and were soon able to update the categories in Magic View in real-time.

The next time we needed to run a backfill, we explored using this streaming layer to load the data. Unfortunately, the overhead of the read-modify-write process was simply too much for a load of this size — after kicking off the process we estimated it would take 28 days this way — much longer than the seven hours we had achieved with a bulk load.

Twenty-eight days was a non-starter – we realized we needed a way to update our bulk aggregations independently of the real-time data streaming in. Solving this problem is how we arrived at our revision to Lambda Architecture. Before digging into the solution, let’s do a quick review of the Lambda Architecture.  If you’re already familiar with it, you can skip this next section.

The Lambda Architecture

We’ll start with Nathan Marz’s book ‘Big Data’, which proposes the database concept of  ‘Lambda Architecture.’ In his analysis, he states that a database query can be represented as a function – Query – which operates on all the data:

result = Query(all data)

In the Lambda architecture, a traditional database is replaced with both a real time and a bulk database. Then query function becomes a “combiner” function of independent queries to each database:

result = Combiner(Query(real time data) + Query(bulk data))

An example of a typical Lambda Architecture is shown in figure 2. It is powered by an append-only queue for its system of record, which is fed by a real time stream of events. Periodically, all the data in the queue is fed into a bulk computation which pre-processes the data to optimize it for queries, and stores these aggregations in a bulk compute database. The real time event stream drives a stream computer, which processes the incoming events into real time aggregations. A query then goes via a query combiner, which queries both the bulk and real time databases, computes the combination, and stores the result.

Typical Lambda Architecture

Figure 2. Typical Lambda Architecture

While relatively new, Lambda Architecture has enjoyed popularity and a number of concrete implementations have been built. Some significant examples are the distributed analytics platform druid, Twitter’s Summingbird, and FiloDB. These implementations conveniently abstract away the databases behind the query combiner.

A significant advantage with this style of architecture is robustness and fault-tolerance via eventual consistency. If a piece of data is skipped in the real time compute there is a guarantee that it will eventually appear in the bulk compute database.

Criticism of the Lambda Architecture has centred around the complicated nature of the combiner. The combiner incurs a developer and systems cost from the need to maintain two different databases. It can be challenging to make sure both systems give the same result. Merging the two queries can become complicated, and finally, more points of failure may be introduced.

The “Ah-ha” Moment

Back to the problem. The data access layer we used for streaming compute uses the atomic read-modify-write pattern to ensure we write consistent data, one record-at-a-time to Apache HBase (a BigTable-style, non-relational database). Again, since this pattern was so much slower in the backfill case we needed to figure out how to get both consistent updates for streaming and  fast loads of the full dataset. Since our bulk data was static, we realized that if we relaxed the consistency constraint we could just run a fast, streaming, write-only load of the bulk data, bringing the load time back down to hours instead of days.

But how could we get around the consistency requirements? We didn’t want a bulk load to clobber data being written from the real time compute process. The insight was that we could just write bulk and streaming data to different column families in the same HBase row. So we added the concept of real time columns and bulk columns in a single row. Basically, bulk loads write to one set of columns and real time writes go to a different set of columns. Since HBase columns are sparse and data is updated relatively slowly we don’t pay much in storage or IO.

We could  now simplify the equation back to:

result = Combiner(Query(data))

The two sets of columns are managed separately by the real time and bulk subsystems. At query time, we perform a single fetch using the HBase API to get both the bulk and real time data. A separate combiner process assembles the final result.

Implementation

 Magic View backend system overview

Figure 3. Magic View Architecture

Figure 3 shows an overview of the system and our enhanced Lambda architecture. For the purposes of this discussion, a convenient abstraction is to consider that each row in the HBase table represents the current state of a given photo. The combiner stage is abstracted into a single Java process, which collects data from HBase and runs transformations on the data and sends it to a Redis cache which is used by the serving layer for the site.

Consistency on read in HBase — the combiner

We have two sets of columns to go with each row in HBase: bulk and real time. The combiner determines the final value for each attribute at read. In the case where data exists for real time but not for bulk (or vice versa) then there is only one value to choose. In the case where they both exist we always choose the real time value. This keeps the combiner very simple and fast.

There is a trick though – whenever we do a backfill, we may need to repair the row since the backfill data may be newer than any real time data that is already present. It turns out this slows down the backfill from seven hours to about 14 — still far faster than loading with read-modify-write.

Production throughput

At scale, this architecture has been able to keep up very comfortably with production load. We can simultaneously run backfills to HBase and serve user information at the same time without impacting latency or the user experience.

User experience

An important measure for how the system works is how the viewer perceives it. The slowest part of the system is paging data from HBase into the serving cache; median time for above-the-fold latency – i.e. enough data is available to render the page – is around 10ms.

Future directions

Our experience has been very positive so far with Magic View and we’re looking at how we might enable users to browse their photos in other dimensions (location or color for example). Early tests have shown that building an OLAP or data cube in this architecture is certainly possible but it’s less clear that it will scale well.

Contributors: Peter Welch, Bhautik Joshi, Hugo Haas, Srinivasan Singanallur, Ayan Ray, Pierre Garrigues, Ben Firestone, Sai Madhavan, Tim Miller

Thanks to Nathan Marz for reviewing this post.

Flickr September 2014

Like this post? Have a love of online photography? Want to work with us? Flickr is hiring mobile, back-end and front-end engineers, in our San Francisco office. Find out more at flickr.com/jobs.

The Data Freshener

Hello Kitty car air freshener
So fresh

 

Change

You may have noticed some changes in Flickr a couple months back. Like, half the site changed. 95% even, by some metrics. Some say CHANGE IT BACK! while others welcome change. Whatever your thoughts, the changes are here, and they mean things. For example, they mean new visual design and better usability. They mean a faster site. Unfortunately, up until recently, they also meant more stale data. Yuck.

Change
Change
 

Why? What? Well…here’s the deal. We have a new-ish frontend stack we’ve been using for the past couple years now. It’s an isomorphic single-page application, runs on node.js, and is generally awesome. We call it Reboot.

hi there / i am the computer
Reboot
 

In the World of Reboot, we treat data with kid gloves. We <3 data. We never want to give it up, never want to let it down. Once we pull data from our APIs, we store the fetched data in your browser so that we don’t have to fetch it again the next time it’s needed. This means faster page loads and faster navigation, and less API traffic (and thus a more stable and scalable API). The data cached in your browser exists as long as the current Reboot session — until you refresh or leave Reboot for a non-Rebooted page.

However, this also meant that data could become stale. You change the date taken of your photo, someone else adds a comment, you navigate to a page with cached data…and you don’t see the changes. Wat? Yeah. So, this was not a huge problem until we moved lots of pages onto Reboot in the beginning of May. From that point forward, most Flickr user sessions have spent their entirety on Reboot, feeding off the same stale loaves of cached data.

bread
Staleness
 

The thinking (design / prototypes)

We considered a number of possibilities for freshening up data during a user session. A brief history of the strategies we sampled, and their results:

1. Refresh on update

Ice Tea
 

The first stab focused on updating data locally after it was changed by the user. Most of our simpler use cases already updated as expected, but some trickier cases with indirect relationships did not. For example, changing the date taken of a photo updated the data model for the photo, but deleting a photo did not necessarily ensure the photo was removed from all the cached albums, groups, and galleries to which it belonged. (Note that the photo was removed correctly from the backend, just not from the cached representation of those entities on the client.)

Cleaning up these relationships using change events between models helped, but didn’t solve all our problems. When someone outside of the local session (read: another user) changed data, it would not reflect in the current session. The only way to catch changes from outside the current session was to be more aggressive about evicting models.

2. Nuclear option

Atomic Bomb Test
 

The pendulum swung all the way in the other direction — instead of surgical removal of data models we knew to be out-of-date, what would happen if we removed all cached data on every navigation? This prototype was quick to build, and incredibly destructive. By doing this, all our cached data always remained as fresh as could be, but we essentially reverted to Web 1.0 — with the exception of the Reboot framework, everything was reloaded on every page.

Not surprisingly, this blew up API traffic (locally only! did not unleash that disaster at scale), and inflated page load times like a Jeff Koons sculpture. It did give us some baseline timing metrics we could point to as worst-case scenarios, however. The next step was to swing the pendulum back toward the middle — to a carefully-knitted solution that would preserve fast page loads and navigation, while ensuring the freshest data we could serve up.

3. Refetch on navigate

fetched
 

At this point, our challenge was to find a solution that would keep navigation fast, API traffic slim, and pick up all changes to session data, whether local or remote. We ended up with a solution we call “refetching”: evicting and requesting new data models as the model is needed by the application. But when?
We could refetch periodically or on a user action; we determined that the best time to trigger a refetch was on navigation — when the user navigates, cached models become eligible for refetching. Specifically, when the user navigates between sections of the site, refetching is triggered. This proved to be the happiest medium between speed and freshness.

A high-level outline of how the refetching strategy works:

  • The user loads a page; data are requested from the API, and models are cached. As new models are created, they’re marked as being fresh.
  • The user navigates to another site section (e.g. Photostream → Search); all freshness marks are removed from all models. They’re now all eligible for refetching.
  • As Reboot builds the new page, it requests data models from the cache. Since they no longer have their seal of freshness, they are refetched, and marked as fresh once retrieved and cached.

One important note — refetching is not triggered on browser back/forward navigation. Users expect near-immediate navigation, thanks to browser caching, when navigating to already-viewed content. Therefore, we refetch only when the user clicks a link to navigate to a new site section.

4. Miscellany

There were a couple other options we considered and rejected from the start, but they’re worth mentioning here.

One was a TTL (time-to-live) algorithm, commonly used in caching applications. TTL algorithms expire data and evict from the cache a certain amount of time after they’re written or last updated. The arbitrary nature of TTL would mean that users would sometimes have fresh data and sometimes stale; it would be fresh more often than without any solution, but freshness would vary arbitrarily and would not result in much of an improvement on user experience.

The other was to write an algorithm that tracks the amount of time since a data model was last accessed, and refetch when it grows too old. While this sounded interesting at first, it has the same flaw as a standard TTL algorithm — freshness becomes arbitrary. It’s also more complex to implement, and might end up not being worth the complexity.

The doing (implementation)

So that was it! Refetch on navigate, all done. Right?….of course not. With the general strategy in place, the devil started sneaking around in all the details. Some of the highlights:

Exemptions

It proved to be not the best idea to evict on all navigation. For example, in Reboot we often preload photo metadata models on pages with lists of photos, in order to make navigation into the photo page snappy. The refetch setup therefore has an exemption config that allows us to easily retain models when navigating into, away from, or between specific site sections.

Child models

We often have parent-child associations between data models. For example, the data model for a photo has a reference to a data model for the author of the photo. When the photo model is refetched, the person model must be refetched as well. This means the function doing the eviction and refetching has to recurse through all child models.

Collections

An issue similar to child models above, but more complex, is the case of a model containing a list of other models. For example, the data model for a person’s photostream contains a list of photo models.

What made this particularly tricky is pagination and filtering — say you load the first 2 pages of your photostream, set your view filter to private, jump to page 5, switch the view to “Date taken”, and navigate away and back to your photostream…imagine the mess of different models with partially-loaded collections. Evicting one parent model, and its children, might evict photo models from the collection within another, without properly refetching. The solution here actually lay in the controller responsible for fetching pages: if a requested page of models is not already completely in-cache, a refetch will always happen to ensure we have all the data, in its freshest state.

Refetch only once per page view

Critical to the refetch-on-navigation strategy is to refetch only once per navigation. This was not too difficult, but essential to get right. We accomplish this by adding a flag when a model is initially fetched and upserted into the cache. When navigating to a new, non-exempt site section, all those flags are cleared, and any model requested by the new page will be refetched. When refetched, the model is again upserted into the cache and marked as fresh, until the next navigation.

But did it fresh?

Go on without me
 

With the thinking and the doing out of the way, it was time to push all this to production. Because these changes are essentially pulling the rug out from underneath the data layer on every navigation, we had to tread very carefully in order to prevent any negative impact to the end user experience.

We did very thorough manual and automated testing across all of Reboot. We left the feature turned on for staff users for a while, to be able to respond to any bug reports. Finally, the time came to test on Real People. There were three things we needed to keep an eye on: errors (of course), impact on page navigation timing, and API traffic. Since refetching implies more requests for data, we needed to be sure that we were keeping the user experience smooth and fast, and also that we weren’t blowing up our data centers.

 
All In
All in
 

In order to get a good read on these things, though, we had to go all in. Letting in just a small percentage of users would not give reliable numbers for timing or traffic impacts, due to the noise inherent in relatively small sample sizes. So, we did something unusual: we turned on refetching for all users for a short period of time. We flipped on refetching and kept an eagle eye on our stats for 2 hours, then reverted; then, we took a careful look at the aggregated data to see how the experiment went.

Surprisingly, the impact on both timing and traffic was relatively low. After some thought, we decided this is most likely because the changes disproportionately impact people on long sessions, say a Flickr tab open for hours or days. Most people don’t hang around that long; they come, they go. Also, the photo page represents north of 90% of our page views, and is exempt from refetching (see Exemptions above).

So where did we end up? A negligible bump in navigation timing and API traffic, and fresher data for all. Perhaps an anticlimactic resolution, but the story we’ve heard today outlines a serious consideration for anyone building an application with a data caching layer: keep in mind from the beginning how you plan to deal with stale data, but in a way that keeps all the other benefits of a single-page application.

#CCC is a breadcat
Busting through staleness. Yep.

Optimizing Caching: Twemproxy and Memcached at Flickr

Introduction

Flickr places a high priority on our users’ experience, and a critical part of that experience is the speed of the interface.  Regardless of the client you use to access Flickr, caching the proper data and the speed at which our servers can access cached data is critical to delivering on that quality user experience.  The more effective our caching strategy is, the better the Flickr experience will be for our users.  This is true for all the layers of caching we deploy at Flickr from the photo caches to the process data caches buried deep in the system.  In a previous post, we looked at how regional photo caching improved photo serving time in other countries, in this post we’re going to dive down into the innards of Flickr’s software stack and take a look at how we improved Memcached performance for our backend systems.

Back in the olden days (pre-2014), we accessed our Memcached systems through a mix of direct reads from our web servers and writes through a Flickr-developed proxy.  Our proprietary proxy system, Cerberus, handled a whole host of responsibilities. In addition to Memcached set operations, Cerberus managed database updates, the bulk of our Redis accesses, cache consistencyCerberus Based Memcached Architecture (which is why we directed our writes through Cerberus), and a few other miscellaneous transactions.   As Flickr’s traffic and functionality had grown, Memcached set operation performance wasn’t keeping up, so we needed to consider how to address the gap.

Since the development of Cerberus, the software landscape had drastically changed.  When we developed Cerberus, there was no comparable software available, but now several open source projects exist that provide similar proxy services.  On top of the availability of open source tools, Flickr’s traffic and usage patterns have changed over the course of a decade, changing the requirements we had for a proxy system.  Needless to say we had a lot of questions to ask ourselves before we dived into revising the caching architecture.

After years of operation, we had a good picture into the strengths and weaknesses of our current architecture, so when we started thinking about revising it, a few lessons from the past years really stood out.  One thing that we learned over the years was that Cerberus’ lack of a single purpose made it difficult to troubleshoot operational issues with Memcached.  Due to the lack of isolation, a downstream issue with a user database, could impact Memcached access time.  Whatever came next had to isolate cache requests from other data accesses.

Experience had shown us that Memcached get operation latency was a key performance metric, and we learned through trial and error that placing our current Cerberus proxy between the web servers and the Memcached hosts added more network latency than we were willing to tolerate.  Unfortunately, our options for connection pooling and more efficient use of connections were limited in PHP, so we had little recourse than to suffer with a high connection load and fluctuating connections against our Memcached servers.  The next generation system would have to carefully monitor get operation timings and ensure we didn’t introduce more latency into the process.

So as 2014 rolled around, we started to look into an alternative to Cerberus for accessing the Memcached systems.  Should we build a Cerberus 2.0?  Should we look at an open source alternative?  As we weighed our options, one alternative that stood out because it was quite successful in other parts of Yahoo and throughout the industry was Twitter’s Twemproxy.

Introducing Twemproxy

Twemproxy is a lightweight daemon that proxies requests to a pool of Memcached instances.  It provides the following features that we believed would improve our caching infrastructure:

  • Consistent hashing:  Figuring out what hosts in the ring on which to store and retrieve cache data.  While we had implemented consistent hashing before Twemproxy, it was previously left to the client libraries to sort out.
  • Connection pooling:  Reuse of network connections to the Memcached instances, cutting down significantly on the connection load to the Memcached daemons.
  • Command Pipelining:  Accumulating requests destined for the same host and sending as a combined payload.  This feature further reduces connection load and network overhead to the cache processes.

Resulting Architecture

We implemented a solution where each web server host had two local Twemproxies, forwarding to the Memcached rings.

Twemproxy Based Memcached Architecture

In the resulting architecture, all Memcached operations go through twemproxy.  The change accomplished many goals, including:

  • Providing a dedicated system for Memcached requests that was isolated from other systems
  • Reducing the connection load on our Memcached servers through Twemproxy’s connection pooling.  We experienced a 75% overall reduction of TCP connections to Memcached nodes
  • Improved overall caching latency.  This was a benefit that we didn’t necessarily expect.  With Twemproxy, we found that get operations had a 5% reduction in mean processing time and set operations had a 40% in mean processing time

The Road to Twemproxy

As nice as it would be to say we dropped in Twemproxy, declared victory and went for ice cream, we still had to solve a few interesting challenges along the way: maintaining availability, dealing with disparate consistent hashing schemes, and re-implementing cache coherency.

If One is Good, Two is Even Better

From the start, we recognized that the simple daemon model of Twemproxy would need to be managed carefully.  Each deploy through our continuous deployment system could result in changes to the Memcached hosts configuration.  And unfortunately, configuration changes for Twemproxy require the process to be restarted to take effect.  We measured the time to restart Twemproxy in the 1-2 second range, but even for these necessary restarts, it was too much of an interruption for the clients.

Our solution was to run two instances of the daemon on every host that needed the service and manage a careful synchronization between the restarts.  This restart “dance” was wired into the process that deploys the configuration changes to all the Memcached clients.  A couple of patches have been proposed to allow configuring Twemproxy without restarting it, but none of them have yet made the master branch.

When is Ketama not Ketama?

Ketama is a popular consistent hashing algorithm used by many systems to determine where to place a particular key in a multi-node caching system.  Out of the box, Twemproxy uses an implementation of the Ketama algorithm that is compatible with libketama, the C library which is the most commonly used implementation of the Ketama algorithm.

Our initial implementation of consistent hashing was done using Ketama, but with the Spymemcached Java library.  It turns out that Spymemcached has a slight variation in the implementation that makes it incompatible with Twemproxy.

Our transition from our current system to Twemproxy had to happen live, and a sudden change in the cache algorithm would have a painful (and unacceptable) impact on our database systems.  How could we get across this bridge?  Ultimately, we had to patch Twemproxy’s implementation of Ketama to match Spymemcached to maintain a consistent implementation of the Ketama algorithm.

Redis latency in propagating cache clears

Until we figure out how to change the speed of light, the only way we are going to make Flickr fast across the world, is through multiple data centers conveniently located near our users.  While this is way easier than changing the speed of light, it’s not without its complications.

What do they compute at Night ?

Caches between the data centers have to be kept consistent.  Some caches, like photo caches, deal with immutable data and are easy to keep in synch, others like Memcached systems have read-write data which is harder.  Our approach to handling cache consistency in our Memcached systems was to invalidate stale keys in other colo facilities whenever a process updated a value.  As we mentioned previously, Memcache write operations were directly through our Cerberus proxy specifically so Cerberus could dispatch a cache invalidation event to other colo facilities.  The migration to Twemproxy would not be complete, until we implemented a new solution for cache invalidation.

In our Twemproxy-based architecture, we decided to take the responsibility for cache invalidation our of the hands of the data proxy and push it into the client libraries we used to access Memcached.  Whenever a client updates a Memcache key, it enqueues a corresponding cache invalidation event into a distributed Redis queue.  We then deployed simple, single-purpose Java daemons to process the cache invalidation events from the Redis queue and delete the corresponding keys in their local Memcached systems.  A diagram of the system, appears below: Cache Clear - Blog Post

The wrinkle with this approach was that the enqueuing of clear keys would occasionally take 20 times longer than the normal mean time, pushing cache sets up to 40ms. After much digging, we found that the spikes were happening when the clearing daemons dequeued a batch of keys.  The dequeuing daemons were performing operations across a WAN. Due to the single-threaded nature of Redis, it would periodically block the queue for adding keys for 10s of milliseconds.  Once we figured that out, the fix was a matter of keeping separate in- and out-queues, and moving the keys from in to out with a local daemon, which significantly reduced the blocked time for writing keys.

Conclusion

Caching is crucial to a high-traffic site like Flickr, and we have taken a big stride in making our Memcached utilization more effective.  Using Twemproxy, we were able to clean up an internal system, reduce the connection load on our caching daemons, and even make modest improvements to caching latency for all clients.  Although we faced some technical challenges in implementing twemproxy for Memcached, particularly in propagating cache clear events, it was ultimately well worth the engineering investment.  After several months, our implementation of Twemproxy has proven to make a positive contribution to caching speed and ultimately the experience of a responsive site for our users.

If you dream in low latency and love to rip that extra 10 microseconds of overhead out of an operation, we’d love to have you! Stop by our Jobs page and tell us how awesome you are.

Real-time Resizing of Flickr Images Using GPUs

At Flickr we work with a huge number of photos. Our users upload over 27 million photos a day, and our total collection has over 12 billion photos. This is fantastic! As usage grows, we are always looking for ways to use our storage more efficiently. Recently our storage team wrote about some new commodity storage technology now in use at Flickr which increases efficiency. But we also looked into how much data we store for each photo. In the past we stored many sizes of every photo to make serving fast. We wanted to challenge that model and find the minimal set of data to store.

Thumbnail Footprint Reduction

One of our biggest opportunities for byte per photo improvement is through reduction in the footprint of Flickr’s “thumbnails”. Thumbnail is a bit of a misnomer at Flickr; our thumbnails are as large as 2048 pixels on their longest side, so at Flickr we usually refer to these as resizes.  We create these resizes in order to provide a consistent,  fast experience for our users over a variety of use cases.



Different sizes used in different contexts. From left to right: Cameraroll uses small thumbnails, to enable fast navigation through many sizes. Our Photo Page uses our largest, most detailed sizes. Search uses sizes in between these two extremes. Red panda photos by Mathias Appel.

The selection of sizes has grown semi-organically over the years, and all told, we serve eleven different resizes per photo which, in sum, use nearly as much storage as the original photo. Almost 90% of this storage is held in the handful of resizes 640px and larger, so we targeted our efforts at eliminating some of these sizes.



Left: Distribution of byte size by resize dimension. Storage is concentrated in images with largest dimensions. Right: size distribution after largest sizes eliminated.

A Few Approaches

A simple approach to this problem would be just to cease offering some of the larger sizes.    For instance, we could drop the 1600px image from our API and require the design to adjust.   However, this requires compromises that we didn’t want to take on. Instead we took on a pretty ambitious goal: maintain our largest resize, usually 2048px wide, as a source image and create any other moderate or large-sized resizes on-the-fly from this source, without sacrificing image quality or significantly affecting performance. Using the original uploaded photo as a resize source image was impractical, as these can be very large and exist in a variety of formats.

Sounds easy, right? We already resize images when users upload, so why not just use that same technology on serving. Well, almost. The problem with the naive approach is that high-quality resizing of JPEGs is a lot slower than is widely known. A tool we use frequently, GraphicsMagick, produces beautiful images but takes over 225ms to resize a 2048px JPEG down to 1600px, depending on quality settings. This is slow enough that this method would impact user experience, and would require many CPUs to handle our load. Ymagine,  a high-performance CPU-based tool we’ve open sourced,  is twice as fast as GraphicsMagick(!). We use Ymagine extensively on smaller images, but for the large sizes we’re targeting we needed even more performance. A GPU-based solution ultimately filled our needs.

Our GPU-based Solution

We created a tier of dedicated resize servers, each with an GPU co-processor. Each of these boards has two GPUs, each with 1500+ “cores”, running at just under 1GHz. These cores aren’t anywhere near as performant as a CPU core, but there are many of them. We tested a range of server-grade boards to find the best performing type for our workload. Many manufacturers offer consumer-grade boards with incredible specifications and lower price points, but these lack server-grade cooling and other features such as ECC RAM. One member of our team had experience using these lower grade boards in a previous application and recommended against it.



Resize system architecture

On these resize servers we run a fairly vanilla Apache with a plugin written in C++.  This server responds to resize requests, reads our source image from disk into shared memory,  and hands off requests off to persistent resize daemons that do all communication with our  GPUs.  A daemon-type approach is necessary due to a somewhat lengthy initialization process with our GPUs.

Our resize daemons transfer JPEGs from shared memory to GPU device memory. Once here,  the real image processing takes place. The JPEGs are decoded, cropped, sharpened, resized, re-sharpened as needed, re-encoded as JPEGs, and finally transferred back to shared memory.    From shared memory, our Apache module returns the resized JPEG to the caller.



A simple resize pipeline. Post-sharpening overcomes fuzziness introduced when downscaling.

There are several accepted resize algorithms, but to retain the Flickr “look”, we implemented the same Lanczos resize and kernel sharpening algorithms that we’ve used for years in CUDA.     This had the added benefit of being able to directly compare images generated through GraphicsMagick and our GPU-based code.

Performance

With significant optimization, this code is able to resize our 2048px JPEGs to 1600px in under 16ms. This is more than 15x faster than GraphicsMagick and nearly 10x faster than Ymagine.  Resizes from 2048px to 640px take under 10ms. Equally noteworthy,  at peak load,  each resize server can perform over 300 resizes per second.



Performance of different resize approaches.

Although these timings are quite fast,  the source image for our resizes  is larger, byte-wise, than the images it is resizing to,  requiring additional I/O. For example,  a typical 2048px source JPEG is roughly 600kB and our typical 1024px JPEGs are just under 200kB. This difference in size leads to roughly 35ms additional I/O time per resize.

Taking it slow

As our GPU code is new and images are our most important product, this change carries some risk. We’ve addressed this with extensive testing, progressive rollout and provisions for rollback. We also used some insights into our user behavior to roll this solution out in a very controlled manner.

Conclusion

This system is currently in production and as we roll it out more fully, has the potential to cut the resize footprint of the majority of our photos by 50%, with negligible impact on performance and image appearance. We also have the ability to apply this same footprint reduction technique to images uploaded in the past, which has the potential to reduce our storage growth to zero for a significant period of time.

Credits

This project would not have been possible with hard work of Peter Norby, Tague Griffith, John Ko and many others.

Much Photos!

Introducing the New! Shiny! Photolist framework

photolist-sky
Blue skies. Mostly.

Here at Flickr, we have photos. Lots of photos. Like, billions and billions of photos. So, it’s pretty important for us to be able to show you more than one at once.

We have used what we call the “justified algorithm” to lay out photos for a while now, but as we move more and more pages onto our new-ish isomorphic node.js stack, we determined it was time to revisit the algorithm and create an updated implementation.

A few of us here in Frontend-landia got together to figure out all the things this new shiny should be able to do. With a lot of projects in full swing and on the near horizon, we came up with a pretty significant list, including but not limited to:
– Easy for developers to use
– Fit into any kind of container
– Support pagination (in both directions!) and infinite scroll
– Jank-free, butter-silky-baby-smooth scrolling
– Support layouts other than justified, like square thumbnails and grid layout with native aspect ratio

After some brainstorming, drawing of diagrams, and gummi bear consumption, we got to work building out the framework and the underlying algorithm.

Rejustification

photolist-sky
Drawing of diagrams

The basics of the justified algorithm aren’t too complex. The goal is for the layout module to accept a list of photo aspect ratios, and return a list of rectangles. A layout consists of a number of rows of items (photos), each with a target height and allowable height deviation above and below. This, along with the container width, gives us a minimum and maximum row aspect ratio.

Photolist: variable row height
Fig. 1: The justified algorithm: dimensions

We push each photo into a row; once the row is filled up, we move on to the next. It goes a little something like this:

  1. Iterate over each photo in the list to display
  2. Create a new row if there’s not currently an open row
  3. Attempt to add the photo to the current row at its native aspect ratio and at the target row height
  4. If the new row aspect ratio is less than the minimum row aspect ratio, continue adding photos until the aspect ratio is greater than the maximum aspect ratio
  5. Either keep or drop the last added photo, depending on which generates a row aspect ratio closer to the target row aspect ratio; adjust the row height as needed, and seal the row
  6. Repeat until all the photos have been laid out.

Photolist: row filling
Fig. 2: The justified algorithm: row filling

It’s Never That Easy…

The justified algorithm described above is the primary responsibility of the layout module. In practice, however, there are a number of other things the layout must handle to get good results for all use cases, and to communicate the results to other parts of the framework.

Diffs

One key feature of the layout module is how it organizes its results. To minimize the amount of processing required to update photos as the layout changes, the layout module returns pre-sorted diffs, each with a specific purpose:

  • new items, used to create new photos and put them in place
  • layout-changed items, used to resize/reposition existing photos
  • visibility-changed items, used to wake/sleep existing photos
  • widows and orphans (leading and trailing items) (read on!)

The container view can process only the parts of the layout response that are necessary, given the current state of the whole framework, to keep processing time down and keep performance up.

Widows and orphans


Annie the Musical,
Annie the Musical, by Eva Rinaldi

Some photolist pages on Flickr use infinite scrolling, and some display results one page at a time. Regardless of how a page shows its photos, it starts to feel messy when there is an incomplete row of photos hanging off the end of the page. If there is more content in the set, the last row should be full. However, since we fetch photos from the API in fixed batch sizes, things don’t always work out so nicely, leaving “leftovers” in the bottom row. Borrowing from typesetting terminology, we call these leftover photos orphans. (We can also paginate backwards; leftovers at the top are technically widows but we’ll just keep using the term orphans for simplicity.)

The layout notes these incomplete rows and hides them from the rest of the framework until the next page of content loads in. (This led to frequent and questionable metaphors about “orphan suppression” and “orphan rehydration.”) When orphans are to be hidden, the layout simply keeps them out of the diff. When the orphans are brought back in as the next page loads, the layout prepends them to the next diff. The container view is none the wiser.

This logic gets even more fun when you consider that it must perform in all of these use cases:

  • fixed page size (book-style) pagination
  • downward-scrolling infinite pagination
  • upward-scrolling infinite pagination (enter into an “infinite” content set somewhere other than the beginning); this requires right-to-left layout!

There’s also the case of the end of an “infinite” content set (scrolling down to the end or up to the beginning); in these cases, we still want the row to appear complete, and still must maintain the native aspect ratio of the photo. Therefore, we allow the row height to grow as much as it needs in this case only.

Bonus Round!

You might have noticed that Flickr is kind of a big site with, like, lots of photos. And we display photos in lots of different ways, with lots of different use cases. The photolist framework bends over backwards to support all of those, including:

  • forward and backward pagination
  • infinite scroll and fixed page-size pagination
  • specific aspect ratios (e.g. squares)
  • fixed number of rows
  • fast relayout (only a few milliseconds for thousands of photos)

Going into detail on each of those features is way beyond the scope of this blog post, but suffice it to say the framework is built to handle just about anything Flickr can throw at it. The one exception is the upcoming Camera Roll (coming soon to those of you who don’t yet have it!), which is Too Extreme for this framework, so we devised something special just for that page.

The whole enchilada


Mmm... enchiladas,
Mmm… enchiladas, by jeffreyww

The layout is at the heart of the photolist framework, but wait — there’s more! The main components of the framework are the layout (dissected above), the container view / controller, and the subviews (usually containing photos).

Photolist: the whole enchilada
Fig. 3: Relationship of view/controller, layout, and subviews, and changing subview states during downward scroll

The container view does a lot of fancy things, like:

  • loading in photos as you scroll down or paginate
  • triggering a relayout when the container size changes (i.e. when you resize your window)
  • matching up server-rendered HTML with clientside JavaScript objects (see isomorphic JavaScript, and an upcoming blog post about the Hermes stack at Flickr).

Its primary job, though, is to act as the conduit between the layout module and the individual subviews.

Every time a layout is processed or changes, it returns a “layout response” to the container view. The layout response contains a list of rectangles and wake/sleep flags (actually, a list of lists; see Diffs above); the container view relays that new information on to each individual subview to determine position and visibility. The container view doesn’t even need to know about the layout details — each subview adjusts itself to its layout data all on its own.

The subviews each have a decent amount of intelligence of their own, performing such tasks as:

  • choosing the most appropriate photo file size to fit the layout rectangle
  • adding/removing itself to/from the DOM as instructed by the layout to maintain good scroll performance
  • providing an annotation and interaction layer for titles, faves, comments, etc.

Coming soon to a webpage near you

The new photolist framework is certainly not a one-size-fits-all solution; it’s tailored for Flickr’s specific use cases. However, we tried to design and build it to be as broadly useful for Flickr as possible; as we continue to move parts of the site onto the new frontend stack and innovate new features, it’s critical to have solid components upon which we can build Flickr’s future. The layout algorithm is probably useful for many applications though, and we hope you gained some insight into how you might implement your own.

The photolist framework is already live in a number of places on the site, including the new Unified Search pages (currently in Beta), the Create / Wall art pages, the Group pool preview, and is coming soon to a number of other pages.

As always, if you’re interested in helping with that “and more” part, we’d love to have you! Stop by our jobs page and drop us a line.

33 Browser Stats You Just Might Believe

We care an awful lot about the kinds of browsers and computers visiting Flickr. As people update to the latest versions of their browsers, the capabilities we can build against improve, which lets us build cool new things. At the same time, if lots of people continue using older browsers then we have to do extra work to gracefully support them.

These days we not only have incredibly capable browsers, but thanks to the transparent and rapid update process of Chrome, Firefox, and soon Internet Explorer (hooray!), we can rely on new features rapidly showing up en masse. This is crazy great, but it doesn’t mean that we can stop paying attention to our usage statistics. In fact, as people spend more time on their phones, there’s as much of a need for a watchful eye as ever.

We’ve never really shared our internal numbers, but we thought it would be interesting to take a look at the browsers Flickr visitors used in 2014. We use these numbers constantly to inform our project planning. Since limitations in older browsers take time to support we have to be judicious in picking which battles to fight. As you’ll see below, these numbers can be quite dynamic with a popular browser dropping to nearly 0% market-share in just a year. Let’s dive in and see some specifics.

Fort Vancouver
Fort Vancouver by Kate Dickerson

Top level OSes and browsers

At the highest level we learn a lot by looking at our OS family data. Probably the most notable thing here is how much of our traffic is coming from mobile devices. Moreover, the rate of growth is eye-popping. And this is just our website – this data doesn’t include our iOS or Android clients at all. A quarter of our traffic is from mobile devices.

OSes in use on Flickr.com
2013 Q4 2014 Q4 Y/Y
Windows 56.55% 50.61% -5.94
Macintosh 21.49% 21.42% -0.07
iOS 11.09% 17.61% 6.52
Android 5.39% 7.82% 2.43
Other 5.48% 2.54% -2.94

Let’s slice things slightly differently and look at browser families. We greatly differ from internet-wide traffic in that IE isn’t the outright majority browser. In fact, it clocks in at only the #4 position. More than half of Flickr visitors use a Webkit/Webkit-heritage browser (Safari and Chrome, respectively). Chrome rapidly climbed into its leadership position over the last few years and it’s stabilized there. Safari is hugely buoyed by iOS’s incredible growth numbers, while IE has been punished by Windows’s Flickr market-share decline.

Browsers in use on Flickr.com
2013 Q4 2014 Q4 Y/Y
Chrome 35.71% 35.42% -0.29
Safari 24.11% 27.50% 3.39
Firefox 17.94% 18.29% 0.35
Internet Explorer 13.98% 10.31% -3.67
Other 8.26% 8.48% 0.22

Fine-grained details

We can go a step further and see many details in the individual versions of OSes and browsers out there. It’s one thing to say “Windows is down 6% over the year” but another to say “the growth rate for the latest version of Windows is 350% year over year.” When we look at the individual versions we can infer quite a bit of detail around update rates and changes in the landscape.

OS version details

A few highlights:

  • Windows 7 is on the decline, XP and Vista fell by roughly 50% each, and Windows 8 and 8.1 are surging ahead.
  • iOS 8.1 and Android 5.0 don’t appear in the list due to their late appearance in Q4. Our current monthly numbers have iOS 8.1 far outpacing every other iOS version.
  • OS X 10.10 has accelerated Mac user upgrades; since its launch 10.9 has shed over a percent per month, and the legacy versions have sharply accelerated their decline.
OS versions in use on Flickr.com
2013 Q4 2014 Q4 Y/Y
Windows NT 3.39% 0% -3.39
Windows XP 10.12% 4.49% -5.63
Windows Vista 3.56% 2.41% -1.15
Windows 7 36.29% 33.14% -3.15
Windows 8 2.01% 2.31% 0.30
Windows 8.1 1.06% 8.22% 7.16
Macintosh OS X 10.5* 0.65% 0.65
Macintosh OS X 10.6* 2.90% 2.90
Macintosh OS X 10.7* 1.91% 1.91
Macintosh OS X 10.8* 1.83% 1.83
Macintosh OS X 10.9* 8.26% 8.26
Macintosh OS X 10.10 0% 5.69% 5.69
iOS 4.3 0.19% 0% -0.19
iOS 5.0 0.12% 0% -0.12
iOS 5.1 0.59% 0% -0.59
iOS 6.0 0.42% 0% -0.42
iOS 6.1 2.02% 0.61% -1.41
iOS 7.0 7.36% 1.54% -5.82
iOS 7.1 0% 5.76% 5.76
iOS 8.0 0% 3.27% 3.27
Android 2.3 0.77% 0% -0.77
Android 4.0 0.82% 0% -0.82
Android 4.1 2.11% 1.22% -0.89
Android 4.2 0.84% 1.16% 0.32
Android 4.3 0.39% 0.56% 0.17
Android 4.4 0% 3.80% 3.80
Linux 4.37% 1.94% -2.43

* We didn’t start breaking out individual versions of OS X until Q1 2014. So unfortunately for this post we don’t have great info breaking down the versions of OS X, but we will in the future. OS X 10.10 did not exist in Q1 2014 so it’s counted as a natural 0% in our Q1 data.

Browser version details

These are the most dynamic numbers of the bunch. If there’s one thing they prove, it’s how incredibly effective the upgrade policies of Chrome and Firefox are. Where Safari and IE have years-old versions still hanging on (I’m looking at you Safari 5.1 and IE 8.0), virtually every Chrome and Firefox user is using a browser released within the last six weeks. That’s a hugel powerful thing. The IE team has suggested that Windows 10’s Project Spartan will adopt this policy, which is absolutely fantastic news. A few highlights:

  • Despite not being on a continuous upgrade cycle, Safari and IE were able to piggyback on successful OS launches to consolidate their users on their latest releases.
  • IE 8.0 is the only non-latest version of IE still holding on, thanks to its status as the latest version available for the still somewhat popular Windows XP.
OS versions in use on Flickr.com
2013 Q4 2014 Q4 Y/Y
Chrome 22.0.1229 1.67% 0% -1.67
Chrome 29.0.1547.76 1.39% 0% -1.39
Chrome 30.0.1599.101 8.94% 0% -8.94
Chrome 30.0.1599.69 3.74% 0% -3.74
Chrome 31.0.1650.57 6.08% 0% -6.08
Chrome 31.0.1650.63 6.91% 0% -6.91
Chrome 37.0.2062.124 0% 4.59% 4.59
Chrome 38.0.2125.104 0% 3.05% 3.05
Chrome 38.0.2125.111 0% 6.65% 6.65
Chrome 39.0.2171.71 0% 4.09% 4.09
Chrome 39.0.2171.95 0% 4.51% 4.51
Safari 5.0 1.96% 0% -1.96
Safari 5.1 5.60% 2.50% -3.10
Safari 6.0 6.21% 0.86% -5.35
Safari 7.0 7.29% 7.25% -0.04
Safari 7.1 0% 3.12% 3.12
Safari 8.0 0% 10.10% 10.10
Firefox 22.0 1.62% 0% -1.62
Firefox 24.0 5.50% 0% -5.50
Firefox 25.0 6.46% 0% -6.46
Firefox 26.0 1.90% 0% -1.90
Firefox 32.0 0% 4.92% 4.92
Firefox 33.0 0% 7.10% 7.10
Firefox 34.0 0% 3.52% 3.52
MSIE 8.0 3.69% 1.00% -2.69
MSIE 9.0 3.04% 1.22% -1.82
MSIE 10.0 5.94% 0% -5.94
MSIE 11.0 0% 6.69% 6.69
Generic WebKit 4.0* 3.18% 2.46% -0.72
Mozilla 5.0* 3.18% 4.80% 1.62
Opera 9.80 1.46% 0% -1.46

* These are catch-all versions of Mozilla-based and Webkit-based browsers that aren’t themselves Firefox, Safari, or Chrome.

A word on methodology

These numbers were anonymously collected using Yahoo’s in-house metrics libraries. The numbers here are aggregated over the course of three months each, making these numbers lagging indicators. This is why the latest releases, like Android 5.0 and iOS 8.1, are under-represented – they hadn’t yet enjoyed one full quarter when 2014 came to a close.

Further reading

There are a number of excellent sites out there watching similar browser statistics on a continuing basis. A few of them are:

  • Ars Technica – on a monthly basis they analyze raw data from Net Market Share with insightful commentary.
  • Net Market Share – while Ars does a bang-up job, it’s helpful to sift the data yourself to find the answers to your questions.
  • Peter-Paul Koch – No one shines a sharper light on the state of browsers than PPK, with just one example being his attention to disambiguating the various versions of Chromium out there (part two).

 
 

Flickr September 2014

Like this post? Have a love of online photography? Want to work with us? Flickr is hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.

Introducing: Flickr PARK or BIRD

park OR bird
Zion National Park Utah by Les Haines Creative Commons License Secretary Bird by Bill Gracey Creative Commons License

tl;dr: Check it out at parkorbird.flickr.com!

We at Flickr are not ones to back down from a challenge. Especially when that challenge comes in webcomic form. And especially when that webcomic is xkcd. So, when we saw this xkcd comic we thought, “we’ve got to do that”:

xkcd-1425
Creative Commons License

In fact, we already had the technology in place to do these things.  Like the woman in the comic says, determining whether a photo with GPS info embedded into it was taken in a national park is pretty straightforward. And, the Flickr Vision team has been working for the last year or so to be able to recognize more than 1000 things in images using deep convolutional neural nets. Incidentally, one of the things we’re pretty good at recognizing is birds!

We put those things together, and thus was born parkorbird.flickr.com!

Recognizing Stuff in Images with Deep Networks

The thing we’re really excited to show off with PARK or BIRD is our image recognition technology. To recognize 1000+ things, we employ a deep convolutional neural network similar to the one depicted below.

conv-net

This model transforms an input image into a representation in which different objects and scenes are easily distinguishable by a simple binary classification algorithm, like an SVM. It does this by passing the image through a series of layers, where each layer computes a function of the output of the layer below it.

Each successive one of these layers, after training on millions of images, has learned to recognize higher- and higher-level features of images and the ways these features go together to form different objects and scenes. For example, the first layer might recognize the most basic image features, such as short straight lines, corners, and small circular arcs. The next layer might recognize higher level combinations of those features, such as circles or other basic shapes. Further layers might recognize higher-level concepts, like eyes and beaks, and even further ones might recognize heads and wings. For an example of what this looks like, check out Figure 2 in this paper by Matt Zeiler and Rob Fergus.

As the image passes through these layers, they are “activated” in different ways depending on the features they’ve seen in the input image, and at the top of this network—after the image is transformed by the bottom layer, and that transformation of the image is transformed by the next layer, and that transformation of the transformation of the image is transformed by the next layer, and so on— a short floating-point vector summarizing all of the various activations at each layer is output. We pass this floating-point vector into more than 1000 binary classifiers, each of which is trained to give us a yes/no answer to identify a specific object/scene class. And, of course, one of those classes is birds!

The Flickr Vision team is already applying this deep network to Flickr photos to help people more more easily find what they’re looking for via Flickr search, and we plan to integrate it into Flickr in other cool ways in the future. We’re also working on other innovative computer vision and image recognition technologies that will make it easier for Flickr members to find and organize their photos.

Acknowledgements

The Flickr Vision and Search team is awesome and PARK or BIRD is built upon technologies that we all pitched in on. Here we all are (at least most of us), in all our beautiful glory. Thanks Vision/Search! Thanks also to Stephen Woods, Bart Thomee, John Ko, Mike Shema, and Sean Perkins, all of whom provided a lot of help getting PARK or BIRD off the ground.

Flickr flamily floto

If this all sounds like a challenge you’re interested in helping out with, you should join us! Flickr is hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.

The Ins and Outs of the Yahoo Flickr Creative Commons 100 Million Dataset

This past summer we (Yahoo Labs and Flickr) released the YFCC100M dataset that is the largest and most ambitious collection of Flickr photos and videos ever, containing 99,206,564 photos and 793,436 videos from 581,099 different photographers. We’re super excited about the dataset, because it is a reflection of how Flickr and photography have evolved over the past 10 years. And it contains photos and videos of almost everything under the sun (and yes, loads of cats).

We’ve received a lot of emails and tweets asking for more details about the dataset, so in this blog post, we’ll gladly tell you. Each of the 100 million photos and videos is associated with a Creative Commons license that indicates how it may be used by others. The table below shows the complete breakdown of licenses in our dataset. Approximately 31.8% is marked for commercial use, while 17.3% has the most liberal license, which only requires attribution to the photographer.

License Photos Videos
17,210,144 137,503
9,408,154 72,116
4,910,766 37,542
12,674,885 102,288
28,776,835 235,319
26,225,780 208,668

The photos and videos themselves are very diverse. We’ve found photos showing street scenes captured as part of photographer Andy Nystrom‘s life-logging activities, photos of real-world events like protests and rallies, as well as photos of natural phenomena.

Five years of Iraq war die-in IMG_9793 851-Aurora Borealis Northern Lights from Lodge near Fairbanks 1 Sep 28, 2011 1-11 AM 1600x1060
Steve Rhodes
Andy Nystrom
BJ Graf

To understand more about the visual content of the photos in the dataset, the Flickr Vision team used a deep-learning approach to find the presence of visual concepts, such as people, animals, objects, events, architecture, and scenery across a large sample of the corpus. There’s a diverse collection of visual concepts present in the photos and videos, ranging from indoor to outdoor images, faces to food, nature to automobiles.

Concept Count
outdoor 32,968,167
indoor 12,522,140
face 8,462,783
people 8,462,783
building 4,714,916
animal 3,515,971
nature 3,281,513
landscape 3,080,696
tree 2,885,045
sports 2,817,425
architecture 2,539,511
plant 2,533,575
house 2,258,396
groupshot 2,249,707
vehicle 2,064,329
water 2,040,048
mountain 2,017,749
automobile 1,351,444
car 1,340,751
food 1,218,207
concert 1,174,346
flower 1,164,607
game 1,110,219
text 1,105,763
night 1,105,296

There are 68,971,123 photos and videos in the set that have user-annotated tags. If we look at specific tags used, we see it is very common for people to use the year of capture, the camera brand, place names, scenery, and activities as tags. The top 25 tags (excluding the years of capture) and how often they were used are listed below, as well as the tag frequency distribution for the 100 most-frequently used tags.

User Tag Count
nikon 1,195,576
travel 1,195,467
usa 1,188,344
canon 1,101,769
london 996,166
japan 932,294
france 917,578
nature 872,029
art 854,669
music 826,692
europe 782,932
beach 758,799
united states 743,470
england 739,346
wedding 728,240
city 689,518
italy 688,743
canada 686,254
new york 685,311
vacation 680,142
germany 672,819
party 663,968
park 651,717
people 641,285
water 640,234

User tag distribution in the YFCC100M Dataset

Some photos and videos (3,350,768 to be exact) carry machine tags. Noteworthy machine tags are those having the “siwild” namespace, referring to photos uploaded by scientists of the Smithsonian, and the “taxonomy” namespace, which refers to photos in which flora and fauna have been carefully classified. The most frequently occurring namespace, “uploaded,” refers to the applications used to share the photos on Flickr, which are principally the Flickr and Instagram iOS apps. Other interesting machine tags are those referring to the different filters that can be applied to a photo, or roughly 750,000 photos. Overall, most machine tags are related to food and drink, events, camera and application metadata, as well as locations.

Machine Tag Count
uploaded 1,917,650
siwild 1,169,957
taxonomy 1,067,857
foursquare 894,265
exif 617,287
flickriosapp 538,829
geo 443,762
sequence 429,948
lastfm 313,379
flickrandroidapp 222,238

In terms of locations, the photos and videos in the dataset have been taken all over the world. In total, 48,366,323 photos and 103,506 videos were geotagged. The most popular cities where photos and videos were shot are concentrated in the United States, principally New York City, San Francisco, Los Angeles, Chicago, and Seattle; in Europe, they were principally London, Berlin, Barcelona, Rome and Amsterdam. There are also photos that have been taken in remote locations like Kiribati, icy places like Svalbard, and exotic places like Comoros. In fact, photos and videos from this dataset have been taken in 249 different territories (countries, islands, etc) around the world, and even in international waters or airspace.

One Million Creative Commons Geo-tagged Photos

Our dataset further reveals that there are many different cameras in use within the Flickr community. The Canon EOS 400D and 350D have a lead over the Nikon D90 (calm down…we’re not starting anything by saying that). Apple’s iPhones form the most popular type of cameraphone.

Make Camera Count
Canon EOS 400D 2,539,571
Canon EOS 350D 2,140,722
Nikon D90 1,998,637
Canon EOS 5D Mark II 1,896,219
Nikon D80 1,719,045
Canon EOS 7D 1,526,158
Canon EOS 450D 1,509,334
Nikon D40 1,358,791
Canon EOS 40D 1,334,891
Canon EOS 550D 1,175,229
Nikon D7000 1,068,591
Nikon D300 1,053,745
Nikon D50 1,032,019
Canon EOS 500D 1,031,044
Nikon D700 942,806
Apple iPhone 4 922,675
Nikon D200 919,688
Canon EOS 20D 843,133
Canon EOS 50D 831,570
Canon EOS 30D 820,838
Canon EOS 60D 772,700
Apple iPhone 4S 761,231
Apple iPhone 743,735
Nikon D70 742,591
Canon EOS 5D 699,381

Our collection of 100 million photos and videos marks a new milestone in the history of datasets. The collection is one of the largest released for academic use, and it’s incredibly varied—not just in terms of the content shown in the photos and videos, but also the locations where they were taken, the photographers who took them, the tags that were applied, the cameras that were used, etc. The best thing about the dataset is that it is completely free to download by anyone, given that all photos and videos have a Creative Commons license. Whether you are a researcher, a developer, a hobbyist or just plain curious about online photography, the dataset is the best way to study and explore a wide sample of Flickr photos and videos.  Happy researching and happy hacking!

Performance improvements for photo serving

We’ve been working to make Flickr faster for our users around the world. Since the primary photo storage locations are in the US, and information on the internet travels at a finite speed, the farther away a Flickr user is located from the US, the slower Flickr’s response time will be. Recently, we looked at opportunities to improve this situation. One of the improvements involves keeping temporary copies of recently viewed photos in locations nearer to users.  The other improvement aims to get a benefit from these caches even when a user views a photo that is not already in the cache.

Regional Photo Caches

For a few years, we’ve deployed regional photo caches located in Switzerland and Singapore. Here’s how this works. When one of our users in Vietnam requests a photo, we copy it temporarily to Singapore. When a second user requests the same photo, from, say, Kuala Lumpur, the photo is already present in Singapore. Flickr can respond much faster using this copy (only a few hundred kilometers away) instead of using the original file back in the US (over 8,000 km away).

The first piece of our solution has been to create additional caches closer to our users. We expanded our regional cache footprint around two months ago. Our Australian users, among others, should now see dramatically faster load times. Australian users will now see the average image load about twice as fast as it did in March.

We’re happy with this improvement and we’re planning to add more regional caches over the next several months to help users in other regions.

Cache Prefetch

When users in locations far from the US view photos that are already in the cache, the speedup can be up to 10x, but only for the second and subsequent viewers. The first viewer still has to wait for the file to travel all the way from the US. This is important because there are so many photos on Flickr that are viewed infrequently. It’s likely that a given photo will not be present in the cache. One example is a user looking at their Auto Upload album. Auto uploaded photos are all private initially. Scrolling through this album, it’s likely that very few of the photos will be in their regional cache, since no other users would have been able to see them yet.

It turns out that we can even help the first viewer of a photo using a trick called cache warming.

To understand how caching warming works, you need to understand a bit about how we serve images. For example, say that I’m a user in Spain trying to access the photostream of a user, Martin Brock, in the US. When my request for Martin Brock’s Photostream at https://www.flickr.com/photos/martinbrock/ hits our backend servers, our code quickly determines the most recent photos Martin has uploaded that are visible to me, which sizes will fit best in my browser, and the URLs of those images. It then sends me the list of those URLs in an HTML response. The user’s web browser reads the HTML, finds the image URLs and starts loading them from the closest regional cache.


Standard image fetch

So you’re probably already guessing how to speed things up.  The trick is to take advantage of the time in between when the server knows which images will be needed and the time when the browser starts loading them from the closest cache. This period of time can be in the range of hundreds of milliseconds. We saw an opportunity during this time to send the needed images over to the viewer’s regional cache in advance of their browser requesting the images. If we can “win the race” to do this, the viewer’s experience will be much faster, since images will load from the local cache instead of loading from the US.

To take advantage of this opportunity, we created a new “cache warming” process called The Warmer. Once we’ve determined which images will be requested (the first few photos in Martin’s photostream) we send  a message from the API servers to The Warmer.

The Warmer listens for messages and, based on the user’s location, it determines from which of the Flickr regional caches the user will likely request the image. It then pushes the image out to this cache.



Optimized image fetch, with cache warming path indicated in red

Getting this to work well required a few optimizations.

Persistent connections

Yahoo encrypts all traffic between our data centers. This is great for security, but the time to set up a secure connection can be considerable. In our first iteration of The Warmer, this set up time was so long that we rarely got the photo to the cache in time to benefit a user. To eliminate this cost, we used an Nginx proxy which maintains persistent connections to our remote data centers. When we need to push an image out – a secure connection is already set up and waiting to be used.

Transport layer

The next optimization we made helped us reduce the cost of sending messages to The Warmer.  Since the data we’re sending always fits in one datagram, and we also don’t care too much if a small percentage of these messages are never received, we don’t need any of the socket and connection features of TCP. So instead of using HTTP, we created a simple JSON format for sending messages using UDP datagrams. Another reason we chose to use UDP is that if The Warmer is not available or is reacting slowly, we don’t want that to cause slowdowns in the API.

Queue management

Naturally, some images are quite popular and it would waste resources to push them to the same cache repeatedly. So, the third optimization we applied was to maintain a list of recently pushed images in The Warmer. This simple “de-deduplication” cut the number of requests made by The Warmer by 60%. Similarly, The Warmer drops any incoming requests that are more than fifty milliseconds old. This “time-to-live” provides a safety valve in case The Warmer has fallen behind and can’t catch up.


def warm_up_url(params):
  requested_jpg = params['jpg']

  colo_to_warm = params['colo_to_warm']
  curl = &quot;curl -H 'Host: &quot; + colo_to_warm + &quot;' '&quot; + keepalive_proxy + &quot;/&quot; + requested_jpg + &quot;'&quot;
  os.system(curl)

if __name__ == '__main__':

# create the worker pool

  from multiprocessing.pool import ThreadPool
  worker_pool = ThreadPool(processes=100)

  while True:

    # receive requests
    json_data, addr = sock.recvfrom(2048)

    params = json.loads(json_data)

    requested_jpg = warm_params['jpg']
    colo_to_warm =
      determine_colo_to_warm(params['http_endpoint'])

    if recently_warmed(colo_to_warm, requested_jpg) :
      continue

    if request_too_old(params) :
      continue

    # warm up urls
    params['colo_to_warm'] = colo_to_warm

    warm_result = worker_pool.apply_async(warm_up_url,(params,))

Cache Warmer pseudocode

Java

Our initial implementation of the Warmer was in Python, using a ThreadPool. This allowed very rapid prototyping and worked great — up to a point. Profiling the Python code, we found a large portion of time spent in socket calls. Since there is so little code in The Warmer, we tried porting to Java. A nearly line-for-line translation resulted in a greater than 10x increase in capacity.

Results

When we began this process, we weren’t sure whether The Warmer would be able to populate caches before the user requests came in. We were pleasantly surprised when we first enabled it at scale. In the first region where we’ve deployed The Warmer (Western Europe), we observed a reduced median latency of more than 200 ms, 95% of photos requests sped up by at least 100 ms, and for a small percentage of photos we see over 400 ms reduction in latency. As we continue to deploy The Warmer in additional regions, we expect to see similar improvements.

Next Steps

In addition to deploying more regional photo caches and continuing to improve prefetching performance, we’re looking at a few more techniques to make photos load faster.

Compression

Overall Flickr uses a light touch on compression. This results in excellent image quality at the cost of relatively large file sizes. This translates directly into longer load times for users. With a growing number of our users connecting to Flickr with wireless devices, we want to make sure we can give users a good experience regardless of whether they have a high-speed LTE connection or two-bars of 3G in the countryside. An important goal will be to make these changes with little or no loss in image quality.

We are also testing alternative image encoding formats (like WebP). Under certain conditions WebP compression may offer better image quality at the same compression ratio than JPEG can achieve.

Geolocation and routing

It turns out it’s not straightforward to know which photo cache is going to give the best performance for a user. It depends on a lot of factors, many of which change over time — sometimes suddenly. We think the best way to do this is with a system that adapts dynamically to “Internet weather.”

Cache intelligence

Today, if a user needs to see a medium sized version of an image, and that version is not already present in the cache, the user will need to wait to retrieve the image from the US, even if a larger version of the image is already in the cache. In this case, there is an opportunity to create the smaller version at the cache layer and avoid the round-trip to the US.

Overall we’re happy with these improvements and we’re excited about the additional opportunities we have to continue to make the Flickr experience super fast for our users. Thanks for following along.

Flickr flamily floto

Like what you’ve read and want to make the jump with us? We’re hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.

Exploring Life Without Compass

Compass is a great thing. At Flickr, we’re actually quite smitten with it. But being conscious of your friends’ friends is important (you never know who they’ll invite to your barbecue), and we’re not so sure about this “Ruby” that Compass is always hanging out with. Then there’s Ruby’s friend Bundler who, every year at the Christmas Party, tells the same stupid story about the time the police confused him with a jewelry thief. Enough is enough! We’ve got history, Compass, but we just feel it might be time to try seeing other people.

Solving for Sprites

In order to find a suitable replacement (and for a bit of closure), we had to find out what kept us relying on Compass for so long. We knew the big one coming in to this great experiment: sprites. Flickr is a huge site with many different pages, all of which have their own image folders that need to be sprited together. There are a few different options for compiling sprites on your own, but we liked spritesmith for its multiple image rendering engines. This gives us some flexibility in dependencies.

A grunt task is available for spritesmith, but it assumes you are generating only one sprite. Our setup is a bit more complex and we’d like to keep our own sprite mixin intact so we don’t actually have to change a line of code. With spritesmith and our own runner to iterate over our sprite directories, we can easily create the sprites and output the dimensions and urls via a simple Handlebars template to a Sass file.

{{#each sprites}}
    {{#each images}}
        %{{../dir}}-{{name}}-dimensions {
            width: {{coords.width}}px;
            height: {{coords.height}}px;
        }
        %{{../dir}}-{{name}}-background {
            background: image-url('{{../url}}') -{{coords.x}}px -{{coords.y}}px no-repeat;
        }
    {{/each}}
{{/each}}

You could easily put all three of these rules in the same declaration, but we have some added flexibility in mind for our mixin.

It’s important to note that, because we’re using placeholders (the % syntax in Sass), nothing is actually written out unless we use it. This keeps our compiled CSS nice and clean (just like Compass)!

@import 'path/to/generated/sprite/file'

@mixin background-sprite($icon, $set-dimensions: false) {
    @extend %#{$spritePath}-#{$icon}-background;

    @if $set-dimensions == true {
        @extend %#{$spritePath}-#{$icon}-dimensions;
    }
}

Here, our mixin uses the Sass file we generated to provide powerful and flexible sprites. Note: Although retina isn’t shown here, adding support is as simple as extending the Sass mixin with appropriate media queries. We wanted to keep the example simple for this post, but it gives you an idea of just how extensible this setup is!

Now that the big problem is solved, what about the rest of Compass’s functionality?

Completing the Package

How do we account for the remaining items in the Compass toolbox? First, it’s important to find out just how many mixins, functions, and variables are used. An easy way to find out is to compile with Sass and see how much it complains!


sass --update assets/sass:some-temp-dir

Depending on the complexity of your app, you may see quite a lot of these errors.


error assets/css/base.scss (Line 3: Undefined mixin 'font-face'.)

In total, we’re missing 16 mixins provided by Compass (and a host of variables). How do we replace all the great mixin functionality of Compass? With mixins of the same name, node-bourbon is a nice drop-in replacement.

What is the point of all this work again?

The Big Reveal

Now that we’re comfortably off Compass, how exactly are we going to compile our Sass? Well try not to blink, because this is the part that makes it all worthwhile.

Libsass is a blazing-fast C port of the Sass compiler that exposes bindings to modules like node-sass.

Just how fast? With Compass, our compile times were consistently around a minute and a half to two minutes. Taking care of spriting ourselves and using libsass for Sass compilation, we’re down to 5 seconds. When you deploy as often as we do at Flickr (in excess of 10 times a day), that adds up and turns into some huge savings!

What’s the Catch?

There isn’t one! Oh, okay. Maybe there are a few little ones. We’re pretty willing to swallow them though. Did you see that compile time?!

There are some differences, particularly with the @extend directive, between Ruby Sass and libsass. We’re anticipating that these small kinks will continue to be ironed out as the port matures. Additionally, custom functions aren’t supported yet, so some extensibility is lost in coming from Ruby (although node-sass does have support for the image-url built-in which is the only one we use, anyway).

With everything taken into account, we’re counting down the days until we make this dream a reality and turn it on for our production builds.

Flickr flamily floto

Like what you’ve read and want to make the jump with us? We’re hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.