A Summer at Flickr

This summer I had the unforgettable opportunity to work side-by-side with the some of the smartest, photography-loving software engineers I’ve ever met. Looking back on my first day at Flickr HQ – beginning with a harmonious welcome during Flickr Engineering’s weekly meeting – I can confidently say that over the past ten weeks I have become a much better software engineer.

One of my projects for the summer was to build a new and improved locking manager that controls the distribution of locks for offline tasks (or OLTs for short). Flickr uses OLTs all the time for data migration, uploading photos, updates to accounts, and more. An OLT needs to acquire a lock on a shared resource, such as a group or an account, to prevent other OLTs from accessing the same resource at the same time. The OLT will then release the lock when it’s done modifying the shared data. Myles wrote an excellent blog post on how Flickr uses offline tasks here.

When building a distributed lock system, we need to take into account a couple of important details. First, we need to make sure that all the lock servers are consistent. One way to maintain consistency is to elect one server to act as a master and the remaining servers as slaves, where the master server is responsible for data replication among the slave servers. Second, we need to account for network and hardware failures – for instance, if the master server goes down for some reason, we need to quickly elect a new master server from one of the slave servers. The good news is, Apache ZooKeeper is an open-source implementation of master-slave data replication, automatic leader election, and atomic distributed data reads and writes.


Offline tasks send lock acquire and release requests through ZooLocker. ZooLocker in turn interfaces with the ZooKeeper cluster to create and delete znodes that correspond to the individual locks.

In the new locking system (dubbed “ZooLocker”), each lock is stored as a unique data node (or znode) on the ZooKeeper servers. When a client acquires a lock, ZooLocker creates a znode that corresponds to the lock. If the znode already exists, ZooLocker will tell the client that the lock is currently in use. When a client releases the lock, ZooLocker deletes the corresponding znode from memory. ZooLocker stores helpful debugging information, such as the owner of the lock, the host it was created on, and the maximum amount of time to hold on to the lock, in a JSON-serialized format in the znode. ZooLocker also periodically scans through each znode in the ZooKeeper ensemble to release locks that are past their expiration time.

My locking manager is already serving locks in production. In spite of sudden spikes in lock acquire and release requests by clients, the system holds up pretty well.


A graph of the number of lock acquire requests in ZooLocker per second

My summer internship at Flickr has been an incredibly valuable experience for me. I have demystified the process of writing, testing, and integrating code into a running system that millions of people around the world use each and every day. I have also learned about the amazing work going on in the engineering team, the ups and downs the code deploy process, and how to dodge the incoming flying finger rockets that the Flickr team members fling at each other.  My internship at Flickr is an experience I will never forget, and I am very grateful to the entire Flickr team for giving me the opportunity to work and learn from them this summer.

Pre-generating Justified Views

On May 20th, we introduced our Justified layout to the Photostream page. Ever since launch, we’ve been working hard to improve the performance of this page, and this week we’ve deployed a change that dramatically reduces the time it takes to display photos. The secret? Cookies.

At a high level, our Justified algorithm works like this:

  1. Take, as input, the browser viewport width and a list of photos
  2. Begin to lay those photos out in a row sequentially, using the maximum allowed height and scaling the width proportionately
  3. If a row becomes longer than the viewport width, reduce the height of that row and all the photos in it until the width is correct

Because we need the viewport width, we have to run this algorithm entirely in the browser. And because we won’t know which particular photo size to request until we’ve run the algorithm, we can’t start downloading the photos until very late in the process. This is why, up until Friday, when you loaded a photostream page, you saw the spinning blue and pink balls before the photos loaded.

Last week we were able to make one key change: we now pre-generate the layout on the server. This means that we know exactly which image sizes we need at the very top of the page, and can start downloading them immediately. It also means the spinning balls aren’t needed anymore. The end result is that the first photo on the page now loads seven times faster than on May 20th.

“Time to First Photo” on the Photostream page

One question remains: we need client viewport width in order to generate the layout, so how are we able to pre-generate it on the server? The first time you come to any Flickr page, we store the width of your browser window in a cookie. We can then read that cookie on the server on subsequent page loads. This means we aren’t able to pre-generate the photostream layout the very first time you come to the site. It also means that the layout will occasionally be incorrect, if you have resized the browser window since the last time you visited Flickr; we deal with this by always correcting the layout on the client, if a mismatch is detected.

This is one of many performance improvements we’re working on after our 5/20 release (we’ve also deployed some improvements to the homepage activity feed). Expect to see the performance continue to improve on the redesigned pages in the coming weeks and months.

Adventures in Jank Busting: Parallax, performance, and the new Flickr Home Page

tl;dr version: transform3d() is your friend, but use sparingly. Chrome Dev Tools are awesome.

What’s old is new again: Stealing from the arcade

Back in 1985, games like Super Mario Bros. popularized the effect of horizontal parallax scrolling – a technique wherein the background moves at a slower speed relative to the foreground, giving the impression of depth. In 2013, the web is seeing a trend of vertical parallax effects as developers look to make pages feel more interactive and responsive. Your author’s $0.25, for the record, is that we’ll continue to see arcade and demoscene-era effects being ported over to the mainstream web in creative ways as client-side performance improves.

While the effect can be aesthetically pleasing when executed correctly, parallax motion tends to be expensive to render – and when performance is lacking, the user’s impression of your site may follow suit.

Vertical parallaxing is pretty straightforward in behaviour: For every Y pixels of vertical axis scrolling, move an absolutely-positioned image in the same direction at scale. There are some additional considerations for the offset of the element on the page and how far it can move, but the implementation remains quite simple.

The following are some findings made while working on Flickr’s redesigned signed-out homepage at flickr.com/new/, specifically related to rendering and scrolling performance.

Events and DOM performance

To optimize performance in the browser environment, it’s important to consider the expensive parts of DOM “I/O”. You ideally want a minimal amount of both, particularly since this is work being done during scrolling. Executing JavaScript on scroll is one of the worst ways to interrupt the browser, typically because it’s done to introduce reflow/layout and painting – thus, denying the browser the chance to use GPU/hardware-accelerated scrolling. window.onscroll() can also fire very rapidly on desktops, making way for a veritable flood of expensive scroll → reflow → paint operations if your “paths” are not fast.

A typical parallax implementation will hook into window.onscroll(), and will update the backgroundPosition or marginTop of an element with a background image attached in order to make it move. An <img> could be used here, but backgrounds are convenient because they allow for positioning in relative units, tiling and so on.

A minimal parallax example, just the script portion:

window.onscroll = function(e) {

  var parallax = document.getElementById('parallax-background');

  parallax.style.marginTop = (window.scrollY/2) + 'px';

}

This could work for a single element, but quickly breaks down if multiple elements are to be updated. In any case, references to the DOM should be cached for faster look-ups; reading window.scrollY and other DOM attributes can be expensive due to potential to cause layout/reflow themselves, and thus should also be stored in a local variable for each onscroll() event to minimize thrashing.

An additional performance consideration: Should all parallax elements always be moved, even those which are outside of the viewport? Quick tests suggested savings were negligible in this case at best. Even if the additional work to determine in-view elements were free, moving only one element did not notably improve performance relative to moving only three at a time. It appears that Webkit’s rendering engine is smart about this, as the only expensive operations seen here involve painting things within the viewport.

In any event, using marginTop or backgroundPosition alone will not perform well as neither take advantage of hardware-accelerated compositing.

And now, VOIDH (Video Or It Didn’t Happen) of marginTop-based parallax performing terribly:

[flickr video=8859509880 secret=412e7dafff w=639 h=364]

Look at that: Terrible. Jank city! You can try it for yourself in your browser of choice, via via ?notransform=1.

Enter the GPU: Hardware Acceleration To The Rescue

Despite caching our DOM references and scroll properties, the cost of drawing all these pixels in software is very high. The trick to performance here is to have the GPU (hardware) take on the job of accelerating the compositing of the expensive bits, which can be done much faster than in software rendering.

Elements can be promoted to a “layer” in rendering terms, via CSS transforms like translate3d(). When this and other translateZ()-style properties are applied, the element will then be composited by the GPU, avoiding expensive repaints. In this case, we are interested in having fast-moving parallax backgrounds. Thus, translate3d() can be applied directly to the parallax element.

window.onscroll = function(e) {

  // note: most browsers presently use prefixes: webkitTransform, mozTransform etc.

  var parallax = document.getElementById('parallax-background');

  parallax.style.transform = 'translate3d(0px,' + (window.scrollY/2) + 'px, 0px)';

}

In webkit-based browsers like Safari and Chrome, the results speak for themselves.

[flickr video=8758952624 secret=f414d2f2fd w=640 h=448]

Look, ma! Way less jank! The performance here is comparable to the same page with parallax disabled (i.e., regular browser scrolling/drawing.)

GPU acceleration sounds like a magic bullet, but it comes at the cost of both video memory and cache. If you create too many layers, you may see rendering problems and even degraded performance – so use them selectively.

It’s also worth noting that browser-based hardware acceleration and performance rests on having agreement between the browser, drivers, OS, and hardware. Firefox might be sluggish on one machine, and butter-smooth on another. High-density vs. standard-resolution screens make a big difference in paint cost. All GPUs are made equal, but some GPUs are made more equal than others. The videos and screenshots for this post were made on my work laptop, which may perform quite differently than your hardware. Ultimately, you need to test your work on a variety of devices to see what real-world performance is like – and this is where Chrome’s dev tools come in handy.

Debugging Render Performance

In brief, Chrome’s Developer Tools are awesome. Chrome Canary typically has the freshest features in regards to profiling, and Safari also has many of the same. The features most of interest to this entry are the Timeline → Frames view, and the gear icon’s “Show Paint rectangles” and “Show composited layer borders” options.



Timeline → Frames view: Helpful in identifying expensive painting operations.



Paint rectangles + composited layer borders, AKA “Plaid mode.” Visually identify layers.

Timeline → Frames view

Timeline’s “Frames” view allows you to see how much time is spent calculating and drawing each frame during runtime, which gives a good idea of rendering performance. To maintain a refresh rate of 60 frames per second, each frame must take no longer than 16 milliseconds to draw. If you have JS doing expensive things during scroll events, this can be particularly challenging.

Expensive frames in Flickr’s case stem primarily from occasional decoding of JPEGs and non-cached image resizes, and more frequently, compositing and painting. The less of each that your page requires, the better.

Paint rectangles

It is interesting to see what content is being painted (and re-painted) by the browser, particularly during scroll events. Enabling paint rectangles in Chrome’s dev tools results in red highlights on paint areas. In the ideal case when scrolling, you should see red only around the scrollbar area; in the best scenario, the browser is able to efficiently move the rest of the HTML content in hardware-accelerated fashion, effectively sliding it vertically on the screen without having to perform expensive paint operations. Script-based DOM updates, CSS :hover and transition effects can all cause painting to happen when scrolling, so keep these things in mind as well.

Composited layer borders

As mentioned previously, layers are elements that have been promoted and are to be composited by the GPU, via CSS properties like translate3d(), translateZ() and so on. With layer borders enabled, you can get a visual representation of promoted elements and review whether your CSS is too broad or too specific in creating these layers. The browser itself may also create additional layers based on a number of scenarios such as the presence of child elements, siblings, or elements that overlap an existing layer.

Composited borders are shown in brown. Cyan borders indicate a tile, which combines with other tiles to form a larger composited layer.

Other notes

Image rendering costs

When using parallax effects, “full-bleed” images that cover the entire page width are also popular. One approach is to simply use a large centered background image for all clients, regardless of screen size; alternately, a responsive @media query-style approach can be taken where separate images are cut to fit within common screen widths like 1024, 1280, 1600, 2048 etc. In some cases, however, the single-size approach can work quite nicely.

In the case of the Flickr homepage, the performance cost in using 2048-pixel-wide background images for all screens seemed to be negligible – even in spite of “wasted” pixels for those browsing at 1024×768. The approach we took uses clip-friendly content, typically a centered “hero” element with shading and color that extends to the far edges. Using this approach, the images are quite width-agnostic. The hero-style images also compress quite nicely as JPEGs thanks to their soft gradients and lighting; as one example, we got a 2048×950-pixel image of a flower down to 68 KB with little effort.

Bandwidth aside, the 2048-pixel-wide images clip nicely on screens down to 1024 pixels in width and with no obvious flaws. However, Chrome’s dev tools also show that there are costs associated with decoding, compositing, re-sizing and painting images which should be considered.

Testing on my work laptop*, “Image resized (non-cached)” is occasionally shown in Chrome’s timeline → frames view after an Image Decode (JPEG) operation – both of which appear to be expensive, contributing to a frame that took 60 msec in one case. It appears that this happens the first time a large parallax image is scrolled into the viewport. It is unclear why there is a resize (and whether it can be avoided), but I suspect it’s due to the retina display on this particular laptop. I’m not using background-size or otherwise applying scaling/sizing in CSS, merely positioning eg., background-position:50% 50%; and background-repeat: no-repeat;. As curiosity sets in, this author will readily admit he has some more research to do on this front. ;)

There are also aspects to RAM and caching that can affect GPU performance. I did not dig deeply into this specifically for Flickr’s new homepage, but it is worth considering the impact of the complexity and number of layers present and active at any given time. If regular scrolling triggers non-cached image resizes each time an asset is scrolled into view, there may be a cache eviction problem stemming from having too many layers active or present at once.

* Work laptop: Mid-2012 15″ Retina MBP, 16 GB RAM, 2.6 Ghz Intel i7, NVIDIA GeForce GT 650M/1024 MB, OS X 10.8.3.

Debugging in action

Here are two videos showing performance of flickr.com/new/ with transforms disabled and enabled, respectively.

Transforms off (marginTop-based parallax)

[flickr video=8845365987 secret=cfe2155de7 w=640 h=361]

Notice the huge spikes on the timeline with transforms disabled, indicating many frames that are taking up to 80 msec to draw; this is terrible for performance, “blowing the frame budget” of 16 ms, and lowers UI responsiveness significantly. Red paint rectangles indicate that the whole viewport is being repainted on scroll, a major contributor to performance overhead. With compositing borders, you see that every “strip” of the page – each parallax background, in effect – is rendered as a single layer. A quick check of the FPS meter and “continuous repaint” graphs does not look great.

Side note: Continuous repaint is most useful when not scrolling. The feature causes repeated painting of the viewport, and displays an FPS graph with real-time performance numbers. You can go into the style editor while continuous repaint is on and flip things off, e.g., disabling box-shadow, border-radius or hiding expensive elements via display to see if the frame rate improves.

Transforms on (translate3d()-based parallax)

[flickr video=8845359795 secret=5fa8467374 w=639 h=364]

With GPU acceleration, you see much-improved frame times, thus a higher framerate and a smoother, more-responsive UI. There is still the occasional spike when a new image scrolls into view, but there is much less jank overall than before. Paint rectangles are much less common thanks to the GPU-accelerated compositing, and layer borders now indicate that more individual elements are being promoted. The FPS meter and continuous repaint mode does have a few dips and spikes, but performance is notably improved.

You may notice that I intentionally trigger video playback in this case, to see how it performs. The flashing red is the result of repainting as the video plays back – and in spite of overflow: hidden-based clipping we apply for the parallax effect on the video, it’s interesting to notice that the overflowed content, while not visible, is also being painted.

Miscellany

Random bits: HTML5 <video>



A frame from a .webm and H.264-encoded video, shown in the mobile portion of Flickr’s redesigned home page.

We wanted the signed-out Flickr homepage to highlight our mobile offerings, including an animation or video showing a subtly-rotating iPhone demoing the Flickr iOS app on a static background. Instead of an inline video box, it felt appropriate to have a full-width video following the pattern used for the parallax images. The implementation is nearly the same, and simply uses a <video> element instead of a CSS background.

With video, the usual questions came up around performance and file size: Would a 2048-pixel-wide video be too heavy for older computers to play back? What if we cropped the video only to be as wide as needed to cover the area being animated (eg., 500 pixels), and used a static JPEG background to fill in the remainder of the space?

As it turned out, encoding a 2048×700 video and positioning it like a background element – even including a slight parallax effect – was quite reasonable. Playback was flawless on modern laptops and desktops, and even a 2006-era 1.2 GHz Fujitsu laptop running WinXP was able to run the video at reasonable speed. Per rendering documentation from the Chrome team, <video> elements are automatically promoted to layers for the GPU where applicable and thus benefit from accelerated rendering. Due to the inline nature of the video, we excluded it from display on mobile devices, and show a static image to clients that don’t support HTML5 video.

Perhaps the most interesting aspect of the video was file size. In our case, the WebM-encoded video (supported natively in Chrome, Firefox, and Opera) was clearly able to optimize for the low amount of motion within the wide frame, as eight seconds of 2048×700 video at 24 fps resulted in a .webm file of only 900 KB. Impressive! By comparison, the H.264-encoded equivalent ended up being about 3.8 MB, with a matching data rate of ~3.8 mbps.

The “Justified” View

It’s worth mentioning that the Justified photos at the bottom of the page lazy-load in, and have been excluded from any additional display optimizations in this case. There is an initial spike with the render and subsequent loading of images, but things settle down pretty quickly. Blindly assigning translate-type transforms to the Justified photo container – a complex beast in and of itself – causes all sorts of rendering hell to break loose.

In Review

This article represents my findings and approach to getting GPU-accelerated compositing working for background images in the Webkit-based Chrome and Safari browsers, in May 2013. With ever-changing rendering engines getting smarter over time, your mileage may vary as the best route to the “fast path” changes. As numerous other articles have said regarding performance, “don’t guess it, test it.”

To recap:

  • Painting: Expensive. Repaints should be minimal, and limited to small areas. Reduce by carefully choosing layers.
  • Compositing: Good! Fast when done by the GPU.
  • Layers: The secret to speed, when done correctly. Apply sparingly.

References / further reading:

… And finally, did I mention we’re hiring? (hint: view-source :))

Using Redis as a Secondary Index for MySQL

Hey, did you notice, on the brand-spanking-new Yahoo homepage, right there on the side of the page, it’s photos from your Flickr contacts (or maybe your groups)! No? Go check it out, I’ll wait.

Ok, great, you’re back! What you should have seen, assuming you have Flickr contacts (or are a member of some groups), is photos from your most recently active contact (or group!). Something like… this:

Flickr on Yahoo.com

(thanks schill!)

What you see above is the 10 most recent photos from my contact who most recently uploaded any photos. The homepage retrieves this data by making a call to a specially tailored method from the Flickr API.

The Latency Problem

In order to ensure performance for Yahoo.com, this API method had very tight SLAs; we can’t slow down the page for the millions of homepage visitors to pull in Flickr content, after all. While the method can return different data sets depending on the most recent activity from your Flickr contacts and groups, for the sake of this post we’re going to focus exclusively on the contacts case. The first step to returning data is to get your most recently active contact, and to do that we need a list of all of your contacts sorted by how recently they’ve uploaded a photo. In an ideal world, the SQL query would be something along the lines of:

SELECT contact_id FROM Contacts WHERE user_id = ? ORDER BY date_upload LIMIT 1;

Easy, right? Sadly, no. Due to the way our contact relationships are stored, the query we need to run is more complex than the above. Still, for the vast majority of Flickr users, the *actual* SQL query performed just fine, usually in less than 1ms. However, Flickr users come in all shapes and sizes. Some users have a few contacts, some have hundreds, and some… have tens of thousands. Unfortunately, the runtime for the query scaled proportionally to number of contacts the user has. I won’t delve into the specifics of the query or how we store contact relationships, but suffice it to say we investigated possible changes to both the query and to the indexes in MySQL, but neither was sufficient to give us the performance we needed across all possible use cases. With that in mind, we had to look elsewhere for optimizations.

The First Attempt

With a pure MySQL solution off the table, our first thoughts for optimization avenues turned to the obvious: denormalization. If we stored the 10 most recent photos from your most recently active contact ahead of time, then getting that list when the homepage was rendered would be trivial regardless of how many contacts you have. At Flickr we’re big fans of Redis, so our thoughts immediately turned to using it as a store for the denormalized contact data.

In order to denormalize the data, we need to constantly process six different user actions:

  • Add a contact
  • Remove a contact
  • Change a contact relationship
  • User uploads a photo
  • User deletes a photo
  • User changes a photo’s privacy

Each one of these actions can impact what photos you should see on the Yahoo homepage, and therefore must update the denormalized data appropriately. Because the photo related actions can potentially impact thousands of users, if not tens or hundreds of thousands (everyone who calls the photo owner a contact would need to have their denormalized data updated on upload), they must be processed by our offline task system to ensure site performance is not adversely impacted. Unfortunately, this is where we ran into a snag. The nature of Flickr’s offline task system is such that the order of processing is not guaranteed. In most cases this is not a problem, however when it comes to maintaining an accurate list of denormalized data not being able to predict the outcome of running multiple tasks is problematic.

Out of Order

Imagine the following scenario: you add a contact then quickly remove them. This results in two tasks being added, one to add and one to remove. If they’re processed in the order in which the actions were taken, everything is fine. However, if they’re processed out of order then when everything is said and done the denormalized data no longer accurately reflects your actual contact relationships. Maybe the next action that updates your data will correct this problem, or maybe it will introduce another problem. Over time, it’s likely that a not-insignificant number of users will have denormalized data that is out of sync with reality.


Out of order

Out of order by Foomandoonian

Ok, let’s try this another way

Looking back at our original problem, the bottleneck was not in generating the list of photos for your most recently active contact, it was just in finding who your most recently active contact was (specifically if you have thousands or tens of thousands of contacts). What if, instead of fully denormalizing, we just maintain a list of your recently active contacts? That would allow us to optimize the slow query, much like a native MySQL index would; instead of needing to look through a list of 20,000 contacts to see which one has uploaded a photo recently, we only need to look at your most recent 5 or 10 (regardless of your total contacts count)!

In the end, this is exactly what we ended up doing. We still process offline tasks to keep track of your contacts’ activity, but now the set of actions we need to track is smaller:

  • Add a new contact
  • Remove a contact
  • User uploads a photo

Each task maintains a per-user sorted set where the set member is the contact id, and the score is the timestamp of when that contact last uploaded a photo. So, for example, if a user (user id 12345) adds a new contact (user id 98765) we simply do:

ZADD user_12345 1363665432 98765

Removing a contact is the opposite, using ZREM as the yin to ZADD‘s yang:

ZREM user_12345 98765

When a user uploads a new photo it’s once again a ZADD (though it’s a ZADD against the set for every user that calls the uploading user a contact). If the uploader is already in a given user’s set, this will simply update the score, if not then the uploader will be added to the set.

As things stand right now, the set will grow without bound, which is obviously not much help if our goal is to limit the number of contacts we need to check against when querying the DB. The solution to this is to cap the set. After each ZADD we check to see if the size of the set has exceeded a threshold (generally we store an additional 20% on top of the data we absolutely need), and if so, we remove all of the extra records using ZREMRANGEBYRANK:

ZREMRANGEBYRANK user_12345 0 ($collection_size - $max_size) - 1

Where $collection_size is the current number of members in the set and $max_size is the maximum number of members we want to store. Note that we’re removing from the head of the set. Redis stores data in sorted sets in ascending order, so the least recently active contacts are at the beginning of the set.

Akin to how we must cap the set to keep it below a maximum size, we also have a threshold on the other end of the spectrum to keep the set above a minimum size. If a user happens to remove their 10 most recently active contacts then there would be no data in their set, and the Redis index would be of little value. With that in mind, any time the user removes a contact, we check to see if the size of the set has dropped under the minimum threshold, and if so we repopulate the data based on their remaining contacts. This is slightly more complex than a simple Redis command, so we’ll use actual PHP to explain:

//
// $key is the redis ZSET key, e.g. user_12345
//
$count = redis_zset_zcard($key);
if ($count &lt; $MIN_SET_SIZE) {
      if ($count &gt; 0) {
        $current_contacts = redis_zset_zrange($key, 0, -1);
    } else {
        $current_contacts = array();
    }

    //
    // In this call, the second parameter is the number of contacts
    // to return and the third parameter is a list of users to
    // exclude from the response. Since they’re already in the
    // set, there’s no need to add them again
    //
    $contacts = contacts_most_recently_uploaded_list($user, $MAX_SET_SIZE - $count,
        $current_contacts);

    foreach ($contacts as $contact) {
        redis_zset_zadd($key, $contact['last_upload'], $contact['id']);
    }
}

Remember, this is all being done outside of the context of a page request, so there’s no harm in spending a little bit of extra time to ensure when the API is called we have a reliable index that we can use to optimize the DB query.

The final action we need to take is to actually query the set to get the list of contacts so we can actually do said DB query optimization. This is done with a standard ZREVRANGE:

ZREVRANGE user_12345 0 10

Similar to how we cap the set by removing members from the beginning because those are the least recently active, when we want the most recently active we use ZREVRANGE to get members at the end of the set.

You’re probably wondering, what about the other three events that generated tasks for the fully denormalized solution? How are we able to get by without them? Well, because we’ll just be using the list of contacts to optimize a live DB query, we can take some liberties with data purity in Redis. Because we store multiple recently active contacts, it doesn’t matter if, for example, the most recent contact in your Redis set has deleted his most recent photos or made them all private. When we query the DB, we further restrict the list based on your current relationship with a contact and photo-visibility, so any issues with Redis being out of sync sort themselves out automatically.

Take the above scenario where a user adds and quickly removes a contact. If the remove is processed first and the contact remains in the user’s Redis set, when we go to query the live contacts DB, we’ll get no results for that relationship and move on to the next contact in the set—crisis averted. There is a slight problem with the reverse scenario wherein a user removes a contact and then adds them back. If the tasks are processed out of order, there’s a chance that the add may not end up being reflected in the Redis set. This is the only chance for corruption under this system (as opposed to the fully denormalized solution where many tasks could interact and corrupt one another), and it’s likely to be fixed the next time any of the other actions occur. Furthermore, the only downside of this corruption is a user seeing photos from their second most recently active contact; this is a condition we’re willing to live with given the overall gains provided by the solution as a whole.

Not All Wine and Roses

The dataset involved in this solution is one of the largest we’ve pushed at Redis so far, and it’s not without its pitfalls. Namely, as the size of the dataset increases, the amount of time spent doing RDB saves also increases. This can introduce latency into Redis commands while the save is in progress. From Redis Persistence:

RDB needs to fork() often in order to persist on disk using a child process. Fork() can be time consuming if the dataset is big, and may result in Redis to stop serving clients for some millisecond or even for one second if the dataset is very big and the CPU performance not great. AOF also needs to fork() but you can tune how often you want to rewrite your logs without any trade-off on durability.

This is a key factor to be aware of when optimizing Redis performance. The overall speed of the system can be adversely impacted by RDB saves, therefore taking steps to minimize the time spent saving is critical. We’ve solved this problem by isolating these writes to their own Redis instance, thereby limiting the size of the dataset to only keys related to contacts activity. In the long term, as activity increases, it’s likely that we’ll need to further reduce the number of writes-per-instance by sharding this contact activity to a number of Redis instances. In some cases, multiple small Redis instances running on a single host can be preferable to one large Redis instance. As mentioned in the Redis Persistence guide, RDB saves can suffer with a slow CPU; if upgrading hardware (mostly faster CPUs to improve fork() performance), is possible, that’s certainly another option to investigate.

Flickr flamily floto

Like this post? Have a love of online photography? Want to work with us? Flickr is hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.