Redis Global Locks Redux

In my last post I described how we use Redis to manage a global lock that allows us to automatically failover to a backup process if there was a problem in the primary process. The method described allegedly allowed for any number of backup processes to work in conjunction to pick up on primary failures and take over processing.

Locks #1
Locks #1 by Christoph Kummer

Thanks to an astute reader, it was pointed out that the code in the blog wouldn’t actually work as advertised:

 

The Problem

Nolan correctly noticed that when the backup processes attempts to acquire the lock via SETNX, that lock key will already exist from when it was acquired by the primary, and thus all subsequent attempts to acquire locks will simply end up constantly trying to acquire a lock that can never be acquired. As a reminder, here’s what we do when we check back on the status of a lock:

function checkLock(payload, lockIdentifier) {
    client.get(lockIdentifier, function(error, data) {
        // Error handling elided for brevity
        if (data !== DONE_VALUE) {
            acquireLock(payload, data + 1, lockCallback);
        } else {
            client.del(lockIdentifier);
        }
    });
}

And here’s the relevant bit from acquireLock that calls SETNX:

    client.setnx(lockIdentifier, attempt, function(error, data) {
        if (error) {
            logger.error("Error trying to acquire redis lock for: %s", lockIdentifier);
            return callback(error, dataForCallback(false));
        }

        return callback(null, dataForCallback(data === 1));
    });

So, you’re thinking, how could this vaunted failover process ever actually work? The answer is simple: the code from that post isn’t what we actually run. The actual production code has a single backup process, so it doesn’t try to re-acquire the lock in the event of failure, it just skips right to trying to send the message itself. In the previous post, I described a more general solution that would work for any number of backup processes, but I missed this one important detail.

That being said, with some relatively minor changes, it’s absolutely possible to support an arbitrary number of backup processes and still maintain the use of the global lock. The trivial solution is to simply have the backup process delete the key before trying to re-acquire the lock (or, technically acquire it anew). However, the problem with that becomes apparent pretty quickly. If there are multiple backup processes all deleting the lock and trying to SETNX a new lock again, there’s a good chance that a race condition could arise wherein one of backups deletes a lock that was acquired by another backup process, rather than the failed lock from the primary.

The Solution

Thankfully, Redis has a solution to help us out here: transactions. By using a combination of WATCH, MULTI, and EXEC, we can perform actions on the lock key and be confident that no one has modified it before our actions can complete. The process to acquire a lock remains the same: many processes will issue a SETNX and only one will win. The changes come into play when the processes that didn’t acquire the lock check back on its status. Whereas before, we simply checked the current value of the lock key, now we must go through the above described Redis transaction process. First we watch the key, then we do what amounts to a check and set (albeit with a few different actions to perform based on the outcome of the check):

function checkLock(payload, lockIdentifier, lastCount) {
    client.watch(lockIdentifier);
    client.multi()
        .get(lockIdentifier)
        .exec(function(error, replies) {
            if (!replies) {
                // Lock value changed while we were checking it, someone else got the lock
                client.get(lockIdentifier, function(error, newCount) {
                    setTimeout(checkLock, LOCK_EXPIRY, payload, lockIdentifier, newCount);
                });

                return;
            }

            var currentCount = replies[0];
            if (currentCount === null) {
                // No lock means someone else completed the work while we were checking on its status and the key has already been deleted
                return;
            } else if (currentCount === DONE_VALUE) {
                // Another process completed the work, let’s delete the lock key
                client.del(lockIdentifier);
            } else if (currentCount == lastCount) {
                // Key still exists, and no one has incremented the lock count, let’s try to reacquire the lock
                reacquireLock(payload, lockIdentifier, currentCount, doWork);
            } else {
                // Key still exists, but the value does not match what we expected, someone else has reacquired the lock, check back later to see how they fared
                setTimeout(checkLock, LOCK_EXPIRY, payload, lockIdentifier, currentCount);
            }
        });
}

As you can see, there are five basic cases we need to deal with after we get the value of the lock key:

  1. If we got a null reply back from Redis, that means that something else changed the value of our key, and our exec was aborted; i.e. someone else got the lock and changed its value before we could do anything. We just treat it as a failure to acquire the lock and check back again later.
  2. If we get back a reply from Redis, but the value for the key is null, that means that the work was actually completed and the key was deleted before we could do anything. In this case there’s nothing for us to do at all, so we can stop right away.
  3. If we get back a value for the lock key that is equal to our sentinel value, then someone else completed the work, but it’s up to us to clean up the lock key, so we issue a Redis DEL and call our job done.
  4. Here’s where things get interesting: if the key still exists, and its value (the number of attempts that have been made) is equal to our last attempt count, then we should try and reacquire the lock.
  5. The last scenario is where the key exists but its value (again, the number of attempts that have been made) does not equal our last attempt count. In this case, someone else has already tried to reacquire the lock and failed. We treat this as a failure to acquire the lock and schedule a timeout to check back later to see how whoever did acquire the lock got on. The appropriate action here is debatable. Depending on how long your underlying work takes, it may be better to actually try and reacquire the lock here as well, since whoever acquired the lock may have already failed. This can, however, lead to premature exhaustion of your attempt allotment, so to be safe, we just wait.

So, we’ve checked on our lock, and, since the previous process with the lock failed to complete its work, it’s time to actually try and reacquire the lock. The process in this case is similar to the above inasmuch as we must use Redis transactions to manage the reacquisition process, thankfully however, the steps are (somewhat) simpler:

function reacquireLock(payload, lockIdentifier, attemptCount, callback) {
    client.watch(lockIdentifier);
    client.get(lockIdentifier, function(error, data) {
        if (!data) {
            // Lock is gone, someone else completed the work and deleted the lock, nothing to do here, stop watching and carry on
            client.unwatch();
            return;
        }

        var attempts = parseInt(data, 10) + 1;

        if (attempts > MAX_ATTEMPTS) {
            // Our allotment has been exceeded by another process, unwatch and expire the key
            client.unwatch();
            client.expire(lockIdentifier, ((LOCK_EXPIRY / 1000) * 2));
            return;
        }

        client.multi()
            .set(lockIdentifier, attempts)
            .exec(function(error, replies) {
                if (!replies) {
                    // The value changed out from under us, we didn't get the lock!
                    client.get(lockIdentifier, function(error, currentAttemptCount) {
                        setTimeout(checkLock, LOCK_TIMEOUT, payload, lockIdentifier, currentAttemptCount);
                    });
                } else {
                    // Hooray, we acquired the lock!
                    callback(null, {
                        "acquired" : true,
                        "lockIdentifier" : lockIdentifier,
                        "payload" : payload
                    });
                }
            });
    });
}

As with checkLock we start out by watching the lock key, and proceed do a (comparitively) simplified check and set. In this case, we’ve “only” got three scenarios to deal with:

  1. If we’ve already exceeded our allotment of attempts, it’s time to give up. In this case, the allotment was actually exceeded in another worker, so we can just stop right away. We make sure to unwatch the key, and set it expire at some point far enough in the future that any remaining processes attempting to acquire locks will also see that it’s time to give up.

Assuming we’re still good to keep working, we try and update the lock key within a MULTI/EXEC block, where we have our remaining two scenarios:

  1. If we get no replies back, that again means that something changed the value of the lock key during our transaction and the EXEC was aborted. Since we failed to acquire the lock we just check back later to see what happened to whoever did acquire the lock.
  2. The last scenario is the one in which we managed to acquire the lock. In this case we just go ahead and do our work and hopefully complete it!

Bonus!

To make managing global locks even easier, I’ve gone ahead and generalized all the code mentioned in both this and the previous post on the subject into a tidy little event based npm package: https://github.com/yahoo/redis-locking-worker. Here’s a quick snippet of how to implement global locks using this new package:

var RedisLockingWorker = require("redis-locking-worker”);

var SUCCESS_CHANCE = 0.15;

var lock = new RedisLockingWorker({
    "lockKey" : "mylock",
    "statusLevel" : RedisLockingWorker.StatusLevels.Verbose,
    "lockTimeout" : 5000,
    "maxAttempts" : 5
});

lock.on("acquired", function(lastAttempt) {
    if (Math.random() <= SUCCESS_CHANCE) {
        console.log("Completed work successfully!", lastAttempt);
        lock.done(lastAttempt);
    } else {
        // oh no, we failed to do work!
        console.log("Failed to do work");
    }
});
lock.acquire();

There’s also a few other events you can use to track the lock status:

lock.on("locked", function() {
    console.log("Did not acquire lock, someone beat us to it");
});

lock.on("error", function(error) {
    console.error("Error from lock: %j", error);
});

lock.on("status", function(message) {
    console.log("Status message from lock: %s", message);
});

More Bonus!

If you don’t need the added complexity if multiple backup processes, I also want to give credit to npm user pokehanai who took the methodology described in the original post and created a generalized version of the two-worker solution: https://npmjs.org/package/redis-paired-worker.

Wrapping Up

So there you have it! Coordinating work on any number of processes across any number of hosts couldn’t be easier! If you have any questions or comments on this, please feel free to follow up on Twitter.

Flickr flamily floto

Like this post? Have a love of online photography? Want to work with us? Flickr is hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.

Web workers and YUI

(Flickr is hiring! Check out our open job postings and what it’s like to work at Flickr.)

Web workers are awesome. They’ll change the way you think about JavaScript.

Factory Scenes : Consolidated/Convair Aircraft Factory San Diego

Chris posted an excellent writeup on how we do client-side Exif parsing in the new Uploader, which is how we can display thumbnails before uploading your photos to the Flickr servers. But parsing metadata from hundreds of files can be a little expensive.

In the old days, we’d attempt to divide our expensive JS into smaller parts, using setTimeout to yield to the UI thread, crossing our fingers, and hoping that the user could still scroll and click when they wanted to. If that didn’t work, then the feature was simply too fancy for the web.

Since then, a lot has happened. People started using better browsers. HTML got an orange logo. Web workers were discovered.

So now we can run JavaScript in separate threads (“parallel execution environments”), without interrupting the standard UI stuff the browser is always working on. We just need to put our job code in a separate file, and instantiate a web worker.

Without YUI

For simple, one-off tasks, you can just write some JavaScript in a new file and upload it to your server. Then create a worker like this:

var worker = new Worker('my_file.js');

worker.addEventListener('message', function (e) {
	// do something with the message from the worker
});

// pass some data into the worker
worker.postMessage({
	foo: bar
});

Of course, the worker thread won’t have access to anything in the main thread. You can post messages containing anything that’s JSON compatible, but not functions, cyclical references, or special objects like File references.

That means any modules or helper functions you’ve defined in your main thread are out of bounds, unless you’ve also included them in your worker file. That can be a drag if you’re accustomed to working in a framework.

With YUI

Practically speaking, a worker thread isn’t very different from the main thread. Workers can’t access the DOM, and they have a top-level self object instead of window. But plenty of our existing JavaScript modules and helper functions would be very useful in a worker thread.

Flickr is built on YUI. Its modular architecture is powerful and encourages clean, reusable code. We have a ton of small JS files—one per module—and the YUI Loader figures out how to put them all together into a single URL.

If we want to write our worker code like we write our normal code, our worker file can’t be just my_file.js. It needs to be a full combo URL, with YUI running inside it.

An aside for the brogrammers who have never seen modular JS in practice

Loader dynamically loads script and css files for YUI modules as well as external modules. It includes the dependency information for the version of the library in use, and will automatically pull in dependencies for the modules requested.

In development, we have one JS file per module. Let’s say photo.js, kitten.js, and puppy.js.

A page full of kitten photos might require two of those modules. So we tell YUI that we want to use photo.js and kitten.js, and the YUI Loader appends a script node with a combo URL that looks something like this:

<script src="/combo.php?photo.js&kitten.js">.

On our server, combo.php finds the two files on disk and prints out the contents, which are immediately executed inside the script node.

C-c-c-combo

Of course, the main thread is already running YUI, which we can use to generate the combo URL required to create a worker.

That URL needs to return the following:

  1. YUI.add() statements for any required modules. (Don’t forget yui-base)
  2. YUI.add() statement for the primary module with the expensive code.
  3. YUI.add() statement to execute the primary module.

Ok, so how do we generate this combo URL? Like so:

//
// Make a reference to our original YUI configuration object,
// with all of our module definitions and combo handler options.
//
// To make sure it's as clean as possible, we use a clone of the
// object from before we passed it into YUI.
//

var yconf = window.yconf; // global for demo purposes

//
// Y.Loader.resolve can be used to generate a combo URL with all
// the YUI modules needed within the web worker. (YUI 3.5 or later)
//
// The YUI Loader will bypass any required modules that have
// already been loaded in this instance, so in addition to the
// clean configuration object, we use a new YUI instance.
//

var Y2 = YUI(Y.merge(yconf));

var loader = new Y2.Loader({
	// comboBase must be on the same domain as the main thread
	comboBase: '/local/combo/path/',
	combine: true,
	ignoreRegistered: true,
	maxURLLength: 2048,
	require: ['my_worker_module']
});

var out = loader.resolve(true);

var combo_url = out.js[0];

Then, also in the main thread, we can start the worker instance:

//
// Use the combo URL to create a web worker.
// This is when the combo URL is downloaded, parsed, 
// and executed.
//

var worker = new window.Worker(combo_url);

To start using YUI, we need to pass our YUI config object into the worker thread. That could have been part of the combo URL, but our YUI config is pretty specific to the particular page you’re on, so we need to reuse the same object we started with in the main thread. So we use postMessage to pass it from the main thread to the worker:

//
// Post the YUI config into the worker.
// This is when the worker actually starts its work.
//

worker.postMessage({
	yconf: yconf
});

Now we’re almost done. We just need to write the worker code that waits for our YUI config before using the module. So, at the bottom of the combo response, in the worker thread:

self.addEventListener('message', function (e) {

	if (e.data.yconf) {

		//
		// make sure bootstrapping is disabled
		//
		
		e.data.yconf.bootstrap = false;

		//
		// instantiate YUI and use it to execute the callback
		//
		
		YUI(e.data.yconf).use('my_worker_module', function (Y) {

			// do some hard work!

		});

	}

}, false);

Yeah, I know the back-and-forth between the main thread and the worker makes that look complicated. But it’s actually just a few steps:

  1. Main thread generates a combo URL and instantiates a Web Worker.
  2. Worker thread parses and executes the JS returned by that URL.
  3. Main thread posts the page’s YUI config into the worker thread.
  4. Worker thread uses the config to instantiate YUI and “use” the worker module.

That’s it. Now get to work!

Raising the bar on web uploads

With over seven billion photos uploaded since day one, it’s safe to say that uploading is an important part of the Flickr experience.

There are numerous ways to get photos onto Flickr, but the native web-based one at flickr.com/photos/upload/ is especially important as it typically accounts for a majority of uploads to the site.

A brief history of Flickr “Web Uploadrs”

Flickr “Flashy” Uploadr UI (2008) vs. Basic Uploadr UI

Earlier versions of Flickr’s web-based upload UI used a simple <form> with six file inputs, and no more. As the site grew in scale, the native web upload experience had to scale to match. In early 2008, an HTML/Flash hybrid upgrade added support for batch file selection, allowing up to several gigabytes of files to be uploaded in one session. This was a much-needed step in the right direction.

The “flashy” uploader does one thing – sending lots of files – fast, and reliably. However, it was not designed to tackle the other tasks one often performs on photos including adding and editing of metadata, sorting and organizing. As a result, “upload and organize” has traditionally been reinforced as two separate actions on Flickr when using the web-based UI.

The new (mostly-HTML5-based) shiny

Thanks to HTML5-based features in newer browsers, we have been able to build a new uploader that’s pretty slick, and is more desktop application-like than ever before; it brings us closer to the idea of a one-stop “upload and organize” experience. At the same time, the UI also retains common web conventions and has a distinct Flickr feel to it. We think the result is a pretty good mix, combining some of the best parts of both.

As feedback from a group of beta testers have confirmed, it can also be deceivingly fast.

The new Flickr Web Uploader. It’s powerful, it’s got a dark background, and it’s fast.

Features: An Overview

Here are a few fun things the new uploader does:

  • Drag and drop batches of files from your OS. Where present and supported, EXIF thumbnails are shown in the UI almost immediately.

  • Fluid photo “grid” shows photo thumbnails, allows larger, lightbox-style previews, inline editing of description/title and rotation.

  • Mouse and keyboard-based grid selection and rearrange functionality similar to that of desktops.

  • “Editor panel” shows state of current selection, provides powerful batch editing features (title + description, adding of tags, people, sets, license, privacy etc.)

  • “Info” mode shows overlay icons on grid items, allowing for a quick overview of pending edits (privacy, people, tags etc.)

  • Auto-retry and recovery cases for dropped / lost connection cases

Technical Bits

A small book could probably be written on the process, prototypes and technology decisions made during the development of this uploader, but we’ll save the gory details for a couple of in-depth blog posts which will highlight specific parts of the UI. In the meantime, here are some notes on the tech used:

  • HTML5 File APIs

    Modern browser file APIs make up the core of file handling functionality, including drag-and-dropping of files right into the browser. FileReader-type APIs allow access to data from disk, enabling things like EXIF thumbnail parsing and retrieval where supported. EXIF parsing is almost instantaneous and thumbnails are hugely valuable, of course, in prompting users’ editing decisions.

    (For browsers without the relevant file APIs, a Flash-based fallback is used in which case file drag-and-drop is not supported, and EXIF thumb previews are not implemented.)

  • CSS3

    Thanks to growing support across newer browsers, we’ve been able to produce a modern design that takes advantage of CSS-based gradients to achieve visual goals that would have traditionally required external images, and occasionally, hacks or shims in our HTML and JavaScript.

    CSS3′s border-radius, text-shadow and box-shadow are also featured nicely in this new design, alongside visual transform effects such as rotate, zoom and scale. Eagle-eyed users of newer Webkit builds such as Chrome Canary may even see a little use of filter with blur here and there.

    CSS transitions are also featured extensively in the new uploader, a notable shift away from animation sequences which would traditionally have been calculated and rendered by JavaScript. Good candidates for transitions include the expanding or collapsing of a menu section, or a background color fade when a text area is focused, for example.

    While triggering transitions and/or transforms can be a little quirky depending on the current “state” of the element (for example, an element just added to the DOM may need a moment to settle and be rendered before transitioning,) the advantage of using CSS vs. JS for “enhancement”-style UI effects like these is absolutely clear.

  • YUI3

    Thanks to YUI3, the new Flickr Uploader is a highly-modularized, component-based application. The editr module itself is comprised of about 35 sub-modules, following YUI’s standard module pattern. In Flickr’s case, modules are defined as being JavaScript, CSS or string (i.e., language translation) components. This compartmentalization approach reduces the overall complexity of code, encourages extensibility and allows developers to work on features within a specific scope.

A sneak peek: Screencast (Beta Version)

At time of writing, the new uploader is being gradually rolled out to the masses. For those who haven’t seen it yet, here’s a demo screencast of an earlier beta version showing some of the interactions for common upload and editing use cases. (Best viewed full-screen, and with “HD” on.) The video gives an idea of what the experience is like, but it’s best seen in person. We’ve really had a lot of fun building this one.

Building Fast Client-side Searches

Yesterday we released a new people selector widget (which we’ve been calling Bo Selecta internally). This widget downloads a list of all of your contacts, in JavaScript, in under 200ms (this is true even for members with 10,000+ contacts). In order to get this level of performance, we had to completely rethink how we send data from the server to the client.

Server Side: Cache Everything

To make this data available quickly from the server, we maintain and update a per-member cache in our database, where we store each member’s contact list in a text blob — this way it’s a single quick DB query to retrieve it. We can format this blob in any way we want: XML, JSON, etc. Whenever a member updates their information, we update the cache for all of their contacts. Since a single member who changes their contact information can require updating the contacts cache for hundreds or even thousands of other members, we rely upon prioritized tasks in our offline queue system.

Testing the Performance of Different Data Formats

Despite the fact that our backend system can deliver the contact list data very quickly, we still don’t want to unnecessarily fetch it for each page load. This means that we need to defer loading until it’s needed, and that we have to be able to request, download, and parse the contact list in the amount of time it takes a member to go from hovering over a text field to typing a name.

With this goal in mind, we started testing various data formats, and recording the average amount of time it took to download and parse each one. We started with Ajax and XML; this proved to be the slowest by far, so much so that the larger test cases wouldn’t even run to completion (the tags used to create the XML structure also added a lot of weight to the filesize). It appeared that using XML was out of the question.

BoSelectaJsonGoodFunTimes: eval() is Slow

DJ Bo Selecta on the decks

Next we tried using Ajax to fetch the list in the JSON format (and having eval() parse it). This was a major improvement, both in terms of filesize across the wire and parse time.

While all of our tests ran to completion (even the 10,000 contacts case), parse time per contact was not the same for each case; it geometrically increased as we increased the number of contacts, up to the point where the 10,000 contact case took over 80 seconds to parse — 400 times slower than our goal of 200ms. It seemed that JavaScript had a problem manipulating and eval()ing very large strings, so this approach wasn’t going to work either.

Contacts File Size (KB) Parse Time (ms) File Size per Contact (KB) Parse Time per Contact (ms)
10,617 1536 81312 0.14 7.66
4,878 681 18842 0.14 3.86
2,979 393 6987 0.13 2.35
1,914 263 3381 0.14 1.77
1,363 177 1837 0.13 1.35
798 109 852 0.14 1.07
644 86 611 0.13 0.95
325 44 252 0.14 0.78
260 36 205 0.14 0.79
165 24 111 0.15 0.67

JSON and Dynamic Script Tags: Fast but Insecure

Working with the theory that large string manipulation was the problem with the last approach, we switched from using Ajax to instead fetching the data using a dynamically generated script tag. This means that the contact data was never treated as string, and was instead executed as soon as it was downloaded, just like any other JavaScript file. The difference in performance was shocking: 89ms to parse 10,000 contacts (a reduction of 3 orders of magnitude), while the smallest case of 172 contacts only took 6ms. The parse time per contact actually decreased the larger the list became. This approach looked perfect, except for one thing: in order for this JSON to be executed, we had to wrap it in a callback method. Since it’s executable code, any website in the world could use the same approach to download a Flickr member’s contact list. This was a deal breaker.

Contacts File Size (KB) Parse Time (ms) File Size per Contact (KB) Parse Time per Contact (ms)
10,709 1105 89 0.10 0.01
4,877 508 41 0.10 0.01
2,979 308 26 0.10 0.01
1,915 197 19 0.10 0.01
1,363 140 15 0.10 0.01
800 83 11 0.10 0.01
644 67 9 0.10 0.01
325 35 8 0.11 0.02
260 27 7 0.10 0.03
172 18 6 0.10 0.03

Going Custom

Custom Ride

Having set the performance bar pretty high with the last approach, we dove into custom data formats. The challenge would be to create a format that we could parse ourselves, using JavaScript’s String and RegExp methods, that would also match the speed of JSON executed natively. This would allow us to use Ajax again, but keep the data restricted to our domain.

Since we had already discovered that some methods of string manipulation didn’t perform well on large strings, we restricted ourselves to a method that we knew to be fast: split(). We used control characters to delimit each contact, and a different control character to delimit the fields within each contact. This allowed us to parse the string into contact objects with one split, then loop through that array and split again on each string.

that.contacts = o.responseText.split("\c");

for (var n = 0, len = that.contacts.length, contactSplit; n < len; n++) {

	contactSplit = that.contacts[n].split("\a");

	that.contacts[n] = {};
	that.contacts[n].n = contactSplit[0];
	that.contacts[n].e = contactSplit[1];
	that.contacts[n].u = contactSplit[2];
	that.contacts[n].r = contactSplit[3];
	that.contacts[n].s = contactSplit[4];
	that.contacts[n].f = contactSplit[5];
	that.contacts[n].a = contactSplit[6];
	that.contacts[n].d = contactSplit[7];
	that.contacts[n].y = contactSplit[8];
}

Though this technique sounds like it would be slow, it actually performed on par with native JSON parsing (it was a little faster for cases containing less than 1000 contacts, and a little slower for those over 1000). It also had the smallest filesize: 80% the size of the JSON data for the same number of contacts. This is the format that we ended up using.

Contacts File Size (KB) Parse Time (ms) File Size per Contact (KB) Parse Time per Contact (ms)
10,741 818 173 0.08 0.02
4,877 375 50 0.08 0.01
2,979 208 34 0.07 0.01
1,916 144 21 0.08 0.01
1,363 93 16 0.07 0.01
800 58 10 0.07 0.01
644 46 8 0.07 0.01
325 24 4 0.07 0.01
260 14 3 0.05 0.01
160 13 3 0.08 0.02

Searching

Ben to the Rescue

Now that we have a giant array of contacts in JavaScript, we needed a way to search through them and select one. For this, we used YUI’s excellent AutoComplete widget. To get the data into the widget, we created a DataSource object that would execute a function to get results. This function simply looped through our contact array and matched the given query against four different properties of each contact, using a regular expression (RegExp objects turned out to be extremely well-suited for this, with the average search time for the 10,000 contacts case coming in under 38ms). After the results were collected, the AutoComplete widget took care of everything else, including caching the results.

There was one optimization we made to our AutoComplete configuration that was particularly effective. Regardless of how much we optimized our search method, we could never get results to return in less than 200ms (even for trivially small numbers of contacts). After a lot of profiling and hair pulling, we found the queryDelay setting. This is set to 200ms by default, and artificially delays every search in order to reduce UI flicker for quick typists. After setting that to 0, we found our search times improved dramatically.

The End Result

Head over to your Contact List page and give it a whirl. We are also using the Bo Selecta with FlickrMail and the Share This widget on each photo page.

YUI Blog: Improving The Flickr Upload Exprience With YUI Uploader

water pipe

Visual analogy of simultaneous file uploading. Also, internet/pipe joke goes here.

As a site which has many nifty JavaScript-driven features, Flickr makes good use of the Yahoo! User Interface library for much of its JavaScript DOM, Event handling and Ajax functionality.

One of the fancier widgets we’ve implemented is a flashy browser-based Web Uploadr which uses the YUI Uploader component (a combination of JavaScript and Flash) which allows for faster batch uploads, progress reporting, a nicer UI and overall improved user experience.

Head over to the YUI Blog and check out how Flickr uses YUI Uploader to provide a faster, shinier upload experience.

Lessons Learned while Building an iPhone Site

The Explore Page in the iPhone site

A few weeks ago we released a version of the Flickr site tailored specifically for the iPhone. Developing this site was very different from any other project I’ve worked on; there seems to be a new set of frontend rules for developing high-end mobile sites. A lot of the current best practices get thrown out the window in the quest for minimum page weight and fastest load times over slow cellular connections.

Here are a few of the lessons we learned (sometimes painfully) while developing this site.

1. Don’t Use a JavaScript Library or CSS Framework

This was one of the hardest things for me to come to terms with. I’m a huge fan of libraries, especially YUI, mostly because they allow me to spend my time creating new stuff instead of working around crazy browser quirks. But these libraries walk a fine line; by definition, they must work across a wide array of browsers and offer enough features to make them worth using. This means they potentially contain a lot of code that you don’t care about and won’t use. This code is dead weight to your site.

With such a high percentage of normal web users on broadband connections, we’ve gotten cavalier about what we can include in our pages. 250 KB of JavaScript or more isn’t uncommon for a large site these days. But for sites that are meant to be viewed over slow cellular connections like EDGE, 250 KB is an impossible amount of data. The only way to get the size of your JavaScript down is to selectively pull code out of libraries, and include only what you use. This means you can rip out code meant only for browsers that you won’t support (modular libraries like the new YUI 3.0 allow you to only include the code you use, preventing this problem somewhat).

The same goes for CSS. Frameworks make development faster and your final product more robust, but they, like the JavaScript libraries, include code for situations you won’t have to deal with. Every line in your CSS must be custom; each property must be scrutinized to ensure it’s needed.

2. Load Page Fragments Instead of Full Pages

Loading fragments saves 92.2% of the page size

When navigating through a site, most of what changes from page to page is the actual content; the JavaScript, CSS, header and footer stay mostly the same. We can use this to our advantage by only loading the part of each page that changes. We did this by hijacking all links of the page: when a link is clicked, we intercept the event, fetch the page fragment using Ajax, and insert the HTML into a new div. This has several benefits:

  • Since you control the entire life cycle of the page fetch, you can display loading indicators or a wireframe version of the page while new pages load
  • All pages that have been fetched will exist within the DOM; clicking the back button (or clicking on a link for a page that has already been fetched) results in an instantaneous page load
  • The page fragments are extremely small; ours are about 800 bytes (gzipped) on average

Using this system complicates your code a bit. You need JavaScript to handle the hijacking, the page fragment insertion, and the address bar hash changes (which allow the back and forward buttons to work normally). You also need your backend to recognize requests made with Ajax, and to only send the page content instead of the full HTML document. And lastly, if you want normal URLs to work, and each of your pages to be bookmarkable, you will need even more JavaScript.

Despite these downsides, the benefits can’t be ignored. The extra JavaScript code is a one-time cost, but the extra page content that we would have downloaded is saved for every page load. We found it was worth the complication and additional JS in order to dramatically reduce the time it took to load each page.

3. Don’t Build for Just One Device

It’s really tempting to build the site for just the iPhone: you can use modern CSS (including things like CSS3 selectors and transformations), you don’t have to hack around annoying browser quirks, and testing is extremely easy. But any single device, even one as ubiquitous as the iPhone, has a limited share of the mobile market, especially internationally. Rarely can you justify the cost of creating a one-off site for a very small number of your users.

Luckily the current generation of high-end mobile browsers is excellent in terms of support for modern features. Many phones use a WebKit derivative, including the iPhone, and Symbian and Android phones. Other phones either come with or can use Opera Mobile or the new mobile version of Firefox (called Fennec). For the most part, very few changes are needed in order to support these browsers.

Most of the differences lie in layout. It’s important to structure your pages around a grid that can expand as a percentage of the page width. This allows your layouts to work on many different screen sizes and orientations. The iPhone, for example, allows both landscape and portrait viewing styles, which have vastly different layout requirements. By using percentages, you can have the content fill the screen regardless of orientation. Another option is to detect viewport width and height, and use JavaScript to dynamically adjust classes based on those measurements (but we found this was overkill; CSS can handle most situations on its own).

4. Optimize Everything

The browsers on mobile devices operate under much stricter constraints than their desktop cousins. Slower CPUs, smaller amounts of memory, and smaller hard drives mean that less data can be cached. On the iPhone, for instance, only files smaller than 25 KB are cached. This puts very specific limits of the size of your files. For a large site like Flickr, 25 KB worth of JavaScript and CSS barely scratches the surface. To put our files under the limit, we ran everything through the YUI Compressor using the most aggressive settings. We ran all images through compression tools as well (we like pngout and Smushit), reducing each image file by an average of 40%. We also made heavy use of sprites, where possible.

In the end, we were able to go from 90+ second load times over EDGE to less than 7 for an empty cache experience. Using page fragments, we are able to load and display new pages in under a second (though the images in those pages take longer to load). These are not trivial gains, and make the difference between a good mobile experience and a one that is so awful the user gives up halfway through the page load.

5. Tell the User What is Happening

Once we hijacked all clicks actions in order to load page fragments, it wasn’t always clear to the user if anything was happening when they clicked on a link. This is especially true on touch devices, where it is difficult to know if the device even detected your action. To combat this problem, we added loading indicators to every link. These tell the user that something is happening, and reassures them that their action was detected. It also makes the pages seem to load much faster, since something is happening right away; if the indicators weren’t there, it would seem like nothing was happening for a few seconds, and then the page would load suddenly. In our testing, these indicators were the difference between a UI that seems snappy and responsive, and one that seemed slow and inconsistent.

Loading indicators

One Easy Option

The iUI framework implements a lot of these practices for you, and might be a good place to start in developing any mobile site (though keep in mind it was developed specifically for the iPhone). We found it especially useful in the early stages of development, though eventually we pulled it out and wrote custom code to run the site.

Making a better Flickr Web Uploadr (Or, “Web Browsers Aren’t Good At Uploading Files By Themselves”)

Sometimes when browsers won’t do what you want by themselves, you have to get creative.

A Brief History Of Web Uploading

As any developer who’s suffered through form-based uploading will understand, browsers have very limited native support for selecting and uploading files. While useable, Flickr’s form-based upload needed a refresh that would allow for batch selection and other improvements. After some consideration, Flash’s file-handling capabilities combined with the usual HTML/CSS/JS looked to be the winning solution.

In the past, ActiveX controls and Firefox extensions provided enhanced web-based upload experiences on Yahoo! Photos, supporting batch uploads, per-file progress , error reporting and so on; however, the initial browser-specific download/install requirement was “just another thing in the way” of a successful experience, not to mention one limited to Firefox and Internet Explorer. With Flickr’s new web Uploadr, my personal goals were to minimize or eliminate an install/set-up process altogether whenever possible, while at the same time keeping the approach browser-agnostic. Because of Flash’s distribution amongst Flickr users, it was safe to have as a requirement for the new experience. (In the non-flash/unsupported cases, browsers fall through to the old form-based Uploadr.)

And Now, For Something Completely Different

By using Flash to push files to Flickr, a number of advantages were clear over the old form-based method:

  • Batch file selection
  • File details (size, date etc.) for UI, business logic
  • Improved upload speed (faster than native browser form-based upload)
  • “Per-file”, asynchronous upload (as opposed to posting all data at once)
  • Upload progress reporting (per-file and overall)

Flash is able to do batch selection through standard operating system dialogs, report file names and size information, POST file data and read responses. Flickr’s new web Uploadr uses these features to provide a much-needed improvement over the old form-based Uploadr. The Flash component was developed by Allen Rabinovich on the Yahoo! Flash Platform Team. http://developer.yahoo.com/flash/

This Flash-based upload method did come with a few technical quirks, but ultimately we were still able to make signed calls to the Flickr API and upload files.

Now You Can, Too!

The Flash and client-side code which underlies the Flickr Web Uploadr is part of the Yahoo User Interface Library, available as the YUI Uploadr component.

It’s The Little Things That Count: UI Feedback

Given that Flash reports both file size and bytes uploaded, it made sense to show progress in the UI. In addition to per-file and overall progress in-page, the page’s title as shown in a browser window or tab also updates to reflect overall progress during upload – for example, “(42% complete) Flickr: Upload Photos”

Under Firefox, an .GIF-based “favicon” replaces the static Flickr icon, showing animation in the browser address bar while uploading is active. This combined with the title change is a nice indication of activity and status while the page is “working”, a handy way of checking progress without requiring the user to work to bring the window or tab back into focus.

In showing attention to detail in the UI and finding creative solutions to common browser drawbacks, a much nicer web upload experience is most certainly possible.

Scott Schiller is a front-end engineer and self-professed “DHTML + web standards evangelist / resident DJ and record crate digger” who works on Flickr. He enjoys making browsers do nifty things with client-side code, and making designers happy in bringing their work to life with close attention to detail. His personal site is a collection of random client-side experiments. http://flickr.com/photos/schill/