Web workers and YUI


Web workers are awesome. They’ll change the way you think about JavaScript.

Factory Scenes : Consolidated/Convair Aircraft Factory San Diego

Chris posted an excellent writeup on how we do client-side Exif parsing in the new Uploader, which is how we can display thumbnails before uploading your photos to the Flickr servers. But parsing metadata from hundreds of files can be a little expensive.

In the old days, we’d attempt to divide our expensive JS into smaller parts, using setTimeout to yield to the UI thread, crossing our fingers, and hoping that the user could still scroll and click when they wanted to. If that didn’t work, then the feature was simply too fancy for the web.
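For anyone who never had the pleasure, the old approach looked roughly like the sketch below. It is a minimal illustration rather than the code we actually shipped: processInChunks and its parameters are made-up names, and the schedule argument exists only so the sketch can be exercised without a browser.

```javascript
// Process `items` in small batches, yielding between batches so the
// browser gets a chance to handle scrolling and clicks.
// `schedule` defaults to setTimeout(fn, 0).
function processInChunks(items, chunkSize, processItem, done, schedule) {
	var index = 0;

	schedule = schedule || function (fn) {
		setTimeout(fn, 0);
	};

	function runChunk() {
		var end = Math.min(index + chunkSize, items.length);

		for (; index < end; index++) {
			processItem(items[index], index);
		}

		if (index < items.length) {
			schedule(runChunk); // yield to the UI thread, then continue
		} else {
			done();
		}
	}

	runChunk();
}
```

Picking a good chunkSize is the finger-crossing part: too large and the page stutters, too small and the job takes forever.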

Since then, a lot has happened. People started using better browsers. HTML got an orange logo. Web workers were discovered.

So now we can run JavaScript in separate threads (“parallel execution environments”), without interrupting the standard UI stuff the browser is always working on. We just need to put our job code in a separate file, and instantiate a web worker.

Without YUI

For simple, one-off tasks, you can just write some JavaScript in a new file and upload it to your server. Then create a worker like this:

var worker = new Worker('my_file.js');

worker.addEventListener('message', function (e) {
	// do something with the message from the worker
});

// pass some data into the worker
worker.postMessage({
	foo: 'bar'
});

Of course, the worker thread won’t have access to anything in the main thread. You can post messages containing anything that’s JSON compatible, but not functions, cyclical references, or special objects like File references.
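One defensive habit is to run outgoing data through a JSON round trip before posting it. The sketch below assumes the message only needs JSON-compatible data; toPostable is a hypothetical helper name. Browsers that implement structured cloning for postMessage accept more than this (circular references, for example), so treat it as the conservative option.

```javascript
// Return a JSON-compatible copy of `data`.
// Functions and undefined values are silently dropped;
// JSON.stringify throws a TypeError on circular references.
function toPostable(data) {
	return JSON.parse(JSON.stringify(data));
}

var message = toPostable({
	name: 'kitten.jpg',
	size: 1024,
	onLoad: function () {}
});

// message has name and size; the onLoad function is gone.
```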

That means any modules or helper functions you’ve defined in your main thread are out of bounds, unless you’ve also included them in your worker file. That can be a drag if you’re accustomed to working in a framework.
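To make that concrete, here is a sketch of what a standalone worker file might contain. The sum helper and the message shape are invented for illustration; the point is that every helper has to be defined in the worker file itself (or pulled in with importScripts).

```javascript
// Contents of a hypothetical my_file.js.

// A helper that would normally live in a shared module on the main thread.
function sum(numbers) {
	var total = 0;
	for (var i = 0; i < numbers.length; i++) {
		total += numbers[i];
	}
	return total;
}

// Turn an incoming message into a reply.
function handleMessage(data) {
	return { total: sum(data.numbers) };
}

// Only attach the listener when actually running inside a worker,
// where importScripts is defined.
if (typeof self !== 'undefined' && typeof importScripts === 'function') {
	self.addEventListener('message', function (e) {
		self.postMessage(handleMessage(e.data));
	});
}
```

The main thread would then call worker.postMessage({ numbers: [1, 2, 3] }) and receive { total: 6 } in its message listener.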

With YUI

Practically speaking, a worker thread isn’t very different from the main thread. Workers can’t access the DOM, and they have a top-level self object instead of window. But plenty of our existing JavaScript modules and helper functions would be very useful in a worker thread.

Flickr is built on YUI. Its modular architecture is powerful and encourages clean, reusable code. We have a ton of small JS files—one per module—and the YUI Loader figures out how to put them all together into a single URL.

If we want to write our worker code like we write our normal code, our worker file can’t be just my_file.js. It needs to be a full combo URL, with YUI running inside it.

An aside for the brogrammers who have never seen modular JS in practice

Loader dynamically loads script and css files for YUI modules as well as external modules. It includes the dependency information for the version of the library in use, and will automatically pull in dependencies for the modules requested.

In development, we have one JS file per module. Let’s say photo.js, kitten.js, and puppy.js.

A page full of kitten photos might require two of those modules. So we tell YUI that we want to use photo.js and kitten.js, and the YUI Loader appends a script node with a combo URL that looks something like this:

<script src="/combo.php?photo.js&kitten.js"></script>

On our server, combo.php finds the two files on disk and prints out the contents, which are immediately executed inside the script node.
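The URL itself is nothing magical. A toy version of the string building might look like this; buildComboUrl is a made-up helper, and the real YUI Loader also resolves dependencies, ordering, and URL length limits, none of which is shown here.

```javascript
// Join a list of module files into a single combo URL.
function buildComboUrl(comboBase, files) {
	return comboBase + '?' + files.join('&');
}

var url = buildComboUrl('/combo.php', ['photo.js', 'kitten.js']);
// url is "/combo.php?photo.js&kitten.js"
```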

C-c-c-combo

Of course, the main thread is already running YUI, which we can use to generate the combo URL required to create a worker.

That URL needs to return the following:

  1. YUI.add() statements for any required modules. (Don’t forget yui-base.)
  2. YUI.add() statement for the primary module with the expensive code.
  3. YUI().use() statement to execute the primary module.
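As a rough picture of what the combo response contains, the sketch below registers two modules with YUI.add(). The module my_helper and the namespaces MyHelper and MyWorker are hypothetical; only my_worker_module appears in this post. In the real response these statements sit at the top level of the file, after the YUI core; they are wrapped in a function here only so the sketch can be run without YUI loaded.

```javascript
function registerWorkerModules(YUI) {

	// 1. A required module. The combo handler would emit yui-base
	//    and any other dependencies before this.
	YUI.add('my_helper', function (Y) {
		Y.namespace('MyHelper').parse = function (bytes) {
			return { length: bytes.length };
		};
	}, '0.0.1', { requires: ['yui-base'] });

	// 2. The primary module with the expensive code.
	YUI.add('my_worker_module', function (Y) {
		Y.namespace('MyWorker').run = function (bytes) {
			return Y.MyHelper.parse(bytes);
		};
	}, '0.0.1', { requires: ['my_helper'] });

	// 3. The statement that executes the primary module goes last
	//    (it appears later in this post).
}
```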

Ok, so how do we generate this combo URL? Like so:

//
// Make a reference to our original YUI configuration object,
// with all of our module definitions and combo handler options.
//
// To make sure it's as clean as possible, we use a clone of the
// object from before we passed it into YUI.
//

var yconf = window.yconf; // global for demo purposes

//
// Y.Loader.resolve can be used to generate a combo URL with all
// the YUI modules needed within the web worker. (YUI 3.5 or later)
//
// The YUI Loader will bypass any required modules that have
// already been loaded in this instance, so in addition to the
// clean configuration object, we use a new YUI instance.
//

var Y2 = YUI(Y.merge(yconf));

var loader = new Y2.Loader({
	// comboBase must be on the same domain as the main thread
	comboBase: '/local/combo/path/',
	combine: true,
	ignoreRegistered: true,
	maxURLLength: 2048,
	require: ['my_worker_module']
});

var out = loader.resolve(true);

var combo_url = out.js[0];
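One caveat: out.js is an array, and as far as we understand the Loader, it can contain more than one URL when the combined request would exceed maxURLLength. A worker is created from a single URL, so a small guard like the following (pickComboUrl is a hypothetical helper) makes that failure loud instead of mysterious.

```javascript
// Return the single JS combo URL from a Loader.resolve() result,
// or throw if the Loader produced zero or several URLs.
function pickComboUrl(resolved) {
	var urls = (resolved && resolved.js) || [];

	if (urls.length !== 1) {
		throw new Error('Expected exactly one JS combo URL, got ' + urls.length);
	}

	return urls[0];
}
```

With that in place, the last two lines above could become var combo_url = pickComboUrl(loader.resolve(true));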

Then, also in the main thread, we can start the worker instance:

//
// Use the combo URL to create a web worker.
// This is when the combo URL is downloaded, parsed, 
// and executed.
//

var worker = new window.Worker(combo_url);

To start using YUI, we need to pass our YUI config object into the worker thread. That could have been part of the combo URL, but our YUI config is pretty specific to the particular page you’re on, so we need to reuse the same object we started with in the main thread. So we use postMessage to pass it from the main thread to the worker:

//
// Post the YUI config into the worker.
// This is when the worker actually starts its work.
//

worker.postMessage({
	yconf: yconf
});

Now we’re almost done. We just need to write the worker code that waits for our YUI config before using the module. So, at the bottom of the combo response, in the worker thread:

self.addEventListener('message', function (e) {

	if (e.data.yconf) {

		//
		// make sure bootstrapping is disabled
		//
		
		e.data.yconf.bootstrap = false;

		//
		// instantiate YUI and use it to execute the callback
		//
		
		YUI(e.data.yconf).use('my_worker_module', function (Y) {

			// do some hard work!

		});

	}

}, false);

Yeah, I know the back-and-forth between the main thread and the worker makes that look complicated. But it’s actually just a few steps:

  1. Main thread generates a combo URL and instantiates a Web Worker.
  2. Worker thread parses and executes the JS returned by that URL.
  3. Main thread posts the page’s YUI config into the worker thread.
  4. Worker thread uses the config to instantiate YUI and “use” the worker module.
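The main-thread half of those steps (1 and 3, plus listening for whatever the worker sends back) can be sketched as a single function. This is an illustration under assumptions: startYuiWorker and its callbacks are invented names, and the WorkerCtor parameter exists only so the sketch can be exercised outside a browser.

```javascript
// Create a worker from the combo URL, listen for results and errors,
// then post the YUI config so the worker can start.
function startYuiWorker(comboUrl, yconf, onResult, onError, WorkerCtor) {
	WorkerCtor = WorkerCtor || Worker;

	// Step 1: the combo URL is downloaded, parsed, and executed.
	var worker = new WorkerCtor(comboUrl);

	worker.addEventListener('message', function (e) {
		onResult(e.data);
	});

	// Uncaught exceptions inside the worker arrive as 'error' events.
	worker.addEventListener('error', function (e) {
		onError(e);
	});

	// Step 3: hand the page's YUI config to the worker.
	worker.postMessage({
		yconf: yconf
	});

	return worker;
}
```

Attaching the listeners before posting the config means no reply from the worker can be missed.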

That’s it. Now get to work!

Building Fast Client-side Searches

Yesterday we released a new people selector widget (which we’ve been calling Bo Selecta internally). This widget downloads a list of all of your contacts, in JavaScript, in under 200ms (this is true even for members with 10,000+ contacts). In order to get this level of performance, we had to completely rethink how we send data from the server to the client.

Server Side: Cache Everything

To make this data available quickly from the server, we maintain and update a per-member cache in our database, where we store each member’s contact list in a text blob — this way it’s a single quick DB query to retrieve it. We can format this blob in any way we want: XML, JSON, etc. Whenever a member updates their information, we update the cache for all of their contacts. Since a single member who changes their contact information can require updating the contacts cache for hundreds or even thousands of other members, we rely upon prioritized tasks in our offline queue system.

Testing the Performance of Different Data Formats

Despite the fact that our backend system can deliver the contact list data very quickly, we still don’t want to unnecessarily fetch it for each page load. This means that we need to defer loading until it’s needed, and that we have to be able to request, download, and parse the contact list in the amount of time it takes a member to go from hovering over a text field to typing a name.

With this goal in mind, we started testing various data formats, and recording the average amount of time it took to download and parse each one. We started with Ajax and XML; this proved to be the slowest by far, so much so that the larger test cases wouldn’t even run to completion (the tags used to create the XML structure also added a lot of weight to the filesize). It appeared that using XML was out of the question.

Ajax and JSON: eval() is Slow

DJ Bo Selecta on the decks

Next we tried using Ajax to fetch the list in the JSON format (and having eval() parse it). This was a major improvement, both in terms of filesize across the wire and parse time.

While all of our tests ran to completion (even the 10,000 contacts case), parse time per contact was not constant: it rose steadily as we increased the number of contacts, from under 1ms per contact for small lists to more than 7ms per contact for the largest. The 10,000 contact case took over 80 seconds to parse, 400 times slower than our goal of 200ms. It seemed that JavaScript had a problem manipulating and eval()ing very large strings, so this approach wasn’t going to work either.

Contacts  File Size (KB)  Parse Time (ms)  File Size per Contact (KB)  Parse Time per Contact (ms)
10,617    1536            81312            0.14                        7.66
4,878     681             18842            0.14                        3.86
2,979     393             6987             0.13                        2.35
1,914     263             3381             0.14                        1.77
1,363     177             1837             0.13                        1.35
798       109             852              0.14                        1.07
644       86              611              0.13                        0.95
325       44              252              0.14                        0.78
260       36              205              0.14                        0.79
165       24              111              0.15                        0.67
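Measurements like the ones above can be taken with a very small harness. The sketch below is not our original test code: the function names and the contact fields are made up, and it times a single run rather than an average. It does show the eval() approach being measured.

```javascript
// Build a JSON string for `count` contacts. The field names are made up.
function makeContactsJson(count) {
	var contacts = [];
	for (var i = 0; i < count; i++) {
		contacts.push({ n: 'Contact ' + i, u: 'user' + i });
	}
	return JSON.stringify(contacts);
}

// Parse by evaluating the text as JavaScript. The parentheses make
// eval treat the text as an expression rather than a block.
// Only ever do this with text from a source you trust.
function parseWithEval(text) {
	return eval('(' + text + ')');
}

// Time a single run of `parse` over `text`.
function timeParse(parse, text) {
	var start = Date.now();
	var result = parse(text);
	return { ms: Date.now() - start, count: result.length };
}

var timing = timeParse(parseWithEval, makeContactsJson(1000));
// timing.count is 1000; timing.ms depends on the machine and browser.
```

Passing a different parse function (JSON.parse, for example, in browsers that provide it) gives a like-for-like comparison.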

JSON and Dynamic Script Tags: Fast but Insecure

Working with the theory that large string manipulation was the problem with the last approach, we switched from using Ajax to fetching the data with a dynamically generated script tag. This means that the contact data was never treated as a string, and was instead executed as soon as it was downloaded, just like any other JavaScript file. The difference in performance was shocking: 89ms to parse 10,000 contacts (a reduction of roughly three orders of magnitude), while the smallest case of 172 contacts took only 6ms. The parse time per contact actually decreased as the list grew. This approach looked perfect, except for one thing: in order for this JSON to be executed, we had to wrap it in a callback method. Since it’s executable code, any website in the world could use the same approach to download a Flickr member’s contact list. This was a deal breaker.

Contacts  File Size (KB)  Parse Time (ms)  File Size per Contact (KB)  Parse Time per Contact (ms)
10,709    1105            89               0.10                        0.01
4,877     508             41               0.10                        0.01
2,979     308             26               0.10                        0.01
1,915     197             19               0.10                        0.01
1,363     140             15               0.10                        0.01
800       83              11               0.10                        0.01
644       67              9                0.10                        0.01
325       35              8                0.11                        0.02
260       27              7                0.10                        0.03
172       18              6                0.10                        0.03
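To see why the callback wrapper is a problem, it helps to look at what the response actually is. In the sketch below, wrapInCallback and handleContacts are hypothetical names, and new Function stands in for the browser executing a loaded script tag.

```javascript
// What the server sends back: executable code, not plain data.
function wrapInCallback(callbackName, contacts) {
	return callbackName + '(' + JSON.stringify(contacts) + ');';
}

// Any page, on any domain, can define a function with the expected name...
var captured = null;

function handleContacts(list) {
	captured = list;
}

// ...and receives the data as soon as the response is executed.
var response = wrapInCallback('handleContacts', [{ n: 'Bo' }]);
new Function('handleContacts', response)(handleContacts);

// captured now holds the contact list.
```

Script tags are not restricted to the page’s own domain the way Ajax requests are, which is exactly what makes this approach both fast and unsafe for private data.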

Going Custom

Custom Ride

Having set the performance bar pretty high with the last approach, we dove into custom data formats. The challenge would be to create a format that we could parse ourselves, using JavaScript’s String and RegExp methods, that would also match the speed of JSON executed natively. This would allow us to use Ajax again, but keep the data restricted to our domain.

Since we had already discovered that some methods of string manipulation didn’t perform well on large strings, we restricted ourselves to a method that we knew to be fast: split(). We used control characters to delimit each contact, and a different control character to delimit the fields within each contact. This allowed us to parse the string into contact objects with one split, then loop through that array and split again on each string.

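// Runs inside the Ajax success callback: `o` is the response object
// and `that` is the widget instance.
//
// In a JavaScript string literal, "\c" and "\a" are just the letters
// "c" and "a", so the control characters are written here as explicit
// escapes. The exact code points are an assumption for illustration.
var CONTACT_DELIM = "\x1E"; // separates one contact from the next
var FIELD_DELIM = "\x1F";   // separates the fields within a contact

that.contacts = o.responseText.split(CONTACT_DELIM);

for (var n = 0, len = that.contacts.length, contactSplit; n < len; n++) {

	contactSplit = that.contacts[n].split(FIELD_DELIM);

	// replace the raw string with a contact object
	that.contacts[n] = {};
	that.contacts[n].n = contactSplit[0];
	that.contacts[n].e = contactSplit[1];
	that.contacts[n].u = contactSplit[2];
	that.contacts[n].r = contactSplit[3];
	that.contacts[n].s = contactSplit[4];
	that.contacts[n].f = contactSplit[5];
	that.contacts[n].a = contactSplit[6];
	that.contacts[n].d = contactSplit[7];
	that.contacts[n].y = contactSplit[8];
}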

Though this technique sounds like it would be slow, it actually performed on par with native JSON parsing (it was a little faster for cases containing fewer than 1000 contacts, and a little slower for those over 1000). It also had the smallest filesize: about 80% of the size of the JSON data for the same number of contacts. This is the format that we ended up using.

Contacts  File Size (KB)  Parse Time (ms)  File Size per Contact (KB)  Parse Time per Contact (ms)
10,741    818             173              0.08                        0.02
4,877     375             50               0.08                        0.01
2,979     208             34               0.07                        0.01
1,916     144             21               0.08                        0.01
1,363     93              16               0.07                        0.01
800       58              10               0.07                        0.01
644       46              8                0.07                        0.01
325       24              4                0.07                        0.01
260       14              3                0.05                        0.01
160       13              3                0.08                        0.02
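The split()-based format described above can be sketched as a round trip. Everything specific here is an assumption: the two delimiter code points, the three-field subset, and the function names are chosen for illustration, and on Flickr the encoding half happens on the server rather than in JavaScript.

```javascript
var CONTACT_DELIM = '\x1E'; // assumption: ASCII record separator
var FIELD_DELIM = '\x1F';   // assumption: ASCII unit separator
var FIELDS = ['n', 'e', 'u']; // hypothetical subset of the contact fields

// Flatten an array of contact objects into one delimited string.
function encodeContacts(contacts) {
	var rows = [];
	for (var i = 0; i < contacts.length; i++) {
		var values = [];
		for (var j = 0; j < FIELDS.length; j++) {
			values.push(contacts[i][FIELDS[j]]);
		}
		rows.push(values.join(FIELD_DELIM));
	}
	return rows.join(CONTACT_DELIM);
}

// Rebuild the contact objects using nothing but split().
function decodeContacts(text) {
	if (text === '') {
		return [];
	}
	var rows = text.split(CONTACT_DELIM);
	for (var i = 0; i < rows.length; i++) {
		var values = rows[i].split(FIELD_DELIM);
		var contact = {};
		for (var j = 0; j < FIELDS.length; j++) {
			contact[FIELDS[j]] = values[j];
		}
		rows[i] = contact;
	}
	return rows;
}
```

The format only works if the delimiters can never appear inside a field, which is why control characters are a natural choice.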

Searching

Ben to the Rescue

Now that we had a giant array of contacts in JavaScript, we needed a way to search through them and select one. For this, we used YUI’s excellent AutoComplete widget. To get the data into the widget, we created a DataSource object that would execute a function to get results. This function simply looped through our contact array and matched the given query against four different properties of each contact, using a regular expression (RegExp objects turned out to be extremely well-suited for this, with the average search time for the 10,000 contacts case coming in under 38ms). After the results were collected, the AutoComplete widget took care of everything else, including caching the results.
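A search function along those lines might look like the sketch below. It is an approximation, not our production code: the choice of the n, e, u, and r properties as the four searched fields is a guess for illustration, and escaping the query is an extra safety step added here.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return every contact with at least one searched property
// that contains the query (case-insensitive).
function searchContacts(contacts, query) {
	var pattern = new RegExp(escapeRegExp(query), 'i');
	var fields = ['n', 'e', 'u', 'r']; // hypothetical choice of four properties
	var results = [];

	for (var i = 0; i < contacts.length; i++) {
		for (var j = 0; j < fields.length; j++) {
			var value = contacts[i][fields[j]];
			if (value && pattern.test(value)) {
				results.push(contacts[i]);
				break; // one matching property is enough
			}
		}
	}

	return results;
}
```

The RegExp is built once per query rather than once per contact, which keeps the inner loop down to a single test() call.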

There was one optimization we made to our AutoComplete configuration that was particularly effective. Regardless of how much we optimized our search method, we could never get results to return in less than 200ms (even for trivially small numbers of contacts). After a lot of profiling and hair pulling, we found the queryDelay setting. This is set to 200ms by default, and artificially delays every search in order to reduce UI flicker for quick typists. After setting that to 0, we found our search times improved dramatically.
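For reference, the change amounts to one line. The sketch below assumes the YUI 2 AutoComplete API with a FunctionDataSource (available from YUI 2.6, as far as we know); createContactAutoComplete and its arguments are hypothetical, and the YAHOO namespace is passed in as a parameter only so the sketch can be exercised without the library loaded.

```javascript
// Wire a search function into an AutoComplete widget with no query delay.
function createContactAutoComplete(YAHOO, inputId, containerId, searchFn) {
	var dataSource = new YAHOO.util.FunctionDataSource(searchFn);
	var autoComplete = new YAHOO.widget.AutoComplete(inputId, containerId, dataSource);

	// queryDelay is expressed in seconds; the library default is 0.2.
	autoComplete.queryDelay = 0;

	return autoComplete;
}
```

With a remote data source the delay is worth keeping, since it saves requests; with an in-memory array there is nothing to save.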

The End Result

Head over to your Contact List page and give it a whirl. We are also using the Bo Selecta with FlickrMail and the Share This widget on each photo page.

YUI Blog: Improving The Flickr Upload Experience With YUI Uploader

water pipe

Visual analogy of simultaneous file uploading. Also, internet/pipe joke goes here.

As a site which has many nifty JavaScript-driven features, Flickr makes good use of the Yahoo! User Interface library for much of its JavaScript DOM, Event handling and Ajax functionality.

One of the fancier widgets we’ve implemented is a flashy browser-based Web Uploadr. It uses the YUI Uploader component (a combination of JavaScript and Flash), which allows for faster batch uploads, progress reporting, a nicer UI, and an overall improved user experience.

Head over to the YUI Blog and check out how Flickr uses YUI Uploader to provide a faster, shinier upload experience.