Avoiding Dragons: A Practical Guide to Drag ’n’ Drop

You, the enterprising programmer, know about parsing EXIF from photos in the browser and even how and why to power this parsing with web workers. “But,” you ask yourself, “how do I get those photos into the browser in the first place?”

The oldest and most low-tech solution is the venerable <input type="file" name="foo">. This plops the old standby file button on your page and POSTs the file’s contents to your server upon form submission.

To address many of this simple control’s limitations we debuted a Flash-based file uploader in 2008. This workhorse has been providing per-file upload statuses, batch file selection, and robust error handling for the last four years through Flash’s file system APIs.

These days we can thankfully do this work without plugins. Not only can we use XHR to POST files and provide all the other fancy info we’ve long needed Flash for, but now we can pair this with something much better than an <input>: drag and drop. This allows people drag files directly into a browser window from the iPhotos, Lightrooms, and Windows Explorers of the world.

Let’s take a look at how this works.

Foundations first

Workmen laying the cornerstone, construction of the McKim BuildingWorkmen laying the cornerstone, construction of the McKim Building by Boston Public Library

Let’s begin with our simple fallback, a – yes – <input type="file">.

HTML: <input type="file" multiple accept="image/*,video/*">
JS: Y.all('input[type=file]').on('change', handleBrowse);

Here we start with an <input> that accepts multiple files and knows it only accepts images and videos. Then, we bind an event handler to its change event. That handler can be very simple:

function handleBrowse(e) {
	// get the raw event from YUI
	var rawEvt = e._event;
	
	// pass the files handler into the loadFiles function
	if (rawEvt.target &amp;&amp; rawEvt.target.files) {
		loadFiles(rawEvt.target.files);
	}
}

A simple matter of handing the event object’s file array off to our universal function that adds files to our upload queue. Let’s take a look at this file loader:

function loadFiles(files) {
	updateQueueLength(count);
	
	for (var i = 0; i &lt; files.length; i++) {
		var file = files[i];
		
		if (File &amp;&amp; file instanceof File) {
			enqueueFileAddition(file);
		}
	}
}

Looks clear – it’s just going over the file list and adding them to a queue. “But wait,” you wonder, “why all this queue nonsense? Why not just kick off an XHR for the file right now?” Indeed, we’ve stuck in a layer of abstraction here that seems unnecessary. And for now it is. But suppose our pretty synchronous world were soon to become a whole lot less synchronous – that could get real fun in a hurry. For now, we’ll put that idea aside and take a look at these two queue functions themselves:

function updateQueueLength(quantity) {
	state.files.total += quantity;
}

function enqueueFileAddition(file) {
	state.files.handles.push(file);
	
	// If all the files we expect have shown up, then flush the queue.
	if (state.files.handles.length === state.files.total) {
		for (var i = 0, len = state.files.total; i &lt; len; i++) {
			addFile(state.files.handles[i]);
		}
		
		// reset the state of the world
		state.files.handles = [];
		state.files.total = 0;
	}
}

Pretty straightforward. One function for leaving a note of how many files we expect, one function to add files and see if we have all the files we expect. If so, pass along everything we have to addFile() which sends the file into our whirlwind of XHRs heading off to the great pandas in the sky.

Droppin’ dragons

Droppin’ dragonsDroppin’ dragons by Phil Dokas

While all of that is well and good, it was all for a ho-hum <input> element! Let’s hook a modern browser’s drag and drop events into this system:

document.addEventListener('drop', function(e) {
	if (e.dataTransfer &amp;&amp; e.dataTransfer.files) {
		loadFiles(e.dataTransfer.files);
	}
});

The drag and drop API is a fairly complicated one, but it thankfully makes the task of reading files out of a drop event easy. Every drop will have a dataTransfer attribute and when there’s at least one file in the drag that member will itself have a files attribute.

In fact, when you’re only concerned about handling files dragged directly into the browser you could call it a day right here. The loadFiles() function we wrote earlier knows how to handle instances of the File class and that’s exactly what dataTransfer.files stores. Easy!

Put it up to eleven

While easy is a good thing, awesome is awesome. How could we make dragging files into a browser even better? Well, how about cutting down on the trouble of finding the folder with your photos somewhere on your desktop, opening it, and then dragging those files into the browser? What if we could just drag the folder in and call it a day?

goes to 11goes to 11 by Rick Kimpel

Try to drag a folder into the browser with the current state of our code; what happens? Our code tells the browser to treat all dropped file system objects as files. So what ultimately happens for folders is a very elaborate “nothing”. To fix this, we need to tell the browser how to handle directories. In our case, we want it to recursively walk every directory it sees and pick out the photos from each.

From here on out we’re going to be treading over tumultuous land, rife with rapidly changing specs and swiftly updating browsers. This becomes immediately apparent in how we begin to add support for directories. We need to update our drop event handler like this:

document.addEventListener('drop', function(e) {
	if (e.dataTransfer &amp;&amp; e.dataTransfer.items) {
		loadFiles(e.dataTransfer.items);
	}
	else if (e.dataTransfer &amp;&amp; e.dataTransfer.files) {
		loadFiles(e.dataTransfer.files);
	}
});

Items? Files? The difference is purely a matter of one being the newer interface where development happens and the other being the legacy interface. This is spelled out a bit in the spec, but the short of it is that the files member will be kept around for backwards compatibility while newer abilities will be built in the items namespace. Our code above prefers to use the items attribute if available, while falling back to files for compatibility. The real fun is what comes next.

You see, the items namespace deals with Items, not Files. Items can be thought of as pointers to things in the file system. Thankfully, that includes the directories we’re after. But unfortunately, this is the file system and the file system is slow. And JavaScript is single-threaded. These two facts together are a recipe for latency. The File System API tackles this problem with the same solution as Node.js: asynchronicity. Most of the functions in the API accept a callback that will be invoked when the disk gets around to providing the requested files. So we’ll have to update our code to do two new things: 1) translate items into files and 2) handle synchronous and asynchronous APIs.

So what do these changes look like? Let’s turn back to loadFiles() and teach it how to handle these new types of files. Taking a look at the spec for the Item class, there appears to be a getAsFile() function and that sounds perfect.

function loadFiles(files) {
	updateQueueLength(count);
	
	for (var i = 0; i &lt; files.length; i++) {
		var file = files[i];
		
		if (typeof file.getAsFile === 'function') {
			enqueueFileAddition(file.getAsFile());
		}
		else if (File &amp;&amp; file instanceof File) {
			enqueueFileAddition(file);
		}
	}
}

Easy – but, there’s a problem. The getAsFile() function is very literal. It assumes the Item points to a file. But directories aren’t files and that means this method won’t meet our needs. Fortunately, there is a solution and that’s through yet another data type, the Entry. An Entry is much like a File, but it can also represent directories. As mentioned in this WHATWG wiki document, there is a proposed method, getAsEntry(), in the Item interface that allows you to grab an Entry for its file system object. It’s browser prefixed for now, so let’s add that in as well.

function loadFiles(files) {
	updateQueueLength(count);
	
	for (var i = 0; i &lt; files.length; i++) {
		var file = files[i];
		var entry;
		
		if (file.getAsEntry) {
			entry = file.getAsEntry();
		}
		else if (file.webkitGetAsEntry) {
			entry = file.webkitGetAsEntry();
		}
		else if (typeof file.getAsFile === 'function') {
			enqueueFileAddition(file.getAsFile());
		}
		else if (File &amp;&amp; file instanceof File) {
			enqueueFileAddition(file);
		}
	}
}

So what we have now is a way of handling native files and a way of turning Items into Entries. Now we need to figure out if the Entry is a file or a directory and then handle that appropriately.

What we’ll do is queue up any File objects we run across and skip the loop ahead to the next object. But if we have an Item and successfully turn it into an Entry then we’ll try to resolve this down to a file or a directory.

function loadFiles(files) {
	updateQueueLength(count);
	
	for (var i = 0; i &lt; files.length; i++) {
		var file = files[i];
		var entry, reader;
		
		if (file.getAsEntry) {
			entry = file.getAsEntry();
		}
		else if (file.webkitGetAsEntry) {
			entry = file.webkitGetAsEntry();
		}
		else if (typeof file.getAsFile === 'function') {
			enqueueFileAddition(file.getAsFile());
			continue;
		}
		else if (File &amp;&amp; file instanceof File) {
			enqueueFileAddition(file);
			continue;
		}
		
		if (!entry) {
			updateQueueLength(-1);
		}
		else if (entry.isFile) {
			entry.file(function(file) {
				enqueueFileAddition(file);
			}, function(err) {
				console.warn(err);
			});
		}
		else if (entry.isDirectory) {
			reader = entry.createReader();
			
			reader.readEntries(function(entries) {
				loadFiles(entries);
				updateQueueLength(-1);
			}, function(err) {
				console.warn(err);
			});
		}
	}
}

The code is getting long, but we’re almost done. Let’s unpack this.

The first branch of our new Entry logic ensures that what was returned by webkitGetAsEntry()/getAsEntry() is something useful. When they error they return null and this will happen if an application provides data in the drop event that isn’t a file. To see this in action try dragging a few files in from Preview in Mac OS X – it’s odd behavior, but this adequately cleans it up.

Next we handle files. The Entry spec provides the brilliantly simple isFile and isDirectory attributes. These guarantee whether you have a FileEntry or a DirectoryEntry on your hands. These classes have useful – though as promised, asynchronous – methods and here we use FileEntry’s file() method and enqueue its returned file.

Finally, the unicorn we’re chasing – handling directories. This is a tad more complicated, but the idea is straightforward. We create a DirectoryReader which lets us read its contents through its readEntries() method which provides an array of Entries. And what do we do with these Entries? We recursively call our loadFiles() function with them! In this step we achieve recursively walking a branch of the file system and rooting out every available image. Finally, we decrement the count of expected files by 1 to indicate that this was a directory and it has now been suitably handled.

But there is one more thing.

In that final directory reading step we recursively called loadFiles() with an array of Entries. As of right now, this function only expects to handle Files and Items. Let’s patch up this oversight, add a final bit of error handling, and call it a day.

function loadFiles(files) {
	updateQueueLength(count);
	
	for (var i = 0; i &lt; files.length; i++) {
		var file = files[i];
		var entry, reader;
		
		if (file.isFile || file.isDirectory) {
			entry = file;
		}
		else if (file.getAsEntry) {
			entry = file.getAsEntry();
		}
		else if (file.webkitGetAsEntry) {
			entry = file.webkitGetAsEntry();
		}
		else if (typeof file.getAsFile === 'function') {
			enqueueFileAddition(file.getAsFile());
			continue;
		}
		else if (File &amp;&amp; file instanceof File) {
			enqueueFileAddition(file);
			continue;
		}
		else {
			updateQueueLength(-1);
			continue;
		}
		
		if (!entry) {
			updateQueueLength(-1);
		}
		else if (entry.isFile) {
			entry.file(function(file) {
				enqueueFileAddition(file);
			}, function(err) {
				console.warn(err);
			});
		}
		else if (entry.isDirectory) {
			reader = entry.createReader();
			
			reader.readEntries(function(entries) {
				loadFiles(entries);
				updateQueueLength(-1);
			}, function(err) {
				console.warn(err);
			});
		}
	}
}

All we need to do to handle an Entry is to rely on the fact that Entries have those oh-so-helpful isFile and isDirectory attributes. If we see those we know we have an Entry of one type or another and we know how to work with them, so just skip on down to the FileEntry and DirectoryEntry handling code.

And that, finally, is it. There are many specs with very new data types at play here, but through this turmoil we can achieve some very nice results never before possible in browsers.

Further reading

Flickr flamily floto

Like this post? Have a love of online photography? Want to work with us? Flickr is hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.

Web workers and YUI

(Flickr is hiring! Check out our open job postings and what it’s like to work at Flickr.)

Web workers are awesome. They’ll change the way you think about JavaScript.

Factory Scenes : Consolidated/Convair Aircraft Factory San Diego

Chris posted an excellent writeup on how we do client-side Exif parsing in the new Uploader, which is how we can display thumbnails before uploading your photos to the Flickr servers. But parsing metadata from hundreds of files can be a little expensive.

In the old days, we’d attempt to divide our expensive JS into smaller parts, using setTimeout to yield to the UI thread, crossing our fingers, and hoping that the user could still scroll and click when they wanted to. If that didn’t work, then the feature was simply too fancy for the web.

Since then, a lot has happened. People started using better browsers. HTML got an orange logo. Web workers were discovered.

So now we can run JavaScript in separate threads (“parallel execution environments”), without interrupting the standard UI stuff the browser is always working on. We just need to put our job code in a separate file, and instantiate a web worker.

Without YUI

For simple, one-off tasks, you can just write some JavaScript in a new file and upload it to your server. Then create a worker like this:

var worker = new Worker('my_file.js');

worker.addEventListener('message', function (e) {
	// do something with the message from the worker
});

// pass some data into the worker
worker.postMessage({
	foo: bar
});

Of course, the worker thread won’t have access to anything in the main thread. You can post messages containing anything that’s JSON compatible, but not functions, cyclical references, or special objects like File references.

That means any modules or helper functions you’ve defined in your main thread are out of bounds, unless you’ve also included them in your worker file. That can be a drag if you’re accustomed to working in a framework.

With YUI

Practically speaking, a worker thread isn’t very different from the main thread. Workers can’t access the DOM, and they have a top-level self object instead of window. But plenty of our existing JavaScript modules and helper functions would be very useful in a worker thread.

Flickr is built on YUI. Its modular architecture is powerful and encourages clean, reusable code. We have a ton of small JS files—one per module—and the YUI Loader figures out how to put them all together into a single URL.

If we want to write our worker code like we write our normal code, our worker file can’t be just my_file.js. It needs to be a full combo URL, with YUI running inside it.

An aside for the brogrammers who have never seen modular JS in practice

Loader dynamically loads script and css files for YUI modules as well as external modules. It includes the dependency information for the version of the library in use, and will automatically pull in dependencies for the modules requested.

In development, we have one JS file per module. Let’s say photo.js, kitten.js, and puppy.js.

A page full of kitten photos might require two of those modules. So we tell YUI that we want to use photo.js and kitten.js, and the YUI Loader appends a script node with a combo URL that looks something like this:

<script src="/combo.php?photo.js&kitten.js">.

On our server, combo.php finds the two files on disk and prints out the contents, which are immediately executed inside the script node.

C-c-c-combo

Of course, the main thread is already running YUI, which we can use to generate the combo URL required to create a worker.

That URL needs to return the following:

  1. YUI.add() statements for any required modules. (Don’t forget yui-base)
  2. YUI.add() statement for the primary module with the expensive code.
  3. YUI.add() statement to execute the primary module.

Ok, so how do we generate this combo URL? Like so:

//
// Make a reference to our original YUI configuration object,
// with all of our module definitions and combo handler options.
//
// To make sure it's as clean as possible, we use a clone of the
// object from before we passed it into YUI.
//

var yconf = window.yconf; // global for demo purposes

//
// Y.Loader.resolve can be used to generate a combo URL with all
// the YUI modules needed within the web worker. (YUI 3.5 or later)
//
// The YUI Loader will bypass any required modules that have
// already been loaded in this instance, so in addition to the
// clean configuration object, we use a new YUI instance.
//

var Y2 = YUI(Y.merge(yconf));

var loader = new Y2.Loader({
	// comboBase must be on the same domain as the main thread
	comboBase: '/local/combo/path/',
	combine: true,
	ignoreRegistered: true,
	maxURLLength: 2048,
	require: ['my_worker_module']
});

var out = loader.resolve(true);

var combo_url = out.js[0];

Then, also in the main thread, we can start the worker instance:

//
// Use the combo URL to create a web worker.
// This is when the combo URL is downloaded, parsed, 
// and executed.
//

var worker = new window.Worker(combo_url);

To start using YUI, we need to pass our YUI config object into the worker thread. That could have been part of the combo URL, but our YUI config is pretty specific to the particular page you’re on, so we need to reuse the same object we started with in the main thread. So we use postMessage to pass it from the main thread to the worker:

//
// Post the YUI config into the worker.
// This is when the worker actually starts its work.
//

worker.postMessage({
	yconf: yconf
});

Now we’re almost done. We just need to write the worker code that waits for our YUI config before using the module. So, at the bottom of the combo response, in the worker thread:

self.addEventListener('message', function (e) {

	if (e.data.yconf) {

		//
		// make sure bootstrapping is disabled
		//
		
		e.data.yconf.bootstrap = false;

		//
		// instantiate YUI and use it to execute the callback
		//
		
		YUI(e.data.yconf).use('my_worker_module', function (Y) {

			// do some hard work!

		});

	}

}, false);

Yeah, I know the back-and-forth between the main thread and the worker makes that look complicated. But it’s actually just a few steps:

  1. Main thread generates a combo URL and instantiates a Web Worker.
  2. Worker thread parses and executes the JS returned by that URL.
  3. Main thread posts the page’s YUI config into the worker thread.
  4. Worker thread uses the config to instantiate YUI and “use” the worker module.

That’s it. Now get to work!

Parsing Exif client-side using JavaScript

What is Exif? A short primer.

Exif is short for Exchangeable image file format. A standard that specifies the formats to be used in images, sounds, and tags used by digital still cameras. In this case we are concerned with the tags standard and how it is used in still images.

How Flickr currently parses Exif data.

Currently we parse an image’s Exif data after it is uploaded to the Flickr servers and then expose that data on the photo’s metadata page (http://www.flickr.com/photos/rubixdead/7192796744/meta/in/photostream). This page will show you all the data recorded from your camera when a photo was taken, the camera type, lens, aperture, exposure settings, etc. We currently use ExifTool (http://www.sno.phy.queensu.ca/~phil/exiftool/) to parse all of this data, which is a robust, albeit server side only, solution.

An opportunity to parse Exif data on the client-side

Sometime in the beginning phases of spec’ing out the Uploadr project we realized modern browsers can read an image’s data directly from the disk, using the FileReader API (http://www.w3.org/TR/FileAPI/#FileReader-interface). This lead to the realization that we could parse Exif data while the photo is being uploaded, then expose this to the user while they are editing their photos in the Uploadr before they even hit the Upload button.

Why client-side Exif?

Why would we need to parse Exif on the client-side, if we are parsing it already on the server-side? Parsing Exif on the client-side is both fast and efficient. It allows us to show the user a thumbnail without putting the entire image in the DOM, which uses a lot of memory and kills performance. Users can also add titles, descriptions, and tags in a third-party image editing program saving the metadata into the photo’s Exif. When they drag those photos into the Uploadr, BOOM, we show them the data they have already entered and organized, eliminating the need to enter it twice.

Using Web Workers

We started doing some testing and research around parsing Exif data by reading a file’s bytes in JavaScript. We found a few people had accomplished this already, it’s not a difficult feat, but a messy one. We then quickly realized that making a user’s browser run through 10 megabytes of data can be a heavy operation. Web workers allow us to offload the parsing of byte data into a separate cpu thread. Therefore freeing up the user’s browser, so they can continue using Uploadr while Exif is being parsed.

Exif Processing Flow

Once we had a web worker prototype setup, we next had to write to code that would parse the actual bytes.

The first thing we do is pre-fetch the JavaScript used in the web worker thread. Then when a user adds an image to the Uploadr we create event handlers for the worker. When a web worker calls postMessage() we capture that, check for Exif data and then display it on the page. Any additional processing is also done at this time. Parsing XMP data, for example, is done outside of the worker because the DOM isn’t available in worker threads.

Using Blob.slice() we pull out the first 128kb of the image to limit load on the worker and speed things up. The Exif specification states that all of the data should exist in the first 64kb, but IPTC sometimes goes beyond that, especially when formatted as XMP.

if (file.slice) {
	filePart = file.slice(0, 131072);
} else if (file.webkitSlice) {
	filePart = file.webkitSlice(0, 131072);
} else if (file.mozSlice) {
	filePart = file.mozSlice(0, 131072);
} else {
	filePart = file;
}

We create a new FileReader object and pass in the Blob slice to be read. An event handler is created at this point to handle the reading of the Blob data and pass it into the worker. FileReader.readAsBinaryString() is called, passing in the blob slice, to read it as a binary string into the worker.

binaryReader = new FileReader();

binaryReader.onload = function () {

	worker.postMessage({
		guid: guid,
		binary_string: binaryReader.result
	});

};

binaryReader.readAsBinaryString(filePart);

The worker receives the binary string and passes it through multiple Exif processors in succession. One for Exif data, one for XMP formatted IPTC data and one for unformatted IPTC data. Each of the processors uses postMessage() to post the Exif data back out and is caught by the module. The data is displayed in the uploadr, which is later sent along to the API with the uploaded batch.

On asynchronous Exif parsing

When reading in Exif data asynchronously we ran into a few problems, because processing does not happen immediately. We had to prevent the user from sorting their photos until all the Exif data was parsed, namely the date and time for “order by” sorting. We also ran into a race condition when getting tags out of the Exif data. If a user had already entered tags we want to amend those tags with what was possibly saved in their photo. We also update the Uploadr with data from Exiftool once it is processed on the back-end.

The Nitty Gritty: Creating EXIF Parsers and dealing with typed arrays support

pre-electronic binary code
pre-electronic binary code by dret

Creating an Exif parser is no simple task, but there are a few things to consider:

  • What specification of Exif are we dealing with? (Exif, XMP, IPTC, any and all of the above?)
  • When processing the binary string data, is it big or little endian?
  • How do we read binary data in a browser?
  • Do we have typed arrays support or do we need to create our own data view?

First things first, how do we read binary data?

As we saw above our worker is fed a binary string, meaning this is a stream of ASCII characters representing values from 0-255. We need to create a way to access and parse this data. The Exif specification defines a few different data value types we will encounter:

  • 1 = BYTE An 8-bit unsigned integer
  • 2 = ASCII An 8-bit byte containing one 7-bit ASCII code. The final byte is terminated with NULL.
  • 3 = SHORT A 16-bit (2-byte) unsigned integer
  • 4 = LONG A 32-bit (4-byte) unsigned integer
  • 5 = RATIONAL Two LONGs. The first LONG is the numerator and the second LONG expresses the denominator.
  • 7 = UNDEFINED An 8-bit byte that can take any value depending on the field definition
  • 9 = SLONG A 32-bit (4-byte) signed integer (2’s complement notation)
  • 10 = SRATIONAL Two SLONGs. The first SLONG is the numerator and the second SLONG is the denominator

So, we need to be able to read an unsigned int (1 byte), an unsigned short (2 bytes), an unsigned long (4 bytes), an slong (4 bytes signed), and an ASCII string. Since the we read the stream as a binary string it is already in ASCII, that one is done for us. The others can be accomplished by using typed arrays, if supported, or some fun binary math.

Typed Array Support

Now that we know what types of data we are expecting, we just need a way to translate the binary string we have into useful values. The easiest approach would be typed arrays (https://developer.mozilla.org/en/JavaScript_typed_arrays), meaning we can create an ArrayBuffer using the string we received from from the FileReader, and then create typed arrays, or views, as needed to read values from the string. Unfortunately array buffer views do not support endianness, so the preferred method is to use DataView (http://www.khronos.org/registry/typedarray/specs/latest/#8), which essentially creates a view to read into the buffer and spit out various integer types. Due to lack of great support, Firefox does not support DataView and Safari’s typed array support can be slow, we are currently using a combination of manual byte conversion and ArrayBuffer views.

var arrayBuffer = new ArrayBuffer(this.data.length);
var int8View = new Int8Array(arrayBuffer);

for (var i = 0; i &amp;lt; this.data.length; i++) {
	int8View[i] = this.data[i].charCodeAt(0);
}

this.buffer = arrayBuffer;

this.getUint8 = function(offset) {
	if (compatibility.ArrayBuffer) {

	return new Uint8Array(this.buffer, offset, 1)[0];
	}
	else {
		return this.data.charCodeAt(offset) &amp;amp; 0xff;
	}
}

Above we are creating an ArrayBuffer of length to match the data being passed in, and then creating a view consisting of 8-bit signed integers which allows us to store data into the ArrayBuffer from the data passed in. We then process the charCode() at each location in the data string passed in and store it in the array buffer via the int8View. Next you can see an example function, getUint8(), where we get an unsigned 8-bit value at a specified offset. If typed arrays are supported we use a Uint8Array view to access data from the buffer at an offset, otherwise we just get the character code at an offset and then mask the least significant 8 bits.

To read a short or long value we can do the following:

this.getLongAt = function(offset,littleEndian) {

	//DataView method
	return new DataView(this.buffer).getUint32(offset, littleEndian);

	//ArrayBufferView method always littleEndian
	var uint32Array = new Uint32Array(this.buffer);
	return uint32Array[offset];

	//The method we are currently using
	var b3 = this.getUint8(this.endianness(offset, 0, 4, littleEndian)),
	b2 = this.getUint8(this.endianness(offset, 1, 4, littleEndian)),
	b1 = this.getUint8(this.endianness(offset, 2, 4, littleEndian)),
	b0 = this.getUint8(this.endianness(offset, 3, 4, littleEndian));

	return (b3 * Math.pow(2, 24)) + (b2 &amp;lt;&amp;lt; 16) + (b1 &amp;lt;&amp;lt; 8) + b0;

}

The DataView method is pretty straight forward, as is the ArrayBufferView method, but without concern for endianness. The last method above, the one we are currently using, gets the unsigned int at each byte location for the 4 bytes. Transposes them based on endianness and then creates a long integer value out of it. This is an example of the custom binary math needed to support data view in Firefox.

When originally beginning to build out the Exif parser I found this jDataView (https://github.com/vjeux/jDataView) library written by Christopher Chedeau aka Vjeux (http://blog.vjeux.com/). Inspired by Christopher’s jDataView module we created a DataView module for YUI.

Translating all of this into useful data

There are a few documents you should become familiar with if you are considering writing your own Exif parser:

The diagram above is taken straight from the Exif specification section 4.5.4, it describes the basic structure for Exif data in compressed JPEG images. Exif data is broken up into application segments (APP0, APP1, etc.). Each application segment contains a maker, length, Exif identification code, TIFF header, and usually 2 image file directories (IFDs). These IFD subdirectories contain a series of tags, of which each contains the tag number, type, count or length, and the data itself or offset to the data. These tags are described in Appendix A of the TIFF6 Spec, or at Table 41 JPEG Compressed (4:2:0) File APP1 Description Sample in the Exif spec and also broken down on the Exif spec page created by TsuruZoh Tachibanaya.

Finding APP1

The first thing we want to find is the APP1 marker, so we know we are in the right place. For APP1, this is always the 2 bytes 0xFFE1, We usually check the last byte of this for the value 0xE1, or 225 in decimal, to prevent any endianness problems. The next thing we want to know is the size of the APP1 data, we can use this to optimize and know when to stop reading, which is also 2 bytes. Next up is the Exif header, which is always the 4 bytes 0x45, 0x78, 0x69, 0x66, or “Exif” in ASCII, which makes it easy. This is always followed up 2 null bytes 0x0000. Then begins the TIFF header and then the 0th IFD, where our Exif is stored, followed by the 1st IFD, which usually contains a thumbnail of the image.

We are concerned with application segment 1 (APP1). APP2 and others can contain other metadata about this compressed image, but we are interested in the Exif attribute information.

Wherefore art thou, TIFF header?

Once we know we are at APP1 we can move on to the TIFF header which starts with the byte alignment, 0x4949 (II, Intel) or 0x4D4D (MM, Motorola), Intel being little endian and Motorola being big endian. Then we have the tag marker, which is always 0x2A00 (or 0x002A for big endian): “an arbitrary but carefully chosen number (42) that further identifies the file as a TIFF file”. Next we have the offset to the first IFD, which is usually 0x08000000, or 8 bytes from the beginning of the TIFF header (The 8 bytes: 0x49 0x49 0x2A 0x00 0x08 0x00 0x00 0x00). Now we can begin parsing the 0th IFD!

The diagram above (taken from the TIFF6.0 specification found here: http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf), shows the structure of the TIFF header, the following IFD and a directory entry contained within the IFD.

The IFD starts off with the number of directory entries in the IFD, 2 bytes, then follows with all of the directory entries and ends with the offset to the next IFD if there is one. Each directory entry is 12 bytes long and comprised of 4 parts: the tag number, the data format, the number of components, and the data itself or an offset to the data value in the file. Then follows the offset to the next IFD which is again 8 bytes.

Example: Processing some real world bytes

Let’s run through an example below! I took a screen shot from hexfiend (http://ridiculousfish.com/hexfiend/, which is an awesome little program for looking at raw hex data from any file, I highly recommend it) and highlighted the appropriate bytes from start of image (SOI) to some tag examples.

This is the first 48 bytes of the image file. I’ve grouped everything into 2 byte groups and 12 byte columns, because IFD entries are 12 bytes it makes it easier to read. You can see the start of image marker (SOI), APP1 mark and size, “Exif” mark and null bytes. Next is the beginning of the TIFF header including byte align, the 42 TIFF verification mark, the offset to the 0th IFD, the number of directory entries, and then the first 2 directory entries. These entries are in little endian and I wrote them out as big endian to make them easier to read. Both of these first entries are of ASCII type, which always point to an offset in the file and ends with a null terminator byte.

Writing code to parse Exif

Now that we understand the tag structure and what we are looking for in our 128k of data we sliced from the beginning of the image, we can write some code to do just that. A lot of insipration for this code comes from an exif parser written by Jacob Seidelin, http://blog.nihilogic.dk, the original you can find here: http://www.nihilogic.dk/labs/exif/exif.js. We used a lot of his tag mapping objects to translate the Exif tag number values into tag names as well as his logic that applies to reading and finding Exif data in a binary string.

First we start looking for the APP1 marker, by looping through the binary string recording our offset and moving it up as we go along.

if (dataview.getByteAt(0) != 0xFF || dataview.getByteAt(1) != 0xD8) {
	return;
}
else {
	offset = 2;
	length = dataview.length;
	
	while (offset &amp;lt; length) {
		marker = dataview.getByteAt(offset+1);
		if (marker == 225) {
			readExifData(dataview, offset + 4, dataview.getShortAt(offset+2, true)-2);
			break;
		}
		else if(marker == 224) {
			offset = 20;
		}
		else {
			offset += 2 + dataview.getShortAt(offset+2, true);
		}
	}
}

We check for a valid SOI marker (0xFFD8) and then loop through the string we passed in. If we find the APP1 marker (225) we start reading Exif data, if we find a APP0 marker (224) we move the offset up by 20 and continue reading, otherwise we move the offset up by 2 plus the length of the APP data segment we are at, because it is not APP1, we are not interested.

Once we find what we are looking for we can look for the Exif header, endianness, the TIFF header, and look for IFD0.

function readExifData(dataview, start, length) {

	var littleEndian;
	var TIFFOffset = start + 6;

	if (dataview.getStringAt(iStart, 4) != &quot;Exif&quot;) {
		return false;
	}

	if (dataview.getShortAt(TIFFOffset) == 0x4949) {
		littleEndian = true;
		self.postMessage({msg:&quot;----Yes Little Endian&quot;});
	}
	else if (dataview.getShortAt(TIFFOffset) == 0x4D4D) {
		littleEndian = false;
		self.postMessage({msg:&quot;----Not Little Endian&quot;});
	}
	else {
		return false;
	}

	if (dataview.getShortAt(TIFFOffset+2, littleEndian) != 0x002A) {
		return false;
	}

	if (dataview.getLongAt(TIFFOffset+4, littleEndian) != 0x00000008) {
		return false;
	}

	var tags = ExifExtractorTags.readTags(dataview, TIFFOffset, TIFFOffset+8, ExifExtractorTags.Exif.TiffTags, littleEndian);

This is the first part of the readExifData function that is called once we find our APP1 segment marker. We start by verifying the Exif marker, then figuring out endianness, then checking if our TIFF header verification marker exists (42), and then getting our tags and values by calling ExifExtractorTags.readTags. We pass in the dataview to our binary string, the offset, the offset plus 8, which bypasses the TIFF header, the tags mapping object, and the endianness.

Next we pass that data into a function that creates an object which maps all of the tag numbers to real world descriptions, and includes maps for tags that have mappable values.

this.readTags = function(dataview, TIFFStart, dirStart, strings, littleEndian) {
	var entries = dataview.getShortAt(dirStart, littleEndian);
	var tags = {};
	var i;

	for (i = 0; i &amp;lt; entries; i++) {
		var entryOffset = dirStart + i*12 + 2;
		var tag = strings[dataview.getShortAt(entryOffset, littleEndian)];

		tags[tag] = this.readTagValue(dataview, entryOffset, TIFFStart, dirStart, littleEndian);
	}

	if(tags.ExifIFDPointer) {
		var entryOffset = dirStart + i*12 + 2;
		var IFD1Offset = dataview.getLongAt(entryOffset,littleEndian);

		tags.IFD1Offset = IFD1Offset;
	}

	return tags;
}

This function is quite simple, once we know where we are at of course. For each entry we get the tag name from our tag strings and create a key on a tags object with a value of the tag. If there is an IFD1, we store that offset in the tags object as well. The readTagValue function takes the dataview object, the entry’s offset, the TIFF starting point, the directory starting point (TIFFStart + 8), and then endianness. It returns the tag’s value based on the data type (byte, short, long, ASCII).

We return a tags object which has keys and values for various Exif tags that were found in the IFD. We check if ExifIFDPointer exists on this object, if so, we have IFD entries to pass back out of the worker and show the user. We also check for GPS data and an offset to the next IFD, IFD1Offset, if that exists we know we have another IFD, which is usually a thumbnail image.

if (tags.ExifIFDPointer) {

	var ExifTags = ExifExtractorTags.readTags(dataview, TIFFOffset, TIFFOffset + tags.ExifIFDPointer, ExifExtractorTags.Exif.Tags, littleEndian);

	for (var tag in ExifTags) {
		switch (tag) {
			case &quot;LightSource&quot; :
			case &quot;Flash&quot; :
			case &quot;MeteringMode&quot; :
			case &quot;ExposureProgram&quot; :
			case &quot;SensingMethod&quot; :
			case &quot;SceneCaptureType&quot; :
			case &quot;SceneType&quot; :
			case &quot;CustomRendered&quot; :
			case &quot;WhiteBalance&quot; :
			case &quot;GainControl&quot; :
			case &quot;Contrast&quot; :
			case &quot;Saturation&quot; :
			case &quot;Sharpness&quot; :
			case &quot;SubjectDistanceRange&quot; :
			case &quot;FileSource&quot; :
				ExifTags[tag] = ExifExtractorTags.Exif.StringValues[tag][ExifTags[tag]];
				break;
			case &quot;ExifVersion&quot; :
			case &quot;FlashpixVersion&quot; :
				ExifTags[tag] = String.fromCharCode(ExifTags[tag][0], ExifTags[tag][1], ExifTags[tag][2], ExifTags[tag][3]);
				break;
			case &quot;ComponentsConfiguration&quot; :
				ExifTags[tag] =
					ExifExtractorTags.Exif.StringValues.Components[ExifTags[tag][0]]
					+ ExifExtractorTags.Exif.StringValues.Components[ExifTags[tag][1]]
					+ ExifExtractorTags.Exif.StringValues.Components[ExifTags[tag][2]]
					+ ExifExtractorTags.Exif.StringValues.Components[ExifTags[tag][3]];
				break;
		}
		
		tags[tag] = ExifTags[tag];
	}
}

This is the rest of the readTags function, basically we are checking if ExifIFDPointer exists and then reading tags again at that offset pointer. Once we get another tags object back, we check to see if that tag has a value that needs to be mapped to a readable value. For example if the Flash Exif tag returns 0x0019 we can map that to “Flash fired, auto mode”.

if(tags.IFD1Offset) {
	IFD1Tags = ExifExtractorTags.readTags(dataview, TIFFOffset, tags.IFD1Offset + TIFFOffset, ExifExtractorTags.Exif.TiffTags, littleEndian);
	
	if(IFD1Tags.JPEGInterchangeFormat) {
		readThumbnailData(dataview, IFD1Tags.JPEGInterchangeFormat, IFD1Tags.JPEGInterchangeFormatLength, TIFFOffset, littleEndian);
	}
}

function readThumbnailData(dataview, ThumbStart, ThumbLength, TIFFOffset, littleEndian) {

	if (dataview.length &amp;lt; ThumbStart+TIFFOffset+ThumbLength) {
		return;
	}

	var data = dataview.getBytesAt(ThumbStart+TIFFOffset,ThumbLength);
	var hexData = new Array();
	var i;

	for(i in data) {
		if (data[i] &amp;lt; 16) {
			hexData[i] = &quot;0&quot;+data[i].toString(16);
		}
		else {
			hexData[i] = data[i].toString(16);
		}
	}

	self.postMessage({guid:dataview.guid, thumb_src:&quot;data:image/jpeg,%&quot;+hexData.join('%')});
}

The directory entry for the thumbnail image is just like the others. If we find the IFD1 offset at the end of IFD0, we pass the data back into the readTags function looking for two specific tags: JPEGInterchangeFormat (the offset to the thumbnail) and JPEGInterchangeFormatLength (the size of the thumbnail in bytes). We read in the correct amount of raw bytes at the appropriate offset, convert each byte into hex, and pass it back as a data URI to be inserted into the DOM showing the user a thumbnail while their photo is being uploaded.

As we get data back from the readTags function, we post a message out of the worker with the tags as an object. Which will be caught caught by our event handlers from earlier, shown the user, and stored as necessary to be uploaded when the user is ready.

We use this same process to parse older IPTC data. Essentially we look for an APP14 marker, a Photoshop 3.0 marker, a “8BIM” marker, and then begin running through the bytes looking for segment type, size, and data. We map the segment type against a lookup table to get the segment name and get size number of bytes at the offset to get the segment data. This is all stored in a tags object and passed out of the worker.

XMP data is a little different, even easier. Basically we look for the slice of data surrounded by the values “<x:xmpmeta” to “</x:xmpmeta>” in the binary string, then pass that out of the worker to be parsed via Y.DataType.XML.parse().

Conclusion

In conclusion the major steps we take to process an image’s Exif are:

  1. Initialize a web worker
  2. Get a file reference
  3. Get a slice of the file’s data
  4. Read a byte string
  5. Look for APP1/APP0 markers
  6. Look for Exif and TIFF header markers
  7. Look for IFD0 and IFD1
  8. Process entries from IFD0 and IFD1
  9. Pass data back out of the worker

That is pretty much all there is to reading Exif! The key is to be very forgiving in the parsing of Exif data, because there are a lot of different cameras out there and the format has changed over the years.

One final note: Web workers have made client-side Exif processing feasible at scale. Tasks like this can be performed without web workers, but run the risk of locking the UI thread – certainly not ideal for a web app that begs for user interaction.

Flickr flamily floto

Like this post? Have a love of online photography? Want to work with us? Flickr is hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.

A Chinese puzzle: Unicode and EXIF metadata parsing

Le Presque-Cube

At flickr, there are quite a few photos and you can browse the site in 8 different languages, including Korean and Chinese. Common metadata such as title, description and tags can be pre-populated based on information contained in the image itself, using what is commonly called EXIF information. So, it makes sense to implement this with respect to language and, above all, alphabet/character encodings. 

Un peu de lecture...?

Well, what made sense did not make so much sense anymore when using existing specifications. Here is how we coped with them.

The standards

To begin with, there is not just EXIF. Metadata can actually be written within a picture file in at least 3 different formats:
EXIF itself.
IPTC-IIM headers.
XMP by Adobe.

Of course, these formats are neither mutually exclusive nor completely redundant and this is where this can get tricky.

But let’s step back a moment to describe the specifics of these formats, not in details, but with regards to our need, which is to extract information in a reliable way, independently from how/when/where the image was created.

The EXIF is the oldest form

Old Cars Part 1
Hence, with the most limitations yet the most widespread, as this is often the case in our industry. Thus, we need to deal with it, even though it is radically flawed from an internationalization stand point: text fields are not stored using UTF and most of the time there is no indication of the character set encoding.

IPTC-IIM

In its later versions, it added the optional (Grr!!!) support for a field indicating the encoding of all the string properties in the container.

XMP

At last, text from XMP format is stored in UTF-8. On a side note, the XML based openness of this format is not making things easier, for each application can come up with its own set of metadata. Nevertheless, from an internationalization and structural standpoint, this format is modernly adequate: finally!

So, now that we know what hides behind each of these standards we can start tackling our problem.
Hide n Seek

A solution ?

Rely on existing libraries, when performant

For flickr Desktop client (Windows, Mac), we are using Exiv2 Image metadata library, which helps the initial reconciliation between all fields (especially with EXIF and IPTC-IIM contained within XMP).

The “guesstimation” of character set from EXIF

坐在问号足球上的3D小人高清图片_zcool.com.cn
We first scan the string to see if all bytes are in the range: 0 to 127. If so, we treat the string as ASCII. If not, we scan the string to see if it is consistent with valid UTF-8. If so, we treat the string as UTF-8. Checking against UTF8 validity is not a bullet-proof test. But statistically, this is better than any other scenario. At the last resort, we pick a “reasonable” fallback encoding. For the desktop application, we use hints from the user system. On windows, the GetLocaleInfo function provides with the user default ANSI code page (LOCALE_IDEFAULTANSICODEPAGE), which can be used to specify an appropriate locale for string conversion methods. On Mac OS X, CFStringGetSystemEncoding is our ally. In our case, there is no point to use the characterset of our own application, which we control and is not linked to the characterset of the application that “wrote” the EXIF.

Consolidation: the easy case

The workflow followed by the image we handle is unknown. We can have all 3 formats filled-in but not necessarily by the same application. So, we need a mechanism to consolidate the data. The easy case is for single field such as title and description. We follow the obvious quality hierarchy of the different formats. To extract the description for instance, we first look for the XMP.dc.description field, then XMP.photoshop.Headline (to support the extensibility mentioned before as a side note), then IPTC.Application2.Caption and finally the Exif.Image.ImageDescription. We only keep the first data found and ignore the others. There is only one title and one description per image: might as well take the one we are sure about.

Consolidation: it becomes even trickier for tags.

puzzle pieces
The singleness of the final result disappearing (we deal with a list of tags, not just one single tag), we cannot ignore the “EXIF” tags as easily as for the title and the description case. Fortunately, IPTC Keywords are supposed to be mapped to XMP (dc:subject). Therefore we can take into account the number of keywords that would be extracted from EXIF and the number that would be extracted from XMP. If those equal, we plainly ignore the EXIF. If they don’t, we try to match each guestimation of EXIF keyword against the XMP keywords to avoid duplicates.
 
All in one, quite an interesting issue where, per design, the final solution is going to be an approximation with different results depending on the context. Computers: an exact science?
Computer History Museum

For more information and main reference: http://www.metadataworkinggroup.com/

Photos by Groume, Raïssa Bandou, seanmcgrath, tcp909, aftab., 姒儿喵喵 and Laughing Squid

Lightweight Layout Management in Flash

Resizable apps are groovy. Users want content to fill the screen, and as the variance in screen resolutions has grown this has become more and more important. To make this happen you can either write manual positioning code to place and size items on window resize, or you can define a hierarchy, some properties on elements of the hierarchy and have a layout engine do the work for you. Up until a few years ago, there were no good pre-built solutions to do this in Flash so the only option was to roll your own framework. Nowadays there are several floating around, including this one from the flash platform team at Yahoo!, and of course there is the Flex/MXML route.

XML based UI markup seems to be all the rage these days, but what I found personally through my experience building the jumpcut tools is that this “responsible” approach has drawbacks. A hierarchy of HBoxes and VBoxes can be a lot of overhead for simple projects, and can become unwieldy and rigid when you are trying to develop something quickly. If you are doing anything weird in your UI (overlapping elements, changing things at runtime) it can also feel like trying to fit a square peg in a round hole.

The Best of Both Worlds

So is there any way to get at a sweet-spot, something quick and flexible, yet powerful? The first thing that probably comes to mind if you are a web developer is the CSS/HTML model. In fact, I think would be a rather good option if not for one missing piece, the unsung hero of the web: the HTML rendering engine. If anyone wants to implement Gecko or WebKit in ActionScript I think that would be a fun project, but because the web ecosystem evolved with the markup user and not the layout engine implementer in mind it’s rather complicated. There is also the fact that this model is based on the assumption of a page of fixed width but infinite height which doesn’t exactly suit our purposes.

What we can borrow though is the philosophy of good defaults and implicit rather than explicit layout specification. When you are building out a layout with HTML/CSS you can just drop stuff on the page and things more or less work and resize as you’d expect, then selectively you can move things around / add styling and padding and so on. To do something similar we can take advantage of one of the nice new features of Actionscript 3, the display hierarchy, which allows us to easily traverse, observe, and manage all the DisplayObjects at run time, without keeping track of them on our own as they are created. This way we can recurse through the tree, look for the appropriate properties, and apply positioning and sizing as necessary. If the properties don’t exist we’ll just leave things alone. When you start your project you can just write code to position stuff manually, then as things get more complex and start to solidify you can begin using layout management where it makes sense. I’m a bit of a commitment-phobe myself, so this really appeals to me; what’s especially nice is that you easily plug this in to any existing codebase.

Some Code

So, to get this working all we really need is a singleton that listens to the stage resize event, traverses the tree and places/sizes objects. Mine looks something like this:

  
/*..*/
stage.addEventListener(Event.RESIZE, this.onResize);
/*..*/
 
function onResize(e:Event){
	this.traverse(root);
}
 
function traverse(obj:DisplayObject){

	for (var i=0; i  < obj.numChildren; i++){

		var child = new LayoutWrapper(obj.getChildAt(i));
		if(child.no_layout)
			continue;
	   
		//layout code to determine how to place and size children
		//child.x = something
		//child.y = something


		child.setSize(W, H); // pre-order





		this.traverse(child);

		child.refresh(); // post-order
	   
	}
}


Two things of note here.
First is the LayoutWrapper which I use to set defaults for properties and functions that don’t necessarily exist on all objects, so sort of like slipping in a base class without having to alter any code. I use flash_proxy to make this easier, but you don’t have to do that. One thing you will want to do is store dummy values for width and height (something like _width and _height maybe as a throwback) so you can assign and keep track of the width and height of container elements without actually scaling the object. You’ll also want to decide how you want to handle turning on / off the layout management and what defaults make sense for you. In the above code I’m doing opt-out, using a no_layout variable to selectively turn off layout management where I don’t need it.
Second, I have two hooks for sizing, setSize on the way down, and refresh on the way up. This is useful in some cases because what you want to do at higher levels may depend on the size of children at lower levels which can change on the way down.
The layout code itself is outside the scope of this blog post, but I’ve found that a vertical stack container, a horizontal stack container, padding, and spacing is all I’ve needed to accomplish just about any UI. Of course, you can and should implement any layout tools that are useful to you; different projects may call for different layout containers and properties.

Some Fun

One thing that we are doing by using this approach is pushing the specification of the display tree into run-time. The analogy with HTML here is the part of the DOM that is generated by visible source vs the part of the DOM that is generated by JavaScript.
Going all cowboy like this does have its drawbacks. Specifically it can get a bit confusing as to what the tree looks like, or why this or that is not showing up, where is this Sprite attached, etc. Web developers have the Firebug inspect tool for a very similar problem, which I’ve found rather handy. We can get a poor-man’s version of something like this by writing some code that pretty prints the tree under an object on rollover or use the Flex debugger, but, depending on the size of your project, this can get unwieldy rather quickly.

As an excuse to play with the groovy new flare visualization library recently I tried pointing it at the display tree. The result is rather useful for debugging, especially if you have a lot of state changes / masking / weirdness (plus you look like you are using a debugging tool from the friggin future). Some nice things I’ve added: on roll over the object gets highlighted, I can see an edit some basic properties and see them update live. There’s quite a bit of room here to do something Firebug like, with drag and drop reparenting and the like (but be careful, not too interesting or you’d wind up reimplementing the Flash IDE).

If you’d like to try this yourself, this is the code to populate the flare data structure with the display tree:

public function populateData(){
	// create data and set defaults
	this.graph = new Tree();
	var n = this.graph.addRoot();
	n.data.label = String(root);
	this.populateDataHelper(n, root);
}

 
public function populateDataHelper(n, s){

	try{ // need this in case of TextFields
		var num_children = s.numChildren;
	}
	catch(e:Error){
		return;
	}
	
	//limit in case you have extremely long lists 
	//which can dominate the display
	num_children = num_children < 20 ? num_children : 20; 

	for(var i=0; i  < num_children; i++){
		var child = s.getChildAt(i);

		if(
		child != this //don't show the layout visualizer itself!
		&& child.visible //don't show items that are not visible
		){
			var c = this.graph.addChild(n);
			c.data.sprite = child;
			c.data.label = String(child);
			this.populateDataHelper(c, child)
		}
	}
  
}

Then just apply a visualization to this tree; I've been simply using the demo examples which has been sufficient for my uses, but obviously you can write your own as well.

Here it is in action on the flickr slideshow:

Flickr Uploadr, start to finish now

Starting at Flickr a short nine months ago, I was given the state of the Flickr Uploadr and told to make it better. Better meant many things. It meant cross-platform so we could move forward with one codebase. It meant localized in all of Flickr’s languages without hackery. It meant new features that would make uploading easier and encourage people to add metadata to their photos. And while we didn’t explicitly talk about it at the time, better meant open source.

Settling on a platform

Straight C? No. Java Swing? Adobe AIR? XULRunner? So many choices, each with advantages and disadvantages. I ended up choosing to work with Mozilla’s XULRunner, which is what makes up the guts of Firefox and Thunderbird. The main advantages of XULRunner were the ability to link in outside code libraries (like GraphicsMagick) and the availability of real multithreading.

Learning the hard way

Since the project began I’ve jumped more than a few hurdles. I documented many of the more exciting problems on my blog (rcrowley.org) as I went. Crash course follows:

Cross-platform XPCOM (a howto)

Working from Mark Finkle’s crash course for Windows got me halfway and some other scattered resources helped to piece together the skeleton of an app that will run on both Windows and OS X. The code has evolved quite a bit since then but this process got me on my feet.

XUL overlays demystified

As apps grow you naturally need to break files up to save your sanity. I never found the crystal clear example of overlays that I wanted, so after I trial-and-errored my way out of the corner, I wrote out this common use case that Uploadr uses in several places.

Threading in Gecko 1.9

I’ve been developing against XULRunner 1.9 (and therefore Gecko 1.9) which are the underpinnings of Firefox 3. The thread primitives made available in 1.9 are much nicer than in Gecko 1.8. Uploadr uses a background thread for event queuing and this is a stripped down example of that same pattern.

MD5 in XULRunner (or Firefox extensions)

The Flickr API requires developers to sign calls with MD5. MD5 is built right into PHP but is conspicuously missing from JavaScript. There are JavaScript implementations out there but (just for kicks), here’s how to take advantage of Mozilla’s built-in hashing library.

Fun with Unicode!

Flickr has, from the very beginning, been an international place. Well before it was available in eight languages, it would accept user input in any language through the magic of UTF-8. Uploadr carries on this tradition but to bridge the gap between Windows’ UTF-16 Unicode support and GraphicsMagick’s non-Unicode-iness, some hacks had to be liberally applied. This code has changed a bit since, so check the latest out in Flickr Subversion.

Video interview with the Yahoo! Developer Network

Jeremy Zawodny from the Yahoo! Developer Network came up to San Francisco to chat about the new Flickr Uploadr a few months back. We talked about the development process, open source and where the future might lead.

The future is here now with an extension API ready for use in version 3.1. Check out the documentation and helloworld extension or check out the full source code and build tools.