A Summer at Flickr

This summer I had the unforgettable opportunity to work side-by-side with some of the smartest, photography-loving software engineers I’ve ever met. Looking back on my first day at Flickr HQ – beginning with a harmonious welcome during Flickr Engineering’s weekly meeting – I can confidently say that over the past ten weeks I have become a much better software engineer.

One of my projects for the summer was to build a new and improved locking manager that controls the distribution of locks for offline tasks (or OLTs for short). Flickr uses OLTs all the time for data migration, uploading photos, updates to accounts, and more. An OLT needs to acquire a lock on a shared resource, such as a group or an account, to prevent other OLTs from accessing the same resource at the same time. The OLT will then release the lock when it’s done modifying the shared data. Myles wrote an excellent blog post on how Flickr uses offline tasks here.

When building a distributed lock system, we need to take into account a couple of important details. First, we need to make sure that all the lock servers are consistent. One way to maintain consistency is to elect one server to act as a master and the remaining servers as slaves, where the master server is responsible for data replication among the slave servers. Second, we need to account for network and hardware failures – for instance, if the master server goes down for some reason, we need to quickly elect a new master server from one of the slave servers. The good news is, Apache ZooKeeper is an open-source implementation of master-slave data replication, automatic leader election, and atomic distributed data reads and writes.


Offline tasks send lock acquire and release requests through ZooLocker. ZooLocker in turn interfaces with the ZooKeeper cluster to create and delete znodes that correspond to the individual locks.

In the new locking system (dubbed “ZooLocker”), each lock is stored as a unique data node (or znode) on the ZooKeeper servers. When a client acquires a lock, ZooLocker creates a znode that corresponds to the lock. If the znode already exists, ZooLocker will tell the client that the lock is currently in use. When a client releases the lock, ZooLocker deletes the corresponding znode from memory. ZooLocker stores helpful debugging information, such as the owner of the lock, the host it was created on, and the maximum amount of time to hold on to the lock, in a JSON-serialized format in the znode. ZooLocker also periodically scans through each znode in the ZooKeeper ensemble to release locks that are past their expiration time.
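To make the acquire, release, and expiry-scan flow concrete, here is a minimal sketch. It is not the ZooLocker code: a plain object stands in for the ZooKeeper ensemble, and the function names, the "/locks/" path prefix, and the metadata field names are all assumptions made for illustration.

```javascript
// In-memory stand-in for the ZooKeeper ensemble: znode path -> JSON string.
// Everything here (names, paths, fields) is hypothetical.
var znodes = {};

function lockPath(resource) {
    return "/locks/" + resource;
}

// Create the znode if it does not exist yet; report failure if the lock is held.
function acquire(resource, owner, host, maxHoldMs, now) {
    var path = lockPath(resource);
    if (Object.prototype.hasOwnProperty.call(znodes, path)) {
        return false;
    }
    // Debugging information is stored in the znode as serialized JSON.
    znodes[path] = JSON.stringify({
        "owner" : owner,
        "host" : host,
        "expiresAt" : now + maxHoldMs
    });
    return true;
}

// Releasing a lock deletes its znode.
function release(resource) {
    delete znodes[lockPath(resource)];
}

// Periodic sweep: delete every znode whose expiration time has passed.
function releaseExpired(now) {
    var released = [];
    Object.keys(znodes).forEach(function(path) {
        if (JSON.parse(znodes[path]).expiresAt <= now) {
            delete znodes[path];
            released.push(path);
        }
    });
    return released;
}
```

Passing the current time in as an argument keeps the sketch deterministic; a real sweep would read the clock itself and talk to ZooKeeper rather than a local object.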

My locking manager is already serving locks in production. In spite of sudden spikes in lock acquire and release requests by clients, the system holds up pretty well.


A graph of the number of lock acquire requests in ZooLocker per second

My summer internship at Flickr has been an incredibly valuable experience for me. I have demystified the process of writing, testing, and integrating code into a running system that millions of people around the world use each and every day. I have also learned about the amazing work going on in the engineering team, the ups and downs of the code deploy process, and how to dodge the incoming flying finger rockets that the Flickr team members fling at each other. My internship at Flickr is an experience I will never forget, and I am very grateful to the entire Flickr team for giving me the opportunity to work with and learn from them this summer.

Redis Global Locks Redux

In my last post I described how we use Redis to manage a global lock that allows us to automatically fail over to a backup process if there is a problem in the primary process. The method described allegedly allowed for any number of backup processes to work in conjunction to pick up on primary failures and take over processing.

Locks #1 by Christoph Kummer

An astute reader pointed out that the code in that post wouldn’t actually work as advertised.

 

The Problem

Nolan correctly noticed that when a backup process attempts to acquire the lock via SETNX, the lock key will already exist from when it was acquired by the primary, and thus all subsequent attempts will simply end up constantly trying to acquire a lock that can never be acquired. As a reminder, here’s what we do when we check back on the status of a lock:

function checkLock(payload, lockIdentifier) {
    client.get(lockIdentifier, function(error, data) {
        // Error handling elided for brevity
        if (data !== DONE_VALUE) {
            // Redis returns the count as a string, so parse it before incrementing
            acquireLock(payload, parseInt(data, 10) + 1, lockCallback);
        } else {
            client.del(lockIdentifier);
        }
    });
}

And here’s the relevant bit from acquireLock that calls SETNX:

    client.setnx(lockIdentifier, attempt, function(error, data) {
        if (error) {
            logger.error("Error trying to acquire redis lock for: %s", lockIdentifier);
            return callback(error, dataForCallback(false));
        }

        return callback(null, dataForCallback(data === 1));
    });

So, you’re thinking, how could this vaunted failover process ever actually work? The answer is simple: the code from that post isn’t what we actually run. The actual production code has a single backup process, so it doesn’t try to re-acquire the lock in the event of failure; it just skips right to trying to send the message itself. In the previous post, I described a more general solution that would work for any number of backup processes, but I missed this one important detail.

That being said, with some relatively minor changes, it’s absolutely possible to support an arbitrary number of backup processes and still maintain the use of the global lock. The trivial solution is to simply have the backup process delete the key before trying to re-acquire the lock (or, technically, acquire it anew). However, the problem with that becomes apparent pretty quickly. If there are multiple backup processes all deleting the lock and trying to SETNX a new lock again, there’s a good chance that a race condition could arise wherein one of the backups deletes a lock that was acquired by another backup process, rather than the failed lock from the primary.
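The race is easier to see when one unlucky interleaving is played out step by step. The sketch below is only an illustration: `createFakeRedis` is a hypothetical in-memory stand-in for the real client, and the lock values are process names (rather than attempt counts) purely to make the outcome readable.

```javascript
// Tiny in-memory stand-in for the Redis commands involved.
// Hypothetical helper, not the real client.
function createFakeRedis() {
    var data = {};
    return {
        setnx: function(key, value) {
            if (Object.prototype.hasOwnProperty.call(data, key)) {
                return 0;
            }
            data[key] = String(value);
            return 1;
        },
        del: function(key) {
            delete data[key];
        },
        get: function(key) {
            return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
        }
    };
}

var store = createFakeRedis();
var holders = [];

// The primary takes the lock and then dies without finishing.
store.setnx("lock.msg1", "primary");

// Backups A and B both notice the failure. One possible interleaving:
store.del("lock.msg1");                          // A deletes the stale lock
if (store.setnx("lock.msg1", "backup-a") === 1) {
    holders.push("backup-a");                    // A now holds the lock
}
store.del("lock.msg1");                          // B deletes what it believes is the stale lock
if (store.setnx("lock.msg1", "backup-b") === 1) {
    holders.push("backup-b");                    // B holds it too
}
```

Both backups end up in `holders`, so both would go on to do the work. Nothing in DEL followed by SETNX lets B tell the primary’s stale lock apart from the one A had just acquired.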

The Solution

Thankfully, Redis has a solution to help us out here: transactions. By using a combination of WATCH, MULTI, and EXEC, we can perform actions on the lock key and be confident that no one has modified it before our actions can complete. The process to acquire a lock remains the same: many processes will issue a SETNX and only one will win. The changes come into play when the processes that didn’t acquire the lock check back on its status. Whereas before, we simply checked the current value of the lock key, now we must go through the above described Redis transaction process. First we watch the key, then we do what amounts to a check and set (albeit with a few different actions to perform based on the outcome of the check):

function checkLock(payload, lockIdentifier, lastCount) {
    client.watch(lockIdentifier);
    client.multi()
        .get(lockIdentifier)
        .exec(function(error, replies) {
            if (!replies) {
                // Lock value changed while we were checking it, someone else got the lock
                client.get(lockIdentifier, function(error, newCount) {
                    setTimeout(checkLock, LOCK_EXPIRY, payload, lockIdentifier, newCount);
                });

                return;
            }

            var currentCount = replies[0];
            if (currentCount === null) {
                // No lock means someone else completed the work while we were checking on its status and the key has already been deleted
                return;
            } else if (currentCount === DONE_VALUE) {
                // Another process completed the work, let’s delete the lock key
                client.del(lockIdentifier);
            } else if (currentCount == lastCount) {
                // Key still exists, and no one has incremented the lock count, let’s try to reacquire the lock
                reacquireLock(payload, lockIdentifier, currentCount, doWork);
            } else {
                // Key still exists, but the value does not match what we expected, someone else has reacquired the lock, check back later to see how they fared
                setTimeout(checkLock, LOCK_EXPIRY, payload, lockIdentifier, currentCount);
            }
        });
}

As you can see, there are five basic cases we need to deal with after we get the value of the lock key:

  1. If we got a null reply back from Redis, that means that something else changed the value of our key, and our exec was aborted; i.e. someone else got the lock and changed its value before we could do anything. We just treat it as a failure to acquire the lock and check back again later.
  2. If we get back a reply from Redis, but the value for the key is null, that means that the work was actually completed and the key was deleted before we could do anything. In this case there’s nothing for us to do at all, so we can stop right away.
  3. If we get back a value for the lock key that is equal to our sentinel value, then someone else completed the work, but it’s up to us to clean up the lock key, so we issue a Redis DEL and call our job done.
  4. Here’s where things get interesting: if the key still exists, and its value (the number of attempts that have been made) is equal to our last attempt count, then we should try and reacquire the lock.
  5. The last scenario is where the key exists but its value (again, the number of attempts that have been made) does not equal our last attempt count. In this case, someone else has already reacquired the lock. We treat this as a failure to acquire the lock and schedule a timeout to check back later to see how whoever did acquire the lock got on. The appropriate action here is debatable. Depending on how long your underlying work takes, it may be better to actually try and reacquire the lock here as well, since whoever acquired the lock may have already failed. This can, however, lead to premature exhaustion of your attempt allotment, so to be safe, we just wait.
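The five cases can also be read as a pure decision function, separate from any Redis calls. This is a sketch for illustration only: the function name, the returned labels, and the value of `DONE_VALUE` are assumptions, since the post does not show the real sentinel.

```javascript
// Hypothetical sentinel; the post does not show the real DONE_VALUE.
var DONE_VALUE = "done";

// replies: the array handed to the EXEC callback, or null if EXEC was aborted.
// lastCount: the attempt count we saw on our previous check.
function classifyLockState(replies, lastCount) {
    if (!replies) {
        return "retry-later";      // case 1: the watched key changed, EXEC aborted
    }
    var currentCount = replies[0];
    if (currentCount === null) {
        return "nothing-to-do";    // case 2: work completed, key already deleted
    }
    if (currentCount === DONE_VALUE) {
        return "delete-lock";      // case 3: work completed, we clean up the key
    }
    if (currentCount == lastCount) {
        return "reacquire";        // case 4: nobody has incremented the count
    }
    return "retry-later";          // case 5: someone else reacquired the lock
}
```

The loose `==` in case 4 mirrors checkLock: Redis hands the count back as a string, while the caller may be holding a number.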

So, we’ve checked on our lock, and, since the previous process with the lock failed to complete its work, it’s time to actually try and reacquire the lock. The process in this case is similar to the above inasmuch as we must use Redis transactions to manage the reacquisition process. Thankfully, however, the steps are (somewhat) simpler:

function reacquireLock(payload, lockIdentifier, attemptCount, callback) {
    client.watch(lockIdentifier);
    client.get(lockIdentifier, function(error, data) {
        if (!data) {
            // Lock is gone, someone else completed the work and deleted the lock, nothing to do here, stop watching and carry on
            client.unwatch();
            return;
        }

        var attempts = parseInt(data, 10) + 1;

        if (attempts > MAX_ATTEMPTS) {
            // Our allotment has been exceeded by another process, unwatch and expire the key
            client.unwatch();
            client.expire(lockIdentifier, ((LOCK_EXPIRY / 1000) * 2));
            return;
        }

        client.multi()
            .set(lockIdentifier, attempts)
            .exec(function(error, replies) {
                if (!replies) {
                    // The value changed out from under us, we didn't get the lock!
                    client.get(lockIdentifier, function(error, currentAttemptCount) {
                        setTimeout(checkLock, LOCK_EXPIRY, payload, lockIdentifier, currentAttemptCount);
                    });
                } else {
                    // Hooray, we acquired the lock!
                    callback(null, {
                        "acquired" : true,
                        "lockIdentifier" : lockIdentifier,
                        "payload" : payload
                    });
                }
            });
    });
}

As with checkLock, we start out by watching the lock key, and proceed to do a (comparatively) simplified check and set. In this case, we’ve “only” got three scenarios to deal with:

  1. If we’ve already exceeded our allotment of attempts, it’s time to give up. In this case, the allotment was actually exceeded in another worker, so we can just stop right away. We make sure to unwatch the key, and set it to expire at some point far enough in the future that any remaining processes attempting to acquire locks will also see that it’s time to give up.

Assuming we’re still good to keep working, we try and update the lock key within a MULTI/EXEC block, where we have our remaining two scenarios:

  2. If we get no replies back, that again means that something changed the value of the lock key during our transaction and the EXEC was aborted. Since we failed to acquire the lock we just check back later to see what happened to whoever did acquire the lock.
  3. The last scenario is the one in which we managed to acquire the lock. In this case we just go ahead and do our work and hopefully complete it!
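If the WATCH/MULTI/EXEC dance feels abstract, it can help to model it as a version check. The sketch below is an in-memory model under stated assumptions, not the Redis client API: `createVersionedStore`, `setIfUnchanged`, and `tryReacquire` are hypothetical names, `MAX_ATTEMPTS` is an arbitrary value, and the `interleave` hook exists only to simulate another process touching the key mid-transaction. It also leaves out the EXPIRE step from the give-up branch.

```javascript
// Hypothetical in-memory model of optimistic locking; not the Redis client API.
function createVersionedStore() {
    var values = {};
    var versions = {};

    function version(key) {
        return versions[key] || 0;
    }

    function get(key) {
        return Object.prototype.hasOwnProperty.call(values, key) ? values[key] : null;
    }

    function set(key, value) {
        values[key] = String(value);
        versions[key] = version(key) + 1;
    }

    // Like WATCH: remember the key's version at this moment.
    function watch(key) {
        return { "key" : key, "version" : version(key) };
    }

    // Like MULTI/SET/EXEC: write only if the key is untouched since watch().
    function setIfUnchanged(token, value) {
        if (version(token.key) !== token.version) {
            return null;    // aborted, mirroring the null reply from EXEC
        }
        set(token.key, value);
        return ["OK"];
    }

    return { "get" : get, "set" : set, "watch" : watch, "setIfUnchanged" : setIfUnchanged };
}

var MAX_ATTEMPTS = 5;   // hypothetical limit

// Returns "acquired", "lost-race", "gone", or "exhausted".
function tryReacquire(store, lockIdentifier, interleave) {
    var token = store.watch(lockIdentifier);
    var data = store.get(lockIdentifier);
    if (!data) {
        return "gone";          // lock deleted, the work is already done
    }
    var attempts = parseInt(data, 10) + 1;
    if (attempts > MAX_ATTEMPTS) {
        return "exhausted";     // allotment used up by another process
    }
    if (interleave) {
        interleave();           // simulate another process acting before our EXEC
    }
    return store.setIfUnchanged(token, attempts) ? "acquired" : "lost-race";
}
```

Calling `tryReacquire(store, "lock.a")` on a key holding "1" bumps it to "2" and reports "acquired"; passing an `interleave` callback that writes to the same key makes the same call report "lost-race", which is the aborted-EXEC scenario.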

Bonus!

To make managing global locks even easier, I’ve gone ahead and generalized all the code mentioned in both this and the previous post on the subject into a tidy little event based npm package: https://github.com/yahoo/redis-locking-worker. Here’s a quick snippet of how to implement global locks using this new package:

var RedisLockingWorker = require("redis-locking-worker");

var SUCCESS_CHANCE = 0.15;

var lock = new RedisLockingWorker({
    "lockKey" : "mylock",
    "statusLevel" : RedisLockingWorker.StatusLevels.Verbose,
    "lockTimeout" : 5000,
    "maxAttempts" : 5
});

lock.on("acquired", function(lastAttempt) {
    if (Math.random() <= SUCCESS_CHANCE) {
        console.log("Completed work successfully!", lastAttempt);
        lock.done(lastAttempt);
    } else {
        // oh no, we failed to do work!
        console.log("Failed to do work");
    }
});
lock.acquire();

There are also a few other events you can use to track the lock status:

lock.on("locked", function() {
    console.log("Did not acquire lock, someone beat us to it");
});

lock.on("error", function(error) {
    console.error("Error from lock: %j", error);
});

lock.on("status", function(message) {
    console.log("Status message from lock: %s", message);
});

More Bonus!

If you don’t need the added complexity of multiple backup processes, I also want to give credit to npm user pokehanai who took the methodology described in the original post and created a generalized version of the two-worker solution: https://npmjs.org/package/redis-paired-worker.

Wrapping Up

So there you have it! Coordinating work on any number of processes across any number of hosts couldn’t be easier! If you have any questions or comments on this, please feel free to follow up on Twitter.

Flickr flamily floto

Like this post? Have a love of online photography? Want to work with us? Flickr is hiring engineers, designers and product managers in our San Francisco office. Find out more at flickr.com/jobs.

Highly Available Real Time Push Notifications and You

One of the goals of our recently launched (and awesome!) new Flickr iPhone app was to further increase user engagement on Flickr. One of the best ways to drive engagement is to make sure Flickr users know what’s happening on Flickr in as near-real time as possible. We already have email notifications, but email is no longer a good mechanism for real-time updates. Users may have many email accounts and may not check in frequently, causing timeliness to go right out the window. Clearly this called for… PUSH NOTIFICATIONS!

Motor bike racer getting a push start at the track, Brisbane by State Library of Queensland, Australia

I know, you’re thinking, “anyone can build push notifications, we’ve been doing it since 2009!” Which is, of course, absolutely true. The process for delivering push notifications is well trod territory by this point. So… let’s just skip all that boring stuff and focus on how we decided on the underlying architecture for our implementation. Our decisions focused on four major factors:

  1. Impact to normal page serving times should be minimal
  2. Delivery should be in near-real time
  3. Handle thousands of notifications per second
  4. The underlying services should be highly available

Baby Steps

Given these goals, we started by looking at systems we already have in place. Everyone loves not writing new code, right? Our thoughts immediately went to Flickr’s existing PuSH infrastructure. Our PuSH implementation is a great way to get an overview of relevant activity on Flickr, but it has limitations that made it unsuitable for powering mobile push notifications. The primary concern is that it’s less near-real-time than we’d like it to be. On average, activities occurring on Flickr will be delivered to a subscribed PuSH endpoint within one minute. That’s certainly better than waiting for an email to arrive or waiting until the next time you log in to the site and see your activity feed, but it’s not good enough for mobile notifications! This delay is due to some design decisions at the core of the PuSH system. PuSH is designed to aggregate activity and deliver a periodic digest and, because of this, it has a built-in window to allow multiple changes to the same photo to be accumulated. PuSH is also focused on ensured delivery, so it maintains an up-to-date list of all subscribers. These features, which make PuSH great for the purpose for which it was designed, make it not-so-great for real-time notifications. So, repurposing the PuSH code for reuse in a more real-time fashion proved to be untenable.

Tentative Plans

So, what to do? In the end we wound up building a new lightweight event system that is broken up into three phases:

  1. Event Generation
  2. Event Targeting
  3. Message Delivery

Event Generation

The event generation phase happens while processing the response to a user request. As such, we wanted to ensure that there was little to no impact on the response times as a result. To ensure this was the case, all we do here is a lightweight write into a global Redis queue. We store the minimum amount of data possible, just a few identifiers, so we don’t have to make any extra DB calls and slow down the response just to (potentially) kick off a push notification. Everything after this initial Redis action is processed out of band by our deferred task system and has no impact on site performance.
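As a rough illustration of how little the request path needs to do, here is a sketch of the enqueue step. It is written in JavaScript only to match the other code in this post (the real step runs in our PHP stack), an array stands in for the Redis queue, and the function and field names are assumptions. Against a real Redis server the push would be a list command such as RPUSH.

```javascript
// In-memory array standing in for the global Redis queue (hypothetical).
var eventQueue = [];

// Store only identifiers; anything else is looked up later, out of band.
function enqueueEvent(type, actorId, objectId) {
    var event = JSON.stringify({
        "type" : type,
        "actor" : actorId,
        "object" : objectId
    });
    eventQueue.push(event);
    return event;
}
```

Because the payload is just a few identifiers, no extra database calls are needed before the response goes out.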

Event Targeting

Next in the process is the event targeting phase. Here we have many workers reading from the global Redis queue. When a worker receives an event from the queue it rehydrates the data and loads up any additional information necessary to act on the notification. This includes checking to see what users should be notified, whether those users have devices that are registered to receive notifications, if they’ve opted out of notifications of this type, and finally if they’ve muted activity for the object in question.
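The checks above amount to a filter over candidate recipients. The following is a sketch under assumptions, again in JavaScript for consistency with the rest of the post: the shape of the user records (`devices`, `optedOutTypes`, `mutedObjects`) is invented for illustration and says nothing about how Flickr actually stores this data.

```javascript
// users: hypothetical records, already rehydrated from wherever they live.
// Returns the ids of users who should receive a notification for this event.
function selectRecipients(event, users) {
    return users.filter(function(user) {
        return user.devices.length > 0 &&                       // has a registered device
            user.optedOutTypes.indexOf(event.type) === -1 &&    // has not opted out of this type
            user.mutedObjects.indexOf(event.object) === -1;     // has not muted this object
    }).map(function(user) {
        return user.id;
    });
}
```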

Message Delivery

Flickr’s web-serving stack is PHP, and, up until now, everything described has been processed by PHP. Unfortunately, one area where PHP does not excel is long-lived processes or network connections, both of which make delivering push notifications in real time much easier. Because of this we decided to build the final phase, message delivery, as a separate endpoint in Node.js.

So, the question arose: how do we get messages pending delivery from these PHP workers over to the Node.js endpoints that will actually deliver them? For this, we again turned to Redis, this time using its built in pub/sub functionality. The PHP workers simply publish a message to a Redis channel with the assumption that there’s a Node.js process subscribed to that channel eagerly awaiting some data on which it can act.

After that the Node process delivers the notification to Apple’s APNS push notification system. Communicating with APNS is a well-documented topic, and not one that’s particularly interesting. In fact, I can sum it up with a single link: https://github.com/argon/node-apn, a great npm package for talking to APNS.

The Real Challenge

There is, however, a much more interesting problem to discuss at this point: how do we ensure that delivery to APNS is both scalable and highly available? At first blush, this seems like it could be problematic. What if the Node.js worker has crashed? The message will just be lost to the ether! Solving this problem turned out to be the majority of the work involved in implementing push notifications.

Scalability

The first step to ensuring a service is scalable is to divide the workload. Since Node.js is single threaded, we would already be dividing the workload across individual Node.js processes anyway, so this works out well! When we publish messages to the Redis pub/sub channel, we simply publish to a sharded channel. Each Node.js process subscribes to some subset of those sharded channels, and so will only act on that subset of messages.
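Here is one way the sharded channel names could be derived. This is a sketch, not our production scheme: the shard count and the choice to shard on a numeric recipient id modulo that count are assumptions. Only the "notification_" prefix comes from the code in this post.

```javascript
var SHARD_COUNT = 8;   // hypothetical number of shards

// Publisher side: pick the channel for a message from the recipient's id.
function channelFor(userId) {
    return "notification_" + (userId % SHARD_COUNT);
}

// Subscriber side: the channel names a process listens on for its shards.
function channelsForShards(shards) {
    return shards.map(function(shard) {
        return "notification_" + shard;
    });
}
```

A process configured with shards 0 and 4 would subscribe to `channelsForShards([0, 4])` and never see traffic for the other six channels.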

APNS, Redis Pub/Sub

Configuring our Node.js processes in this way makes it easy to scale horizontally. Whenever we need to add more processing power to the cluster, we can just add more servers and more shards. This also makes it easy to pull hosts out of rotation for maintenance without impacting message delivery: we simply reconfigure the remaining processes to subscribe to additional channels to pick up the slack.
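To illustrate what picking up the slack looks like, here is a sketch that deals shards out to processes round-robin. The function name and the round-robin policy are assumptions, and for simplicity it gives each shard a single subscriber, whereas a highly available setup would want more than one.

```javascript
// Deal shard numbers out to the named processes in round-robin order.
function assignShards(shardCount, processes) {
    var assignment = {};
    processes.forEach(function(name) {
        assignment[name] = [];
    });
    for (var shard = 0; shard < shardCount; shard++) {
        assignment[processes[shard % processes.length]].push(shard);
    }
    return assignment;
}
```

With two processes and four shards each process gets two shards. Rerunning the assignment with one process removed hands all four shards to the survivor, which is the reconfiguration step described above.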

Availability

Designing for high availability proved to be somewhat more challenging. We needed to ensure that we could lose individual Node processes, a whole server or even an entire data center without degrading our ability to deliver messages. And we wanted to avoid the need for a human in the loop — automatic failover.

We already knew that we’d have multiple hosts running in multiple data centers, so the main question was how to get them coordinating with each other so that we would not lose messages in the event of an outage while also ensuring we would not deliver the same message multiple times. Our first thought experiment along these lines was to implement a relatively complex message passing scheme, where two hosts would subscribe to a given channel, one as the primary and one as the backup. The primary would pass a message to the backup saying that it was starting to process a message, and another when it completed. The backup would wait a certain amount of time to receive the first and then the second message from the primary. If a message failed to arrive, it would assume something had gone wrong with the primary and attempt to complete delivery to Apple’s push notification gateway.

Initial Failover Plan

This plan had two major problems: hosts had to be aware of each other and increasing the number of hosts working in conjunction raised the complexity of ensuring reliable delivery.

We liked the idea of having one host serve as a backup for another, but we didn’t like having to coordinate the interaction between so many moving pieces. To solve this issue we went with a convention based approach. Instead of each host having to maintain a list of its partners, we just use Redis to maintain a global lock. Easy enough, right? Perhaps some code is in order!

Finally, some code!

First we create our Redis clients. We need one client for regular Redis commands we use to maintain the lock, and a separate client for Redis pub/sub commands.

var redis = require("redis");
var client = redis.createClient(config.port, config.host);
var pubsubClient = redis.createClient(config.port, config.host);

Next, subscribe to the sharded channel and set up a message handler:

// We could be subscribing to multiple shards, but for the sake of simplicity we’ll just subscribe to one here
pubsubClient.subscribe("notification_" + shard);
pubsubClient.on("message", handleMessage);

Now, the interesting part. We have multiple Node.js processes subscribed to the same Redis pub/sub channel, and each process is in a different data center. Whenever any of them receive a message, they attempt to acquire a lock for that message:

function handleMessage(channel, message) {
    // Error handling elided for brevity
    var payload = JSON.parse(message);

    acquireLock(payload, 1, lockCallback);
}

Managing locks with Redis is made easy using the SETNX command. SETNX is a “set if not exists” primitive. From the Redis docs:

Set key to hold string value if key does not exist. In that case, it is equal to SET. When key already holds a value, no operation is performed.

If we have multiple processes calling SETNX on the same key, the command will only succeed for the process that first makes the call, and in that case the response from Redis will be 1. For subsequent SETNX commands, the key will already exist, and the response from Redis will be 0. The value we try to set with SETNX keeps track of how many attempts have been made to deliver the message and is initially set to one. This allows us to retry failed messages a predefined number of times before giving up entirely.

function acquireLock(payload, attempt, callback) {
    var lockIdentifier = "lock." + payload.identifier;

    function dataForCallback(acquired) {
        return {
            "acquired" : acquired,
            "lockIdentifier" : lockIdentifier,
            "payload" : payload,
            "attempt" : attempt
        };
    }

    // The value of the lock key indicates how many lock attempts have been made
    client.setnx(lockIdentifier, attempt, function(error, data) {
        if (error) {
            logger.error("Error trying to acquire redis lock for: %s", lockIdentifier);
            return callback(error, dataForCallback(false));
        }

        return callback(null, dataForCallback(data === 1));
    });
}

At this point our attempt to acquire the lock has either succeeded or failed, and our callback is invoked. What we do next depends on whether we managed to acquire the lock. If we did acquire the lock, we simply attempt to send the message. If we did not acquire the lock, then we will check back later to see if the message was sent successfully (more on this later):

function lockCallback(error, data) {
    // Again, error handling elided for brevity
    if (data && data.acquired) {
        return sendMessage(data.payload, data.lockIdentifier, data.attempt === MAX_ATTEMPTS);
    } else if (data && !data.acquired) {
        return setTimeout(checkLock, LOCK_EXPIRY, data.payload, data.lockIdentifier);
    }
}

Finally, it’s time to actually send the message! We do some work to process the payload into a form we can use to pass to APNS and send it off. If all goes well, we do one of two things:

  1. If this was our first attempt to send the message, we update the lock key in Redis to a sentinel value indicating we were successful. This is the value the backup processes will check for to determine whether or not sending succeeded.
  2. If this was our last attempt to send the message (i.e. the primary process failed to deliver and now a backup process is handling delivery), we simply delete the lock key.

function sendMessage(payload, lockIdentifier, lastAttempt) {
    // Does some work to process the payload and generate an APNS notification object
    var notification = generateApnsNotification(payload);

    if (notification) {
        // The APNS connection is defined/initialized elsewhere
        apnsConnection.sendNotification(notification);

        if (lastAttempt) {
            client.del(lockIdentifier);
        } else {
            client.set(lockIdentifier, DONE_VALUE);
        }
    }
}

There’s one final piece of the puzzle: checking the lock in the process that did not acquire it initially. Here we issue a Redis GET to retrieve the current value of the lock key. If the process that won the lock managed to send the message, this key should be set to a well known sentinel value. If so, we don’t have any work to do, and we can simply delete the lock. However, if this value is not set to that sentinel value, then something went wrong with delivery in the process that originally acquired the lock and we should step up and try to deliver the message from this backup process:

function checkLock(payload, lockIdentifier) {
    client.get(lockIdentifier, function(error, data) {
        // Error handling elided for brevity
        if (data !== DONE_VALUE) {
            // Redis returns the count as a string, so parse it before incrementing
            acquireLock(payload, parseInt(data, 10) + 1, lockCallback);
        } else {
            client.del(lockIdentifier);
        }
    });
}

Summing Up

So, there you have it in a nutshell. This method of coordinating between processes makes it very easy to adjust the number of processes subscribing to a given shard’s channels. There’s no need for any process subscribed to a channel to be aware of how many other processes are also subscribed. As long as we have at least two processes in separate data centers subscribing to each shard we are protected from all of the following scenarios:

  • The crash of any individual Node.js process
  • The loss of a single host running the Node.js processes
  • The loss of an entire data center containing many hosts running the Node.js processes

Let’s go back over our initial goals and see how we fared:

  1. Impact to normal page serving times should be minimal

We accomplish this by minimizing the workload done as part of the normal browser-driven request/response processing. The deferred task system picks up from there, out of band.

  2. Delivery should be in near-real time

Processing stats from our implementation show that time from user actions leading to event generation to message delivery averages about 400ms and is completely event driven (no polling).

  3. Handle thousands of notifications per second

In stress tests of our system, we were able to process more than 2,000 notifications per second on a single host (8 Node.js workers, each subscribing to multiple shards).

  4. The underlying services should be highly available

The availability design is resilient to a variety of failure scenarios, and failover is automatic.

We hope you’re enjoying push notifications in the new Flickr iPhone app.

Addendum!

There was a minor problem with the code in this post when supporting more than two workers. For a full explanation of the problem and the solution, check out Redis Global Locks Redux.


Designing an OSM Map Style

With the recent change to our map system, we introduced a new map style for our OSM tiles. Since 2008, we’ve used the default OSM styles, which produce map tiles like this:

This style is extremely good at putting a lot of information in front of you. OSM doesn’t know your intended purpose for the maps (navigation, orientation, exploration, city planning, disaster response, etc.), so they err on the side of lots of information. This is good, but with the introduction of TileMill, non-professional cartographers (like myself) can now easily change map styles to better suit our needs. Using TileMill, we decided to take a crack at designing a map that is better suited to Flickr.

On Flickr, we use maps for a very specific purpose: to provide context for a photo. This means there are a lot of map features that we can leave out entirely. We can choose to hide features that are primarily used for navigation (ferry and train routes, bus stops) or for demarcation (city and county boundaries). Roads are useful as orientation tools, but certain road features (like exit numbers on highways) aren’t needed. In the end, we can reduce the data that the map shows to a much smaller and more useful subset:

This is the style provided by MapBox’s excellent OSM Bright. As a starting point, this gets us a long way towards our goal of an unobtrusive yet still useful map. We made a few changes to OSM Bright and released them on GitHub as our Pandonia map style. Here are a few examples of the changes we made:

  • Toned down the road, land, and water colors, to allow greater contrast with the pink and blue dots that we use as markers
  • Reduced the density of road and highway names, as well as city, town and state names
  • Removed underground tram and rail lines
  • Removed land use overlays for residential, commercial, and industrial zones, as well as parking lots
  • Removed state park overlays that overlapped the water

This is how it looks:

We tried a lot of different color combinations on the road to this style. Here is an animation of the different styles we tried, starting with OSM Bright.

Here it is zoomed in a bit more:

Over the next couple of weeks, we’ll be rolling out this style to all of the places where we use OSM tiles.

These maps are still a work in progress. The world is a big place, and creating a unified style that works well for every single location is challenging. If you notice problems with our new map styles, please let us know!

The great map update of 2012

Today we are announcing an update to the map tiles which we use site-wide. The vast majority of the globe will be represented by Nokia’s clever-looking tiles.

Nokia map tile

We are not stopping there. As some of you may know, Flickr has been using OpenStreetMap (OSM) data to make map tiles for some places. We started with Beijing and the list has grown to twenty-one additional places:

Mogadishu
Cairo
Algiers
Kiev
Tokyo
Tehran
Hanoi
Ho Chi Minh City
Manila
Davao
Cebu
Baghdad
Kabul
Accra
Hispaniola
Havana
Kinshasa
Harare
Nairobi
Buenos Aires
Santiago

It has been a while since we last updated our OSM tiles. Since 2009, the OSM community has advanced quite a bit in the tools they provide and data quality. I went into a little detail about this in a talk I gave last year.

Introducing Pandonia

Nokia map tile

Today we are launching Buenos Aires and Santiago in a new style. We will be launching more cities in this new style in the near future. They are built from more recent OSM data and they will also have an entirely new style which we call Pandonia. Our new style was designed in TileMill from the osm-bright template, both created by the rad team at MapBox. TileMill changes the game when it comes to styling map tiles. The interface is developed to let you quickly iterate style changes to tiles and see the changes immediately. Ross Harmes will be writing a more detailed account of the work he did to create the Pandonia style. We appreciate the tips and guidance from Eric Gundersen, Tom MacWright, and the rest of the team at MapBox.

We are looking forward to updating all of our OSM places with the Pandonia style in the near future and growing to more places after that… Antarctica? Null Island? The Moon? Stay tuned and see…

Changing our JavaScript API

To host all of these new tiles we needed to find a flexible JavaScript API. CloudMade’s Leaflet is a simple, open-source JavaScript library for displaying map tiles. The events and methods map well to our previous JS API, which made upgrading simple for us. All of our existing map interfaces will stay the same with the addition of modern map tiles. They will also support touch screen devices better than ever. Leaflet’s layers mechanism will make it easier for us to blend different tile sources together seamlessly. We have a fork on GitHub which we plan to contribute to as time goes on. We’ll keep you posted.

Group APIs

With over 1.5 million groups, there’s no doubt that they are an important part of Flickr. Today, we’re releasing a few new ways to interact with groups using our API.

Group Membership

Cat meeting...

We are adding two new methods to manage group membership through the API.

flickr.groups.join to join a group. Before calling this method, check if the group has rules using flickr.groups.getInfo. The user needs to agree to the rules before being able to join the group. Pass the accept_rules argument if the user accepted the rules.

flickr.groups.leave to leave a group. The user’s photos can also be deleted when leaving the group by passing the delete_photos argument.

Group Discussions

shut UP WALTON

We are also opening up group discussions in the API. You can now fetch a list of discussion topics for a group using flickr.groups.discuss.topics.getList, with sticky topics first, then regular topics sorted from newest to oldest.

<rsp stat="ok">
    <topics group_id="46744914@N00" iconserver="1" iconfarm="1" name="Tell a story in 5 frames (Visual story telling)" members="12428" privacy="3" lang="en-us" ispoolmoderated="1" total="4621" page="1" per_page="2" pages="2310">
        <topic id="72157625038324579" subject="A long time ago in a galaxy far, far away..." author="53930889@N04" authorname="Smallportfolio_jm08" role="member" iconserver="5169" iconfarm="6" count_replies="8" can_edit="0" can_delete="0" can_reply="0" is_sticky="0" is_locked="" datecreate="1287070965" datelastpost="1336905518">
            <message> ... </message>
        </topic>
    </topics>
</rsp>

flickr.groups.discuss.topics.add to post a new topic to a group, passing a subject and the message content.

Additionally, you can fetch a list of replies for a topic using flickr.groups.discuss.replies.getList, which includes the information for the topic along with all the replies, sorted from oldest to newest.

<rsp stat="ok">
    <replies>
        <topic topic_id="72157625038324579" subject="A long time ago in a galaxy far, far away..." group_id="46744914@N00" iconserver="1" iconfarm="1" name="Tell a story in 5 frames (Visual story telling)" author="53930889@N04" authorname="Smallportfolio_jm08" role="member" author_iconserver="5169" author_iconfarm="6" can_edit="0" can_delete="0" can_reply="0" is_sticky="0" is_locked="" datecreate="1287070965" datelastpost="1336905518" total="8" page="1" per_page="3" pages="2">
            <message> ... </message>
        </topic>
        <reply id="72157625163054214" author="41380738@N05" authorname="BlueRidgeKitties" role="member" iconserver="2459" iconfarm="3" can_edit="0" can_delete="0" datecreate="1287071539" lastedit="0">
            <message> ... </message>
        </reply>
    </replies>
</rsp>

flickr.groups.discuss.replies.add to post a reply to a topic, passing the message content.

flickr.groups.discuss.replies.edit to edit a reply, passing the updated message.

flickr.groups.discuss.replies.delete to delete a reply.

You can only edit and delete replies when authorized as the owner of the reply. For now, it is not possible to edit or delete a topic through the API.

If you have any questions, comments, concerns, or just want to chat about these methods or anything else related to the API, please join the Flickr Developer mailing list.

Photos from fofurasfelinas and larissa_allen.

Liquid Photo Page Layout

The Flickr photo page has gone through several revisions over the years. It was initially designed for 800×600 pixel displays, with a 500 pixel wide photo and a 250 pixel wide sidebar.


The 500×375 photo takes up 9.1% of the 1905×1079 pixels available in my viewport

By 2010, display resolutions had increased significantly, and 1024×768 became the new standard for our smallest supported resolution. We launched a re-designed photo page, designed for a width of 960. It featured a 640 pixel wide photo and a sidebar of 300 pixels.


The 640×480 photo takes up 14.9% of the 1905×1079 pixels available in my viewport

Since then the number of different display resolutions has increased and larger sizes have become more popular, but the number of users still on 1024×768 displays has made it hard to increase the width of the page beyond 960. We realized that we would always have to support smaller monitors, but that there was no reason not to give bigger photos to those with larger monitors. The recent launch of the 800, 1600, and 2048 photo sizes gave us a lot of different options for showing big, beautiful photos to members, and we wanted to take advantage of that. Starting today, we will display the biggest photo that we can on the photo page for your monitor.


The 1213×910 photo takes up 53.7% of the 1905×1079 pixels available in my viewport

Algorithmic

As you use the new liquid photo page, you may notice that the page content doesn’t always fill the entire viewport. This is because we created an algorithm for taking the width and height into account that will display content at a width that will best showcase the most common photo ratio, 4:3. Here are the goals of that algorithm:

  1. Show the biggest photo the window allows
  2. Ensure the title and the sidebar are visible
  3. Keep the width of the page consistent across all photo pages, regardless of the individual photo dimensions
  4. Whenever possible, prefer native dimensions of a photo size (i.e., resist downsampling and never upsample)

Going Big

Big photos are really compelling. We knew from using the Flickr Light Box that our members’ photos look amazing at full screen, and we wanted to give the same experience on the photo page. This part of the algorithm was easy; as soon as the page starts loading, we read the innerWidth and innerHeight of the viewport (or the browser’s equivalent), and then go through the photo sizes that the photo owner allows us to display to find the best fit. If the photo is a little too big for the space we have to work with, we scale it down in the browser.
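The size-selection step might look roughly like the sketch below. The size list, the labels and the pickPhotoSize name are illustrative assumptions, not Flickr’s actual code; in a browser the available width and height would be derived from window.innerWidth and window.innerHeight.

```javascript
// Hypothetical list of sizes the photo owner allows (4:3 photo assumed).
const sizes = [
  { label: "Medium 640", width: 640, height: 480 },
  { label: "Medium 800", width: 800, height: 600 },
  { label: "Large 1024", width: 1024, height: 768 },
  { label: "Large 1600", width: 1600, height: 1200 }
];

// Pick the smallest size that covers the available box in at least one
// dimension, then scale it down (never up) so it fits inside the box.
function pickPhotoSize(sizes, availWidth, availHeight) {
  const sorted = sizes.slice().sort((a, b) => a.width - b.width);
  const chosen =
    sorted.find((s) => s.width >= availWidth || s.height >= availHeight) ||
    sorted[sorted.length - 1]; // nothing is big enough: show the largest as-is
  const scale = Math.min(1, availWidth / chosen.width, availHeight / chosen.height);
  return {
    label: chosen.label,
    width: Math.round(chosen.width * scale),
    height: Math.round(chosen.height * scale)
  };
}
```

With these assumed sizes, a 1213 by 1000 pixel box picks the 1600 size and scales it down to 1213×910 in the browser.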

Providing Context

As great as a giant photo is, a photo is more than just its pixels. The context and story around a photo are just as important. Imagine a photo of a tiger; it’s impressive in its own right, but throw in a map showing that the tiger is in a public park, and a title stating, “A Tiger Escaped From the Zoo!” and then you really have something.

We decided that the title and the sidebar are important enough to make it worth showing a slightly smaller photo on the page. We adjusted the algorithm to take into account the width of the sidebar and its gutter (335 pixels) and the height of the first line of the title (45 pixels) when calculating how much available space there is for a photo.

Site Consistency

So far, so good. However, as we used the liquid photo page we noticed that it had one fatal flaw: Since the algorithm uses the dimensions of the photo that you are viewing to adjust the page width, it changes from photo to photo. This means that if you’re browsing through some photos, the elements of the page are moving around from page to page. This is especially problematic with the header and the Next / Previous buttons; it’s incredibly difficult to navigate around if you always have to hunt around to find them first.

To fix this problem, we decided to make the algorithm ignore the dimensions of the currently displayed photo when calculating page width, and instead to always use the dimensions of an imaginary 4:3 photo. This means that the page width will always be the same for any given combination of viewport width and viewport height, and that the UI elements will be in the same places for each page. The downsides of this are that photos that aren’t 4:3 will have more whitespace around them and even potentially be cut off by the bottom of the page, forcing the viewer to scroll. Using a consistent width is definitely the lesser of the two evils, though. The current photo page has the same problem with photos that are taller than they are wide being below the fold, and we’ve been happily viewing them for years.
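A minimal sketch of that idea, assuming the only space reserved is the 335 pixel sidebar and gutter and the 45 pixel title line mentioned earlier. The real page also has to account for other chrome such as the site header, which this ignores, and the function name is hypothetical.

```javascript
const SIDEBAR_AND_GUTTER = 335; // pixels, from the post
const TITLE_HEIGHT = 45;        // pixels, first line of the title, from the post

// The page width depends only on the viewport, never on the current photo:
// we size the page for an imaginary 4:3 photo that fills the available box.
function pageWidthFor(viewportWidth, viewportHeight) {
  const availWidth = viewportWidth - SIDEBAR_AND_GUTTER;
  const availHeight = viewportHeight - TITLE_HEIGHT;
  // Widest 4:3 box that fits both the available width and height.
  const photoWidth = Math.floor(Math.min(availWidth, (availHeight * 4) / 3));
  return { photoWidth: photoWidth, pageWidth: photoWidth + SIDEBAR_AND_GUTTER };
}
```

Because the current photo’s dimensions are not an input, two photo pages viewed in the same window always get the same width.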

Going Native

These days, browsers do a pretty good job scaling a photo down. By default, most browsers err on the side of quality rather than speed, so the resulting photo should look good regardless of the size at which it is displayed. That being said, if we ever downsample a photo, then we are downloading more pixels than we need and throwing them away. This isn’t good for performance.

We adjusted the algorithm to favor native sizes, even if that means a slightly smaller photo is shown. We coded in detents, so that if a photo size is within 60 pixels of a native size, we will just use that size instead of downsampling a larger one. This means the page loads faster and that most common monitor resolutions will see photos at the native size, as this table illustrates (percentage use data from StatCounter):

Resolution     Use %     Page width   Image size    Image width   Efficiency
1366 x 768     19.28%    975px        Medium 640    640px         100.0%
1024 x 768     18.60%    975px        Medium 640    640px         100.0%
1280 x 800     12.95%    1044px       Medium 800    709px         88.6%
1280 x 1024    7.48%     1216px       Large 1024    881px         86.0%
1440 x 900     6.60%     1135px       Medium 800    800px         100.0%
1920 x 1080    5.09%     1359px       Large 1024    1024px        100.0%
1600 x 900     3.83%     1135px       Medium 800    800px         100.0%
1680 x 1050    3.63%     1359px       Large 1024    1024px        100.0%
1360 x 768     2.32%     975px        Medium 640    640px         100.0%
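The detent idea can be sketched as follows. The list of native widths and the exact snapping rule (snap down when the target is at most 60 pixels wider than a native size) are assumptions made for illustration; the post only states the 60 pixel tolerance.

```javascript
const NATIVE_WIDTHS = [640, 800, 1024, 1600, 2048]; // assumed, sorted ascending
const DETENT = 60; // pixels, from the post

// Given the width we would ideally display, prefer a native width when we
// are within the detent of one; otherwise keep the target and downsample.
function snapToNativeWidth(targetWidth) {
  const largest = NATIVE_WIDTHS[NATIVE_WIDTHS.length - 1];
  if (targetWidth >= largest) return largest; // never upsample
  // Largest native width that does not exceed the target, if any.
  const below = NATIVE_WIDTHS.filter((w) => w <= targetWidth).pop();
  if (below !== undefined && targetWidth - below <= DETENT) {
    return below; // close enough: show the native size untouched
  }
  return targetWidth; // too far away: downsample the next larger size
}
```

Under these assumptions a 690 pixel target snaps to the native 640, while a 709 pixel target is 69 pixels past 640 and stays at 709, which is consistent with the 1280 x 800 row in the table.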

Titles Are for Squares, Man

Square photos are an interesting loophole in the way we size photos. Because we’re targeting an imaginary 4:3 photo, square photos will be displayed with more actual pixels than any other size, taking up the full width and height allotted. While browsing the site we noticed this, as well as the fact that the title is never visible. In order to bring the overall pixel count more in line with landscape and portrait photos, we reduce the size of square photos a bit more than the others. This helps ensure that the titles are always visible as well.

Making it Fast

Now that the algorithm is complete, we need to work on the performance. We noticed that reading the viewport dimensions and resizing the page every single time you go to a photo is unnecessary and distracting (since the page loads with a width of 960 and must be adjusted after the JavaScript loads on the page). To fix this, we cache the viewport dimensions in a cookie that can be read by the PHP code that generates the page. The first time you go to a liquid photo page, we have no choice but to adjust the page width on the fly. But every other photo page you visit will have the dimensions stored from the last page, and the page will be rendered with the correct width from the start.
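Here is a rough sketch of the cookie round trip. The cookie name and the "WIDTHxHEIGHT" value format are hypothetical, and the parsing is shown in JavaScript purely for illustration; on the real site the reading side is the PHP code that generates the page.

```javascript
const COOKIE_NAME = "liquid_vp"; // hypothetical cookie name

// Client side: build the cookie string. In a browser this would be used as
// document.cookie = serializeViewport(window.innerWidth, window.innerHeight);
function serializeViewport(width, height) {
  return COOKIE_NAME + "=" + width + "x" + height + "; path=/";
}

// Reading side: pull the dimensions back out of a Cookie header value.
// Returns null when the cookie is missing or malformed, in which case the
// page falls back to adjusting its width on the fly.
function parseViewport(cookieHeader) {
  const pairs = (cookieHeader || "").split(";");
  for (const pair of pairs) {
    const [name, value] = pair.trim().split("=");
    if (name === COOKIE_NAME) {
      const match = /^(\d+)x(\d+)$/.exec(value || "");
      if (match) return { width: Number(match[1]), height: Number(match[2]) };
    }
  }
  return null;
}
```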

More to Come

We have a lot more changes in store for this year. Stay tuned!

Building The Flickr Web Uploadr: The Grid

The new Flickr Web Uploadr is the result of a good amount of prototyping, research and good old-fashioned testing across the team that built it. This article goes into some of the details behind the “grid” – the area where photo thumbnails are shown – and sheds a little light on some of the thinking and logic behind the scenes. It’s a little lengthy, but don’t worry, there are pictures!

In April 2012, Flickr started rolling out its new web-based upload UI to the masses. We’re stoked to see it out there, and user feedback has been overwhelmingly positive. The product is an ongoing work in progress and enhancements are still being added, but the core is quite well-established and the experience is a significant upgrade over the one provided by the previous web-based uploadr.

Flickr Web Uploader UI (2012)
The new Flickr Web Uploadr. It’s powerful, it’s got a dark background, and it’s fast.

The new uploadr has also simply been fun to work on; there are numerous interesting challenges in terms of UI, interactions, performance and sheer scale on the front-end that we had to feel confident in tackling before we were able to commit to moving forward with the project.

Building The Grid: Prototypes

Initial discussions about the new Flickr uploadr weren’t too detailed, because I think everyone already had a pretty good idea of what we wanted to see in a browser: Something more desktop-like, feature-wise (like our older XUL-based Flickr Uploadr application) that would load and show photo thumbnails in a grid arrangement, with a desktop-like selection and batch editing model.

The next step was to start building a prototype in plain old HTML, CSS and JavaScript, and then figure out how many photos we could potentially get into the thing before it broke down. Could the grid handle selection and editing of 1,000 items? 10,000 items? I was cautiously optimistic. A running joke I had with the team was that I had built this before, in 2005: The project was an adventurous redesign of Yahoo! Photos, and joking aside, it actually did share a lot of design and interaction elements in common with what we were about to build. In 2005, we were targeting IE 6 and Firefox 1.5, so the landscape has changed a lot in terms of support and performance. Seven years later, it was fun to review some of the lessons and fun bits from the Y! Photos redesign as applicable to Flickr.

Prototype: Fluid Grid Layout

Some of the first prototypes involved building a grid layout, forming a two-column page that would be fluid to the browser width. We wanted to guarantee at least three photos per row would show in the grid, so the thumbnails could scale themselves relative to the browser size in order to fit in the space – easily done via CSS’ min-width and max-width properties.


A very early version of the uploadr UI.

The earliest prototypes simply populated the DOM with a few hundred copies of a cloned photo item “template”, to give the idea of what a busy UI might look like. It was mostly just HTML and CSS at this point.

With the grid rendering in fluid form as a series of inline-block <li> elements, the next thing to start was the selection model.

Selection and Drag Events

Building a desktop-like selection and drag-and-drop model can be a technical challenge, given the underlying complexity. As anyone who’s built one of these will understand, there are a whole ton of interactions one must consider and account for between event monitoring, coordinate tracking, drag-to-select vs. rearrange intents, event cancellation, handling of invalid actions and so on.

Selection

In general, all user interactions start with watching mousedown() events inside the grid area. If mousedown() fires within “whitespace”, any existing selection is reset and mousemove() events are then used to draw a selection marquee which compares coordinates to the grid, highlighting items based on basic region intersection logic. (For example, with a helper like xyToRowCol(), points can be checked to see what grid row/column they fall within, and thus “from/to” ranges can be established for a given marquee box.) Once a mouseup() event fires, selection can be completed and the mousemove() and mouseup() handlers released.
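As a rough illustration of the intersection logic, here is a sketch that assumes a fixed-size grid with coordinates already relative to the grid’s top-left corner. Only the xyToRowCol() name comes from the text above; the grid object and itemsInMarquee() are hypothetical, and the real grid is fluid.

```javascript
// Hypothetical grid geometry, in pixels.
const grid = { cellWidth: 200, cellHeight: 200, columns: 4, rows: 10 };

function clamp(n, min, max) {
  return Math.max(min, Math.min(max, n));
}

// Map a point to the row/column of the grid cell it falls within.
function xyToRowCol(x, y, g) {
  return {
    row: clamp(Math.floor(y / g.cellHeight), 0, g.rows - 1),
    col: clamp(Math.floor(x / g.cellWidth), 0, g.columns - 1)
  };
}

// Indexes of all items touched by a marquee dragged between two points,
// regardless of the direction the user dragged in.
function itemsInMarquee(x1, y1, x2, y2, g) {
  const from = xyToRowCol(Math.min(x1, x2), Math.min(y1, y2), g);
  const to = xyToRowCol(Math.max(x1, x2), Math.max(y1, y2), g);
  const hits = [];
  for (let row = from.row; row <= to.row; row++) {
    for (let col = from.col; col <= to.col; col++) {
      hits.push(row * g.columns + col);
    }
  }
  return hits;
}
```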


Testing the selection UI at various grid sizes.

The above marquee drawing and intersection logic is not terribly fancy, but things start to get interesting when you throw in additional positioning considerations like vertical offset from window scrolling (and drag-initiated window scrolling), browser window resizing affecting layout, positioning of the marquee UI vs. coordinates of the underlying grid items and so forth. Keyboard modifiers can also affect selection mode – whether selection is exclusive, additive or toggle-based – so an intersection does not always mean “select this item”.

Flickr Web Upload UI: Selection Screenshot
Marquee selection mode in action.

Dragging + Rearrange

When mousedown() fires on an unselected grid item, selection can immediately change to only that item (unless selection mode is additive or toggle-based via a modifier key.) If firing on an already-selected item, mousemove() is watched for a “threshold” of perhaps 4+ pixels of movement from the original coordinates, at which point “dragging” becomes active.
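The drag threshold check is tiny, but worth spelling out. This sketch measures straight-line distance from the mousedown point, which is an assumption; the text above only says “perhaps 4+ pixels of movement”.

```javascript
const DRAG_THRESHOLD = 4; // pixels, per the description above

// start and current are { x, y } mouse coordinates.
function hasStartedDragging(start, current) {
  const distance = Math.hypot(current.x - start.x, current.y - start.y);
  return distance >= DRAG_THRESHOLD;
}
```

A small threshold like this keeps an ordinary click with a slightly shaky hand from being mistaken for a drag.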

Once dragging has begun, the selected grid DOM elements are marked with a “disabled” CSS class, greying them out somewhat to indicate drag state, and mousemove() now moves around a cursor trailer that shows the count of items being rearranged.

Rearrange mode, once entered, is similar to the marquee selection mode except that now only a single mouse coordinate is checked in order to determine what row and column is the current “target” for rearrange – that is, what position the user intends to drag the selected photo(s) to. The logic here can get interesting in edge cases, because the user is able to insert both “before” and “after” a given target point based on whether the cursor is on the left side, or the right side of the target.

In terms of the UI, the current drag target simply has an “insert-before” or “insert-after” CSS class appended to it which results in the appropriate “insert point” marker (a CSS border) being applied to it.
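Deciding between the two insert markers could be sketched like this, again assuming a fixed-size grid with grid-relative coordinates. The rearrangeTarget name and grid shape are hypothetical; the class names are the ones mentioned above.

```javascript
// g is a hypothetical grid description: { cellWidth, cellHeight, columns }.
function rearrangeTarget(x, y, g) {
  const col = Math.max(0, Math.min(g.columns - 1, Math.floor(x / g.cellWidth)));
  const row = Math.max(0, Math.floor(y / g.cellHeight));
  // How far into the target cell the cursor is, horizontally.
  const offsetInCell = x - col * g.cellWidth;
  return {
    index: row * g.columns + col,
    // Left half of the cell inserts before the target, right half after it.
    className: offsetInCell < g.cellWidth / 2 ? "insert-before" : "insert-after"
  };
}
```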

Flickr Web Upload UI: Rearrange
Rearrange mode in action.

Once mouseup() fires on a valid rearrange target, the actual rearrange action is applied to both the UI and data model. The underlying JavaScript re-appends the dragged DOM nodes next to their new target sibling node and then splices the photo item array, matching the order of the array to the new layout shown in the UI.
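The data-model half of that step might look like the following sketch. It returns a new array rather than splicing in place, and the function name and arguments are hypothetical; the matching DOM update is left out.

```javascript
// Move the items at selectedIndexes so they sit immediately before or after
// the item at targetIndex. position is "insert-before" or "insert-after".
function rearrange(items, selectedIndexes, targetIndex, position) {
  const selected = new Set(selectedIndexes);
  // Dropping onto one of the dragged items is treated as a no-op.
  if (selected.has(targetIndex)) return items.slice();

  const moving = items.filter((_, i) => selected.has(i));
  const staying = items.filter((_, i) => !selected.has(i));

  // Position of the target among the items that are not moving.
  let insertAt = 0;
  for (let i = 0; i < targetIndex; i++) {
    if (!selected.has(i)) insertAt++;
  }
  if (position === "insert-after") insertAt++;

  staying.splice(insertAt, 0, ...moving);
  return staying;
}
```

The dragged items keep their relative order, so dragging "a" and "b" after "d" in a, b, c, d, e gives c, d, a, b, e.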

Additional Selection Interactions

A few other use cases to consider: Clicking an item, then shift + clicking another should have the effect of setting an “anchor point”, and selecting a range of items from X-Y within the grid. The user should be able, once setting an anchor point, to “pivot” from that point by clicking while continuing to hold the shift key. (Put another way, holding shift should not set the anchor point when clicking.)

By holding CTRL (or the Command/Apple key on OS X), selection should be additive and toggle-based. My approach to this meant taking a “snapshot” of the selection when marquee drawing begins, and then applying the logic based on mouse coordinates and keypresses with each draw action. This way, you can draw a marquee over and out of an existing selection, causing it to “toggle” and reset accordingly without losing your original state. A new snapshot is only taken once the selection is finalized at mouseup() time.
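The snapshot approach can be sketched in a few lines. The function and mode names here are hypothetical; the point is that every marquee redraw is computed from the untouched snapshot rather than from the previous redraw.

```javascript
// snapshot: Set of item indexes that were selected when the marquee began.
// marqueeHits: array of indexes currently under the marquee.
// mode: "exclusive" (no modifier key) or "toggle" (CTRL / Command held).
function applyMarquee(snapshot, marqueeHits, mode) {
  if (mode === "exclusive") return new Set(marqueeHits);
  const result = new Set(snapshot);
  for (const index of marqueeHits) {
    if (snapshot.has(index)) {
      result.delete(index); // was selected: toggle off
    } else {
      result.add(index); // was not selected: toggle on
    }
  }
  return result;
}
```

Because the snapshot itself is never modified, dragging the marquee back out of an item restores that item’s original state.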

Demo video: Uploadr Prototype UI

Here is a screencast of a very early version of the Uploadr grid UI, showing the basics of mouse-based selection interactions, scrolling and resizing. By this time, selection events were also firing and updating the “editr panel” area as well.

Enter The Keyboard

With mouse events working, additional consideration was given to keyboard shortcuts. We intended to have a UI that supported most if not all of the same selection, editing and rearrange actions that could be achieved via the mouse. An important part to making this work involved watching focus inside the grid, tracking the last-known selected item, and supporting the use of the arrow keys as a means of changing focus between grid items.

Focus-based navigation in the grid is interesting, more akin to mouse movement and hover behaviour. It is intentionally separate from keyboard-based selection (which is invoked with a toggle behaviour via the spacebar, or selection and editing of a single item via the return key.) Using this approach, it is relatively easy to navigate and build up a selection of items via the arrow keys and spacebar.

For rearrange, a cut-and-paste approach was used; CTRL or Command/Apple + X (“cut”) are used to begin rearrange, arrow keys set the target rearrange point, and CTRL + V or return will apply the rearrange at the given target. If active, pressing escape will exit rearrange mode.

Performance: Scaling The Front-End

An important step in the grid prototype, once it was rendering in a fluid fashion, was to find all the ways in which we could get it to break down. Which browsers were first to choke under the DOM load as more nodes were written out? Was layout and rendering the bottleneck? Were too many events firing? Was the JS engine spending too much time updating the DOM?

After rendering several hundred photos in the UI, we started to see evidence of browsers getting laggy in terms of responsiveness, and CPU + RAM use trending upward. With plans to extend this UI to handle numbers of photos in the thousands, a number of optimizations were made up front including aggressive pruning of the DOM as the user scrolled the page.

In brief, the trick is to create a large page with no content and only generate the DOM to reflect the slice of the whole view being shown.

Given events like window scrolling and resize affecting browser coordinates and DOM layout, we are easily able to calculate and cache the changes as they happen, making quick lookups to determine precisely what range of grid items are in view for the user. A single “page” of grid items can then be generated on the fly, appended to the DOM and shown to the browser. Events like browser resize invalidate the coordinate cache, so the DOM reflows and the grid refresh / display process repeats itself in a throttled fashion when this happens.
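The range lookup at the heart of this can be sketched as below, assuming uniform row heights and a known column count (both simplifications; all names are hypothetical). The empty page is given its full height up front, and only the items in the returned range get DOM nodes.

```javascript
// Height of the empty "spacer" page that makes the scrollbar behave as if
// every item were rendered.
function spacerHeight(rowHeight, columns, totalItems) {
  return Math.ceil(totalItems / columns) * rowHeight;
}

// Range of item indexes (start inclusive, end exclusive) whose rows are at
// least partly inside the viewport.
function visibleItemRange(scrollTop, viewportHeight, rowHeight, columns, totalItems) {
  const firstRow = Math.floor(scrollTop / rowHeight);
  const lastRow = Math.floor((scrollTop + viewportHeight - 1) / rowHeight);
  return {
    start: Math.min(totalItems, firstRow * columns),
    end: Math.min(totalItems, (lastRow + 1) * columns)
  };
}
```

With 1,000 items in four columns of 200 pixel rows, an 800 pixel viewport scrolled to the top needs only items 0 through 15 in the DOM.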

Event Throttling: Responsiveness’ Dirty Little Secret

Native DOM events are useful, but they can fire quite aggressively and left unchecked, can really hurt the performance of your application. Scrolling and resize are good examples for the grid case, as we want the UI to respond with an updated display pretty quickly when scrolling – but we know that we only have to show new items when a new row comes into view, which is typically only every 200 vertical pixels. With resizing, we only need to reflow the grid when resizing has added or removed enough horizontal room that we’ve lost or gained a new column.

In short, if you know events will fire often, subscribe to all of them but only do expensive work if there are real changes to apply. Alternatively, you could only let resize handlers (for example) fire once every 500 milliseconds and do the work every time, so your handler only fires twice a second in the worst-case scenario.
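Both approaches fit in a few lines of plain JavaScript. These helpers are a sketch with hypothetical names, not the uploadr’s code; the throttle takes an optional clock function only so that it can be exercised without real timers.

```javascript
// Approach 1: listen to every event, but only do the expensive work when a
// value derived from the event (e.g. the first visible row) actually changes.
function onlyWhenChanged(deriveKey, work) {
  let lastKey;
  return function (value) {
    const key = deriveKey(value);
    if (key !== lastKey) {
      lastKey = key;
      work(key);
    }
  };
}

// Approach 2: let the handler run at most once per interval.
// Calls that arrive too soon are simply dropped in this simple version.
function throttle(fn, intervalMs, now = Date.now) {
  let lastRun = -Infinity;
  return function (...args) {
    const time = now();
    if (time - lastRun >= intervalMs) {
      lastRun = time;
      fn.apply(this, args);
    }
  };
}
```

In a browser, the first helper could wrap a scroll handler with a key such as Math.floor(scrollTop / 200), and the second could wrap a resize handler with a 500 millisecond interval. A fuller throttle would also schedule one trailing call so the final resize is never missed.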

Cache The Hell Out Of The DOM

This was hinted at previously, but is worth repeating: Get references and read values once, particularly from the DOM, and cache them when initially retrieving and updating them in response to events. If you know what a value is going to be, don’t query for it.

In JavaScript, an internal lookup is far faster than reaching out to query the DOM for attributes like offsetWidth, for example. Simply reading certain attributes of DOM nodes can cause layout and reflow to happen in the browser, which means you’re making the browser do more work for information that is likely unchanged. Thrown into a loop mixed with DOM writes, this makes for pretty disastrous browser performance.

JavaScript frameworks like YUI et al should do their own caching of this data, but I see no downside in grabbing and storing this stuff locally yourself; as the implementer, you have the best idea of what data is most static and what is not.

Additionally, try to read at once and write at once to the DOM; don’t have loops that do a write and then a read, for example. Try to write DOM interactions that follow the browser’s rendering model, minimizing the back-and-forth of layout/reflow/display calculations. Use document fragments to build up collections of DOM nodes, and append them once to the DOM vs. using innerHTML, or – worse – multiple appendChild() calls. Don’t query className when you likely know what it’s going to be; track that state internally in JS, instead, and only write changes out to the DOM.

“Stateful” CSS Class Names

I’ve been a fan of the concept of “stateful” CSS (e.g., .is_selected { border: 1px solid red; }) for years. Not only is state consistent, but using CSS in this way also encourages better separation of concerns (and less temptation to add or remove DOM nodes via JS when making changes.)

When you want to grey something out, for example, you may set a disabled property to true on a JS object. That easily translates to a CSS class name change including .disabled {} applied to the relevant DOM node. As a result, your DOM is logically reflecting your JS state. It’s also helpful when troubleshooting, because you can add the class name to nodes ad-hoc when testing UI features.

For the grid’s purposes, every grid item contains all relevant “states” and the markup for those states – selection, thumbnail, progress, overlay icons, messages, errors and so forth. This makes it very easy to change the item’s display with one or a few additional CSS class names, and minimizes the amount of work JS has to do to update the DOM. It is also trivial to combine states this way – e.g., a photo upload that has a thumbnail, but is in a “failed” state because it’s over-size.

While uploading, for example, a grid item may have class="has-thumbnail working selected", then completes with class="has-thumbnail has-fullsize-thumbnail complete" when the upload has finished. All JS did here was update the class name (and while actively uploading, redraw a small progress meter on the item.) Thus, JS/DOM interaction is fairly minimal.
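One simple way to keep the class name in step with JS state is to derive it from a plain object of flags. This is a sketch with a hypothetical helper name, using the class names from the example above.

```javascript
// Build a className string from an object of boolean state flags.
// Keys are class names; only the truthy ones are emitted, in insertion order.
function classNameFor(state) {
  return Object.keys(state)
    .filter((name) => state[name])
    .join(" ");
}

// State of a hypothetical grid item while its upload is in progress.
const uploading = {
  "has-thumbnail": true,
  "has-fullsize-thumbnail": false,
  working: true,
  selected: true,
  complete: false
};
```

Here classNameFor(uploading) yields "has-thumbnail working selected". In the browser the result would be assigned to the item’s className, ideally only when it differs from the last value written.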

A single CSS change can also completely change the display of the grid. “Info view” is one example of this. When enabled, a single additional class on the grid container causes all photo items to show overlay icons with their privacy state, and additional icons if they have tags, are in a set and so on.

Flickr Web Upload UI: Info View
“Info” view, showing overlays with privacy, state and other information.

Broadcast Events FTW

Events are a great way for modular bits of code, written by the same or separate people, to work on separate problems independently. Among other things, the grid listens for events regarding file addition, removal, progress and success / failure states from the upload queue module. The grid generates and fires events itself reflecting changes around selection, editing and arrangement as the user is doing their work, which are picked up by the “editr panel” at left that updates to reflect the selection state. Provided that events are kept as simple notifications and relatively one-way, there is little risk of complex event-related tracing in the unlikely, er – event – that something goes wrong.

Flickr uses YUI 3 extensively, and we write and plug our application code into the system as YUI 3 modules. In addition to the excellent modular framework approach, we take advantage of the DOM and Event functionality in particular.
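To show the one-way notification pattern in isolation, here is a tiny plain-JavaScript stand-in. It is not YUI 3’s event API, and the event name and payload shape are made up for the example.

```javascript
// Minimal broadcaster: modules subscribe by event name and never need to
// know who fires the event.
function createBroadcaster() {
  const listeners = {};
  return {
    on(type, handler) {
      (listeners[type] = listeners[type] || []).push(handler);
    },
    fire(type, payload) {
      // Copy the list so handlers can subscribe during a fire safely.
      (listeners[type] || []).slice().forEach((handler) => handler(payload));
    }
  };
}

// The "editr panel" side: react to selection changes announced by the grid.
const events = createBroadcaster();
let panelTitle = "";
events.on("grid:selectionChange", (e) => {
  panelTitle = e.selected.length + " selected";
});

// The grid side: announce the change without knowing about the panel.
events.fire("grid:selectionChange", { selected: [3, 4, 7] });
```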

In Summary

The grid is only one of several modules that make up the new Flickr Web Uploadr, and is primarily responsible for the display and updating of photo thumbnails, selection, arrangement and basic metadata. There is a lot more going on in terms of JavaScript and network state under the hood, including API calls and permissions; posts highlighting some of the other fun areas are forthcoming.

As it turns out, building a feature-rich browser-based application for millions of people that looks good, is fast and supports many use cases including constraints and unexpected error conditions, can be a challenge. It’s also part of the fun.


Raising the bar on web uploads

With over seven billion photos uploaded since day one, it’s safe to say that uploading is an important part of the Flickr experience.

There are numerous ways to get photos onto Flickr, but the native web-based one at flickr.com/photos/upload/ is especially important as it typically accounts for a majority of uploads to the site.

A brief history of Flickr “Web Uploadrs”

Flickr “Flashy” Uploadr UI (2008) vs. Basic Uploadr UI

Earlier versions of Flickr’s web-based upload UI used a simple <form> with six file inputs, and no more. As the site grew in scale, the native web upload experience had to scale to match. In early 2008, an HTML/Flash hybrid upgrade added support for batch file selection, allowing up to several gigabytes of files to be uploaded in one session. This was a much-needed step in the right direction.

The “flashy” uploader does one thing – sending lots of files – fast, and reliably. However, it was not designed to tackle the other tasks one often performs on photos including adding and editing of metadata, sorting and organizing. As a result, “upload and organize” has traditionally been reinforced as two separate actions on Flickr when using the web-based UI.

The new (mostly-HTML5-based) shiny

Thanks to HTML5-based features in newer browsers, we have been able to build a new uploader that’s pretty slick, and is more desktop application-like than ever before; it brings us closer to the idea of a one-stop “upload and organize” experience. At the same time, the UI also retains common web conventions and has a distinct Flickr feel to it. We think the result is a pretty good mix, combining some of the best parts of both.

As feedback from a group of beta testers has confirmed, it can also be deceptively fast.

The new Flickr Web Uploader. It’s powerful, it’s got a dark background, and it’s fast.

Features: An Overview

Here are a few fun things the new uploader does:

  • Drag and drop batches of files from your OS. Where present and supported, EXIF thumbnails are shown in the UI almost immediately.

  • Fluid photo “grid” shows photo thumbnails, allows larger, lightbox-style previews, inline editing of description/title and rotation.

  • Mouse and keyboard-based grid selection and rearrange functionality similar to that of desktops.

  • “Editor panel” shows state of current selection, provides powerful batch editing features (title + description, adding of tags, people, sets, license, privacy etc.)

  • “Info” mode shows overlay icons on grid items, allowing for a quick overview of pending edits (privacy, people, tags etc.)

  • Auto-retry and recovery for dropped or lost connections

Technical Bits

A small book could probably be written on the process, prototypes and technology decisions made during the development of this uploader, but we’ll save the gory details for a couple of in-depth blog posts which will highlight specific parts of the UI. In the meantime, here are some notes on the tech used:

  • HTML5 File APIs

    Modern browser file APIs make up the core of file handling functionality, including drag-and-dropping of files right into the browser. FileReader-type APIs allow access to data from disk, enabling things like EXIF thumbnail parsing and retrieval where supported. EXIF parsing is almost instantaneous and thumbnails are hugely valuable, of course, in prompting users’ editing decisions.

    (For browsers without the relevant file APIs, a Flash-based fallback is used in which case file drag-and-drop is not supported, and EXIF thumb previews are not implemented.)

  • CSS3

    Thanks to growing support across newer browsers, we’ve been able to produce a modern design that takes advantage of CSS-based gradients to achieve visual goals that would have traditionally required external images, and occasionally, hacks or shims in our HTML and JavaScript.

    CSS3’s border-radius, text-shadow and box-shadow are also featured nicely in this new design, alongside visual transform effects such as rotate, zoom and scale. Eagle-eyed users of newer WebKit builds such as Chrome Canary may even see a little use of filter with blur here and there.

    CSS transitions are also featured extensively in the new uploader, a notable shift away from animation sequences which would traditionally have been calculated and rendered by JavaScript. Good candidates for transitions include the expanding or collapsing of a menu section, or a background color fade when a text area is focused, for example.

    While triggering transitions and/or transforms can be a little quirky depending on the current “state” of the element (for example, an element just added to the DOM may need a moment to settle and be rendered before transitioning), the advantage of using CSS vs. JS for “enhancement”-style UI effects like these is absolutely clear.

  • YUI3

    Thanks to YUI3, the new Flickr Uploader is a highly-modularized, component-based application. The editr module itself comprises about 35 sub-modules, following YUI’s standard module pattern. In Flickr’s case, modules are defined as being JavaScript, CSS or string (i.e., language translation) components. This compartmentalization approach reduces the overall complexity of code, encourages extensibility and allows developers to work on features within a specific scope.
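
The EXIF thumbnail retrieval mentioned under HTML5 File APIs has to begin by finding the EXIF block inside the JPEG. Here is a minimal sketch of that first step, assuming the file's bytes have already been read into a `Uint8Array` (in a browser, for instance, via `FileReader`'s `readAsArrayBuffer`). `findExifSegment` is a hypothetical helper written for this illustration, not the uploader's actual code; a full thumbnail extractor would go on to parse the TIFF structure inside the segment it returns.

```javascript
// Checks for the 6-byte "Exif\0\0" header at the given position.
function hasExifHeader(bytes, start) {
  const header = [0x45, 0x78, 0x69, 0x66, 0x00, 0x00]; // "Exif", NUL, NUL
  return header.every((value, i) => bytes[start + i] === value);
}

// Hypothetical helper: walk the JPEG segment headers and return the
// position of the EXIF payload, or null if there is none.
function findExifSegment(bytes) {
  // Every JPEG starts with the SOI marker 0xFFD8.
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) return null;
  let pos = 2;
  while (pos + 4 <= bytes.length) {
    if (bytes[pos] !== 0xff) return null; // not at a marker: give up
    const marker = bytes[pos + 1];
    // SOS (0xDA) starts the compressed image data, EOI (0xD9) ends the file.
    if (marker === 0xda || marker === 0xd9) return null;
    // Segment length is big-endian and includes its own two bytes.
    const length = (bytes[pos + 2] << 8) | bytes[pos + 3];
    if (length < 2) return null;
    // APP1 (0xE1) with an "Exif" header is the segment we want.
    if (marker === 0xe1 && length >= 8 && hasExifHeader(bytes, pos + 4)) {
      return { offset: pos + 10, length: length - 8 };
    }
    pos += 2 + length;
  }
  return null;
}

// Synthetic example: SOI, a 2-byte APP0 payload, then an APP1/EXIF segment.
const sample = new Uint8Array([
  0xff, 0xd8,
  0xff, 0xe0, 0x00, 0x04, 0x01, 0x02,
  0xff, 0xe1, 0x00, 0x0a, 0x45, 0x78, 0x69, 0x66, 0x00, 0x00, 0xaa, 0xbb,
  0xff, 0xd9,
]);
console.log(findExifSegment(sample));
```

Only the segment headers near the start of the file are walked and the image data itself is never decoded, which is one plausible reason this kind of lookup can feel almost instantaneous.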

A sneak peek: Screencast (Beta Version)

At time of writing, the new uploader is being gradually rolled out to the masses. For those who haven’t seen it yet, here’s a demo screencast of an earlier beta version showing some of the interactions for common upload and editing use cases. (Best viewed full-screen, and with “HD” on.) The video gives an idea of what the experience is like, but it’s best seen in person. We’ve really had a lot of fun building this one.

Building an HTML5 Photo Editor

Introducing guest blogger, Ari Fuchs. He is a Lead API Engineer and Developer Evangelist at Aviary. He has spent the last 3 years building out Aviary’s internal and external facing APIs, and is now working with partners to bring Aviary’s tools to the masses. He also did a lot of work to bring the Aviary editor to Flickr. Follow him on Twitter and send him a nice message to make him feel better about his stolen bike. Now, on to his post…

At Aviary, we’ve been passionate about photos since day one. It’s been five years since we released our first creative tool, Phoenix, a powerful, free Flash-based photo editor. Phoenix offered functionality on par with Adobe Photoshop 5 and a price point that opened its usage to anyone with an internet connection. As amateur photographers worldwide began trying their hand at editing, we watched our product join the ranks of a small number of companies working to democratize the process of photo editing for the first time.

Around two years ago we began rethinking the future of our tool set. While our original tools offered incredible functionality, they did have a learning curve which meant that the average person couldn’t just sit down and begin editing without investing time to become familiar with the tools. We wanted to build a powerful editor that anyone could use.

Because we were rebuilding the editor from the ground up, we took the opportunity to switch from a Flash-based solution to one built using HTML5 technologies. We saw this as an opportunity to build on a growing standard, and to support the most platforms.

In fall of 2010 we released our HTML5 photo editor which has evolved into the product we’re proud to share with you today.

Widget Encapsulation

During our initial foray into the online editor space, we took a straightforward approach by having API users launch our editor in a new page or window. This simplified integrations and allowed us to own the editing experience.

When we rebuilt our editor in JavaScript, we took the opportunity to re-architect our API as well. Our first big change was making the editor embeddable. This meant that third party developers could load the editor on their own sites, maintaining user engagement while controlling their experience. We built out customization options that allowed the site owner to decide which tools appeared in the editor. A real estate site, for example, might not want its users adding mustache stickers to appliances in photos.

Our editor, unlike many rich HTML widgets, does not require an iframe and is truly embedded into a hosting webpage. This posed many challenges during development, but the result is a more seamless, lightweight integration.

Aviary embedded in Flickr

Constructor API

When we rebuilt our API, we took a leap by assuming that web developers integrating our editor would have experience with other JavaScript libraries and plugins. We built our API to use a Constructor method that accepts a configuration object to allow for the aforementioned tool customization. The configuration object is also used to configure callbacks, image URLs, language settings, etc., and allows us to continue building out our API without losing backwards compatibility.
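
To illustrate the pattern (not Aviary's actual API), here is a minimal sketch of a constructor that takes a configuration object. The `Editor` name, the default values and the `tools`, `language` and `onSave` option names are all hypothetical; only `maxSize` and `hiresUrl` are names that appear later in this post.

```javascript
// Hypothetical defaults; the real option names and values may differ.
const DEFAULTS = {
  tools: ["crop", "rotate", "redeye", "sharpen"],
  language: "en",
  maxSize: 800,
  hiresUrl: null,
  onSave: function () {},
};

function Editor(config) {
  // Allow calling with or without `new`.
  if (!(this instanceof Editor)) return new Editor(config);
  // Options the caller omits fall back to defaults, so adding a new
  // option later does not break existing callers.
  this.options = Object.assign({}, DEFAULTS, config);
}

Editor.prototype.hasTool = function (name) {
  return this.options.tools.indexOf(name) !== -1;
};

// A site that does not want stickers simply leaves them out of the list.
const editor = new Editor({ tools: ["crop", "rotate"], language: "fr" });
console.log(editor.hasTool("crop"), editor.hasTool("stickers"));
```

The design choice worth noting is that everything goes through one object: a new setting becomes a new key with a default, rather than a new positional argument that every integrator has to account for.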

Simplifying the Save Process

Saving image data is always a challenge in the browser, and can require various cross-browser workarounds. An obvious method would be to initiate a form post to the server and include the base64 image data in a hidden field. This breaks in Safari, where form fields have an undocumented value length limit. We worked around this by switching to an Ajax post with the appropriate CORS headers to get around cross-domain issues. In browsers that don’t support CORS, we fall back to the form post method.
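
The choice between the two save paths can be sketched with a common feature test: checking whether an `XMLHttpRequest` instance exposes `withCredentials`. This is an assumption about how such a fallback might be chosen, not a description of Aviary's code, and the `chooseSaveTransport` function and its `"xhr"` / `"form"` return values are hypothetical.

```javascript
// Returns true when the given XMLHttpRequest constructor appears to
// support CORS. The `withCredentials` check is a widely used feature test.
function supportsCors(XHR) {
  return typeof XHR === "function" && "withCredentials" in new XHR();
}

// Hypothetical transport names: "xhr" for the Ajax post,
// "form" for the hidden-field form post fallback.
function chooseSaveTransport(XHR) {
  return supportsCors(XHR) ? "xhr" : "form";
}

// In a browser one would pass window.XMLHttpRequest. Stand-in constructors
// are used here so the sketch runs outside a browser as well.
function ModernXhr() {
  this.withCredentials = false;
}
function LegacyXhr() {}

console.log(chooseSaveTransport(ModernXhr), chooseSaveTransport(LegacyXhr));
```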

To hide this complexity from the developer, we’ve abstracted the save process completely. When a user saves an edited image, we temporarily save the image data to our own servers and return a public URL so the host application can download the image to their own.

High Resolution Photos

One of the coolest features of our editor is the high resolution image support, though it certainly comes with a number of challenges. There’s the practical issue of limited real estate in the browser (keep an eye out for updates addressing this in the near future), as well as performance issues that are harder to quantify. Even in Flash-based tools, the size of the image you can edit in the browser is limited by a number of gating factors: hardware specs, number of running processes, etc. To get around these client limitations, we’ve set a configurable maxSize on the editor and added a configuration field for an original-resolution version of the image to be edited: hiresUrl.

When a hiresUrl is supplied, every user edit action is logged. On save, the aptly named “actionlist” is sent to our server along with the hiresUrl. When it hits our render farm, the actionlist is replayed on the high resolution image, and the final results are returned to the host site via a new hiresUrl.

{
    "metadata": {
        "imageorigsize": [
            800,
            530
        ]
    },
    "actionlist": [
        {
            "action": "setfeathereditsize",
            "width": 800,
            "height": 530
        },
        {
            "action": "flatten"
        },
        {
            "action": "redeye",
            "radius": 5,
            "pointlist": [
                [545, 183], [546,183], [547,182], [548,181], [548,179], [548,177], [547,177], [545,177], [544,177], [543,177], [542,177], [541,179], [541,181], [541,183], [542,184]
            ]
        },
        {
            "action": "redeye",
            "radius": 5,
            "pointlist": [
                [481, 191], [481,193], [481,195], [482,196], [483,197], [484,198], [485,197], [485,196], [485,193], [485,190], [485,189], [485,188], [484,188], [482,188], [480,189], [480,190], [480, 191]
            ]
        },
        {
            "action": "sharpen",
            "value": 21.69312,
            "flatten": true
        }
    ]
}
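
The post does not say how the render farm maps these edits onto the larger image. One plausible step, sketched here purely as an assumption, is to scale the coordinates recorded at the edit size up to the high resolution dimensions before replaying each action. `scaleActionList` is a hypothetical function, and treating `radius` as something that scales with width is likewise a guess.

```javascript
// Hypothetical: rescale recorded edit coordinates from the size the user
// edited at to the size of the high resolution original.
function scaleActionList(actionlist, editSize, hiresSize) {
  const sx = hiresSize[0] / editSize[0];
  const sy = hiresSize[1] / editSize[1];
  return actionlist.map((action) => {
    const scaled = Object.assign({}, action); // leave the input untouched
    if (Array.isArray(action.pointlist)) {
      scaled.pointlist = action.pointlist.map(([x, y]) => [
        Math.round(x * sx),
        Math.round(y * sy),
      ]);
    }
    if (typeof action.radius === "number") {
      scaled.radius = action.radius * sx; // assumption: radius follows width
    }
    return scaled;
  });
}

// Two actions in the same shape as the actionlist above.
const edits = [
  { action: "redeye", radius: 5, pointlist: [[545, 183], [546, 183]] },
  { action: "sharpen", value: 21.69312, flatten: true },
];
// Assume the original is four times larger in each dimension.
const hires = scaleActionList(edits, [800, 530], [3200, 2120]);
console.log(JSON.stringify(hires[0]));
```

Actions without coordinates, such as sharpen, pass through unchanged in this sketch; whether their parameters also need adjusting at a higher resolution is not something the post specifies.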

As a side note, we maintain feature parity across all of our platforms (mobile included) by prototyping new tools and filters in JavaScript first, and then porting them to C for our render farm and Android, and then to Objective-C for our iPhone SDK. By maintaining feature parity and synchronizing output across platforms, we’re able to ensure that users get the edits they expect on their high resolution photos, and we keep the door open for future server-side support for our mobile SDKs where the original photo might not be stored on the device.

Tools and Libraries

We use some pretty awesome tools to help us maintain cross-browser compatibility.

LESS CSS

We moved a lot of the cross-browser concerns to build-time with LESS and a library of mix-ins inspired initially by Twitter Bootstrap, though the final result is wholly our own. LESS’s color math and variables let us achieve a textured and rounded look and feel while minimizing complexity during development.

/* LESS */
.avpw_inset_button_group {
  #gradient > .vertical(lighten(@conveyorBelt, 4%), darken(@conveyorBelt, 1%));
  .box-shadow(inset 0 0 4px darken(@conveyorBelt, 20%));
  .border-radius(8px);
}

/* EXPANDED */
.avpw_inset_button_group {
  background-color: #2a2a2a;
  background-repeat: repeat-x;
  background-image: -khtml-gradient(linear, left top, left bottom, from(#383838), to(#2a2a2a));
  background-image: -moz-linear-gradient(top, #383838, #2a2a2a);
  background-image: -ms-linear-gradient(top, #383838, #2a2a2a);
  background-image: -webkit-gradient(linear, left top, left bottom, color-stop(0%, #383838), color-stop(100%, #2a2a2a));
  background-image: -webkit-linear-gradient(top, #383838, #2a2a2a);
  background-image: -o-linear-gradient(top, #383838, #2a2a2a);
  background-image: linear-gradient(top, #383838, #2a2a2a);
  filter: progid:DXImageTransform.Microsoft.gradient(startColorstr='#383838', endColorstr='#2a2a2a', GradientType=0);
  -webkit-box-shadow: inset 0 0 4px #000000;
  -moz-box-shadow: inset 0 0 4px #000000;
  box-shadow: inset 0 0 4px #000000;
  -webkit-border-radius: 8px;
  -moz-border-radius: 8px;
  border-radius: 8px;
}

CSS3

With CSS3, we’ve just about managed a complete break from the DHTML effects of the past. The new UI uses CSS3 transitions and transforms wherever possible to remain future-proof.

Flash

Yes, our editor does indeed have a Flash fallback for browsers that lack certain HTML5 features (namely canvas). We initially built the editor as a move away from Flash, but because of the legacy IE7 and IE8 userbases on our larger partner sites, we had to go back and rebuild certain components in Flash to support those browsers.

We’ve architected the editor so that Flash is only being used where necessary. Some tools, such as draw, have been completely rebuilt in Flash; for others, like effects, the bitmap data is being exported and manipulated in JavaScript (using a reverse implementation of pibeca). This allows for code reuse, and enables us to build new features faster with more backwards compatibility.

Future

While the feedback for our editor has been overwhelmingly fantastic, we’re continuing to work hard on new tools and features, and on performance enhancements to our existing set.