The 32 Days Of Christmas!

LEGO City Advent Calendar - Day 7

When you have thousands of photos, it can be hard to find the photo you’re looking for. Want to search for that Christmas cat you saw at last year’s party? And what if that party wasn’t on Christmas day, but sometime the week before? To help improve the search ranking and relevance of national, personal, and religious holiday photos, we first have to see when the photos were taken; when, for example, is the Christmas season?

Understanding what people are looking for when they search for their own photos is an important part of improving Flickr. Earlier this year, we began a study (which will be published at CHI 2016 under the same name as this post) by trying to understand how people searched for their personal photos. We showed a group of 74 participants roughly 20 of their own photos on Flickr, and asked them what they’d put into the Flickr search box to find those photos. We did this a total of 1492 times.

It turns out 12% of the time people used a temporal term in searches for their own photos, meaning a word connected to time in some way. These might include a year (2015), a month (January), a season (winter), or a holiday or special event (Thanksgiving, Eid al-Fitr, Easter, Passover, Burning Man). Often, however, the date and time on the photograph didn’t match the search term: the year would be wrong, or people would search for a photograph of snow the weekend after Thanksgiving with the word “winter,” despite the fact that winter doesn’t officially begin until December 21st in the U.S. So we wanted to understand that situation: how often does fall feel like winter?

To answer this, we mapped 78.8 million Flickr photos tagged with a season name to the date the photo was actually taken.

Seasons Tagged by Date

As you’d expect, most of the photographs tagged with a season are taken during that season: 66% of photos tagged “winter” were taken between December 22 and March 20. About 9% of search words are off by two seasons: photos tagged “summer” that were taken between December 21st and March 20th, for example. We expect this may reflect antipodean seasons: while most Flickr users are in the Northern Hemisphere, it doesn’t seem unreasonable that 5% of “summer” photographs might have been taken in the Southern Hemisphere. More interesting, we think, are the off-by-one cases, like fall photographs labeled as “winter,” where we believe that the photo represents the experience of winter, regardless of the objective reality of the calendar. For example, if it snows the day after Thanksgiving, it definitely feels like winter.

On the topic of Thanksgiving, let’s look at photographs tagged “thanksgiving.”

Percentage of Photos Tagged "Thanksgiving"
The six days between November 22nd and 27th—the darkest blue area—cover 65% of the photos. Expanding that range to November 15–30th covers 83%. Expanding to all of November covers 85%, and including October (and thus Canadian Thanksgiving, in gray in early October) brings the total to 90%. But that means that 10% of all photos tagged “thanksgiving” are outside of this range. Every date in that image represents a total of a minimum of 40 photographs taken on that day between 2003 and 2014 inclusive, uploaded to Flickr and tagged “thanksgiving” with the only white spaces being days that don’t exist, like February 30th or April 31st. Manual verification of some of the public photos tagged “thanksgiving” on arbitrarily chosen dates finds these photographs tagged “thanksgiving” included pumpkins or turkeys, autumnal leaves or cornucopias—all images culturally associated with the holiday.

Not all temporal search terms are quite so complicated; some holidays are celebrated and photographed on a single day each year, like Canada Day (July 1st) or Boxing Day (December 26th). While these holidays can be easily translated to date queries, other holidays have more complicated temporal patterns. Have a look at these lunar holidays.

Lunar Holidays Tagged by Date

There are some events that occur on a lunar calendar like Chinese New Year, Easter, Eid (both al-Fitr and al-Adha), and Hanukkah. These events move around in a regular, algorithmically determinable, but sometimes complicated, way. Most of these holidays tend to oscillate as a leap calculation is added periodically to synchronize the lunar timing to the solar calendar. However Eids, on the Hijri calendar, have no such leap correction, and we see photos tagged “Eid” edge forward year after year.

Some holidays and events, like birthdays, happen on every day of the week. But they’re often celebrated, and thus photographed, on Friday, Saturday, and Sunday:

Day of the week tagged Birthday

So to get back to our original question: when are photos tagged “Christmas” actually taken?

Days tagged with Christmas

As you can see, more photos tagged “Christmas” are taken on December 25th than on any other day (19%). Christmas Eve is a close second, at 12%. If you look at other languages, this difference practically goes away: 9.2% of photos tagged “Noel” are taken on Christmas Eve, and 9.6% are taken on Christmas; “navidad” photos are 11.3% on Christmas Eve and 12.0% on Christmas. But Christmas photos are taken throughout December. We can now set a threshold for a definition of Christmas: say if at least 1% of the photos tagged “Christmas” were taken on that day, we’d rank it more relevant. That means that every day from December 1st to January 1st hits that definition, with December 2nd barely scraping in. That makes…32 days of Christmas!

Merry Christmas and Happy Holidays—for all the holidays you celebrate and photograph.

PS: Flickr is hiring! Labs is hiring! Come join us!

Much Photos!

Introducing the New! Shiny! Photolist framework

photolist-sky
Blue skies. Mostly.

Here at Flickr, we have photos. Lots of photos. Like, billions and billions of photos. So, it’s pretty important for us to be able to show you more than one at once.

We have used what we call the “justified algorithm” to lay out photos for a while now, but as we move more and more pages onto our new-ish isomorphic node.js stack, we determined it was time to revisit the algorithm and create an updated implementation.

A few of us here in Frontend-landia got together to figure out all the things this new shiny should be able to do. With a lot of projects in full swing and on the near horizon, we came up with a pretty significant list, including but not limited to:
– Easy for developers to use
– Fit into any kind of container
– Support pagination (in both directions!) and infinite scroll
– Jank-free, butter-silky-baby-smooth scrolling
– Support layouts other than justified, like square thumbnails and grid layout with native aspect ratio

After some brainstorming, drawing of diagrams, and gummi bear consumption, we got to work building out the framework and the underlying algorithm.

Rejustification

photolist-sky
Drawing of diagrams

The basics of the justified algorithm aren’t too complex. The goal is for the layout module to accept a list of photo aspect ratios, and return a list of rectangles. A layout consists of a number of rows of items (photos), each with a target height and allowable height deviation above and below. This, along with the container width, gives us a minimum and maximum row aspect ratio.

Photolist: variable row height
Fig. 1: The justified algorithm: dimensions

We push each photo into a row; once the row is filled up, we move on to the next. It goes a little something like this:

  1. Iterate over each photo in the list to display
  2. Create a new row if there’s not currently an open row
  3. Attempt to add the photo to the current row at its native aspect ratio and at the target row height
  4. If the new row aspect ratio is less than the minimum row aspect ratio, continue adding photos until the aspect ratio is greater than the maximum aspect ratio
  5. Either keep or drop the last added photo, depending on which generates a row aspect ratio closer to the target row aspect ratio; adjust the row height as needed, and seal the row
  6. Repeat until all the photos have been laid out.

Photolist: row filling
Fig. 2: The justified algorithm: row filling

It’s Never That Easy…

The justified algorithm described above is the primary responsibility of the layout module. In practice, however, there are a number of other things the layout must handle to get good results for all use cases, and to communicate the results to other parts of the framework.

Diffs

One key feature of the layout module is how it organizes its results. To minimize the amount of processing required to update photos as the layout changes, the layout module returns pre-sorted diffs, each with a specific purpose:

  • new items, used to create new photos and put them in place
  • layout-changed items, used to resize/reposition existing photos
  • visibility-changed items, used to wake/sleep existing photos
  • widows and orphans (leading and trailing items) (read on!)

The container view can process only the parts of the layout response that are necessary, given the current state of the whole framework, to keep processing time down and keep performance up.

Widows and orphans


Annie the Musical,
Annie the Musical, by Eva Rinaldi

Some photolist pages on Flickr use infinite scrolling, and some display results one page at a time. Regardless of how a page shows its photos, it starts to feel messy when there is an incomplete row of photos hanging off the end of the page. If there is more content in the set, the last row should be full. However, since we fetch photos from the API in fixed batch sizes, things don’t always work out so nicely, leaving “leftovers” in the bottom row. Borrowing from typesetting terminology, we call these leftover photos orphans. (We can also paginate backwards; leftovers at the top are technically widows but we’ll just keep using the term orphans for simplicity.)

The layout notes these incomplete rows and hides them from the rest of the framework until the next page of content loads in. (This led to frequent and questionable metaphors about “orphan suppression” and “orphan rehydration.”) When orphans are to be hidden, the layout simply keeps them out of the diff. When the orphans are brought back in as the next page loads, the layout prepends them to the next diff. The container view is none the wiser.

This logic gets even more fun when you consider that it must perform in all of these use cases:

  • fixed page size (book-style) pagination
  • downward-scrolling infinite pagination
  • upward-scrolling infinite pagination (enter into an “infinite” content set somewhere other than the beginning); this requires right-to-left layout!

There’s also the case of the end of an “infinite” content set (scrolling down to the end or up to the beginning); in these cases, we still want the row to appear complete, and still must maintain the native aspect ratio of the photo. Therefore, we allow the row height to grow as much as it needs in this case only.

Bonus Round!

You might have noticed that Flickr is kind of a big site with, like, lots of photos. And we display photos in lots of different ways, with lots of different use cases. The photolist framework bends over backwards to support all of those, including:

  • forward and backward pagination
  • infinite scroll and fixed page-size pagination
  • specific aspect ratios (e.g. squares)
  • fixed number of rows
  • fast relayout (only a few milliseconds for thousands of photos)

Going into detail on each of those features is way beyond the scope of this blog post, but suffice it to say the framework is built to handle just about anything Flickr can throw at it. The one exception is the upcoming Camera Roll (coming soon to those of you who don’t yet have it!), which is Too Extreme for this framework, so we devised something special just for that page.

The whole enchilada


Mmm... enchiladas,
Mmm… enchiladas, by jeffreyww

The layout is at the heart of the photolist framework, but wait — there’s more! The main components of the framework are the layout (dissected above), the container view / controller, and the subviews (usually containing photos).

Photolist: the whole enchilada
Fig. 3: Relationship of view/controller, layout, and subviews, and changing subview states during downward scroll

The container view does a lot of fancy things, like:

  • loading in photos as you scroll down or paginate
  • triggering a relayout when the container size changes (i.e. when you resize your window)
  • matching up server-rendered HTML with clientside JavaScript objects (see isomorphic JavaScript, and an upcoming blog post about the Hermes stack at Flickr).

Its primary job, though, is to act as the conduit between the layout module and the individual subviews.

Every time a layout is processed or changes, it returns a “layout response” to the container view. The layout response contains a list of rectangles and wake/sleep flags (actually, a list of lists; see Diffs above); the container view relays that new information on to each individual subview to determine position and visibility. The container view doesn’t even need to know about the layout details — each subview adjusts itself to its layout data all on its own.

The subviews each have a decent amount of intelligence of their own, performing such tasks as:

  • choosing the most appropriate photo file size to fit the layout rectangle
  • adding/removing itself to/from the DOM as instructed by the layout to maintain good scroll performance
  • providing an annotation and interaction layer for titles, faves, comments, etc.

Coming soon to a webpage near you

The new photolist framework is certainly not a one-size-fits-all solution; it’s tailored for Flickr’s specific use cases. However, we tried to design and build it to be as broadly useful for Flickr as possible; as we continue to move parts of the site onto the new frontend stack and innovate new features, it’s critical to have solid components upon which we can build Flickr’s future. The layout algorithm is probably useful for many applications though, and we hope you gained some insight into how you might implement your own.

The photolist framework is already live in a number of places on the site, including the new Unified Search pages (currently in Beta), the Create / Wall art pages, the Group pool preview, and is coming soon to a number of other pages.

As always, if you’re interested in helping with that “and more” part, we’d love to have you! Stop by our jobs page and drop us a line.

Computer vision at scale with Hadoop and Storm

Recently, the team at Flickr has been working to improve photo search. Before our work began, Flickr only knew about photo metadata — information about the photo included in camera-generated EXIF data, plus any labels the photo owner added manually like tags, titles, and descriptions. Ironically, Flickr has never before been able to “see” what’s in the photograph itself.

Over time, many of us have started taking more photos, and it has become routine — especially with the launch last year of our free terabyte* — for users to have many un-curated photos with little or no metadata. This has made it difficult in some cases to find photos, either your own or from others.

So for the first time, Flickr has started looking at the photo itself**. Last week, the Flickr team presented this technology at the May meeting of the San Francisco Hadoop User’s Group at our new offices in San Francisco. The presentation focuses on how we scaled computer vision and deep learning algorithms to Flickr’s multi-billion image collection using technologies like Apache Hadoop and Storm. (In a future post here, we’ll describe the learning and vision systems themselves in more detail.)

[youtube=http://www.youtube.com/watch?v=OrhAbZGkW8k&w=640&h=360]

Slides available here: Flickr: Computer vision at scale with Hadoop and Storm

Thanks very much to Amit Nithian and Krista Wiederhold (organizers of the SFHUG meetup) for giving us a chance to share our work.

If you’d like to work on interesting challenges like this at Flickr in San Francisco, we’d like to talk to you! Please look here for more information: http://www.flickr.com/jobs

* Today is the first anniversary of the terabyte!

** Your photos are processed by computers – no humans look at them. The automatic tagging data is also protected by your privacy settings.