The 32 Days Of Christmas!

LEGO City Advent Calendar - Day 7

When you have thousands of photos, it can be hard to find the photo you’re looking for. Want to search for that Christmas cat you saw at last year’s party? And what if that party wasn’t on Christmas day, but sometime the week before? To help improve the search ranking and relevance of national, personal, and religious holiday photos, we first have to see when the photos were taken; when, for example, is the Christmas season?

Understanding what people are looking for when they search for their own photos is an important part of improving Flickr. Earlier this year, we began a study (which will be published at CHI 2016 under the same name as this post) by trying to understand how people searched for their personal photos. We showed a group of 74 participants roughly 20 of their own photos on Flickr, and asked them what they’d put into the Flickr search box to find those photos. We did this a total of 1492 times.

It turns out 12% of the time people used a temporal term in searches for their own photos, meaning a word connected to time in some way. These might include a year (2015), a month (January), a season (winter), or a holiday or special event (Thanksgiving, Eid al-Fitr, Easter, Passover, Burning Man). Often, however, the date and time on the photograph didn’t match the search term: the year would be wrong, or people would search for a photograph of snow the weekend after Thanksgiving with the word “winter,” despite the fact that winter doesn’t officially begin until December 21st in the U.S. So we wanted to understand that situation: how often does fall feel like winter?

To answer this, we mapped 78.8 million Flickr photos tagged with a season name to the date the photo was actually taken.

Seasons Tagged by Date

As you’d expect, most of the photographs tagged with a season are taken during that season: 66% of photos tagged “winter” were taken between December 21st and March 20th. About 9% of season tags are off by two seasons: photos tagged “summer” that were taken between December 21st and March 20th, for example. We expect this may reflect antipodean seasons: while most Flickr users are in the Northern Hemisphere, it doesn’t seem unreasonable that 5% of “summer” photographs might have been taken in the Southern Hemisphere. More interesting, we think, are the off-by-one cases, like fall photographs labeled as “winter,” where we believe that the photo represents the experience of winter, regardless of the objective reality of the calendar. For example, if it snows the day after Thanksgiving, it definitely feels like winter.
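A minimal sketch of the "off by one season" and "off by two seasons" comparison, written for illustration only. It assumes Northern Hemisphere seasons with fixed boundary dates (winter from December 21st to March 20th, as in the text; the June and September boundaries are assumptions), and the function names are hypothetical.

```python
from datetime import date

# Seasons in calendar order, so adjacent entries are adjacent seasons.
SEASONS = ["spring", "summer", "fall", "winter"]


def season_of(day: date) -> str:
    """Northern Hemisphere season for a date, using fixed (assumed) boundaries."""
    md = (day.month, day.day)
    if (3, 21) <= md < (6, 21):
        return "spring"
    if (6, 21) <= md < (9, 22):
        return "summer"
    if (9, 22) <= md < (12, 21):
        return "fall"
    return "winter"  # December 21st through March 20th


def season_offset(tag: str, taken: date) -> int:
    """0 = tag matches the season, 1 = adjacent season, 2 = opposite season."""
    diff = abs(SEASONS.index(tag) - SEASONS.index(season_of(taken)))
    return min(diff, len(SEASONS) - diff)  # seasons wrap around the year
```

With these assumptions, a photo tagged "winter" and taken on November 28th gets an offset of 1, while a photo tagged "summer" and taken on January 15th gets an offset of 2.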

On the topic of Thanksgiving, let’s look at photographs tagged “thanksgiving.”

Percentage of Photos Tagged "Thanksgiving"

The six days from November 22nd to 27th (the darkest blue area) cover 65% of the photos. Expanding that range to November 15th–30th covers 83%. Expanding to all of November covers 85%, and including October (and thus Canadian Thanksgiving, in gray in early October) brings the total to 90%. But that means that 10% of all photos tagged “thanksgiving” fall outside this range. Every date in that image represents at least 40 photographs taken on that day between 2003 and 2014 inclusive, uploaded to Flickr, and tagged “thanksgiving”; the only white spaces are days that don’t exist, like February 30th or April 31st. When we manually checked some of the public photos tagged “thanksgiving” on arbitrarily chosen dates, we found pumpkins or turkeys, autumnal leaves or cornucopias: all images culturally associated with the holiday.

Not all temporal search terms are quite so complicated; some holidays are celebrated and photographed on a single day each year, like Canada Day (July 1st) or Boxing Day (December 26th). While these holidays can be easily translated to date queries, other holidays have more complicated temporal patterns. Have a look at these lunar holidays.

Lunar Holidays Tagged by Date

There are some events that occur on a lunar calendar, like Chinese New Year, Easter, Eid (both al-Fitr and al-Adha), and Hanukkah. These events move around in a regular, algorithmically determinable, but sometimes complicated, way. Most of these holidays tend to oscillate as a leap calculation is added periodically to synchronize the lunar timing to the solar calendar. However, the Eids, which follow the Hijri calendar, have no such leap correction, so photos tagged “Eid” fall earlier in the Gregorian year, by roughly 11 days, year after year.
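To make "algorithmically determinable" concrete for one of these holidays, the sketch below computes the date of Western (Gregorian) Easter with the widely published anonymous Gregorian algorithm, also known as the Meeus/Jones/Butcher method. It is an illustration rather than the code behind the chart; the function name is made up for this example, and the other holidays follow their own calendar rules.

```python
from datetime import date


def gregorian_easter(year: int) -> date:
    """Date of Western Easter Sunday for a Gregorian calendar year."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within the century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30  # offset of the paschal full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7  # days until the following Sunday
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)
```

A date query for photos tagged "easter" could then be centered on `gregorian_easter(year)` for each year of interest instead of on a fixed calendar day.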

Some holidays and events, like birthdays, happen on every day of the week. But they’re often celebrated, and thus photographed, on Friday, Saturday, and Sunday:

Day of the week tagged Birthday
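A day-of-week breakdown like this one can be sketched in a few lines, assuming you already have the capture dates of the photos carrying a given tag. The function name and the input format are hypothetical.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_shares(taken_dates):
    """Fraction of photos taken on each weekday, keyed by weekday name."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(WEEKDAYS[d.weekday()] for d in taken_dates)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in WEEKDAYS}
    return {name: counts[name] / total for name in WEEKDAYS}
```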

So to get back to our original question: when are photos tagged “Christmas” actually taken?

Days tagged with Christmas

As you can see, more photos tagged “Christmas” are taken on December 25th than on any other day (19%). Christmas Eve is a close second, at 12%. If you look at other languages, this difference practically goes away: 9.2% of photos tagged “Noel” are taken on Christmas Eve, and 9.6% are taken on Christmas; “navidad” photos are 11.3% on Christmas Eve and 12.0% on Christmas. But Christmas photos are taken throughout December. We can now set a threshold for a definition of Christmas: say if at least 1% of the photos tagged “Christmas” were taken on that day, we’d rank it more relevant. That means that every day from December 1st to January 1st hits that definition, with December 2nd barely scraping in. That makes…32 days of Christmas!
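The threshold rule can be sketched as follows, assuming a mapping from (month, day) to the number of photos with a given tag taken on that day. The function names and the toy counts in the usage note are hypothetical, not Flickr data.

```python
from datetime import date


def relevant_days(counts_by_day, threshold=0.01):
    """(month, day) keys whose share of all tagged photos meets the threshold."""
    total = sum(counts_by_day.values())
    if total == 0:
        return []
    return sorted(day for day, n in counts_by_day.items()
                  if n / total >= threshold)


def days_in_span(first: date, last: date) -> int:
    """Number of calendar days from first to last, counting both ends."""
    return (last - first).days + 1
```

For example, `relevant_days({(12, 25): 50, (12, 24): 30, (12, 26): 15, (11, 30): 5}, threshold=0.10)` keeps the three December days and drops November 30th. Counting both ends, `days_in_span(date(2014, 12, 1), date(2015, 1, 1))` gives the 32 days in the title.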

Merry Christmas and Happy Holidays—for all the holidays you celebrate and photograph.

PS: Flickr is hiring! Labs is hiring! Come join us!

The Ins and Outs of the Yahoo Flickr Creative Commons 100 Million Dataset

This past summer we (Yahoo Labs and Flickr) released the YFCC100M dataset, the largest and most ambitious collection of Flickr photos and videos ever, containing 99,206,564 photos and 793,436 videos from 581,099 different photographers. We’re super excited about the dataset, because it is a reflection of how Flickr and photography have evolved over the past 10 years. And it contains photos and videos of almost everything under the sun (and yes, loads of cats).

We’ve received a lot of emails and tweets asking for more details about the dataset, so in this blog post, we’ll gladly tell you. Each of the 100 million photos and videos is associated with a Creative Commons license that indicates how it may be used by others. The table below shows the complete breakdown of licenses in our dataset. Approximately 31.8% of the dataset is licensed for commercial use, while 17.3% carries the most liberal license, which only requires attribution to the photographer.

License Photos Videos
17,210,144 137,503
9,408,154 72,116
4,910,766 37,542
12,674,885 102,288
28,776,835 235,319
26,225,780 208,668
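The two percentages quoted above can be checked against the table with a little arithmetic. The sketch below assumes that the first row is the attribution-only license and that the first three rows are the licenses permitting commercial use; the license labels did not survive in the table, so this reading is an assumption, chosen because it reproduces the quoted figures.

```python
# (photos, videos) per license, in the order the rows appear in the table.
ROWS = [
    (17_210_144, 137_503),
    (9_408_154, 72_116),
    (4_910_766, 37_542),
    (12_674_885, 102_288),
    (28_776_835, 235_319),
    (26_225_780, 208_668),
]


def share_percent(rows, all_rows=ROWS):
    """Percentage of all items (photos plus videos) covered by the given rows."""
    total = sum(p + v for p, v in all_rows)
    return 100 * sum(p + v for p, v in rows) / total


attribution_only = round(share_percent(ROWS[:1]), 1)  # assumed: first row
commercial_use = round(share_percent(ROWS[:3]), 1)    # assumed: first three rows
```

Under that reading, `attribution_only` comes out at 17.3 and `commercial_use` at 31.8, and the six rows sum to exactly 100,000,000 items.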

The photos and videos themselves are very diverse. We’ve found photos showing street scenes captured as part of photographer Andy Nystrom‘s life-logging activities, photos of real-world events like protests and rallies, as well as photos of natural phenomena.

Photos: “Five years of Iraq war die-in” (Steve Rhodes); “IMG_9793” (Andy Nystrom); “851-Aurora Borealis Northern Lights from Lodge near Fairbanks 1 Sep 28, 2011 1-11 AM 1600x1060” (BJ Graf)

To understand more about the visual content of the photos in the dataset, the Flickr Vision team used a deep-learning approach to find the presence of visual concepts, such as people, animals, objects, events, architecture, and scenery across a large sample of the corpus. There’s a diverse collection of visual concepts present in the photos and videos, ranging from indoor to outdoor images, faces to food, nature to automobiles.

Concept Count
outdoor 32,968,167
indoor 12,522,140
face 8,462,783
people 8,462,783
building 4,714,916
animal 3,515,971
nature 3,281,513
landscape 3,080,696
tree 2,885,045
sports 2,817,425
architecture 2,539,511
plant 2,533,575
house 2,258,396
groupshot 2,249,707
vehicle 2,064,329
water 2,040,048
mountain 2,017,749
automobile 1,351,444
car 1,340,751
food 1,218,207
concert 1,174,346
flower 1,164,607
game 1,110,219
text 1,105,763
night 1,105,296

There are 68,971,123 photos and videos in the set that have user-annotated tags. If we look at specific tags used, we see it is very common for people to use the year of capture, the camera brand, place names, scenery, and activities as tags. The top 25 tags (excluding the years of capture) and how often they were used are listed below, as well as the tag frequency distribution for the 100 most-frequently used tags.

User Tag Count
nikon 1,195,576
travel 1,195,467
usa 1,188,344
canon 1,101,769
london 996,166
japan 932,294
france 917,578
nature 872,029
art 854,669
music 826,692
europe 782,932
beach 758,799
united states 743,470
england 739,346
wedding 728,240
city 689,518
italy 688,743
canada 686,254
new york 685,311
vacation 680,142
germany 672,819
party 663,968
park 651,717
people 641,285
water 640,234

User tag distribution in the YFCC100M Dataset

Some photos and videos (3,350,768 to be exact) carry machine tags. Noteworthy machine tags are those in the “siwild” namespace, which marks photos uploaded by scientists of the Smithsonian, and the “taxonomy” namespace, which marks photos in which flora and fauna have been carefully classified. The most frequently occurring namespace, “uploaded,” refers to the applications used to share the photos on Flickr, principally the Flickr and Instagram iOS apps. Other interesting machine tags refer to the different filters that can be applied to a photo; these appear on roughly 750,000 photos. Overall, most machine tags are related to food and drink, events, camera and application metadata, as well as locations.

Machine Tag Count
uploaded 1,917,650
siwild 1,169,957
taxonomy 1,067,857
foursquare 894,265
exif 617,287
flickriosapp 538,829
geo 443,762
sequence 429,948
lastfm 313,379
flickrandroidapp 222,238
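Flickr machine tags follow the form namespace:predicate=value, so a namespace tally like the one above can be sketched with a small parser. This is an illustration, not the code used for the table: the regular expression is a simplified assumption about the allowed characters, the function name is hypothetical, and it assumes each photo is counted once per namespace.

```python
import re
from collections import Counter

# Simplified pattern for namespace:predicate=value (an assumption, not the
# official grammar).
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$")


def namespace_counts(tags_per_photo):
    """Count the photos that carry at least one machine tag in each namespace."""
    counts = Counter()
    for tags in tags_per_photo:
        namespaces = set()
        for tag in tags:
            match = MACHINE_TAG.match(tag)
            if match:
                namespaces.add(match.group(1).lower())
        counts.update(namespaces)  # each namespace counted once per photo
    return counts
```

Plain tags such as "travel" do not match the pattern and are ignored, while a photo carrying both "taxonomy:genus=Felis" and "taxonomy:species=catus" (illustrative values) adds one to the "taxonomy" count.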

In terms of locations, the photos and videos in the dataset have been taken all over the world. In total, 48,366,323 photos and 103,506 videos were geotagged. The most popular cities where photos and videos were shot are concentrated in the United States, principally New York City, San Francisco, Los Angeles, Chicago, and Seattle; in Europe, the most popular are London, Berlin, Barcelona, Rome, and Amsterdam. There are also photos that have been taken in remote locations like Kiribati, icy places like Svalbard, and exotic places like Comoros. In fact, photos and videos from this dataset have been taken in 249 different territories (countries, islands, etc.) around the world, and even in international waters or airspace.

One Million Creative Commons Geo-tagged Photos

Our dataset further reveals that there are many different cameras in use within the Flickr community. The Canon EOS 400D and 350D have a lead over the Nikon D90 (calm down…we’re not starting anything by saying that). Apple’s iPhones form the most popular type of cameraphone.

Make Camera Count
Canon EOS 400D 2,539,571
Canon EOS 350D 2,140,722
Nikon D90 1,998,637
Canon EOS 5D Mark II 1,896,219
Nikon D80 1,719,045
Canon EOS 7D 1,526,158
Canon EOS 450D 1,509,334
Nikon D40 1,358,791
Canon EOS 40D 1,334,891
Canon EOS 550D 1,175,229
Nikon D7000 1,068,591
Nikon D300 1,053,745
Nikon D50 1,032,019
Canon EOS 500D 1,031,044
Nikon D700 942,806
Apple iPhone 4 922,675
Nikon D200 919,688
Canon EOS 20D 843,133
Canon EOS 50D 831,570
Canon EOS 30D 820,838
Canon EOS 60D 772,700
Apple iPhone 4S 761,231
Apple iPhone 743,735
Nikon D70 742,591
Canon EOS 5D 699,381

Our collection of 100 million photos and videos marks a new milestone in the history of datasets. The collection is one of the largest released for academic use, and it’s incredibly varied, not just in terms of the content shown in the photos and videos, but also the locations where they were taken, the photographers who took them, the tags that were applied, the cameras that were used, etc. The best thing about the dataset is that it is completely free for anyone to download, given that all photos and videos have a Creative Commons license. Whether you are a researcher, a developer, a hobbyist or just plain curious about online photography, the dataset is the best way to study and explore a wide sample of Flickr photos and videos. Happy researching and happy hacking!