People in Photos: The API Methods

I’ve been racking my brains for a while trying to think of something informative to tell you about the API methods we’re releasing today, but all I keep coming back to is a venerable British advertising slogan for a brand of woodstain: “It does exactly what it says on the tin”.

What does the tin say, then?

[Photo by 5500]

First off, we have a simple accessor method which will return you a list of people for a given photo. It’s called flickr.photos.people.getList, and it takes a photo ID as its sole argument.

But what about the reverse – finding all the photos of a given person? Fret not, that’s why we have flickr.people.getPhotosOf. This method takes a user ID, and since it returns a Standard Photos Response, you can also specify any extra data you want through the extras parameter.
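To make that concrete, here is a minimal sketch in Python of calling the two read methods over the standard Flickr REST endpoint. It assumes the third-party requests package; the call_flickr helper is made up for this example, and YOUR_API_KEY and the IDs are placeholders.

    import requests

    REST_ENDPOINT = "https://api.flickr.com/services/rest/"

    def call_flickr(method, api_key, **params):
        # Read-only call against the public REST endpoint; no signing required.
        params.update({
            "method": method,
            "api_key": api_key,
            "format": "json",
            "nojsoncallback": 1,
        })
        resp = requests.get(REST_ENDPOINT, params=params)
        resp.raise_for_status()
        return resp.json()

    # Who is marked in this photo?
    people = call_flickr("flickr.photos.people.getList",
                         "YOUR_API_KEY", photo_id="1234567890")

    # Which photos does this member appear in? A Standard Photos Response,
    # so "extras", "per_page" and "page" behave as usual.
    photos = call_flickr("flickr.people.getPhotosOf",
                         "YOUR_API_KEY", user_id="12345678@N00",
                         extras="date_taken,owner_name", per_page=100)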

Sometimes, simply consuming data isn’t enough – you will feel a need to create some. If you want to add a person to a photo, simply use flickr.photos.people.add. This method takes a photo_id and a user_id, and can optionally take another 4 arguments (person_x, person_y, person_w, person_h) to specify a “face boundary” for that person.

If you decide you don’t like the face boundary later, there’s always flickr.photos.people.deleteCoords to remove it entirely, or flickr.photos.people.editCoords to update it.

Last, but not least, you can remove someone from a photo with flickr.photos.people.delete.

Obviously, all of the above methods require that the calling user is permitted to perform the action in question.
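To give a rough idea of the shape of those write calls, here is a Python sketch. The call_flickr_write helper and the IDs are made up for the example, the authentication and request signing that every write call needs are deliberately left out, and the assumption that editCoords takes the same person_x/person_y/person_w/person_h arguments as add follows from the description above.

    import requests

    REST_ENDPOINT = "https://api.flickr.com/services/rest/"

    def call_flickr_write(method, **params):
        # Sketch only: a real call must be made on behalf of an authenticated
        # user with the relevant permissions, and must be signed accordingly.
        params.update({"method": method, "format": "json", "nojsoncallback": 1})
        resp = requests.post(REST_ENDPOINT, data=params)
        resp.raise_for_status()
        return resp.json()

    # Mark a person in a photo, with an optional face boundary in pixels.
    call_flickr_write("flickr.photos.people.add",
                      photo_id="1234567890", user_id="12345678@N00",
                      person_x=100, person_y=80, person_w=120, person_h=150)

    # Adjust the boundary, drop just the boundary, or remove the person.
    call_flickr_write("flickr.photos.people.editCoords",
                      photo_id="1234567890", user_id="12345678@N00",
                      person_x=110, person_y=90, person_w=100, person_h=140)
    call_flickr_write("flickr.photos.people.deleteCoords",
                      photo_id="1234567890", user_id="12345678@N00")
    call_flickr_write("flickr.photos.people.delete",
                      photo_id="1234567890", user_id="12345678@N00")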

Why so late?

We’re aware that some people have been clamoring for access to these methods for a little while now. And while I’m not generally one for making excuses, I do have a pretty good reason for the delay this time – I broke my leg snowboarding a month ago.

What’s more, I have evidence, in the form of a photo which I marked as containing me, using the Flickr API…

[Photo: Loaded onto the sledge]

So, yeah, sorry about that.

A Chinese puzzle: Unicode and EXIF metadata parsing

[Photo: Le Presque-Cube]

At Flickr, there are quite a few photos, and you can browse the site in 8 different languages, including Korean and Chinese. Common metadata such as the title, description and tags can be pre-populated from information embedded in the image itself, using what is commonly called EXIF information. So it makes sense to implement this with proper handling of languages and, above all, of alphabets and character encodings.

[Photo: Un peu de lecture...?]

Well, what made sense in principle made a lot less sense once we started working with the existing specifications. Here is how we coped with them.

The standards

To begin with, there is not just EXIF. Metadata can actually be written within a picture file in at least 3 different formats:
EXIF itself.
IPTC-IIM headers.
XMP by Adobe.

Of course, these formats are neither mutually exclusive nor completely redundant, and this is where things can get tricky.

But let’s step back a moment to describe the specifics of these formats, not in detail, but with regard to our need: extracting information in a reliable way, independently of how, when or where the image was created.

EXIF is the oldest format

[Photo: Old Cars Part 1]
Hence it comes with the most limitations yet is the most widespread, as is often the case in our industry. Thus, we need to deal with it, even though it is radically flawed from an internationalization standpoint: text fields are not stored as UTF and, most of the time, there is no indication of the character encoding.

IPTC-IIM

In its later versions, it added optional (Grr!!!) support for a field indicating the encoding of all the string properties in the container.

XMP

At last, text in the XMP format is stored in UTF-8. As a side note, the XML-based openness of this format does not make things easier, since each application can come up with its own set of metadata. Nevertheless, from an internationalization and structural standpoint, this format is adequate for modern needs: finally!

So, now that we know what hides behind each of these standards, we can start tackling our problem.
[Photo: Hide n Seek]

A solution?

Rely on existing libraries, when they perform well

For the Flickr desktop client (Windows, Mac), we use the Exiv2 image metadata library, which helps with the initial reconciliation between all the fields (especially with EXIF and IPTC-IIM contained within XMP).

The “guesstimation” of character set from EXIF

[Photo: 坐在问号足球上的3D小人高清图片_zcool.com.cn]
We first scan the string to see whether all bytes are in the range 0 to 127. If so, we treat the string as ASCII. If not, we scan the string to see whether it is consistent with valid UTF-8; if so, we treat it as UTF-8. Checking for UTF-8 validity is not a bullet-proof test, but statistically it is better than any other scenario. As a last resort, we pick a “reasonable” fallback encoding.

For the desktop application, we use hints from the user’s system. On Windows, the GetLocaleInfo function provides the user’s default ANSI code page (LOCALE_IDEFAULTANSICODEPAGE), which can be used to pick an appropriate locale for string conversion methods. On Mac OS X, CFStringGetSystemEncoding is our ally. In our case, there is no point in using the character set of our own application, which we control and which is not linked to the character set of the application that “wrote” the EXIF.
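As a rough illustration, here is that guessing order in Python. The function name and the "cp1252" fallback are stand-ins: in the real clients the fallback comes from the system hints just mentioned (GetLocaleInfo on Windows, CFStringGetSystemEncoding on Mac OS X).

    def guess_exif_text(raw, fallback="cp1252"):
        # 1. Every byte in 0..127: treat the string as plain ASCII.
        if all(b < 128 for b in raw):
            return raw.decode("ascii")
        # 2. Bytes that happen to form valid UTF-8 are, statistically,
        #    very likely to actually be UTF-8.
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass
        # 3. Last resort: a "reasonable" fallback encoding derived from the
        #    user's system.
        return raw.decode(fallback, errors="replace")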

Consolidation: the easy case

The workflow an image has been through before it reaches us is unknown. All three formats may be filled in, but not necessarily by the same application, so we need a mechanism to consolidate the data. The easy case is single-valued fields such as the title and the description: we follow the obvious quality hierarchy of the formats. To extract the description, for instance, we first look for the XMP.dc.description field, then XMP.photoshop.Headline (to support the extensibility mentioned before as a side note), then IPTC.Application2.Caption and finally Exif.Image.ImageDescription. We keep the first value found and ignore the others. There is only one title and one description per image: we might as well take the one we are most sure about.
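In code, that “first hit wins” rule is just a priority list. Here is a sketch, assuming the metadata has already been flattened into a dictionary keyed by the field names above (the dictionary layout and function name are made up for the example):

    DESCRIPTION_FIELDS = [
        "XMP.dc.description",
        "XMP.photoshop.Headline",
        "IPTC.Application2.Caption",
        "Exif.Image.ImageDescription",
    ]

    def consolidate_description(fields):
        for key in DESCRIPTION_FIELDS:
            value = fields.get(key)
            if value:
                return value  # keep the first (most trusted) value found
        return None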

Consolidation: it becomes even trickier for tags

[Photo: puzzle pieces]
Since the final result is no longer a single value (we deal with a list of tags, not just one tag), we cannot ignore the “EXIF” tags as easily as in the title and description case. Fortunately, IPTC Keywords are supposed to be mapped to XMP (dc:subject). We can therefore compare the number of keywords that would be extracted from EXIF with the number that would be extracted from XMP. If the counts are equal, we simply ignore the EXIF. If they are not, we try to match each guesstimated EXIF keyword against the XMP keywords to avoid duplicates.
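Here is a sketch of that rule, assuming the EXIF keywords have already been run through the character-set guessing above and the XMP keywords come straight from dc:subject (both are plain lists of strings; the function name is made up):

    def consolidate_tags(exif_keywords, xmp_keywords):
        # Same number of keywords on both sides: assume the writing application
        # kept EXIF/IPTC and XMP in sync, and keep only the XMP list.
        if len(exif_keywords) == len(xmp_keywords):
            return list(xmp_keywords)
        # Otherwise merge, skipping guessed EXIF keywords that duplicate XMP ones.
        merged = list(xmp_keywords)
        seen = {keyword.casefold() for keyword in xmp_keywords}
        for keyword in exif_keywords:
            if keyword.casefold() not in seen:
                merged.append(keyword)
                seen.add(keyword.casefold())
        return merged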
 
All in all, quite an interesting issue where, by design, the final solution is an approximation whose results depend on the context. Computers: an exact science?
[Photo: Computer History Museum]

For more information, the main reference is http://www.metadataworkinggroup.com/

Photos by Groume, Raïssa Bandou, seanmcgrath, tcp909, aftab., 姒儿喵喵 and Laughing Squid