Language Detection: A Witch’s Brew?

photo by Arbron

Software developers sometimes have a tendency to cling to bits of ancient lore, even after they’ve long-since become irrelevant. Globalization, and specifically language detection, raises an interesting case-in-point.

These days, language detection is really simple in most cases – just use the “Accept-Language” HTTP header, which pretty much every client on the planet passes to you.

“But wait!” I hear you cry. “I thought the Accept-Language header wasn’t to be trusted?”

“Ancient and outdated lore, my friend” I will reply. “Ancient and outdated lore.”

It’s true that the Accept-Language header has a troubled history. Because of this, many developers regard it the way medieval villagers might have regarded a woman with a warty nose and a pet cat – it should be shunned, avoided and possibly burned at the stake.

In the early days of the web, Accept-Language really couldn’t be trusted – many browsers made no real effort to send a meaningful value, often sending nothing (or worse, defaulting to requesting English) unless the user worked their way through a complicated configuration process buried 3 panes back in a labyrinthine dialog box.

I’m rarely comfortable being categorical when talking about software engineering – there are usually too many variables, and too many specific circumstances for blanket assertions to be of use. But I can state with certainty that the “Accept-Language” header works these days, and any case where it doesn’t can (and will) be considered an error on the part of the browser vendor.

In two and a half years of running as an international site, we’ve only ever had one case where it didn’t work. Helio, a cellphone company, had a browser which was custom-built for them in Korea, and had its “Accept-Language” header hard-coded to always request Korean, something which led to much confusion for the Flickr users amongst their American customers.

When we alerted Helio to the problem, however, they were highly responsive – the next release of their software returned the correct value.

Location != Language

Perhaps driven by their superstitious fear of Accept-Language, many developers fall back on other means of determining their visitors’ preferred language. Top of the list is IP-detection – pinpointing the visitor’s current location using a lookup database. This isn’t necessarily a terrible solution, but it’s prone to a couple of problems.

Firstly, even the best IP location databases contain mistakes, or outdated information where netblocks have been reassigned over time.

Secondly, if you only rely on IP detection in order to provide language, travelers will often be highly confused when they reach their destination, connect to your site and are greeted with a language they don’t speak. This can be especially disconcerting if you’ve been using a site regularly for years. This error is easily avoided, simply by applying a cookie the first time you pick a language for a user. By using a cookie, you can guarantee a consistent experience, regardless of where in the world a laptop happens to fly.

None of this is to say that IP detection doesn’t have its place. It’s a useful fallback if you come across a user-agent that doesn’t send Accept-Language, or you have a specific case where you believe you can’t trust the header. But it general it should be just that – a fallback.

Always an option…

To guard against any possibility that you’ll detect the wrong language for a user, it doesn’t hurt to always provide language-switching as an option on every page. You’ll see that we do that on Flickr (with the exception of Organizr)…

You’ll notice that we always render the language in its native name, so that people can find their preferred language, even if they understand nothing else on the page.

Rights and Wrongs

“But!” some of you will cry. “We are a site that deals with media content. There are rights issues and legal constraints – we have to send people in France to our French site”…

…which nicely brings me to the last point of this post. Many global sites will find themselves dealing with various jurisdictional concerns. There’s no reason, however, that the “legal” logic in your code needs to be tied to the interface presentation.

Whilst certain legal texts (Terms and Conditions, etc) can provide interesting challenges, if you keep presentation as an entirely separate consideration from jurisdiction, it’s much easier to provide everyone with the best possible experience, regardless of where they are and what language they speak.

In Summary

All the above can be reduced to a very simple set of points to cut-out-and-keep for the next time you’re asked to create a Global version of a website.

  • Use Accept-Language – it just works
  • Use IP detection as a fallback for language, or a separate test to determine jurisdiction
  • Cookie your visitors’ language preferences, so you are consistent
  • Provide a simple way to switch language, everywhere
  • Treat language and compliance issues as two entirely separate problems

…at least, that’s how we’ve been doing it on Flickr for two and a half years, and it’s worked out pretty well for us so far.

Flipping Out

Flickr is somewhat unique in that it uses a code repository with no branches; everything is checked into head, and head is pushed to production several times a day. This works well for bug fixes that we want to go out immediately, but presents a problem when we’re working on a new feature that takes several months to complete. How do we solve that problem? With flags and flippers!

Feature Flags

Canadian Flag

Flags allow us to restrict features to certain environments, while still using the same code base on all servers. In our code, we’ll write something like this:

if ($cfg.enable_unicorn_polo) {
    // do something new and amazing here.
}
else {
    // do the current boring stuff.
}

We can then set the value of $cfg.enable_unicorn_polo on each environment (false for production, true for dev, etc.). This has the additional benefit of making new feature launches extremely easy: Instead of having to copy a bunch of new code over to the production servers, we simply change a single false to true, which enables the code that has already been on the servers for months.

Feature Flippers

Knife-switch

Flags allows us to enable features on a per environment basis, but what if we wanted to be more granular and enable features on a per user basis? For that we use feature flippers. These allow us to turn on features that we are actively developing without being affected by the changes other developers are making. It also lets us turn individual features on and off for testing.

Here is what the Feature Flip page looks like:

Feature Flipper

Continuously Integrating

Feature flags and flippers mean we don’t have to do merges, and that all code (no matter how far it is from being released) is integrated as soon as it is committed. Deploys become smaller and more frequent; this leads to bugs that are easier to fix, since we can catch them earlier and the amount of changed code is minimized.

This style of development isn’t all rainbows and sunshine. We have to restrict it to the development team because occasionally things go horribly wrong; it’s easy to imagine code that’s in development going awry and corrupting all your data. Also, after launching a feature, we have to go back in the code base and remove the old version (maintaining separate versions of all features on Flickr would be a nightmare). But overall, we find it helps us develop new features faster and with fewer bugs.