Language Detection: A Witch’s Brew?

photo by Arbron

Software developers sometimes have a tendency to cling to bits of ancient lore, even after they’ve long-since become irrelevant. Globalization, and specifically language detection, raises an interesting case-in-point.

These days, language detection is really simple in most cases – just use the “Accept-Language” HTTP header, which pretty much every client on the planet passes to you.

“But wait!” I hear you cry. “I thought the Accept-Language header wasn’t to be trusted?”

“Ancient and outdated lore, my friend” I will reply. “Ancient and outdated lore.”

It’s true that the Accept-Language header has a troubled history. Because of this, many developers regard it the way medieval villagers might have regarded a woman with a warty nose and a pet cat – it should be shunned, avoided and possibly burned at the stake.

In the early days of the web, Accept-Language really couldn’t be trusted – many browsers made no real effort to send a meaningful value, often sending nothing (or worse, defaulting to requesting English) unless the user worked their way through a complicated configuration process buried 3 panes back in a labyrinthine dialog box.

I’m rarely comfortable being categorical when talking about software engineering – there are usually too many variables, and too many specific circumstances for blanket assertions to be of use. But I can state with certainty that the “Accept-Language” header works these days, and any case where it doesn’t can (and will) be considered an error on the part of the browser vendor.

In two and a half years of running as an international site, we’ve only ever had one case where it didn’t work. Helio, a cellphone company, had a browser which was custom-built for them in Korea, and had its “Accept-Language” header hard-coded to always request Korean, something which led to much confusion for the Flickr users amongst their American customers.

When we alerted Helio to the problem, however, they were highly responsive – the next release of their software returned the correct value.

Location != Language

Perhaps driven by their superstitious fear of Accept-Language, many developers fall back on other means of determining their visitors’ preferred language. Top of the list is IP-detection – pinpointing the visitor’s current location using a lookup database. This isn’t necessarily a terrible solution, but it’s prone to a couple of problems.

Firstly, even the best IP location databases contain mistakes, or outdated information where netblocks have been reassigned over time.

Secondly, if you only rely on IP detection in order to provide language, travelers will often be highly confused when they reach their destination, connect to your site and are greeted with a language they don’t speak. This can be especially disconcerting if you’ve been using a site regularly for years. This error is easily avoided, simply by applying a cookie the first time you pick a language for a user. By using a cookie, you can guarantee a consistent experience, regardless of where in the world a laptop happens to fly.

None of this is to say that IP detection doesn’t have its place. It’s a useful fallback if you come across a user-agent that doesn’t send Accept-Language, or you have a specific case where you believe you can’t trust the header. But it general it should be just that – a fallback.

Always an option…

To guard against any possibility that you’ll detect the wrong language for a user, it doesn’t hurt to always provide language-switching as an option on every page. You’ll see that we do that on Flickr (with the exception of Organizr)…

You’ll notice that we always render the language in its native name, so that people can find their preferred language, even if they understand nothing else on the page.

Rights and Wrongs

“But!” some of you will cry. “We are a site that deals with media content. There are rights issues and legal constraints – we have to send people in France to our French site”…

…which nicely brings me to the last point of this post. Many global sites will find themselves dealing with various jurisdictional concerns. There’s no reason, however, that the “legal” logic in your code needs to be tied to the interface presentation.

Whilst certain legal texts (Terms and Conditions, etc) can provide interesting challenges, if you keep presentation as an entirely separate consideration from jurisdiction, it’s much easier to provide everyone with the best possible experience, regardless of where they are and what language they speak.

In Summary

All the above can be reduced to a very simple set of points to cut-out-and-keep for the next time you’re asked to create a Global version of a website.

  • Use Accept-Language – it just works
  • Use IP detection as a fallback for language, or a separate test to determine jurisdiction
  • Cookie your visitors’ language preferences, so you are consistent
  • Provide a simple way to switch language, everywhere
  • Treat language and compliance issues as two entirely separate problems

…at least, that’s how we’ve been doing it on Flickr for two and a half years, and it’s worked out pretty well for us so far.