Charset encoding: UTF-8 vs ISO Latin1


After reading this: http://www.joelonsoftware.com/articles/Unicode.html and running some tests:

I would say that UTF-8 is the way to go (or even UTF-16): it covers everything Latin1 can offer, plus a lot more scripts.

Latin1 is oriented toward the Latin alphabet as used in Western European languages (which is fine if that is all you are aiming for).

UTF-8 can represent ANY Unicode character, so it covers not just Western European languages but also Eastern European ones, and every other script in Unicode.
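A quick way to see the difference is to try encoding a few strings both ways. This is only a minimal sketch: the sample words and the helper name can_encode are my own picks for illustration, not part of any test mentioned above.

```python
# Sample strings chosen for illustration (assumption: any text would do).
samples = {
    "French": "café",     # every character exists in Latin1
    "Polish": "Łódź",     # Ł and ź are not in Latin1
    "Russian": "Привет",  # Cyrillic is not in Latin1
    "Japanese": "日本語",  # ideographs are not in Latin1
}

def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Map each label to (fits in Latin1?, fits in UTF-8?).
results = {
    label: (can_encode(text, "latin-1"), can_encode(text, "utf-8"))
    for label, text in samples.items()
}

for label, (latin1_ok, utf8_ok) in results.items():
    print(f"{label}: latin-1={latin1_ok}, utf-8={utf8_ok}")
```

Only the French sample fits in Latin1, while UTF-8 accepts all four.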

So for SQL fields I would say utf8_unicode_ci should be the collation of choice.

Conclusion: choose UTF-8!

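Note: Because UTF-8 has variable byte length, some string operations (comparison, indexing, length) can be a little slower than with a fixed byte length encoding like UTF-32. UTF-16 is not really fixed length either: characters outside the Basic Multilingual Plane take two 16-bit units. UTF-8 might also take more space to store ideographic scripts, since most CJK characters need three bytes in UTF-8 but only two in UTF-16.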
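To put numbers on the space trade-off, you can count the bytes each encoding needs for a single character. Again, this is just a sketch: the sample characters and the helper name encoded_sizes are assumptions of mine. The "-le" codec variants are used so that no byte order mark gets counted.

```python
# One sample character per category (illustrative choices).
samples = {
    "ASCII letter": "A",
    "accented Latin letter": "é",
    "CJK ideograph": "日",
    "emoji outside the BMP": "😀",
}

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(ch):
    """Return the number of bytes ch occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

sizes = {label: encoded_sizes(ch) for label, ch in samples.items()}

for label, by_encoding in sizes.items():
    print(label, by_encoding)
```

The ideograph takes 3 bytes in UTF-8 against 2 in UTF-16, the ASCII letter takes 1 against 2, and the emoji takes 4 bytes in all three, which shows that UTF-16 is a variable length encoding too.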
