Given the amount of Japanese interspersed with English on my various sites, I knew that at some point I would need to change the encoding of my MySQL database from Latin-1 to UTF-8 (say that five times fast). It hadn’t been much of an issue, because when I display that text on a web site, I send UTF-8 encoding in the header.

I reached a breaking point when trying to sort a text field with mixed languages. Because the database was encoded in Latin-1, the multibyte strings were sorted as if each byte were a separate single-byte character. On a web page, that meant Japanese text would appear in the middle of the sort.
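For the curious, here is a rough Python sketch of why that happens. It is not MySQL's collation code. The `rough_collation_key` helper is a hypothetical stand-in for an accent- and case-insensitive Latin-1 collation, and the sample titles are made up. The point is only that the first UTF-8 byte of a kana character, read as Latin-1, looks like an accented "a".

```python
import unicodedata

def latin1_view(text):
    """Show UTF-8 bytes the way a Latin-1 column sees them: one byte, one character."""
    return text.encode("utf-8").decode("latin-1")

def rough_collation_key(text):
    """Hypothetical approximation of an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold()

titles = ["Banana", "あさ", "Apple", "Cherry"]  # made-up sample data

# Sorting on the Latin-1 view: the kana's lead byte 0xE3 reads as "ã", which collates like "a".
mis_sorted = sorted(titles, key=lambda t: rough_collation_key(latin1_view(t)))

# Sorting on the real characters: the Japanese title lands after the English ones.
utf8_sorted = sorted(titles, key=rough_collation_key)

print(mis_sorted)
print(utf8_sorted)
```

In the first list the Japanese title ends up between "Apple" and "Banana"; in the second it sorts after all the English titles.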

Well, I finally did a Google search on what it takes to perform a Latin-1-to-UTF-8 conversion. So long as your data is fairly clean, it takes about four commands at a shell prompt to finish. And that’s what I did: I dumped my database, converted it and loaded it into an empty server to see the results. I liked what I saw in phpMyAdmin. Hastily, I decided to forge ahead and do a real conversion.

Imagine my surprise when I reloaded my websites and saw question marks where there should have been Japanese text.

After a failed attempt to roll back my database, I did a bit more searching and discovered I needed to send a "SET NAMES utf8" query to the MySQL server before performing any other queries. Thankfully, that was just a single line to add to my self-written toolkit, which powers this site and others.
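A minimal Python sketch of both halves of that problem follows. My toolkit is not written this way, so treat the names as hypothetical: `question_marks` only imitates what happens when text is squeezed into Latin-1 for a client that never asked for UTF-8, and `use_utf8` assumes a DB-API style connection object with a `cursor()` method.

```python
def question_marks(text):
    """Imitate a Latin-1 connection: characters it cannot represent become '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

def use_utf8(connection):
    """Send SET NAMES utf8 on a DB-API style connection before any other query."""
    cursor = connection.cursor()
    try:
        cursor.execute("SET NAMES utf8")
    finally:
        cursor.close()
    return connection

print(question_marks("日本語 text"))  # ??? text
```

The idea is to call something like `use_utf8` once, right after connecting, so every later query on that connection is sent and returned as UTF-8.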

That wasn’t the case with Movable Type, however. The question marks showed up in the administration interface. A bit more searching revealed the "SQLSetNames 1" configuration value in mt-config.cgi. Question marks gone.

Had I been diligent, I would have done some work in my development environment before going half-cocked into a database conversion. Instead, I ended up panicking for a full hour trying to get my sites fixed. At the end of it, I was quite pleased when I saw Japanese text sorted separately from English.