The Umlaut Problem - How To Successfully Back Up And Restore MySQL Databases With Special Characters Using MySQLDumper - Page 2
The attentive reader will have noticed that the restore process can only be completed successfully if the backup file uses the same character-set that the new server expects it to.
But how do I know what a MySQL-Server's standard character-set is? Thankfully, MySQLDumper gives us comprehensive information!
If you check Home / MySQL-Variables / Variables you'll find an array of MySQL-settings:
Of these settings the two that are interesting for us are character_set_client and character_set_connection which are usually identical.
These settings tell us what character-set is expected of the "client" when nothing else is declared. The "client" is in this case the program/script which connects to the MySQL-Server.
MySQL converts all data to the proper character-set if that's necessary. We don't need to worry about what standard character-set the database has or how MySQL saves the data internally (character_set_system). What MySQL goes by, when receiving or serving data, is the character-set that was agreed upon connection (or the server's standard character-set). It then converts the data internally if it needs to. So we needn't worry about anything else. If character_set_connection is set to utf8, then we get utf8 encoded data and we have to send utf8 encoded data.
Let me take this opportunity to set another thing straight: the collation setting does not have any effect at all when sending data to the server. It only has a bearing when receiving data from the server. When a query is sorted using the "ORDER BY" - clause, the collation setting controls what criteria are applied. Check out the official MySQL documentation for more information. Was this too theoretical? Okay, here's an example:
Let's assume we have a table with 3 rows (or data sets)
and we output these columns using SELECT * FROM `table` ORDER BY `Name`
Collation latin1_german1_ci (dictionary sorting - umlauts are sorted alphabetically) returns:
Collation latin1_german2_ci (phonebook sorting - umlauts first) return:
So you see you can control the country-specific collation in order to adjust sorting of data sets.
So what do I do if MySQLDumper tells me that the character-sets of the old and new server don't match? You already have an important advantage: You know what character-set the new MySQL-Server expects!
So, what's to be done?
Open the file in TextPad (or NotePad++) and click File / Save As. In the lower part of the menu you can change the character-set by changing the Encoding setting. Default equals latin1. Having saved the file in the correct character-set, you can (for example) use 7Zip (free) to gzip it, upload it to the server and then use MySQLDumper to feed the data into the (new) database.
So what do we do in order to improve MySQLDumper in regards to this?
MySQLDumper has gained a reputation as being easy and safe to use and I want it to stay that way.
So in this case MySQLDumper does not offer a perfect and always-working solution. And that's something we want to change of course.