The Umlaut Problem - How To Successfully Back Up And Restore MySQL Databases With Special Characters Using MySQLDumper - Page 3

More information about MySQLDumper

Based on the experiences and facts described so far, I am aiming for the following solution:

  • MySQLDumper (just like you) needs to know which character-set the backup file was saved in. Only then can it tell whether the backup is encoded in the character-set that the new MySQL server expects, and adjust the connection character-set accordingly.
  • To achieve this, MySQLDumper will (in the future) write the character-set name as a comment into the backup file when it creates the backup. When restoring the backup file to a database, it will read that comment and compare the character-set that was used with the character-set of the new MySQL server.
  • With MySQL 4.1 and newer, it is possible to tell the server which character-set the incoming data is encoded in.
  • With MySQL versions prior to 4.1 that is not possible, so in such cases it is up to the user to save the backup file in the required character-set.
    That is not the only reason why I strongly recommend using at least version 4.1. Anything older than 4.1 is too old, and practically every web host lets you switch to a newer MySQL server. (If yours does not, you should still switch: to another web host.)
  • This applies to backups made with other programs too - the user has to specify the character-set because it's impossible for the program to know.
  • I will attempt to implement some kind of automatic check for umlauts and character-set. Whether that is going to work out, I can't say yet.
  • PHP modules such as mbstring, which can detect the character-set of a file, do exist. However, they don't always work reliably, and requiring a server to have certain modules installed for MySQLDumper to work does not fit our concept of MySQLDumper. So this solution is not an option.
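The comment-based approach from the list above could look roughly like the following Python sketch. It is only an illustration: the header format `-- charset: ...` and all function names are assumptions made for this example and are not MySQLDumper's actual format or code. `SET NAMES` is the MySQL statement (available from 4.1 on) that tells the server which character-set the client is sending.

```python
import re

# Hypothetical header format. The article does not say how MySQLDumper
# will actually write the comment, so this pattern is an assumption.
HEADER_RE = re.compile(r"^-- charset:\s*([A-Za-z0-9_]+)\s*$")


def write_header(charset):
    """Return the comment line to put at the top of a new backup file."""
    return f"-- charset: {charset}\n"


def read_charset(dump_text, max_lines=20):
    """Look for the charset comment in the first lines of a backup.

    Returns the character-set name in lower case, or None if the backup
    carries no such comment (for example, one made by another program).
    """
    for line in dump_text.splitlines()[:max_lines]:
        match = HEADER_RE.match(line)
        if match:
            return match.group(1).lower()
    return None


def connection_statement(backup_charset, server_version):
    """Return the statement to send before restoring, or None.

    None means nothing can be done automatically: either the charset is
    unknown, or the server is older than 4.1 and cannot be told what
    encoding the incoming data uses.
    """
    if backup_charset is None or server_version < (4, 1):
        return None
    return f"SET NAMES {backup_charset}"
```

A restore routine would call `read_charset()` on the backup, pass the result together with the server version to `connection_statement()`, and fall back to asking the user whenever the result is `None`.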

This way the Dumper would remain pretty much automatic (the user would not need to change any settings, because the Dumper would already know what to do), and it would still be very flexible and compatible with backup files created by other scripts / programs.
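For backup files that carry no character-set information at all, the automatic umlaut check mentioned in the list could, as a rough sketch, work along these lines. This is a heuristic written for illustration, not MySQLDumper code: the function name is hypothetical, and falling back to latin1 is an assumption that suits typical German-language data. It needs no extra modules, but it cannot be fully reliable, because almost any byte sequence is "valid" latin1.

```python
def guess_charset(data):
    """Guess the encoding of raw backup bytes (heuristic only).

    - Pure ASCII: no umlauts at all, so the charset does not matter.
    - Valid UTF-8 with non-ASCII bytes: very likely UTF-8.
    - Anything else: assumed to be latin1 (an assumption, see above).
    """
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf8"
    except UnicodeDecodeError:
        return "latin1"
```

For example, `guess_charset("Müller".encode("utf-8"))` yields `"utf8"`, while the same name encoded as latin1 is not valid UTF-8 and therefore falls through to `"latin1"`. A guess like this should only ever be a suggestion that the user can override.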

I hope this makes things easier for the user. I will tackle this problem and add these functions to MySQLDumper during my next holiday. At the moment, the only solution is to manually convert the backup file to the required character-set.
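If you prefer a script over a text editor for this manual step, a minimal Python sketch could look like this. It assumes that you already know the source character-set for certain (latin1 in this example); the function name and defaults are made up for illustration. It converts only the bytes of the file and does not touch any character-set declarations inside the SQL statements themselves.

```python
def reencode(src_path, dst_path, src_charset="latin-1", dst_charset="utf-8"):
    """Copy a backup file, converting it from one encoding to another.

    newline="" keeps the original line endings untouched, so the only
    thing that changes is how the characters are encoded.
    """
    with open(src_path, "r", encoding=src_charset, newline="") as src, \
         open(dst_path, "w", encoding=dst_charset, newline="") as dst:
        for line in src:
            dst.write(line)
```

Always write to a new file and keep the original backup until the restore has been verified.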

One (second to) last thing

I have ignored MySQL version 4.0.x in this article. Utf8 was introduced in this version, but only rudimentarily, and it was very much in a test phase. I have experienced an incredible number of problems with MySQL 4.0.x servers. Some of them didn't behave in accordance with the documentation and gave me quite a few gray hairs. The official website states:

Because version 4.0.* of MySQL Server are in such low demand we have decided to stop hosting binaries of these older versions.

There doesn't seem to be any documentation for these versions any more either. However, I believe that these versions were simply still too buggy and that those problems were solved in version 4.1 and later. So my guess is that MySQL deliberately no longer offers these versions for download because they caused too many problems with character-sets and encoding.

So, should you be on a server still running MySQL 4.0.x, pester your web host until they upgrade to at least version 4.1. Version 4.0.x carries a whole lot of problems that in some cases cannot really be solved by the user when developing a script or program. This, at least, has been my experience.

Closing words

Hopefully you have now learned that, as a responsible admin, you should know the character-set of your backups. Only then can you be sure that your data can be restored properly if need be. Considering how much information is needed to correctly judge the cause of a problem (What was the version of the source server? Which character-set was used when saving the backup file?), it becomes obvious why so many attempts to help without that information are doomed to fail. It's the same for me too. (Just for fun, check out the help threads in some forums: virtually nobody asks which character-set the backup file is saved in. Everyone is trying to help, but they can't without that information. Everyone keeps trying things out, but usually without getting anywhere.)

But if you have read all of this carefully, you are now armed with the background knowledge and tools needed to accomplish almost any kind of server change. Having spent hours typing all of this until my fingers ached, I hope you know the facts now. Do something with that knowledge.

I wish you forever-successful backups!
