Add new comment

Want to support HowtoForge? Become a subscriber!
Submitted by Stefan Hinz (not registered) on Fri, 2008-11-14 00:04.

There's a wrong assumption right at the beginning:

"Since version 4.0 MySQL uses utf8 to store data internally."

This is not true. What's true is this:

"mysqldump from MySQL 4.1.2 or later uses utf8, and earlier versions use latin1."

But that's from the MySQL Manual. ;-) See: http://dev.mysql.com/doc/refman/4.1/en/mysqldump.html

There are more wrong assumptions in this article, and generally I find it more confusing than helpful.

The mysqldump client program takes into consideration a whole bunch of character set-related settings when backing up data, and it does that by default, as can be seen in this example:

aphrodite~> mysqldump test > testdump.sql
aphrodite~> cat testdump.sql
-- MySQL dump 10.13  Distrib 5.1.26-rc, for redhat-linux-gnu (i686)
--
-- Host: localhost    Database: test
-- ------------------------------------------------------
-- Server version       5.1.26-rc-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
...

When restoring data backed up this way to a MySQL server running 4.1.01 or later, this should work with no issues "by default", so to speak.

Of course, there are more things that could go wrong. For example, the original data might be from a table that has a latin1 character set, or no explicit character set at all (MySQL prior to 4.1), and the target table might have UTF-8. This would still work one-way for many characters (including German umlauts, probably), but would certainly fail the other way around (when restoring to a MySQL server < 4.1).

For more information (including many examples), please refer to the Character Set Support section in the Internationalization and Localization chapter of the MySQL Reference Manual: http://dev.mysql.com/doc/refman/4.1/en/charset.html

By the way, looking at that page will make you aware that there is in fact documentation available for MySQL 4.0, and even 3.23, while the article says there isn't. Also, the article says that MySQL 4.0 charset behavior differs from what's documented -- if that's true, please say what those differences are, and file a bug against the documentation. Thanks!

Regards,

Stefan (Sun Database Group documentation team)

 

Please do not use the comment function to ask for help! If you need help, please use our forum.
Comments will be published after administrator approval.

Reply

*
*
The content of this field is kept private and will not be shown publicly.


*

  • Images can be added to this post.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <img> <div>
  • Lines and paragraphs break automatically.