Evolving A Database With MySQL

Willem Bogaerts - Kratz Business Solutions

Summary

In present-day programming, data storage is far from the static beast some people believe it once was. Lots of applications have their own piece of data storage in relational databases, and the shape of that storage alters as often as the applications themselves: daily. The only problem is that while the structure must change often, the data has to morph itself around that change and stay essentially what it was, but in the new form.

In this howto, I will show you how to "grow" your own database definition in a way that is maintainable. I will make so-called "recreate" scripts. These scripts can be used both to create a database from scratch and to update an existing database. When I say database definition, I do not mean just the table structure: we will "grow" the data as well. But not all of it. Only what's necessary.
Why a recreate script? I have needed a clean state too often, and I do not want to just copy an existing database to clean it up. These cases include rolling out on a test server, working at home, or the customer asking us to also make a version for their new foreign office. On the other hand, I do want my databases to survive. Even on my development machine. I have entered a lot of useful test data and I don't want to start with a clean database every so often. Of course, the live situation must be painlessly updated as well. A repeatable script helps you even there: if something goes wrong, just fix the error in the script and re-run it. You don't have to do a painful rollback first (structure changes cannot be rolled back by transaction statements).
Interestingly, it does not matter whether you do agile development or not. If you develop in an agile fashion, you will value the database evolution as a natural process, but you will see the light also if you tumble down the waterfall. Applications evolve, and the database will evolve with them. Whether you like it or not.

This howto covers a few years of my own experience. I found it useful enough to share, though it may be less useful to you. Even so, it may help you think about your own way of development and shine a new light on it. It is not the only way to go, but it has enabled me to update websites in two minutes in a controlled way, where others would have done it in two hours and prayed nothing went wrong.

When I was working with my former employer, we never updated a site on a Friday, for fear that we would have to sacrifice a weekend fixing things. Now, I can easily update a site while I'm on the phone with the customer.

 

Types of Data

There are many types of data when you categorize them by what must be done with them. I recognize the following kinds of data:

definition data
Definition data consists of both system data (needed for the program to run) and lookup tables. Definition data is usually well indexed, and especially system data must be "grown" with the application. Some universal lookup data can be grown as well, for instance a list of countries. Definition data hardly changes during the run of a program, if at all, and it is often safe to load it into memory at the startup of a program and keep it there. Some of it consists of settings, which must be created if they do not exist, but not altered when they do.
live data
Live data changes by definition during the run of an application and is therefore the biggest challenge to the programmer. Live data forces you to use transactions and all kinds of checks for multi-user systems. It is never stored in a recreate script. But the recreate script has to take into account that live data exists, and sometimes you will have to provide a stored routine to upgrade it.
test data
Test data falls into three categories: data that is initially present for the client to play with or for demonstration purposes, personal live data on my development machine, and generic test data. Generic test data is test data that is too useful to omit on any development or test installation. So it will be in the recreate scripts as well, but in a separate file that will not be run on the live installation. Note that the type of data can change. What was "initial data" can become generic test data, for instance.
log data
Log data is mainly stored and hardly ever read. Some log data may not be important to keep forever, like an error log. Once you have read the log and fixed the errors, you can just clean it. In general, logs are error-resistant and sometimes not indexed at all, for speed. Evolving a log is easy, as logs do not evolve often; they only grow and seldom have data that must be upgraded.

The above types of data and the different actions to be taken with them are the very reason that I do not believe in automatic database synchronization systems.

 

Modular Code

Modular code is necessary to keep the code maintainable. The fact that your database "grows" does not mean it has to grow entirely in one project. Pieces may come from the standard library, along with the application code that uses them.
Unfortunately, SQL itself is not exactly modular. There are two ways to overcome this: by using the SOURCE command of the mysql command-line client or by using an external program as a preprocessor. Personally, I prefer the preprocessor way, as it can be used with other programs than the MySQL command-line client and it makes error line numbers meaningful. You can download the preprocessor scripts I use: one in command-line PHP and one in Python.
Modular code allows you to put large pieces of SQL in different files. So you can put translations in their own file, Object-Relational-Mapping settings in another, have special files for packages from the standard library, etc. Without this, maintaining SQL would hardly be feasible. A master script might look like:

CREATE DATABASE IF NOT EXISTS YourDatabase DEFAULT CHARSET utf8;
USE YourDatabase;
FLUSH TABLES; -- See remark
-- From the standard library:
-- @include(errorhandling/recreate.sql) Definition for error log tables.
-- @include(orm/recreate.sql) Object Relational Mapping tables
-- @include(language/recreate.sql)
-- @include(language/ormsettings.sql)
-- @include(language/standarddata.sql)
-- @include(template/recreate.sql)
-- @include(template/ormsettings.sql)
-- Project specific definitions:
CREATE TABLE IF NOT EXISTS Translation
      (intTranslationId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       intLanguageId INTEGER UNSIGNED NOT NULL,
       vchKeyword VARCHAR(100) NOT NULL,
       txtTranslation TEXT,
       INDEX idxLanguageid(intLanguageId),
       UNIQUE INDEX idxKeywordLanguageid(vchKeyword,intLanguageId),
       CONSTRAINT fkyTranslation_LanguageId FOREIGN KEY(intLanguageId)
               REFERENCES UserLanguage(intUserLanguageId)
               ON DELETE RESTRICT
               ON UPDATE RESTRICT );
-- @Include(specific/translations.sql) Large file
-- @include(specific/ormsettings.sql) Large file
-- @include(specific/templatedata.sql) Generated file. Generated from template files.
-- @include(specific/grants.sql) Comes last: the referred tables must exist. See also Caveat.

 

Drop and Recreate

The simplest form of making a script repeatable is to drop the tables if they exist and create them anew. Before MySQL 5.0, this was about the only way as well. Make use of the IF (NOT) EXISTS clauses in the various CREATE and DROP commands. For some tables, like the error log mentioned earlier, it may even be the preferred way:

DROP TABLE IF EXISTS ErrorLog;
CREATE TABLE ErrorLog
      (intErrorLogId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       txtMessage TEXT,
       dtmWhen DATETIME NOT NULL )
ENGINE=InnoDB;

But most of the time, you do not want the tables to be dropped, as you lose all the data in them as well. So this is not an option for tables containing live data. I will show later how to preserve the data by using stored procedures.

 

Sort Order for Foreign Keys

One important thing to remember is that any foreign key must point to an existing primary key. So drop the dependent tables before the referenced tables and create a referenced table before you create the table that depends on it:

DROP TABLE IF EXISTS ErrorStackTrace; -- Depends on ErrorLog
DROP TABLE IF EXISTS ErrorLog;
CREATE TABLE ErrorLog
      (intErrorLogId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       txtMessage TEXT,
       dtmWhen DATETIME NOT NULL )
ENGINE=InnoDB;
CREATE TABLE ErrorStackTrace
      (intErrorStackTraceId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       intErrorLogId INTEGER UNSIGNED NOT NULL,
       intLine INTEGER UNSIGNED DEFAULT NULL,
       vchFile VARCHAR(150) DEFAULT NULL,
       INDEX idxErrorLogId(intErrorLogId),
       CONSTRAINT fkyErrorStackTrace_ErrorLogId FOREIGN KEY(intErrorLogId)
               REFERENCES ErrorLog(intErrorLogId)
               ON DELETE CASCADE
               ON UPDATE CASCADE )
ENGINE=InnoDB;

While we are at it, try to get used to giving everything a name. You will need it later, when we get to the stored procedures. Index names only need to be unique within the table, but foreign key constraint names need to be unique throughout the whole database. I am fully aware that this leads to really ugly constraint names. Only the primary key needs no name, as it is always called "PRIMARY" in MySQL.

 

Relations

Evolutionary database development especially forces you to define your tables right. Put those uniqueness constraints where you need them, and do not forget to name everything that you define. That includes indexes and foreign keys. This will help greatly when putting test data and even system data into the database.
Of course, there are no fixed primary key values (you do use autonumbers as primary keys, I hope), unless you define a table representing an enum in your code.
There are two great features of MySQL to be used with relations: the LAST_INSERT_ID() function and SQL variables. Let me remind you that the variables remain defined as long as the connection is there: you can use them throughout the entire script. An example:

CREATE TABLE IF NOT EXISTS UserLanguage
      (intUserLanguageId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       vchIsoCode VARCHAR(7) NOT NULL COMMENT 'Two-letter code with dialect',
       vchName VARCHAR(100) NOT NULL COMMENT 'in the language itself',
       UNIQUE INDEX idxIsoCode(vchIsoCode),
       UNIQUE INDEX idxName(vchName) )
ENGINE=InnoDB;
INSERT INTO UserLanguage(vchIsoCode,vchName)
       VALUES('NL', 'Nederlands')
       ON DUPLICATE KEY UPDATE intUserLanguageId=LAST_INSERT_ID(intUserLanguageId),
                               vchName=VALUES(vchName);
SET @DutchId=LAST_INSERT_ID();

The INSERT .. ON DUPLICATE KEY construct will be covered in more detail below, but note that the LAST_INSERT_ID() function is used here to set the last inserted ID to the right value in case the record already existed.
We can now use @DutchId wherever we want the ID of the Dutch language record.
Of course, instead of using LAST_INSERT_ID(), you can also use a SELECT .. INTO statement to set a variable:

SELECT intUserLanguageId INTO @DutchId FROM UserLanguage WHERE vchIsoCode='NL';

 

Inserts and Updates

MySQL has a few nice additions to the standard INSERT and UPDATE commands. Lots of commands can have an IGNORE clause, and the INSERT command can have an ON DUPLICATE KEY clause. These clauses are very useful to make sure some data exists (settings, for example) or is set to a certain value (for system data). The INSERT command can also be used to insert more than one row at a time, which can enhance the legibility of your code. Of course, LAST_INSERT_ID() can return only one value, so you will have to query for any bulk-inserted values if you want them later.
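As a small illustration (a sketch using the UserLanguage table defined above; the IGNORE clause is explained in the next section):

INSERT IGNORE INTO UserLanguage(vchIsoCode,vchName)
       VALUES('NL', 'Nederlands'),
             ('EN', 'English');
-- LAST_INSERT_ID() refers to only one of the rows above, so query for the others:
SELECT intUserLanguageId INTO @EnglishId FROM UserLanguage WHERE vchIsoCode='EN';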

 

Making Sure Records Exist: INSERT IGNORE

This is a very useful construct for settings. The following example makes sure some settings exist with a default value, but it will not override a value that already exists and has been changed:

CREATE TABLE IF NOT EXISTS Setting
      (intSettingId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       vchName VARCHAR(100) NOT NULL,
       txtValue TEXT,
       UNIQUE INDEX idxName(vchName) )
ENGINE=InnoDB;
INSERT IGNORE INTO Setting(vchName,txtValue)
       VALUES('ThumbnailWidth', '32'),
             ('ThumbnailHeight', '32'),
             ('PasswordHashSalt', 'Something');

The above snippet can be run as often as you like, because the IF NOT EXISTS clause will only create the table once and the IGNORE clause will insert the data only once. This works because there is a uniqueness constraint on the setting name field. The IGNORE clause means: silently ignore this insertion when it violates a constraint. It is not a general error suppressor: your script will still fail if it contains a syntax error. But that is no problem, is it? Just fix that error and re-run the script. That is why it is repeatable. The nice thing is that the IGNORE clause works on a row-by-row basis: each row is checked for a constraint violation and only the violating rows are ignored.

Alas, this is not the case with the DELETE IGNORE command. This command should be useful for deleting rows that do not violate foreign key constraints, but one violation will stop the whole command. So DELETE IGNORE is best used in stored procedures in a loop.

 

Making Sure Fields Have the Right Value: INSERT .. ON DUPLICATE KEY

Quite often, we need some data to be present for the application to function properly. Think of Object-Relational Mappings, program captions, choice lists, etc. These data must not only exist, but also have the right value, and they have to be updated if they currently hold another value. For example, an option table could look like:

CREATE TABLE IF NOT EXISTS SelectionOption
      (intSelectionOptionId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       vchSelection VARCHAR(50) NOT NULL,
       vchOptionValue VARCHAR(50) DEFAULT NULL,
       vchOptionCaption VARCHAR(150) NOT NULL,
       UNIQUE INDEX idxSelectionValue(vchSelection,vchOptionValue),
       UNIQUE INDEX idxSelectionCaption(vchSelection,vchOptionCaption) )
ENGINE=InnoDB;
INSERT INTO SelectionOption(vchSelection,vchOptionValue,vchOptionCaption)
       VALUES('boolean', 'false', 'No'),
             ('boolean', 'true', 'Yes'),
             ('tristate', 'false', 'No'),
             ('tristate', 'true', 'Yes'),
             ('tristate', NULL, 'Unknown')
       ON DUPLICATE KEY UPDATE vchOptionCaption=VALUES(vchOptionCaption);

Note that we have two uniqueness constraints here: one "structural" (no two options can evaluate to the same value within a selection) and one "behavioral": the user must be able to see the difference between any two options in a selection. The script will deliberately fail if two different options in the same selection share the same caption. This is OK, because we do not want that situation to be present in our database.
So what happens if you run and re-run the above snippet? If the options do not exist yet, they are added. If they do exist, the first (structural) uniqueness constraint is violated and only the caption is updated. This allows us to just alter a caption in the above statement and it will be corrected in the database whenever the script is run. If the new caption is not the same as any other caption in the same selection, this update will succeed. If not, we have something to repair and the script will fail. No problem. Just repair the caption and re-run the script.

There are two special functions that come in handy in the ON DUPLICATE KEY clause: LAST_INSERT_ID() and VALUES(). The VALUES() function returns the value that would have been inserted if there was no constraint violation, which is the value we supplied. As you see in the snippet, this is especially useful for multi-row inserts. Note that this also works for INSERT .. SELECT statements.
The LAST_INSERT_ID() function can be used without a parameter to get the last inserted autonumber, but if you supply it an integer, you set the value, which it returns until either an INSERT command or another LAST_INSERT_ID() call changes it. This works nicely in combination with an ON DUPLICATE KEY clause: if the row does not exist (no uniqueness constraint is violated), the record is inserted and the ON DUPLICATE KEY part does not run for that record. But if the record exists, the ON DUPLICATE KEY part is run, and you can feed the LAST_INSERT_ID() function the existing primary key value:

INSERT INTO UserLanguage(vchIsoCode,vchName)
       VALUES('NL', 'Nederlands')
       ON DUPLICATE KEY UPDATE intUserLanguageId=LAST_INSERT_ID(intUserLanguageId),
                               vchName=VALUES(vchName);

After this statement, the LAST_INSERT_ID() function returns the ID for the Dutch language record, either because it was just inserted or because the LAST_INSERT_ID() function was set to it. Remember, if you use a field in the ON DUPLICATE KEY clause, the existing value is used. Use the new value with VALUES(<field>).

 

It's All There in the information_schema

Since MySQL 5, there is a database containing almost everything there is to know about the other databases: information_schema. So if you want to check whether a column is already defined, you can just look it up. There is one caveat on Windows: if you forgot to set lower_case_table_names to 2 in my.ini directly after the installation of MySQL, the database name can appear in lower case. As most scripts also have to run on Windows, I take this into account. Of course, if you want to look up something, it must have a name. This is why every index and every foreign key must have a name: otherwise, it would be very hard to check for it and even harder to remove it.
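For example, a quick check whether the Setting table from the earlier snippet already has its txtValue column could look like this (just a sketch; the table and column names come from the examples above):

SELECT COUNT(0) FROM information_schema.COLUMNS
       WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE()) -- Case-insensitive because of the Windows caveat
             AND TABLE_NAME='Setting'
             AND COLUMN_NAME='txtValue';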

 

Alas We Need Them: Stored Routines

Until now, MySQL has greatly helped us with IGNORE and IF EXISTS clauses. These do not apply when adding or removing columns, alas. There is no such thing as ALTER TABLE .. DROP COLUMN IF EXISTS. That means we will have to query the information_schema database and act upon its contents. As we cannot use IF statements outside stored routines, it makes sense to define a temporary procedure.

As an example, let us suppose the customer wants to be able to set the default option in the SelectionOption table given earlier. We decide to do this by extracting a Selection table and by making the SelectionOption table depend on it. We will create a "backreference": the selection options refer to the selection, but the selection table also has a column that refers to the default option:

CREATE TABLE IF NOT EXISTS Selection
      (intSelectionId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       vchSelection VARCHAR(50) NOT NULL,
       intDefaultOption INTEGER DEFAULT NULL, -- Cannot have a foreign key constraint.
       UNIQUE INDEX idxSelection(vchSelection) )
ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS SelectionOption
      (intSelectionOptionId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       intSelectionId INTEGER UNSIGNED NOT NULL,
       vchOptionValue VARCHAR(50) DEFAULT NULL,
       vchOptionCaption VARCHAR(150) NOT NULL,
       UNIQUE INDEX idxSelectionIdValue(intSelectionId,vchOptionValue),
       UNIQUE INDEX idxSelectionIdCaption(intSelectionId,vchOptionCaption) )
ENGINE=InnoDB;
DROP PROCEDURE IF EXISTS UpgradeSelectionOption;
DELIMITER //
CREATE PROCEDURE UpgradeSelectionOption()
       MODIFIES SQL DATA
       COMMENT 'Last modified: 20090924' -- See remark about cleaning up.
       BEGIN
       IF 0=(SELECT COUNT(0) FROM information_schema.COLUMNS
                    WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE()) -- See remark about lower_case_table_names
                          AND TABLE_NAME='SelectionOption'
                          AND COLUMN_NAME='intSelectionId' ) THEN
          -- First, fill the new Selection table:
          INSERT IGNORE INTO Selection(vchSelection)
                 SELECT DISTINCT vchSelection FROM SelectionOption; -- This field still exists.
          -- Add the new column, but allow it to be empty:
          ALTER TABLE SelectionOption
                ADD COLUMN intSelectionId INTEGER UNSIGNED DEFAULT NULL
                AFTER intSelectionOptionId;
          -- Fill the new column:
          UPDATE SelectionOption INNER JOIN Selection ON SelectionOption.vchSelection=Selection.vchSelection
                 SET SelectionOption.intSelectionId=Selection.intSelectionId;
          -- Now we can safely disallow empty values.
          ALTER TABLE SelectionOption
                MODIFY COLUMN intSelectionId INTEGER UNSIGNED NOT NULL;
          -- Repair the indexes:
          ALTER TABLE SelectionOption
                DROP INDEX idxSelectionCaption,
                DROP INDEX idxSelectionValue,
                ADD UNIQUE INDEX idxSelectionIdValue(intSelectionId,vchOptionValue),
                ADD UNIQUE INDEX idxSelectionIdCaption(intSelectionId,vchOptionCaption);
          -- Add the relation:
          ALTER TABLE SelectionOption
                ADD CONSTRAINT fkySelectionOption_SelectionId FOREIGN KEY(intSelectionId)
                            REFERENCES Selection(intSelectionId)
                            ON DELETE CASCADE
                            ON UPDATE CASCADE;
          -- Finally, drop the old column:
          ALTER TABLE SelectionOption
                DROP COLUMN vchSelection;
       END IF;
       END//
DELIMITER ;
CALL UpgradeSelectionOption();
DROP PROCEDURE UpgradeSelectionOption;
INSERT IGNORE INTO Selection(vchSelection)
       VALUES('boolean'),
             ('tristate');
SELECT intSelectionId INTO @SelectionId FROM Selection WHERE vchSelection='boolean';
INSERT INTO SelectionOption(intSelectionId,vchOptionValue,vchOptionCaption)
       VALUES(@SelectionId, 'false', 'No'),
             (@SelectionId, 'true', 'Yes')
       ON DUPLICATE KEY UPDATE vchOptionCaption=VALUES(vchOptionCaption);
SELECT intSelectionId INTO @SelectionId FROM Selection WHERE vchSelection='tristate';
INSERT INTO SelectionOption(intSelectionId,vchOptionValue,vchOptionCaption)
       VALUES(@SelectionId, 'false', 'No'),
             (@SelectionId, 'true', 'Yes'),
             (@SelectionId, NULL, 'Unknown')
       ON DUPLICATE KEY UPDATE vchOptionCaption=VALUES(vchOptionCaption);

I deliberately chose an example that involves repairing a relation. As you see, the new Selection table is defined before the SelectionOption table because of the dependency.
When this script is run against a fresh database, both tables will be created in their new form and the stored procedure will run. The IF statement in the stored procedure sees that the intSelectionId column already exists, so the upgrade is skipped. After that, the data is ensured to be there in the new structure.
If the SelectionOption table from the earlier example existed before running the script, the table "Selection" is created by the first CREATE TABLE statement. The second CREATE TABLE statement will do nothing, as the "SelectionOption" table already exists. The stored procedure will run, notice that the "intSelectionId" column is missing, and perform the upgrade. After that, the data is ensured to be there in the new structure. Note that if the table contained more data than is given in the script, that existing data is nicely migrated.

Tip: I used only one IF statement in the above example for clarity. In practice, I would use an IF statement for the columns, one for the indexes and one for the foreign key constraints. The reason for this is that the upgrade might fail, and re-running the script should fix that after the error is corrected.

 

Procedure Templates

Stored procedures are quite powerful and it takes some study to learn the full potential of them. Let me give you a few templates that can be used for the most elementary tasks, such as adding or removing columns, indexes and foreign keys. With such templates in your favourite programmer's editor, it is quite easy to write simple procedures.
The first template is for an empty temporary procedure:

DROP PROCEDURE IF EXISTS Upgrade<TableName>;
DELIMITER //
CREATE PROCEDURE Upgrade<TableName>()
       MODIFIES SQL DATA
       COMMENT 'Last modified: '
       BEGIN
       -- Your magic goes here...
       END//
DELIMITER ;
CALL Upgrade<TableName>();
DROP PROCEDURE Upgrade<TableName>;

The "magic" in this snippet is usually built up of one of the three snippets below. The first is for "missing columns": it checks if a column does not exist in a table. Note that the database name is checked in a case-insensitive manner to work around a bug involving lower_case_table_names on Windows:

IF 0=(SELECT COUNT(0) FROM information_schema.COLUMNS
             WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE())
                   AND TABLE_NAME='<YourTableName>'
                   AND COLUMN_NAME='<YourColumnName>' ) THEN
   -- Your magic goes here ...
END IF;

And the magic will often just be an ALTER TABLE command that adds the missing column. If you want to check for existing columns, just replace the "0=" with "1=" or "0<" before the SELECT query. We can do the same for a missing index:

IF 0=(SELECT COUNT(0) FROM information_schema.STATISTICS
             WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE())
                   AND TABLE_NAME='<YourTableName>'
                   AND INDEX_NAME='<YourIndexName>' ) THEN
   -- Your magic goes here ...
END IF;

If you are adding a uniqueness constraint, use ALTER IGNORE TABLE to add it. That way, the rows violating the new constraint are silently removed.
The last snippet is for missing foreign key constraints:

IF 0=(SELECT COUNT(0) FROM information_schema.TABLE_CONSTRAINTS
             WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE())
                   AND TABLE_NAME='<YourTableName>'
                   AND CONSTRAINT_NAME='<YourConstraintName>') THEN
   -- Your magic goes here ...
END IF;
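
Putting the pieces together, an upgrade that adds a single missing column could look like the sketch below. The vchDescription column is made up purely for illustration; it is not part of the earlier Setting example:

DROP PROCEDURE IF EXISTS UpgradeSetting;
DELIMITER //
CREATE PROCEDURE UpgradeSetting()
       MODIFIES SQL DATA
       COMMENT 'Last modified: see remark about cleaning up'
       BEGIN
       IF 0=(SELECT COUNT(0) FROM information_schema.COLUMNS
                    WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE())
                          AND TABLE_NAME='Setting'
                          AND COLUMN_NAME='vchDescription' ) THEN
          -- The "magic": just add the missing column.
          ALTER TABLE Setting
                ADD COLUMN vchDescription VARCHAR(255) DEFAULT NULL
                AFTER txtValue;
       END IF;
       END//
DELIMITER ;
CALL UpgradeSetting();
DROP PROCEDURE UpgradeSetting;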

 

Caveats

Tight and Loose Coupling in SQL

In the example master script, I have made the translation table depend on the language package from the standard library:

CREATE TABLE IF NOT EXISTS Translation
      (intTranslationId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       intLanguageId INTEGER UNSIGNED NOT NULL,
       vchKeyword VARCHAR(100) NOT NULL,
       txtTranslation TEXT,
       INDEX idxLanguageid(intLanguageId),
       UNIQUE INDEX idxKeywordLanguageid(vchKeyword,intLanguageId),
       CONSTRAINT fkyTranslation_LanguageId FOREIGN KEY(intLanguageId)
               REFERENCES UserLanguage(intUserLanguageId)
               ON DELETE RESTRICT
               ON UPDATE RESTRICT )
ENGINE=InnoDB;

But what if I wanted to write a translation package in the standard library? I could use the exact same definition, but then the translations would always depend on the same language table, which may not be desirable. You may want to write a translation package that also works with other language tables, like the one in a third-party CMS. To do that, I deliberately omit the foreign key constraint in the package:

CREATE TABLE IF NOT EXISTS Translation
      (intTranslationId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
       intLanguageId INTEGER UNSIGNED NOT NULL,
       vchKeyword VARCHAR(100) NOT NULL,
       txtTranslation TEXT,
       INDEX idxLanguageid(intLanguageId),
       UNIQUE INDEX idxKeywordLanguageid(vchKeyword,intLanguageId) )
ENGINE=InnoDB;

As you see, no table name and no field name are mentioned anymore for the language, so we can use any table or even combination of tables to provide the language ID.
This is equivalent to loose coupling in programming languages: specifying a dependency that is so general that the exact details can be filled in later. The first definition was an example of tight coupling: a dependency that is fully defined and therefore fixed.
Note that while your standard library might omit foreign key constraints to make the packages more universally applicable, there is nothing that stops you from adding the constraints in your project master scripts. Those constraints are now really project-specific.
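
As a sketch, adding that project-specific constraint with the templates from above could look like this (assuming the library's loosely coupled Translation table and the project's UserLanguage table both exist by the time it runs):

DROP PROCEDURE IF EXISTS UpgradeTranslation;
DELIMITER //
CREATE PROCEDURE UpgradeTranslation()
       MODIFIES SQL DATA
       COMMENT 'Project-specific: tie Translation to UserLanguage.'
       BEGIN
       IF 0=(SELECT COUNT(0) FROM information_schema.TABLE_CONSTRAINTS
                    WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE())
                          AND TABLE_NAME='Translation'
                          AND CONSTRAINT_NAME='fkyTranslation_LanguageId') THEN
          ALTER TABLE Translation
                ADD CONSTRAINT fkyTranslation_LanguageId FOREIGN KEY(intLanguageId)
                            REFERENCES UserLanguage(intUserLanguageId)
                            ON DELETE RESTRICT
                            ON UPDATE RESTRICT;
       END IF;
       END//
DELIMITER ;
CALL UpgradeTranslation();
DROP PROCEDURE UpgradeTranslation;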

 

DELETE IGNORE

DELETE IGNORE does not work on a row-by-row basis like INSERT IGNORE does. The command may be useful to clean up initially present test or demo data. If the command targets a single record, this works fine. But when it targets multiple records, the first record that cannot be deleted due to a foreign key violation will stop the entire command.

Therefore, it is best to delete multiple records in a procedure. The following example tries to delete all companies that were created by the demo user. If the company cannot be deleted because of a foreign key constraint, that company is probably still in use and should be left intact.
Note that the last DELETE IGNORE will actually be executed twice, once with the last cursor row and once after the continue handler is processed.

DROP PROCEDURE IF EXISTS CleanupCompanies;
DELIMITER //
CREATE PROCEDURE CleanupCompanies()
       MODIFIES SQL DATA
       COMMENT 'Clean up demo data.'
       BEGIN
       DECLARE blnDone BOOL DEFAULT FALSE;
       DECLARE intCurrentId INTEGER UNSIGNED;
       DECLARE curCompaniesToBeCleaned CURSOR FOR
               SELECT intCompanyId FROM CompanyCleanup;
       DECLARE CONTINUE HANDLER FOR NOT FOUND SET blnDone=TRUE;
       CREATE TEMPORARY TABLE CompanyCleanup
              ENGINE=Memory
              AS SELECT intCompanyId FROM Company INNER JOIN SiteUser ON intCreatedByUserId=intSiteUserId
                        WHERE vchLogin='DemoUser';
       OPEN curCompaniesToBeCleaned;
       REPEAT
              FETCH curCompaniesToBeCleaned INTO intCurrentId;
              DELETE IGNORE FROM Company WHERE intCompanyId=intCurrentId;
              UNTIL blnDone
       END REPEAT;
       DROP TEMPORARY TABLE CompanyCleanup;
       END//
DELIMITER ;
CALL CleanupCompanies();
DROP PROCEDURE CleanupCompanies;

 

GRANTS and the Query Log

The query log is one of the beautiful features of MySQL: it contains all the queries that are sent to the server. It is a great feature for debugging, but it is not enabled by default. If you want to enable the query log on the live server (say, to find out what went wrong), you have to realize that this may come with security risks. Anyone who can read the log could easily look for GRANT, CREATE USER or SET PASSWORD statements, for example.
Luckily, you can switch logging off temporarily:

SET @LogSave=@@session.sql_log_off;
SET SESSION sql_log_off=1; -- For security, do not allow the following statements to be logged.
GRANT SELECT ON YourDatabase.* TO 'YourUser'@'localhost' IDENTIFIED BY 'PasswordInPlaintext!';
GRANT SELECT ON YourDatabase.* TO 'YourUser'@'127.0.0.1' IDENTIFIED BY 'PasswordInPlaintext'; -- For SSH tunnels
SET SESSION sql_log_off=@LogSave;

As an alternative, you can choose to never give a plaintext password at all in the scripts. For a GRANT statement, for example, you can use the IDENTIFIED BY PASSWORD clause to use the hash found in the mysql.user table.
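
A sketch of that alternative, using the syntax of the MySQL versions this howto was written for (the hash is a placeholder; copy it from the Password column in mysql.user or obtain it with SELECT PASSWORD('...')):

GRANT SELECT ON YourDatabase.* TO 'YourUser'@'localhost'
      IDENTIFIED BY PASSWORD '<hash copied from mysql.user>';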

 

FLUSH TABLES

On some systems, I noticed that the information_schema database does not always reflect the state of the database correctly. I therefore start my scripts with a FLUSH TABLES command to know for sure I can trust the values from the information_schema.

 

Server Modes

When you write a script that has to run on more than one database instance, it is good to use SQL that works with all server settings. MySQL might run in ANSI mode or in "default" mode, and there are a lot of little variations that can be configured.
One of the first things to look at is the double quote. A double quote is a string quote in "MySQL default" SQL, but an identifier quote in ANSI mode, because it is also an identifier quote in the SQL standard. But single quotes are always string quotes and backticks (`) are always identifier quotes, regardless of the server setting. So if you do not need standards compliance, use backticks instead of double quotes, and it is best to never use double quotes for strings.
Another tip is to use the CONCAT() function to concatenate strings and the OR operator in boolean expressions, as the || operator can be either one, depending on the server configuration.
Furthermore it is good to always write the opening parenthesis directly after the function name and to always GROUP BY any fields you select in an aggregate query without using an aggregate function:

SELECT "Sample language ISO code: " || vchIsoCode FROM UserLanguage
       ORDER BY `PRIMARY` LIMIT 1; -- Might not work
SELECT CONCAT('Sample language ISO code: ', vchIsoCode) FROM UserLanguage
       ORDER BY `PRIMARY` LIMIT 1; -- Will work.
SELECT TRUE || FALSE; -- Might return '10' or 1, depending on server mode
SELECT TRUE OR FALSE; -- Will return 1 (=true)
SELECT COUNT (0) FROM UserLanguage; -- Might not work, depending on server mode
SELECT COUNT(0) FROM UserLanguage; -- Will work.
SELECT Street, MIN(HouseNumber) FROM Address; -- Might not work, depending on server mode
SELECT Street, MIN(HouseNumber) FROM Address GROUP BY Street; -- Will work.
-- And don't forget, comments start with minus-minus-space rather than a double minus.
SELECT 1--2; -- Might return 1 or 3
SELECT 1-- 2; -- Returns 1

 

Windows and lower_case_table_names

The lower_case_table_names setting is an attempt to make MySQL work on both case sensitive and case insensitive systems. However, this setting has the worst possible default on Windows: it is set to 1 (cripple all table names and even database names to lower case). This makes developing on Windows for a database running on Linux or BSD almost impossible, as most SHOW commands and the tables in the information_schema will return the wrong names.
So if you install MySQL on Windows, it is good to set this value to 2 (case insensitive) by adding the following line to the [mysqld] section in my.ini:

lower_case_table_names=2

Do so directly after installing MySQL, and before defining any databases. You can of course do this later, but you will notice that a lot of tables and databases remain in lower case.

Note that I do not check the case of the table names themselves. If the script has to run on systems where the lower_case_table_names setting forces lower case, you will have no choice but to use only lower case names. For mixed-case databases, the ALTER TABLE commands that follow the check would fail, unless you go through the trouble of building up the right queries with prepared statements. Doing that would make the code even more difficult to read than it already is.
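
For completeness, a sketch of that prepared-statement detour; the intSortOrder column is made up for the illustration, and you would still wrap this in one of the IF checks from the templates to keep it repeatable:

-- Find the table name in whatever case it actually has on this system:
SELECT TABLE_NAME INTO @RealName FROM information_schema.TABLES
       WHERE UPPER(TABLE_SCHEMA)=UPPER(DATABASE())
             AND UPPER(TABLE_NAME)=UPPER('SelectionOption');
-- Build the ALTER TABLE dynamically and run it as a prepared statement:
SET @Query=CONCAT('ALTER TABLE `', @RealName,
                  '` ADD COLUMN intSortOrder INTEGER UNSIGNED DEFAULT NULL');
PREPARE stmAlter FROM @Query;
EXECUTE stmAlter;
DEALLOCATE PREPARE stmAlter;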

 

Clean Up the Routines Once They are Rolled out

The upgrade routines do not exactly enhance the legibility of your code. This is why I state a "last modified" comment in them, so I can remove them when I am sure that every installation is upgraded to that state.

 

Character Encodings

Until now, I have spoken English to my database server and used no special characters. But there are plenty of characters that you may want to enter into your database that are not part of the English language or the default encoding. You can start your script with a SET NAMES command to inform the server of the character encoding used in the script, but you can also set it where you need it. In MySQL, you can announce the encoding of a single value by prefixing it with an encoding identifier: the character set name preceded by an underscore:

-- For this to work, the encoding of the source must be utf-8:
SET @EuroSign = _utf8 '€';
-- The same character, but this statement can even be re-encoded and will still work:
SET @EuroSign = _utf8 0xE282AC;
INSERT IGNORE INTO Translation(intLanguageId,vchKeyword,txtTranslation)
       VALUES(@EnglishId, 'PriceRemark', CONCAT('All prices are given in ', @EuroSign, '.'));

If you want to know the hexadecimal value for a character, use a query like this in a database front-end program with good encoding support:

SELECT HEX('€');

Prefix the result with "0x" to get the hexadecimal string representation. You can use the BINARY keyword for dealing with data that you do not want to be encoded:

#!/bin/bash
CONTENT=0x`hexdump -v -e '1/1 "%02X"' Delete16.png`
QUERY="INSERT IGNORE INTO Images(vchName,blbContent) VALUES('Delete', BINARY $CONTENT)"
echo $QUERY | mysql -u root -p YourDatabase

 

It's All SQL

Or is it? Everything stored in a database has been in the form of SQL at some point, but that may not be the optimal form to maintain it in. You could, for instance, store HTML templates in the database for a content management system. Those templates are more useful as separate files during development, as your programmer's editor is usually a better tool to edit them than your database frontend. You can use a script to "compile" these template files into SQL inserts and just include that generated file in the master script.
We are really close to using a sort of "build" script now, that will update the source from the version control server, generate the included scripts if necessary and run the resulting script on the database. I do this on both the development and test machines, and on the live server. On the live server, this script does not run automatically, so the site is only updated if I choose to.
On the test machine, the script is run automatically by a CRON job, and the results are sent to our monitoring system, along with the results of the unit tests.
On the development machine, I often run the script by hand (without the version control update) after I made some SQL changes and it also runs before shutdown, directly after the version control update.
A query generation script could look like:

#!/bin/bash
# Creates the templates insertion script.
# The template files are all in one directory and are
# named after their code, but with an extension .html
# for ease of editing.
# The contents are left-trimmed for spaces.
HERE=`dirname $0` # so we can refer to paths relative to this script
SQL_SCRIPT="$HERE/../../database/specific/templates.sql" # output
ESCAPER=$HERE/../database/mysqlescape.php
function CodeOf # filename
        {
         basename $1 | egrep -o '^[^\.]{1,}' # Everything before the first dot
        }
function CompactContentOf # filename
        {
         egrep -o '[^ ].*$' < $1 | $ESCAPER
        }
echo '-- Generated file. Edit the templates, not these queries:' > $SQL_SCRIPT
for TEMPLATE in $HERE/../../documentation/templates/*.html;do
    echo "INSERT INTO Template(strCode,txtContent)" >> $SQL_SCRIPT
    echo "       VALUES('`CodeOf $TEMPLATE`', _utf8 `CompactContentOf $TEMPLATE`)" >> $SQL_SCRIPT
    echo "       ON DUPLICATE KEY UPDATE txtContent=VALUES(txtContent);" >> $SQL_SCRIPT
done

Be sure to leave the generated SQL scripts out of your version control system by adding them to your "ignore" list. Otherwise, you will almost certainly spend a lot of time resolving conflicts. The MySQL escaper used above can be downloaded in command-line PHP or in Python.
