UTF-8 is your friend – Part 2

In part 1 of this blog post series we discussed using the UTF-8 character encoding in your server-side scripts. In this second part, we look at how to use UTF-8 in your database, with MySQL as the example.

After collecting your data with Perl (or any other programming language) and converting it from its original encoding to UTF-8, you will most probably want to store it in a backend such as a database. In our case, that backend is a MySQL database. For the data to be stored correctly, the database, or at least the table that will hold the data, must be configured for the UTF-8 character set. Unfortunately, MySQL versions before 8.0 use latin1 (roughly ISO-8859-1) by default, so unless your server has been configured otherwise you will need to change the encoding to UTF-8 if you don’t want to end up with scrambled characters in your tables.
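Before changing anything, it can help to see which defaults your server actually uses. The statements below are a minimal sketch using MySQL's standard system variables; the values you get back depend on your server version and configuration.

```sql
-- List the character sets used by the server, the current database,
-- and the client connection.
SHOW VARIABLES LIKE 'character_set%';

-- List the matching collations (sorting and comparison rules).
SHOW VARIABLES LIKE 'collation%';
```

If character_set_server or character_set_database reports latin1, new databases and tables will use latin1 unless you state otherwise.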

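If you want to enable UTF-8 for your whole database, which is the simplest and safest method, append “CHARACTER SET UTF8” to your “CREATE DATABASE” SQL statement. Every table created in that database afterwards inherits this default. Note that MySQL’s UTF8 stores at most three bytes per character and therefore covers only the Basic Multilingual Plane; if you need characters beyond it, such as emoji, use utf8mb4 (available since MySQL 5.5.3) instead. An example is shown below for a hypothetical database named “mydb”.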

CREATE DATABASE mydb CHARACTER SET UTF8;
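To confirm that the setting took effect, you can ask MySQL what it recorded for the database. The sketch below assumes the “mydb” database from above already exists; “mydb_full” is a hypothetical name used only to illustrate the four-byte utf8mb4 variant, which assumes MySQL 5.5.3 or later.

```sql
-- Show the statement MySQL would use to recreate the database,
-- including its default character set.
SHOW CREATE DATABASE mydb;

-- The same information, read from the data dictionary.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'mydb';

-- Hypothetical variant: a database that can store every Unicode
-- character, including those that need four bytes in UTF-8.
CREATE DATABASE mydb_full CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```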

It is also possible to configure UTF-8 at a finer level, enabling this character set only for specific tables of your database, by adding “DEFAULT CHARSET=UTF8” to the “CREATE TABLE” statement, as shown below for the example table “mytable”.

CREATE TABLE mytable (
        col1 VARCHAR(255) NOT NULL,
        col2 VARCHAR(255) NOT NULL
) DEFAULT CHARSET=UTF8;
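The character set can be narrowed down even further, to individual columns. The following is a sketch only: the table “mixedtable” and its column names are made up for illustration and do not appear elsewhere in this series.

```sql
-- Hypothetical table whose default stays latin1, with a single
-- column explicitly switched to UTF-8.
CREATE TABLE mixedtable (
        legacy_code VARCHAR(32) CHARACTER SET latin1 NOT NULL,
        title VARCHAR(255) CHARACTER SET utf8 NOT NULL
) DEFAULT CHARSET=latin1;

-- The Collation column of the result reveals the character set
-- of each text column.
SHOW FULL COLUMNS FROM mixedtable;
```

Mixing character sets in one table makes comparisons and joins between columns harder to reason about, so the database-wide setting is usually the more comfortable choice.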

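If you have already created your database without UTF-8, you do not necessarily have to recreate it. “ALTER DATABASE mydb CHARACTER SET UTF8” changes the default for tables created afterwards, and “ALTER TABLE mytable CONVERT TO CHARACTER SET UTF8” converts the text columns and data of an existing table. Back up your data first, and be aware that if UTF-8 bytes were already stored in latin1 columns, a plain conversion will double-encode them; in that case it may be easier to recreate the table and reload the data.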
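Where recreating everything is impractical, MySQL’s ALTER statements offer an in-place alternative. This is a minimal sketch that assumes the “mydb” and “mytable” names used above and that the existing data is correctly stored in its current character set; take a backup before trying it on real data.

```sql
-- Change the default character set for tables created from now on.
-- Existing tables are not touched by this statement.
ALTER DATABASE mydb CHARACTER SET utf8;

-- Convert the text columns of an existing table, together with the
-- data they hold, to UTF-8.
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8;

-- Check the result.
SHOW CREATE TABLE mytable;
```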

As you can see, setting the default character encoding when you create your database or table is most of what you need to store UTF-8 encoded data in a MySQL database. The one remaining thing to watch is that the client connection also uses UTF-8 (for example by issuing “SET NAMES utf8” after connecting); otherwise the server will assume your data arrives in a different encoding and convert it incorrectly. In the third and last part of this blog post series we will see how to get UTF-8 output correctly to the web user.
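To check that the setup really stores UTF-8, a quick round trip is useful. The sketch below assumes the “mytable” table from above and a client or terminal that itself sends UTF-8; the sample values are arbitrary.

```sql
-- Tell the server that this connection sends and expects UTF-8.
SET NAMES utf8;

-- Insert a row containing non-ASCII characters.
INSERT INTO mytable (col1, col2) VALUES ('é', 'naïve');

-- Inspect what was stored: the raw bytes, the number of characters,
-- and the number of bytes.
SELECT col1,
       HEX(col1) AS bytes_hex,
       CHAR_LENGTH(col1) AS chars,
       LENGTH(col1) AS bytes
FROM mytable;
```

A correctly stored “é” shows up as C3A9, one character in two bytes. A longer byte sequence such as C383C2A9 indicates that the data was encoded twice somewhere between the client and the table.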

