Monday, April 10, 2006

UTF8 CSV files.

We're getting lists to be imported into MySQL in Excel (.xls) formats.
When we use MS Office to export it into CSV,
reading the data gives us gibberish, because
the default encoding is plain old ASCII.

There is not obvious setting available to change the ASCII to UTF8
for the csv file in Microsoft Office.

So I tried Open Office to see if it offers anything better.
In Save As, Change the Type to CSV, then the 'Edit Filter Settings'
checkbox will be enabled. Check this, and when you click save,
a dialog will appear requesting which Character Set to use.

And sure enough, it works like a charm.
The CSV loads up and saves into MySQL OK.

Hooray Open Office. Works better for multilingual requirements.

BTW, if anyone has the solution to export as UTF8 CSV from
MSOffice, please post it here.


15 lewsers:

Bruno Vernay said...

Encoding problems are really annoying. Why isn't everything still not in UTF-8 ?
I had the same problem of EXCEL openning CSV in ASCII not even asking if it is UTF-8. At least, OpenOffice offers the option to open the csv as UTF-8. It could scan the file to find UTF8 characters, but maybe next version ...
I still have a problem to inject a UTF-8 dump into mysql. I used both schell>mysql < dump.sql and mysql>source dump.sql but I get garbage !
the client is on a Linux machine and the serveur on a Windows one.
If you have an idea ?

Anonymous said...

I tried importing a .CSV file containing Chinese(Simplified) characters into MSExcel worksheet in my .NET application. And I had the same problem as others had. The Chinese character sets in the database and the .CSV file dont match. The BOM of the .CSV file is fine however.It is EF BB BF, still the chinese language encoding seems to be different. Only after saving the .CSV with UTF8 encoding throu the file conversion wizard in Word or Excel I get the original characters. Now My problem is that I need to force the action of the File Conversion Wizard in .NET programmatically so that the right language encoding is chosen..Please do tell me if there is any way of specifying the encoding info while importing the csv file to Excel..The locale settings are all Ok..

I had the same problem when I once worked in Perl where I had to change the encoding of Chinese strings (UTF8 encoded)from Simplified Chinese to Simplified Chinese GBK (code page 936)for the Chinese characters to be displayed properly in the Windows and DOS console. I realised that the encoding used by MSDOS and Windows Default encoding chosen for UTF8 strings, were different.And then I found the solution in the CodePage module..I think this info will be useful coz the problems in both the cases are similar..

Do correct me if I’ve understood anything wrongly…I need a solution for overcoming this problem asap..Thanks…

Yoon Kit said...

> Why isn't everything still not in UTF-8 ?

dunno. people like latin i guess...

> I used both schell>mysql < dump.sql
> and mysql>source dump.sql
> but I get garbage !

Thats strange. it worked for me. Have you made sure that you have set your default-character-set=utf8 on the server?

> Now My problem is that I need to
> force the action of the
> File Conversion Wizard in .NET
> programmatically so that the right
> language encoding is chosen..

Sorry dude. This one worked for me
in OpenOffice, and I was happy...
I dont know how to get it to work
for MSOffice, let alone programatically
in .NET.

If you do find out, do tell.


Anonymous said...

in ms excel you have to use the save as with type "unicode text (*.txt) - then it will be saved in utf8 but tab-separated. I hope that helps ...

Jean said...

I have the same problem whith Posgrasql and chienese Char !

I thik the only way is to use OpenOfice.


KoS said...

I had the same problem that excel always imports CSV files as ASCII encoded files.

You can circumvent this problem by using a simple html file that contains only a table and the right meta information in the head, e.g.
meta http-equiv=Content-Type content="text/html; charset=utf-8"

works like a charm with excel!


Andy said...

if anyone is still having this problem, the solution is to add the Byte order mark at the top of the file (utf8 with signature) as described here:

Ming Zhu said...

Thank you so much yk. This is exactly the problem I am seeking answers for.

Ondrej said...

I tried to export data from excel to csv in utf8 encoding. Excel 2007 cannot do this (same as excel 2000).
I tried the open office and it worked.
Thank you for your help.

richerpics said...

I've been having a similar problem: I'm using Excel 2003 and after some searching found that when you go to [save as]/[Tools]/[Web Options]/[Encoding] you can select UTF-8 as the standard to save the document

Nathaniel said...

@richerpics - Even if I follow this procedure, the document doesn't seem to save as UTF-8.

One solution to this problem is to save the file as a .txt file then open it with notepad and save as UTF-8.

آرش رنجبران said...

You can upload your Excel file to google docs and download it as CSV.
It works perfect!!!

(Shame on Excel!)

VitaminLab said...

Rename the .csv to .txt then open in Excel and it gives you the dialog to select UTF-8. Works in Office 7.

Major said...
This comment has been removed by the author.
Major said...

I've solved this using a macro. Excel doesn't appear to support UTF8 as standard from what I've read, so I've done the below (which meets my needs perfectly) as I already needed a macro for what I was doing anyway: