Thread: Character encoding in database dumps
Hi there, My database was created with Unicode encoding, as a fair number of the artists and exhibitions have French names, and thus include accented characters. I thought I'd solved all the various problems this generates, but have just discovered one new one. When I do a dump of the database to back it up (because I'm paranoid and don't want my web host to be my only source of security on this matter), the accented characters don't come through correctly. They all display correctly in phpPgAdmin, and in the PHP pages I've made to search and display the data - it's just in dumps that they're garbled. I presume this means the Unicode encoding is somehow being lost and the data read as standard ASCII. I'm currently using phpPgAdmin to export the data, and it gives me a choice of downloading it as a .sql file or displaying it in the browser window, but either way, the special characters are toasted. Does anyone have any idea how I can do a database dump that keeps the character encoding intact? Thanks, Lynna -- Resource Centre Database Coordinator Gallery 44 www.gallery44.org
Hi Lynna, Have you tried restoring your data from the dumps ? I use Latin1 encoding with my dbs. All russian characters are a mess in the dump files and in the psql terminal as well. However, they restore, display and distribute ok, even into win-platform. Out of curiosity, I renamed one dump.file to dump.html and opened it with a browser and set the browser encoding to cyrillic. Characters display ok. So could this be more an issue with the client application's ability to use proper encoding for displaying the characters right ? Dunno for sure, just my thought out of some experience with cyrillic and Pg. Encodings are a bit of jungle, it seems. BR, Aarni On Wednesday 09 June 2004 21:11, you wrote: > Hi there, > > My database was created with Unicode encoding, as a fair number of the > artists and exhibitions have French names, and thus include accented > characters. I thought I'd solved all the various problems this generates, > but have just discovered one new one. > > When I do a dump of the database to back it up (because I'm paranoid and > don't want my web host to be my only source of security on this matter), > the accented characters don't come through correctly. They all display > correctly in phpPgAdmin, and in the PHP pages I've made to search and > display the data - it's just in dumps that they're garbled. I presume this > means the Unicode encoding is somehow being lost and the data read as > standard ASCII. > > I'm currently using phpPgAdmin to export the data, and it gives me a choice > of downloading it as a .sql file or displaying it in the browser window, > but either way, the special characters are toasted. Does anyone have any > idea how I can do a database dump that keeps the character encoding intact? > > Thanks, > > Lynna -- ------------------------------------------------- Aarni Ruuhimäki ------------------------------------------------- This is a bugfree broadcast to you from a linux system.
В Срд, 09.06.2004, в 20:11, Lynna Landstreet пишет: > Hi there, > > My database was created with Unicode encoding, as a fair number of the > artists and exhibitions have French names, and thus include accented > characters. I thought I'd solved all the various problems this generates, > but have just discovered one new one. > > When I do a dump of the database to back it up (because I'm paranoid and > don't want my web host to be my only source of security on this matter), the > accented characters don't come through correctly. They all display correctly > in phpPgAdmin, and in the PHP pages I've made to search and display the data > - it's just in dumps that they're garbled. I presume this means the Unicode > encoding is somehow being lost and the data read as standard ASCII. > > I'm currently using phpPgAdmin to export the data, and it gives me a choice > of downloading it as a .sql file or displaying it in the browser window, but > either way, the special characters are toasted. Does anyone have any idea > how I can do a database dump that keeps the character encoding intact? How do you determine that the character encoding is wrong? Maybe whatever it is you use to look at the dump just doesn't interpret the data as UTF-8. -- Markus Bertheau <twanger@bluetwanger.de>
on 6/10/04 8:19 AM, Aarni Ruuhimäki at aarni@kymi.com wrote: > Have you tried restoring your data from the dumps ? No - I wanted to, but my web host's control panel appears to be down so I can't create a new database to restore it into, and I don't want to risk restoring it into the existing database in case it screws everything up. And the dump file was still messed up when I tried viewing it in my browser, as you suggest. However, I found the answer. I should have actually figured it out sooner, because I originally had the same problem in reverse when trying to get data *into* the database: FTP programs tend to send text files in ASCII format, which doesn't support Unicode. When I uploaded the text files to \copy my data into the database from, I had to make sure the FTP program used binary mode, and in order to make it do that, I had to change the file extension to something my internet preferences did not brand as ASCII automatically. Turned out to be what was happening here as well. I had to skip using phpPgAdmin's Export function and just use pg_dump via a shell connection, and name the dump something that did not end in .txt or .sql or anything like that, and then download it as binary, and *then* open it in BBEdit with the "Read as" option set to UTF-8. Once I did that, all my special characters were OK. Major headache, though. I'm hoping as things evolve, Unicode support will be built into more programs including those where it's currently not thought necessary like FTP programs. People need to learn that ASCII is *not* necessarily the correct format for all text files... Lynna -- Resource Centre Database Coordinator Gallery 44 www.gallery44.org
On Fri, Jun 11, 2004 at 15:06:08 -0400, Lynna Landstreet <lynna@gallery44.org> wrote: > > Major headache, though. I'm hoping as things evolve, Unicode support will be > built into more programs including those where it's currently not thought > necessary like FTP programs. People need to learn that ASCII is *not* > necessarily the correct format for all text files... The ascii mode in ftp is for files that need to be converted to or from using LF to end lines from or to using CRLF to end lines. Normally you want to do binary transfers.
>Does anyone have any idea >how I can do a database dump that keeps the character encoding intact? If you can access your database directly with a TCP/IP PostgreSQL client then you can use Eduphant which supports copy to stdout/stdin. This will allow you to copy directly to your local computer in 1 step. <http://aliacta.com/download> Cheers, Marc
On Sat, 12 Jun 2004, M. Bastin wrote: > >Does anyone have any idea > >how I can do a database dump that keeps the character encoding intact? [ Sorry, idle mind. :-) ] Reading a bunch of these messages over the last while, it appears that the database will only support a single character encoding internally/natively. Maybe I am wrong, and maybe this will change in the future. But, if you need to interact in multiple character sets now, this seems to be something which you are now doing with a front-end of some kind. Perhaps the thing to do, is to add a field to your tables (or make them into 2 column arrays?) involving character data, where this new field (or 0'th array element?) stores the character set used when the data was originally input. If a front-end makes a query involving character data, it gets back the character set involved and the character data. Then the front-end has to deal with translating from one character set to another. Just my $0.02 (CDN) Gord
on 6/11/04 3:33 PM, Bruno Wolff III at bruno@wolff.to wrote: >> Major headache, though. I'm hoping as things evolve, Unicode support will be >> built into more programs including those where it's currently not thought >> necessary like FTP programs. People need to learn that ASCII is *not* >> necessarily the correct format for all text files... > > The ascii mode in ftp is for files that need to be converted to or from > using LF to end lines from or to using CRLF to end lines. Normally you want > to do binary transfers. I'm used to having to use ASCII on anything remotely CGI-related because normally if I do binary transfers for any sort of CGI file it breaks the script. But maybe that is the LF issue - I know MacOS, Windows and UNIX all use different sorts of line returns and since CGIs are usually running on UNIX systems, Mac or Windows line returns will mess everything up. I don't know if PHP scripts are as sensitive... But the bigger problem is that it seems like these days most FTP programs - as well as browsers and text editors that have some FTP functionality - automatically set the transfer mode based on the file extension (and text editors like BBEdit *only* have ASCII transfer mode, presumably on the basis that, hey, it's text, and text *always* means ASCII, right?). Some will allow you to override that, some won't. That's why I had to change the extension on my text data files and database dumps in order for a binary transfer to even be possible. It's really annoying - I hate software that thinks it knows what I'm trying to do better than I do. Lynna -- Resource Centre Database Coordinator Gallery 44 www.gallery44.org