Thread: Sequence of characters not supported by psql/pg_dump
Hello, I use PostgreSQL 7.0.2 on linux. The base was set with initdb -E UNICODE. I have many Strings with accents (french language). Some of them aren't supported by queries or pg_dump: WORKING EXAMPLE: DB=# select * from element_attribute where java_lang_string like 'Scholtè_s'; doc_id | tag_id | java_lang_string | type | java_lang_integer --------+--------+------------------+------+------------------- (0 rows) NOT WORKING EXAMPLE (psql still waiting for characters to be entered): DB=# select * from element_attribute where java_lang_string like 'Scholtès'; DB'# ERRORS IN FILE GENERATED BY PG_DUMP (AS A RESULT THE WHOLE TABLE WON'T BE RESTORED): 2220 3 Faure java.lang.String \N 2221 3 Rosières java.lang.String \N 2222 3 Rosières java.lang.String \N 2223 3 Rosières java.lang.String \N 2224 3 Rosières java.lang.String \N 2225 3 Rosières java.lang.String \N 2226 3 Rosières java.lang.String \N 2227 3 Scholtès 2228 3 Scholtès 2229 3 Scholtès 2230 3 Scholtès 2231 3 Scholtès 2232 3 Scholtès 2233 3 Scholtès 2234 3 Scholtès 2235 3 Whirlpool java.lang.String \N As you can see, the string ended by 'ès' is quite lethal for Postgre. The success of pg_dump depends on database content ! It seems that any string ending by an accent followed by less than 2 characters is a problem. Is it a known problem? Is it a database configuration problem or a real bug in Postgre? This problem is on my production database. I would appreciate If you could give me an answer soon on that point. Thanks in advance. Samuel Minne
> I use PostgreSQL 7.0.2 on linux. > The base was set with initdb -E UNICODE. > > I have many Strings with accents (french language). > Some of them aren't supported by queries or pg_dump: > > WORKING EXAMPLE: > DB=# select * from element_attribute where java_lang_string like 'Scholtè_s'; Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is encoded in UTF-8? It's 2 bytes long and starting with 0xc... -- Tatsuo Ishii
Le jeu, 11 jan 2001, Tatsuo Ishii a écrit : > > I use PostgreSQL 7.0.2 on linux. > > The base was set with initdb -E UNICODE. > > > > I have many Strings with accents (french language). > > Some of them aren't supported by queries or pg_dump: > > > > WORKING EXAMPLE: > > DB=# select * from element_attribute where java_lang_string like 'Scholtè_s'; > > Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is > encoded in UTF-8? It's 2 bytes long and starting with 0xc... > -- > Tatsuo Ishii Sorry for the delay. We had very strong production constraint, and I had to let it go for a while. Well, How can i know about the internal coding of this letter in the database ? The texts where inserted from command line inserts. When inserted from JDBC (so it's supposed to be unicode), I can't see any difference, even in the generated dump file. Today I've seen a post about the same problem, that gives a solution to produce a working dump: pg_dump -d $dumpfile. This command produce this kind of lines (you : INSERT INTO "element_texte" VALUES (634,'','Filtration du Plasma','\350'); I noticed that using \xxx notation i can handle special characters in queries from psql command line, too. But it doesn't look like unicode coding (cf www.unicode.org/charts and LATIN1-Supplement), as I was expecting. So, I have the following questions: - What kind of code is this ? - can I get the translation chart somewhere ? - why isn't it UNICODE ? - why do I have to use \xxx code (and not 0x..., or directly the special character like "é") from the psql command line, although it is supposed to support UNICODE ? - why don't we find this kind of characters in the dump file when the -d option is not set ? (I assume this cause the restore to fail, and I think it could be considered as a bug). Thank you for your help
> Le jeu, 11 jan 2001, Tatsuo Ishii a écrit : > > > I use PostgreSQL 7.0.2 on linux. > > > The base was set with initdb -E UNICODE. > > > > > > I have many Strings with accents (french language). > > > Some of them aren't supported by queries or pg_dump: > > > > > > WORKING EXAMPLE: > > > DB=# select * from element_attribute where java_lang_string like 'Scholtè_s'; > > > > Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is > > encoded in UTF-8? It's 2 bytes long and starting with 0xc... > > -- > > Tatsuo Ishii > > Sorry for the delay. We had very strong production constraint, and I had to let it go for a while. > Well, How can i know about the internal coding of this letter in the database ? > The texts where inserted from command line inserts. When inserted from JDBC (so it's supposed to be unicode), > I can't see any difference, even in the generated dump file. > > Today I've seen a post about the same problem, that gives a solution to produce a working dump: pg_dump -d $dumpfile. > > This command produce this kind of lines (you : > INSERT INTO "element_texte" VALUES (634,'','Filtration du Plasma','\350'); > > I noticed that using \xxx notation i can handle special characters in queries from psql command line, too. > But it doesn't look like unicode coding (cf www.unicode.org/charts and LATIN1-Supplement), as I was expecting. > > So, I have the following questions: > - What kind of code is this ? Probably ISO 8859-1. > - can I get the translation chart somewhere ? > - why isn't it UNICODE ? Because you didn't input as UTF-8. > - why do I have to use \xxx code (and not 0x..., or directly the special character like "é") from the psql command line, > although it is supposed to support UNICODE ? > - why don't we find this kind of characters in the dump file when the -d option is not set ? > (I assume this cause the restore to fail, and I think it could be considered as a bug). > > Thank you for your help In the releases prior 7.1, you need to input UTF-8 explicitely. 7.1 has the ability that does automatic encoding conversion between ISO 8859-1 and UTF-8. That means, if you type in characters in ISO 8859-1, PostgreSQL will convert it to UTF-8 then store into the database. -- Tatsuo Ishii