Thread: Sequence of characters not supported by psql/pg_dump

Sequence of characters not supported by psql/pg_dump

From
Date:
Hello,

    I use PostgreSQL 7.0.2 on linux.
    The base was set with initdb -E UNICODE.

    I have many Strings with accents (french language).
    Some of them aren't supported by queries or pg_dump:

WORKING EXAMPLE:
DB=# select * from element_attribute  where java_lang_string like 'Scholtè_s';
 doc_id | tag_id | java_lang_string | type | java_lang_integer
--------+--------+------------------+------+-------------------
(0
rows)
               

NOT WORKING EXAMPLE (psql still waiting for characters to be entered):
DB=# select * from element_attribute  where java_lang_string like 'Scholtès';
DB'#

ERRORS IN FILE GENERATED BY PG_DUMP (AS A RESULT THE WHOLE TABLE WON'T BE
RESTORED):
2220    3    Faure    java.lang.String    \N
2221    3    Rosières    java.lang.String    \N
2222    3    Rosières    java.lang.String    \N
2223    3    Rosières    java.lang.String    \N
2224    3    Rosières    java.lang.String    \N
2225    3    Rosières    java.lang.String    \N
2226    3    Rosières    java.lang.String    \N
2227    3    Scholtès
2228    3    Scholtès
2229    3    Scholtès
2230    3    Scholtès
2231    3    Scholtès
2232    3    Scholtès
2233    3    Scholtès
2234    3    Scholtès
2235    3    Whirlpool    java.lang.String    \N

    As you can see, the string ended by 'ès' is quite lethal for Postgre.
    The success of pg_dump depends on database content !
    It seems that any string ending by an accent followed by less than 2 characters
is a problem.

    Is it a known problem?
    Is it a database configuration problem or a real bug in Postgre?
    This problem is on my production database. I would appreciate If you could give
me an answer soon on that point.

Thanks in advance.
    Samuel Minne

Re: Sequence of characters not supported by psql/pg_dump

From
Tatsuo Ishii
Date:
>     I use PostgreSQL 7.0.2 on linux.
>     The base was set with initdb -E UNICODE.
>
>     I have many Strings with accents (french language).
>     Some of them aren't supported by queries or pg_dump:
>
> WORKING EXAMPLE:
> DB=# select * from element_attribute  where java_lang_string like 'Scholtè_s';

Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is
encoded in UTF-8? It's 2 bytes long and starting with 0xc...
--
Tatsuo Ishii

Re: Sequence of characters not supported by psql/pg_dump

From
Samuel Minne
Date:
Le jeu, 11 jan 2001, Tatsuo Ishii a écrit :
> >     I use PostgreSQL 7.0.2 on linux.
> >     The base was set with initdb -E UNICODE.
> >
> >     I have many Strings with accents (french language).
> >     Some of them aren't supported by queries or pg_dump:
> >
> > WORKING EXAMPLE:
> > DB=# select * from element_attribute  where java_lang_string like 'Scholtè_s';
>
> Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is
> encoded in UTF-8? It's 2 bytes long and starting with 0xc...
> --
> Tatsuo Ishii

Sorry for the delay. We had very strong production constraint, and I had to let it go for a while.
Well, How can i know about the internal coding of this letter in the database ?
The texts where inserted from command line inserts. When inserted from JDBC (so it's supposed to be unicode),
 I can't see any difference, even in the generated dump file.

Today I've seen a post about the same problem, that gives a solution to produce a working dump:  pg_dump -d $dumpfile.

This command produce this kind of lines (you :
INSERT INTO "element_texte" VALUES (634,'','Filtration du Plasma','\350');

I noticed that using \xxx notation i can handle special characters in queries from psql command line, too.
But it doesn't look like unicode coding (cf www.unicode.org/charts and LATIN1-Supplement), as I was expecting.

So, I have the following questions:
- What kind of code is this ?
- can I get the translation chart somewhere ?
- why isn't it UNICODE ?
- why do I have to use  \xxx code (and not 0x..., or directly the special character like "é") from the psql command
line, 
    although it is supposed to support UNICODE ?
- why don't we find this kind of characters in the dump file when the -d  option is not set ?
    (I assume this cause the restore to fail, and I think it could be considered as a bug).

Thank you for your help

Re: Sequence of characters not supported by psql/pg_dump

From
Tatsuo Ishii
Date:
> Le jeu, 11 jan 2001, Tatsuo Ishii a écrit :
> > >     I use PostgreSQL 7.0.2 on linux.
> > >     The base was set with initdb -E UNICODE.
> > >
> > >     I have many Strings with accents (french language).
> > >     Some of them aren't supported by queries or pg_dump:
> > >
> > > WORKING EXAMPLE:
> > > DB=# select * from element_attribute  where java_lang_string like 'Scholtè_s';
> >
> > Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is
> > encoded in UTF-8? It's 2 bytes long and starting with 0xc...
> > --
> > Tatsuo Ishii
>
> Sorry for the delay. We had very strong production constraint, and I had to let it go for a while.
> Well, How can i know about the internal coding of this letter in the database ?
> The texts where inserted from command line inserts. When inserted from JDBC (so it's supposed to be unicode),
>  I can't see any difference, even in the generated dump file.
>
> Today I've seen a post about the same problem, that gives a solution to produce a working dump:  pg_dump -d
$dumpfile.
>
> This command produce this kind of lines (you :
> INSERT INTO "element_texte" VALUES (634,'','Filtration du Plasma','\350');
>
> I noticed that using \xxx notation i can handle special characters in queries from psql command line, too.
> But it doesn't look like unicode coding (cf www.unicode.org/charts and LATIN1-Supplement), as I was expecting.
>
> So, I have the following questions:
> - What kind of code is this ?

Probably ISO 8859-1.

> - can I get the translation chart somewhere ?
> - why isn't it UNICODE ?

Because you didn't input as UTF-8.

> - why do I have to use  \xxx code (and not 0x..., or directly the special character like "é") from the psql command
line, 
>     although it is supposed to support UNICODE ?
> - why don't we find this kind of characters in the dump file when the -d  option is not set ?
>     (I assume this cause the restore to fail, and I think it could be considered as a bug).
>
> Thank you for your help

In the releases prior 7.1, you need to input UTF-8 explicitely. 7.1
has the ability that does automatic encoding conversion between ISO
8859-1 and UTF-8. That means, if you type in characters in ISO 8859-1,
PostgreSQL will convert it to UTF-8 then store into the database.
--
Tatsuo Ishii