Thread: Another encoding issue

Another encoding issue

From
Gavin Sherry
Date:
Hi all,

Here's another interesting encoding issue. I cannot recall having seen it
on the lists.

---
[swm@laptop build7]$ bin/createdb -E LATIN1 test
CREATE DATABASE
[swm@laptop build7]$ cat break.sh
dat=`echo -en "\245\241"`

echo "create table test (d text);"
echo "insert into test values('$dat');"
[swm@laptop build7]$ sh break.sh | bin/psql test
CREATE TABLE
INSERT 0 1
[swm@laptop build7]$ bin/createdb -T test test2
CREATE DATABASE
[swm@laptop build7]$ bin/createdb -T test -E UTF-8 test2
CREATE DATABASE
[swm@laptop build7]$ bin/pg_dump -C test2 > test2.dmp
[swm@laptop build7]$ bin/dropdb test2
DROP DATABASE
[swm@laptop build7]$ bin/psql template1 -f test2.dmp
SET
SET
SET
CREATE DATABASE
ALTER DATABASE
You are now connected to database "test2".
[...]
CREATE TABLE
ALTER TABLE
psql:test2.dmp:345: ERROR:  invalid UTF-8 byte sequence detected near byte 0xa5
CONTEXT:  COPY test, line 1, column d: "  "
[...]
---

Until createdb() is a lot more sophisticated, we cannot translate
characters between encodings. I don't think this is a huge issue though,
as most people are only going to be creating empty databases anyway.
Still, it probably requires documentation.
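
To make the failure concrete, here is a minimal Python sketch of what
happens to the two bytes from break.sh. It is not part of the session
above; the variable names are illustrative, and Python's codecs stand in
for the server's encoding checks as an assumption.

```python
# The two bytes written by: echo -en "\245\241" (octal 245 = 0xA5, 241 = 0xA1)
raw = b"\xa5\xa1"

# In LATIN1 every byte maps to a character, so the insert is accepted.
as_latin1 = raw.decode("latin-1")

# Copying the database files with a UTF-8 label leaves the bytes untouched,
# so they are later validated as UTF-8 and rejected (0xA5 cannot start a
# UTF-8 sequence).
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# What a real conversion would have to produce: decode as LATIN1, then
# re-encode as UTF-8.
transcoded = as_latin1.encode("utf-8")

print(repr(as_latin1), valid_utf8, transcoded)
```

The point of the sketch is only that relabelling is not conversion: the
same bytes are fine under one encoding and invalid under the other.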

Thoughts?

Thanks,

Gavin


Re: Another encoding issue

From
Christopher Kings-Lynne
Date:
If we're bringing up odd encoding issues, why not talk about the mystery 
encoding of the shared catalogs? :)

Basically, which database you're logged into when you alter a shared 
catalog determines the encoding the new object appears in within that 
catalog.

This for one makes it impossible for us in phpPgAdmin to display a list 
of databases, where some database names are in EUC and some are in UTF-8 
and some are in LATIN5...
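
As a rough illustration of why that is, here is a small Python sketch. It
is an assumption-laden model, not phpPgAdmin or server code: the names are
made up, and a plain list of byte strings stands in for the shared catalog.

```python
# Hypothetical database names, each stored as raw bytes in the encoding of
# whichever database the creating session was connected to.
catalog_names = [
    "日本語".encode("euc_jp"),        # created from an EUC_JP database
    "café".encode("utf-8"),           # created from a UTF-8 database
    "İstanbul".encode("iso8859_9"),   # created from a LATIN5 database
]

def decode_all(raw_names, encoding):
    """Decode every name with one client encoding, marking bad bytes."""
    return [name.decode(encoding, errors="replace") for name in raw_names]

# A client has to pick a single encoding for the whole list, and whichever
# one it picks, the names stored under other encodings come out damaged.
as_utf8 = decode_all(catalog_names, "utf-8")
print(as_utf8)
```

Only the name that really was stored as UTF-8 survives; the others contain
replacement characters, and nothing in the bytes says which encoding each
name was written in.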

I bring it up as I notice that in MySQL 5 at least, all system object 
names (in our case that'd be all strings in the shared catalogs) are 
stored in UTF-8, always.

Chris


Re: Another encoding issue

From
Tom Lane
Date:
Gavin Sherry <swm@linuxworld.com.au> writes:
> Here's another interesting encoding issue. I cannot recall having seen it
> on the lists.

This problem has been mentioned before, eg here
http://archives.postgresql.org/pgsql-hackers/2005-03/msg01004.php
(that whole thread is relevant to the problem).

But I agree it's not well documented.
        regards, tom lane