Vacuuming unicode database - Mailing list pgsql-general

From Tambet Matiisen
Subject Vacuuming unicode database
Date
Msg-id 81132473206F3A46A72BD6116E1A06AE1B15C8@black.aprote.com
Whole thread Raw
Responses Re: Vacuuming unicode database  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
My Postgres databases used to have default (SQL_ASCII) encoding. I could
store any 8-bit character in it regardless of actual charset, because
all clients also used default encoding and no charset conversion was
done.

Now we started a new project with Qt3 and it's Postgres driver defaults
to UNICODE client encoding. This time charset conversion comes to play
and messes up all 8-bit characters. This forces me to create all
databases with correct charsets. So I recreated my database with UNICODE
encoding and imported all data in it, paying attention to client
encoding. I didn't re-initdb whole cluster, only recreated problematic
database. Everything seems to work fine, only I can't vacuum the
database:

epos=# vacuum verbose analyze;
INFO:  --Relation pg_catalog.pg_description--
INFO:  Pages 18: Changed 0, Empty 0; Tup 1895: Vac 0, Keep 0, UnUsed 5.
        Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  --Relation pg_toast.pg_toast_16416--
INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  Analyzing pg_catalog.pg_description
INFO:  --Relation pg_catalog.pg_group--
INFO:  Pages 1: Changed 0, Empty 0; Tup 31: Vac 0, Keep 0, UnUsed 31.
        Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  --Relation pg_toast.pg_toast_1261--
INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  Analyzing pg_catalog.pg_group
ERROR:  Invalid UNICODE character sequence found (0xdc6b)

Table pg_group is giving errors, because I have group name with 8-bit
characters in it. As I understand, groups are common for all databases
and pg_group is created during initdb, so it should be considered having
SQL_ASCII charset, not UNICODE. Seems like a bug to me?

What would you suggest in this case:
1. Re-initdb with UNICODE encoding and recreate all databases. Basically
all databases should have the same encoding.
2. Use some single-byte encoding for database, instead of UNICODE.
Vacuuming wouldn't complain any more, but I have some doubts that CREATE
GROUP "Name with 8-bit characters" behaves differently depending on
encoding of the active database.

  Tambet

pgsql-general by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Commercial support?
Next
From: Dennis Gearon
Date:
Subject: Re: Sorting Problem