Database object names and libpq in UTF-8 locale on Windows - Mailing list pgsql-hackers

From Sebastien FLAESCH
Subject Database object names and libpq in UTF-8 locale on Windows
Date
Msg-id 507FBA9A.6000701@4js.com
Whole thread Raw
Responses Re: Database object names and libpq in UTF-8 locale on Windows  (Sebastien FLAESCH <sf@4js.com>)
List pgsql-hackers
Hello,

Using PostgreSQL 9.2.1 on Windows, I am facing a strange character set issue
with a UTF-8 database.

Maybe this is expected, but want to be sure that I am not missing something.

On Windows, I have created a database with:
  ENCODING = 'UTF-8'  LC_COLLATE = 'English_United States.1252'  LC_CTYPE = 'English_United States.1252'

I am using libpq with a simple C program.
The program handles UTF-8 strings.

While it's not a traditional "UNICODE" Windows application using WCHAR, it
should be possible to send and get back UTF-8 data with PostgreSQL.

Interacting with the Windows system in the system locale is another story, but
from a pure C / SQL / libpq point of view, as long as the PostgreSQL client
encoding is properly defined to UTF-8, it should work, and it does mainly:

- I can use UTF-8 string constants in my queries.
- I can pass UTF-8 data to the database with parameterized queries.
- I can fetch UTF-8 data from the database.
- I can create db object names with UTF-8 characters.

But the db object names must be specified with double quotes:
When I do not use quoted db object names, I get a strange problem.
The table is created, I can use it in my program, I can even use it in the
pgAdmin query tool, but in the pgAdmin db browser, there is no table name
displayed in the treeview...

Further, when selecting schema information from pg_class, I can see that the
table exists, but there is nothing displayed in the relname attribute...

It appears that the problem disappears when using a "C" collation are char type:
  ENCODING = 'UTF-8'  LC_COLLATE = 'C'  LC_CTYPE = 'C'

I suspect this has something to do with the fact that non-quoted identifiers
are converted to lowercase, and because my LC_CTYPE is English_United States.1252,
the conversion to lowercase fails...

But:

- why does PostgreSQL accept to create the table with invalid UTF-8 characters?

- why can I make queries in the query tool with the UTF-8 table name?
(note that show client_encoding in the query tool gives me "UNICODE" - is this
the same as "UTF-8"?)


So is there a bug, or do I have to use a C collation and char type for UTF-8
databases on Windows?

I know that UTF-8 is not supported by the Windows C/POSIX library (setlocale,
and co).

Does it mean that it's not realistic to use UTF-8 encoding for PostgreSQL
databases on Windows...?

Thanks for reading.
Sebastien FLAESCH
Four Js Development Tools



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Next
From: Kyotaro HORIGUCHI
Date:
Subject: [BUG] False indication in pg_stat_replication.sync_state