Re: Database object names and libpq in UTF-8 locale on Windows - Mailing list pgsql-hackers

From Sebastien FLAESCH
Subject Re: Database object names and libpq in UTF-8 locale on Windows
Date
Msg-id 50857A22.4010405@4js.com
Whole thread Raw
In response to Database object names and libpq in UTF-8 locale on Windows  (Sebastien FLAESCH <sf@4js.com>)
Responses Re: Database object names and libpq in UTF-8 locale on Windows  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Hi all,
I would appreciate some help or comments on this topic...
Is this the wrong mailing list to ask such question?
Seb

On 10/18/2012 10:15 AM, Sebastien FLAESCH wrote:
> Hello,
>
> Using PostgreSQL 9.2.1 on Windows, I am facing a strange character set
> issue
> with a UTF-8 database.
>
> Maybe this is expected, but want to be sure that I am not missing
> something.
>
> On Windows, I have created a database with:
>
> ENCODING = 'UTF-8'
> LC_COLLATE = 'English_United States.1252'
> LC_CTYPE = 'English_United States.1252'
>
> I am using libpq with a simple C program.
> The program handles UTF-8 strings.
>
> While it's not a traditional "UNICODE" Windows application using WCHAR, it
> should be possible to send and get back UTF-8 data with PostgreSQL.
>
> Interacting with the Windows system in the system locale is another
> story, but
> from a pure C / SQL / libpq point of view, as long as the PostgreSQL client
> encoding is properly defined to UTF-8, it should work, and it does mainly:
>
> - I can use UTF-8 string constants in my queries.
> - I can pass UTF-8 data to the database with parameterized queries.
> - I can fetch UTF-8 data from the database.
> - I can create db object names with UTF-8 characters.
>
> But the db object names must be specified with double quotes:
> When I do not use quoted db object names, I get a strange problem.
> The table is created, I can use it in my program, I can even use it in the
> pgAdmin query tool, but in the pgAdmin db browser, there is no table name
> displayed in the treeview...
>
> Further, when selecting schema information from pg_class, I can see that
> the
> table exists, but there is nothing displayed in the relname attribute...
>
> It appears that the problem disappears when using a "C" collation are
> char type:
>
> ENCODING = 'UTF-8'
> LC_COLLATE = 'C'
> LC_CTYPE = 'C'
>
> I suspect this has something to do with the fact that non-quoted
> identifiers
> are converted to lowercase, and because my LC_CTYPE is English_United
> States.1252,
> the conversion to lowercase fails...
>
> But:
>
> - why does PostgreSQL accept to create the table with invalid UTF-8
> characters?
>
> - why can I make queries in the query tool with the UTF-8 table name?
> (note that show client_encoding in the query tool gives me "UNICODE" -
> is this
> the same as "UTF-8"?)
>
>
> So is there a bug, or do I have to use a C collation and char type for
> UTF-8
> databases on Windows?
>
> I know that UTF-8 is not supported by the Windows C/POSIX library
> (setlocale,
> and co).
>
> Does it mean that it's not realistic to use UTF-8 encoding for PostgreSQL
> databases on Windows...?
>
> Thanks for reading.
> Sebastien FLAESCH
> Four Js Development Tools
>
>




pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [PATCH] Support for Array ELEMENT Foreign Keys
Next
From: Stephen Frost
Date:
Subject: Re: Successor of MD5 authentication, let's use SCRAM