Re: UTF-8 encoding problem w/ libpq - Mailing list pgsql-hackers

From ktm@rice.edu
Subject Re: UTF-8 encoding problem w/ libpq
Msg-id 20130603144759.GG2892@aart.rice.edu
In response to UTF-8 encoding problem w/ libpq  (Martin Schäfer <Martin.Schaefer@cadcorp.com>)
List pgsql-hackers
On Mon, Jun 03, 2013 at 03:40:14PM +0100, Martin Schäfer wrote:
> I try to create database columns with umlauts, using the UTF8 client encoding. However, the server seems to mess up
> the column names. In particular, it seems to perform a lowercase operation on each byte of the UTF-8 multi-byte
> sequence.
> 
> Here is my code:
> 
>     const wchar_t *strName = L"id_äß";
>     wstring strCreate = wstring(L"create table test_umlaut(") + strName + L" integer primary key)";
> 
>     PGconn *pConn = PQsetdbLogin("", "", NULL, NULL, "dev503", "postgres", "******");
>     if (!pConn) FAIL;
>     if (PQsetClientEncoding(pConn, "UTF-8")) FAIL;
> 
>     PGresult *pResult = PQexec(pConn, "drop table test_umlaut");
>     if (pResult) PQclear(pResult);
> 
>     pResult = PQexec(pConn, ToUtf8(strCreate.c_str()).c_str());
>     if (pResult) PQclear(pResult);
> 
>     pResult = PQexec(pConn, "select * from test_umlaut");
>     if (!pResult) FAIL;
>     if (PQresultStatus(pResult)!=PGRES_TUPLES_OK) FAIL;
>     if (PQnfields(pResult)!=1) FAIL;
>     const char *fName = PQfname(pResult,0);
> 
>     ShowW("Name:     ", strName);
>     ShowA("in UTF8:  ", ToUtf8(strName).c_str());
>     ShowA("from DB:  ", fName);
>     ShowW("in UTF16: ", ToWide(fName).c_str());
> 
>     PQclear(pResult);
>     PQreset(pConn);
> 
> (ShowA/W call OutputDebugStringA/W, and ToUtf8/ToWide use WideCharToMultiByte/MultiByteToWideChar with CP_UTF8.)
> 
> And this is the output generated:
> 
> Name:     id_äß
> in UTF8:  id_äß
> from DB:  id_ã¤ãÿ
> in UTF16: id_???
> 
> It seems like the backend thinks the name is in ANSI encoding, not in UTF-8.
> If I change the strCreate query and add double quotes around the column name, then the problem disappears. But the
> original name is already in lowercase, so I think it should also work without quoting the column name.
> 
> Am I missing some setup in either the database or in the use of libpq?
> 
> I’m using PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
> 
> The database uses:
> ENCODING = 'UTF8'
> LC_COLLATE = 'English_United Kingdom.1252'
> LC_CTYPE = 'English_United Kingdom.1252'
> 
> Thanks for any help,
> 
> Martin
> 

Hi Martin,

Unquoted identifiers are always folded to lower case. If you do not want that
behavior, you must put double quotes around the column name, as described in
section 4.1.1 of the documentation:

http://www.postgresql.org/docs/9.2/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
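
As a rough illustration of the quoting approach, here is a minimal sketch of
how the create statement could be built with the column name double-quoted.
The helpers quoteIdentifier and buildCreateTable are hypothetical names made
up for this example; they are not part of libpq. The sketch assumes the
identifier is already UTF-8 encoded and only handles the one escaping rule
for quoted identifiers (an embedded double quote is written twice).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a libpq function): wrap an identifier in double
// quotes so the server does not case-fold it. An embedded double quote is
// doubled, which is how SQL escapes it inside a quoted identifier.
std::string quoteIdentifier(const std::string &name)
{
    // Assumption: callers never pass an empty name; a zero-length quoted
    // identifier is not valid SQL.
    assert(!name.empty());

    std::string quoted;
    quoted.reserve(name.size() + 2);
    quoted.push_back('"');
    for (char c : name) {
        if (c == '"')
            quoted.push_back('"');   // escape by doubling
        quoted.push_back(c);         // all other bytes pass through untouched
    }
    quoted.push_back('"');
    return quoted;
}

// Hypothetical helper: build the statement from the original post with the
// column name quoted. The table name is left unquoted, as in the original
// code, because it is plain lowercase ASCII.
std::string buildCreateTable(const std::string &table, const std::string &column)
{
    return "create table " + table + "(" + quoteIdentifier(column) +
           " integer primary key)";
}
```

With the column name from the original post passed in as UTF-8 bytes, for
example buildCreateTable("test_umlaut", "id_\xC3\xA4\xC3\x9F"), the resulting
string can be handed to PQexec as before. When a connection is available,
libpq's own PQescapeIdentifier is probably the better choice, since it also
takes the connection's encoding into account.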

Regards,
Ken


