Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS
Date
Msg-id BANLkTimjbSEFTsqOVgRgvgr+KRnzg2BZCw@mail.gmail.com
In response to Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS
List pgsql-hackers


On Wed, Jun 8, 2011 at 6:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 2011/6/7 Jeevan Chalke <jeevan.chalke@enterprisedb.com>:
>> Since we smash the identifier to lower case using the
>> downcase_truncate_identifier() function, the solution is to make this
>> function wide-char aware, like the LOWER() function.
>>
>> I see some earlier discussion about downcase_truncate_identifier() and
>> a wide-char-aware function, but it seems to have been dropped somewhere.
>> (http://archives.postgresql.org/pgsql-hackers/2010-11/msg01385.php)
>> This invalid byte sequence issue seems more serious, because it
>> might lead, e.g., to pg_dump failures.
>
> It's a problem, but without an efficient algorithm for Unicode case
> folding, any fix we attempt to implement seems like it'll just be
> moving the problem around.

Agreed.

I read in another mail thread that str_tolower() is a wide-character-aware lowercasing function, but it is also collation-aware, so its behaviour might change with the locale. However, Tom suggested that we need a non-locale-dependent case-folding algorithm.

But still, with the same locale on the same machine, we are able to create a table and insert some data, yet we cannot retrieve it. Don't you think that is more serious, and that we need a quick solution here? As I said earlier, it may even lead to pg_dump failures. Although str_tolower() is locale dependent, it would still resolve this particular issue. I am not sure whether there would be a performance issue, but at least we would not be raising an error.
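To make the failure mode concrete, here is a minimal sketch in Python. It is not the PostgreSQL source. It assumes that the C library's tolower(), under a Windows-1252-like locale, maps the high-bit bytes 0xC0..0xDE (except 0xD7) to their lower-case counterparts; the helper names single_byte_tolower, bytewise_downcase and charwise_downcase are hypothetical. It shows how folding one byte at a time can turn a valid UTF-8 identifier into an invalid byte sequence, while folding whole characters keeps it valid.

```python
def single_byte_tolower(ch: int) -> int:
    """Assumed stand-in for tolower() under a Windows-1252-like locale."""
    if 0xC0 <= ch <= 0xDE and ch != 0xD7:  # upper-case letters; 0xD7 is the multiplication sign
        return ch + 0x20
    return ch


def bytewise_downcase(ident: bytes) -> bytes:
    """Fold one byte at a time, with no knowledge of multibyte characters."""
    out = bytearray()
    for ch in ident:
        if 0x41 <= ch <= 0x5A:      # ASCII 'A'..'Z'
            ch += 0x20
        elif ch >= 0x80:            # high-bit byte: defer to the single-byte locale
            ch = single_byte_tolower(ch)
        out.append(ch)
    return bytes(out)


def charwise_downcase(ident: bytes) -> bytes:
    """Decode first, fold whole characters, then re-encode."""
    return ident.decode("utf-8").lower().encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ident = "TÄBLE".encode("utf-8")     # b'T\xc3\x84BLE'
broken = bytewise_downcase(ident)   # the lead byte 0xC3 is rewritten to 0xE3
fixed = charwise_downcase(ident)

print(is_valid_utf8(broken))
print(is_valid_utf8(fixed))
```

Note that Python's str.lower() follows Unicode rules and does not consult the process locale, so this sketch only illustrates the wide-char-aware part of the idea; it does not reproduce the locale dependence of str_tolower().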

Please excuse me if the community has already discussed this at length and kept this behaviour intentionally, knowing about these errors and serious issues.

Thanks




--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

