Question about Encoding a Custom Type - Mailing list pgsql-hackers

From David E. Wheeler
Subject Question about Encoding a Custom Type
Date
Msg-id 95FC9074-9199-4214-93B8-51B1264DFDD5@kineticode.com
Responses Re: Question about Encoding a Custom Type  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Howdy,

Possibly showing my ignorance here, but as I'm working on updating  
citext to be locale-aware and to work on 8.3, I've run into this  
peculiarity:

try=# \encoding
UTF8
try=# select setting from pg_settings where name = 'lc_collate';
   setting
-------------
 en_US.UTF-8
(1 row)

try=# create table try (name citext);
try=# insert into try (name) values ('aardvark'), ('AAA');
try=# select name,  name = 'aaa' from try;
   name   | ?column?
----------+----------
 aardvark | f
 AAA      | t
(2 rows)

try=# insert into try (name) values ('aba'), ('ABC'), ('abc');
try=# select name,  name = 'aaa' from try;
   name   | ?column?
----------+----------
 aardvark | f
 AAA      | t
 aba      | f
 ABC      | f
 abc      | f
(5 rows)

try=# insert into try (name) values ('AAAA');
try=# select name,  name = 'aaa' from try;
ERROR:  invalid byte sequence for encoding "UTF8": 0xf6bd
HINT:  This error can also happen if the byte sequence does not match  
the encoding expected by the server, which is controlled by  
"client_encoding".

I've no idea what could be different about 'AAAA' vs. any other value.  
And if I do either of these:

select name,  name = 'aaa'::text from try;
select name,  name::text = 'aaa' from try;

It just works. I'm mystified.

My casts:

CREATE CAST (citext AS text)    WITHOUT FUNCTION AS IMPLICIT;
CREATE CAST (citext AS varchar) WITHOUT FUNCTION AS IMPLICIT;
CREATE CAST (citext AS bpchar)  WITHOUT FUNCTION AS IMPLICIT;
CREATE CAST (text AS citext)    WITHOUT FUNCTION AS ASSIGNMENT;
CREATE CAST (varchar AS citext) WITHOUT FUNCTION AS ASSIGNMENT;
CREATE CAST (bpchar AS citext)  WITHOUT FUNCTION AS ASSIGNMENT;

Questions about the code? It's all here (for now):

https://svn.kineticode.com/citext/trunk/

Hrm. Fiddling a bit more, I find that this fails, too:

try=# select citext_smaller( 'aardvark'::citext, 'AARDVARKasdfasdfasdfasdf'::citext );
ERROR:  invalid byte sequence for encoding "UTF8": 0xc102
HINT:  This error can also happen if the byte sequence does not match  
the encoding expected by the server, which is controlled by  
"client_encoding".

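Neither reported sequence is well-formed UTF-8: 0xF6 can never begin a UTF-8 character, and 0xC1 could only begin an overlong (forbidden) two-byte encoding. Every value inserted above is plain ASCII, so bytes like these do not appear anywhere in the data as typed. For reference, here is a minimal plain-C sketch of the RFC 3629 well-formedness rules such a check applies; `utf8_valid` is a hypothetical name, not PostgreSQL's own validation routine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checker: true if buf[0..len) is well-formed UTF-8
 * according to the byte-sequence table in RFC 3629, section 4. */
static bool utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                        /* continuation bytes after c */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range for 2nd byte */

        if (c < 0x80)
            need = 0;                       /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            need = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            need = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            need = 3;
        else
            return false;  /* 0x80-0xC1 or 0xF5-0xFF cannot start a char */

        /* Narrower second-byte ranges exclude overlong forms, UTF-16
         * surrogates and code points above U+10FFFF. */
        if (c == 0xE0)
            lo = 0xA0;
        else if (c == 0xED)
            hi = 0x9F;
        else if (c == 0xF0)
            lo = 0x90;
        else if (c == 0xF4)
            hi = 0x8F;

        if (len - i - 1 < need)
            return false;                   /* truncated sequence */
        if (need > 0 && (buf[i + 1] < lo || buf[i + 1] > hi))
            return false;
        for (size_t k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;               /* not a 10xxxxxx byte */
        i += need + 1;
    }
    return true;
}
```

Fed the two sequences from the error messages, this sketch rejects both at the lead byte, while any ASCII string such as 'aardvark' passes.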
So I guess that something must be up with citext_smaller(). It's quite  
simple, though:

PG_FUNCTION_INFO_V1(citext_smaller);

Datum citext_smaller (PG_FUNCTION_ARGS) {
    text * left  = PG_GETARG_TEXT_P(0);
    text * right = PG_GETARG_TEXT_P(1);
    PG_RETURN_TEXT_P(citextcmp( PG_ARGS ) < 0 ? left : right );
}
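For comparison, here is a plain-C sketch (not the PostgreSQL fmgr API) of the contract citext_smaller() depends on: the comparison helper works on its own lowercased copies and frees only those, so whichever original argument is returned is still intact. The names `lower_copy`, `ci_cmp`, and `ci_smaller` are hypothetical, and the lowercasing is byte-wise via tolower() in the C locale, unlike the locale-aware citext. If citextcmp() instead frees or reuses the memory behind its arguments, returning `left` or `right` afterward would hand back stale bytes; that is one hypothesis worth ruling out, not a confirmed cause.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd copy of s, lowercased byte by byte with
 * tolower() (C-locale behaviour, NOT locale-aware like citext). */
static char *lower_copy(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);
    if (out == NULL)
        abort();                      /* keep the sketch simple */
    for (size_t i = 0; i <= n; i++)   /* <= n also copies the '\0' */
        out[i] = (char) tolower((unsigned char) s[i]);
    return out;
}

/* Hypothetical comparator: compares lowercased copies, then frees ONLY
 * those copies; the caller's strings are never modified or freed. */
static int ci_cmp(const char *left, const char *right)
{
    char *l = lower_copy(left);
    char *r = lower_copy(right);
    int result = strcmp(l, r);
    free(l);
    free(r);
    return result;
}

/* Same shape as citext_smaller(): return one of the original arguments,
 * which is still valid because ci_cmp() did not touch it. */
static const char *ci_smaller(const char *left, const char *right)
{
    return ci_cmp(left, right) < 0 ? left : right;
}
```

With these definitions, ci_smaller("aardvark", "AARDVARKasdfasdfasdfasdf") returns the first argument, since "aardvark" is a prefix of the lowercased second string; on a tie (e.g. "ABC" vs. "abc") it returns the second, matching the `< 0 ? left : right` choice in the original.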

Context:
 https://svn.kineticode.com/citext/trunk/citext.c

Anyone have any idea? Feedback would be *most* appreciated.

Thanks,

David

