Re: text and bytea - Mailing list pgsql-general

From hernan gonzalez
Subject Re: text and bytea
Date
Msg-id 48692c2d0802250852w7401bdc4i6f420d4eecf3e933@mail.gmail.com
Whole thread Raw
In response to Re: text and bytea  ("hernan gonzalez" <hgonzalez@gmail.com>)
Responses Re: text and bytea
List pgsql-general
Another example  (Psotgresql 8.3.0, UTF-8  server/client encoding)

test=# create table chartest ( c text);
test=# insert into chartest (c) values ('¡Hasta mañana!');
test=# create view vchartest as
 select encode(convert_to(c,'LATIN9'),'escape') as c1 from chartest;

test=# select c,octet_length(c) from chartest ;
       c        | octet_length
----------------+--------------
 ¡Hasta mañana! |           16

test=# select c1,octet_length(c1) from vchartest ;
      c1      | octet_length
--------------+--------------
 Hasta maana! |           14

(the field is seen as as text by postgresql, with the default
encoding.. UTF8; it is actually not)

test=# select * from vchartest where c1 like '%a%';
      c1
--------------
 Hasta maana!
(1 row)

test=# select * from vchartest where c1 ilike '%a%';
ERROR:  invalid byte sequence for encoding "UTF8": 0xa1
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

That "ilike" breaks and "like" is rather random, it seems that the
later has some optimization does not check the validty of the utf8
stream. But thats not the point. The point is that IMO postgresql
should always handle text in the backend encoding, there should no
exists funcions that are designed to produce/consume texts in other
encodings. Perhaps the "encode" function is ill defined, and should be
rethinked. Two alternatives:
1. For special binary-to-ascii encodings (base64,hex). Keep its
present signature but remove/deprecate the "escape" type. It returns a
text in the backend encoding.
2  For arbitrary binary encodings. Change its signature so that it
returns bytea.
Of course, all this applies symmetrically to decode().

Appart, and in accordance with this, I think to_ascii() should accept
only one text argument.

pgsql-general by date:

Previous
From: Gregory Stark
Date:
Subject: Re: process pool
Next
From: Enrico Sirola
Date:
Subject: Re: APEX / HTML DB for PostgreSQL