Re: Queries with Regular Expressions - Mailing list pgsql-general

From John D. Burger
Subject Re: Queries with Regular Expressions
Date
Msg-id eba60a0256eed477b6be464374a7594c@mitre.org
Whole thread Raw
In response to Queries with Regular Expressions  ("Silas Justiniano" <silasju@gmail.com>)
Responses Re: Queries with Regular Expressions  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Queries with Regular Expressions  ("Silas Justiniano" <silasju@gmail.com>)
List pgsql-general
> But I just can't make it work correctly using brackets:
> SELECT field FROM table WHERE field ~* 'ch[aã]o';
>
> It just returns tuples that have 'chao', but not 'chão'.
>
> My queries are utf-8 an the database is SQL_ASCII.

I suspect the bracketed expression is turning into [aXY], where XY is
the two-byte sequence corresponding to ã in UTF8.  So the regular
expression is only going to match strings of the form chao, chXo and
chYo.  To make sure that this is what's happening, try this:

   select length('ã');

I bet you get back 2, not 1.  I don't know if a UTF8 database will
handle this correctly or not.  The safest thing to do may be to use
queries like this:

   SELECT field FROM table WHERE field ~* 'ch(a|ã)o';

- John D. Burger
   MITRE

pgsql-general by date:

Previous
From: David Gama Rodrí­guez
Date:
Subject: %Re: % tsearch gendict
Next
From: Scott Ribe
Date:
Subject: Re: "Upcalls" (sort of) from the database