Thread: Regular expressions w localization problem

Regular expressions w localization problem

From
Manuel Sugawara
Date:
I'm facing a problem I have seen before but never got the time to
report until now that is biting me again. I have a table which has
some CHECK constrains using regular expressions and one check is
failing when I try to insert some valid data. The constraint
definition is:

«nombre propio válido» CHECK (nombre ~ '^[[:upper:]][[:lower:]]*([-\'. [:alpha:]]+)?$'::text)

And the data is ``José Luis''. Using a SELECT with the expression and
the data leads true as expected:

masm=# select 'José Luis' ~ '^[[:upper:]][[:lower:]]*([-\'. [:alpha:]]+)?$';
 ?column?
----------
 t
(1 fila)

However when I try to insert one tuple using the same data it fails:

masm=# INSERT INTO persona.persona (nombre, fecha_nacimiento, sexo) VALUES ('José Luis', CAST('1979-06-15' AS date),
'm');
ERROR:  el nuevo registro para la relación «persona» viola la restricción check «nombre propio válido»

Wired. A workaround that used to work was to put the select statement
and then the insert:

select 'José Luis' ~ '^[[:upper:]][[:lower:]]*([-\'. [:alpha:]]+)?$';
INSERT INTO persona.persona (nombre, fecha_nacimiento, sexo) VALUES ('José Luis', CAST('1979-06-15' AS date), 'm');

but it doesn't any more :-(. Btw, this is PostgreSQL 8.0.4, running on
Fedora Core 4.

Best Regards,
Manuel.

Re: Regular expressions w localization problem

From
Tom Lane
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> I'm facing a problem I have seen before but never got the time to
> report until now that is biting me again. I have a table which has
> some CHECK constrains using regular expressions and one check is
> failing when I try to insert some valid data. The constraint
> definition is:

> �nombre propio v�lido� CHECK (nombre ~ '^[[:upper:]][[:lower:]]*([-\'. [:alpha:]]+)?$'::text)

> And the data is ``Jos� Luis''.

I believe the meanings of [[:upper:]] and friends depend on the database
locale, and the current code for them probably doesn't work at all in
multibyte database encodings either.  So, what locale and encoding
are you using?

            regards, tom lane

Re: Regular expressions w localization problem

From
Manuel Sugawara
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I believe the meanings of [[:upper:]] and friends depend on the
> database locale,

Indeed they depend on the locale settings, that's why this used to
work.

> and the current code for them probably doesn't work at all in
> multibyte database encodings either.  So, what locale and encoding
> are you using?

The locale is es_MX and encoding latin1. (UTF-8 doesn't play very well
with regular expressions :-().

Regards,
Manuel.