Thread: PostgreSQL 6.4.2 locale regexp and like problem

PostgreSQL 6.4.2 locale regexp and like problem

From
Petr Hubeny
Date:
Hi,

I've recently found an interesting problem with using the Czech locale
and regular expressions that match the beginning of the line. I traced
the problem and found it is caused by the optimization within the
function makeIndexable, specifically in the way the string match_most is
constructed. What happens? The expression

text ~ '^regexp'

is rewritten into

( text ~ '^regexp' ) AND ( text >= 'regexp' ) AND ( text <= 'regexp\377' )

( N.B.: The same applies for expression "text LIKE 'match%'". )
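
A minimal sketch of the added range condition, assuming plain byte-wise
comparison (as in the C locale). The lower bound is the prefix itself and
the upper bound is the prefix followed by byte \377. The function names
are hypothetical and this is not the actual makeIndexable code:

```python
# Sketch only: models the prefix-range condition with plain byte ordering.
# Function names are made up for illustration.

def prefix_bounds(prefix: bytes) -> tuple:
    """Return (lower, upper) bounds for strings that start with prefix."""
    return prefix, prefix + b"\xff"  # octal \377 is byte 0xFF


def passes_range(value: bytes, prefix: bytes) -> bool:
    """The extra index-friendly condition: lower <= value <= upper."""
    lower, upper = prefix_bounds(prefix)
    return lower <= value <= upper


# With byte ordering, ordinary continuations of the prefix stay in range,
# while a value with a different prefix falls outside it.
for value in (b"regexp", b"regexpA", b"regexp0", b"regexq"):
    print(value, passes_range(value, b"regexp"))
```

With byte ordering the range is a safe pre-filter, because ordinary text
that starts with the prefix sorts between the two bounds.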

HOWEVER, in the Czech locale 'regexp\377' < 'regexp'! So the expression
is doomed to be false.

So I'd like to ask you: Is there any general (read: locale independent)
way to create a 'match_most' string that would allow for such an optimization?

Thanks for your patience,

Psh

--

Mgr. Petr Hubený                ICQ UIN: 12472987


Re: [BUGS] PostgreSQL 6.4.2 locale regexp and like problem

From
Petr Hubeny
Date:
Regarding my previous post:

> HOWEVER, in the Czech locale 'regexp\377' < 'regexp'! So the expression
> is doomed to be false.

Sorry, in the Czech locale 'regexp\377' > 'regexp' does hold;
the problem is that 'r\377' < 'rA', 'r\377' < 'ra' and even 'r\377' < 'r0'.
So there is a slight chance that the expression will not be false :-)
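
A minimal sketch of that ordering, assuming a toy collation in which
byte \377 simply sorts before every other byte. This is NOT the real
Czech collation table, and toy_key and passes_range are hypothetical
names, but it reproduces the comparisons above:

```python
# Toy model only: not the real Czech locale data. Byte 0xFF gets the lowest
# weight; every other byte keeps its relative byte order.

def toy_key(s: bytes) -> tuple:
    """Hypothetical collation key: 0xFF sorts before all other bytes."""
    return tuple(0 if b == 0xFF else b + 1 for b in s)


def passes_range(value: bytes, prefix: bytes, key=toy_key) -> bool:
    """Check prefix <= value <= prefix + 0xFF under the given ordering."""
    upper = prefix + b"\xff"
    return key(prefix) <= key(value) <= key(upper)


# The upper bound still sorts after the bare prefix ...
print(toy_key(b"regexp\xff") > toy_key(b"regexp"))
# ... but before ordinary continuations of a prefix.
print(toy_key(b"r\xff") < toy_key(b"rA"))
# The bare prefix passes the range check; a longer match does not.
print(passes_range(b"regexp", b"regexp"), passes_range(b"regexpA", b"regexp"))
```

Under such an ordering the range condition rejects rows that the regular
expression itself would accept, which is the failure described here.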

Psh

--

Mgr. Petr Hubený                ICQ UIN: 12472987