Re: [HACKERS] another locale problem - Mailing list pgsql-hackers

From Daniel Kalchev
Subject Re: [HACKERS] another locale problem
Msg-id 199906111638.TAA10681@dcave.digsys.bg
In response to Re: [HACKERS] another locale problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] another locale problem
List pgsql-hackers
>>>Tom Lane said:
> Daniel Kalchev <daniel@digsys.bg> writes:
> > To summarize the problem. If key contains (equivalent cyrillic
> > letters) 'ABC', 'ABCD', 'DAB' and 'ABX' and the query is:
> >
> > SELECT key FROM t WHERE key ~* '^AB';
> >
> > index scan will be used and the correct tuples ('ABC', 'ABCD' and
> > 'ABX') will be returned. If the query is
> >
> > SELECT key FROM t WHERE key ~* '^ab';
> >
> > index scan will be used and no tuples will be returned.
>
> Hm.  Is it possible that isalpha() is doing the wrong thing on your
> machine?  makeIndexable() currently assumes that isalpha() returns true
> for any character that is subject to case conversion, but I wonder
> whether that's a good enough test.
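Tom's question can be made concrete. isalpha() answers "is this a letter in the current locale?", while the prefix logic really needs to know "does this character have a different upper/lower-case form?". Below is a minimal sketch of testing that directly. It assumes the caller has already set LC_CTYPE with setlocale(). The name has_case_variant is hypothetical and is not part of makeIndexable() or gram.y.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper (not in the PostgreSQL sources): returns true if the
 * byte c has a distinct upper- or lower-case form in the current LC_CTYPE
 * locale. It tests case conversion directly instead of assuming that
 * isalpha() is true exactly for the case-convertible characters. */
bool has_case_variant(unsigned char c)
{
    /* toupper()/tolower() require a value representable as unsigned char
     * (or EOF); taking an unsigned char parameter guarantees that. */
    return toupper(c) != tolower(c);
}
```

In the default "C" locale this agrees with isalpha() for ASCII. The two tests can only diverge under a locale that reports a letter with no case pair, or that has case pairs it does not classify as alphabetic.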
 

In fact, after giving it some thought... the expression in gram.y

                    (strcmp(opname,"~*") == 0 && isalpha(n->val.val.str[pos])))

is wrong. As I understand it, the statement decides that a regular expression
is not indexable if it contains special characters or if it contains
non-alpha characters. Therefore, it should be written as:
                    (strcmp(opname,"~*") == 0 && !isalpha((unsigned char)n->val.val.str[pos])))

(two fixes :) This makes the index get used for '^abc' (lowercase ASCII), but
the query still finds nothing, which means the regex matching itself does not
work. It fails for both ASCII and non-ASCII text/patterns. :-(
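The second of the two fixes (the (unsigned char) cast) matters because, where plain char is signed, bytes of 0x80 and above (for example Cyrillic letters in CP1251 or KOI8-R) are negative. Passing a negative value other than EOF to isalpha() is undefined behavior. Below is a minimal sketch of the safe pattern when scanning a string; count_alpha is a hypothetical name used only for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: count the alphabetic bytes in a NUL-terminated
 * string according to the current LC_CTYPE locale. Each byte is cast to
 * unsigned char before calling isalpha(), so 8-bit bytes are passed as
 * 128..255 rather than as negative values. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char) *s))   /* safe for 8-bit bytes */
            n++;
    return n;
}
```

For instance, count_alpha("^AB") counts only the two letters and skips the '^' anchor.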
> The other possibility is that regexp's internal handling of
> case-insensitive matching is not right.

I believe it to be terribly wrong, and some releases ago it worked with 8-bit 
characters by just compiling it with -funsigned-char. Now this breaks things...
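Compiling with -funsigned-char hides this class of bug only because it makes every plain char non-negative. The portable alternative is to cast at each <ctype.h> call. Below is a minimal sketch of a byte-wise case-insensitive comparison done that way. It assumes the locale is already set. chars_equal_ci is a hypothetical name and does not reflect how PostgreSQL's regex code is actually structured.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: case-insensitive comparison of two bytes in the
 * current LC_CTYPE locale. The explicit casts give the same result whether
 * plain char is signed or unsigned, so correctness does not depend on
 * building with -funsigned-char. */
bool chars_equal_ci(char a, char b)
{
    return tolower((unsigned char) a) == tolower((unsigned char) b);
}
```

Whether two non-ASCII bytes compare equal still depends entirely on the tolower() tables of the active locale. In the "C" locale, for example, only ASCII letters are folded.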

Daniel


