Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing? - Mailing list pgsql-general

From Tom Lane
Subject Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?
Msg-id 495.941820396@sss.pgh.pa.us
In response to Re: [GENERAL] indexed regex select optimisation missing?  ("Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>)
Responses Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?  (Charles Tassell <ctassell@isn.net>)
Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?  (Stuart Woolford <stuartw@newmail.net>)
List pgsql-general
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes:
> Reviewing my email logs from June, most of the work on this has to do with
> people who need locales, and potentially multibyte character sets. Tom
> Lane is of the opinion that this particular optimization needs to be moved
> out of the parser, and deeper into the planner or optimizer/rewriter,
> so a good fix may be some ways out.

Actually, that part is already done: addition of the index-enabling
comparisons is gone from the parser and is now done in the optimizer,
which has a whole bunch of benefits (one being that the comparison
clauses don't get added to the query unless they are actually used
with an index!).

But the underlying LOCALE problem still remains: I don't know a good
character-set-independent method for generating a "just a little bit
larger" string to use as the righthand limit.  If anyone out there is
an expert on foreign and multibyte character sets, some help would
be appreciated.  Basically, given that we know the LIKE or regex
pattern can only match values beginning with FOO, we want to generate
string comparisons that select out the range of values that begin with
FOO (or, at worst, a slightly larger range).  In USASCII locale it's not
hard: you can do
    field >= 'FOO' AND field < 'FOP'
but it's not immediately obvious how to make this idea work reliably
in the presence of odd collation orders or multibyte characters...
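To make the USASCII case concrete, here is a minimal sketch of the
"just a little bit larger" computation. It is not the optimizer's code.
It assumes plain byte-wise comparison (as in the C locale), and the name
prefix_range is made up for illustration. A trailing \377 byte cannot be
incremented, so it is dropped before the increment, and a prefix made up
entirely of \377 bytes gets no upper limit at all.

```python
from typing import Optional, Tuple


def prefix_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (lower, upper) so that lower <= v < upper holds for every
    byte string v beginning with prefix, under byte-wise ordering only.

    upper is None when no finite upper limit exists.
    """
    # A trailing 0xFF (\377) byte cannot be incremented, so strip those first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        # Empty prefix, or nothing but 0xFF bytes: only the lower limit is usable.
        return prefix, None
    # Bump the last remaining byte by one: b"FOO" becomes b"FOP".
    upper = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, upper


lower, upper = prefix_range(b"FOO")
print(lower, upper)  # b'FOO' b'FOP'
```

This only works because byte order and sort order coincide here. Under a
locale with a different collation order, or with multibyte characters,
incrementing the last byte gives no such guarantee, which is exactly the
open problem.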

BTW: the \377 hack is actually wrong for USASCII too, since it'll
exclude a data value like 'FOO\377x', which sorts after 'FOO\377' but
begins with FOO and so should be included.
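
A quick check of that claim, as a sketch only. It assumes the hack amounts
to the pair of comparisons field >= 'FOO' AND field <= 'FOO\377', and it
uses byte-wise comparison to stand in for USASCII ordering.

```python
# A data value that begins with FOO, so a pattern anchored on FOO can match it.
value = b"FOO\xffx"

# Assumed form of the \377 hack: upper limit is the prefix plus one \377 byte.
in_377_range = b"FOO" <= value <= b"FOO\xff"

# Range built by incrementing the last byte of the prefix instead.
in_incremented_range = b"FOO" <= value < b"FOP"

print(in_377_range, in_incremented_range)  # False True
```

The longer string compares greater than its own prefix 'FOO\377', so the
first range drops it, while the 'FOP' limit keeps it.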

            regards, tom lane
