Re: [PATCHES] LIKE indexing - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: [PATCHES] LIKE indexing
Date
Msg-id Pine.LNX.4.30.0108201832370.822-100000@peter.localdomain
Whole thread Raw
Responses Re: [PATCHES] LIKE indexing
List pgsql-hackers
Tom Lane writes:

> How can A = B not imply A LIKE B?

Well, according to my reading of the spec, it apparently can.  Space
padding can be weird that way.  But see below why I think there are much
worse alternatives.

>                  4) If the i-th substring specifier of PCV is neither an
>                    arbitrary character specifier nor an arbitrary string
>                    specifier, then the i-th substring of MCV is equal to
>                    that substring specifier according to the collating
>                    sequence of the <like predicate>, without the appending
>                    of <space> characters to MCV, and has the same length as
>                    that substring specifier.
>
> The bit about "without the appending of <space> characters" *might*
> mean that LIKE is always supposed to treat trailing blanks as
> significant, but I'm not sure.

That's how I read it.

> The text does seem to say that it's okay to add trailing blanks to the
> pattern to produce a match, when the collating sequence is PAD SPACE
> type (bpchar in our terms).

I can't find that.

> In any case, Hiroshi is dead right that LIKE is supposed to perform
> collating-sequence-dependent comparison,

As I have answered to Hiroshi, I think that would really be brain-dead.
It would alienate LIKE from how pattern matching normally operates.  If we
make the assumption that strcoll(A, B) can be 0 for wildly different
values of A and B (for an appropriate definition of "different"), then the
following things could happen:

-> A = B does not imply A ~ B

-> A LIKE 'foobar%' does not imply A LIKE 'foo%' (because 'foobar' is a
single collating element that sorts like 'xyz').

-> A LIKE '%foo%' does not imply that POSITION('foo' IN A) <> 0  (The SQL
POSITION function does not mention using the collating sequence.)

I'm also quite suspicious about the wording "...and has the same length as
that substring specifier."  For instance, it might be nearly reasonable to
define a German locale where ü (u umlaut) and ue are equivalent.  But then
while 'xüy' = 'xuey' (a strict interpretation of the SQL standard might
deny this because of the padding, but "The result of the comparison of X
and Y is given by the collating sequence CS.", and I define mine that
way), but 'xüy' NOT LIKE 'xuey' because of that rule.  Voilà, it can
happen after all.

I think this rule is a mistake designed by committee and must be struck
down by community. ;-)

> and this probably means that this whole approach is a dead end :-(

Blech... ;-)

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Using textin/textout vs. scribbling around
Next
From: Peter Eisentraut
Date:
Subject: Status of ipcclean