Re: [PATCHES] LIKE indexing - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	Re: [PATCHES] LIKE indexing
Date	August 20, 2001 17:43:51
Msg-id	Pine.LNX.4.30.0108201832370.822-100000@peter.localdomain
Responses	Re: [PATCHES] LIKE indexing
List	pgsql-hackers

Tom Lane writes:

> How can A = B not imply A LIKE B?

Well, according to my reading of the spec, it apparently can.  Space
padding can be weird that way.  But see below why I think there are much
worse alternatives.

>                  4) If the i-th substring specifier of PCV is neither an
>                    arbitrary character specifier nor an arbitrary string
>                    specifier, then the i-th substring of MCV is equal to
>                    that substring specifier according to the collating
>                    sequence of the <like predicate>, without the appending
>                    of <space> characters to MCV, and has the same length as
>                    that substring specifier.
>
> The bit about "without the appending of <space> characters" *might*
> mean that LIKE is always supposed to treat trailing blanks as
> significant, but I'm not sure.

That's how I read it.

> The text does seem to say that it's okay to add trailing blanks to the
> pattern to produce a match, when the collating sequence is PAD SPACE
> type (bpchar in our terms).

I can't find that.

> In any case, Hiroshi is dead right that LIKE is supposed to perform
> collating-sequence-dependent comparison,

As I replied to Hiroshi, I think that would really be brain-dead.
It would alienate LIKE from how pattern matching normally operates.  If we
make the assumption that strcoll(A, B) can be 0 for wildly different
values of A and B (for an appropriate definition of "different"), then the
following things could happen:

-> A = B does not imply A ~ B

-> A LIKE 'foobar%' does not imply A LIKE 'foo%' (because 'foobar' is a
single collating element that sorts like 'xyz').

-> A LIKE '%foo%' does not imply that POSITION('foo' IN A) <> 0  (The SQL
POSITION function does not mention using the collating sequence.)
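
To make the second item concrete, here is a minimal sketch in Python. It
is a toy model, not PostgreSQL code and not a real locale: the helper
names collate_key and collation_like_prefix are hypothetical, and the
collation simply treats 'foobar' as one collating element that compares
equal to 'xyz'. A "collation-aware" LIKE 'p%' is modelled as "some prefix
of the value compares equal to p".

```python
# Toy model only: a hypothetical collation in which the string "foobar"
# is a single collating element that compares equal to "xyz".
def collate_key(s: str) -> str:
    """Hypothetical collation key: strings with equal keys compare equal."""
    return s.replace("foobar", "xyz")


def collation_like_prefix(value: str, prefix: str) -> bool:
    """Model of value LIKE prefix || '%' under the toy collation.

    True if some leading substring of value compares equal to prefix.
    """
    target = collate_key(prefix)
    return any(
        collate_key(value[:i]) == target for i in range(len(value) + 1)
    )


a = "xyz"
print(collation_like_prefix(a, "foobar"))  # a LIKE 'foobar%' in the toy model
print(collation_like_prefix(a, "foo"))     # a LIKE 'foo%' in the toy model
```

Under these assumptions the first check succeeds and the second fails, so
the implication from 'foobar%' to 'foo%' is lost.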

I'm also quite suspicious about the wording "...and has the same length as
that substring specifier."  For instance, it might be nearly reasonable to
define a German locale where ü (u umlaut) and ue are equivalent.  But then
'xüy' = 'xuey' (a strict interpretation of the SQL standard might deny
this because of the padding, but "The result of the comparison of X and Y
is given by the collating sequence CS.", and I define mine that way),
while 'xüy' NOT LIKE 'xuey' because of that rule.  Voilà, it can happen
after all.
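
The same point as a small Python sketch. Everything here is an assumption
for illustration: umlaut_key, toy_equal and toy_like_literal are
hypothetical helpers, the "locale" just expands ü to ue, and length is
counted in characters. It models a wildcard-free pattern under the quoted
rule: equal according to the collating sequence, and of the same length.

```python
U_UMLAUT = "\u00fc"  # precomposed "ü", a single character


def umlaut_key(s: str) -> str:
    """Hypothetical collation key in which 'ü' and 'ue' are equivalent."""
    return s.replace(U_UMLAUT, "ue")


def toy_equal(a: str, b: str) -> bool:
    """A = B under the toy collating sequence."""
    return umlaut_key(a) == umlaut_key(b)


def toy_like_literal(value: str, pattern: str) -> bool:
    """LIKE with no wildcards, as the quoted rule reads: the value must
    compare equal to the pattern AND have the same length."""
    return len(value) == len(pattern) and toy_equal(value, pattern)


x = "x" + U_UMLAUT + "y"
print(toy_equal(x, "xuey"))         # equality under the toy collation
print(toy_like_literal(x, "xuey"))  # fails: lengths are 3 and 4
```

In this model the two strings are equal but do not match under LIKE,
purely because of the same-length requirement.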

I think this rule is a mistake designed by committee and must be struck
down by community. ;-)

> and this probably means that this whole approach is a dead end :-(

Blech... ;-)

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
