Re: [PATCHES] LIKE indexing - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: [PATCHES] LIKE indexing |
Date | |
Msg-id | Pine.LNX.4.30.0108201832370.822-100000@peter.localdomain |
Responses |
Re: [PATCHES] LIKE indexing
|
List | pgsql-hackers |
Tom Lane writes:

> How can A = B not imply A LIKE B?

Well, according to my reading of the spec, it apparently can. Space padding can be weird that way. But see below why I think there are much worse alternatives.

>     4) If the i-th substring specifier of PCV is neither an
>        arbitrary character specifier nor an arbitrary string
>        specifier, then the i-th substring of MCV is equal to
>        that substring specifier according to the collating
>        sequence of the <like predicate>, without the appending
>        of <space> characters to MCV, and has the same length as
>        that substring specifier.
>
> The bit about "without the appending of <space> characters" *might*
> mean that LIKE is always supposed to treat trailing blanks as
> significant, but I'm not sure.

That's how I read it.

> The text does seem to say that it's okay to add trailing blanks to the
> pattern to produce a match, when the collating sequence is PAD SPACE
> type (bpchar in our terms).

I can't find that.

> In any case, Hiroshi is dead right that LIKE is supposed to perform
> collating-sequence-dependent comparison,

As I have answered to Hiroshi, I think that would really be brain-dead. It would alienate LIKE from how pattern matching normally operates. If we make the assumption that strcoll(A, B) can be 0 for wildly different values of A and B (for an appropriate definition of "different"), then the following things could happen:

-> A = B does not imply A ~ B

-> A LIKE 'foobar%' does not imply A LIKE 'foo%' (because 'foobar' is a single collating element that sorts like 'xyz').

-> A LIKE '%foo%' does not imply that POSITION('foo' IN A) <> 0 (The SQL POSITION function does not mention using the collating sequence.)

I'm also quite suspicious about the wording "...and has the same length as that substring specifier." For instance, it might be nearly reasonable to define a German locale where ü (u umlaut) and ue are equivalent. But then, while 'xüy' = 'xuey' (a strict interpretation of the SQL standard might deny this because of the padding, but "The result of the comparison of X and Y is given by the collating sequence CS.", and I define mine that way), 'xüy' NOT LIKE 'xuey' because of that rule. Voilà, it can happen after all.

I think this rule is a mistake designed by committee and must be struck down by community. ;-)

> and this probably means that this whole approach is a dead end :-(

Blech... ;-)

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
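
The ü/ue case above can be played through in a small sketch. This is not PostgreSQL code and not the SQL standard's algorithm in full; it is a toy model under stated assumptions: `collation_key` is a hypothetical collation that simply treats 'ü' as 'ue', `coll_equal` stands in for "=" under that collation, and `like_strict` applies the quoted rule literally (each literal part of the pattern must be collation-equal to a substring of the value *and* have the same length). All three names are made up for illustration.

```python
import re


def collation_key(s: str) -> str:
    # Hypothetical toy collation (an assumption): 'ü' sorts exactly like 'ue'.
    return s.replace("ü", "ue")


def coll_equal(a: str, b: str) -> bool:
    # "A = B" when the comparison is decided purely by the collating sequence.
    return collation_key(a) == collation_key(b)


def like_strict(value: str, pattern: str) -> bool:
    # LIKE with '%' and '_' wildcards. Literal runs must match a substring of
    # the value that is collation-equal AND of the same length, per the rule.
    tokens = re.findall(r"%|_|[^%_]+", pattern)

    def match(vi: int, ti: int) -> bool:
        if ti == len(tokens):
            return vi == len(value)
        tok = tokens[ti]
        if tok == "%":
            # '%' may consume any number of characters, including none.
            return any(match(k, ti + 1) for k in range(vi, len(value) + 1))
        if tok == "_":
            return vi < len(value) and match(vi + 1, ti + 1)
        end = vi + len(tok)  # same-length requirement
        return (end <= len(value)
                and coll_equal(value[vi:end], tok)
                and match(end, ti + 1))

    return match(0, 0)


print(coll_equal("xüy", "xuey"))   # True:  A = B under the toy collation
print(like_strict("xüy", "xuey"))  # False: lengths 3 and 4 differ
print(like_strict("xüy", "x%"))    # True:  ordinary prefix match still works
```

Under these assumptions the first two lines show A = B holding while A LIKE B fails, purely because of the "same length" clause.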