Thread: LIKE indexing
Here's the patch for review. It adds the non-locale operator classes for all three character types and the analogous selectivity estimation changes. Basically, I'm confident this works, but as some people seem to have doubts I show it here first. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Attachment
I'm fairly close to committing a wholesale rearrangement of pg_opclass and friends (per previous discussions, mostly with Oleg). This is going to create some conflicts with your patch :-(. Who gets to go first? regards, tom lane
Tom Lane writes: > I'm fairly close to committing a wholesale rearrangement of pg_opclass > and friends (per previous discussions, mostly with Oleg). This is going > to create some conflicts with your patch :-(. Who gets to go first? If your changes only conflict in the system catalog headers I can redo those when you're done. (Presuming there's documentation coming along.) -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes: > If your changes only conflict in the system catalog headers I can redo > those when you're done. (Presuming there's documentation coming along.) Actually, all those uses of op_class() are going to conflict too... op_class doesn't need an AM OID parameter anymore (and besides which, I renamed it to op_in_opclass). If you feel ready to commit, do so, and I'll do the merge. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > Here's the patch for review. A few gripes: + The optimizer can also use a B-Tree index for queries involving the + pattern matching operators <literal>LIKE</>, + <literal>ILIKE</literal>, <literal>~</literal>, and + <literal>~*</literal>, <emphasis>if</emphasis> the pattern is + anchored to the beginning of the string, e.g., <literal>col LIKE + 'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not + <literal>col LIKE 'bar'</literal>. However, if your server does The "but not" part is wrong: col LIKE 'bar' works perfectly fine as an indexable LIKE query. Perhaps you meant "but not col LIKE '%foo'". While it's okay to treat text and varchar alike, I object to treating bpchar as equivalent to the other two. Shouldn't the bpchar versions of these functions strip trailing spaces before comparing? Seems to me you should provide "$<>$" operators for completeness, even though they're not essential for btree opclasses. I think that these operators may be useful for more than just this one purpose, so we shouldn't set up artificial roadblocks. I don't like the fact that you added expected-output rows to opr_sanity; seems like tweaking the queries to allow $<$ etc as expected names would be more appropriate. regards, tom lane
Tom Lane writes: > The "but not" part is wrong: col LIKE 'bar' works perfectly fine as > an indexable LIKE query. Perhaps you meant "but not col LIKE '%foo'". Thanks. That was a mixup with the POSIX regexp style. > While it's okay to treat text and varchar alike, I object to treating > bpchar as equivalent to the other two. Shouldn't the bpchar versions of > these functions strip trailing spaces before comparing? I had thought a long time about this and I couldn't see a reason why. The reason is that the LIKE operator for bpchar does take the blanks into account, so it effectively doesn't care whether the blanks are the result of padding or explicit input. E.g., peter=# set enable_indexscan to off; SET VARIABLE peter=# create table test1 (a char(5)); CREATE peter=# insert into test1 values ('four'); INSERT 16560 1 peter=# select * from test1 where a like 'four'::bpchar; a --- (0 rows) /* * If we had stripped spaces here we would have gotten a false positive. */ peter=# select * from test1 where a like 'fou_'::bpchar; a --- (0 rows) /* * Since the padding here is after the wildcard character and is thus * stripped in the analysis, the augmented expression still holds. */ peter=# select * from test1 where a like 'fou%'::bpchar; a ------- four (1 row) /* same here */ I would also argue that the notion of a direct binary comparision would not benefit from space stripping. > Seems to me you should provide "$<>$" operators for completeness, even > though they're not essential for btree opclasses. Will do. > I don't like the fact that you added expected-output rows to opr_sanity; > seems like tweaking the queries to allow $<$ etc as expected names would > be more appropriate. Ok. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes: > peter=# create table test1 (a char(5)); > CREATE > peter=# insert into test1 values ('four'); > INSERT 16560 1 > peter=# select * from test1 where a like 'four'::bpchar; > a > --- > (0 rows) I think this is an erroneous result, actually, seeing as how regression=# select 'four '::bpchar = 'four'::bpchar; ?column? ---------- t (1 row) How can A = B not imply A LIKE B? (This may be related to Hiroshi's concerns.) I dug into the spec to see what it has to say, and came up with this rather opaque prose: 4) If the i-th substring specifier of PCV is neither an arbitrary character specifier nor an arbitrary string specifier, then the i-th substring of MCV is equal to that substring specifier according to the collating sequence of the <like predicate>, without the appending of <space> characters to MCV, and has the same length as that substring specifier. The bit about "without the appending of <space> characters" *might* mean that LIKE is always supposed to treat trailing blanks as significant, but I'm not sure. The text does seem to say that it's okay to add trailing blanks to the pattern to produce a match, when the collating sequence is PAD SPACE type (bpchar in our terms). In any case, Hiroshi is dead right that LIKE is supposed to perform collating-sequence-dependent comparison, and this probably means that this whole approach is a dead end :-( regards, tom lane