Thread: Re: citext like searches using index

Re: citext like searches using index

From
"David E. Wheeler"
Date:
On Mar 17, 2013, at 6:35 AM, Thorbjørn Weidemann <thorbjoern@weidemann.name> wrote:

> Hi David,
>
> I found your email-address on http://www.postgresql.org/docs/9.2/static/citext.html. I hope it's ok to contact you
thisway. 
> I would like to thank you for taking the time to make citext available for Postgres, and I hope you can help me with
thisproblem. 
>
> In my workplace we are considering using citext in a Progress -> Postgresql conversion that is underway. The Progress
databasealways searches case-insensitively. 
>
> Simply creating a normal btree index on a citext column makes = searches use the index, but I have not been able to
createan index that can be used for searches like 
> select <column> from <table> where <column> LIKE 'ide%';
>
> During this investigation I found out that even for varchar columns I have to append the varchar_pattern_ops opclass
tothe column when creating the index for it to be used for LIKE searches. But there is no citext_pattern_ops opclass. 
>
> Is there currently any way to create an index that can be used to speed up searches like the one above?
> If not, do you have any idea how it might be implemented? Perhaps I could give it a try myself.
>
> Thank you in advance for any suggestions you might have.

I would think that text_pattern_ops would work, no?

Best,

David





Re: citext like searches using index

From
"David E. Wheeler"
Date:
On Mar 20, 2013, at 1:45 AM, David E. Wheeler <david@kineticode.com> wrote:

>> Is there currently any way to create an index that can be used to speed up searches like the one above?
>> If not, do you have any idea how it might be implemented? Perhaps I could give it a try myself.
>>
>> Thank you in advance for any suggestions you might have.
>
> I would think that text_pattern_ops would work, no?

Thorbjørn replied to me privately in the negative.

Hackers, what would be required to get an index on a CITEXT column to support LIKE?

Thanks,

David




Re: citext like searches using index

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> Hackers, what would be required to get an index on a CITEXT column to support LIKE?

The LIKE index optimization is hard-wired into
match_special_index_operator(), which never heard of citext's ~~
operators.

I've wanted for years to replace that mechanism with something that
would support plug-in extensions, but have no very good idea how to
do it.

A bigger problem though is that the LIKE optimization is generally
pretty ineffective for ILIKE (which is what you're really asking for
here) because we can't assume that both case versions are consecutive
in the index.  I think the optimization just punts upon seeing any
letter anyway, if the operator is ILIKE.

Or in short: fixing this is a research problem.
        regards, tom lane



Re: citext like searches using index

From
Andres Freund
Date:
On 2013-03-30 23:35:24 -0400, Tom Lane wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
> > Hackers, what would be required to get an index on a CITEXT column to support LIKE?
> 
> The LIKE index optimization is hard-wired into
> match_special_index_operator(), which never heard of citext's ~~
> operators.
> 
> I've wanted for years to replace that mechanism with something that
> would support plug-in extensions, but have no very good idea how to
> do it.

> A bigger problem though is that the LIKE optimization is generally
> pretty ineffective for ILIKE (which is what you're really asking for
> here) because we can't assume that both case versions are consecutive
> in the index.  I think the optimization just punts upon seeing any
> letter anyway, if the operator is ILIKE.

I think the most realistic way to get this part - while being far from easy -
is to support case-insensitive locales. Then it would "only" need some
magic to link a normal locale to its case-insensitive locale which then
could be used to check for the presence of an appropriate index.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: citext like searches using index

From
Peter Eisentraut
Date:
On 3/30/13 11:35 PM, Tom Lane wrote:
> The LIKE index optimization is hard-wired into
> match_special_index_operator(), which never heard of citext's ~~
> operators.
> 
> I've wanted for years to replace that mechanism with something that
> would support plug-in extensions, but have no very good idea how to
> do it.

I have been thinking there should be a GiST index that associates with
the pattern matching operators.  I haven't worked out the details, but
at least this seems it might be the right framework to solve this problem.




Re: citext like searches using index

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 3/30/13 11:35 PM, Tom Lane wrote:
>> The LIKE index optimization is hard-wired into
>> match_special_index_operator(), which never heard of citext's ~~
>> operators.
>> 
>> I've wanted for years to replace that mechanism with something that
>> would support plug-in extensions, but have no very good idea how to
>> do it.

> I have been thinking there should be a GiST index that associates with
> the pattern matching operators.  I haven't worked out the details, but
> at least this seems it might be the right framework to solve this problem.

No, not really.  You can already build things like that with GiST, see
pg_trgm (which already does LIKE and will soon be able to do regexes).

The issue with the LIKE special case is that left-anchored patterns
are (to some extent) indexable with ordinary btree indexes, and so we
want to exploit that rather than tell people they have to have a whole
other index.  It doesn't make sense IMO to try to mark ~~ as a btree
operator, because btree doesn't really have any ability for
operator-specific code to do what would have to be done.  What does make
sense is for the planner to understand how to extract a btree-indexable
clause out of what's in the query, as that (a) keeps the complexity
out of the run-time machinery, and (b) provides an opportunity for the
planner to estimate whether the whole thing is worth the trouble or
not, which it frequently isn't in LIKE cases.

So it's a pretty special case, but there are just enough instances of it
to wish for some not-so-hard-wired way to deal with it.
        regards, tom lane



Re: citext like searches using index

From
Peter Eisentraut
Date:
On 4/2/13 10:26 AM, Tom Lane wrote:
> The issue with the LIKE special case is that left-anchored patterns
> are (to some extent) indexable with ordinary btree indexes, and so we
> want to exploit that rather than tell people they have to have a whole
> other index.

In practice, you need an index specifically for pattern matching anyway.The difference would be that instead of using a
differentoperator
 
class that has no recorded association with the original operator, you'd
use a different access method that is associated with the operator in
normal ways.

> So it's a pretty special case, but there are just enough instances of it
> to wish for some not-so-hard-wired way to deal with it.

Are there any widely known non-built-in cases besides citext?
Presumably the reason you use citext is that you use a lot of letters.
So I kind of doubt that there are going to be a lot of cases of pattern
searches on citext with left-anchored patterns starting with (a
sufficient number of) non-letters.  It's possible, of course, but
doesn't seem important enough by itself.



Re: citext like searches using index

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 4/2/13 10:26 AM, Tom Lane wrote:
>> The issue with the LIKE special case is that left-anchored patterns
>> are (to some extent) indexable with ordinary btree indexes, and so we
>> want to exploit that rather than tell people they have to have a whole
>> other index.

> In practice, you need an index specifically for pattern matching anyway.

Well, you do if you want to search for general patterns, but there's
been enough interest in the LIKE optimization as-is to make me think
we could not get away with removing it.

>> So it's a pretty special case, but there are just enough instances of it
>> to wish for some not-so-hard-wired way to deal with it.

> Are there any widely known non-built-in cases besides citext?

Well, indxpath.c knows about text LIKE and network subset operators,
and it would be nice if it knew how to do the same type of optimization
for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
doesn't seem implementable in the current infrastructure because ranges
aren't all built-in types.  I agree that the citext case isn't too
compelling in isolation, but surely the range case is interesting.
        regards, tom lane



Re: citext like searches using index

From
"David E. Wheeler"
Date:
On Apr 2, 2013, at 8:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> Are there any widely known non-built-in cases besides citext?
> 
> Well, indxpath.c knows about text LIKE and network subset operators,
> and it would be nice if it knew how to do the same type of optimization
> for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
> doesn't seem implementable in the current infrastructure because ranges
> aren't all built-in types.  I agree that the citext case isn't too
> compelling in isolation, but surely the range case is interesting.

Is this knowledge encapsulated in a to-do?

David




Re: citext like searches using index

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Apr 2, 2013, at 8:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Are there any widely known non-built-in cases besides citext?

>> Well, indxpath.c knows about text LIKE and network subset operators,
>> and it would be nice if it knew how to do the same type of optimization
>> for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
>> doesn't seem implementable in the current infrastructure because ranges
>> aren't all built-in types.  I agree that the citext case isn't too
>> compelling in isolation, but surely the range case is interesting.

On further reflection, I withdraw the claim that range-inclusion
couldn't be implemented in the current design.  Although the various
range types might not be built-in, the "anyelement <@ anyrange" operator
*is* built-in, so its OID could be added to the switch statements in
indxpath.c.  I don't think it'd be terribly difficult to then add
datatype-agnostic code to pry apart the range value and construct a
derived "btree_indexed_col >= lowbound AND btree_indexed_col <= highbound"
indexclause.  But this could not be extended to citext or other plugin
extensions, because their operators don't have hard-wired OIDs.

Anyway, even if the specific claim about ranges is bogus, I still think
there are enough data points to justify the idea that a more extensible
mechanism would be worth having.  At the same time, there's a reason
why it's not yet got to the top of anyone's priority list.

> Is this knowledge encapsulated in a to-do?

I added an item to the "Indexes" section of the TODO page.
        regards, tom lane



Re: citext like searches using index

From
"David E. Wheeler"
Date:
On Apr 2, 2013, at 10:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> Is this knowledge encapsulated in a to-do?
> 
> I added an item to the "Indexes" section of the TODO page.

Great, thanks.

David