Thread: What type of index do I need for this JOIN?

What type of index do I need for this JOIN?

From
Robert James
Date:
I'm doing a JOIN which is very slow:

JOIN t ON t.f1 LIKE (q.f1 || '%')

t1 has an INDEX on (f1, f2) which I thought would help for this.  But
Postgres seems to still use a (very slow) Nested Loop.  What type of
index would be appropriate for this?

(My goal is to join on a substring starting from the first character)


Re: What type of index do I need for this JOIN?

From
Craig Ringer
Date:
On 08/14/2013 06:05 AM, Robert James wrote:
> I'm doing a JOIN which is very slow:
>
> JOIN t ON t.f1 LIKE (q.f1 || '%')
>
> t1 has an INDEX on (f1, f2) which I thought would help for this.  But
> Postgres seems to still use a (very slow) Nested Loop.  What type of
> index would be appropriate for this?

You'll need a text_pattern_ops index.

http://www.postgresql.org/docs/current/static/indexes-opclass.html

http://www.postgresql.org/docs/current/static/indexes-types.html

"The optimizer can also use a B-tree index for queries involving the
pattern matching operators LIKE and ~ if the pattern is a constant and
is anchored to the beginning of the string — for example, col LIKE
'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your
database does not use the C locale you will need to create the index
with a special operator class to support indexing of pattern-matching
queries; see Section 11.9 below."

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: What type of index do I need for this JOIN?

From
Robert James
Date:
On 8/13/13, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 08/14/2013 06:05 AM, Robert James wrote:
>> I'm doing a JOIN which is very slow:
>>
>> JOIN t ON t.f1 LIKE (q.f1 || '%')
>>
>> t1 has an INDEX on (f1, f2) which I thought would help for this.  But
>> Postgres seems to still use a (very slow) Nested Loop.  What type of
>> index would be appropriate for this?
>
> You'll need a text_pattern_ops index.
>
> "The optimizer can also use a B-tree index for queries involving the
> pattern matching operators LIKE and ~ if the pattern is a constant and
> is anchored to the beginning of the string — for example, col LIKE
> 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your
> database does not use the C locale you will need to create the index
> with a special operator class to support indexing of pattern-matching
> queries; see Section 11.9 below."

I'm confused: What's the difference between
  col LIKE  'foo%'
and
  col LIKE f1 || '%'
?
Either way, it's anchored to the beginning of the string.

And, being that there is a difference (ie the pattern needs to be
constant): How will text_pattern_ops help? The only difference I see
is that text_patterns_ops is locale sensitive, needed for locale other
than C.


Re: What type of index do I need for this JOIN?

From
John R Pierce
Date:
On 8/13/2013 8:34 PM, Robert James wrote:
> I'm confused: What's the difference between
>    col LIKE  'foo%'
> and
>    col LIKE f1 || '%'
> ?
> Either way, it's anchored to the beginning of the string.

the first one can be analyzed by the planner, the latter would be much
harder to pre-figure.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: What type of index do I need for this JOIN?

From
Kevin Grittner
Date:
Robert James <srobertjames@gmail.com> wrote:

> I'm confused: What's the difference between
>   col LIKE  'foo%'
> and
>   col LIKE f1 || '%'
> ?

The planner knows that 'foo%' doesn't start with a wildcard.

> Either way, it's anchored to the beginning of the string.

Not necessarily.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: What type of index do I need for this JOIN?

From
Robert James
Date:
On 8/14/13, Kevin Grittner <kgrittn@ymail.com> wrote:
> Robert James <srobertjames@gmail.com> wrote:
>
>> I'm confused: What's the difference between
>>   col LIKE  'foo%'
>> and
>>   col LIKE f1 || '%'
>> ?
>
> The planner knows that 'foo%' doesn't start with a wildcard.
>
>> Either way, it's anchored to the beginning of the string.
>
> Not necessarily.

I see, yes, f1 might include a wildcard.

Is there a way to escape f1 so that wildcards aren't interpreted?
That's anyway the behavior I need, of course.  And will that help the
planner?

What type of index will help the planner here?


Re: What type of index do I need for this JOIN?

From
Kevin Grittner
Date:
Robert James <srobertjames@gmail.com> wrote:
> On 8/14/13, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Robert James <srobertjames@gmail.com> wrote:
>>
>>> I'm confused: What's the difference between
>>>   col LIKE  'foo%'
>>> and
>>>   col LIKE f1 || '%'
>>> ?
>>
>> The planner knows that 'foo%' doesn't start with a wildcard.
>>
>>> Either way, it's anchored to the beginning of the string.
>>
>> Not necessarily.
>
> I see, yes, f1 might include a wildcard.

Exactly.

> Is there a way to escape f1 so that wildcards aren't interpreted?
> That's anyway the behavior I need, of course.  And will that help
> the planner?

I don't think there is any way to get that to work in general, and
in particular you are trying to have the right-hand side of the
LIKE treated as having escapes; it would be a lot to ask of the
planner to somehow recognize that part of the result of the
(concatenation) expression should be treated as escaped and part
not.

> What type of index will help the planner here?

Well, with the query as you have it, you might get a trigram index
to be of some help, but I think you might want to give up on LIKE
or regular expressions.  Perhaps you could do a range test
directly, rather than wrangling the wildcards:

  (col >= f1 AND col <= (f1 || 'zzz'))

You would probably want to write a function to calculate that
ending value; I'm just trying to give a rough suggestion here.
The idea is to avoid the danger of wildcards in the f1 values by
constructing the range without scanning for special characters and
basing the test on that.

You could *also* do the LIKE test if desired.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company