Thread: OR with multi-key indexes

OR with multi-key indexes

From
Bruce Momjian
Date:
Here is a comment in path/indxpath.c that says they don't want to use
multi-key indexes with OR clauses.  Of course, we now support multi-key
indexes, and this code was disabled anyway because it was broken.  (In
fact, it was disabled by having SingleAttributeIndex() always return
false.

Is there any reason we can't use multi-key indexes if the first key
matches our OR column?  I don't see why not.  Also, I don't know how to
handle the case where we specify the first key of a multi-key index in
an AND clause, and specify the second key in an OR clause because the
AND's are handled in a separate are of the code.  Any ideas how to
implement this?  (Of course, the code would still use the index for the
AND, but I don't know how to bring the AND case into my OR index clause
processing area.)

---------------------------------------------------------------------------


    /*
     * 1. If this index has only one key, try matching it against
     * subclauses of an 'or' clause.  The fields of the clauseinfo nodes
     * are marked with lists of the matching indices no path are actually
     * created.
     *
     * XXX NOTE:  Currently btrees dos not support indices with > 1 key, so
     * the following test will always be true for now but we have decided
     * not to support index-scans on disjunction . -- lp
     */
    if (SingleAttributeIndex(index))
    {

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR with multi-key indexes

From
Vadim Mikheev
Date:
Bruce Momjian wrote:
>
> Here is a comment in path/indxpath.c that says they don't want to use
> multi-key indexes with OR clauses.  Of course, we now support multi-key
> indexes, and this code was disabled anyway because it was broken.  (In
> fact, it was disabled by having SingleAttributeIndex() always return
> false.
>
> Is there any reason we can't use multi-key indexes if the first key
> matches our OR column?  I don't see why not.  Also, I don't know how to
                          ^^^^^^^^^^^^^^^^^^^
Me too.

> handle the case where we specify the first key of a multi-key index in
> an AND clause, and specify the second key in an OR clause because the

What do you mean? Example please...

> AND's are handled in a separate are of the code.  Any ideas how to
> implement this?  (Of course, the code would still use the index for the
> AND, but I don't know how to bring the AND case into my OR index clause
> processing area.)

Vadim

Re: [HACKERS] OR with multi-key indexes

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> >
> > Here is a comment in path/indxpath.c that says they don't want to use
> > multi-key indexes with OR clauses.  Of course, we now support multi-key
> > indexes, and this code was disabled anyway because it was broken.  (In
> > fact, it was disabled by having SingleAttributeIndex() always return
> > false.
> >
> > Is there any reason we can't use multi-key indexes if the first key
> > matches our OR column?  I don't see why not.  Also, I don't know how to
>                           ^^^^^^^^^^^^^^^^^^^
> Me too.

Good.  Code restriction removed.

>
> > handle the case where we specify the first key of a multi-key index in
> > an AND clause, and specify the second key in an OR clause because the
>
> What do you mean? Example please...
>
> > AND's are handled in a separate are of the code.  Any ideas how to
> > implement this?  (Of course, the code would still use the index for the
> > AND, but I don't know how to bring the AND case into my OR index clause
> > processing area.)

    create table test (x int4, y int4);
    create index i_test on test(x,y);
    insert into test values(1,2);
    select * from test where x=3 and (y=1 or y=2);

This is going to use the i_test index, but only with key x=3, and do a
scan of the index looking for y=1 or y=2, and will not use the second
key of the index.

I don't know how to pass information to the OR code so it will know the
first part of the index is matched, and it can compare the second key.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR with multi-key indexes

From
Vadim Mikheev
Date:
Bruce Momjian wrote:
>
>         create table test (x int4, y int4);
>         create index i_test on test(x,y);
>         insert into test values(1,2);
>         select * from test where x=3 and (y=1 or y=2);
>
> This is going to use the i_test index, but only with key x=3, and do a
> scan of the index looking for y=1 or y=2, and will not use the second
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Server will fetch heap tuple for each tuple with x = 3
returned by index access methods and call ExecQual...

> key of the index.

There are two ways.

I. Rewrite this into

(x = 3 and y = 1) or (x = 3 and y = 2)

and scan index twice using both keys for finding index tuples.

II. Extend multi-key indexing: (y = 1 or y = 2) could be
qualified by index access methods itself because of Y is
one of index keys. Only first key would be used for finding
index tuples but additional qualification could decrease
number of heap_fetch calls and this would be nice!

This feature would be also usefull for:

create index on table (a,b,c);
select * from table where a = 1 and c = 2;
                                    ^^^^^
    additional qualification would be performed on index level

Personally, I would like to see II implemented first because
of it works for both query examples.

Vadim

Re: [HACKERS] OR with multi-key indexes

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> >
> >         create table test (x int4, y int4);
> >         create index i_test on test(x,y);
> >         insert into test values(1,2);
> >         select * from test where x=3 and (y=1 or y=2);
> >
> > This is going to use the i_test index, but only with key x=3, and do a
> > scan of the index looking for y=1 or y=2, and will not use the second
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Server will fetch heap tuple for each tuple with x = 3
> returned by index access methods and call ExecQual...
>
> > key of the index.
>
> There are two ways.
>
> I. Rewrite this into
>
> (x = 3 and y = 1) or (x = 3 and y = 2)
>
> and scan index twice using both keys for finding index tuples.

Yes.  One way to do this may be to have OR processed AFTER AND in the
optimizer, and use the existing index matches to look for matches
further down the key.  I am not sure where the actual multiple index
plans are created, so I am not sure how to implement this.

> II. Extend multi-key indexing: (y = 1 or y = 2) could be
> qualified by index access methods itself because of Y is
> one of index keys. Only first key would be used for finding
> index tuples but additional qualification could decrease
> number of heap_fetch calls and this would be nice!

I see what you are saying.  In the first case, you look for key1,key2a,
then for key1,key2b, etc.  In the second case, you look for key1, then
sitting on key1, you can get key2a, key2b, etc, without having to find
key1 for every OR.

> This feature would be also usefull for:
>
> create index on table (a,b,c);
> select * from table where a = 1 and c = 2;
>                                     ^^^^^
>     additional qualification would be performed on index level
>
> Personally, I would like to see II implemented first because
> of it works for both query examples.

Doesn't the existing code already use both keys in the above query.
What is gained by moving this to the index access methods?

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR with multi-key indexes

From
Vadim Mikheev
Date:
Bruce Momjian wrote:
>
> > >         create index i_test on test(x,y);
> > >         insert into test values(1,2);
> > >         select * from test where x=3 and (y=1 or y=2);
>
> > II. Extend multi-key indexing: (y = 1 or y = 2) could be
> > qualified by index access methods itself because of Y is
> > one of index keys. Only first key would be used for finding
> > index tuples but additional qualification could decrease
> > number of heap_fetch calls and this would be nice!
>
> > This feature would be also usefull for:
> >
> > create index on table (a,b,c);
> > select * from table where a = 1 and c = 2;
> >                                     ^^^^^
> >     additional qualification would be performed on index level
> >
> > Personally, I would like to see II implemented first because
> > of it works for both query examples.
>
> Doesn't the existing code already use both keys in the above query.
> What is gained by moving this to the index access methods?

I hadn't time to implement this year ago...

Let's say we have 1000 tuples with a = 1 and only 10 with
a = 1 and c = 2 - currently, all 1000 index tuples will be returned
to Executor and all corresponding 1000 heap tuples will be fetched...
Having this feature, only 10 index tuples would be returned
and heap_fetch would be called 10 times only.

Vadim

Re: [HACKERS] OR with multi-key indexes

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> >
> > > >         create index i_test on test(x,y);
> > > >         insert into test values(1,2);
> > > >         select * from test where x=3 and (y=1 or y=2);
> >
> > > II. Extend multi-key indexing: (y = 1 or y = 2) could be
> > > qualified by index access methods itself because of Y is
> > > one of index keys. Only first key would be used for finding
> > > index tuples but additional qualification could decrease
> > > number of heap_fetch calls and this would be nice!
> >
> > > This feature would be also usefull for:
> > >
> > > create index on table (a,b,c);
> > > select * from table where a = 1 and c = 2;
> > >                                     ^^^^^
> > >     additional qualification would be performed on index level
> > >
> > > Personally, I would like to see II implemented first because
> > > of it works for both query examples.
> >
> > Doesn't the existing code already use both keys in the above query.
> > What is gained by moving this to the index access methods?
>
> I hadn't time to implement this year ago...
>
> Let's say we have 1000 tuples with a = 1 and only 10 with
> a = 1 and c = 2 - currently, all 1000 index tuples will be returned
> to Executor and all corresponding 1000 heap tuples will be fetched...
> Having this feature, only 10 index tuples would be returned
> and heap_fetch would be called 10 times only.

OK, stupid question, but why do we have multi-key indexes then?  Just to
allow UNIQUE index failure?

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR with multi-key indexes

From
Vadim Mikheev
Date:
Bruce Momjian wrote:
>
> > > > II. Extend multi-key indexing: (y = 1 or y = 2) could be
> > > > qualified by index access methods itself because of Y is
> > > > one of index keys. Only first key would be used for finding
> > > > index tuples but additional qualification could decrease
> > > > number of heap_fetch calls and this would be nice!
> > >
> > > > This feature would be also usefull for:
> > > >
> > > > create index on table (a,b,c);
> > > > select * from table where a = 1 and c = 2;
> > > >                                     ^^^^^
> > > >     additional qualification would be performed on index level
> > > >
> > > > Personally, I would like to see II implemented first because
> > > > of it works for both query examples.
> > >
> > > Doesn't the existing code already use both keys in the above query.
> > > What is gained by moving this to the index access methods?
> >
> > I hadn't time to implement this year ago...
> >
> > Let's say we have 1000 tuples with a = 1 and only 10 with
> > a = 1 and c = 2 - currently, all 1000 index tuples will be returned
> > to Executor and all corresponding 1000 heap tuples will be fetched...
> > Having this feature, only 10 index tuples would be returned
> > and heap_fetch would be called 10 times only.
>
> OK, stupid question, but why do we have multi-key indexes then?  Just to
> allow UNIQUE index failure?

In the example above only 1st and _3rd_ index keys are used
in WHERE and so only 1st key will be used by index.
For cases like WHERE a = 1 and b = 3 index will use
1st and 2nd keys.
For cases like WHERE a = 1 and b = 3 and c = 2 index will use
all three keys.

Extending II means not using 3rd key to _find_ index tuples
but using 3rd key to reduce # of index tuples returned
to executor. Btree will read all tuples with a = 3 but
will not return index tuple with a = 3 and c = 0 (ie).

Vadim

Re: [HACKERS] OR with multi-key indexes

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> >
> > > > > II. Extend multi-key indexing: (y = 1 or y = 2) could be
> > > > > qualified by index access methods itself because of Y is
> > > > > one of index keys. Only first key would be used for finding
> > > > > index tuples but additional qualification could decrease
> > > > > number of heap_fetch calls and this would be nice!
> > > >
> > > > > This feature would be also usefull for:
> > > > >
> > > > > create index on table (a,b,c);
> > > > > select * from table where a = 1 and c = 2;
> > > > >                                     ^^^^^
> > > > >     additional qualification would be performed on index level
> > > > >
> > > > > Personally, I would like to see II implemented first because
> > > > > of it works for both query examples.
> > > >
> > > > Doesn't the existing code already use both keys in the above query.
> > > > What is gained by moving this to the index access methods?
> > >
> > > I hadn't time to implement this year ago...
> > >
> > > Let's say we have 1000 tuples with a = 1 and only 10 with
> > > a = 1 and c = 2 - currently, all 1000 index tuples will be returned
> > > to Executor and all corresponding 1000 heap tuples will be fetched...
> > > Having this feature, only 10 index tuples would be returned
> > > and heap_fetch would be called 10 times only.
> >
> > OK, stupid question, but why do we have multi-key indexes then?  Just to
> > allow UNIQUE index failure?
>
> In the example above only 1st and _3rd_ index keys are used
> in WHERE and so only 1st key will be used by index.
> For cases like WHERE a = 1 and b = 3 index will use
> 1st and 2nd keys.
> For cases like WHERE a = 1 and b = 3 and c = 2 index will use
> all three keys.
>
> Extending II means not using 3rd key to _find_ index tuples
> but using 3rd key to reduce # of index tuples returned
> to executor. Btree will read all tuples with a = 3 but
> will not return index tuple with a = 3 and c = 0 (ie).

Oh, stupid me.  I see now.  Hardly seems worth adding the extra code to
reduce a match.  I agree, let the executor handle it.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR with multi-key indexes

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> >
> >         create table test (x int4, y int4);
> >         create index i_test on test(x,y);
> >         insert into test values(1,2);
> >         select * from test where x=3 and (y=1 or y=2);
> >
> > This is going to use the i_test index, but only with key x=3, and do a
> > scan of the index looking for y=1 or y=2, and will not use the second
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Server will fetch heap tuple for each tuple with x = 3
> returned by index access methods and call ExecQual...
>
> > key of the index.
>
> There are two ways.
>
> I. Rewrite this into
>
> (x = 3 and y = 1) or (x = 3 and y = 2)
>
> and scan index twice using both keys for finding index tuples.
>
> II. Extend multi-key indexing: (y = 1 or y = 2) could be
> qualified by index access methods itself because of Y is
> one of index keys. Only first key would be used for finding
> index tuples but additional qualification could decrease
> number of heap_fetch calls and this would be nice!
>
> This feature would be also usefull for:
>
> create index on table (a,b,c);
> select * from table where a = 1 and c = 2;
>                                     ^^^^^
>     additional qualification would be performed on index level
>
> Personally, I would like to see II implemented first because
> of it works for both query examples.

I now understand how this feature could be used in both cases where we
match key1 and key3, and when we use OR on second key.

I see what you are saying, that the values are already in the index, so
you could scan the index looking for the third key or the OR clause, and
not have to fetch the heap to look it up.

I will add this to the TODO list under performance.  It is sort of
exotic, but why not add it:

* use index to restrict rows returned by multi-key index for
        non-consecutive keys or OR clauses

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] OR with multi-key indexes

From
David Hartwig
Date:
Bruce,

I have been following this thread great enthusiasm.  I seem to have missed
something though.    Have you been CVSing these changes, and as such, should I
be able to use indexes on ORs?



Re: [HACKERS] OR with multi-key indexes

From
Bruce Momjian
Date:
> Bruce,
>
> I have been following this thread great enthusiasm.  I seem to have missed
> something though.    Have you been CVSing these changes, and as such, should I
> be able to use indexes on ORs?
>
>
>
>

Yes.  Do an update because I have been fixing some minor problems.
Should work now.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)