Thread: If an index is based on 3 columns will a query using two of the columns utilize the index?

If an index is based on 3 columns will a query using two of the columns utilize the index?

From

"Reid Thompson"

Date:

12 September 2005, 10:44:12

Example:
assume a table of 10 columns, three of which are fname, lname, and dob.
If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname = 'X'
and lname = 'Y') utilize the index?

thanks,
reid

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Alvaro Herrera

Date:

12 September 2005, 10:59:37

On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
> Example:
> assume a table of 10 columns, three of which are fname, lname, and dob.
> If an index is created on (fname, lname, dob), will a query that
> utilizes two of the columns ( select 'data' from table where fname = 'X'
> and lname = 'Y') utilize the index?

Yes, if it is selective enough.  (It _can_ use the index, which does not
mean that it _will_ use it.)  Note that if your example query used the
columns (lname, dob), the answer would be "no."

--
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Michael Fuhr

Date:

12 September 2005, 11:05:02

On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
> assume a table of 10 columns, three of which are fname, lname, and dob.
> If an index is created on (fname, lname, dob), will a query that
> utilizes two of the columns ( select 'data' from table where fname = 'X'
> and lname = 'Y') utilize the index?

See "Multicolumn Indexes" in the "Indexes" chapter of the documentation.

http://www.postgresql.org/docs/8.0/interactive/indexes-multicolumn.html

You can use EXPLAIN to see whether the planner will use an index for
a particular query.

http://www.postgresql.org/docs/8.0/interactive/performance-tips.html#USING-EXPLAIN

Note, however, that the planner will ignore an index and use a
sequential scan if it thinks the latter will be faster, so if you
want to see whether the query *can* use an index (as opposed to
*will* use it) then you could execute "SET enable_seqscan TO off"
and then run EXPLAIN (don't forget to RESET enable_seqscan or SET
it back to "on" when you're done testing).

--
Michael Fuhr

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

"Reid Thompson"

Date:

12 September 2005, 11:05:36

Alvaro Herrera wrote:
> On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
>> Example:
>> assume a table of 10 columns, three of which are fname, lname, and
>> dob. If an index is created on (fname, lname, dob), will a query that
>> utilizes two of the columns ( select 'data' from table where fname =
>> 'X' and lname = 'Y') utilize the index?
>
> Yes, if it is selective enough.  (It _can_ use the index,
> which does not mean that it _will_ use it.)  Note that if
> your example query used the columns (lname, dob), the answer would be
> "no."
>
Why is that? In order to use an index, does the query have to utilize
the 'first' element of the index?



reid

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Michael Fuhr

Date:

12 September 2005, 11:18:53

On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
> Alvaro Herrera wrote:
> > Note that if your example query used the columns (lname, dob),
> > the answer would be "no."
>
> Why is that? In order to use an index, does the query have to utilize
> the 'first' element of the index?

In released versions of PostgreSQL, yes.  Version 8.1 will remove
that restriction.

--
Michael Fuhr

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Alvaro Herrera

Date:

12 September 2005, 11:32:11

On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
> Alvaro Herrera wrote:
> > On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
> >> Example:
> >> assume a table of 10 columns, three of which are fname, lname, and
> >> dob. If an index is created on (fname, lname, dob), will a query that
> >> utilizes two of the columns ( select 'data' from table where fname =
> >> 'X' and lname = 'Y') utilize the index?
> >
> > Yes, if it is selective enough.  (It _can_ use the index,
> > which does not mean that it _will_ use it.)  Note that if
> > your example query used the columns (lname, dob), the answer would be
> > "no."
>
> Why is that? In order to use an index, does the query have to utilize
> the 'first' element of the index?

The "leftmost part."  There's no way to scan an index if you don't know
the key.  On a btree index, the key is ordered, and the columns at the
left are more significant than those at the right.  If you don't provide
a value for the leftmost (first) column, there's no way to start
scanning the index because there's no starting point.

I don't think that was nearly clear enough, but OTOH I haven't had any
coffee today yet.

--
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Tom Lane

Date:

12 September 2005, 11:38:59

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:
>> Example:
>> assume a table of 10 columns, three of which are fname, lname, and dob.
>> If an index is created on (fname, lname, dob), will a query that
>> utilizes two of the columns ( select 'data' from table where fname = 'X'
>> and lname = 'Y') utilize the index?

> Yes, if it is selective enough.  (It _can_ use the index, which does not
> mean that it _will_ use it.)  Note that if your example query used the
> columns (lname, dob), the answer would be "no."

Actually, that last point is not true anymore as of 8.1 --- see this
thread:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php
which led to this patch:
http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

I missed the fact that the documentation said it wouldn't work though.
Will fix...

            regards, tom lane

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Tom Lane

Date:

12 September 2005, 11:54:11

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:
>> Why is that? In order to use an index, does the query have to utilize
>> the 'first' element of the index?

> The "leftmost part."  There's no way to scan an index if you don't know
> the key.  On a btree index, the key is ordered, and the columns at the
> left are more significant than those at the right.  If you don't provide
> a value for the leftmost (first) column, there's no way to start
> scanning the index because there's no starting point.

Actually, btree doesn't have any particular problem with that --- it
just starts the scan at the beginning of the index.  However the other
index types do all require a constraint on the first index column;
for instance hash has to be able to determine a hash value.

Greg Stark suggests here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
that GiST could also be fixed to work with any subset of the index
columns, but it hasn't been done yet, unless Teodor and Oleg snuck
something in during that last round of GiST work.

            regards, tom lane

Re: If an index is based on 3 columns will a query using

From

Teodor Sigaev

Date:

12 September 2005, 12:25:38

> Greg Stark suggests here:
> http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
> that GiST could also be fixed to work with any subset of the index
> columns, but it hasn't been done yet, unless Teodor and Oleg snuck
> something in during that last round of GiST work.

GiST may work with any subset of index columns too. Even in existing code I
don't see any problem except NULL in a first column. GiST doesn't store tuples
with leading NULL value (gist.c lines 174, 326), so index doesn't contained them.

After our work about WAL-lization GiST, it may work with "invalid" tuples
(possibly occured after crash recovery), so itsn't a big deal to add support
NULL in a first column. But freeze date is outdated... Should I add or leave it
to 8.2?



--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Greg Stark

Date:

12 September 2005, 12:56:02

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>
> > Yes, if it is selective enough.  (It _can_ use the index, which does not
> > mean that it _will_ use it.)  Note that if your example query used the
> > columns (lname, dob), the answer would be "no."
>
> Actually, that last point is not true anymore as of 8.1 --- see this
> thread:
> http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php
> which led to this patch:
> http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

Did that patch actually implement "skip scanning"?

The comment seems to only describe removing the restriction from the planner.
Which would make it theoretically possible but presumably the the cost
estimator should ensure it essentially never gets chosen for btree indexes.
The btree index would very very rarely help since it would require a complete
index scan.

I guess I could see some corner cases where it would help. Very wide tables
with an index on a few very selective relatively narrow columns. So the index
could be scanned in its entirety much faster than a full table scan. But the
index would have to be *much* narrower than the table and quite selective
to overcome the random access penalty.

Skip scanning would make it much more likely to be helpful.

Also, I think Oracle has another scan method called a "fast index scan" that
basically does a full sequential scan of the index. So the tuples come out
unordered but the access pattern is sequential. Would that be a good TODO for
Postgres? Is it feasible given the index disk structures in Postgres?

--
greg

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Tom Lane

Date:

12 September 2005, 13:37:43

Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

> Did that patch actually implement "skip scanning"?

No, it just removed the planner's arbitrary assumption that the index
methods wouldn't cope.  Skip scanning is actually something rather
different anyway.

> The comment seems to only describe removing the restriction from the planner.
> Which would make it theoretically possible but presumably the the cost
> estimator should ensure it essentially never gets chosen for btree indexes.

btcostestimate does understand this now.

> I guess I could see some corner cases where it would help. Very wide tables
> with an index on a few very selective relatively narrow columns. So the index
> could be scanned in its entirety much faster than a full table scan. But the
> index would have to be *much* narrower than the table and quite selective
> to overcome the random access penalty.

With a bitmap index scan the penalty wouldn't be so high.

> Also, I think Oracle has another scan method called a "fast index scan" that
> basically does a full sequential scan of the index. So the tuples come out
> unordered but the access pattern is sequential. Would that be a good TODO for
> Postgres? Is it feasible given the index disk structures in Postgres?

I think this would probably fail under concurrent update conditions: you
couldn't guarantee not to miss or multiply return index entries.  There
is interlocking in an index-order scan that prevents such problems, but
I don't see how it'd work for a physical-order scan.

You could probably make it work if you were willing to lock out writers
for the duration of the scan, but that'd severely restrict the
usefulness I would think.  I'm also not sure how we'd express such a
constraint within the system...

            regards, tom lane

Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

From

Tom Lane

Date:

12 September 2005, 13:39:58

Teodor Sigaev <teodor@sigaev.ru> writes:
> GiST may work with any subset of index columns too. Even in existing code I
> don't see any problem except NULL in a first column. GiST doesn't store tuples
> with leading NULL value (gist.c lines 174, 326), so index doesn't contained them.

Well, that's exactly the problem :-(.  Or at least one of the problems;
the other being what you'd use as search key to find such tuples.

> After our work about WAL-lization GiST, it may work with "invalid"
> tuples (possibly occured after crash recovery), so itsn't a big deal
> to add support NULL in a first column. But freeze date is
> outdated... Should I add or leave it to 8.2?

Too late for 8.1 I'd say --- this definitely sounds like a new feature
rather than a bug fix.

            regards, tom lane