Thread: If an index is based on 3 columns will a query using two of the columns utilize the index?
If an index is based on 3 columns will a query using two of the columns utilize the index?
From
"Reid Thompson"
Date:
Example: assume a table of 10 columns, three of which are fname, lname, and dob. If an index is created on (fname, lname, dob), will a query that utilizes two of the columns ( select 'data' from table where fname = 'X' and lname = 'Y') utilize the index? thanks, reid
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Alvaro Herrera
Date:
On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote: > Example: > assume a table of 10 columns, three of which are fname, lname, and dob. > If an index is created on (fname, lname, dob), will a query that > utilizes two of the columns ( select 'data' from table where fname = 'X' > and lname = 'Y') utilize the index? Yes, if it is selective enough. (It _can_ use the index, which does not mean that it _will_ use it.) Note that if your example query used the columns (lname, dob), the answer would be "no." -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Michael Fuhr
Date:
On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote: > assume a table of 10 columns, three of which are fname, lname, and dob. > If an index is created on (fname, lname, dob), will a query that > utilizes two of the columns ( select 'data' from table where fname = 'X' > and lname = 'Y') utilize the index? See "Multicolumn Indexes" in the "Indexes" chapter of the documentation. http://www.postgresql.org/docs/8.0/interactive/indexes-multicolumn.html You can use EXPLAIN to see whether the planner will use an index for a particular query. http://www.postgresql.org/docs/8.0/interactive/performance-tips.html#USING-EXPLAIN Note, however, that the planner will ignore an index and use a sequential scan if it thinks the latter will be faster, so if you want to see whether the query *can* use an index (as opposed to *will* use it) then you could execute "SET enable_seqscan TO off" and then run EXPLAIN (don't forget to RESET enable_seqscan or SET it back to "on" when you're done testing). -- Michael Fuhr
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
"Reid Thompson"
Date:
Alvaro Herrera wrote: > On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote: >> Example: >> assume a table of 10 columns, three of which are fname, lname, and >> dob. If an index is created on (fname, lname, dob), will a query that >> utilizes two of the columns ( select 'data' from table where fname = >> 'X' and lname = 'Y') utilize the index? > > Yes, if it is selective enough. (It _can_ use the index, > which does not mean that it _will_ use it.) Note that if > your example query used the columns (lname, dob), the answer would be > "no." > Why is that? In order to use an index, does the query have to utilize the 'first' element of the index? reid
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Michael Fuhr
Date:
On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote: > Alvaro Herrera wrote: > > Note that if your example query used the columns (lname, dob), > > the answer would be "no." > > Why is that? In order to use an index, does the query have to utilize > the 'first' element of the index? In released versions of PostgreSQL, yes. Version 8.1 will remove that restriction. -- Michael Fuhr
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Alvaro Herrera
Date:
On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote: > Alvaro Herrera wrote: > > On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote: > >> Example: > >> assume a table of 10 columns, three of which are fname, lname, and > >> dob. If an index is created on (fname, lname, dob), will a query that > >> utilizes two of the columns ( select 'data' from table where fname = > >> 'X' and lname = 'Y') utilize the index? > > > > Yes, if it is selective enough. (It _can_ use the index, > > which does not mean that it _will_ use it.) Note that if > > your example query used the columns (lname, dob), the answer would be > > "no." > > Why is that? In order to use an index, does the query have to utilize > the 'first' element of the index? The "leftmost part." There's no way to scan an index if you don't know the key. On a btree index, the key is ordered, and the columns at the left are more significant than those at the right. If you don't provide a value for the leftmost (first) column, there's no way to start scanning the index because there's no starting point. I don't think that was nearly clear enough, but OTOH I haven't had any coffee today yet. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "If you have nothing to say, maybe you need just the right tool to help you not say it." (New York Times, about Microsoft PowerPoint)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote: >> Example: >> assume a table of 10 columns, three of which are fname, lname, and dob. >> If an index is created on (fname, lname, dob), will a query that >> utilizes two of the columns ( select 'data' from table where fname = 'X' >> and lname = 'Y') utilize the index? > Yes, if it is selective enough. (It _can_ use the index, which does not > mean that it _will_ use it.) Note that if your example query used the > columns (lname, dob), the answer would be "no." Actually, that last point is not true anymore as of 8.1 --- see this thread: http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php which led to this patch: http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php I missed the fact that the documentation said it wouldn't work though. Will fix... regards, tom lane
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote: >> Why is that? In order to use an index, does the query have to utilize >> the 'first' element of the index? > The "leftmost part." There's no way to scan an index if you don't know > the key. On a btree index, the key is ordered, and the columns at the > left are more significant than those at the right. If you don't provide > a value for the leftmost (first) column, there's no way to start > scanning the index because there's no starting point. Actually, btree doesn't have any particular problem with that --- it just starts the scan at the beginning of the index. However the other index types do all require a constraint on the first index column; for instance hash has to be able to determine a hash value. Greg Stark suggests here: http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php that GiST could also be fixed to work with any subset of the index columns, but it hasn't been done yet, unless Teodor and Oleg snuck something in during that last round of GiST work. regards, tom lane
> Greg Stark suggests here: > http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php > that GiST could also be fixed to work with any subset of the index > columns, but it hasn't been done yet, unless Teodor and Oleg snuck > something in during that last round of GiST work. GiST may work with any subset of index columns too. Even in existing code I don't see any problem except NULL in a first column. GiST doesn't store tuples with leading NULL value (gist.c lines 174, 326), so index doesn't contained them. After our work about WAL-lization GiST, it may work with "invalid" tuples (possibly occured after crash recovery), so itsn't a big deal to add support NULL in a first column. But freeze date is outdated... Should I add or leave it to 8.2? -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > > Yes, if it is selective enough. (It _can_ use the index, which does not > > mean that it _will_ use it.) Note that if your example query used the > > columns (lname, dob), the answer would be "no." > > Actually, that last point is not true anymore as of 8.1 --- see this > thread: > http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php > which led to this patch: > http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php Did that patch actually implement "skip scanning"? The comment seems to only describe removing the restriction from the planner. Which would make it theoretically possible but presumably the the cost estimator should ensure it essentially never gets chosen for btree indexes. The btree index would very very rarely help since it would require a complete index scan. I guess I could see some corner cases where it would help. Very wide tables with an index on a few very selective relatively narrow columns. So the index could be scanned in its entirety much faster than a full table scan. But the index would have to be *much* narrower than the table and quite selective to overcome the random access penalty. Skip scanning would make it much more likely to be helpful. Also, I think Oracle has another scan method called a "fast index scan" that basically does a full sequential scan of the index. So the tuples come out unordered but the access pattern is sequential. Would that be a good TODO for Postgres? Is it feasible given the index disk structures in Postgres? -- greg
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php > Did that patch actually implement "skip scanning"? No, it just removed the planner's arbitrary assumption that the index methods wouldn't cope. Skip scanning is actually something rather different anyway. > The comment seems to only describe removing the restriction from the planner. > Which would make it theoretically possible but presumably the the cost > estimator should ensure it essentially never gets chosen for btree indexes. btcostestimate does understand this now. > I guess I could see some corner cases where it would help. Very wide tables > with an index on a few very selective relatively narrow columns. So the index > could be scanned in its entirety much faster than a full table scan. But the > index would have to be *much* narrower than the table and quite selective > to overcome the random access penalty. With a bitmap index scan the penalty wouldn't be so high. > Also, I think Oracle has another scan method called a "fast index scan" that > basically does a full sequential scan of the index. So the tuples come out > unordered but the access pattern is sequential. Would that be a good TODO for > Postgres? Is it feasible given the index disk structures in Postgres? I think this would probably fail under concurrent update conditions: you couldn't guarantee not to miss or multiply return index entries. There is interlocking in an index-order scan that prevents such problems, but I don't see how it'd work for a physical-order scan. You could probably make it work if you were willing to lock out writers for the duration of the scan, but that'd severely restrict the usefulness I would think. I'm also not sure how we'd express such a constraint within the system... regards, tom lane
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?
From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes: > GiST may work with any subset of index columns too. Even in existing code I > don't see any problem except NULL in a first column. GiST doesn't store tuples > with leading NULL value (gist.c lines 174, 326), so index doesn't contained them. Well, that's exactly the problem :-(. Or at least one of the problems; the other being what you'd use as search key to find such tuples. > After our work about WAL-lization GiST, it may work with "invalid" > tuples (possibly occured after crash recovery), so itsn't a big deal > to add support NULL in a first column. But freeze date is > outdated... Should I add or leave it to 8.2? Too late for 8.1 I'd say --- this definitely sounds like a new feature rather than a bug fix. regards, tom lane