Thread: Covering Indexes
Hackers, Very interesting design document for SQLite 4: http://www.sqlite.org/src4/doc/trunk/www/design.wiki I'm particularly intrigued by "covering indexes". For example: CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); This allows the following query to do an index-only scan: SELECT c, d FROM table1 WHERE a=? AND b=?; Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, whereadditional, unindexed columns are stored alongside indexed columns. And I wonder if it would work well with expressions, too? David
<div class="moz-cite-prefix">On 06/28/2012 02:16 PM, David E. Wheeler wrote:<br /></div><blockquote cite="mid:7138506E-2A5D-4AA6-A8CD-DC9FB4D7344E@justatheory.com"type="cite"><pre wrap="">Hackers, Very interesting design document for SQLite 4: <a class="moz-txt-link-freetext" href="http://www.sqlite.org/src4/doc/trunk/www/design.wiki">http://www.sqlite.org/src4/doc/trunk/www/design.wiki</a> I'm particularly intrigued by "covering indexes". For example: CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); This allows the following query to do an index-only scan: SELECT c, d FROM table1 WHERE a=? AND b=?; Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, whereadditional, unindexed columns are stored alongside indexed columns. And I wonder if it would work well with expressions, too? David </pre></blockquote><br /> This is analogous to SQL Server's "include" :<br /><br /><div class="container" title="Hint: double-clickto select code"><div class="line number1 index0 alt2"><code class="java plain">CREATE NONCLUSTERED INDEX my_idx</code></div><divclass="line number2 index1 alt1"><code class="java plain">ON my_table (status)</code></div><div class="linenumber3 index2 alt2"><code class="java plain">INCLUDE (someColumn, otherColumn)</code></div></div><br /> Whichis useful, but bloats the index.<br /><pre class="moz-signature" cols="72">-- Andreas Joseph Krogh<a class="moz-txt-link-rfc2396E" href="mailto:andreak@officenet.no"><andreak@officenet.no></a> - mob: +47 909 56 963 Senior Software Developer / CEO - OfficeNet AS - <a class="moz-txt-link-freetext" href="http://www.officenet.no">http://www.officenet.no</a> Public key: <a class="moz-txt-link-freetext" href="http://home.officenet.no/~andreak/public_key.asc">http://home.officenet.no/~andreak/public_key.asc</a> </pre>
On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david@justatheory.com> wrote: > Hackers, > > Very interesting design document for SQLite 4: > > http://www.sqlite.org/src4/doc/trunk/www/design.wiki > > I'm particularly intrigued by "covering indexes". For example: > > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); > > This allows the following query to do an index-only scan: > > SELECT c, d FROM table1 WHERE a=? AND b=?; > > Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, whereadditional, unindexed columns are stored alongside indexed columns. > > And I wonder if it would work well with expressions, too? > > David IRC MS SQL also allow unindexed columns in the index. -- Rob Wultsch wultsch@gmail.com
On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david@justatheory.com> wrote: > Hackers, > > Very interesting design document for SQLite 4: > > http://www.sqlite.org/src4/doc/trunk/www/design.wiki > > I'm particularly intrigued by "covering indexes". For example: > > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); I don't see the virtue of this in this case. Since the index is not unique, why not just put the index on (a,b,c,d) and be done with it? Is there some advantage to be had in inventing a way to store c and d in the index without having them usable for indexing? Cheers, Jeff
On 06/28/2012 11:37 AM, Jeff Janes wrote: > On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler<david@justatheory.com> wrote: >> Hackers, >> >> Very interesting design document for SQLite 4: >> >> http://www.sqlite.org/src4/doc/trunk/www/design.wiki >> >> I'm particularly intrigued by "covering indexes". For example: >> >> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); > I don't see the virtue of this in this case. Since the index is not > unique, why not just put the index on (a,b,c,d) and be done with it? > Is there some advantage to be had in inventing a way to store c and d > in the index without having them usable for indexing? > Presumably the comparison function will be faster the fewer attributes it needs to compare. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 06/28/2012 11:37 AM, Jeff Janes wrote: >> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler<david@justatheory.com> wrote: >>> I'm particularly intrigued by "covering indexes". For example: >>> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); >> I don't see the virtue of this in this case. Since the index is not >> unique, why not just put the index on (a,b,c,d) and be done with it? > Presumably the comparison function will be faster the fewer attributes > it needs to compare. Well, no, because queries will only be comparing the columns used in the query. Insertions would likely actually be faster with the extra columns considered significant, since we've known for a long time that btree doesn't especially like large numbers of identical index entries. When this came up a couple weeks ago, the argument that was made for it was that you could attach non-significant columns to an index that *is* unique. That might or might not be a wide enough use-case to justify adding such a horrid kludge. regards, tom lane
Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012: > When this came up a couple weeks ago, the argument that was made for it > was that you could attach non-significant columns to an index that *is* > unique. That might or might not be a wide enough use-case to justify > adding such a horrid kludge. The other question is whether such an index would prevent an update from being HOT when the non-indexed values are touched. That could be a significant difference. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Jun 28, 2012 at 12:12 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > The other question is whether such an index would prevent an update from > being HOT when the non-indexed values are touched. That could be a > significant difference. I don't see Index-Only-Scans being something that will be used in "high churn" tables. So as long as the value of these "covering/included" fields is tied to index-only scans, maybe it isn't a problem? Of course, we have have a hard time convincing people that the "index only" scans they want can't be "index only" because heap pages aren't "all visible"... a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Alvaro Herrera <alvherre@commandprompt.com> writes: > The other question is whether such an index would prevent an update from > being HOT when the non-indexed values are touched. Surely it would *have* to, whether the columns are significant or not for uniqueness purposes. Else an index-only scan gives the wrong value after the update. regards, tom lane
On Thu, Jun 28, 2012 at 9:12 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012: > >> When this came up a couple weeks ago, the argument that was made for it >> was that you could attach non-significant columns to an index that *is* >> unique. That might or might not be a wide enough use-case to justify >> adding such a horrid kludge. > > The other question is whether such an index would prevent an update from > being HOT when the non-indexed values are touched. That seems like an easy question to answer. How could it not disable HOT and still work correctly? > That could be a > significant difference. True, adding the covering column would not always be a win. But surely it more likely to be a win when it can be done without adding yet another index that also needs to be maintained. Cheers, Jeff
On Thu, Jun 28, 2012 at 7:02 AM, Rob Wultsch <wultsch@gmail.com> wrote: > On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david@justatheory.com> wrote: >> Hackers, >> >> Very interesting design document for SQLite 4: >> >> http://www.sqlite.org/src4/doc/trunk/www/design.wiki >> >> I'm particularly intrigued by "covering indexes". For example: >> >> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); >> >> This allows the following query to do an index-only scan: >> >> SELECT c, d FROM table1 WHERE a=? AND b=?; >> >> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, whereadditional, unindexed columns are stored alongside indexed columns. >> >> And I wonder if it would work well with expressions, too? >> >> David > > IRC MS SQL also allow unindexed columns in the index. > > -- > Rob Wultsch > wultsch@gmail.com MS SQL Server does support including non-key columns in indexes, allowing index only scans for queries returning these columns. Their syntax is different from that proposed in the linked SQLite document. To the best of my experience, the advantages in SQL Server to such an index (as opposed to just adding the columns to the index normally) are as follows: 1- You can include extra columns in a unique index which do not participate in the uniqueness determination. 2- The non-key columns can be of types which normally cannot be included in a b-tree index due to lack of proper sorting operators. 3- The resulting index is smaller, because the non-key columns are only contained in leaf nodes, not in internal nodes. 4- The insert/update overhead is lower. Of course, an implementation in a different database system, such as Postgres, may or may not have the same set of benefits. Number 4 in particular seems to be dependent on the details of the b-tree implementation. In my mind numbers 1 and 2 are the most compelling arguments in favor of this feature. Whether the it's worth the effort of coding the feature would depend on how well the above benefits (or any benefits I missed) hold true, and how useful such an index actually proved for index only scans in Postgres.
On 28 June 2012 14:02, Rob Wultsch <wultsch@gmail.com> wrote: > On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david@justatheory.com> wrote: >> I'm particularly intrigued by "covering indexes". For example: >> >> CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); > > IRC MS SQL also allow unindexed columns in the index. For what it's worth, DB2 also has this feature, written roughly the same way as MS SQL Server: CREATE INDEX cover1 ON table1(a, b) INCLUDE (c, d). http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0000919.html Oracle doesn't seem to have this feature (and the SQL standard doesn't mention indexes at all).
Hi all, > On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david@justatheory.com> wrote: > I don't see the virtue of this in this case. Since the index is not > unique, why not just put the index on (a,b,c,d) and be done with it? > Is there some advantage to be had in inventing a way to store c and d > in the index without having them usable for indexing? Why not restrict it to UNIQUE indexes ? For not unique indexes it has no advantages (it could be in fact indexed on all columns anyway as an implementation detail). For the unique case the problem of many identical entries mentioned by Tom is not relevant, so the additional data will only bloat the index but not otherwise affect the index performance. Would this get close enough to index-covered table ? _That_ would be interesting - I have a really big table (table/index size: 370G/320G, ~8*10^9 rows) which is basically using double space because it's primary key is covering all columns of the table. Cheers, Csaba.
On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote: > Hi all, > > > On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david@justatheory.com> wrote: > > I don't see the virtue of this in this case. Since the index is not > > unique, why not just put the index on (a,b,c,d) and be done with it? > > Is there some advantage to be had in inventing a way to store c and d > > in the index without having them usable for indexing? > > Why not restrict it to UNIQUE indexes ? > > For not unique indexes it has no advantages (it could be in fact indexed > on all columns anyway as an implementation detail). > > For the unique case the problem of many identical entries mentioned by > Tom is not relevant, so the additional data will only bloat the index > but not otherwise affect the index performance. > > Would this get close enough to index-covered table ? _That_ would be > interesting - I have a really big table (table/index size: 370G/320G, > ~8*10^9 rows) which is basically using double space because it's primary > key is covering all columns of the table. I was wondering if there was some way to specify an expression index that did a unique index check on some columns but included more columns not part of the unique index. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Le vendredi 6 juillet 2012 15:41:01, Bruce Momjian a écrit : > On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote: > > Hi all, > > > > > On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler > > > <david@justatheory.com> wrote: I don't see the virtue of this in this > > > case. Since the index is not unique, why not just put the index on > > > (a,b,c,d) and be done with it? Is there some advantage to be had in > > > inventing a way to store c and d in the index without having them > > > usable for indexing? > > > > Why not restrict it to UNIQUE indexes ? > > > > For not unique indexes it has no advantages (it could be in fact indexed > > on all columns anyway as an implementation detail). > > > > For the unique case the problem of many identical entries mentioned by > > Tom is not relevant, so the additional data will only bloat the index > > but not otherwise affect the index performance. > > > > Would this get close enough to index-covered table ? _That_ would be > > interesting - I have a really big table (table/index size: 370G/320G, > > ~8*10^9 rows) which is basically using double space because it's primary > > key is covering all columns of the table. > > I was wondering if there was some way to specify an expression index > that did a unique index check on some columns but included more columns > not part of the unique index. I haven't tryed it, but I suppose that Exclusion Constraint should allow that. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Csaba Nagy <ncslists@googlemail.com> writes: > Why not restrict it to UNIQUE indexes ? What benefit would such a restriction provide? AFAICS it doesn't make implementation any easier. regards, tom lane
On 28 June 2012 13:16, David E. Wheeler <david@justatheory.com> wrote: > Very interesting design document for SQLite 4: > > http://www.sqlite.org/src4/doc/trunk/www/design.wiki > > I'm particularly intrigued by "covering indexes". For example: > > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); > > This allows the following query to do an index-only scan: > > SELECT c, d FROM table1 WHERE a=? AND b=?; > > Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, whereadditional, unindexed columns are stored alongside indexed columns. Just to be clear, the ability to have covered indexes is already in 9.2, I updated the release notes to explain that a few months back. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote: >> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, whereadditional, unindexed columns are stored alongside indexed columns. > > Just to be clear, the ability to have covered indexes is already in > 9.2, I updated the release notes to explain that a few months back. You mean this? > Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, TomLane) > > This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visibletuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementingthis feature. That’s not how SQLite is using the term “covering index.” What they mean is the ability to have additional, unindexed columnsin an index, so that they can *also* be returned in the event of an index-only scan. Best, David
On 17 July 2012 16:21, David E. Wheeler <david@justatheory.com> wrote: > On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote: > >>> Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, whereadditional, unindexed columns are stored alongside indexed columns. >> >> Just to be clear, the ability to have covered indexes is already in >> 9.2, I updated the release notes to explain that a few months back. > > You mean this? > >> Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas,Tom Lane) >> >> This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visibletuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementingthis feature. > > That’s not how SQLite is using the term “covering index.” What they mean is the ability to have additional, unindexed columnsin an index, so that they can *also* be returned in the event of an index-only scan. CREATE INDEX ON foo (a, b, c, d); allows SELECT c, d FROM foo WHERE a = ? AND b = ? to use an index only scan. The phrase "unindexed" seems misleading since the data is clearly in the index from the description on the URL you gave. And since the index is non-unique, I don't see any gap between Postgres and SQLliite4. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote: > CREATE INDEX ON foo (a, b, c, d); > > allows > > SELECT c, d FROM foo WHERE a = ? AND b = ? > > to use an index only scan. > > The phrase "unindexed" seems misleading since the data is clearly in > the index from the description on the URL you gave. And since the > index is non-unique, I don't see any gap between Postgres and > SQLliite4. Yeah, but that index is unnecessarily big if one will never use c or d in the search. The nice thing about covering indexesas described for SQLite 4 and implemented in MSSQL is that you can specify additional columns that just come alongfor the ride, but are not part of the indexed data: CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); Yes, you can do that by also indexing c and d as of 9.2, but it might be nice to be able to include them in the index asadditional row data without actually indexing them. Best, David
On 17 July 2012 16:54, David E. Wheeler <david@justatheory.com> wrote: > On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote: > >> CREATE INDEX ON foo (a, b, c, d); >> >> allows >> >> SELECT c, d FROM foo WHERE a = ? AND b = ? >> >> to use an index only scan. >> >> The phrase "unindexed" seems misleading since the data is clearly in >> the index from the description on the URL you gave. And since the >> index is non-unique, I don't see any gap between Postgres and >> SQLliite4. > > Yeah, but that index is unnecessarily big if one will never use c or d in the search. The nice thing about covering indexesas described for SQLite 4 and implemented in MSSQL is that you can specify additional columns that just come alongfor the ride, but are not part of the indexed data: > > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); > > Yes, you can do that by also indexing c and d as of 9.2, but it might be nice to be able to include them in the index asadditional row data without actually indexing them. Can you explain what you mean by "without actually indexing them"? ISTM that it is a non-feature if the index is already non-unique, and the difference is simply down to the amount of snake oil applied to the descriptive text on the release notes. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jul 17, 2012 at 6:08 PM, Simon Riggs <span dir="ltr"><<a href="mailto:simon@2ndquadrant.com" target="_blank">simon@2ndquadrant.com</a>></span>wrote:<br /><div class="gmail_quote"><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 17 July 201216:54, David E. Wheeler <<a href="mailto:david@justatheory.com">david@justatheory.com</a>> wrote:<br /> > Yeah,but that index is unnecessarily big if one will never use c or d in the search. The nice thing about covering indexesas described for SQLite 4 and implemented in MSSQL is that you can specify additional columns that just come alongfor the ride, but are not part of the indexed data:<br /> ><br /> > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);<br/> ><br /> > Yes, you can do that by also indexing c and d as of 9.2, but it might be nice to be ableto include them in the index as additional row data without actually indexing them.<br /><br /></div>Can you explainwhat you mean by "without actually indexing them"?<br /> ISTM that it is a non-feature if the index is already non-unique,and<br /> the difference is simply down to the amount of snake oil applied to<br /> the descriptive text on therelease notes.<br /></blockquote></div><br />It would be useful in non-unique indexes to store data without ordering operators(like xml).<br />
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of David E. Wheeler > Sent: Tuesday, July 17, 2012 11:55 AM > To: Simon Riggs > Cc: Pg Hackers > Subject: Re: [HACKERS] Covering Indexes > > On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote: > > > CREATE INDEX ON foo (a, b, c, d); > > > > allows > > > > SELECT c, d FROM foo WHERE a = ? AND b = ? > > > > to use an index only scan. > > > > The phrase "unindexed" seems misleading since the data is clearly in > > the index from the description on the URL you gave. And since the > > index is non-unique, I don't see any gap between Postgres and > > SQLliite4. > > Yeah, but that index is unnecessarily big if one will never use c or d in the > search. The nice thing about covering indexes as described for SQLite 4 and > implemented in MSSQL is that you can specify additional columns that just > come along for the ride, but are not part of the indexed data: > > CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); > > Yes, you can do that by also indexing c and d as of 9.2, but it might be nice to > be able to include them in the index as additional row data without actually > indexing them. > > Best, > > David Concretely, I would presume that the contents of a covering index could then look like the following (a,b,c,d): (2,1,2,A) (2,1,5,A) <-- the 5 is out of natural order but exists in the "covering" part (2,1,3,A) Whereas PostgreSQL would be forced to have the index ordered as such: (2,1,2,A) (2,1,3,A) (2,1,5,A) Either way the data in "c" and "d" are IN THE INDEX otherwise in neither case could the data values be returned while strictly querying the index. So the question that needs to be asked is what kind of performance increase can be had during DML (insert/update) statements and whether those gains are worth pursuing. Since these other engines appear to allow both cases you should be able to get at least a partial idea of the performance gains between "index (a,b,c,d)" and "index (a,b) covering (c,d)". Vik's concurrent point regarding "non-indexable" values makes some sense but the use case there seems specialized as I suspect that in the general case values that are non-indexable (if there truly are any) are generally those that would be too large to warrant sticking into an index in the first place. But, XML values do ring true in my mind (particularly frequently used fragments that are generally quite small). But again whether that is a reasonable use case for a covering index I do not know. It feels like trying to solve the remaining 10% when it took a long while to even muster up enough support and resources to solve the 90%. David J.
On 17 July 2012 17:41, David Johnston <polobo@yahoo.com> wrote: > Concretely, I would presume that the contents of a covering index could then > look like the following (a,b,c,d): > > (2,1,2,A) > (2,1,5,A) <-- the 5 is out of natural order but exists in the "covering" > part > (2,1,3,A) > > Whereas PostgreSQL would be forced to have the index ordered as such: > > (2,1,2,A) > (2,1,3,A) > (2,1,5,A) > > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither > case could the data values be returned while strictly querying the index. > > So the question that needs to be asked is what kind of performance increase > can be had during DML (insert/update) statements and whether those gains are > worth pursuing. Since these other engines appear to allow both cases you > should be able to get at least a partial idea of the performance gains > between "index (a,b,c,d)" and "index (a,b) covering (c,d)". There is a use case, already discussed, whereby that is useful for create unique index on foo (a,b) covering (c,d) but there really isn't any functional difference between create index on foo (a,b) covering (c,d) and create index on foo (a,b,c,d) There is a potential performance impact. But as Tom says, that might even be negative if it is actually measurable. > Vik's concurrent point regarding "non-indexable" values makes some sense but > the use case there seems specialized as I suspect that in the general case > values that are non-indexable (if there truly are any) are generally those > that would be too large to warrant sticking into an index in the first > place. I think it would be easy enough to add noop operators for sorts if you wanted to do that. > But, XML values do ring true in my mind (particularly frequently > used fragments that are generally quite small). But again whether that is a > reasonable use case for a covering index I do not know. It feels like > trying to solve the remaining 10% when it took a long while to even muster > up enough support and resources to solve the 90%. The main thing is that we definitely already do have covering indexes and we will be announcing we have that soon. The fact we have chosen to implement that without adding new syntax strikes me as a selling point as well, so all client tools still work. So the feature we are talking about here needs to be called something else, otherwise we will be confusing people. "Unsorted trailing index columns"... -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 07/17/2012 12:41 PM, David Johnston wrote: >> >> So the question that needs to be asked is what kind of performance increase >> can be had during DML (insert/update) statements and whether those gains are >> worth pursuing. Since these other engines appear to allow both cases you >> should be able to get at least a partial idea of the performance gains >> between "index (a,b,c,d)" and "index (a,b) covering (c,d)". Tom's recent answer to me on this point (as I understood it) was that he would expect performance to degrade, not improve, since the btree code is known not to perform well when there are many non-unique values. cheers andrew
"David E. Wheeler" <david@justatheory.com> <CA+U5nMJz33ZsvqPzK-AUoindxkQ6eLiP1HgQ53byoDLpwfDWUA@mail.gmail.com> writes: > On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote: >> The phrase "unindexed" seems misleading since the data is clearly in >> the index from the description on the URL you gave. And since the >> index is non-unique, I don't see any gap between Postgres and >> SQLliite4. > Yeah, but that index is unnecessarily big if one will never use c or d > in the search. The data would still have to be stored in the leaf entries, at least. Yeah, you could possibly omit the "unindexed columns" from upper tree levels, but with typical btree fanout ratios in the hundreds, the overall space savings would be negligible. The idea of different index tuple descriptors on different tree levels doesn't appeal to me, either. regards, tom lane
On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote: > > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither > > case could the data values be returned while strictly querying the index. > > > > So the question that needs to be asked is what kind of performance increase > > can be had during DML (insert/update) statements and whether those gains are > > worth pursuing. Since these other engines appear to allow both cases you > > should be able to get at least a partial idea of the performance gains > > between "index (a,b,c,d)" and "index (a,b) covering (c,d)". > > There is a use case, already discussed, whereby that is useful for > create unique index on foo (a,b) covering (c,d) > > but there really isn't any functional difference between > create index on foo (a,b) covering (c,d) > > and > create index on foo (a,b,c,d) > > There is a potential performance impact. But as Tom says, that might > even be negative if it is actually measurable. So, do we want a TODO item about adding columns to a unique index that will not be used for uniqueness checks? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 26, 2012 at 11:13 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote: >> > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither >> > case could the data values be returned while strictly querying the index. >> > >> > So the question that needs to be asked is what kind of performance increase >> > can be had during DML (insert/update) statements and whether those gains are >> > worth pursuing. Since these other engines appear to allow both cases you >> > should be able to get at least a partial idea of the performance gains >> > between "index (a,b,c,d)" and "index (a,b) covering (c,d)". >> >> There is a use case, already discussed, whereby that is useful for >> create unique index on foo (a,b) covering (c,d) >> >> but there really isn't any functional difference between >> create index on foo (a,b) covering (c,d) >> >> and >> create index on foo (a,b,c,d) >> >> There is a potential performance impact. But as Tom says, that might >> even be negative if it is actually measurable. > > So, do we want a TODO item about adding columns to a unique index that > will not be used for uniqueness checks? I think so. The case where you want a wide multiple column primary key to be extended to cover that one extra commonly grabbed value is not super common but entirely plausible. With the existing infrastructure to get the advantages of index covering you'd have to duplicate the entire index for the extra field which really sucks: aside from the huge waste of time and space, you force the planner to play the 'which index do i use?' game. merlin
On Thu, Jul 26, 2012 at 12:25 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Jul 26, 2012 at 11:13 AM, Bruce Momjian <bruce@momjian.us> wrote: >> On Tue, Jul 17, 2012 at 06:00:37PM +0100, Simon Riggs wrote: >>> > Either way the data in "c" and "d" are IN THE INDEX otherwise in neither >>> > case could the data values be returned while strictly querying the index. >>> > >>> > So the question that needs to be asked is what kind of performance increase >>> > can be had during DML (insert/update) statements and whether those gains are >>> > worth pursuing. Since these other engines appear to allow both cases you >>> > should be able to get at least a partial idea of the performance gains >>> > between "index (a,b,c,d)" and "index (a,b) covering (c,d)". >>> >>> There is a use case, already discussed, whereby that is useful for >>> create unique index on foo (a,b) covering (c,d) >>> >>> but there really isn't any functional difference between >>> create index on foo (a,b) covering (c,d) >>> >>> and >>> create index on foo (a,b,c,d) >>> >>> There is a potential performance impact. But as Tom says, that might >>> even be negative if it is actually measurable. >> >> So, do we want a TODO item about adding columns to a unique index that >> will not be used for uniqueness checks? > > I think so. The case where you want a wide multiple column primary > key to be extended to cover that one extra commonly grabbed value is > not super common but entirely plausible. With the existing > infrastructure to get the advantages of index covering you'd have to > duplicate the entire index for the extra field which really sucks: > aside from the huge waste of time and space, you force the planner to > play the 'which index do i use?' game. I think it is going to take several years before we really understand how index-only scans play out in the real world, and what factors limit their usefulness. This one has come up a few times because it's sort of an obvious thing to want to do and we don't have it, but I think that there's room for some skepticism about how well it will actually work, for reasons that have already been mentioned, and of course also because indexing more columns potentially means defeating HOT, which I suspect will defeat many otherwise-promising applications of index-only scans. That having been said, it would be unwise to speculate too much in advance of the data, and we're not going to get any data until someone tries implementing it, so +1 from me for putting something on the TODO list. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 26, 2012 at 12:17 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> I think so. The case where you want a wide multiple column primary >> key to be extended to cover that one extra commonly grabbed value is >> not super common but entirely plausible. With the existing >> infrastructure to get the advantages of index covering you'd have to >> duplicate the entire index for the extra field which really sucks: >> aside from the huge waste of time and space, you force the planner to >> play the 'which index do i use?' game. > > I think it is going to take several years before we really understand > how index-only scans play out in the real world, and what factors > limit their usefulness. This one has come up a few times because it's > sort of an obvious thing to want to do and we don't have it, but I > think that there's room for some skepticism about how well it will > actually work, for reasons that have already been mentioned, and of > course also because indexing more columns potentially means defeating > HOT, which I suspect will defeat many otherwise-promising applications > of index-only scans. Sure. many will still get to use them though: I'm doing tons of OLAP/BI lately: wide keys, minimal to no updating, minimal to no RI, andvery large tables (often clustered and partitioned), and extreme performance requirements. Covering indexes to me is basically a drop in feature and COVERING seems to make a lot of sense on paper. (I absolutely can't wait to get 9.2 on some of our bigger servers here). merlin
On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote: > So, do we want a TODO item about adding columns to a unique index that > will not be used for uniqueness checks? -1 from me, at least in its current form. At it's heart, this is about separating the constraint from the index that enforces it -- you'd like the columns to be available for querying (for index only scans or otherwise), but not to take part in the constraint. And when you look at it from that perspective, this proposal is and extremely limited form. You can't, for example, decide that an existing index can be used for a new unique constraint. That's a lot more plausible than the use cases mentioned in this thread as far as I can see, but this proposal can't do that. I tried proposing a more general use case when developing exclusion constraints: http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis (allow user to specify multiple constraints enforced by one existing index). But, at least at the time, my proposal didn't pass the usefulness test: http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php even though my proposal was strictly more powerful than this one is. Also, this proposal extends the weird differences between CREATE UNIQUE INDEX and a the declaration of a UNIQUE constraint. For example, if you want DEFERRABLE you need to declare the constraint, but if you want to use an expression (rather than a simple column reference) you need to create the index. This problem does not exist with exclusion constraints. In my opinion, new innovations in unique constraints would be better served as part of exclusion constraints, and we should keep unique constraints simple. If we make an improvement to UNIQUE, then we will want to do similar things for exclusion constraints anyway, so it just seems duplicative. Regards,Jeff Davis
On Fri, Jul 27, 2012 at 12:24 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote: >> So, do we want a TODO item about adding columns to a unique index that >> will not be used for uniqueness checks? > > -1 from me, at least in its current form. > > At it's heart, this is about separating the constraint from the index > that enforces it -- you'd like the columns to be available for querying > (for index only scans or otherwise), but not to take part in the > constraint. > > And when you look at it from that perspective, this proposal is and > extremely limited form. You can't, for example, decide that an existing > index can be used for a new unique constraint. That's a lot more > plausible than the use cases mentioned in this thread as far as I can > see, but this proposal can't do that. > > I tried proposing a more general use case when developing exclusion > constraints: > > http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis > > (allow user to specify multiple constraints enforced by one existing > index). But, at least at the time, my proposal didn't pass the > usefulness test: > > http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php > > even though my proposal was strictly more powerful than this one is. > > Also, this proposal extends the weird differences between CREATE UNIQUE > INDEX and a the declaration of a UNIQUE constraint. For example, if you > want DEFERRABLE you need to declare the constraint, but if you want to > use an expression (rather than a simple column reference) you need to > create the index. This problem does not exist with exclusion > constraints. > > In my opinion, new innovations in unique constraints would be better > served as part of exclusion constraints, and we should keep unique > constraints simple. If we make an improvement to UNIQUE, then we will > want to do similar things for exclusion constraints anyway, so it just > seems duplicative. Well, you're right. The exclusion constraint syntax is amazingly general (if somewhat arcane) and it would be neat to be extended like that. It already decouples you from physical assumptions on the index. For example, it creates a two field index for a double field btree equality exclusion and does a runtime, not equality based uniqueness check. The covering index/uniqueness use case adds legitimacy to the INDEX clause of exclusion constraints IMNSHO. One point of concern though is that (following a bit of testing) alter table foo add exclude using btree (id with =); ...is always strictly slower for inserts than alter table foo add primary key(id); This is probably because it doesn't use the low level btree based uniqueness check (the index is not declared UNIQUE) -- shouldn't it do that if it can? merlin
On Fri, 2012-07-27 at 15:27 -0500, Merlin Moncure wrote: > The covering index/uniqueness use case adds > legitimacy to the INDEX clause of exclusion constraints IMNSHO. Yes, I think it would be worth revisiting the idea. > One point of concern though is that (following a bit of testing) > alter table foo add exclude using btree (id with =); > ...is always strictly slower for inserts than > alter table foo add primary key(id); Yes, in my worst-case tests there is about a 2X difference for building the constraint and about a 30-50% slowdown during INSERT (I thought I remembered the difference being lower, but I didn't dig up my old test). That's for an in-memory case, I would expect disk I/O to make the difference less apparent. > This is probably because it doesn't use the low level btree based > uniqueness check (the index is not declared UNIQUE) -- shouldn't it do > that if it can? We could probably detect that the operator being used is the btree equality operator, set the unique property of the btree, and avoid the normal exclusion constraint check. I'm sure there are some details to work out, but if we start collecting more use cases where people want the flexibility of exclusion constraints and the speed of UNIQUE, we should look into it. Regards,Jeff Davis
On Fri, Jul 27, 2012 at 1:27 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > > One point of concern though is that (following a bit of testing) > alter table foo add exclude using btree (id with =); > ...is always strictly slower for inserts than > alter table foo add primary key(id); > > This is probably because it doesn't use the low level btree based > uniqueness check (the index is not declared UNIQUE) -- shouldn't it do > that if it can? If it did that, then than would make it faster in precisely those cases were I wouldn't use it in the first place--where there is a less esoteric alternative that does exactly the same thing. While that is not something without value, it would seem better (although potentially more difficult of course) to just make it faster in general, instead. I didn't look into the creation, but rather into inserts. During inserts, it looks like it is doing a look up into the btree twice, presumably once to maintain it, and once to check for uniqueness. If there was some way to cache the look-up between those, I think it would go a long way towards eliminating the performance difference. Could that be done without losing the generality? And, does it matter? I would think covering indexes would be deployed to best effect when your data is not cached in RAM, in which case the IO cost common to both paths probably overwhelms any extra CPU cost. Cheers, Jeff
On Thu, 2012-07-26 at 12:13 -0400, Bruce Momjian wrote: > So, do we want a TODO item about adding columns to a unique index that > will not be used for uniqueness checks? -1 from me, at least in its current form. At it's heart, this is about separating the constraint from the index that enforces it -- you'd like the columns to be available for querying (for index only scans or otherwise), but not to take part in the constraint. And when you look at it from that perspective, this proposal is and extremely limited form. You can't, for example, decide that an existing index can be used for a new unique constraint. That's a lot more plausible than the use cases mentioned in this thread as far as I can see, but this proposal can't do that. I tried proposing a more general use case when developing exclusion constraints: http://archives.postgresql.org/message-id/1253466074.6983.22.camel@jdavis (allow user to specify multiple constraints enforced by one existing index). But, at least at the time, my proposal didn't pass the usefulness test: http://archives.postgresql.org/pgsql-hackers/2009-09/msg01355.php even though my proposal was strictly more powerful than this one is. Also, this proposal extends the weird differences between CREATE UNIQUE INDEX and a the declaration of a UNIQUE constraint. For example, if you want DEFERRABLE you need to declare the constraint, but if you want to use an expression (rather than a simple column reference) you need to create the index. This problem does not exist with exclusion constraints. In my opinion, new innovations in unique constraints would be better served as part of exclusion constraints, and we should keep unique constraints simple. If we make an improvement to UNIQUE, then we will want to do similar things for exclusion constraints anyway, so it just seems duplicative. Regards,Jeff Davis
* Jeff Janes: > I don't see the virtue of this in this case. Since the index is not > unique, why not just put the index on (a,b,c,d) and be done with it? AFAICT, SQLite 4 encodes keys in a way that is not easily reversed (although the encoding is injective, so it's reversible in principle). Therefore, covered columns need to be listed separately, so they are not subject to the key encoding.