Thread: Better estimates of index correlation
Currently, we don't measure any statistics about the ordering correlation of multi-column indexes, which means that btcostestimate has to pick a number out of the air if there's more than one column. We've been around on that at least once already: it used to use the first column's correlation divided by the number of columns, and now uses the first column's correlation times 0.75, and neither of those approaches has any theoretical basis whatsoever. I got started thinking about this again in connection with John Surcombe's recent complaint on pgsql-performance, which seems likely to be related to having a good correlation estimate for one index but not another.

It strikes me that it'd be possible to have btvacuumcleanup directly measure order correlation when it's processing a btree index, yielding a reliable answer for any btree index regardless of number of columns. We could do that by comparing the heap block numbers of adjacent index entries' TIDs and counting the number of up-transitions (block number larger than previous index entry) versus down-transitions (block number smaller than previous). Since btvacuumcleanup scans the index in physical order, not logical order, we could not meaningfully make comparisons between the last entry on an index page and the first entry on the next; but I think it's quite reasonable to assume that the statistics of those comparisons would be the same as the statistics of the intra-page comparisons.

So at the end of the cleanup scan we would have total numbers of up-transitions and down-transitions. It's clear that a perfectly correlated index would have only up-transitions and no down-transitions; a perfectly reverse-correlated index would have the opposite; and an uncorrelated index would have about equal numbers of them. So we could approximate the index correlation with something like

	(up_transition_count - down_transition_count) / (up_transition_count + down_transition_count)

My statistics are a bit rusty so I'm not sure if we need a square or square root or something in there, but it seems like this would work, and would add only a negligible amount of overhead. We could then have VACUUM save the number into pg_statistic or pg_index, and away we go.

I'm not planning to do anything about this idea right now, since I'm still hip-deep in collations, but I thought I'd throw it out to get it on the record.

Comments?

			regards, tom lane
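To make the arithmetic concrete, here is a minimal standalone sketch of the transition-counting estimator (illustrative C only, not actual PostgreSQL code; the function name and the in-memory array of heap block numbers are assumptions for demonstration):

/*
 * Minimal sketch of the transition-counting estimator: given the heap
 * block numbers of index entries in index order, count up- and
 * down-transitions and derive a correlation-like figure in [-1, 1].
 */
#include <stdio.h>
#include <stdint.h>

static double
transition_correlation(const uint32_t *heap_blocks, int nentries)
{
    long ups = 0, downs = 0;

    for (int i = 1; i < nentries; i++)
    {
        if (heap_blocks[i] > heap_blocks[i - 1])
            ups++;
        else if (heap_blocks[i] < heap_blocks[i - 1])
            downs++;
        /* equal block numbers contribute to neither count */
    }

    if (ups + downs == 0)
        return 0.0;             /* no information; treat as uncorrelated */

    return (double) (ups - downs) / (double) (ups + downs);
}

int
main(void)
{
    uint32_t correlated[] = {1, 2, 3, 4, 5, 6, 7, 8};   /* expect +1.0 */
    uint32_t reversed[]   = {8, 7, 6, 5, 4, 3, 2, 1};   /* expect -1.0 */
    uint32_t shuffled[]   = {5, 1, 7, 3, 8, 2, 6, 4};   /* expect near 0 */

    printf("correlated: %+.2f\n", transition_correlation(correlated, 8));
    printf("reversed:   %+.2f\n", transition_correlation(reversed, 8));
    printf("shuffled:   %+.2f\n", transition_correlation(shuffled, 8));
    return 0;
}

With this example data it prints +1.00, -1.00, and about -0.14, matching the intuition that roughly equal up- and down-counts signal an uncorrelated index.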
On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
> I'm not planning to do anything about this idea right now, since I'm
> still hip-deep in collations, but I thought I'd throw it out to get
> it on the record.
>
> Comments?

One question: Where is the overhead increase?

JD

>
> 			regards, tom lane
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011:
> On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
>
> > I'm not planning to do anything about this idea right now, since I'm
> > still hip-deep in collations, but I thought I'd throw it out to get
> > it on the record.
> >
> > Comments?
>
> One question: Where is the overhead increase?

During VACUUM, in the pass that processes indexes.

I think Tom is slightly confused though: AFAICT this must happen in btvacuumscan (which does the actual scan), not btvacuumcleanup (which may not do it, if btbulkdelete did it previously). Which means it would be done for each pass over the index when vacuuming a relation, because I don't see any way for this function to determine whether this is the last pass we'll do over the index.

It sure would be nice to be able to do it only during the last scan.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 14.03.2011 16:09, Alvaro Herrera wrote:
> Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011:
>> On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
>>
>>> I'm not planning to do anything about this idea right now, since I'm
>>> still hip-deep in collations, but I thought I'd throw it out to get
>>> it on the record.
>>>
>>> Comments?
>>
>> One question: Where is the overhead increase?
>
> During VACUUM, in the pass that processes indexes.
>
> I think Tom is slightly confused though: AFAICT this must happen in
> btvacuumscan (which does the actual scan), not btvacuumcleanup (which
> may not do it, if btbulkdelete did it previously). Which means it would
> be done for each pass over the index when vacuuming a relation, because
> I don't see any way for this function to determine whether this is the
> last pass we'll do over the index.
>
> It sure would be nice to be able to do it only during the last scan.

Can't we do it at ANALYZE? If the estimate is only based on intra-page comparisons anyway, a sample of random pages ought to be enough.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
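A rough sketch of that sampling variant, under the assumption that ANALYZE could read a random subset of leaf pages and count only intra-page transitions (the SampledPage struct and everything else here is hypothetical, not PostgreSQL code):

/*
 * Sketch of the ANALYZE-time idea: visit a random sample of leaf pages
 * and accumulate only intra-page transitions, never comparing across a
 * page boundary.
 */
#include <stdio.h>
#include <stdint.h>

typedef struct
{
    int      nentries;
    uint32_t heap_blocks[8];    /* heap block of each entry, index order */
} SampledPage;

int
main(void)
{
    /* two toy "sampled pages"; real code would pick pages at random */
    SampledPage pages[] = {
        { 4, {10, 11, 11, 12} },    /* well correlated */
        { 4, {57, 3, 88, 20} },     /* poorly correlated */
    };
    long ups = 0, downs = 0;

    for (int p = 0; p < 2; p++)
        for (int i = 1; i < pages[p].nentries; i++)
        {
            /* note: no comparison between the last entry of one page
             * and the first entry of the next */
            if (pages[p].heap_blocks[i] > pages[p].heap_blocks[i - 1])
                ups++;
            else if (pages[p].heap_blocks[i] < pages[p].heap_blocks[i - 1])
                downs++;
        }

    printf("sampled correlation estimate: %.2f\n",
           (ups + downs) ? (double) (ups - downs) / (ups + downs) : 0.0);
    return 0;
}

Whether a small random sample is representative enough is exactly the question Tom raises below.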
On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Joshua D. Drake's message of dom mar 13 23:20:01 -0300 2011:
>> On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
>>
>> > I'm not planning to do anything about this idea right now, since I'm
>> > still hip-deep in collations, but I thought I'd throw it out to get
>> > it on the record.
>> >
>> > Comments?
>>
>> One question: Where is the overhead increase?
>
> During VACUUM, in the pass that processes indexes.
>
> I think Tom is slightly confused though: AFAICT this must happen in
> btvacuumscan (which does the actual scan), not btvacuumcleanup (which
> may not do it, if btbulkdelete did it previously). Which means it would
> be done for each pass over the index when vacuuming a relation, because
> I don't see any way for this function to determine whether this is the
> last pass we'll do over the index.
>
> It sure would be nice to be able to do it only during the last scan.

Does it really matter? What Tom was describing sounded embarrassingly cheap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011:
> On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>
> > It sure would be nice to be able to do it only during the last scan.
>
> Does it really matter? What Tom was describing sounded embarrassingly cheap.

Well, you only do multiple passes for tables that are really large, so it's precisely there that you want to save the extra overhead of having to do it multiple times.

As Heikki says, maybe this wouldn't be an issue at all if we can do it during ANALYZE instead, but I don't know if that works.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Mar 14, 2011 at 10:25 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011:
>> On Mon, Mar 14, 2011 at 10:09 AM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>
>> > It sure would be nice to be able to do it only during the last scan.
>>
>> Does it really matter? What Tom was describing sounded embarrassingly cheap.
>
> Well, you only do multiple passes for tables that are really large, so
> it's precisely there that you want to save the extra overhead of having
> to do it multiple times.

Right, but if the overhead is 0.0001% then who cares? I have no idea what the real number is, but it doesn't seem unrealistic to me that it could be on that order.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Robert Haas's message of lun mar 14 11:18:24 -0300 2011:
>> Does it really matter? What Tom was describing sounded embarrassingly cheap.

That was my thought exactly. If you could even measure the added cost of doing that, I'd be astonished. It'd be adding one comparison-and-possible-assignment to a loop that also has to invoke a binary search of a TID array --- a very large array, in the cases we're worried about. I'd put the actual update of pg_statistic somewhere where it only happened once, but I don't especially care if the stat gets computed on each index scan.

> As Heikki says, maybe this wouldn't be an issue at all if we can do it
> during ANALYZE instead, but I don't know if that works.

I'm not convinced you can get a sufficiently good estimate from a small subset of pages. I actually started with the idea of having ANALYZE try to calculate correlation for multi-column indexes the same way it now calculates it for individual data columns, but when this idea occurred to me it just seemed a whole lot better. Note that we could remove the correlation calculations from ANALYZE altogether.

			regards, tom lane
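That claim is easy to sanity-check with a standalone toy (not PostgreSQL code; the array sizes and the flat 64-bit TID encoding are assumptions for illustration): per entry, the loop already pays for a binary search of a large sorted dead-TID array, and the proposal adds one comparison and at most one increment.

/*
 * Toy model of the per-entry work in the vacuum scan loop.  With random
 * probes the dead-TID lookups almost never hit; the point is the cost
 * of the search relative to the one extra comparison per entry.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define NDEAD   1000000
#define NENTRY  5000000

static int
cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *) a, y = *(const uint64_t *) b;
    return (x > y) - (x < y);
}

int
main(void)
{
    uint64_t *dead = malloc(NDEAD * sizeof(uint64_t));
    for (int i = 0; i < NDEAD; i++)
        dead[i] = ((uint64_t) rand() << 32) | rand();
    qsort(dead, NDEAD, sizeof(uint64_t), cmp_u64);

    long ups = 0, downs = 0, hits = 0;
    uint32_t prev_blk = 0;
    clock_t t0 = clock();

    for (int i = 0; i < NENTRY; i++)
    {
        uint64_t tid = ((uint64_t) rand() << 32) | rand();
        uint32_t blk = (uint32_t) (tid >> 32);

        /* the work the loop already does: search the dead-TID array */
        if (bsearch(&tid, dead, NDEAD, sizeof(uint64_t), cmp_u64))
            hits++;

        /* the proposed addition: one comparison, maybe an increment */
        if (i > 0)
        {
            if (blk > prev_blk)
                ups++;
            else if (blk < prev_blk)
                downs++;
        }
        prev_blk = blk;
    }

    printf("%.2fs, hits=%ld, correlation=%.3f\n",
           (double) (clock() - t0) / CLOCKS_PER_SEC, hits,
           (double) (ups - downs) / (double) (ups + downs));
    free(dead);
    return 0;
}

With a million-entry dead-TID array each probe costs about twenty comparisons, many of them cache misses, so the extra branch should be unmeasurable, which is Tom's point.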
On Mon, Mar 14, 2011 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Note that we could remove the correlation
> calculations from ANALYZE altogether.

Only if you don't mind having them only get updated when somebody vacuums. If a table is mostly getting inserted into, it may not get vacuumed very often (or possibly even never).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Mar 14, 2011 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Note that we could remove the correlation
>> calculations from ANALYZE altogether.

> Only if you don't mind having them only get updated when somebody
> vacuums. If a table is mostly getting inserted into, it may not get
> vacuumed very often (or possibly even never).

No, I don't mind that. Index correlation is a pretty second-order stat, and most of the time it'll be just fine if the estimate stays at the default zero. The situation that's problematic is where you have a very-far-from-zero number for one index and no estimate for another, because that can incorrectly bias the planner to prefer the first index; which is what I think is happening in Surcombe's case. The approach I'm describing would guarantee that all indexes of a table get their correlation estimates updated at the same time.

(Memo to self: we'd also want btbuild to compute this stat, so that a newly created index doesn't start out at a disadvantage compared to others.)

[ thinks for a bit... ] Although there *is* a small practical problem with having VACUUM update pg_statistic: a plain VACUUM never takes out an XID. I guess we could have it do the update nontransactionally, the same way it updates relpages/reltuples.

			regards, tom lane
>> As Heikki says, maybe this wouldn't be an issue at all if we can do it
>> during ANALYZE instead, but I don't know if that works.
>
> I'm not convinced you can get a sufficiently good estimate from a small
> subset of pages.

Note that if this requires VACUUM rather than ANALYZE, it introduces a problem for data warehousing users, who can go years between vacuums of their largest tables.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
>> I'm not convinced you can get a sufficiently good estimate from a small
>> subset of pages.

> Note that if this requires VACUUM rather than ANALYZE, it introduces a
> problem for data warehousing users, who can go years between vacuums of
> their largest tables.

It's likely that the default estimate of zero index correlation will work just fine for such users ...

			regards, tom lane
On 3/14/11 5:51 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>>> I'm not convinced you can get a sufficiently good estimate from a small
>>> subset of pages.
>
>> Note that if this requires VACUUM rather than ANALYZE, it introduces a
>> problem for data warehousing users, who can go years between vacuums of
>> their largest tables.
>
> It's likely that the default estimate of zero index correlation will
> work just fine for such users ...

No, it isn't. In fact, users with very large tables are the ones hardest hit by our lack of correlation stats.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, Mar 15, 2011 at 12:27 AM, Josh Berkus <josh@agliodbs.com> wrote:
> Note that if this requires VACUUM rather than ANALYZE, it introduces a
> problem for data warehousing users, who can go years between vacuums of
> their largest tables.

I don't understand: are they going years between vacuums because their data is static? In which case the index correlation won't change. Or is it append-only, in which case I suspect the newly appended data is likely to have the same correlation as the old data. But is there anything stopping us from doing some sort of ANALYZE-style sample of the index pages as well?

I think the bigger problems here are that a) correlation isn't actually a useful statistic for estimating random seeks, and b) I'm not sure what counting back-transitions has to do with correlation.

If we're lucky, it may be that counting back-transitions is a more useful stat than correlation anyway. It does seem to have more to do with random seeks than correlation does. It might need some refinement, though, or some other metrics to go along with it, to give a real basis for an estimate of random seeks.

I'm wondering how far the back-transitions and forward transitions actually go. If you're skipping every other block, that's twice as much I/O as reading every block; but if you skip n blocks where n > random_page_cost/seq_page_cost, then that's one random read per block. If you skip 1 block backwards to a block you've already read, that's free; but if you skip backwards to a block that isn't recently referenced, that's a random seek.

There are also niggling questions about taking statistics based on tuples that are already dead or were never committed.

--
greg
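One way to make that refinement concrete, as a sketch under Greg's assumptions rather than anything proposed in the thread: classify each transition by its skip distance and charge short forward skips as near-sequential reads, long or backward jumps as random seeks.

/*
 * Sketch of a skip-distance-aware fetch-cost estimate (hypothetical,
 * not PostgreSQL code).  Short forward skips cost as much as reading
 * through the gap; anything longer or backward costs a random seek.
 */
#include <stdio.h>
#include <stdint.h>

static double
fetch_cost(const uint32_t *blocks, int n,
           double seq_page_cost, double random_page_cost)
{
    double cost = 0.0;

    for (int i = 1; i < n; i++)
    {
        long skip = (long) blocks[i] - (long) blocks[i - 1];

        if (skip == 0)
            continue;                       /* same block: assume cached */
        if (skip == 1)
            cost += seq_page_cost;          /* next block: sequential */
        else if (skip > 1 &&
                 skip < random_page_cost / seq_page_cost)
            cost += skip * seq_page_cost;   /* cheaper to read through the gap */
        else
            cost += random_page_cost;       /* long or backward jump: a seek */
    }
    return cost;
}

int
main(void)
{
    uint32_t sorted[]      = {1, 2, 3, 4, 5, 6, 7, 8};
    uint32_t alternating[] = {1, 3, 5, 7, 2, 4, 6, 8};

    printf("sorted:      %.1f\n", fetch_cost(sorted, 8, 1.0, 4.0));
    printf("alternating: %.1f\n", fetch_cost(alternating, 8, 1.0, 4.0));
    return 0;
}

With seq_page_cost = 1.0 and random_page_cost = 4.0, the alternating pattern costs about twice the sorted one (16.0 vs 7.0 here), matching the "skipping every other block is twice the I/O" intuition, even though its transition-count estimate is still strongly positive (+0.71): exactly Greg's point that the single correlation number under-counts the seeks.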
> I don't understand: are they going years between vacuums because their
> data is static? In which case the index correlation won't change. Or
> is it append-only, in which case I suspect the newly appended data is
> likely to have the same correlation as the old data.

Append-only. And yes, one could assume that correlation wouldn't change frequently. However, it may change more frequently than vacuums occur -- I'm not exaggerating about "years". I have several clients with large databases where they have log tables which only get vacuumed for XID wraparound, once every 2 years or so.

There's also the question of how we get correlation stats for a new index/table, or one which has just been upgraded. Requiring a full DB vacuum isn't practical for those using pg_upgrade.

> But is there
> anything stopping us from doing some sort of ANALYZE-style sample of
> the index pages as well?

This would be ideal. Or even a separate command to scan the indexes only to collect correlation data. Since the indexes are usually 20X to 100X smaller than the tables, it may be practical to full-scan them even if we can't do the same with the table.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Sun, 2011-03-13 at 19:40 -0400, Tom Lane wrote:
> It strikes me that it'd be possible to have btvacuumcleanup directly
> measure order correlation when it's processing a btree index, yielding a
> reliable answer for any btree index regardless of number of columns.
> We could do that by comparing the heap block numbers of adjacent
> index entries' TIDs and counting the number of up-transitions (block
> number larger than previous index entry) versus down-transitions (block
> number smaller than previous).

Link to previous discussion:

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01279.php

Regards,
	Jeff Davis