Thread: Partial vacuum versus pg_class.reltuples
I complained a couple days ago that in HEAD, vacuum is putting very bogus values into pg_class.reltuples for indexes:
http://archives.postgresql.org/pgsql-bugs/2009-06/msg00037.php

After looking through the code a bit, I've confirmed my prior guess that this is caused by the partial-vacuum patch. The number that's being used is the number of live tuples found on the pages that were actually scanned. There's a kluge in there to keep from storing this number as the relation's own reltuples, but the implications for indexes were not considered. The index AMs generally assume that what they're told in IndexVacuumInfo.num_heap_tuples can be trusted, and that's not true anymore.

I think probably the only workable solution for 8.4 is to use the prior value of the relation's reltuples field as num_heap_tuples when we have not scanned the whole heap. This will effectively mean that index reltuples values don't change either in a partial vacuum.

In the longer term, we need to do something else. As the code now stands, reltuples is close to being completely untrustworthy: it will get updated only by VACUUM FULL or anti-wraparound vacuums, neither of which will happen often in a well-run installation. So the value will inevitably diverge from reality, perhaps arbitrarily far from reality. I wonder whether we shouldn't get rid of it altogether (and relpages too), and make the planner look to the counts maintained by the stats collector instead of using reltuples/relpages. The main objection I can think of to that is that turning off stats collection will no longer be a viable option ... but how many people do that anyway?

Another interesting question is why successive vacuums aren't causing the index reltuples counts to go to zero. Shouldn't a partial vacuum result in *all* pages of the relation being marked as not needing to be examined by the next vacuum?

            regards, tom lane
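A minimal standalone sketch of the rule Tom proposes above, in C: when the heap was only partially scanned, hand the index AMs the prior pg_class.reltuples value instead of the count taken from the scanned pages. The structs and field names here are illustrative stand-ins, not the real vacuumlazy.c data structures.

#include <stdio.h>
#include <stdbool.h>

/* Toy model of the relevant pieces; these are NOT the real PostgreSQL
 * structs, just enough state to illustrate the proposed rule. */
typedef struct
{
    double reltuples;               /* prior pg_class.reltuples for the heap */
} RelStats;

typedef struct
{
    bool   scanned_all;             /* did this vacuum look at every heap page? */
    double tuples_on_scanned_pages; /* live tuples counted on the scanned pages */
} VacuumResult;

/* Value to hand to the index AMs as IndexVacuumInfo.num_heap_tuples */
static double
num_heap_tuples_for_indexes(const RelStats *rel, const VacuumResult *vac)
{
    if (vac->scanned_all)
        return vac->tuples_on_scanned_pages;  /* full scan: count is trustworthy */
    return rel->reltuples;                    /* partial scan: keep prior estimate */
}

int
main(void)
{
    RelStats     rel     = { .reltuples = 1000000 };
    VacuumResult partial = { .scanned_all = false, .tuples_on_scanned_pages = 1234 };
    VacuumResult full    = { .scanned_all = true,  .tuples_on_scanned_pages = 998765 };

    printf("partial vacuum -> %.0f\n", num_heap_tuples_for_indexes(&rel, &partial));
    printf("full vacuum    -> %.0f\n", num_heap_tuples_for_indexes(&rel, &full));
    return 0;
}

With this rule a partial vacuum simply leaves the index estimates where they were, which is the 8.4 fallback behaviour Tom describes.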
I wrote:
> Another interesting question is why successive vacuums aren't causing
> the index reltuples counts to go to zero. Shouldn't a partial vacuum
> result in *all* pages of the relation being marked as not needing to
> be examined by the next vacuum?

I figured out the reason for that: the first 32 pages of the table are always scanned, even if the whole thing is frozen, because of the SKIP_PAGES_THRESHOLD logic. We could change that behavior by initializing all_visible_streak to SKIP_PAGES_THRESHOLD instead of zero. But if we did so then having even just page zero be skippable would mean that we clear scanned_all and thus fail to update reltuples, which is probably not a good thing.

            regards, tom lane
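A toy model of the skipping behaviour being described, assuming a streak counter and a threshold of 32 as with the real SKIP_PAGES_THRESHOLD. The exact placement of the increment versus the check in vacuumlazy.c may differ, but the effect Tom describes is the same: with the counter starting at zero, the pages at the front of the table can never be skipped, and skipping anything at all clears scanned_all.

#include <stdio.h>
#include <stdbool.h>

#define SKIP_PAGES_THRESHOLD 32   /* same constant name as in vacuumlazy.c */

/*
 * Sketch, not the real code: a page is skipped only once we have already
 * seen a run of at least SKIP_PAGES_THRESHOLD consecutive all-visible
 * pages, so pages at the head of the table always get scanned even when
 * the whole table is all-visible.
 */
int
main(void)
{
    const int nblocks = 100;
    int  all_visible_streak = 0;      /* the initialization Tom suggests changing */
    bool scanned_all = true;
    int  scanned = 0;

    for (int blkno = 0; blkno < nblocks; blkno++)
    {
        bool all_visible = true;      /* pretend the whole table is frozen */

        if (all_visible && all_visible_streak >= SKIP_PAGES_THRESHOLD)
        {
            scanned_all = false;      /* skipping anything clears scanned_all */
            all_visible_streak++;
            continue;
        }
        all_visible_streak = all_visible ? all_visible_streak + 1 : 0;
        scanned++;                    /* "scan" the page */
    }

    printf("scanned %d of %d pages, scanned_all = %s\n",
           scanned, nblocks, scanned_all ? "true" : "false");
    /* prints: scanned 32 of 100 pages, scanned_all = false */
    return 0;
}

Initializing all_visible_streak to SKIP_PAGES_THRESHOLD in this model makes block zero immediately skippable, which is exactly why scanned_all would then be cleared even when only page zero could be skipped.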
On Sat, 2009-06-06 at 15:44 -0400, Tom Lane wrote:
> I think probably the only workable solution for 8.4 is to use the prior
> value of the relation's reltuples field as num_heap_tuples when we have
> not scanned the whole heap. This will effectively mean that index
> reltuples values don't change either in a partial vacuum.

Our analysis in 2006 was that HOT had a very wide use case, while partial vacuum was needed in far fewer cases. ISTM that most vacuums will still be full-table vacuums, so a change is probably unwarranted and could cause a worse regression (somewhere) than the behaviour being fixed.

> In the longer term, we need to do something else. As the code now
> stands, reltuples is close to being completely untrustworthy: it will
> get updated only by VACUUM FULL or anti-wraparound vacuums, neither of
> which will happen often in a well-run installation. So the value will
> inevitably diverge from reality, perhaps arbitrarily far from reality.
> I wonder whether we shouldn't get rid of it altogether (and relpages
> too), and make the planner look to the counts maintained by the stats
> collector instead of using reltuples/relpages. The main objection I can
> think of to that is that turning off stats collection will no longer be
> a viable option ... but how many people do that anyway?

-1 for such radical change at this stage of release.

Perhaps we can recalculate the new reltuples value by pro-rating the amount of change against the previous value? i.e. for a fraction F of the table scanned by partial vacuum:

    new reltuples value = ((1 - F) * old value) + (F * (new value from partial scan))

So when we scan the whole table, the existing behaviour is preserved.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
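A sketch of the pro-rating Simon suggests, with one assumption made explicit: "new value from partial scan" is read here as the tuple density observed on the scanned pages extrapolated to the whole table. The function and parameter names are hypothetical.

#include <stdio.h>

/*
 * Sketch of the pro-rating idea (not committed code): blend the old
 * reltuples with an estimate extrapolated from the pages the partial
 * vacuum actually scanned, weighting by the fraction of the table scanned.
 */
static double
prorate_reltuples(double old_reltuples,
                  double rel_pages,        /* total pages in the table */
                  double scanned_pages,    /* pages this vacuum looked at */
                  double scanned_tuples)   /* live tuples found on them */
{
    if (scanned_pages <= 0 || rel_pages <= 0)
        return old_reltuples;

    double F = scanned_pages / rel_pages;                     /* fraction scanned */
    double extrapolated = (scanned_tuples / scanned_pages) * rel_pages;

    /* F == 1 reduces to the old behaviour: just use the fresh count */
    return (1.0 - F) * old_reltuples + F * extrapolated;
}

int
main(void)
{
    /* 10% of a 10000-page table scanned, with denser-than-average pages */
    printf("%.0f\n", prorate_reltuples(500000, 10000, 1000, 60000));
    return 0;
}

When F is 1 the blend reduces to the freshly counted value, so a full-table vacuum behaves as before; the caveat raised downthread is that the scanned pages may be a biased sample, so the extrapolated term can still mislead.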
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Sat, 2009-06-06 at 15:44 -0400, Tom Lane wrote:
>> In the longer term, we need to do something else.

> -1 for such radical change at this stage of release.

Uh, by "longer term" I meant this is something to think about for 8.5.

            regards, tom lane
On Sat, Jun 6, 2009 at 3:44 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I complained a couple days ago that in HEAD, vacuum is putting
> very bogus values into pg_class.reltuples for indexes:
> http://archives.postgresql.org/pgsql-bugs/2009-06/msg00037.php
>
> After looking through the code a bit, I've confirmed my prior guess that
> this is caused by the partial-vacuum patch. The number that's being
> used is the number of live tuples found on the pages that were actually
> scanned. There's a kluge in there to keep from storing this number as
> the relation's own reltuples, but the implications for indexes were not
> considered. The index AMs generally assume that what they're told in
> IndexVacuumInfo.num_heap_tuples can be trusted, and that's not true
> anymore.
>
> I think probably the only workable solution for 8.4 is to use the prior
> value of the relation's reltuples field as num_heap_tuples when we have
> not scanned the whole heap. This will effectively mean that index
> reltuples values don't change either in a partial vacuum.

Am I wrong to be frightened by the implications of updating this value only once in a blue moon? Doesn't this have the potential to result in really bad plans? Do we have any reasonable manual way of forcing VACUUM to scan the entire heap?

...Robert
On Sun, Jun 7, 2009 at 7:11 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon? Doesn't this have the potential to result
> in really bad plans? Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

It shouldn't be necessary to scan the entire heap to get a reasonable estimate for the number of tuples. Analyze doesn't have to, for example. Perhaps we should just make autovacuum periodically run analyze even if it has run a vacuum recently -- i.e. not make vacuum count as a recent analyze.

Actually it should be possible to get a more accurate estimate out of ANALYZE than we used to as well. It could very quickly scan the entire FSM and use that and the average tuple size to get a much more accurate estimate for the number of tuples.

For VACUUM ANALYZE we could have it count the actual number of tuples in the vacuumable pages, and separately take a sample of non-vacuumable pages and calculate an estimate based on the FSM and the average tuple size in those non-vacuumable pages, then add those two values together. Since it just looked at every vacuumable page, those FSM values are precisely accurate, and the estimate for average tuple size ought to be pretty reliable.

--
greg
http://mit.edu/~gsstark/resume.pdf
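A rough sketch of the FSM-based estimate Greg is describing. This is an idea from the thread, not existing code; the per-page overhead figure and the function name are placeholders rather than the exact page-header and line-pointer arithmetic.

#include <stdio.h>

#define BLCKSZ 8192   /* PostgreSQL's default block size */

/*
 * Hedged sketch: estimate the live-tuple count from the free space map
 * plus an average tuple width taken from the ANALYZE sample.
 */
static double
estimate_tuples_from_fsm(double rel_pages,
                         double total_free_bytes,   /* summed over the FSM */
                         double avg_tuple_bytes,    /* from the ANALYZE sample */
                         double per_page_overhead)  /* header, item pointers, ... */
{
    double usable = rel_pages * (BLCKSZ - per_page_overhead) - total_free_bytes;

    if (usable < 0 || avg_tuple_bytes <= 0)
        return 0;
    return usable / avg_tuple_bytes;
}

int
main(void)
{
    /* 10000 pages, ~20% free on average, 120-byte tuples */
    printf("%.0f tuples (rough)\n",
           estimate_tuples_from_fsm(10000, 10000 * 0.2 * BLCKSZ, 120, 100));
    return 0;
}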
Robert Haas <robertmhaas@gmail.com> writes:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon?

It's not great, but I think it's probably not catastrophic either. Keep in mind that all we need from reltuples is that the ratio reltuples/relpages be a reasonable estimate of the density of live tuples, because what the planner actually uses is GetRelationNumberOfBlocks() * reltuples / relpages. So for example an append-only table isn't a big problem, even if it's been quite a while since we updated reltuples and relpages.

There was some mention of having a partial vacuum extrapolate a value of reltuples and update pg_class with that. I'm afraid that that could be a seriously bad idea; because there is no very good reason to suppose that the subset of recently-modified pages forms a good sample of the whole table as far as live-tuple density goes.

[ thinks a bit and reads the code some more ... ] There is a considerably safer alternative, which is to let ANALYZE update the reltuples estimate based on the pages it sampled; which should be a considerably less biased sample than the pages a partial vacuum would have looked at. And we have already got the code doing that, either in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned out to be partial.

So actually I think we are in better shape than I realized, so far as the heap reltuples numbers go. The case that's a bit nasty is where we are propagating the heap reltuples number to the index reltuples number for a GIN index. (Remember this only matters for a partial index.) As the code stands today, what we'll be propagating is the reltuples estimate from the most recent ANALYZE, not the ANALYZE that we might be about to conduct. This is not great; but considering that we are completely ignoring the first-order problem of the partial index predicate's selectivity, quibbling about a second-order effect like the estimate being out of date is pretty pointless.

> Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

You could use VACUUM FREEZE, for instance.

            regards, tom lane
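The point that only the ratio matters can be shown in a few lines. This is a simplified stand-in for the planner's relation-size estimation, not the actual function; the name and the fallback branch are illustrative.

#include <stdio.h>

/*
 * Sketch: the stored reltuples/relpages density is rescaled by the table's
 * *current* physical size in blocks to get the planner's row estimate.
 */
static double
planner_row_estimate(double reltuples,       /* pg_class.reltuples */
                     double relpages,        /* pg_class.relpages */
                     double current_blocks)  /* table's current size in blocks */
{
    if (relpages <= 0)
        return reltuples;                    /* fallback; details vary */
    return current_blocks * (reltuples / relpages);
}

int
main(void)
{
    /* stale stats (1000 pages, 100k rows), but the table has grown to 4000 pages:
     * an append-only table still gets a sensible, proportionally scaled estimate */
    printf("%.0f\n", planner_row_estimate(100000, 1000, 4000));
    return 0;
}

Because the estimate scales with the current block count, growth alone doesn't hurt; what hurts is a change in tuple density, for example mass deletions, without the stats catching up.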
On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Am I wrong to be frightened by the implications of updating this value
>> only once in a blue moon?
>
> It's not great, but I think it's probably not catastrophic either.
> Keep in mind that all we need from reltuples is that the ratio
> reltuples/relpages be a reasonable estimate of the density of live
> tuples, because what the planner actually uses is
> GetRelationNumberOfBlocks() * reltuples / relpages.
> So for example an append-only table isn't a big problem, even if it's
> been quite a while since we updated reltuples and relpages.

My first reaction was to be relieved by this explanation, but on further thought I'm not sure I was right to be relieved. The frequency of anti-wraparound vacuums is so low that it doesn't seem inconceivable that this ratio could change considerably on a far shorter time scale. (For me, at least in some cases, it's less than the frequency with which I dump+reload for maintenance reasons like OS upgrade, PG upgrade, HW upgrade...)

> There was some mention of having a partial vacuum extrapolate a value of
> reltuples and update pg_class with that. I'm afraid that that could be
> a seriously bad idea; because there is no very good reason to suppose
> that the subset of recently-modified pages forms a good sample of the
> whole table as far as live-tuple density goes.

I think you're right about that.

> [ thinks a bit and reads the code some more ... ] There is a
> considerably safer alternative, which is to let ANALYZE update the
> reltuples estimate based on the pages it sampled; which should be a
> considerably less biased sample than the pages a partial vacuum would
> have looked at. And we have already got the code doing that, either
> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
> out to be partial.

I'm confused with how this squares with the previous discussion - I thought you observed a case where this wasn't happening.

> So actually I think we are in better shape than I realized, so far as
> the heap reltuples numbers go. The case that's a bit nasty is where
> we are propagating the heap reltuples number to the index reltuples
> number for a GIN index. (Remember this only matters for a partial
> index.) As the code stands today, what we'll be propagating is the
> reltuples estimate from the most recent ANALYZE, not the ANALYZE that
> we might be about to conduct. This is not great; but considering that
> we are completely ignoring the first-order problem of the partial index
> predicate's selectivity, quibbling about a second-order effect like the
> estimate being out of date is pretty pointless.

Being one ANALYZE behind shouldn't be too bad.

>> Do we have any reasonable manual way of forcing
>> VACUUM to scan the entire heap?
>
> You could use VACUUM FREEZE, for instance.

That'll generate a fair amount of I/O.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> [ thinks a bit and reads the code some more ... ] There is a
>> considerably safer alternative, which is to let ANALYZE update the
>> reltuples estimate based on the pages it sampled; which should be a
>> considerably less biased sample than the pages a partial vacuum would
>> have looked at. And we have already got the code doing that, either
>> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
>> out to be partial.

> I'm confused with how this squares with the previous discussion - I
> thought you observed a case where this wasn't happening.

No, the problem was that the wrong things were getting done to reltuples entries for indexes. The heap reltuples values were correct, but there is code that takes the heap value into account when estimating the value for an index, and that was all wet because it was using the number of tuples on the scanned pages rather than any total-table estimate.

>>> Do we have any reasonable manual way of forcing
>>> VACUUM to scan the entire heap?
>>
>> You could use VACUUM FREEZE, for instance.

> That'll generate a fair amount of I/O.

Er, isn't that exactly what you asked for?

            regards, tom lane
On Sun, Jun 7, 2009 at 4:19 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> [ thinks a bit and reads the code some more ... ] There is a
>>> considerably safer alternative, which is to let ANALYZE update the
>>> reltuples estimate based on the pages it sampled; which should be a
>>> considerably less biased sample than the pages a partial vacuum would
>>> have looked at. And we have already got the code doing that, either
>>> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
>>> out to be partial.
>
>> I'm confused with how this squares with the previous discussion - I
>> thought you observed a case where this wasn't happening.
>
> No, the problem was that the wrong things were getting done to reltuples
> entries for indexes. The heap reltuples values were correct, but there
> is code that takes the heap value into account when estimating the value
> for an index, and that was all wet because it was using the number of
> tuples on the scanned pages rather than any total-table estimate.
>
>>>> Do we have any reasonable manual way of forcing
>>>> VACUUM to scan the entire heap?
>>>
>>> You could use VACUUM FREEZE, for instance.
>
>> That'll generate a fair amount of I/O.
>
> Er, isn't that exactly what you asked for?

Basically, I'm trying to figure out what we're going to recommend to someone who gets bitten by whatever remaining corner case still exists after your recent patch, and I admit I'm not real clear on what that is. VACUUM FULL doesn't seem like a good solution because it's more than just "vacuum but don't skip any pages even if the visibility map says you can". Maybe we should just have a GUC to enable/disable partial vacuums.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> Basically, I'm trying to figure out what we're going to recommend to
> someone who gets bitten by whatever remaining corner case still exists
> after your recent patch, and I admit I'm not real clear on what that
> is.

If anyone actually shows up with a clear problem, we can recommend VACUUM FREEZE. I'm not going to panic about it right now.

            regards, tom lane
Robert Haas wrote:
> Basically, I'm trying to figure out what we're going to recommend to
> someone who gets bitten by whatever remaining corner case still exists
> after your recent patch, and I admit I'm not real clear on what that
> is. VACUUM FULL doesn't seem like a good solution because it's more
> than just "vacuum but don't skip any pages even if the visibility map
> says you can". Maybe we should just have a GUC to enable/disable
> partial vacuums.

IIRC you can set vacuum_freeze_table_age to 0.

--
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Robert Haas wrote:
>> Maybe we should just have a GUC to enable/disable
>> partial vacuums.

> IIRC you can set vacuum_freeze_table_age to 0.

That has the same effects as issuing VACUUM FREEZE, no?

            regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Robert Haas wrote:
> >> Maybe we should just have a GUC to enable/disable
> >> partial vacuums.
>
> > IIRC you can set vacuum_freeze_table_age to 0.
>
> That has the same effects as issuing VACUUM FREEZE, no?

As far as I can make out from the docs, I think it only forces a full table scan, but the freeze age remains the same.

--
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jun 8, 2009 at 10:40 AM, Alvaro Herrera<alvherre@commandprompt.com> wrote:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> > Robert Haas wrote:
>> >> Maybe we should just have a GUC to enable/disable
>> >> partial vacuums.
>>
>> > IIRC you can set vacuum_freeze_table_age to 0.
>>
>> That has the same effects as issuing VACUUM FREEZE, no?
>
> As far as I can make out from the docs, I think it only forces a full table
> scan, but the freeze age remains the same.

Yeah, that looks like what I was looking for, thanks. I looked for it in the docs under the vacuum-related sections but couldn't find it, and since I didn't know its name, I couldn't find it that way either. I wonder if we should consider moving all of the vacuum and autovacuum parameters into one section.

http://developer.postgresql.org/pgdocs/postgres/runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE

In the worst-case scenario where the new partial-table vacuums are causing headaches for someone, they should hopefully be able to use this parameter to basically turn them off without too many nasty side effects.

(Another nice thing about this parameter is that if you have a really big table that is append-mostly, you can potentially tune it downward to spread out the freezing activity. With the default settings, you might cruise along merrily until you hit 150M transactions and then generate an I/O storm as you freeze a pretty big chunk of the table all at once. With a lower setting, non-partial vacuums will be more frequent, but each one will generate a smaller amount of write traffic.)

...Robert
(back from vacation)

Tom Lane wrote:
> I wrote:
>> Another interesting question is why successive vacuums aren't causing
>> the index reltuples counts to go to zero. Shouldn't a partial vacuum
>> result in *all* pages of the relation being marked as not needing to
>> be examined by the next vacuum?
>
> I figured out the reason for that: the first 32 pages of the table are
> always scanned, even if the whole thing is frozen, because of the
> SKIP_PAGES_THRESHOLD logic. We could change that behavior by
> initializing all_visible_streak to SKIP_PAGES_THRESHOLD instead of zero.
> But if we did so then having even just page zero be skippable would mean
> that we clear scanned_all and thus fail to update reltuples, which is
> probably not a good thing.

Right, that's exactly why I wrote it like that. I also thought about scanning the (beginning of the) visibility map first to see if there are big enough gaps in there to warrant skipping pages, but went with the current approach because it's so much simpler.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com