Thread: HOT patch, missing things
There's three things clearly missing in the patch:

1. HOT updates on tables with expression or partial indexes. Hasn't been done yet because it should be pretty straightforward and we've had more important things to do. Though not critical, should be finished before release in my opinion.

2. Pointer swinging. At the moment, after a row is HOT updated, the only way to get rid of the redirecting line pointer is to run VACUUM FULL or CLUSTER (or delete or cold update the row and vacuum). If we want to implement pointer swinging before release, we have to get started now. If we're happy to release without it and address the issue in later releases if it seems important, we need to make a conscious decision on that, now. I personally think we can release without it.

3. Statistics and autovacuum integration. How should HOT updates be taken into account when deciding when to autovacuum and autoanalyze? There's a FIXME comment in analyze.c related to this as well. What additional statio counters do we need? The patch adds counters for HOT updates and for following a HOT chain. Should we have counters for pruning and defraging a page as well?

In addition to those, I'm sure there's a lot of small code refactorings etc. to do, as well as more performance testing, but these are the big coding tasks left. Pavan, do you have something in addition to these on your to-do list?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > There's three things clearly missing in the patch: > > 1. HOT updates on tables with expression or partial indexes. Hasn't been > done yet because it should be pretty straightforward and we've had more > important things to do. Though not critical, should be finished before > release in my opinion. sounds like a rather common use case to me and I think this should really be in a patch that is accepted for 8.3... > > 2. Pointer swinging. At the moment, after a row is HOT updated, the only > way to get rid of the redirecting line pointer is to run VACUUM FULL or > CLUSTER (or delete or cold update the row and vacuum). If we want to > implement pointer swinging before release, we have to get started now. > If we're happy to release without it and address the issue in later > releases if it seems important, we need to make a conscious decision on > that, now. I personally think we can release without it. hmm - I don't really understand most of the stuff behind HOT but does this mean that VACUUM FULL (or CLUSTER) is becoming a recommended or even required routine maintenance task for people using HOT ? Stefan
On Tue, 2007-08-07 at 19:01 +0100, Heikki Linnakangas wrote: > There's three things clearly missing in the patch: > > 1. HOT updates on tables with expression or partial indexes. Hasn't been > done yet because it should be pretty straightforward and we've had more > important things to do. Though not critical, should be finished before > release in my opinion. Only if we really are pretty much finished. > 2. Pointer swinging. At the moment, after a row is HOT updated, the only > way to get rid of the redirecting line pointer is to run VACUUM FULL or > CLUSTER (or delete or cold update the row and vacuum). If we want to > implement pointer swinging before release, we have to get started now. > If we're happy to release without it and address the issue in later > releases if it seems important, we need to make a conscious decision on > that, now. I personally think we can release without it. I think so too. > 3. Statistics and autovacuum integration. How should HOT updates be > taken into account when deciding when to autovacuum and autoanalyze? > There's a FIXME comment in analyze.c related to this as well. What > additional statio counters do we need? The patch adds counters for HOT > updates and for following a HOT chain. Should we have counters for > pruning and defraging a page as well? ISTM we should have stat counters (not statio counters) that describe the number of row versions defragged. Statio counters refer to block accesses, so HOT has nothing at all to do with that. Not sure we need to know about pruning, any more than we need to know about hint bits setting. We should then perform a vacuum if ( number of cold updates + number of hot updates + number of deletes - number of row versions removed by defragging) > (autovacuum threshold * size of table) Defragging could remove deletes and cold updates as well as hot updates, so we must take that into account. We also need something that will re-zero the stats when they reach anywhere near integer overflow, since we must not allow them to wrap. I would suggest we simply reset all values to zero for that table. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > We also need something that will re-zero the stats when they reach > anywhere near integer overflow, since we must not allow them to wrap. I > would suggest we simply reset all values to zero for that table. pgstat counters are int64. regards, tom lane
Stefan Kaltenbrunner wrote: > Heikki Linnakangas wrote: >> There's three things clearly missing in the patch: >> >> 1. HOT updates on tables with expression or partial indexes. Hasn't been >> done yet because it should be pretty straightforward and we've had more >> important things to do. Though not critical, should be finished before >> release in my opinion. > > sounds like a rather common use case to me and I think this should > really be in a patch that is accepted for 8.3... > >> 2. Pointer swinging. At the moment, after a row is HOT updated, the only >> way to get rid of the redirecting line pointer is to run VACUUM FULL or >> CLUSTER (or delete or cold update the row and vacuum). If we want to >> implement pointer swinging before release, we have to get started now. >> If we're happy to release without it and address the issue in later >> releases if it seems important, we need to make a conscious decision on >> that, now. I personally think we can release without it. > > hmm - I don't really understand most of the stuff behind HOT but does > this mean that VACUUM FULL (or CLUSTER) is becoming a recommended or > even required routine maintenance task for people using HOT ? No. A redirecting line pointer only takes 4 bytes, which isn't that much in the scheme of things. If you don't mind wasting those 4 bytes per HOT updated row, you don't need to run VACUUM FULL. Note that 8.3 will decrease the tuple header size by 4 bytes thanks to the combocid patch, and the varvarlen patch saves some more space on tuples with short varlen fields. Even if you have an extra line pointer for every row, 8.3 will still be at least as good as 8.2 with regard to storage size. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2007-08-07 at 15:14 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > We also need something that will re-zero the stats when they reach > > anywhere near integer overflow, since we must not allow them to wrap. I > > would suggest we simply reset all values to zero for that table. > > pgstat counters are int64. You would prefer undefined behaviour at wrap? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Stefan Kaltenbrunner wrote: > Heikki Linnakangas wrote: > >> 2. Pointer swinging. At the moment, after a row is HOT updated, the only >> way to get rid of the redirecting line pointer is to run VACUUM FULL or >> CLUSTER (or delete or cold update the row and vacuum). If we want to >> implement pointer swinging before release, we have to get started now. >> If we're happy to release without it and address the issue in later >> releases if it seems important, we need to make a conscious decision on >> that, now. I personally think we can release without it. >> > hmm - I don't really understand most of the stuff behind HOT but does > this mean that VACUUM FULL (or CLUSTER) is becoming a recommended or > even required routine maintenance task for people using HOT ? > No more than before. See the comment "or delete or cold update the row and vacuum". The row couldn't be cleared by vacuum before unless delete/update. Based on the above, it appears that every time an HOT update occurs, 4 to 8 bytes might get wasted in the page. Eventually this fills the page and a regular cold update is required and it is cleared. Cheers, mark -- Mark Mielke <mark@mielke.cc>
Heikki Linnakangas wrote: > Stefan Kaltenbrunner wrote: >> Heikki Linnakangas wrote: >>> There's three things clearly missing in the patch: >>> >>> 1. HOT updates on tables with expression or partial indexes. Hasn't been >>> done yet because it should be pretty straightforward and we've had more >>> important things to do. Though not critical, should be finished before >>> release in my opinion. >> sounds like a rather common use case to me and I think this should >> really be in a patch that is accepted for 8.3... >> >>> 2. Pointer swinging. At the moment, after a row is HOT updated, the only >>> way to get rid of the redirecting line pointer is to run VACUUM FULL or >>> CLUSTER (or delete or cold update the row and vacuum). If we want to >>> implement pointer swinging before release, we have to get started now. >>> If we're happy to release without it and address the issue in later >>> releases if it seems important, we need to make a conscious decision on >>> that, now. I personally think we can release without it. >> hmm - I don't really understand most of the stuff behind HOT but does >> this mean that VACUUM FULL (or CLUSTER) is becoming a recommended or >> even required routine maintenance task for people using HOT ? > > No. A redirecting line pointer only takes 4 bytes, which isn't that much > in the scheme of things. If you don't mind wasting those 4 bytes per HOT > updated row, you don't need to run VACUUM FULL. ah I see - that sounds like much less a problem than I thought it was. Thanks for the explaination! > > Note that 8.3 will decrease the tuple header size by 4 bytes thanks to > the combocid patch, and the varvarlen patch saves some more space on > tuples with short varlen fields. Even if you have an extra line pointer > for every row, 8.3 will still be at least as good as 8.2 with regard to > storage size. yep - I'm already playing with -HEAD for some of our production databases and the size improvements in itself are a very nice thing. Stefan
"Simon Riggs" <simon@2ndquadrant.com> writes: > On Tue, 2007-08-07 at 15:14 -0400, Tom Lane wrote: >> "Simon Riggs" <simon@2ndquadrant.com> writes: >>> We also need something that will re-zero the stats when they reach >>> anywhere near integer overflow, since we must not allow them to wrap. I >>> would suggest we simply reset all values to zero for that table. >> >> pgstat counters are int64. > You would prefer undefined behaviour at wrap? You should live so long as to have a problem with it. Do the math: at one increment every nanosecond, 24x7x365, you'd be risking overflow after about 300 years of continuous initdb-less operation. For someone opining that important features are OK to omit from HOT for 8.3, I have to question your judgment in worrying about this. (In point of fact, I'd expect a database to wrap its WAL LSN counter long before any particular pgstat counter could overflow. Someday we might want to worry about that, but probably not in my lifetime.) regards, tom lane
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > 2. Pointer swinging. At the moment, after a row is HOT updated, the only > way to get rid of the redirecting line pointer is to run VACUUM FULL or > CLUSTER (or delete or cold update the row and vacuum). If we want to > implement pointer swinging before release, we have to get started now. > If we're happy to release without it and address the issue in later > releases if it seems important, we need to make a conscious decision on > that, now. I personally think we can release without it. For the record, there was an alternative here which avoided having to do pointer swinging. If we avoided ever returning a reference to the new line pointer we could swap the record back to the original line pointer when it becomes reusable again. IIRC there were a couple reasons why this was considered a bad idea though: a) This would require declaring that HOT updates to records don't change the tid of the record which would be a user visible behaviour change. I'm not sure if it would be worse or better from the point of view of ODBC, but the non-deterministic nature of HOT updates makes it hard to imagine it being very useful b) Finding the root of the HOT update chain to use in the place of the "real" tid of the record is not an especially cheap operation. If we could find a way to arrange for the tids returned to users to *never* change on updates that would be pretty useful from a user's point of view. But just doing it for HOT updates and not COLD updates seems broken. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Tue, 2007-08-07 at 16:52 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > On Tue, 2007-08-07 at 15:14 -0400, Tom Lane wrote: > >> "Simon Riggs" <simon@2ndquadrant.com> writes: > >>> We also need something that will re-zero the stats when they reach > >>> anywhere near integer overflow, since we must not allow them to wrap. I > >>> would suggest we simply reset all values to zero for that table. > >> > >> pgstat counters are int64. > > > You would prefer undefined behaviour at wrap? > > You should live so long as to have a problem with it. Do the math: > at one increment every nanosecond, 24x7x365, you'd be risking overflow > after about 300 years of continuous initdb-less operation. For someone > opining that important features are OK to omit from HOT for 8.3, I have > to question your judgment in worrying about this. I'm not worried about it, but I was mistaken in thinking you might be. If you're OK with HOT as-is, then sure, I'll have partial indexes too and much more besides. The question is: are you OK with HOT as-is? Should we take it further? How far? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
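For reference, the arithmetic behind Tom's "about 300 years" estimate above (assuming, as he posits, one counter increment per nanosecond against a signed 64-bit limit):

$$
\frac{2^{63}}{10^{9}\ \tfrac{\text{increments}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{yr}}}
\approx \frac{9.2 \times 10^{18}}{3.15 \times 10^{16}}
\approx 292\ \text{years}
$$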
Mark Mielke wrote: > Stefan Kaltenbrunner wrote: >> Heikki Linnakangas wrote: >> >>> 2. Pointer swinging. At the moment, after a row is HOT updated, the only >>> way to get rid of the redirecting line pointer is to run VACUUM FULL or >>> CLUSTER (or delete or cold update the row and vacuum). If we want to >>> implement pointer swinging before release, we have to get started now. >>> If we're happy to release without it and address the issue in later >>> releases if it seems important, we need to make a conscious decision on >>> that, now. I personally think we can release without it. >>> >> hmm - I don't really understand most of the stuff behind HOT but does >> this mean that VACUUM FULL (or CLUSTER) is becoming a recommended or >> even required routine maintenance task for people using HOT ? >> > No more than before. See the comment "or delete or cold update the row > and vacuum". The row couldn't be cleared by vacuum before unless > delete/update. Based on the above, it appears that every time an HOT > update occurs, 4 to 8 bytes might get wasted in the page. Eventually > this fills the page and a regular cold update is required and it is > cleared. To be clear, the first time a row is HOT updated, and the old version is pruned later, the line pointer of the old version is turned into a redirecting line pointer and the old tuple version is removed. On subsequent HOT updates and prunings of the same row, the redirecting line pointer is modified to point to the newer version, but no new redirecting line pointers are left behind. IOW, there will be one redirecting line pointer per row that has ever been HOT updated. The patch also introduces the concept of redirected dead line pointers. When a HOT updated tuple (actually, as the patch stands, any tuple) is deleted, or COLD updated, the tuple itself is pruned away, and the redirecting line pointer is marked as dead. A redirected dead line pointer acts just like a dead tuple for all purposes, but it takes much less space. Redirected dead line pointers can accumulate on a page, but you don't need a VACUUM FULL to get rid of them, a normal vacuum is enough. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
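To summarize the line-pointer life cycle Heikki describes, here is a rough illustration in C. The macro names are the itemid.h-style ones the HOT work introduces, as far as I recall; treat them as assumptions rather than patch-verified code.

```c
#include "postgres.h"
#include "storage/itemid.h"
#include "storage/off.h"

/*
 * Life cycle of the root line pointer of a HOT chain, as described above.
 * Illustration only, not code from the patch.
 */
static void
hot_root_lp_lifecycle(ItemId rootlp, OffsetNumber newest_offnum)
{
    /* 1. First prune after a HOT update: the old version's storage goes away
     *    and its line pointer becomes a redirect to the newest version. */
    ItemIdSetRedirect(rootlp, newest_offnum);

    /* 2. Later HOT updates and prunes only re-point this same redirect;
     *    no further redirect line pointers accumulate. */

    /* 3. Once a DELETE or COLD update lets the whole chain be pruned away,
     *    the redirect is kept only as a 4-byte "redirected dead" marker. */
    ItemIdSetDead(rootlp);

    /* 4. A plain VACUUM (no VACUUM FULL needed) can finally recycle it. */
    ItemIdSetUnused(rootlp);
}
```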
On 8/7/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> There's three things clearly missing in the patch:
Yes, these are the major ones, though we might want to play with
the chain pruning, FSM handling and other smaller things to see
if there are any performance benefits.
> 1. HOT updates on tables with expression or partial indexes. Hasn't been
> done yet because it should be pretty straightforward and we've had more
> important things to do. Though not critical, should be finished before
> release in my opinion.
I agree. I personally haven't had a chance to look into this. But I can
do that now if we think that the rest of the patch is in good shape and
there are no other high-priority items left.
> 2. Pointer swinging. At the moment, after a row is HOT updated, the only
> way to get rid of the redirecting line pointer is to run VACUUM FULL or
> CLUSTER (or delete or cold update the row and vacuum). If we want to
> implement pointer swinging before release, we have to get started now.
> If we're happy to release without it and address the issue in later
> releases if it seems important, we need to make a conscious decision on
> that, now. I personally think we can release without it.
I am personally not too much bothered about it. Not just because it's
4 bytes per HOT chain we are talking about, but there would be a time
when the tuple is COLD updated and the redirection in the old chain
would get reclaimed in a normal vacuum. I am very pleased with how
little complexity is left in the code now (and I am sure others would
be too :-))
> 3. Statistics and autovacuum integration. How should HOT updates be
> taken into account when deciding when to autovacuum and autoanalyze?
> There's a FIXME comment in analyze.c related to this as well. What
> additional statio counters do we need? The patch adds counters for HOT
> updates and for following a HOT chain. Should we have counters for
> pruning and defraging a page as well?
This is something important. For example, in the patch as it stands today,
autovacuum is triggered when the number of COLD updates crosses the
vacuum_scale_factor. But because of the redirection idea, each COLD update
only results in 4 bytes of dead space (unlike today, where a whole tuple's
worth of dead space is created). So we can actually postpone autovacuum
even further.
As Tom pointed out, we might also need to think about how HOT affects
auto analyze etc
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > On Tue, 2007-08-07 at 19:01 +0100, Heikki Linnakangas wrote: >> There's three things clearly missing in the patch: >> >> 1. HOT updates on tables with expression or partial indexes. Hasn't been >> done yet because it should be pretty straightforward and we've had more >> important things to do. Though not critical, should be finished before >> release in my opinion. > > Only if we really are pretty much finished. > >> 2. Pointer swinging. At the moment, after a row is HOT updated, the only >> way to get rid of the redirecting line pointer is to run VACUUM FULL or >> CLUSTER (or delete or cold update the row and vacuum). If we want to >> implement pointer swinging before release, we have to get started now. >> If we're happy to release without it and address the issue in later >> releases if it seems important, we need to make a conscious decision on >> that, now. I personally think we can release without it. > > I think so too. > >> 3. Statistics and autovacuum integration. How should HOT updates be >> taken into account when deciding when to autovacuum and autoanalyze? >> There's a FIXME comment in analyze.c related to this as well. What >> additional statio counters do we need? The patch adds counters for HOT >> updates and for following a HOT chain. Should we have counters for >> pruning and defraging a page as well? > > ISTM we should have stat counters (not statio counters) that describe > the number of row versions defragged. Statio counters refer to block > accesses, so HOT has nothing at all to do with that. Not sure we need to > know about pruning, any more than we need to know about hint bits > setting. > > We should then perform a vacuum if > ( number of cold updates > + number of hot updates > + number of deletes > - number of row versions removed by defragging) >> (autovacuum threshold * size of table) > > Defragging could remove deletes and cold updates as well as hot updates, > so we must take that into account. It seems you're confusing pruning and defragging. I should probably update the glossary I wrote earlier... In pruning, any HOT updated tuples that are no longer visible to anyone are removed by marking the line pointer as unused, or turning it into a redirecting line pointer if there isn't one already. Because no tuples are moved, you only need a regular exclusive lock on the page to prune. Defragmenting a page means calling the good old PageRepairFragmentation function. We need a vacuum strength lock to do that, because it moves existing tuples around in the page to squeeze out any empty space between tuples. In defragmenting, we can't count number of row versions removed, because they've already been removed, and all that's left is empty space. I think the formula for triggering autovacuum should be left unchanged, (# of dead tuples > autovacuum threshold). # of dead tuples should be increased by cold update and deletes, as before. A hot update shouldn't increase it, because the old version can be removed by pruning. Because we can truncate dead tuples, even from cold updates and deletes, to redirected dead line pointers which take much less space than dead tuples, maybe we should increase the default autovacuum threshold? The analyze threshold is trickier. HOT updates should probably be taken into account, but with a smaller weight than other updates. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
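For context, the trigger Heikki wants to keep unchanged is roughly the following. This is a sketch from memory of the existing autovacuum logic, not code from the HOT patch, and the variable names are illustrative.

```c
#include <stdbool.h>

/*
 * Rough approximation of the existing autovacuum decision: vacuum when the
 * dead-tuple count exceeds a base threshold plus a fraction of the table.
 */
static bool
table_needs_vacuum(double reltuples, double n_dead_tuples,
                   int vac_base_thresh, double vac_scale_factor)
{
    double vacthresh = vac_base_thresh + vac_scale_factor * reltuples;

    /*
     * Heikki's point: keep this test as-is, but don't let HOT updates bump
     * n_dead_tuples, since pruning can reclaim those versions without a
     * vacuum.
     */
    return n_dead_tuples > vacthresh;
}
```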
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > Because we can truncate dead tuples, even from cold updates and deletes, > to redirected dead line pointers which take much less space than dead > tuples, maybe we should increase the default autovacuum threshold? That would be the logical conclusion except that I think the original threshold was way too high. If you have 100 tuples on the page we were waiting until there were 20 dead tuples before vacuuming. In that scenario 20 dead line pointers would take 80 bytes or about the size of one tuple. In other words about the point when it might be useful to vacuum. On the other hand if you only have 20 tuples per page then 20% would only be 4 tuples or 16 bytes when your tuples are 400 bytes each and there's no point vacuuming yet. Or if you have 500 tuples per page then 20% means 100 line pointers or 400 bytes when each tuple is only 16 bytes so we would be putting off vacuuming until there's enough space for 25 tuples. It seems that previously percentage of tuples made sense because dead tuples took about the same amount of space as new tuples that need that space. But line pointers take much less space than the new tuples so the number of dead line pointers we need before we can recover a useful amount of space depends on the ratio of line pointer size to tuple size. Perhaps we should be gathering "bytes of dead tuples" in pg_stat not just n_dead_tuples. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 8/8/07, Gregory Stark <stark@enterprisedb.com> wrote:
> It seems that previously percentage of tuples made sense because dead tuples
> took about the same amount of space as new tuples that need that space. But
> line pointers take much less space than the new tuples so the number of dead
> line pointers we need before we can recover a useful amount of space depends
> on the ratio of line pointer size to tuple size.
> Perhaps we should be gathering "bytes of dead tuples" in pg_stat not just
> n_dead_tuples.
This is a good idea. Alternatively, we can estimate the percentage of dead
space using the average tuple size, the number of dead tuples and reltuples.
Another thing to worry about is index bloat. Even though a redirect-dead
line pointer takes only 4 bytes in the heap, the associated dead space
in the index is as large as any other index tuple and the index
might be in need of a vacuum.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Wed, Aug 08, 2007 at 04:45:44PM +0530, Pavan Deolasee wrote: > On 8/8/07, Gregory Stark <stark@enterprisedb.com> wrote: > > It seems that previously percentage of tuples made sense because dead > > tuples > > took about the same amount of space as new tuples that need that space. > > But > > line pointers take much less space than the new tuples so the number of > > dead > > line pointers we need before we can recover a useful amount of space > > depends > > on the ratio of line pointer size to tuple size. > > > > Perhaps we should be gathering "bytes of dead tuples" in pg_stat not just > > n_dead_tuples. > > This is a good idea. Alternatively, we can guess percentage of dead space > using the average tuple size, number of dead tuples and number of reltuples. > > Another thing to worry about is index bloats. Even though a redirect-dead > line pointer takes only 4 bytes in the heap, the associated dead space > in the index is as large as any other index tuple and the index > might be in a need for vacuum. ...and bloat in indexes is generally going to be a lot more critical than table bloat. Heck, there's probably a lot of cases where you could go a very long time without vacuuming the heap, if only you could keep indexes under control. Whichever way we go, we should keep the vacuum parameter associated with actual bloat, which makes it much easier to set. If we wanted to get fancy, perhaps we could track the amount of bloat in the heap as well as the indexes, and trigger a vacuum when either one exceeded a threshold. -- Decibel!, aka Jim Nasby decibel@decibel.org EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Wed, 2007-08-08 at 09:55 +0100, Heikki Linnakangas wrote: > >> 3. Statistics and autovacuum integration. How should HOT updates be > >> taken into account when deciding when to autovacuum and autoanalyze? > >> There's a FIXME comment in analyze.c related to this as well. What > >> additional statio counters do we need? The patch adds counters for HOT > >> updates and for following a HOT chain. Should we have counters for > >> pruning and defraging a page as well? > > > > ISTM we should have stat counters (not statio counters) that describe > > the number of row versions defragged. Statio counters refer to block > > accesses, so HOT has nothing at all to do with that. Not sure we need to > > know about pruning, any more than we need to know about hint bits > > setting. > > > > We should then perform a vacuum if > > ( number of cold updates > > + number of hot updates > > + number of deletes > > - number of row versions removed by defragging) > >> (autovacuum threshold * size of table) > > > > Defragging could remove deletes and cold updates as well as hot updates, > > so we must take that into account. > > It seems you're confusing pruning and defragging. I should probably > update the glossary I wrote earlier... Yes, I was. Glossary much appreciated. > I think the formula for triggering autovacuum should be left unchanged, > (# of dead tuples > autovacuum threshold). > > # of dead tuples should be increased by cold update and deletes, as > before. A hot update shouldn't increase it, because the old version can > be removed by pruning. So pruning removes dead hot updated tuples, while defragging will remove dead cold updated tuples and deletes, as well as rearranging space. It's easily possible that 100% of the cold updates and deletes are removed by defragging because of HOT updates. It would be pointless to trigger a VACUUM when it might find nothing to clear up. Its also possible that the deletes are at one end of the table and the updates at the other. So we really need to keep track of the effects of defragging dead cold updates and deletes, so they can be subtracted from the cold update + deletes. The argument for including HOT updates is weaker, but I'm still thinking that they should eventually cause a VACUUM to take place. A very slowly updated table with spread out updates might bloat a table to 200 times its size, even with HOT, in the worst case. Perhaps we should count each HOT update as 1/5th of a row for autovacuum purposes, or some other discounting factor? > The analyze threshold is trickier. HOT updates should probably be taken > into account, but with a smaller weight than other updates. Sorry, but I disagree here. Any change to a row can effect the distribution. HOT updates don't change indexed cols but they do change some columns. The ANALYZE collects stats for all columns, not just the indexed ones; there is no way you can say that non-indexed columns are less important than indexed ones, so we need to analyze just as frequently as we do with cold updates. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
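To make the 1/5th idea concrete, the accounting Simon floats would look something like the sketch below. It is purely illustrative; the weight and all of the names are invented here, not taken from the patch.

```c
#include <stdbool.h>

/* Simon's suggested 1/5th discount; purely a placeholder value. */
#define HOT_UPDATE_DEAD_WEIGHT 0.2

/* Hypothetical dead-row estimate that discounts HOT updates. */
static bool
needs_vacuum_discounted(double n_deletes, double n_cold_updates,
                        double n_hot_updates, double reltuples,
                        double autovac_scale_factor)
{
    double dead_estimate = n_deletes
                         + n_cold_updates
                         + HOT_UPDATE_DEAD_WEIGHT * n_hot_updates;

    return dead_estimate > autovac_scale_factor * reltuples;
}
```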
I was wondering about HOT behavior. Will an INSERT reuse rows no longer visible caused by an UPDATE or DELETE, or if an UPDATE will reuse rows expired by an invisible DELETE? It seems both of these would be possible and useful. My basic question is HOT UPDATE-only in both operation and expired row reuse? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On 8/9/07, Bruce Momjian <bruce@momjian.us> wrote:
> I was wondering about HOT behavior.
> Will an INSERT reuse rows no longer visible caused by an UPDATE or
> DELETE, or if an UPDATE will reuse rows expired by an invisible DELETE?
> It seems both of these would be possible and useful.
As the patch stands today, we don't take any pains to update the
FSM information after pruning and defragging the page. So the FSM would
not consider the page for either INSERT or COLD UPDATE. Of course,
the page can still be used for INSERT if relation->rd_targetblock is
somehow set to this page.
OTOH, UPDATE will always reuse the dead space of either expired
updated rows or deleted rows or even aborted rows.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Pavan Deolasee wrote: > On 8/9/07, Bruce Momjian <bruce@momjian.us> wrote: > > > > I was wondering about HOT behavior. > > > > Will an INSERT reuse rows no longer visible caused by an UPDATE or > > DELETE, or if an UPDATE will reuse rows expired by an invisible DELETE? > > It seems both of these would be possible and useful. > > > > As the patch stands today, we don't take any pains to update the > FSM information after pruning and defraging the page. So FSM would > not consider the page for either INSERT or COLD UPDATE. Of course, > the page can still be used for INSERT if relation->rd_targetblock is > somehow set to this page. > > OTOH UPDATE will always reuse the dead space of either expired > updated rows or deleted rows or even aborted rows. Well, that is very good news. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On 8/8/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> So pruning removes dead hot updated tuples, while defragging will remove
> dead cold updated tuples and deletes, as well as rearranging space.
No, pruning removes all dead tuples, irrespective of whether they are
HOT or COLD updated and whether they are heap-only or not. It
handles line pointer redirection and marks
all dead tuples as ~LP_USED. Defragging just repairs the fragmentation
by rearranging LP_USED tuples.
> It's easily possible that 100% of the cold updates and deletes are
> removed by defragging because of HOT updates. It would be pointless to
> trigger a VACUUM when it might find nothing to clear up. Its also
> possible that the deletes are at one end of the table and the updates at
> the other. So we really need to keep track of the effects of defragging
> dead cold updates and deletes, so they can be subtracted from the cold
> update + deletes.
> The argument for including HOT updates is weaker, but I'm still thinking
> that they should eventually cause a VACUUM to take place. A very slowly
> updated table with spread out updates might bloat a table to 200 times
> its size, even with HOT, in the worst case. Perhaps we should count each
> HOT update as 1/5th of a row for autovacuum purposes, or some other
> discounting factor?
I can't see how a table might bloat to 200 times its size assuming only
HOT updates and without long running transactions. And even if the table is
really bloating that way, it must be because of COLD updates and they
are discounted in triggering autovac. Am I missing something ?
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Thu, 2007-08-09 at 12:39 +0530, Pavan Deolasee wrote: > No, pruning removes all dead tuples, irrespective of whether they are > HOT or COLD updated and whether they are heap-only or not. It > handles line pointer redirection and marks > all dead tuples as ~LP_USED. Defragging just repairs the fragmentation > by rearranging LP_USED tuples. > > > It's easily possible that 100% of the cold updates and deletes > are > removed by defragging because of HOT updates. It would be > pointless to > trigger a VACUUM when it might find nothing to clear up. Its > also > possible that the deletes are at one end of the table and the > updates at > the other. So we really need to keep track of the effects of > defragging > dead cold updates and deletes, so they can be subtracted from > the cold > update + deletes. Whether I got the exact details of frugging & depruning correct or not: if a tuple version is removed, then VACUUM doesn't need to remove it later, so any non-VACUUM removal of rows must defer a VACUUM. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On 8/7/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
> There's three things clearly missing in the patch:
> 1. HOT updates on tables with expression or partial indexes. Hasn't been
> done yet because it should be pretty straightforward and we've had more
> important things to do. Though not critical, should be finished before
> release in my opinion.
I started with this. ISTM that to support partial indexes, we must check
whether the old and new tuples match the partiality condition.
A HOT update is feasible iff
- old and new tuples both match the partiality condition, OR
- old and new tuples both don't match the condition.
In either case, we either already had an index entry which can serve
the new tuple, OR we did not have an index entry for the old tuple and
the new tuple doesn't need one either.
Of course, we still need to apply all the other criteria to finally decide
whether to do a HOT or COLD update.
For a functional index, we should apply the function to the old and new
tuples and compare the outcomes. If the results are the same, a HOT update
is feasible.
We still need to think about the best way to do this without any
modularity invasion and with the least possible overhead, but can anybody
see any issue with the broader approach?
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
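For illustration, the rule Pavan outlines above boils down to a check along these lines. This is a sketch only: the function and its parameters are invented here, and the caller is assumed to have already evaluated the index predicate and expressions against both tuple versions.

```c
#include <stdbool.h>

/*
 * HOT-safety of one partial or expression index for a given UPDATE,
 * following the rule described above.  All other HOT criteria (no change
 * to plain indexed columns, room on the same page, ...) still apply.
 */
static bool
hot_safe_for_index(bool is_partial,
                   bool old_satisfies_predicate, bool new_satisfies_predicate,
                   bool is_expression, bool expr_results_equal)
{
    /* Partial index: both versions must fall on the same side of the predicate. */
    if (is_partial && old_satisfies_predicate != new_satisfies_predicate)
        return false;

    /* Expression/functional index: the computed index value must not change. */
    if (is_expression && !expr_results_equal)
        return false;

    return true;
}
```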
On 8/9/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> Whether I got the exact details of frugging & depruning correct or not:
> if a tuple version is removed, then VACUUM doesn't need to remove it
> later, so any non-VACUUM removal of rows must defer a VACUUM.
ISTM that you are worried about the cases where a tuple is HOT updated
and hence can be pruned/defragged, but only if we revisit the page at
a later time.
What if we just track the amount of potentially dead space in the relation
(somebody had suggested that earlier in the thread) ? Every committed
UPDATE/DELETE and aborted UPDATE/INSERT would increment
the dead space. Whenever page fragmentation is repaired, either during
normal operation or during vacuum, the dead space is reduced by the
amount of reclaimed space. Autovacuum triggers whenever the percentage
of dead space increases beyond a threshold.
We can do some fine tuning to track the space consumed by redirect-dead
line pointers.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
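A minimal sketch of the accounting Pavan proposes above, with invented names (nothing here is taken from the patch): dead space grows on committed UPDATE/DELETE and aborted UPDATE/INSERT, shrinks when a page is pruned/defragged, and autovacuum fires on a percentage-of-heap threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-table counters, maintained through the stats machinery. */
typedef struct DeadSpaceStats
{
    int64_t dead_space;   /* bytes believed to be reclaimable */
    int64_t heap_size;    /* total heap size in bytes */
} DeadSpaceStats;

/* Committed UPDATE/DELETE, or aborted UPDATE/INSERT: one tuple's worth of
 * space becomes potentially dead. */
static void
report_dead_tuple(DeadSpaceStats *s, int64_t tuple_bytes)
{
    s->dead_space += tuple_bytes;
}

/* Pruning/defragging a page, in normal operation or VACUUM, hands space back. */
static void
report_space_reclaimed(DeadSpaceStats *s, int64_t reclaimed_bytes)
{
    s->dead_space -= reclaimed_bytes;
    if (s->dead_space < 0)
        s->dead_space = 0;
}

/* Autovacuum trigger: vacuum_scale_factor reinterpreted as a fraction of heap bytes. */
static bool
needs_vacuum_by_dead_space(const DeadSpaceStats *s, double vacuum_scale_factor)
{
    return (double) s->dead_space > vacuum_scale_factor * (double) s->heap_size;
}
```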
On Thu, 2007-08-09 at 15:46 +0530, Pavan Deolasee wrote: > > > On 8/9/07, Simon Riggs <simon@2ndquadrant.com> wrote: > > Whether I got the exact details of frugging & depruning > correct or not: > if a tuple version is removed, then VACUUM doesn't need to > remove it > later, so any non-VACUUM removal of rows must defer a VACUUM. > > > > ISTM that you are worried about the cases where a tuple is HOT updated > and hence can be pruned/defragged, but only if we revisit the page at > a later time. > > What if we just track the amount of potentially dead space in the > relation > (somebody had suggested that earlier in the thread) ? Every committed > UPDATE/DELETE and aborted UPDATE/INSERT would increment > the dead space. Whenever page fragmentation is repaired, either during > normal operation or during vacuum, the dead space is reduced by the > amount of reclaimed space. Autovacuum triggers whenever the percentage > of dead space increases beyond a threshold. > > We can some fine tuning to track the space consumed by redirect-dead > line pointers. Sounds great. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > HOT update is feasible iff > > - old and new tuples, both match the partiality condition OR > - old and new tuples, both don't match the condition ... > For functional index, we should apply the function to the old and new > tuple and compare the outcome. If the results are same, HOT update > is feasible. This is debatable. We could compare the columns used in the partial condition expression or function expression directly. If they're the same then the expression or function must return the same value. If the function is quite expensive then that might be cheaper. On the other hand if it's not expensive and the columns change frequently but the results don't then we might be doing a lot of work for nothing. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, Aug 09, 2007 at 01:25:14PM +0100, Gregory Stark wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > > > HOT update is feasible iff > > > > - old and new tuples, both match the partiality condition OR > > - old and new tuples, both don't match the condition > ... > > For functional index, we should apply the function to the old and new > > tuple and compare the outcome. If the results are same, HOT update > > is feasible. > > This is debatable. We could compare the columns used in the partial condition > expression or function expression directly. If they're the same then the > expression or function must return the same value. If the function is quite > expensive then that might be cheaper. > > On the other hand if it's not expensive and the columns change frequently but > the results don't then we might be doing a lot of work for nothing. If we're going to get this into 8.3 I think we should be leaning towards whatever is the simplest way to do it... -- Decibel!, aka Jim Nasby decibel@decibel.org EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > I started with this. ISTM to support partial indexes, we must check > the old and new tuple against partiality match. > ... > For functional index, we should apply the function to the old and new > tuple and compare the outcome. If the results are same, HOT update > is feasible. I don't like either of these. They are going to be extremely expensive if the function or predicate is expensive (because you're going to be doing two evaluations that you might get no benefit from). Also, if the function is not as immutable as it's supposed to be, you will soon have an utterly corrupt index, with entries pointing at rows they in fact don't match and never did. We have so far managed to avoid any really strong dependencies on the requirement of index-function immutability --- your queries may not work very well if the function isn't immutable, but you are not at risk of system-level data corruption. With this, you will be. Since we are entirely dependent on users to mark immutable functions correctly (and have not always gotten it right ourselves :-(), I don't think this is an acceptable risk. If we can't do better than that (and offhand I don't see how to), then I agree with the current approach of disabling HOT when functional or partial indexes are present. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > We have so far managed to avoid any really strong dependencies on the > requirement of index-function immutability --- your queries may not work > very well if the function isn't immutable, but you are not at risk of > system-level data corruption. With this, you will be. Wait, how would this be any more vulnerable to system-level data corruption than a normal update? The worst case I can see is that you have a properly updated tuple but the new tuple version is indexed incorrectly just as would be the case if you have a functional index or expression index which had changed value since the update was performed. I agree about the costs for evaluating the expressions. But a COLD update is certainly going to have to evaluate both expressions once. The only additional cost here is that HOT is going to have to evaluate the *old* expression as well. So it's at worst twice as expensive as a normal COLD update. I think I'm leaning towards doing a binary comparison of the parameters to the expressions. That won't catch as many cases as comparing the results of the expressions -- and I can think of cases where that would be disappointing -- but it's in keeping with how it determines whether a tuple is eligible for a HOT update in the first place. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > I agree about the costs for evaluating the expressions. But a COLD update is > certainly going to have to evaluate both expressions once. The only additional > cost here is that HOT is going to have to evaluate the *old* expression as > well. So it's at worst twice as expensive as a normal COLD update. What's bothering me is the case where we evaluate the expression twice, find it doesn't match, and fall through to the COLD update logic which will do it a third time. > I think I'm leaning towards doing a binary comparison of the > parameters to the expressions. Yeah, we could simply insist on no change to any column that's used by any of the expressions. That would be cheap to test. regards, tom lane
On 8/9/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, we could simply insist on no change to any column that's used by
> any of the expressions. That would be cheap to test.
I am trying to figure out the best way to extract this information. Is there any
existing code to get all attributes used in the expressions ? Or do I need
to walk the tree and extract that information ?
Thanks,
Pavan
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
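For illustration, walking the tree to collect the referenced attribute numbers is fairly mechanical with the generic expression_tree_walker() machinery (Tom points at backend/optimizer/util/var.c further down the thread). The sketch below is written from memory, the function names are invented, it is not the helper that went into the patch, and system attributes are simply ignored.

```c
#include "postgres.h"
#include "nodes/primnodes.h"
#include "nodes/bitmapset.h"
#include "optimizer/clauses.h"   /* expression_tree_walker() in 8.2-era headers */

typedef struct pull_attrs_context
{
    Bitmapset  *attrs;
} pull_attrs_context;

static bool
pull_attrs_walker(Node *node, pull_attrs_context *context)
{
    if (node == NULL)
        return false;

    if (IsA(node, Var))
    {
        Var    *var = (Var *) node;

        /* collect only level-zero, ordinary user attributes */
        if (var->varlevelsup == 0 && var->varattno > 0)
            context->attrs = bms_add_member(context->attrs, var->varattno);
        return false;
    }

    return expression_tree_walker(node, pull_attrs_walker, (void *) context);
}

/* Collect the attnos referenced by an index expression or predicate. */
static Bitmapset *
pull_expr_attnums(Node *expr)
{
    pull_attrs_context context;

    context.attrs = NULL;
    (void) pull_attrs_walker(expr, &context);
    return context.attrs;
}
```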
On 8/9/07, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, 2007-08-09 at 15:46 +0530, Pavan Deolasee wrote:
>
> What if we just track the amount of potentially dead space in the
> relation
> (somebody had suggested that earlier in the thread) ? Every committed
> UPDATE/DELETE and aborted UPDATE/INSERT would increment
> the dead space. Whenever page fragmentation is repaired, either during
> normal operation or during vacuum, the dead space is reduced by the
> amount of reclaimed space. Autovacuum triggers whenever the percentage
> of dead space increases beyond a threshold.
>
> We can some fine tuning to track the space consumed by redirect-dead
> line pointers.
Sounds great.
So do we have consensus here? Fortunately, I think there won't be any
changes to the user interface. Users can still use the vacuum_scale_factor to
tune autovacuum, but instead of the percentage of dead tuples, it would
signify the percentage of dead space in the relation.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
On Tue, 2007-08-14 at 13:24 +0530, Pavan Deolasee wrote:
> On 8/9/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Thu, 2007-08-09 at 15:46 +0530, Pavan Deolasee wrote:
> > > What if we just track the amount of potentially dead space in the
> > > relation (somebody had suggested that earlier in the thread) ? Every
> > > committed UPDATE/DELETE and aborted UPDATE/INSERT would increment
> > > the dead space. Whenever page fragmentation is repaired, either during
> > > normal operation or during vacuum, the dead space is reduced by the
> > > amount of reclaimed space. Autovacuum triggers whenever the percentage
> > > of dead space increases beyond a threshold.
> > >
> > > We can some fine tuning to track the space consumed by redirect-dead
> > > line pointers.
> >
> > Sounds great.
>
> So do we have consensus here ? Fortunately, I think there won't be any
> changes to user interface. Users can still use the vacuum_scale_factor to
> tune autovacuum, but instead of percentage of dead tuples, it would
> signify percentage of dead space in the relation.

We have some consensus, but no complete design. My understanding is that
we would see the following things tracked in pg_stats_xxx:

n_tup_ins      count of rows inserted
n_tup_upd      count of rows updated (incl HOT and cold)
n_tup_del      count of rows deleted
- the above are required because they are already there and useful too

n_tup_hot_upd  count of rows updated by HOT method only
- the above is required to help tune HOT/cold updates

dead_space     total number of dead bytes in table
- the above is required for autovacuum

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
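Mapped onto a struct, Simon's list amounts to something like the following. The field names simply follow his list; they are not taken from the patch or from pgstat.h.

```c
#include <stdint.h>

/* Illustrative per-table counters for HOT-aware statistics. */
typedef struct HotTableCounters
{
    int64_t n_tup_ins;       /* rows inserted */
    int64_t n_tup_upd;       /* rows updated, HOT and cold combined */
    int64_t n_tup_del;       /* rows deleted */
    int64_t n_tup_hot_upd;   /* rows updated by the HOT method only */
    int64_t dead_space;      /* total dead bytes in the table */
} HotTableCounters;
```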
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > I am trying to figure out the best way to extract this information. Is there > any > existing code to get all attributes used in the expressions ? Or do I need > to walk the tree and extract that information ? There are a number of near matches in backend/optimizer/util/var.c, but nothing that has exactly the API you probably want, which I'd think would be to extract a bitmapset of the varattnos of level-zero Vars. contain_var_reference() could be used repeatedly but that seems tremendously inefficient. I'd suggest coding up some new function using what's there for reference. regards, tom lane
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > What if we just track the amount of potentially dead space in the > relation > (somebody had suggested that earlier in the thread) ? Every committed > UPDATE/DELETE and aborted UPDATE/INSERT would increment > the dead space. Whenever page fragmentation is repaired, either during > normal operation or during vacuum, the dead space is reduced by the > amount of reclaimed space. Autovacuum triggers whenever the percentage > of dead space increases beyond a threshold. Doesn't this design completely fail to take index bloat into account? Repairing heap fragmentation does not reduce the need for VACUUM to work on the indexes. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: >> What if we just track the amount of potentially dead space in the >> relation >> (somebody had suggested that earlier in the thread) ? Every committed >> UPDATE/DELETE and aborted UPDATE/INSERT would increment >> the dead space. Whenever page fragmentation is repaired, either during >> normal operation or during vacuum, the dead space is reduced by the >> amount of reclaimed space. Autovacuum triggers whenever the percentage >> of dead space increases beyond a threshold. > > Doesn't this design completely fail to take index bloat into account? > Repairing heap fragmentation does not reduce the need for VACUUM to work > on the indexes. Index bloat is a bit of an open issue already. Because page splits already prune any LP_DELETEd pointers any busy index keys will be pruned already. However any index keys which have not been the subject of an index lookup -- and that includes keys which are only accessed by bitmap-index-scans -- won't be pruned. So we don't really know how much bloat is currently in an index. Perhaps we need a new statistic which gets updated whenever a page split prunes LP_DELETEd pointers (or perhaps when LP_DELETE is set?). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Tue, 2007-08-14 at 10:10 -0400, Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > > What if we just track the amount of potentially dead space in the > > relation > > (somebody had suggested that earlier in the thread) ? Every committed > > UPDATE/DELETE and aborted UPDATE/INSERT would increment > > the dead space. Whenever page fragmentation is repaired, either during > > normal operation or during vacuum, the dead space is reduced by the > > amount of reclaimed space. Autovacuum triggers whenever the percentage > > of dead space increases beyond a threshold. > > Doesn't this design completely fail to take index bloat into account? > Repairing heap fragmentation does not reduce the need for VACUUM to work > on the indexes. I thought of that, but we will only clean up space that is allowable, so the indexes don't degrade. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On 8/14/07, Gregory Stark <stark@enterprisedb.com> wrote:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> Doesn't this design completely fail to take index bloat into account?
> Repairing heap fragmentation does not reduce the need for VACUUM to work
> on the indexes.
Index bloat is a bit of an open issue already. Because page splits already
prune any LP_DELETEd pointers any busy index keys will be pruned already.
However any index keys which have not been the subject of an index lookup --
and that includes keys which are only accessed by bitmap-index-scans -- won't
be pruned.
So we don't really know how much bloat is currently in an index. Perhaps we
need a new statistic which gets updated whenever a page split prunes
LP_DELETEd pointers (or perhaps when LP_DELETE is set?).
I agree here. As a first step, maybe we can address the heap space usage
statistics and then take up index stats separately. Index bloat would carry
a different weight in triggering autovacuum.
I shall code up a patch which tracks the dead space in the heap and
triggers autovac based on that.
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com