Thread: Revitalising VACUUM FULL for 8.3
Use case for VACUUM FULL is very low these days. VACUUM does the most important part of what VACUUM FULL offers, yet does it concurrently rather than with a full table lock. VACUUM FULL also - has very long execution time - generates lots of WAL traffic - uses lots of memory while it runs - isn't as good at compacting a relation as CLUSTER - sometimes requires multiple runs to properly compact data CLUSTER has - much better compaction than VACUUM FULL when run concurrently with other transactions (yes, really!) - need not generate WAL, in many cases - offers no weird failure cases I propose to use the CLUSTER technique in place of the current VACUUM FULL code. The command VACUUM FULL would still exist, but would execute the command in a different manner, very similar to the way CLUSTER works, just without the sort-the-table feature. The benefits of this approach would be: - VACUUM FULL would compact relations much better than it does now - would never need multiple executions to achieve good compaction - operate much faster, with a single pass over the main relation - it would compact as effectively as CLUSTER, yet execute faster - need not generate WAL while it executes - no requirement for large maintenance_work_mem The potential downsides of this approach are the following: - "it uses more space" - "it breaks MVCC" There are good answers to both these points, so please read on. The manual says VACUUM FULL can be speeded up by dropping and re-creating indexes, but it is still lengthy. It is even faster to drop the indexes, do a CREATE TABLE AS SELECT * FROM table, drop the old table and then rebuild the indexes. So the main use case for current VACUUM FULL is when the space to be freed inside the table is low enough to make defraging the table quicker than than the above workaround, yet still high enough that we were worried enough to do a VACUUM FULL. You must also be running it concurrently with other transactions, but clearly ones that don't include the current table because they will be locked out by the VACUUM FULL. That's a tough requirement because this table is by-definition one that is/has been heavily updated/deleted. And also VACUUM hasn't been much use at freeing space at the end of the table, which it would only fail to do with concurrently held locks. Thats a very narrow use case and I doubt whether it exists at all any longer. New VACUUM FULL would perform a SeqScan of the main relation using SnapshotNow, inserting the results into a new relfilenode. If it crashes part way through the first phase, we drop the file being built and table is untouched. No WAL need be written while we are doing this, except when archive_command is set. This produces a new relation which is tightly compacted, free of older tuples and the physical file is no larger than required. Vacuum delay points would also be supported. No FK checks would be required, nor would other constraints need to be-rechecked during the insertion. Index entries would not be made during the insertions into the second heap. Just as with CLUSTER, the second phase would consist of rebuilding all indexes, ensuring that they too are as compact as possible. Space usage of VACUUM FULL could be as high as twice the target table, but only in the case where there wasn't anything to VACUUM or truncate. In the typical use case we would be looking to remove large numbers of dead tuples, as well as truncate the relation, so the actual space overhead would be more typically only about +50% of the pre-VACUUM FULL size of the target table. If execution hits an out-of-space error then the command can quickly recover. If space really isn't available, then indexes can be dropped manually and the process re-executed. Arranging for more temp space is now easier with the new temp space management code. Current CLUSTER does not respect MVCC. It's possible for an old serializable transaction to miss consistent data as a result. That is a serious problem for pg_dump and I propose to fix that in this proposal, for both CLUSTER and the new VACUUM FULL. I would like to introduce the concept of utility transactions. This is any transaction that touches only one table in a transaction and is not returning or modifying data. All utility transactions wait until they are older than all non-utility transactions before they commit. A utility transaction would currently be any VACUUM, VACUUM FULL and CREATE INDEX CONCURRENTLY. That is safe because each of those commands executes in its own transaction and doesn't touch more than one table at a time. Once each knows there is no chance of being interfered with, it can continue its work and commit. This technique is already in use for CREATE INDEX CONCURRENTLY, so just needs to be extended to all other utilities - but in a way that allows them to recognise each other. This extends upon the thought that VACUUMs already recognise other VACUUMs and avoid using them as part of their Snapshot. I would also like to detect cases where CLUSTER, CREATE INDEX and ALTER TABLE are running in their own implicit top-level transaction, so that these too can be recognised by the server as utility transactions. The utility transaction concept would make new VACUUM FULL MVCC-safe and would also make most executions of CLUSTER MVCC-safe also (the implicit top-level transaction cases). The coding for much of this looks very straightforward. I have to rewrite VACUUM FULL to cope with HOT anyway, so this seems like both a faster, more robust and easier development path to take. However, none of the above arguments rely on the acceptance of the HOT patch to increase their relevance for this proposal. Comments? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Thu, 2007-03-01 at 13:02, Simon Riggs wrote: > I would like to introduce the concept of utility transactions. This is > any transaction that touches only one table in a transaction and is not > returning or modifying data. All utility transactions wait until they > are older than all non-utility transactions before they commit. A > utility transaction would currently be any VACUUM, VACUUM FULL and > CREATE INDEX CONCURRENTLY. That is safe because each of those commands > executes in its own transaction and doesn't touch more than one table at > a time. Once each knows there is no chance of being interfered with, it > can continue its work and commit. This technique is already in use for > CREATE INDEX CONCURRENTLY, so just needs to be extended to all other > utilities - but in a way that allows them to recognise each other. This > extends upon the thought that VACUUMs already recognise other VACUUMs > and avoid using them as part of their Snapshot. Wouldn't this be deadlock prone ? What if a non-utility transaction (which could even be started before the vacuum full) blocks on the table being vacuumed, then if the vacuum wants to wait until all non-utility transactions finish will deadlock. > The utility transaction concept would make new VACUUM FULL MVCC-safe and > would also make most executions of CLUSTER MVCC-safe also (the implicit > top-level transaction cases). Making cluster MVCC-safe will kill my back-door of clustering a hot table while I run a full DB backup. Cheers, Csaba.
Ühel kenal päeval, N, 2007-03-01 kell 12:02, kirjutas Simon Riggs: > Use case for VACUUM FULL is very low these days. VACUUM does the most > important part of what VACUUM FULL offers, yet does it concurrently > rather than with a full table lock. VACUUM FULL also > - has very long execution time > - generates lots of WAL traffic > - uses lots of memory while it runs > - isn't as good at compacting a relation as CLUSTER > - sometimes requires multiple runs to properly compact data Yet another way to achieve VACUUM FULL-like results would be a COMPACT TABLE command, which would do the following: 1 - start a forward sequential scan to find free space 2 - start a backwards seqscan to find live tuples move live tuples to free space by doing a null update (UPDATE without changing any field values) with new version being placed to lowest possible age until the two scans intersect. This, together with ordinary VACUUM would achieve almost the same results as VACUUM FULL with much lower overhead. And it can also be done in lots of smaller transactions instead of one big one if required. If we can trust FSM, the whole process just becomes the backward scan and null updates until the null update does not move tuple to a lower page. Also, for the duration of COMPACT TABLE the updated tuple should always be placed in lowes available slot, that is no same-page updates should be tied before going to FSM. This has some downsides : 1 - the original xmin will be lost 2 - as with any updates, it may block/abort other concurrent updates, so it could be a good thing to teach the update mechanism about null updates. Still I think that this would be the chepest way to get VACUUM FULL behaviour without locking the whole table for long time -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Ühel kenal päeval, N, 2007-03-01 kell 14:32, kirjutas Hannu Krosing: > If we can trust FSM, the whole process just becomes the backward scan > and null updates until the null update does not move tuple to a lower > page. Also, for the duration of COMPACT TABLE the updated tuple should > always be placed in lowes available slot, that is no same-page updates > should be tied before going to FSM. > > This has some downsides : > > 1 - the original xmin will be lost > > 2 - as with any updates, it may block/abort other concurrent updates, so > it could be a good thing to teach the update mechanism about null > updates. > > Still I think that this would be the chepest way to get VACUUM FULL > behaviour without locking the whole table for long time This means that VACUUM FULL mytable; would translate to: VACUUM mytable; -- make free space COMPACT mytable; -- move tuples in a bunch of small transactions -- might have a GUC for max trx length VACUUM mytable; -- free the tuples at the end and give space back to fs -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
On Thu, 2007-03-01 at 13:21 +0100, Csaba Nagy wrote: > On Thu, 2007-03-01 at 13:02, Simon Riggs wrote: > > I would like to introduce the concept of utility transactions. This is > > any transaction that touches only one table in a transaction and is not > > returning or modifying data. All utility transactions wait until they > > are older than all non-utility transactions before they commit. A > > utility transaction would currently be any VACUUM, VACUUM FULL and > > CREATE INDEX CONCURRENTLY. That is safe because each of those commands > > executes in its own transaction and doesn't touch more than one table at > > a time. Once each knows there is no chance of being interfered with, it > > can continue its work and commit. This technique is already in use for > > CREATE INDEX CONCURRENTLY, so just needs to be extended to all other > > utilities - but in a way that allows them to recognise each other. This > > extends upon the thought that VACUUMs already recognise other VACUUMs > > and avoid using them as part of their Snapshot. > > Wouldn't this be deadlock prone ? What if a non-utility transaction > (which could even be started before the vacuum full) blocks on the table > being vacuumed, then if the vacuum wants to wait until all non-utility > transactions finish will deadlock. Exactly the same as CREATE INDEX CONCURRENTLY, which has a much more frequent use case than VACUUM FULL does, even after I've made the proposed changes. The situation, as I understand it, would be that the utility command waits on another transaction to complete. As soon as that other transaction touches the locked table it will detect a simple deadlock and the non-utility statement will abort. > > The utility transaction concept would make new VACUUM FULL MVCC-safe and > > would also make most executions of CLUSTER MVCC-safe also (the implicit > > top-level transaction cases). > > Making cluster MVCC-safe will kill my back-door of clustering a hot > table while I run a full DB backup. Wow. I'll take that as a request for a NOWAIT option on utility commands, rather than a suggestion that we shouldn't strive to make things safe in the default case. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Thu, 2007-03-01 at 13:56, Simon Riggs wrote: > > Wouldn't this be deadlock prone ? What if a non-utility transaction > > (which could even be started before the vacuum full) blocks on the table > > being vacuumed, then if the vacuum wants to wait until all non-utility > > transactions finish will deadlock. > > Exactly the same as CREATE INDEX CONCURRENTLY, which has a much more > frequent use case than VACUUM FULL does, even after I've made the > proposed changes. > > The situation, as I understand it, would be that the utility command > waits on another transaction to complete. As soon as that other > transaction touches the locked table it will detect a simple deadlock > and the non-utility statement will abort. Fair enough. > > > The utility transaction concept would make new VACUUM FULL MVCC-safe and > > > would also make most executions of CLUSTER MVCC-safe also (the implicit > > > top-level transaction cases). > > > > Making cluster MVCC-safe will kill my back-door of clustering a hot > > table while I run a full DB backup. > > Wow. I'll take that as a request for a NOWAIT option on utility > commands, rather than a suggestion that we shouldn't strive to make > things safe in the default case. Yes please... if the current behavior is possible to be triggered (by that NOWAIT for eg.), it would be actually good to have the MVCC behavior as default. Thanks, Csaba.
On Thu, 2007-03-01 at 14:32 +0200, Hannu Krosing wrote: > Ühel kenal päeval, N, 2007-03-01 kell 12:02, kirjutas Simon Riggs: > > Use case for VACUUM FULL is very low these days. VACUUM does the most > > important part of what VACUUM FULL offers, yet does it concurrently > > rather than with a full table lock. VACUUM FULL also > > - has very long execution time > > - generates lots of WAL traffic > > - uses lots of memory while it runs > > - isn't as good at compacting a relation as CLUSTER > > - sometimes requires multiple runs to properly compact data > > Yet another way to achieve VACUUM FULL-like results would be a COMPACT > TABLE command, which would do the following: > > 1 - start a forward sequential scan to find free space > > 2 - start a backwards seqscan to find live tuples > > move live tuples to free space by doing a null update (UPDATE without > changing any field values) with new version being placed to lowest > possible age until the two scans intersect. Thats almost exactly what VACUUM FULL does now, just using two passes of the table. 1st scan builds the free block list and tuples to move list, then it starts from back, moving rows until it gets to the front. This does sound like it would save some I/O by avoiding the second half of the first scan. In most other respects its the same thing, AFAICS. CLUSTER would still achieve better compaction and you'll still need to write lots of WAL doing this. ISTM a radical approach is needed, so I'm very open to discussion about this and how we cope. If we break down the main thoughts into a few parts: 1. would like a way to CLUSTER/VACUUM FULL where we don't have to move all of the tuple versions, just the current ones. 2. would like a way to compact a table more efficiently Your idea does (2) in a concurrent manner, which is very good. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Thu, 2007-03-01 at 14:42 +0200, Hannu Krosing wrote: > Ühel kenal päeval, N, 2007-03-01 kell 14:32, kirjutas Hannu Krosing: > > > If we can trust FSM, the whole process just becomes the backward scan > > and null updates until the null update does not move tuple to a lower > > page. Also, for the duration of COMPACT TABLE the updated tuple should > > always be placed in lowes available slot, that is no same-page updates > > should be tied before going to FSM. > > > > This has some downsides : > > > > 1 - the original xmin will be lost > > > > 2 - as with any updates, it may block/abort other concurrent updates, so > > it could be a good thing to teach the update mechanism about null > > updates. > > > > Still I think that this would be the chepest way to get VACUUM FULL > > behaviour without locking the whole table for long time > > This means that > > VACUUM FULL mytable; > > would translate to: > > VACUUM mytable; -- make free space -- stop here if nothing further to do -- stop when freespace >= remaining spacein table -- update the FSM, so that concurrent inserts/updates put their new versions at start of table > COMPACT mytable; -- move tuples in a bunch of small transactions > -- might have a GUC for max trx length -- which => always perform COLD UPDATEs, never HOT ones > VACUUM mytable; -- free the tuples at the end and give space back to fs -- start the VACUUM from the first non-filledblock So if we do this, we wouldn't need to worry about HOT tuples at all, nor would we need to wait until all transactions are gone. This would also mean that VACUUM FULL could be achieved concurrently. I'm on the edge here, want to persuade me some more? Pro - we don't need any more space Cons - not as fast as dump-to-new-file technique - will end up with bloated indexes and REINDEX is still not concurrent -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
> -- start the VACUUM from the first non-filled block > > So if we do this, we wouldn't need to worry about HOT tuples > at all, nor would we need to wait until all transactions are gone. You need to wait until you are allowed to truncate if you want concurrency. Or a concurrent scan might miss a row, because the visible tuple got truncated away. Andreas
Ühel kenal päeval, N, 2007-03-01 kell 13:14, kirjutas Simon Riggs: > On Thu, 2007-03-01 at 14:42 +0200, Hannu Krosing wrote: > > Ühel kenal päeval, N, 2007-03-01 kell 14:32, kirjutas Hannu Krosing: > > > > > If we can trust FSM, the whole process just becomes the backward scan > > > and null updates until the null update does not move tuple to a lower > > > page. Also, for the duration of COMPACT TABLE the updated tuple should > > > always be placed in lowes available slot, that is no same-page updates > > > should be tied before going to FSM. > > > > > > This has some downsides : > > > > > > 1 - the original xmin will be lost > > > > > > 2 - as with any updates, it may block/abort other concurrent updates, so > > > it could be a good thing to teach the update mechanism about null > > > updates. > > > > > > Still I think that this would be the chepest way to get VACUUM FULL > > > behaviour without locking the whole table for long time > > > > This means that > > > > VACUUM FULL mytable; > > > > would translate to: > > > > VACUUM mytable; -- make free space > -- stop here if nothing further to do > -- stop when freespace >= remaining space in table > -- update the FSM, so that concurrent inserts/updates > put their new versions at start of table This still needs some tweaking, so that same-page updates don't happen for tuples that need moving. > > COMPACT mytable; -- move tuples in a bunch of small transactions > > -- might have a GUC for max trx length > -- which => always perform COLD UPDATEs, never HOT ones Yes, that too. > > VACUUM mytable; -- free the tuples at the end and give space back to fs > -- start the VACUUM from the first non-filled block > > So if we do this, we wouldn't need to worry about HOT tuples at all, nor > would we need to wait until all transactions are gone. We would still need to do something for HOT tuples that are placed in the "dense" part of table, that is below the eventual truncation point. > This would also mean that VACUUM FULL could be achieved concurrently. That was the main point, yes. > I'm on the edge here, want to persuade me some more? It should be much simpler to do than any of the CLUSTER/COPY variants - "just" null updates with simple new-version placement . > Pro > - we don't need any more space Actually we may need a little (or a lot), as the moved tuples need index entries. But likely it is much less than for CLUSTER variant. > Cons > - not as fast as dump-to-new-file technique It depends on table contents and amount/placement of dirty space - it may be much faster. > - will end up with bloated indexes and REINDEX is still not concurrent If we get plan invalidation, then doing CREATE newindex; DROP oldindex; would be a good replacement for REINDEX. And it can be done CONCURRENTLY as well. Work done on COMPACT TABLE could probably be reused later for some version of CLUSTER CONCURRENTLY which would do something similar, but pay more attention on ordering and/or clustering and fill factor -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
On Thu, 2007-03-01 at 14:35 +0100, Zeugswetter Andreas ADI SD wrote: > > -- start the VACUUM from the first non-filled block > > > > So if we do this, we wouldn't need to worry about HOT tuples > > at all, nor would we need to wait until all transactions are gone. > > You need to wait until you are allowed to truncate if you want > concurrency. > Or a concurrent scan might miss a row, because the visible tuple got > truncated away. I was not suggesting that we remove visible rows through truncation. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
> > > -- start the VACUUM from the first non-filled block > > > > > > So if we do this, we wouldn't need to worry about HOT tuples at all, > > > nor would we need to wait until all transactions are gone. > > > > You need to wait until you are allowed to truncate if you want > > concurrency. > > Or a concurrent scan might miss a row, because the visible tuple got > > truncated away. > > I was not suggesting that we remove visible rows through truncation. Sure, unless you suggest to not truncate during this vacuum run ? But we are talking about vacuum full, so truncation is essential. It was suggested to do a dummy null update to move live tuples up front. The old version is still visible for serializable txns. Andreas
Ühel kenal päeval, N, 2007-03-01 kell 15:03, kirjutas Zeugswetter Andreas ADI SD: > > > > -- start the VACUUM from the first non-filled block > > > > > > > > So if we do this, we wouldn't need to worry about HOT tuples at > all, > > > > nor would we need to wait until all transactions are gone. > > > > > > You need to wait until you are allowed to truncate if you want > > > concurrency. > > > Or a concurrent scan might miss a row, because the visible tuple got > > > > truncated away. > > > > I was not suggesting that we remove visible rows through truncation. > > Sure, unless you suggest to not truncate during this vacuum run ? > But we are talking about vacuum full, so truncation is essential. > > It was suggested to do a dummy null update to move live tuples up front. > The old version is still visible for serializable txns. There should probably be a wait-other-trxs to finish stage between COMPACT and 2nd VACUUM if truncation at this cycle is absolutely needed > Andreas > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
> This means that > > VACUUM FULL mytable; > > would translate to: > > VACUUM mytable; -- make free space > COMPACT mytable; -- move tuples in a bunch of small transactions > -- might have a GUC for max trx length > VACUUM mytable; -- free the tuples at the end and give space back to fs VACUUM my table [n] tuples; -- free tuples at the end but only up to [n] tuples. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
On Thu, Mar 01, 2007 at 04:54:44PM +0200, Hannu Krosing wrote: > > > I was not suggesting that we remove visible rows through truncation. > > > > Sure, unless you suggest to not truncate during this vacuum run ? > > But we are talking about vacuum full, so truncation is essential. > > > > It was suggested to do a dummy null update to move live tuples up front. > > The old version is still visible for serializable txns. > > There should probably be a wait-other-trxs to finish stage between > COMPACT and 2nd VACUUM if truncation at this cycle is absolutely needed Or you could just do a vacuum a bit later. If we've also got the DSM at that point, vacuum should be fast. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Thu, Mar 01, 2007 at 01:05:28PM +0000, Simon Riggs wrote: > ISTM a radical approach is needed, so I'm very open to discussion about > this and how we cope. > > If we break down the main thoughts into a few parts: > > 1. would like a way to CLUSTER/VACUUM FULL where we don't have to move > all of the tuple versions, just the current ones. > > 2. would like a way to compact a table more efficiently > > Your idea does (2) in a concurrent manner, which is very good. Along similar lines, I think it would also be very useful to have a mode where any time a tuple on the last X pages gets updated it's moved off of it's existing page in the relation. Kind of like a 'shrink the heap in the background'. Granted, this probably won't compact as much as something more aggressive would, but it should be essentially free. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)