Revitalising VACUUM FULL for 8.3 - Mailing list pgsql-hackers
| From | Simon Riggs |
|---|---|
| Subject | Revitalising VACUUM FULL for 8.3 |
| Date | |
| Msg-id | 1172750543.3760.1232.camel@silverbirch.site |
| Responses | Re: Revitalising VACUUM FULL for 8.3; Re: Revitalising VACUUM FULL for 8.3 |
| List | pgsql-hackers |
The use case for VACUUM FULL is very low these days. VACUUM does the most important part of what VACUUM FULL offers, yet does it concurrently rather than with a full table lock.

VACUUM FULL also:
- has a very long execution time
- generates lots of WAL traffic
- uses lots of memory while it runs
- isn't as good at compacting a relation as CLUSTER
- sometimes requires multiple runs to properly compact data

CLUSTER:
- has much better compaction than VACUUM FULL when run concurrently with other transactions (yes, really!)
- need not generate WAL, in many cases
- has no weird failure cases

I propose to use the CLUSTER technique in place of the current VACUUM FULL code. The command VACUUM FULL would still exist, but would execute in a different manner, very similar to the way CLUSTER works, just without the sort-the-table feature.

The benefits of this approach:
- VACUUM FULL would compact relations much better than it does now
- it would never need multiple executions to achieve good compaction
- it would operate much faster, with a single pass over the main relation
- it would compact as effectively as CLUSTER, yet execute faster
- it need not generate WAL while it executes
- there would be no requirement for large maintenance_work_mem

The potential downsides of this approach are the following:
- "it uses more space"
- "it breaks MVCC"

There are good answers to both these points, so please read on.

The manual says VACUUM FULL can be sped up by dropping and re-creating indexes, but it is still lengthy. It is even faster to drop the indexes, do a CREATE TABLE AS SELECT * FROM table, drop the old table and then rebuild the indexes. So the main use case for the current VACUUM FULL is when the space to be freed inside the table is low enough to make defragmenting the table quicker than the above workaround, yet still high enough that we were worried enough to do a VACUUM FULL.
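A minimal sketch of that workaround, assuming a hypothetical table t with a single index t_idx on column col (all names here are illustrative, not from the original post); note that it takes an exclusive lock for the duration and loses constraints, triggers and permissions on t:

```sql
BEGIN;
-- copies only live rows, tightly packed, into a fresh file
CREATE TABLE t_new AS SELECT * FROM t;
-- dropping the old table also drops its (bloated) indexes
DROP TABLE t;
ALTER TABLE t_new RENAME TO t;
-- rebuild the indexes from scratch, so they come out compact too
CREATE INDEX t_idx ON t (col);
COMMIT;
```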
You must also be running it concurrently with other transactions, but clearly ones that don't touch the current table, because they would be locked out by the VACUUM FULL. That's a tough requirement, because this table is by definition one that is/has been heavily updated/deleted. And VACUUM hasn't been much use at freeing space at the end of the table, which it would only fail to do because of concurrently held locks. That's a very narrow use case and I doubt whether it exists at all any longer.

The new VACUUM FULL would perform a SeqScan of the main relation using SnapshotNow, inserting the results into a new relfilenode. If it crashes part way through the first phase, we drop the file being built and the table is untouched. No WAL need be written while we are doing this, except when archive_command is set. This produces a new relation which is tightly compacted, free of older tuples, and whose physical file is no larger than required. Vacuum delay points would also be supported. No FK checks would be required, nor would other constraints need to be re-checked during the insertion. Index entries would not be made during the insertions into the second heap. Just as with CLUSTER, the second phase would consist of rebuilding all indexes, ensuring that they too are as compact as possible.

Space usage of VACUUM FULL could be as high as twice the target table, but only in the case where there wasn't anything to VACUUM or truncate. In the typical use case we would be looking to remove large numbers of dead tuples, as well as truncate the relation, so the actual space overhead would more typically be only about +50% of the pre-VACUUM FULL size of the target table. If execution hits an out-of-space error then the command can quickly recover. If space really isn't available, then indexes can be dropped manually and the process re-executed. Arranging for more temp space is now easier with the new temp space management code.

Current CLUSTER does not respect MVCC.
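The proposed two-phase rewrite (phase 1: scan live tuples into a new heap; phase 2: rebuild all indexes over it) can be modelled as a toy sketch. This is purely illustrative Python, not PostgreSQL internals; the tuple and index representations are assumptions for the example:

```python
# Toy model of the proposed VACUUM FULL: copy only live tuples into a
# fresh "relfilenode", then rebuild every index against the new heap.

def vacuum_full_rewrite(heap, index_keys):
    # Phase 1: sequential scan, keeping only tuples visible "now"
    # (the proposal uses SnapshotNow); dead tuples are simply skipped,
    # and no index entries are made during this pass.
    new_heap = [tup for tup in heap if tup["live"]]

    # A crash here would leave the old heap untouched; we would just
    # drop the half-built new file.

    # Phase 2: rebuild all indexes from scratch over the compacted heap,
    # so they come out as compact as possible (as CLUSTER does).
    new_indexes = {
        key: sorted(range(len(new_heap)), key=lambda i: new_heap[i][key])
        for key in index_keys
    }
    return new_heap, new_indexes

heap = [
    {"id": 3, "live": True},
    {"id": 1, "live": False},   # dead tuple: reclaimed by the rewrite
    {"id": 2, "live": True},
]
new_heap, idx = vacuum_full_rewrite(heap, ["id"])
```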
It's possible for an old serializable transaction to miss consistent data as a result. That is a serious problem for pg_dump, and I propose to fix it in this proposal, for both CLUSTER and the new VACUUM FULL.

I would like to introduce the concept of utility transactions. A utility transaction is any transaction that touches only one table and is not returning or modifying data. All utility transactions wait until they are older than all non-utility transactions before they commit. A utility transaction would currently be any VACUUM, VACUUM FULL or CREATE INDEX CONCURRENTLY. That is safe because each of those commands executes in its own transaction and doesn't touch more than one table at a time. Once each knows there is no chance of being interfered with, it can continue its work and commit. This technique is already in use for CREATE INDEX CONCURRENTLY, so it just needs to be extended to all other utilities, but in a way that allows them to recognise each other. This extends the existing behaviour whereby VACUUMs already recognise other VACUUMs and avoid including them in their snapshots. I would also like to detect cases where CLUSTER, CREATE INDEX and ALTER TABLE are running in their own implicit top-level transaction, so that these too can be recognised by the server as utility transactions.

The utility transaction concept would make the new VACUUM FULL MVCC-safe, and would also make most executions of CLUSTER MVCC-safe (the implicit top-level transaction cases).

The coding for much of this looks very straightforward. I have to rewrite VACUUM FULL to cope with HOT anyway, so this seems like a faster, more robust and easier development path to take. However, none of the above arguments rely on the acceptance of the HOT patch for their relevance to this proposal.

Comments?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
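As an aside, the utility-transaction commit rule described in the post (wait until older than every running non-utility transaction, while ignoring fellow utilities, as VACUUMs already ignore each other) amounts to a simple predicate. The xids and data structures below are illustrative assumptions, not actual backend code:

```python
def utility_can_commit(my_xid, running):
    """running: list of (xid, is_utility) pairs for in-progress
    transactions; a lower xid means an older transaction.

    A utility transaction may commit only when every concurrent
    non-utility transaction is younger than it; other utility
    transactions are ignored."""
    return all(is_utility or xid > my_xid
               for xid, is_utility in running
               if xid != my_xid)

running = [(100, True), (105, False), (110, True)]
ok_old = utility_can_commit(100, running)  # oldest of all: may commit
ok_new = utility_can_commit(110, running)  # non-utility 105 is older: wait
```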