From 6f218d033bb334b5dc42e78cbcf7b33abeafa4f8 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sat, 22 Apr 2023 11:19:50 -0700
Subject: [PATCH v3 4/9] Reorder routine vacuuming sections.

This doesn't change any of the content itself. It is a mechanical
change. The new order flows better because it talks about freezing
directly after talking about space recovery tasks.

Old order:

New order:

The new order matches processing order inside vacuumlazy.c. This order
will be easier to work with in two later commits that more or less
rewrite "vacuum-for-wraparound" and "vacuum-for-space-recovery".
(Though it doesn't seem to make the existing content any less meaningful
without the later rewrite commits.)
---
 doc/src/sgml/maintenance.sgml | 302 +++++++++++++++++-----------------
 1 file changed, 151 insertions(+), 151 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e8c8647cd..e130dfdbd 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -281,8 +281,9 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
- To update data statistics used by the
- PostgreSQL query planner.
+ To protect against loss of very old data due to
+ transaction ID wraparound or
+ multixact ID wraparound.
@@ -292,9 +293,8 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
- To protect against loss of very old data due to
- transaction ID wraparound or
- multixact ID wraparound.
+ To update data statistics used by the
+ PostgreSQL query planner.
@@ -439,151 +439,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
-
- Updating Planner Statistics
-
-
- statistics
- of the planner
-
-
-
- ANALYZE
-
-
-
- The PostgreSQL query planner relies on
- statistical information about the contents of tables in order to
- generate good plans for queries.
These statistics are gathered by
- the ANALYZE command,
- which can be invoked by itself or
- as an optional step in VACUUM. It is important to have
- reasonably accurate statistics, otherwise poor choices of plans might
- degrade database performance.
-
-
-
- The autovacuum daemon, if enabled, will automatically issue
- ANALYZE commands whenever the content of a table has
- changed sufficiently. However, administrators might prefer to rely
- on manually-scheduled ANALYZE operations, particularly
- if it is known that update activity on a table will not affect the
- statistics of interesting columns. The daemon schedules
- ANALYZE strictly as a function of the number of rows
- inserted or updated; it has no knowledge of whether that will lead
- to meaningful statistical changes.
-
-
-
- Tuples changed in partitions and inheritance children do not trigger
- analyze on the parent table. If the parent table is empty or rarely
- changed, it may never be processed by autovacuum, and the statistics for
- the inheritance tree as a whole won't be collected. It is necessary to
- run ANALYZE on the parent table manually in order to
- keep the statistics up to date.
-
-
-
- As with vacuuming for space recovery, frequent updates of statistics
- are more useful for heavily-updated tables than for seldom-updated
- ones. But even for a heavily-updated table, there might be no need for
- statistics updates if the statistical distribution of the data is
- not changing much. A simple rule of thumb is to think about how much
- the minimum and maximum values of the columns in the table change.
- For example, a timestamp column that contains the time
- of row update will have a constantly-increasing maximum value as
- rows are added and updated; such a column will probably need more
- frequent statistics updates than, say, a column containing URLs for
- pages accessed on a website.
The URL column might receive changes just
- as often, but the statistical distribution of its values probably
- changes relatively slowly.
-
-
-
- It is possible to run ANALYZE on specific tables and even
- just specific columns of a table, so the flexibility exists to update some
- statistics more frequently than others if your application requires it.
- In practice, however, it is usually best to just analyze the entire
- database, because it is a fast operation. ANALYZE uses a
- statistically random sampling of the rows of a table rather than reading
- every single row.
-
-
-
-
- Although per-column tweaking of ANALYZE frequency might not be
- very productive, you might find it worthwhile to do per-column
- adjustment of the level of detail of the statistics collected by
- ANALYZE. Columns that are heavily used in WHERE
- clauses and have highly irregular data distributions might require a
- finer-grain data histogram than other columns. See ALTER TABLE
- SET STATISTICS, or change the database-wide default using the configuration parameter.
-
-
-
- Also, by default there is limited information available about
- the selectivity of functions. However, if you create a statistics
- object or an expression
- index that uses a function call, useful statistics will be
- gathered about the function, which can greatly improve query
- plans that use the expression index.
-
-
-
-
-
- The autovacuum daemon does not issue ANALYZE commands for
- foreign tables, since it has no means of determining how often that
- might be useful. If your queries require statistics on foreign tables
- for proper planning, it's a good idea to run manually-managed
- ANALYZE commands on those tables on a suitable schedule.
-
-
-
-
-
- The autovacuum daemon does not issue ANALYZE commands
- for partitioned tables. Inheritance parents will only be analyzed if the
- parent itself is changed - changes to child tables do not trigger
- autoanalyze on the parent table.
If your queries require statistics on
- parent tables for proper planning, it is necessary to periodically run
- a manual ANALYZE on those tables to keep the statistics
- up to date.
-
-
-
-
-
-
- Updating the Visibility Map
-
-
- Vacuum maintains a visibility map for each
- table to keep track of which pages contain only tuples that are known to be
- visible to all active transactions (and all future transactions, until the
- page is again modified). This has two purposes. First, vacuum
- itself can skip such pages on the next run, since there is nothing to
- clean up.
-
-
-
- Second, it allows PostgreSQL to answer some
- queries using only the index, without reference to the underlying table.
- Since PostgreSQL indexes don't contain tuple
- visibility information, a normal index scan fetches the heap tuple for each
- matching index entry, to check whether it should be seen by the current
- transaction.
- An index-only
- scan, on the other hand, checks the visibility map first.
- If it's known that all tuples on the page are
- visible, the heap fetch can be skipped. This is most useful on
- large data sets where the visibility map can prevent disk accesses.
- The visibility map is vastly smaller than the heap, so it can easily be
- cached even when the heap is very large.
-
-

 Preventing Transaction ID Wraparound Failures
@@ -933,7 +788,152 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
-
+
+
+ Updating the Visibility Map
+
+
+ Vacuum maintains a visibility
+ map for each table to keep track of which pages contain
+ only tuples that are known to be visible to all active
+ transactions (and all future transactions, until the page is again
+ modified). This has two purposes. First, vacuum itself can skip
+ such pages on the next run, since there is nothing to clean up.
+
+
+
+ Second, it allows PostgreSQL to answer
+ some queries using only the index, without reference to the
+ underlying table.
Since PostgreSQL
+ indexes don't contain tuple visibility information, a normal index
+ scan fetches the heap tuple for each matching index entry, to
+ check whether it should be seen by the current transaction. An
+ index-only
+ scan, on the other hand, checks the
+ visibility map first. If it's known that all tuples on the page
+ are visible, the heap fetch can be skipped. This is most useful
+ on large data sets where the visibility map can prevent disk
+ accesses. The visibility map is vastly smaller than the heap, so
+ it can easily be cached even when the heap is very large.
+
+
+
+
+ Updating Planner Statistics
+
+
+ statistics
+ of the planner
+
+
+
+ ANALYZE
+
+
+
+ The PostgreSQL query planner relies on
+ statistical information about the contents of tables in order to
+ generate good plans for queries. These statistics are gathered by
+ the ANALYZE command,
+ which can be invoked by itself or
+ as an optional step in VACUUM. It is important to have
+ reasonably accurate statistics, otherwise poor choices of plans might
+ degrade database performance.
+
+
+
+ The autovacuum daemon, if enabled, will automatically issue
+ ANALYZE commands whenever the content of a table has
+ changed sufficiently. However, administrators might prefer to rely
+ on manually-scheduled ANALYZE operations, particularly
+ if it is known that update activity on a table will not affect the
+ statistics of interesting columns. The daemon schedules
+ ANALYZE strictly as a function of the number of rows
+ inserted or updated; it has no knowledge of whether that will lead
+ to meaningful statistical changes.
+
+
+
+ Tuples changed in partitions and inheritance children do not trigger
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum, and the statistics for
+ the inheritance tree as a whole won't be collected. It is necessary to
+ run ANALYZE on the parent table manually in order to
+ keep the statistics up to date.
+
+
+
+ As with vacuuming for space recovery, frequent updates of statistics
+ are more useful for heavily-updated tables than for seldom-updated
+ ones. But even for a heavily-updated table, there might be no need for
+ statistics updates if the statistical distribution of the data is
+ not changing much. A simple rule of thumb is to think about how much
+ the minimum and maximum values of the columns in the table change.
+ For example, a timestamp column that contains the time
+ of row update will have a constantly-increasing maximum value as
+ rows are added and updated; such a column will probably need more
+ frequent statistics updates than, say, a column containing URLs for
+ pages accessed on a website. The URL column might receive changes just
+ as often, but the statistical distribution of its values probably
+ changes relatively slowly.
+
+
+
+ It is possible to run ANALYZE on specific tables and even
+ just specific columns of a table, so the flexibility exists to update some
+ statistics more frequently than others if your application requires it.
+ In practice, however, it is usually best to just analyze the entire
+ database, because it is a fast operation. ANALYZE uses a
+ statistically random sampling of the rows of a table rather than reading
+ every single row.
+
+
+
+
+ Although per-column tweaking of ANALYZE frequency might not be
+ very productive, you might find it worthwhile to do per-column
+ adjustment of the level of detail of the statistics collected by
+ ANALYZE. Columns that are heavily used in WHERE
+ clauses and have highly irregular data distributions might require a
+ finer-grain data histogram than other columns. See ALTER TABLE
+ SET STATISTICS, or change the database-wide default using the configuration parameter.
+
+
+
+ Also, by default there is limited information available about
+ the selectivity of functions.
However, if you create a statistics
+ object or an expression
+ index that uses a function call, useful statistics will be
+ gathered about the function, which can greatly improve query
+ plans that use the expression index.
+
+
+
+
+
+ The autovacuum daemon does not issue ANALYZE commands for
+ foreign tables, since it has no means of determining how often that
+ might be useful. If your queries require statistics on foreign tables
+ for proper planning, it's a good idea to run manually-managed
+ ANALYZE commands on those tables on a suitable schedule.
+
+
+
+
+
+ The autovacuum daemon does not issue ANALYZE commands
+ for partitioned tables. Inheritance parents will only be analyzed if the
+ parent itself is changed - changes to child tables do not trigger
+ autoanalyze on the parent table. If your queries require statistics on
+ parent tables for proper planning, it is necessary to periodically run
+ a manual ANALYZE on those tables to keep the statistics
+ up to date.
+
+
+
+
+
--
2.40.1