Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Autovacuum on partitioned table (autoanalyze)
Date
Msg-id 20210804.142254.670315945770222525.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Autovacuum on partitioned table (autoanalyze)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
At Thu, 29 Jul 2021 18:03:55 -0700, Andres Freund <andres@anarazel.de> wrote in 
> And if one instead inverts the order of pgstat_report_analyze() and
> pgstat_report_anl_ancestors() one gets a slightly different problem: A manual
> ANALYZE of the partition root results in the partition root having a non-zero
> changes_since_analyze afterwards. expand_vacuum() causes child partitions to be
> added to the list of relations, which *first* causes the partition root to be
> analyzed, and *then* partitions. The partitions then report their
> changes_since_analyze upwards.

For the last behavior, as Andres suggested, the scan order need to be
reversed (or to be in the same order with autovacuum). Since
find_all_inheritors scans breadth-first so just reversing the result
works. The breadth-first is currently not in the contract of the
interface of the function. I suppose we can add such a contract?

Finally, I ended up with the attached.

 - reverse the relation order within a tree
 - reverse the order of pgstat_report_analyze and pgstat_report_analyze.

Inheritance expansion is performed per-tree basis so it works fine
even if multiple relations are given to vacuum().


> I don't think the code as is is fit for v14. It looks like it was rewritten
> with a new approach just before the freeze ([1]), and as far as I can tell the
> concerns I quoted above weren't even discussed in the whole thread.  Alvaro,
> any comments?
> 
> Greetings,
> 
> Andres Freund
> 
> [1] https://www.postgresql.org/message-id/20210408032235.GA6842%40alvherre.pgsql

FYI: this bahaves as the follows.

CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE c1 PARTITION OF p FOR VALUES FROM (0) TO (200) PARTITION BY RANGE(a);
CREATE TABLE c11 PARTITION OF c1 FOR VALUES FROM (0) TO (100);
CREATE TABLE c12 PARTITION OF c1 FOR VALUES FROM (100) TO (200);
CREATE TABLE c2 PARTITION OF p FOR VALUES FROM (200) TO (400) PARTITION BY RANGE(a);
CREATE TABLE c21 PARTITION OF c2 FOR VALUES FROM (200) TO (300);
CREATE TABLE c22 PARTITION OF c2 FOR VALUES FROM (300) TO (400);
INSERT INTO p (SELECT a FROM generate_series(0, 400 - 1) a, generate_series(0, 10) b);


INSERT INTO p (SELECT 200 FROM generate_series(0, 99));

SELECT relid, relname, n_mod_since_analyze FROM pg_stat_user_tables ORDER BY relid;
 relid | relname | n_mod_since_analyze 
-------+---------+---------------------
 16426 | p       |                   0
 16429 | c1      |                   0
 16432 | c11     |                   0
 16435 | c12     |                   0
 16438 | c2      |                   0
 16441 | c21     |                 100
 16444 | c22     |                   0
 16447 | sa      |                   0
(8 rows)

After "ANALYZE c21;"
 relid | relname | n_mod_since_analyze 
-------+---------+---------------------
 16426 | p       |                 100
 16429 | c1      |                   0
 16432 | c11     |                   0
 16435 | c12     |                   0
 16438 | c2      |                 100
 16441 | c21     |                   0
 16444 | c22     |                   0
 16447 | sa      |                   0

After "ANALYZE c2;"
 relid | relname | n_mod_since_analyze 
-------+---------+---------------------
 16426 | p       |                 100
 16429 | c1      |                   0
 16432 | c11     |                   0
 16435 | c12     |                   0
 16438 | c2      |                   0
 16441 | c21     |                   0
 16444 | c22     |                   0
 16447 | sa      |                   0

After "ANALYZE p;"
(all zero)


However, this gives a strange-looking side-effect, which affected
regression results.

=# VACUUM ANALYZE p(a, a);
ERROR:  column "a" of relation "c22" appears more than once

(Prevously it complained about p.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 16f7602f1b7755f288c508f1e57e0eae3c305813 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 4 Aug 2021 13:40:59 +0900
Subject: [PATCH] Fix changes_since_analyze's motion on manual analyze on
 partitioned tables

The analyze-stats machinery assumed bottom-to-top-ordered relation
scans but actually it was in the opposite order in manual
ANALYZE. Addition to that the current code tries to propagate
changes_since_analyze to parents after stats reporting which resets
the number to propagate.

As the result, when doing manual ANALYZE on a partition,
changes_since_analyze vanishes instead of being propagated to
parents. On the other hand when doing that on a partitioned tables,
the leaf relations end up with having bogus stats values.

To fix this, reverse the order relations on running manual ANALYZE and
move stats-reporting after stats-propagation.
---
 src/backend/catalog/pg_inherits.c    |  3 ++-
 src/backend/commands/analyze.c       | 26 +++++++++++++-------------
 src/backend/commands/vacuum.c        | 25 +++++++++++++++----------
 src/test/regress/expected/vacuum.out | 18 +++++++++---------
 4 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index ae990d4877..3451578580 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -239,7 +239,8 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
 /*
  * find_all_inheritors -
  *        Returns a list of relation OIDs including the given rel plus
- *        all relations that inherit from it, directly or indirectly.
+ *        all relations that inherit from it, directly or indirectly, in
+ *        breadth-first ordering.
  *        Optionally, it also returns the number of parents found for
  *        each such relation within the inheritance tree rooted at the
  *        given rel.
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 0c9591415e..414da6630b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -682,19 +682,6 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
                             in_outer_xact);
     }
 
-    /*
-     * Now report ANALYZE to the stats collector.  For regular tables, we do
-     * it only if not doing inherited stats.  For partitioned tables, we only
-     * do it for inherited stats. (We're never called for not-inherited stats
-     * on partitioned tables anyway.)
-     *
-     * Reset the changes_since_analyze counter only if we analyzed all
-     * columns; otherwise, there is still work for auto-analyze to do.
-     */
-    if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-        pgstat_report_analyze(onerel, totalrows, totaldeadrows,
-                              (va_cols == NIL));
-
     /*
      * If this is a manual analyze of all columns of a permanent leaf
      * partition, and not doing inherited stats, also let the collector know
@@ -711,6 +698,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
         pgstat_report_anl_ancestors(RelationGetRelid(onerel));
     }
 
+    /*
+     * Now report ANALYZE to the stats collector.  For regular tables, we do
+     * it only if not doing inherited stats.  For partitioned tables, we only
+     * do it for inherited stats. (We're never called for not-inherited stats
+     * on partitioned tables anyway.)
+     *
+     * Reset the changes_since_analyze counter only if we analyzed all
+     * columns; otherwise, there is still work for auto-analyze to do.
+     */
+    if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+        pgstat_report_analyze(onerel, totalrows, totaldeadrows,
+                              (va_cols == NIL));
+
     /*
      * If this isn't part of VACUUM ANALYZE, let index AMs do cleanup.
      *
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..0a378487fb 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -826,13 +826,15 @@ expand_vacuum_rel(VacuumRelation *vrel, int options)
         ReleaseSysCache(tuple);
 
         /*
-         * If it is, make relation list entries for its partitions.  Note that
-         * the list returned by find_all_inheritors() includes the passed-in
-         * OID, so we have to skip that.  There's no point in taking locks on
-         * the individual partitions yet, and doing so would just add
-         * unnecessary deadlock risk.  For this last reason we do not check
-         * yet the ownership of the partitions, which get added to the list to
-         * process.  Ownership will be checked later on anyway.
+         * If it is, make relation list entries for its partitions in a
+         * bottom-to-top manner. Note that the list returned by
+         * find_all_inheritors() is in top-to-bottom ordering and includes the
+         * passed-in OID, so we have to reverse the order and skip the
+         * passed-in OID.  There's no point in taking locks on the individual
+         * partitions yet, and doing so would just add unnecessary deadlock
+         * risk.  For this last reason we do not check yet the ownership of the
+         * partitions, which get added to the list to process.  Ownership will
+         * be checked later on anyway.
          */
         if (include_parts)
         {
@@ -852,9 +854,12 @@ expand_vacuum_rel(VacuumRelation *vrel, int options)
                  * later.
                  */
                 oldcontext = MemoryContextSwitchTo(vac_context);
-                vacrels = lappend(vacrels, makeVacuumRelation(NULL,
-                                                              part_oid,
-                                                              vrel->va_cols));
+
+                /* Make the order reversed */
+                vacrels = lcons(makeVacuumRelation(NULL,
+                                                   part_oid,
+                                                   vrel->va_cols),
+                                vacrels);
                 MemoryContextSwitchTo(oldcontext);
             }
         }
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 3e70e4c788..5b9726225e 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -196,9 +196,9 @@ VACUUM (FULL) vacparted;
 VACUUM (FREEZE) vacparted;
 -- check behavior with duplicate column mentions
 VACUUM ANALYZE vacparted(a,b,a);
-ERROR:  column "a" of relation "vacparted" appears more than once
+ERROR:  column "a" of relation "vacparted1" appears more than once
 ANALYZE vacparted(a,b,b);
-ERROR:  column "b" of relation "vacparted" appears more than once
+ERROR:  column "b" of relation "vacparted1" appears more than once
 -- partitioned table with index
 CREATE TABLE vacparted_i (a int primary key, b varchar(100))
   PARTITION BY HASH (a);
@@ -239,7 +239,7 @@ ANALYZE vacparted (b), vactst;
 ANALYZE vactst, does_not_exist, vacparted;
 ERROR:  relation "does_not_exist" does not exist
 ANALYZE vactst (i), vacparted (does_not_exist);
-ERROR:  column "does_not_exist" of relation "vacparted" does not exist
+ERROR:  column "does_not_exist" of relation "vacparted1" does not exist
 ANALYZE vactst, vactst;
 BEGIN;  -- ANALYZE behaves differently inside a transaction block
 ANALYZE vactst, vactst;
@@ -319,24 +319,24 @@ WARNING:  skipping "pg_authid" --- only superuser can vacuum it
 -- independently.
 VACUUM vacowned_parted;
 WARNING:  skipping "vacowned_parted" --- only table or database owner can vacuum it
-WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM vacowned_part1;
 WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM vacowned_part2;
 WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
 ANALYZE vacowned_parted;
 WARNING:  skipping "vacowned_parted" --- only table or database owner can analyze it
-WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
 WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
 ANALYZE vacowned_part1;
 WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
 ANALYZE vacowned_part2;
 WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
 VACUUM (ANALYZE) vacowned_parted;
 WARNING:  skipping "vacowned_parted" --- only table or database owner can vacuum it
-WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM (ANALYZE) vacowned_part1;
 WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM (ANALYZE) vacowned_part2;
@@ -389,22 +389,22 @@ ALTER TABLE vacowned_parted OWNER TO regress_vacuum;
 ALTER TABLE vacowned_part1 OWNER TO CURRENT_USER;
 SET ROLE regress_vacuum;
 VACUUM vacowned_parted;
-WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM vacowned_part1;
 WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM vacowned_part2;
 WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
 ANALYZE vacowned_parted;
-WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
 WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
 ANALYZE vacowned_part1;
 WARNING:  skipping "vacowned_part1" --- only table or database owner can analyze it
 ANALYZE vacowned_part2;
 WARNING:  skipping "vacowned_part2" --- only table or database owner can analyze it
 VACUUM (ANALYZE) vacowned_parted;
-WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 WARNING:  skipping "vacowned_part2" --- only table or database owner can vacuum it
+WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM (ANALYZE) vacowned_part1;
 WARNING:  skipping "vacowned_part1" --- only table or database owner can vacuum it
 VACUUM (ANALYZE) vacowned_part2;
-- 
2.27.0


pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Skipping logical replication transactions on subscriber side
Next
From: Masahiko Sawada
Date:
Subject: Re: [BUG] wrong refresh when ALTER SUBSCRIPTION ADD/DROP PUBLICATION