Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Autovacuum on partitioned table (autoanalyze)
Date
Msg-id 202108112233.xpz74i5pkgkr@alvherre.pgsql
Whole thread Raw
In response to Re: Autovacuum on partitioned table (autoanalyze)  (Andres Freund <andres@anarazel.de>)
Responses Re: Autovacuum on partitioned table (autoanalyze)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
After thinking about the described issues for a while, my proposal is to
completely revamp the way this feature works.  See below.

Now, the proposal seems awfully invasive, but it's *the* way I see to
avoid the pgstat traffic.  For pg14, maybe we can live with it, and just
use the smaller patches that Horiguchi-san and I have posted, which
solve the other issues; also, Euler Taveira suggested that we could add
a reloption to turn the feature off completely for some tables (maybe
make it off by default and have a reloption to turn it on for specific
partition hierarchies), so that it doesn't cause unduly pain for people
with large partitioning hierarchies.


* PgStat_StatTabEntry gets a new "Oid reportAncestorOid" member. This is
  the OID of a single partitioned ancestor, to which the changed-tuple
  counts are propagated up.
  Normally this is the topmost ancestor; but if the user wishes some
  intermediate ancestor to receive the counts they can use
  ALTER TABLE the_intermediate_ancestor SET (autovacuum_enabled=on).

* Corollary 1: for the normal case of single-level partitioning, the
  parent partitioned table behaves as currently.

* Corollary 2: for multi-level partitioning with no especially
  configured intermediate ancestors, only the leaf partitions and the
  top-level partitioned table will be analyzed.  Intermediate ancestors
  are ignored by autovacuum.

* Corollary 3: for multi-level partitioning with some intermediate
  ancestor(s) marked as autovacuum_enabled=on, that ancestor will
  receive all the counts from all of its partitions, so it will get
  analyzed itself; and it'll also forward those counts up to its
  report-ancestor.


* On ALTER TABLE .. ATTACH PARTITION or CREATE TABLE PARTITION AS,
  we send a message to collector with the analyze-ancestor OID.

* Backends running manual ANALYZE as well as autovacuum will examine
  each table's "relispartition" flag and its pgstat table entry; if it
  is a partition and doesn't have reportAncestorOid set, determine which
  ancestor should analyze counts be reported to; include this OID in the
  regular PgStat_MsgAnalyze.  This fixes the situation after a crash or
  other stats reset.  Also, it's not unduly expensive to do, because
  it's only in the rare case that the value sent by ATTACH was lost.

* Possible race condition in the previous step may cause multiple
  backends to send the same info.  Not a serious problem so we don't try
  to handle it.

* When tuple change counts for a partition are received by
  pgstat_recv_tabstat, they are propagated up to the indicated parent
  table in addition to being saved in the table itself.
  (Bonus points: when a table is attached or detached as a partition,
  the live tuples count is propagated to the newly acquired parent.)


What do people think of this?

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
Next
From: Tomas Vondra
Date:
Subject: Re: Use extended statistics to estimate (Var op Var) clauses