Re: Autovacuum on partitioned table (autoanalyze) - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Autovacuum on partitioned table (autoanalyze)
Date
Msg-id 20201130160712.GA20727@alvherre.pgsql
Whole thread Raw
In response to Re: Autovacuum on partitioned table (autoanalyze)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
I looked at both Yuzuko Hosoya's patch and Kyotaro Horiguchi's, and
think we're doing things in a quite complicated manner, which perhaps
could be done more easily.

Hosoya's patch has pgstat_report_analyze call pgstat_get_tab_entry() for
the table being vacuumed, then obtains the list of ancestors, and then
sends for each ancestor a new message containing the partition's
changes_since_analyze for that ancestor.  When stat collector receives
that message, it adds the number to the ancestor's m_changed_tuples.

Horiguchi's doing a similar thing, only differently: it is do_analyze_rel 
that reads the count from the collector (this time by calling SQL
function pg_stat_get_mod_since_analyze) and then sends number back to
the collector for each ancestor.


I suggest that a better way to do this, is to forget about the new
"partchanges" message completely.  Instead, let's add an array of
ancestors to the analyze message (borrowing from PgStat_MsgFuncstat).
Something like this:

#define PGSTAT_NUM_ANCESTORENTRIES    \
    ((PGSTAT_MSG_PAYLOAD - sizeof(Oid) - sizeof(Oid) - sizeof(bool) - \
      sizeof(bool) - sizeof(TimestampTz) - sizeof(PgStat_Counter) - \
      sizeof(PgStat_Counter) - sizeof(int)) / sizeof(Oid))
typedef struct PgStat_MsgAnalyze
{
    PgStat_MsgHdr  m_hdr;
    Oid            m_databaseid;
    Oid            m_tableoid;
    bool           m_autovacuum;
    bool           m_resetcounter;
    TimestampTz    m_analyzetime;
    PgStat_Counter m_live_tuples;
    PgStat_Counter m_dead_tuples;
    int            m_nancestors;
    Oid            m_ancestors[PGSTAT_NUM_ANCESTORENTRIES];
} PgStat_MsgAnalyze;

For non-partitions, m_nancestors would be 0, so the message would be
handled as today.  For partitions, the array carries the OID of all
ancestors.  When the collector receives this message, first it looks up
the pgstat entries for each ancestors in the array, and it adds the
partition's current changes_since_analyze to the ancestor's
changes_since_analyze.  Then it does things as currently, including
reset the changes_since_analyze counter for the partition.

Key point in this is that we don't need to read the number from
collector into the backend executing analyze.  We just *send* the data
about ancestors, and the collector knows what to do with it.


One possible complaint is: what if there are more ancestors that fit in
the message?  I propose that this problem can be ignored, since in order
to hit this, you'd need to have (1000-8-4-4-1-1-8-8-8-4)/4 = 238
ancestors (if my math is right).  I doubt we'll hit the need to use 238
levels of partitionings before a stat collector rewrite occurs ...

(It is possible to remove that restriction by doing more complicated
things such as sending the list of ancestor in a new type of message
that can be sent several times, prior to the analyze message itself, but
I don't think this is worth the trouble.)



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Feature improvement for pg_stat_statements
Next
From: Andrey Lepikhov
Date:
Subject: Cost overestimation of foreign JOIN