Thread: Autovacuum on partitioned table

Autovacuum on partitioned table

From
yuzuko
Date:
Hello,

Greg reported in [1] before, autovacuum ignores partitioned tables.
That is, even if individual partitions’ statistics are updated, its parent's
statistics are not updated.  This is TODO for declarative partitioning.
As Amit mentioned in [2], a way to make parent's statistics from
partitions' statistics without scanning the partitions would be nice,
but it will need a lot of modifications.  So I tried to fix that using the
current analyze method.

The summary of the attached patch is as follows:
* If the relation is a partitioned table, check its children if they need
  vacuum or analyze.  Children need to do that are added to
  a table list for autovacuuum.  At least one child is added to the list,
  the partitioned table is also added to the list.  Then, autovacuum
  runs on all the tables in the list.
* If the partitioned table has foreign partitions, ignore them.

When the parent has children don't need vacuum/analyze or foreign
partitions, parent's stats are updated scanning the current data of all
children, so old stats and new are mixed within the partition tree.
Is that suitable?  Any thoughts?

[1] https://www.postgresql.org/message-id/CAM-w4HMQKC8hw7nB9TW3OV%2BhkB5OUcPtvr_U_EiSOjByoa-e4Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BHiwqEeZQ-H2OVbHZ%3Dn2RNNPF84Hygi1HC-MDwC-VnBjpA1%3DQ%40mail.gmail.com

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table

From
Laurenz Albe
Date:
On Mon, 2019-12-02 at 18:02 +0900, yuzuko wrote:
> Greg reported in [1] before, autovacuum ignores partitioned tables.
> That is, even if individual partitions’ statistics are updated, its parent's
> statistics are not updated.  This is TODO for declarative partitioning.
> As Amit mentioned in [2], a way to make parent's statistics from
> partitions' statistics without scanning the partitions would be nice,
> but it will need a lot of modifications.  So I tried to fix that using the
> current analyze method.
> 
> The summary of the attached patch is as follows:
> * If the relation is a partitioned table, check its children if they need
>   vacuum or analyze.  Children need to do that are added to
>   a table list for autovacuuum.  At least one child is added to the list,
>   the partitioned table is also added to the list.  Then, autovacuum
>   runs on all the tables in the list.

That means that all partitions are vacuumed if only one of them needs it,
right?  This will result in way more vacuuming than necessary.

Wouldn't it be an option to update the partitioned table's statistics
whenever one of the partitions is vacuumed?

Yours,
Laurenz Albe




Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hi Laurenz,

Thanks for the comments.

On Mon, Dec 2, 2019 at 6:19 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Mon, 2019-12-02 at 18:02 +0900, yuzuko wrote:
> > Greg reported in [1] before, autovacuum ignores partitioned tables.
> > That is, even if individual partitions’ statistics are updated, its parent's
> > statistics are not updated.  This is TODO for declarative partitioning.
> > As Amit mentioned in [2], a way to make parent's statistics from
> > partitions' statistics without scanning the partitions would be nice,
> > but it will need a lot of modifications.  So I tried to fix that using the
> > current analyze method.
> >
> > The summary of the attached patch is as follows:
> > * If the relation is a partitioned table, check its children if they need
> >   vacuum or analyze.  Children need to do that are added to
> >   a table list for autovacuuum.  At least one child is added to the list,
> >   the partitioned table is also added to the list.  Then, autovacuum
> >   runs on all the tables in the list.
>
> That means that all partitions are vacuumed if only one of them needs it,
> right?  This will result in way more vacuuming than necessary.
>
Autovacuum runs only partitions need vacuum/analyze, so unnecessary
partitions stats are not updated.  However, to make parent's stats,
all children are scanned.  It might be a waste of time.

> Wouldn't it be an option to update the partitioned table's statistics
> whenever one of the partitions is vacuumed?
>
> Yours,
> Laurenz Albe
>

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hi,

As Laurenz commented in this thread, I tried adding option
to update parent's statistics during Autovacuum. To do that,
I propose supporting 'autovacuum_enabled' option already
exists on partitioned tables.

In the attached patch, you can use 'autovacuum_enabled' option
on partitioned table as usual, that is, a default value of this option
is true. So if you don't need autovacuum on a partitioned table,
you have to specify the option:
CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0);

I'm not sure but I wonder if a suitable value as a default of
'autovacuum_enabled' for partitioned tables might be false.
Because autovacuum on *partitioned tables* requires scanning
all children to make partitioned tables' statistics.
But if the default value varies according to the relation,
is it confusing?  Any thoughts?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table

From
Masahiko Sawada
Date:
On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya@gmail.com> wrote:
>
> Hi,
>
> As Laurenz commented in this thread, I tried adding option
> to update parent's statistics during Autovacuum. To do that,
> I propose supporting 'autovacuum_enabled' option already
> exists on partitioned tables.
>
> In the attached patch, you can use 'autovacuum_enabled' option
> on partitioned table as usual, that is, a default value of this option
> is true. So if you don't need autovacuum on a partitioned table,
> you have to specify the option:
> CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0);
>
> I'm not sure but I wonder if a suitable value as a default of
> 'autovacuum_enabled' for partitioned tables might be false.
> Because autovacuum on *partitioned tables* requires scanning
> all children to make partitioned tables' statistics.
> But if the default value varies according to the relation,
> is it confusing?  Any thoughts?

I don't look at the patch deeply yet but your patch seems to attempt
to vacuum on partitioned table. IIUC partitioned tables don't need to
be vacuumed and its all child tables are vacuumed instead if we pass
the partitioned table to vacuum() function. But autovacuum on child
tables is normally triggered since their statistics are updated.

I think it's a good idea to have that option but I think that doing
autovacuum on the parent table every time when autovacuum is triggered
on one of its child tables is very high cost especially when there are
a lot of child tables. Instead I thought it's more straight forward if
we compare the summation of the statistics of child tables (e.g.
n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we
consider the needs of autovacuum on the parent table. What do you
think?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
Hello,

On Fri, Dec 27, 2019 at 2:02 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya@gmail.com> wrote:
> > As Laurenz commented in this thread, I tried adding option
> > to update parent's statistics during Autovacuum. To do that,
> > I propose supporting 'autovacuum_enabled' option already
> > exists on partitioned tables.
> >
> > In the attached patch, you can use 'autovacuum_enabled' option
> > on partitioned table as usual, that is, a default value of this option
> > is true. So if you don't need autovacuum on a partitioned table,
> > you have to specify the option:
> > CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0);
> >
> > I'm not sure but I wonder if a suitable value as a default of
> > 'autovacuum_enabled' for partitioned tables might be false.
> > Because autovacuum on *partitioned tables* requires scanning
> > all children to make partitioned tables' statistics.
> > But if the default value varies according to the relation,
> > is it confusing?  Any thoughts?
>
> I don't look at the patch deeply yet but your patch seems to attempt
> to vacuum on partitioned table. IIUC partitioned tables don't need to
> be vacuumed and its all child tables are vacuumed instead if we pass
> the partitioned table to vacuum() function. But autovacuum on child
> tables is normally triggered since their statistics are updated.
>
> I think it's a good idea to have that option but I think that doing
> autovacuum on the parent table every time when autovacuum is triggered
> on one of its child tables is very high cost especially when there are
> a lot of child tables. Instead I thought it's more straight forward if
> we compare the summation of the statistics of child tables (e.g.
> n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we
> consider the needs of autovacuum on the parent table. What do you
> think?

There's this old email where Tom outlines a few ideas about triggering
auto-analyze on inheritance trees:

https://www.postgresql.org/message-id/4823.1262132964%40sss.pgh.pa.us

If I'm reading that correctly, the idea is to track only
changes_since_analyze and none of the finer-grained stats like
live/dead tuples for inheritance parents (partitioned tables) using
some new pgstat infrastrcture, an idea that Hosoya-san also seems to
be considering per an off-list discussion.  Besides the complexity of
getting that infrastructure in place, an important question is whether
the current system of applying threshold and scale factor to
changes_since_analyze should be used as-is for inheritance parents
(partitioned tables), because if users set those parameters similarly
to for regular tables, autovacuum might analyze partitioned tables
more than necessary.  We'll either need a different formula, or some
commentary in the documentation about how partitioned tables might
need different setting, or maybe both.

By the way, maybe I'm misunderstanding what Sawada-san wrote above,
but the only missing piece seems to be a way to trigger an *analyze*
on the parent tables -- to collect optimizer statistics for the
inheritance trees -- not vacuum, for which the existing system seems
enough.

Thanks,
Amit



Re: Autovacuum on partitioned table

From
Masahiko Sawada
Date:
On Tue, 28 Jan 2020 at 17:52, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hello,
>
> On Fri, Dec 27, 2019 at 2:02 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya@gmail.com> wrote:
> > > As Laurenz commented in this thread, I tried adding option
> > > to update parent's statistics during Autovacuum. To do that,
> > > I propose supporting 'autovacuum_enabled' option already
> > > exists on partitioned tables.
> > >
> > > In the attached patch, you can use 'autovacuum_enabled' option
> > > on partitioned table as usual, that is, a default value of this option
> > > is true. So if you don't need autovacuum on a partitioned table,
> > > you have to specify the option:
> > > CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0);
> > >
> > > I'm not sure but I wonder if a suitable value as a default of
> > > 'autovacuum_enabled' for partitioned tables might be false.
> > > Because autovacuum on *partitioned tables* requires scanning
> > > all children to make partitioned tables' statistics.
> > > But if the default value varies according to the relation,
> > > is it confusing?  Any thoughts?
> >
> > I don't look at the patch deeply yet but your patch seems to attempt
> > to vacuum on partitioned table. IIUC partitioned tables don't need to
> > be vacuumed and its all child tables are vacuumed instead if we pass
> > the partitioned table to vacuum() function. But autovacuum on child
> > tables is normally triggered since their statistics are updated.
> >
> > I think it's a good idea to have that option but I think that doing
> > autovacuum on the parent table every time when autovacuum is triggered
> > on one of its child tables is very high cost especially when there are
> > a lot of child tables. Instead I thought it's more straight forward if
> > we compare the summation of the statistics of child tables (e.g.
> > n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we
> > consider the needs of autovacuum on the parent table. What do you
> > think?
>
> There's this old email where Tom outlines a few ideas about triggering
> auto-analyze on inheritance trees:
>
> https://www.postgresql.org/message-id/4823.1262132964%40sss.pgh.pa.us
>
> If I'm reading that correctly, the idea is to track only
> changes_since_analyze and none of the finer-grained stats like
> live/dead tuples for inheritance parents (partitioned tables) using
> some new pgstat infrastrcture, an idea that Hosoya-san also seems to
> be considering per an off-list discussion.  Besides the complexity of
> getting that infrastructure in place, an important question is whether
> the current system of applying threshold and scale factor to
> changes_since_analyze should be used as-is for inheritance parents
> (partitioned tables), because if users set those parameters similarly
> to for regular tables, autovacuum might analyze partitioned tables
> more than necessary.

How are you going to track changes_since_analyze of partitioned table?
It's just an idea but we can accumulate changes_since_analyze of
partitioned table by adding child tables's value after analyzing each
child table. And compare the partitioned tables value to the threshold
that is computed by (autovacuum_analyze_threshold  + total rows
including all child tables * autovacuum_analyze_scale_factor).

> By the way, maybe I'm misunderstanding what Sawada-san wrote above,
> but the only missing piece seems to be a way to trigger an *analyze*
> on the parent tables -- to collect optimizer statistics for the
> inheritance trees -- not vacuum, for which the existing system seems
> enough.

Right. We need only autoanalyze on partitioned tables.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hello,

> Besides the complexity of
> getting that infrastructure in place, an important question is whether
> the current system of applying threshold and scale factor to
> changes_since_analyze should be used as-is for inheritance parents
> (partitioned tables), because if users set those parameters similarly
> to for regular tables, autovacuum might analyze partitioned tables
> more than necessary.  We'll either need a different formula, or some
> commentary in the documentation about how partitioned tables might
> need different setting, or maybe both.
>
I'm not sure but I think we need new autovacuum parameters for
partitioned tables (autovacuum, autovacuum_analyze_threshold,
autovacuum_analyze_scale_factor) because whether it's necessary
to run autovacuum on partitioned tables will depend on users.
What do you think?

> How are you going to track changes_since_analyze of partitioned table?
> It's just an idea but we can accumulate changes_since_analyze of
> partitioned table by adding child tables's value after analyzing each
> child table. And compare the partitioned tables value to the threshold
> that is computed by (autovacuum_analyze_threshold  + total rows
> including all child tables * autovacuum_analyze_scale_factor).
>
The idea Sawada-san mentioned is similar to mine.  Also, for tracking
changes_since_analyze, we have to make partitioned table's statistics.
To do that, we can invent a new PgStat_StatPartitionedTabEntry based
on PgStat_StatTabEntry.  Through talking with Amit, I think the new structure
needs the following members:

tableid
changes_since_analyze
analyze_timestamp
analyze_count
autovac_analyze_timestamp
autovac_analyze_count

Vacuum doesn't run on partitioned tables, so I think members related to
(auto) vacuum need not be contained in the structure.

I'm still writing a patch.  I'll send it this week.
-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
On Wed, Jan 29, 2020 at 11:29 AM yuzuko <yuzukohosoya@gmail.com> wrote:
> > Besides the complexity of
> > getting that infrastructure in place, an important question is whether
> > the current system of applying threshold and scale factor to
> > changes_since_analyze should be used as-is for inheritance parents
> > (partitioned tables), because if users set those parameters similarly
> > to for regular tables, autovacuum might analyze partitioned tables
> > more than necessary.  We'll either need a different formula, or some
> > commentary in the documentation about how partitioned tables might
> > need different setting, or maybe both.
> >
> I'm not sure but I think we need new autovacuum parameters for
> partitioned tables (autovacuum, autovacuum_analyze_threshold,
> autovacuum_analyze_scale_factor) because whether it's necessary
> to run autovacuum on partitioned tables will depend on users.
> What do you think?

Yes, we will need to first support those parameters on partitioned
tables.  Currently, you get:

create table p (a int) partition by list (a) with
(autovacuum_analyze_scale_factor=0);
ERROR:  unrecognized parameter "autovacuum_analyze_scale_factor"

> > How are you going to track changes_since_analyze of partitioned table?
> > It's just an idea but we can accumulate changes_since_analyze of
> > partitioned table by adding child tables's value after analyzing each
> > child table. And compare the partitioned tables value to the threshold
> > that is computed by (autovacuum_analyze_threshold  + total rows
> > including all child tables * autovacuum_analyze_scale_factor).
> >
> The idea Sawada-san mentioned is similar to mine.

So if I understand this idea correctly, a partitioned table's analyze
will only be triggered when partitions are analyzed.  That is,
inserts, updates, deletes of tuples in partitions will be tracked by
pgstat, which in turn is used by autovacuum to trigger analyze on
partitions.  Then, partitions changes_since_analyze is added into the
parent's changes_since_analyze, which in turn *may* trigger analyze
parent.  I said "may", because it would take multiple partition
analyzes to accumulate enough changes to trigger one on the parent.
Am I getting that right?

>  Also, for tracking
> changes_since_analyze, we have to make partitioned table's statistics.
> To do that, we can invent a new PgStat_StatPartitionedTabEntry based
> on PgStat_StatTabEntry.  Through talking with Amit, I think the new structure
> needs the following members:
>
> tableid
> changes_since_analyze
> analyze_timestamp
> analyze_count
> autovac_analyze_timestamp
> autovac_analyze_count
>
> Vacuum doesn't run on partitioned tables, so I think members related to
> (auto) vacuum need not be contained in the structure.

On second thought, maybe we don't need a new PgStat_ struct.  We can
just use what's used for regular tables and leave the fields that
don't make sense for partitioned tables set to 0, such as those that
track the counts of scans, tuples, etc.  That means we don't have to
mess with interfaces of existing functions, like this one:

static void relation_needs_vacanalyze(Oid relid,
                          AutoVacOpts *relopts,
                          Form_pg_class classForm,
                          PgStat_StatTabEntry *tabentry, ...

Thanks,
Amit



Re: Autovacuum on partitioned table

From
Michael Paquier
Date:
On Wed, Jan 29, 2020 at 05:56:40PM +0900, Amit Langote wrote:
> Yes, we will need to first support those parameters on partitioned
> tables.  Currently, you get:
>
> create table p (a int) partition by list (a) with
> (autovacuum_analyze_scale_factor=0);
> ERROR:  unrecognized parameter "autovacuum_analyze_scale_factor"

Worth the note: partitioned tables support zero reloptions as of now,
but there is the facility in place to allow that (see
RELOPT_KIND_PARTITIONED and partitioned_table_reloptions).
--
Michael

Attachment

Re: Autovacuum on partitioned table

From
Masahiko Sawada
Date:
On Wed, 29 Jan 2020 at 17:56, Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Wed, Jan 29, 2020 at 11:29 AM yuzuko <yuzukohosoya@gmail.com> wrote:
> > > Besides the complexity of
> > > getting that infrastructure in place, an important question is whether
> > > the current system of applying threshold and scale factor to
> > > changes_since_analyze should be used as-is for inheritance parents
> > > (partitioned tables), because if users set those parameters similarly
> > > to for regular tables, autovacuum might analyze partitioned tables
> > > more than necessary.  We'll either need a different formula, or some
> > > commentary in the documentation about how partitioned tables might
> > > need different setting, or maybe both.
> > >
> > I'm not sure but I think we need new autovacuum parameters for
> > partitioned tables (autovacuum, autovacuum_analyze_threshold,
> > autovacuum_analyze_scale_factor) because whether it's necessary
> > to run autovacuum on partitioned tables will depend on users.
> > What do you think?
>
> Yes, we will need to first support those parameters on partitioned
> tables.  Currently, you get:
>
> create table p (a int) partition by list (a) with
> (autovacuum_analyze_scale_factor=0);
> ERROR:  unrecognized parameter "autovacuum_analyze_scale_factor"
>
> > > How are you going to track changes_since_analyze of partitioned table?
> > > It's just an idea but we can accumulate changes_since_analyze of
> > > partitioned table by adding child tables's value after analyzing each
> > > child table. And compare the partitioned tables value to the threshold
> > > that is computed by (autovacuum_analyze_threshold  + total rows
> > > including all child tables * autovacuum_analyze_scale_factor).
> > >
> > The idea Sawada-san mentioned is similar to mine.
>
> So if I understand this idea correctly, a partitioned table's analyze
> will only be triggered when partitions are analyzed.  That is,
> inserts, updates, deletes of tuples in partitions will be tracked by
> pgstat, which in turn is used by autovacuum to trigger analyze on
> partitions.  Then, partitions changes_since_analyze is added into the
> parent's changes_since_analyze, which in turn *may* trigger analyze
> parent.  I said "may", because it would take multiple partition
> analyzes to accumulate enough changes to trigger one on the parent.
> Am I getting that right?

Yeah that is what I meant. In addition, adding partition's
changes_since_analyze to its parent needs to be done recursively as
the parent table could also be a partitioned table.

>
> >  Also, for tracking
> > changes_since_analyze, we have to make partitioned table's statistics.
> > To do that, we can invent a new PgStat_StatPartitionedTabEntry based
> > on PgStat_StatTabEntry.  Through talking with Amit, I think the new structure
> > needs the following members:
> >
> > tableid
> > changes_since_analyze
> > analyze_timestamp
> > analyze_count
> > autovac_analyze_timestamp
> > autovac_analyze_count
> >
> > Vacuum doesn't run on partitioned tables, so I think members related to
> > (auto) vacuum need not be contained in the structure.
>
> On second thought, maybe we don't need a new PgStat_ struct.  We can
> just use what's used for regular tables and leave the fields that
> don't make sense for partitioned tables set to 0, such as those that
> track the counts of scans, tuples, etc.  That means we don't have to
> mess with interfaces of existing functions, like this one:
>
> static void relation_needs_vacanalyze(Oid relid,
>                           AutoVacOpts *relopts,
>                           Form_pg_class classForm,
>                           PgStat_StatTabEntry *tabentry, ...

+1

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
On Sun, Feb 2, 2020 at 12:53 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> On Wed, 29 Jan 2020 at 17:56, Amit Langote <amitlangote09@gmail.com> wrote:
> > On Wed, Jan 29, 2020 at 11:29 AM yuzuko <yuzukohosoya@gmail.com> wrote:
> > > > How are you going to track changes_since_analyze of partitioned table?
> > > > It's just an idea but we can accumulate changes_since_analyze of
> > > > partitioned table by adding child tables's value after analyzing each
> > > > child table. And compare the partitioned tables value to the threshold
> > > > that is computed by (autovacuum_analyze_threshold  + total rows
> > > > including all child tables * autovacuum_analyze_scale_factor).
> > > >
> > > The idea Sawada-san mentioned is similar to mine.
> >
> > So if I understand this idea correctly, a partitioned table's analyze
> > will only be triggered when partitions are analyzed.  That is,
> > inserts, updates, deletes of tuples in partitions will be tracked by
> > pgstat, which in turn is used by autovacuum to trigger analyze on
> > partitions.  Then, partitions changes_since_analyze is added into the
> > parent's changes_since_analyze, which in turn *may* trigger analyze
> > parent.  I said "may", because it would take multiple partition
> > analyzes to accumulate enough changes to trigger one on the parent.
> > Am I getting that right?
>
> Yeah that is what I meant. In addition, adding partition's
> changes_since_analyze to its parent needs to be done recursively as
> the parent table could also be a partitioned table.

That's a good point.  So, changes_since_analyze increments are
essentially propagated from leaf partitions to all the way up to the
root table, including any intermediate partitioned tables.  We'll need
to consider whether we should propagate only one level at a time (from
bottom of the tree) or update all parents up to the root, every time a
leaf partition is analyzed.  If we we do the latter, that might end up
triggering analyze on all the parents at the same time causing
repeated scanning of the same child tables in close intervals,
although setting analyze threshold and scale factor of the parent
tables of respective levels wisely can help avoid any negative impact
of that.

Thanks,
Amit



Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hello,

I'm sorry for the delay.
Attach the latest patch based on discussion in this thread.

> > Yeah that is what I meant. In addition, adding partition's
> > changes_since_analyze to its parent needs to be done recursively as
> > the parent table could also be a partitioned table.
>
> That's a good point.  So, changes_since_analyze increments are
> essentially propagated from leaf partitions to all the way up to the
> root table, including any intermediate partitioned tables.  We'll need
> to consider whether we should propagate only one level at a time (from
> bottom of the tree) or update all parents up to the root, every time a
> leaf partition is analyzed.

For multi-level partitioning, all parents' changes_since_analyze will be
updated whenever analyzing a leaf partition in this patch.
Could you please check the patch again?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table

From
Amit Langote
Date:
Hosoya-san,

On Thu, Feb 20, 2020 at 3:34 PM yuzuko <yuzukohosoya@gmail.com> wrote:
> Attach the latest patch based on discussion in this thread.
>
> > > Yeah that is what I meant. In addition, adding partition's
> > > changes_since_analyze to its parent needs to be done recursively as
> > > the parent table could also be a partitioned table.
> >
> > That's a good point.  So, changes_since_analyze increments are
> > essentially propagated from leaf partitions to all the way up to the
> > root table, including any intermediate partitioned tables.  We'll need
> > to consider whether we should propagate only one level at a time (from
> > bottom of the tree) or update all parents up to the root, every time a
> > leaf partition is analyzed.
>
> For multi-level partitioning, all parents' changes_since_analyze will be
> updated whenever analyzing a leaf partition in this patch.
> Could you please check the patch again?

Thank you for the new patch.

I built and confirmed that the patch works.

Here are some comments:

* White-space noise in the diff (space used where tab is expected);
please check with git diff --check and fix.

* Names changes_tuples, m_changes_tuples should be changed_tuples and
m_changed_tuples, respectively?

* Did you intend to make it so that we now report *all* inherited
stats to the stats collector, not just those for partitioned tables?
IOW, do did you intend the new feature to also cover traditional
inheritance parents? I am talking about the following diff:

     /*
-     * Report ANALYZE to the stats collector, too.  However, if doing
-     * inherited stats we shouldn't report, because the stats collector only
-     * tracks per-table stats.  Reset the changes_since_analyze counter only
-     * if we analyzed all columns; otherwise, there is still work for
-     * auto-analyze to do.
+     * Report ANALYZE to the stats collector, too.  If the table is a
+     * partition, report changes_since_analyze of its parent because
+     * autovacuum process for partitioned tables needs it.  Reset the
+     * changes_since_analyze counter only if we analyzed all columns;
+     * otherwise, there is still work for auto-analyze to do.
      */
-    if (!inh)
-        pgstat_report_analyze(onerel, totalrows, totaldeadrows,
-                              (va_cols == NIL));
+    pgstat_report_analyze(onerel, totalrows, totaldeadrows,
+                          (va_cols == NIL));

* I may be missing something, but why doesn't do_autovacuum() fetch a
partitioned table's entry from pgstat instead of fetching that for
individual children and adding? That is, why do we need to do the
following:

+            /*
+             * If the relation is a partitioned table, we check it
using reltuples
+             * added up childrens' and changes_since_analyze tracked
by stats collector.


More later...

Thanks,
Amit



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
On Thu, Feb 20, 2020 at 4:50 PM Amit Langote <amitlangote09@gmail.com> wrote:
> * I may be missing something, but why doesn't do_autovacuum() fetch a
> partitioned table's entry from pgstat instead of fetching that for
> individual children and adding? That is, why do we need to do the
> following:
>
> +            /*
> +             * If the relation is a partitioned table, we check it
> using reltuples
> +             * added up childrens' and changes_since_analyze tracked
> by stats collector.

Oh, it's only adding up children's pg_class.reltuple, not pgstat
stats.  We need to do that because a partitioned table's
pg_class.reltuples is always 0 and correctly so.  Sorry for not
reading the patch properly.

Thanks,
Amit



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
On Thu, Feb 20, 2020 at 5:32 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Feb 20, 2020 at 4:50 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > * I may be missing something, but why doesn't do_autovacuum() fetch a
> > partitioned table's entry from pgstat instead of fetching that for
> > individual children and adding? That is, why do we need to do the
> > following:
> >
> > +            /*
> > +             * If the relation is a partitioned table, we check it
> > using reltuples
> > +             * added up childrens' and changes_since_analyze tracked
> > by stats collector.
>
> Oh, it's only adding up children's pg_class.reltuple, not pgstat
> stats.  We need to do that because a partitioned table's
> pg_class.reltuples is always 0 and correctly so.  Sorry for not
> reading the patch properly.

Having read the relevant diffs again, I think this could be done
without duplicating code too much.  You seem to have added the same
logic in two places: do_autovacuum() and table_recheck_autovac().
More importantly, part of the logic of relation_needs_vacanalyze() is
duplicated in both of the aforementioned places, which I think is
unnecessary and undesirable if you consider maintainability. I think
we could just add the logic to compute reltuples for partitioned
tables at the beginning of relation_needs_vacanalyze() and be done.  I
have attached a delta patch to show what I mean.  Please check and
tell what you think.

Thanks,
Amit

Attachment

Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hello Amit-san,

Thanks for your comments.

> * White-space noise in the diff (space used where tab is expected);
> please check with git diff --check and fix.
Fixed it.

> * Names changes_tuples, m_changes_tuples should be changed_tuples and
> m_changed_tuples, respectively?
Yes, I modified it.

> * Did you intend to make it so that we now report *all* inherited
> stats to the stats collector, not just those for partitioned tables?
> IOW, do did you intend the new feature to also cover traditional
> inheritance parents? I am talking about the following diff:
>
I modified as follows to apply this feature to only declaretive partitioning.

- if (!inh)
-  pgstat_report_analyze(onerel, totalrows, totaldeadrows,
-         (va_cols == NIL));
+ if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ pgstat_report_analyze(onerel, totalrows, totaldeadrows,
+        (va_cols == NIL));


> Having read the relevant diffs again, I think this could be done
> without duplicating code too much.  You seem to have added the same
> logic in two places: do_autovacuum() and table_recheck_autovac().
> More importantly, part of the logic of relation_needs_vacanalyze() is
> duplicated in both of the aforementioned places, which I think is
> unnecessary and undesirable if you consider maintainability. I think
> we could just add the logic to compute reltuples for partitioned
> tables at the beginning of relation_needs_vacanalyze() and be done.
>
Yes, indeed.  Partitioned tables don't need to vacuum so I added new
checking process for partitioned tables outside relation_needs_vacanalyze().
However, partitioned tables' tabentry->n_dead_tuples are always 0 so
dovacuum is always false.   So I think that checking both auto vacuum
and analyze for partitioned tables doesn't matter.  I merged v3_amit_delta.patch
into the new patch and found minor bug, partitioned table's reltuples is
overwritten with it's classForm->reltuples, so I fixed it.

Also, I think partitioned tables' changes_since_analyze should be reported
only when Autovacuum process.  So I fixed it too.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table

From
Masahiko Sawada
Date:
On Fri, 21 Feb 2020 at 15:14, yuzuko <yuzukohosoya@gmail.com> wrote:
>
> Hello Amit-san,
>
> Thanks for your comments.
>
> > * White-space noise in the diff (space used where tab is expected);
> > please check with git diff --check and fix.
> Fixed it.
>
> > * Names changes_tuples, m_changes_tuples should be changed_tuples and
> > m_changed_tuples, respectively?
> Yes, I modified it.
>
> > * Did you intend to make it so that we now report *all* inherited
> > stats to the stats collector, not just those for partitioned tables?
> > IOW, do did you intend the new feature to also cover traditional
> > inheritance parents? I am talking about the following diff:
> >
> I modified as follows to apply this feature to only declaretive partitioning.
>
> - if (!inh)
> -  pgstat_report_analyze(onerel, totalrows, totaldeadrows,
> -         (va_cols == NIL));
> + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + pgstat_report_analyze(onerel, totalrows, totaldeadrows,
> +        (va_cols == NIL));
>
>
> > Having read the relevant diffs again, I think this could be done
> > without duplicating code too much.  You seem to have added the same
> > logic in two places: do_autovacuum() and table_recheck_autovac().
> > More importantly, part of the logic of relation_needs_vacanalyze() is
> > duplicated in both of the aforementioned places, which I think is
> > unnecessary and undesirable if you consider maintainability. I think
> > we could just add the logic to compute reltuples for partitioned
> > tables at the beginning of relation_needs_vacanalyze() and be done.
> >
> Yes, indeed.  Partitioned tables don't need to vacuum so I added new
> checking process for partitioned tables outside relation_needs_vacanalyze().
> However, partitioned tables' tabentry->n_dead_tuples are always 0 so
> dovacuum is always false.   So I think that checking both auto vacuum
> and analyze for partitioned tables doesn't matter.  I merged v3_amit_delta.patch
> into the new patch and found minor bug, partitioned table's reltuples is
> overwritten with it's classForm->reltuples, so I fixed it.
>
> Also, I think partitioned tables' changes_since_analyze should be reported
> only when Autovacuum process.  So I fixed it too.

Thank you for updating the patch. I tested v4 patch.

After analyze or autoanalyze on partitioned table n_live_tup and
n_dead_tup are updated. However, TRUNCATE and VACUUM on the
partitioned table don't change these values until invoking analyze or
autoanalyze whereas in normal tables these values are reset or
changed. For example, with your patch:

* Before
 relname | n_live_tup | n_dead_tup | n_mod_since_analyze
---------+------------+------------+---------------------
 c1      |         11 |          0 |                   0
 c2      |         11 |          0 |                   0
 c3      |         11 |          0 |                   0
 c4      |         11 |          0 |                   0
 c5      |         11 |          0 |                   0
 parent  |         55 |          0 |                   0
(6 rows)

* After 'TRUNCATE parent'
 relname | n_live_tup | n_dead_tup | n_mod_since_analyze
---------+------------+------------+---------------------
 c1      |          0 |          0 |                   0
 c2      |          0 |          0 |                   0
 c3      |          0 |          0 |                   0
 c4      |          0 |          0 |                   0
 c5      |          0 |          0 |                   0
 parent  |         55 |          0 |                   0
(6 rows)

* Before
 relname | n_live_tup | n_dead_tup | n_mod_since_analyze
---------+------------+------------+---------------------
 c1      |          0 |         11 |                   0
 c2      |          0 |         11 |                   0
 c3      |          0 |         11 |                   0
 c4      |          0 |         11 |                   0
 c5      |          0 |         11 |                   0
 parent  |          0 |         55 |                   0
(6 rows)

* After 'VACUUM parent'
 relname | n_live_tup | n_dead_tup | n_mod_since_analyze
---------+------------+------------+---------------------
 c1      |          0 |          0 |                   0
 c2      |          0 |          0 |                   0
 c3      |          0 |          0 |                   0
 c4      |          0 |          0 |                   0
 c5      |          0 |          0 |                   0
 parent  |          0 |         55 |                   0
(6 rows)

We can make it work correctly but I think perhaps we can skip updating
statistics values of partitioned tables other than n_mod_since_analyze
as the first step. Because if we support also n_live_tup and
n_dead_tup, user might get confused that other statistics values such
as seq_scan, seq_tup_read however are not supported.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
On Fri, Feb 21, 2020 at 4:47 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> Thank you for updating the patch. I tested v4 patch.
>
> After analyze or autoanalyze on partitioned table n_live_tup and
> n_dead_tup are updated. However, TRUNCATE and VACUUM on the
> partitioned table don't change these values until invoking analyze or
> autoanalyze whereas in normal tables these values are reset or
> changed. For example, with your patch:
>
> * Before
>  relname | n_live_tup | n_dead_tup | n_mod_since_analyze
> ---------+------------+------------+---------------------
>  c1      |         11 |          0 |                   0
>  c2      |         11 |          0 |                   0
>  c3      |         11 |          0 |                   0
>  c4      |         11 |          0 |                   0
>  c5      |         11 |          0 |                   0
>  parent  |         55 |          0 |                   0
> (6 rows)
>
> * After 'TRUNCATE parent'
>  relname | n_live_tup | n_dead_tup | n_mod_since_analyze
> ---------+------------+------------+---------------------
>  c1      |          0 |          0 |                   0
>  c2      |          0 |          0 |                   0
>  c3      |          0 |          0 |                   0
>  c4      |          0 |          0 |                   0
>  c5      |          0 |          0 |                   0
>  parent  |         55 |          0 |                   0
> (6 rows)
>
> * Before
>  relname | n_live_tup | n_dead_tup | n_mod_since_analyze
> ---------+------------+------------+---------------------
>  c1      |          0 |         11 |                   0
>  c2      |          0 |         11 |                   0
>  c3      |          0 |         11 |                   0
>  c4      |          0 |         11 |                   0
>  c5      |          0 |         11 |                   0
>  parent  |          0 |         55 |                   0
> (6 rows)
>
> * After 'VACUUM parent'
>  relname | n_live_tup | n_dead_tup | n_mod_since_analyze
> ---------+------------+------------+---------------------
>  c1      |          0 |          0 |                   0
>  c2      |          0 |          0 |                   0
>  c3      |          0 |          0 |                   0
>  c4      |          0 |          0 |                   0
>  c5      |          0 |          0 |                   0
>  parent  |          0 |         55 |                   0
> (6 rows)
>
> We can make it work correctly but I think perhaps we can skip updating
> statistics values of partitioned tables other than n_mod_since_analyze
> as the first step. Because if we support also n_live_tup and
> n_dead_tup, user might get confused that other statistics values such
> as seq_scan, seq_tup_read however are not supported.

+1, that makes sense.

Thanks,
Amit



Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hi,

Thanks for reviewing the patch.

> > We can make it work correctly but I think perhaps we can skip updating
> > statistics values of partitioned tables other than n_mod_since_analyze
> > as the first step. Because if we support also n_live_tup and
> > n_dead_tup, user might get confused that other statistics values such
> > as seq_scan, seq_tup_read however are not supported.
>
> +1, that makes sense.
>
Yes, Indeed.  I modified it not to update statistics other than
n_mod_since_analyze.
Attach the v5 patch.  In this patch, pgstat_report_analyze() always reports 0 as
msg.m_live_tuples and m_dead_tuples when the relation is partitioned.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table

From
Masahiko Sawada
Date:
On Wed, 26 Feb 2020 at 11:33, yuzuko <yuzukohosoya@gmail.com> wrote:
>
> Hi,
>
> Thanks for reviewing the patch.
>
> > > We can make it work correctly but I think perhaps we can skip updating
> > > statistics values of partitioned tables other than n_mod_since_analyze
> > > as the first step. Because if we support also n_live_tup and
> > > n_dead_tup, user might get confused that other statistics values such
> > > as seq_scan, seq_tup_read however are not supported.
> >
> > +1, that makes sense.
> >
> Yes, Indeed.  I modified it not to update statistics other than
> n_mod_since_analyze.
> Attach the v5 patch.  In this patch, pgstat_report_analyze() always reports 0 as
> msg.m_live_tuples and m_dead_tuples when the relation is partitioned.
>

Thank you for updating the patch. I'll look at it. I'd recommend to
register this patch to the next commit fest so at not to forget.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Autovacuum on partitioned table

From
Alvaro Herrera
Date:
Hello Yuzuko,

> +     * Report ANALYZE to the stats collector, too.  If the table is a
> +     * partition, report changes_since_analyze of its parent because
> +     * autovacuum process for partitioned tables needs it.  Reset the
> +     * changes_since_analyze counter only if we analyzed all columns;
> +     * otherwise, there is still work for auto-analyze to do.
>       */
> -    if (!inh)
> -        pgstat_report_analyze(onerel, totalrows, totaldeadrows,
> -                              (va_cols == NIL));
> +    if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> +    pgstat_report_analyze(onerel, totalrows, totaldeadrows,
> +                          (va_cols == NIL));
 
Hmm, I think the comment has a bug: it says "report ... of its parent"
but the report is of the same rel.  (The pgstat_report_analyze line is
mis-indented also).


>      /*
> +     * If the relation is a partitioned table, we must add up children's
> +     * reltuples.
> +     */
> +    if (classForm->relkind == RELKIND_PARTITIONED_TABLE)
> +    {
> +        List     *children;
> +        ListCell *lc;
> +
> +        reltuples = 0;
> +
> +        /* Find all members of inheritance set taking AccessShareLock */
> +        children = find_all_inheritors(relid, AccessShareLock, NULL);
> +
> +        foreach(lc, children)
> +        {
> +            Oid        childOID = lfirst_oid(lc);
> +            HeapTuple  childtuple;
> +            Form_pg_class childclass;
> +
> +            /* Ignore the parent table */
> +            if (childOID == relid)
> +                continue;

I think this loop counts partitioned partitions multiple times, because
we add up reltuples for all levels, no?  (If I'm wrong, that is, if
a partitioned rel does not have reltuples, then why skip the parent?)
 
> +    /*
> +     * If the table is a leaf partition, tell the stats collector its parent's
> +     * changes_since_analyze for auto analyze
> +     */
> +    if (IsAutoVacuumWorkerProcess() &&
> +        rel->rd_rel->relispartition &&
> +        !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))

I'm not sure I understand why we do this only on autovac. Why not all
analyzes?

> +    {
> +        Oid      parentoid;
> +        Relation parentrel;
> +        PgStat_StatDBEntry *dbentry;
> +        PgStat_StatTabEntry *tabentry;
> +
> +        /* Get its parent table's Oid and relation */
> +        parentoid = get_partition_parent(RelationGetRelid(rel));
> +        parentrel = table_open(parentoid, AccessShareLock);

Climbing up the partitioning hierarchy acquiring locks on ancestor
relations opens up for deadlocks.  It's better to avoid that.  (As a
test, you could try what happens if you lock the topmost relation with
access-exclusive and leave a transaction open, then have autoanalyze
run).  At the same time, I wonder if it's sensible to move one level up
here, and also have pgstat_report_partanalyze move more levels up.

> + * pgstat_report_partanalyze() -
> + *
> + *    Tell the collector about the parent table of which partition just analyzed.
> + *
> + * Caller must provide a child's changes_since_analyze as a parents.

I'm not sure what the last line is trying to say.


Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
Hosoya-san,

Thanks for the new patch.

On Wed, Feb 26, 2020 at 11:33 AM yuzuko <yuzukohosoya@gmail.com> wrote:
> Attach the v5 patch.  In this patch, pgstat_report_analyze() always reports 0 as
> msg.m_live_tuples and m_dead_tuples when the relation is partitioned.

Some comments:

+ * PgStat_MsgPartAnalyze        Sent by the backend or autovacuum daemon
+ *                              after ANALYZE for partitioned tables

Looking at the way this message is used, it does not seem to be an
"analyze" message and also it's not sent "after ANALYZE of partitioned
tables", but really after ANALYZE of leaf partitions.  Analyze (for
both partitioned tables and leaf partitions) is reported as a
PgStat_MsgAnalyze message as before.  It seems that
PgStat_MsgPartAnalyze is only sent to update a leaf partition's
parent's (and recursively any grandparents') changes_since_analyze
counters, so maybe we should find a different name for it.  Maybe,
PgStat_MsgPartChanges and accordingly the message type enum value.

     /*
-     * Report ANALYZE to the stats collector, too.  However, if doing
-     * inherited stats we shouldn't report, because the stats collector only
-     * tracks per-table stats.  Reset the changes_since_analyze counter only
-     * if we analyzed all columns; otherwise, there is still work for
-     * auto-analyze to do.
+     * Report ANALYZE to the stats collector, too.  If the table is a
+     * partition, report changes_since_analyze of its parent because
+     * autovacuum process for partitioned tables needs it.  Reset the
+     * changes_since_analyze counter only if we analyzed all columns;
+     * otherwise, there is still work for auto-analyze to do.
      */

The new comment says "partitions", which we typically use to refer to
a child table, but this comment really talks about parent tables.  Old
comment says we don't report "inherited stats", presumably because
stats collector lacks the infrastructure to distinguish a table's
inherited stats and own stats, at least in the case of traditional
inheritance.  With this patch, we are making an exception for
partitioned tables, because we are also teaching the stats collector
to maintain at least changes_since_analyze for them that accumulates
counts of changed tuples from partitions.

It seems Alvaro already reported some of the other issues I had with
the patch, such as why partanalyze messages are only sent from a
autovacuum worker.

Thanks,
Amit



Re: Autovacuum on partitioned table

From
Amit Langote
Date:
On Fri, Feb 28, 2020 at 11:25 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> >       /*
> > +      * If the relation is a partitioned table, we must add up children's
> > +      * reltuples.
> > +      */
> > +     if (classForm->relkind == RELKIND_PARTITIONED_TABLE)
> > +     {
> > +             List     *children;
> > +             ListCell *lc;
> > +
> > +             reltuples = 0;
> > +
> > +             /* Find all members of inheritance set taking AccessShareLock */
> > +             children = find_all_inheritors(relid, AccessShareLock, NULL);
> > +
> > +             foreach(lc, children)
> > +             {
> > +                     Oid        childOID = lfirst_oid(lc);
> > +                     HeapTuple  childtuple;
> > +                     Form_pg_class childclass;
> > +
> > +                     /* Ignore the parent table */
> > +                     if (childOID == relid)
> > +                             continue;
>
> I think this loop counts partitioned partitions multiple times, because
> we add up reltuples for all levels, no?  (If I'm wrong, that is, if
> a partitioned rel does not have reltuples, then why skip the parent?)

+1, no need to skip partitioned tables here a their reltuples is always 0.

> > +     /*
> > +      * If the table is a leaf partition, tell the stats collector its parent's
> > +      * changes_since_analyze for auto analyze

Maybe write:

For a leaf partition, add its current changes_since_analyze into its
ancestors' counts.  This must be done before sending the ANALYZE
message as it resets the partition's changes_since_analyze counter.

> > +      */
> > +     if (IsAutoVacuumWorkerProcess() &&
> > +             rel->rd_rel->relispartition &&
> > +             !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
>
> I'm not sure I understand why we do this only on autovac. Why not all
> analyzes?

+1.  If there is a reason, it should at least be documented in the
comment above.

> > +     {
> > +             Oid      parentoid;
> > +             Relation parentrel;
> > +             PgStat_StatDBEntry *dbentry;
> > +             PgStat_StatTabEntry *tabentry;
> > +
> > +             /* Get its parent table's Oid and relation */
> > +             parentoid = get_partition_parent(RelationGetRelid(rel));
> > +             parentrel = table_open(parentoid, AccessShareLock);
>
> Climbing up the partitioning hierarchy acquiring locks on ancestor
> relations opens up for deadlocks.  It's better to avoid that.  (As a
> test, you could try what happens if you lock the topmost relation with
> access-exclusive and leave a transaction open, then have autoanalyze
> run).  At the same time, I wonder if it's sensible to move one level up
> here, and also have pgstat_report_partanalyze move more levels up.

Maybe fetch all ancestors here and process from the top.  But as we'd
have locked the leaf partition long before we got here, maybe we
should lock ancestors even before we start analyzing the leaf
partition? AccessShareLock should be enough on the ancestors because
we're not actually analyzing them.

(It appears get_partition_ancestors() returns a list where the root
parent is the last element, so need to be careful with that.)

Thanks,
Amit



Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hello,

Thank you for reviewing.

> > > +      */
> > > +     if (IsAutoVacuumWorkerProcess() &&
> > > +             rel->rd_rel->relispartition &&
> > > +             !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
> >
> > I'm not sure I understand why we do this only on autovac. Why not all
> > analyzes?
>
> +1.  If there is a reason, it should at least be documented in the
> comment above.
>
When we analyze partitioned table by ANALYZE command,
all inheritors including partitioned table are analyzed
at the same time.  In this case, if we call pgstat_report_partanalyze,
partitioned table's changes_since_analyze is updated
according to the number of analyzed tuples of partitions
as follows.  But I think it should be 0.

\d+ p
                               Partitioned table "public.p"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 i      | integer |           |          |         | plain   |              |
Partition key: RANGE (i)
Partitions: p_1 FOR VALUES FROM (0) TO (100),
                 p_2 FOR VALUES FROM (100) TO (200)

insert into p select * from generate_series(0,199);
INSERT 0 200

(before analyze)
-[ RECORD 1 ]-------+------------------
relname             | p
n_mod_since_analyze | 0
-[ RECORD 2 ]-------+------------------
relname             | p_1
n_mod_since_analyze | 100
-[ RECORD 3 ]-------+------------------
relname             | p_2
n_mod_since_analyze | 100

(after analyze)
-[ RECORD 1 ]-------+------------------
relname             | p
n_mod_since_analyze | 200
-[ RECORD 2 ]-------+------------------
relname             | p_1
n_mod_since_analyze | 0
-[ RECORD 3 ]-------+------------------
relname             | p_2
n_mod_since_analyze | 0


I think if we analyze partition tree in order from leaf partitions
to root table, this problem can be fixed.
What do you think about it?

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hello,

> > > > +      */
> > > > +     if (IsAutoVacuumWorkerProcess() &&
> > > > +             rel->rd_rel->relispartition &&
> > > > +             !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
> > >
> > > I'm not sure I understand why we do this only on autovac. Why not all
> > > analyzes?
> >
> > +1.  If there is a reason, it should at least be documented in the
> > comment above.
> >
> When we analyze partitioned table by ANALYZE command,
> all inheritors including partitioned table are analyzed
> at the same time.  In this case, if we call pgstat_report_partanalyze,
> partitioned table's changes_since_analyze is updated
> according to the number of analyzed tuples of partitions
> as follows.  But I think it should be 0.
>
> \d+ p
>                                Partitioned table "public.p"
>  Column |  Type   | Collation | Nullable | Default | Storage | Stats
> target | Description
> --------+---------+-----------+----------+---------+---------+--------------+-------------
>  i      | integer |           |          |         | plain   |              |
> Partition key: RANGE (i)
> Partitions: p_1 FOR VALUES FROM (0) TO (100),
>                  p_2 FOR VALUES FROM (100) TO (200)
>
> insert into p select * from generate_series(0,199);
> INSERT 0 200
>
> (before analyze)
> -[ RECORD 1 ]-------+------------------
> relname             | p
> n_mod_since_analyze | 0
> -[ RECORD 2 ]-------+------------------
> relname             | p_1
> n_mod_since_analyze | 100
> -[ RECORD 3 ]-------+------------------
> relname             | p_2
> n_mod_since_analyze | 100
>
> (after analyze)
> -[ RECORD 1 ]-------+------------------
> relname             | p
> n_mod_since_analyze | 200
> -[ RECORD 2 ]-------+------------------
> relname             | p_1
> n_mod_since_analyze | 0
> -[ RECORD 3 ]-------+------------------
> relname             | p_2
> n_mod_since_analyze | 0
>
>
> I think if we analyze partition tree in order from leaf partitions
> to root table, this problem can be fixed.
> What do you think about it?
>

Attach the new patch fixes the above problem.  Also, This patch
includes modifications accoring to all comments Alvaro and Amit
mentioned before in this thread.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table

From
Alvaro Herrera
Date:
On 2020-Mar-18, yuzuko wrote:

> > I think if we analyze partition tree in order from leaf partitions
> > to root table, this problem can be fixed.
> > What do you think about it?
> 
> Attach the new patch fixes the above problem.

Thanks for the new version.

I'm confused about some error messages in the regression test when a
column is mentioned twice, that changed from mentioning the table named
in the vacuum command, to mentioning the first partition.  Is that
because you changed an lappend() to lcons()?  I think you do this so
that the counters accumulate for the topmost parent that will be
processed at the end.  I'm not sure I like that too much ... I think
that needs more thought.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Autovacuum on partitioned table (autoanalyze)

From
Justin Pryzby
Date:
Regarding this patch:

+     * the ANALYZE message as it resets the partition's changes_since_analze
=> analyze

+     * If the relation is a partitioned table, we must add up children's
childrens'

The approach in general:

I see an issue for timeseries data, where only the most recent partition is
being inserted into, and the histogram endpoint is being continuously extended
(this is our use-case).  The most recent partition will be analyzed pretty
often, and I think it'll be problematic if its parent doesn't get similar
treatment.  Let's say there are 12 historic, monthly children with 1e6 tuples
each, and the 13th child has 2e5 tuples (6 days into the month).  It's analyzed
when it grows by 20% (1.2 days), but at that point the parent has only grown by
12x less (~2%) and won't be analyzed until 12x further into the future (14
days).  Its histogram is 12x longer (geometrically), but the histogram changed
by just as much (arithmetically).  That's an issue for a query over "the last
few days"; if that's past the end of the histogram bound, the query planner
will estimate about ~0 tuples, and tend to give cascades of nested loops.  I'm
biased, but I'm guessing that's too common a use case to answer that the proper
fix is to set the parent's analyze_scale_factor=0.0005.  I think that suggests
that the parent might sometimes need to be analyzed every time any of its
children are.  In other cases (like probably any hash partitioning), that'd be
excessive, and maybe the default settings shouldn't do that, but I think that
behavior ought to be possible, and I think this patch doesn't allow that.  

In the past, I think there's was talk that maybe someone would invent a clever
way to dynamically combine all the partitions' statistics, so analyzing the
parent wasn't needed.  I think that's easy enough for reltuples, MCV, and I
think histogram, but ISTM that ndistinct is simultaneously important to get
right and hard to do so.  It depends on whether it's the partition key, which
now can be an arbitrary expression.  Extended stats further complicates it,
even if we didn't aim to dynamically compute extended stats for a parent.

While writing this, it occured to me that we could use "CREATE STATISTICS" as a
way to mark a partitioned table (or certain columns) as needing to be handled
by analyze.  I understand "CREATE STATs" was intended to (eventually) allow
implementing stats on expressions without using "create index" as a hack.  So
if it's excessive to automatically analyze a parent table when any of its
children are analyzed, maybe it's less excessive to only do that for parents
with a stats object, and only on the given colums.  I realize this patch is
alot less useful if it requires to do anything extra/nondefault, and it's
desirable to work without creating a stats object at all.  Also, using CREATE
STATs would reduces the CPU cost of re-analyzing the entire heirarchy, but
doesn't help to reduce the I/O cost, which is significant.

-- 
Justin



Re: Autovacuum on partitioned table

From
yuzuko
Date:
Hi Alvaro,
Thank you for your comments.

> I'm confused about some error messages in the regression test when a
> column is mentioned twice, that changed from mentioning the table named
> in the vacuum command, to mentioning the first partition.  Is that
> because you changed an lappend() to lcons()?  I think you do this so
> that the counters accumulate for the topmost parent that will be
> processed at the end.  I'm not sure I like that too much ... I think
> that needs more thought.
>
I couldn't come up with a solution that counts changes_since_analyze
precisely when analyzing partitioned trees by ANALYZE command based on
this approach (update all ancestor's changes_since_analyze according to the
number of analyzed tuples of leaf partitions).

So I tried another approach to run autovacuum on partitioned tables.
In this approach, all ancestors' changed_tuples are updated when commiting
transactions (at AtEOXact_PgStat) according to the number of inserted/updated/
deleted tuples of leaf partitions.

Attach the latest patch.  What do you think?
 --
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table (autoanalyze)

From
Justin Pryzby
Date:
Not sure if you saw my earlier message ?

I think it ought to be possible to configure this feature such that an
auto-analyze on any child partition would trigger analyze of the parent.  I
think that would be important for maintaining accurate stats of the partition
key column for many cases involving RANGE-partitioned tables, which are likely
to rely on histogram rather than MCVs.

On Wed, Mar 18, 2020 at 11:30:39AM -0500, Justin Pryzby wrote:
> Regarding this patch:
> 
> +     * the ANALYZE message as it resets the partition's changes_since_analze
> => analyze
> 
> +     * If the relation is a partitioned table, we must add up children's
> childrens'
> 
> The approach in general:
> 
> I see an issue for timeseries data, where only the most recent partition is
> being inserted into, and the histogram endpoint is being continuously extended
> (this is our use-case).  The most recent partition will be analyzed pretty
> often, and I think it'll be problematic if its parent doesn't get similar
> treatment.  Let's say there are 12 historic, monthly children with 1e6 tuples
> each, and the 13th child has 2e5 tuples (6 days into the month).  It's analyzed
> when it grows by 20% (1.2 days), but at that point the parent has only grown by
> 12x less (~2%) and won't be analyzed until 12x further into the future (14
> days).  Its histogram is 12x longer (geometrically), but the histogram changed
> by just as much (arithmetically).  That's an issue for a query over "the last
> few days"; if that's past the end of the histogram bound, the query planner
> will estimate about ~0 tuples, and tend to give cascades of nested loops.  I'm
> biased, but I'm guessing that's too common a use case to answer that the proper
> fix is to set the parent's analyze_scale_factor=0.0005.  I think that suggests
> that the parent might sometimes need to be analyzed every time any of its
> children are.  In other cases (like probably any hash partitioning), that'd be
> excessive, and maybe the default settings shouldn't do that, but I think that
> behavior ought to be possible, and I think this patch doesn't allow that.  
> 
> In the past, I think there's was talk that maybe someone would invent a clever
> way to dynamically combine all the partitions' statistics, so analyzing the
> parent wasn't needed.  I think that's easy enough for reltuples, MCV, and I
> think histogram, but ISTM that ndistinct is simultaneously important to get
> right and hard to do so.  It depends on whether it's the partition key, which
> now can be an arbitrary expression.  Extended stats further complicates it,
> even if we didn't aim to dynamically compute extended stats for a parent.
> 
> While writing this, it occured to me that we could use "CREATE STATISTICS" as a
> way to mark a partitioned table (or certain columns) as needing to be handled
> by analyze.  I understand "CREATE STATs" was intended to (eventually) allow
> implementing stats on expressions without using "create index" as a hack.  So
> if it's excessive to automatically analyze a parent table when any of its
> children are analyzed, maybe it's less excessive to only do that for parents
> with a stats object, and only on the given colums.  I realize this patch is
> alot less useful if it requires to do anything extra/nondefault, and it's
> desirable to work without creating a stats object at all.  Also, using CREATE
> STATs would reduces the CPU cost of re-analyzing the entire heirarchy, but
> doesn't help to reduce the I/O cost, which is significant.
> 
> -- 
> Justin

-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581



Re: Autovacuum on partitioned table (autoanalyze)

From
yuzuko
Date:
Hi Justin,

Thank you for commens.

On Tue, Apr 7, 2020 at 12:32 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Not sure if you saw my earlier message ?
>
I'm sorry, I didn't notice for a while.

> I think it ought to be possible to configure this feature such that an
> auto-analyze on any child partition would trigger analyze of the parent.  I
> think that would be important for maintaining accurate stats of the partition
> key column for many cases involving RANGE-partitioned tables, which are likely
> to rely on histogram rather than MCVs.
>
I read your previous email and understand that it would be neccesary to analyze
partitioned tables automatically when any of its children are analyzed.  In my
first patch, auto-analyze on partitioned tables worked like this but there were
some comments about performance of autovacuum, especially when partitioned
tables have a lot of children.

The latest patch lets users set different autovacuum configuration for
each partitioned
tables like this,
  create table p3(i int) partition by range(i) with
   (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100);
so users can configure those parameters according to partitioning strategies
and other requirements.

So I think this patch can solve problem you mentioned.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



Re: Autovacuum on partitioned table (autoanalyze)

From
Justin Pryzby
Date:
On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > I think it ought to be possible to configure this feature such that an
> > auto-analyze on any child partition would trigger analyze of the parent.  I
> > think that would be important for maintaining accurate stats of the partition
> > key column for many cases involving RANGE-partitioned tables, which are likely
> > to rely on histogram rather than MCVs.
>
> I read your previous email and understand that it would be neccesary to analyze
> partitioned tables automatically when any of its children are analyzed.  In my
> first patch, auto-analyze on partitioned tables worked like this but there were
> some comments about performance of autovacuum, especially when partitioned
> tables have a lot of children.

I reread that part.  There was also confusion between autovacuum vacuum and
autovacuum analyze.

I agree that it *might* be a problem to analyze the parent every time any child
is analyzed.

But it might also be what's needed for this feature to be useful.

> The latest patch lets users set different autovacuum configuration for
> each partitioned
> tables like this,
>   create table p3(i int) partition by range(i) with
>    (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100);
> so users can configure those parameters according to partitioning strategies
> and other requirements.
> 
> So I think this patch can solve problem you mentioned.

I don't think that adequately allows what's needed.

I think it out to be possible to get the "analyze parent whenever a child is
analyzed" behavior easily, without having to compute new thershold parameters
every time one adds partitions, detaches partitions, loades 10x more data into
one of the partitions, load only 10% as much data into the latest partition,
etc.

For example, say a new customer has bunch of partitioned tables which each
currently have only one partition (for the current month), and that's expected
to grow to at least 20+ partitions (2+ years of history).  How does one set the
partitioned table's auto-analyze parameters to analyze whenever any child is
analyzed ?  I don't think it should be needed to update it every month after
computing sum(child tuples).

Possibly you could allow that behavior for some special values of the
threshold.  Like if autovacuum_analyze_threshold=-2, then analyze the parent
whenever any of its children are analyzed.

I think that use case and that need would be common, but I'd like to hear what
others think.

-- 
Justin



Re: Autovacuum on partitioned table (autoanalyze)

From
Amit Langote
Date:
On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > The latest patch lets users set different autovacuum configuration for
> > each partitioned
> > tables like this,
> >   create table p3(i int) partition by range(i) with
> >    (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100);
> > so users can configure those parameters according to partitioning strategies
> > and other requirements.
> >
> > So I think this patch can solve problem you mentioned.
>
> I don't think that adequately allows what's needed.
>
> I think it out to be possible to get the "analyze parent whenever a child is
> analyzed" behavior easily, without having to compute new thershold parameters
> every time one adds partitions, detaches partitions, loades 10x more data into
> one of the partitions, load only 10% as much data into the latest partition,
> etc.
>
> For example, say a new customer has bunch of partitioned tables which each
> currently have only one partition (for the current month), and that's expected
> to grow to at least 20+ partitions (2+ years of history).  How does one set the
> partitioned table's auto-analyze parameters to analyze whenever any child is
> analyzed ?  I don't think it should be needed to update it every month after
> computing sum(child tuples).
>
> Possibly you could allow that behavior for some special values of the
> threshold.  Like if autovacuum_analyze_threshold=-2, then analyze the parent
> whenever any of its children are analyzed.
>
> I think that use case and that need would be common, but I'd like to hear what
> others think.

Having to constantly pay attention to whether a parent's
analyze_threshold/scale_factor is working as intended would surely be
an annoyance, so I tend to agree that we might need more than just the
ability to set analyze_threshold/scale_factor on parent tables.
However, I think we can at least start with being able to do
*something* here. :)  Maybe others think that this shouldn't be
considered committable until we figure out a good analyze threshold
calculation formula to apply to parent tables.

For the cases in which parent's tuple count grows at about the same
rate as partitions (hash mainly), I guess the existing formula more or
less works. That is, we can set the parent's threshold/scale_factor
same as partitions' and the autovacuum's existing formula will ensure
that the parent is auto-analyzed in time and not more than needed. For
time-series partitioning, the same formula won't work, as you have
detailed in your comments. Is there any other partitioning pattern for
which the current formula won't work?

Considering that, how about having, say, a
autovacuum_analyze_partition_parent_frequency, with string values
'default', 'partition'? -- 'default' assumes the same formula as
regular tables, whereas with 'partition', parent is analyzed as soon
as a partition is.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: Autovacuum on partitioned table (autoanalyze)

From
Justin Pryzby
Date:
On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote:
> On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > I don't think that adequately allows what's needed.
...(paragraph with my typos elided)...
> > For example, say a new customer has bunch of partitioned tables which each
> > currently have only one partition (for the current month), and that's expected
> > to grow to at least 20+ partitions (2+ years of history).  How does one set the
> > partitioned table's auto-analyze parameters to analyze whenever any child is
> > analyzed ?  I don't think it should be needed to update it every month after
> > computing sum(child tuples).
> >
> > Possibly you could allow that behavior for some special values of the
> > threshold.  Like if autovacuum_analyze_threshold=-2, then analyze the parent
> > whenever any of its children are analyzed.
> >
> > I think that use case and that need would be common, but I'd like to hear what
> > others think.
> 
> Having to constantly pay attention to whether a parent's
> analyze_threshold/scale_factor is working as intended would surely be
> an annoyance, so I tend to agree that we might need more than just the
> ability to set analyze_threshold/scale_factor on parent tables.
> However, I think we can at least start with being able to do
> *something* here. :)  Maybe others think that this shouldn't be
> considered committable until we figure out a good analyze threshold
> calculation formula to apply to parent tables.
> 
> Considering that, how about having, say, a
> autovacuum_analyze_partition_parent_frequency, with string values
> 'default', 'partition'? -- 'default' assumes the same formula as
> regular tables, whereas with 'partition', parent is analyzed as soon
> as a partition is.

I assume you mean a reloption to be applied only to partitioned tables,

Your "partition" setting would mean that the scale/threshold values would have
no effect, which seems kind of unfortunate.

I think it should be called something else, and done differently, like maybe:
autovacuum_analyze_mode = {off,sum,max,...}

The threshold would be threshold + scale*tuples, as always, but would be
compared with f(changes) as determined by the relopt.

sum(changes) would do what you called "default", comparing the sum(changes)
across all partitions to the threshold, which is itself computed using
sum(reltuples) AS reltuples.

max(changes) would compute max(changes) compared to the threshold, and the
threshold would be computed separately for each partition's reltuples:
threshold_N = parent_threshold + parent_scale * part_N_tuples.  If *any*
partition exceeds that threshold, the partition itself is analyzed.  This
allows what I want for time-series.  Maybe this would have an alias called
"any".

I'm not sure if there's any other useful modes, like avg(changes)?  I guess we
can add them later if someone thinks of a good use case.

Also, for me, the v7 patch warns:
|src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
|   vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
..which seems to be a false positive, but easily avoided.


This patch includes partitioned tables in pg_stat_*_tables, which is great; I
complained awhile ago that they were missing [0].  It might be useful if that
part was split out into a separate 0001 patch (?).

Thanks,
-- 
Justin

[0] https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com



Re: Autovacuum on partitioned table (autoanalyze)

From
yuzuko
Date:
Hello,

On Sat, Apr 18, 2020 at 2:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote:
> > On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > > I don't think that adequately allows what's needed.
> ...(paragraph with my typos elided)...
> > > For example, say a new customer has bunch of partitioned tables which each
> > > currently have only one partition (for the current month), and that's expected
> > > to grow to at least 20+ partitions (2+ years of history).  How does one set the
> > > partitioned table's auto-analyze parameters to analyze whenever any child is
> > > analyzed ?  I don't think it should be needed to update it every month after
> > > computing sum(child tuples).
> > >
> > > Possibly you could allow that behavior for some special values of the
> > > threshold.  Like if autovacuum_analyze_threshold=-2, then analyze the parent
> > > whenever any of its children are analyzed.
> > >
> > > I think that use case and that need would be common, but I'd like to hear what
> > > others think.
> >
> > Having to constantly pay attention to whether a parent's
> > analyze_threshold/scale_factor is working as intended would surely be
> > an annoyance, so I tend to agree that we might need more than just the
> > ability to set analyze_threshold/scale_factor on parent tables.
> > However, I think we can at least start with being able to do
> > *something* here. :)  Maybe others think that this shouldn't be
> > considered committable until we figure out a good analyze threshold
> > calculation formula to apply to parent tables.
> >
> > Considering that, how about having, say, a
> > autovacuum_analyze_partition_parent_frequency, with string values
> > 'default', 'partition'? -- 'default' assumes the same formula as
> > regular tables, whereas with 'partition', parent is analyzed as soon
> > as a partition is.
>
> I assume you mean a reloption to be applied only to partitioned tables,
>
> Your "partition" setting would mean that the scale/threshold values would have
> no effect, which seems kind of unfortunate.
>
> I think it should be called something else, and done differently, like maybe:
> autovacuum_analyze_mode = {off,sum,max,...}
>
The above reloption you suggested will be applied all tables?
Users might not use it for partitions, so I think we should add "parent"
to reloption's name, like Amit's suggestion.

> The threshold would be threshold + scale*tuples, as always, but would be
> compared with f(changes) as determined by the relopt.
>
> sum(changes) would do what you called "default", comparing the sum(changes)
> across all partitions to the threshold, which is itself computed using
> sum(reltuples) AS reltuples.
>
> max(changes) would compute max(changes) compared to the threshold, and the
> threshold would be computed separately for each partition's reltuples:
> threshold_N = parent_threshold + parent_scale * part_N_tuples.  If *any*
> partition exceeds that threshold, the partition itself is analyzed.  This
> allows what I want for time-series.  Maybe this would have an alias called
> "any".
>
I may be wrong but I think the fomula,
> threshold_N = parent_threshold + parent_scale * part_N_tuples
would use orginary table's threshold, not parent's.  If it use parent_threshold,
parent might not be analyzed even if its any partition is analyzed when
parent_threshold is larger than normal threshold.  I'm worried that this case
meets requirements for time-series.

> I'm not sure if there's any other useful modes, like avg(changes)?  I guess we
> can add them later if someone thinks of a good use case.
>
> Also, for me, the v7 patch warns:
> |src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
> |   vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
> ..which seems to be a false positive, but easily avoided.
>
Thank you for testing the patch.
I got it.  I'll update the patch soon.

>
> This patch includes partitioned tables in pg_stat_*_tables, which is great; I
> complained awhile ago that they were missing [0].  It might be useful if that
> part was split out into a separate 0001 patch (?).
>
If partitioned table's statistics is used for other purposes,  I think
it would be
better to split the patch. Does anyone have any opinion?

---
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center



Re: Autovacuum on partitioned table (autoanalyze)

From
Justin Pryzby
Date:
On Wed, Mar 18, 2020 at 11:30:39AM -0500, Justin Pryzby wrote:
> In the past, I think there's was talk that maybe someone would invent a clever
> way to dynamically combine all the partitions' statistics, so analyzing the
> parent wasn't needed. [...]

I happened across the thread I was referring to:
https://www.postgresql.org/message-id/7363.1426537103@sss.pgh.pa.us

I'm not opposed to doing things the currently-proposed way (trigger analyze of
partitioned tables based on updates, same as nonpartitioned tables), but we
should think if it's worth doing something totally different, like what Tom
proposed.

Robert had concerns that it would increase planning time.  I imagine that
argument is even stronger now, since PG12 has *less* planning time for large
heirarchies (428b260f8) and advertizes support for "thousands" of partitions.

Tom said:
> we would automatically get statistics that account for
> partitions being eliminated by constraint exclusion, because only the
> non-eliminated partitions are present in the appendrel. And second,

That's a pretty strong benefit.  I don't know if there's a good way to support
both(either) ways of doing things.  Like maybe a reloption that allows
triggering autovacuum on partitioned tables, but if no statistics exist on a
partitioned table, then the planner would dynamically determine the selectivity
by decending into child statistics (Tom's way).  I think the usual way this
would play out is that someone with a small partition heirarchies would
eventually complain about high planning time and then we'd suggest implementing
a manual ANALYZE job.

I'm not sure it's good to support two ways anyway, since 1) I think that gives
different (better) statistics Tom's way (due to excluding stats of excluded
partitions); 2) there's not a good way to put an ANALYZE job in place and then
get rid of parent stats (have to DELETE FROM pg_statistic WHERE
starelid='...'::regclass; 3) if someone implements an ANALYZE job, but they
disable it or it stops working then they have outdated stats forever; 

-- 
Justin



Re: Autovacuum on partitioned table (autoanalyze)

From
Amit Langote
Date:
On Sat, Apr 25, 2020 at 11:13 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Mar 18, 2020 at 11:30:39AM -0500, Justin Pryzby wrote:
> > In the past, I think there's was talk that maybe someone would invent a clever
> > way to dynamically combine all the partitions' statistics, so analyzing the
> > parent wasn't needed. [...]
>
> I happened across the thread I was referring to:
> https://www.postgresql.org/message-id/7363.1426537103@sss.pgh.pa.us
>
> I'm not opposed to doing things the currently-proposed way (trigger analyze of
> partitioned tables based on updates, same as nonpartitioned tables), but we
> should think if it's worth doing something totally different, like what Tom
> proposed.
>
> Robert had concerns that it would increase planning time.  I imagine that
> argument is even stronger now, since PG12 has *less* planning time for large
> heirarchies (428b260f8) and advertizes support for "thousands" of partitions.
>
> Tom said:
> > we would automatically get statistics that account for
> > partitions being eliminated by constraint exclusion, because only the
> > non-eliminated partitions are present in the appendrel. And second,
>
> That's a pretty strong benefit.  I don't know if there's a good way to support
> both(either) ways of doing things.  Like maybe a reloption that allows
> triggering autovacuum on partitioned tables, but if no statistics exist on a
> partitioned table, then the planner would dynamically determine the selectivity
> by decending into child statistics (Tom's way).  I think the usual way this
> would play out is that someone with a small partition heirarchies would
> eventually complain about high planning time and then we'd suggest implementing
> a manual ANALYZE job.
>
> I'm not sure it's good to support two ways anyway, since 1) I think that gives
> different (better) statistics Tom's way (due to excluding stats of excluded
> partitions); 2) there's not a good way to put an ANALYZE job in place and then
> get rid of parent stats (have to DELETE FROM pg_statistic WHERE
> starelid='...'::regclass; 3) if someone implements an ANALYZE job, but they
> disable it or it stops working then they have outdated stats forever;

Thanks for sharing that thread, had not seen it before.

I remember discussing with Alvaro and Hosoya-san an approach of
generating the whole-tree pg_statistics entries by combining the
children's entries, not during planning as the linked thread
discusses, but inside autovacuum.  The motivation for that design was
the complaint that we scan the children twice with the current method
of generating whole-tree statistics -- first to generate their own
statistics and then again to generate the parent's.

Aside from how hard it would be to actually implement, that approach
also doesn't address the concern about when to generate the whole-tree
statistics.  Because the linked thread mentions getting rid of the
whole-tree statistics altogether, there is no such concern if we go
its way.  Although I do agree with Robert's assertion on that thread
that making every query on a parent a bit slower would not be a good
compromise.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: Autovacuum on partitioned table (autoanalyze)

From
Daniel Gustafsson
Date:
> On 21 Apr 2020, at 18:21, yuzuko <yuzukohosoya@gmail.com> wrote:

> I'll update the patch soon.

Do you have an updated version to submit?  The previous patch no longer applies
to HEAD, so I'm marking this entry Waiting on Author in the meantime.

cheers ./daniel


Re: Autovacuum on partitioned table (autoanalyze)

From
yuzuko
Date:
> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 21 Apr 2020, at 18:21, yuzuko <yuzukohosoya@gmail.com> wrote:
>
> > I'll update the patch soon.
>
> Do you have an updated version to submit?  The previous patch no longer applies
> to HEAD, so I'm marking this entry Waiting on Author in the meantime.
>
Thank you for letting me know.
I attach the latest patch applies to HEAD.

I think there are other approaches like Tom's idea that Justin previously
referenced, but this patch works the same way as previous patches.
(tracks updated/inserted/deleted tuples and checks whether the partitioned
tables needs auto-analyze, same as nonpartitioned tables)
Because I wanted to be able to analyze partitioned tables by autovacuum
as a first step, and I think this approach is the simplest way to do it.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table (autoanalyze)

From
Daniel Gustafsson
Date:
> On 6 Jul 2020, at 12:35, yuzuko <yuzukohosoya@gmail.com> wrote:
>
>> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 21 Apr 2020, at 18:21, yuzuko <yuzukohosoya@gmail.com> wrote:
>>
>>> I'll update the patch soon.
>>
>> Do you have an updated version to submit?  The previous patch no longer applies
>> to HEAD, so I'm marking this entry Waiting on Author in the meantime.
>>
> Thank you for letting me know.
> I attach the latest patch applies to HEAD.

This version seems to fail under Werror which is used in the Travis builds:

autovacuum.c: In function ‘relation_needs_vacanalyze’:
autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
                                                           ^
autovacuum.c:2972:9: note: ‘reltuples’ was declared here
  float4  reltuples;  /* pg_class.reltuples */
         ^

I've moved this patch to the next commitfest, but kept the status as Waiting on
Author.  Please submit a new version of the patch.

cheers ./daniel


Re: Autovacuum on partitioned table (autoanalyze)

From
yuzuko
Date:
I'm sorry for the late reply.

> This version seems to fail under Werror which is used in the Travis builds:
>
> autovacuum.c: In function ‘relation_needs_vacanalyze’:
> autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>    anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
>                                                            ^
> autovacuum.c:2972:9: note: ‘reltuples’ was declared here
>   float4  reltuples;  /* pg_class.reltuples */
>          ^
>

I attach the latest patch that solves the above Werror.
Could you please check it again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment

Re: Autovacuum on partitioned table (autoanalyze)

From
Daniel Gustafsson
Date:
> On 17 Aug 2020, at 08:11, yuzuko <yuzukohosoya@gmail.com> wrote:
>
> I'm sorry for the late reply.
>
>> This version seems to fail under Werror which is used in the Travis builds:
>>
>> autovacuum.c: In function ‘relation_needs_vacanalyze’:
>> autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>   anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
>>                                                           ^
>> autovacuum.c:2972:9: note: ‘reltuples’ was declared here
>>  float4  reltuples;  /* pg_class.reltuples */
>>         ^
>
> I attach the latest patch that solves the above Werror.
> Could you please check it again?

This version now pass the tests in the Travis pipeline as can be seen in the
link below, and is ready to be reviewed in the upcoming commitfest:

    http://cfbot.cputube.org/yuzuko-hosoya.html

cheers ./daniel