Thread: Autovacuum on partitioned table
Hello, Greg reported in [1] before, autovacuum ignores partitioned tables. That is, even if individual partitions’ statistics are updated, its parent's statistics are not updated. This is TODO for declarative partitioning. As Amit mentioned in [2], a way to make parent's statistics from partitions' statistics without scanning the partitions would be nice, but it will need a lot of modifications. So I tried to fix that using the current analyze method. The summary of the attached patch is as follows: * If the relation is a partitioned table, check its children if they need vacuum or analyze. Children need to do that are added to a table list for autovacuuum. At least one child is added to the list, the partitioned table is also added to the list. Then, autovacuum runs on all the tables in the list. * If the partitioned table has foreign partitions, ignore them. When the parent has children don't need vacuum/analyze or foreign partitions, parent's stats are updated scanning the current data of all children, so old stats and new are mixed within the partition tree. Is that suitable? Any thoughts? [1] https://www.postgresql.org/message-id/CAM-w4HMQKC8hw7nB9TW3OV%2BhkB5OUcPtvr_U_EiSOjByoa-e4Q%40mail.gmail.com [2] https://www.postgresql.org/message-id/CA%2BHiwqEeZQ-H2OVbHZ%3Dn2RNNPF84Hygi1HC-MDwC-VnBjpA1%3DQ%40mail.gmail.com -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
On Mon, 2019-12-02 at 18:02 +0900, yuzuko wrote: > Greg reported in [1] before, autovacuum ignores partitioned tables. > That is, even if individual partitions’ statistics are updated, its parent's > statistics are not updated. This is TODO for declarative partitioning. > As Amit mentioned in [2], a way to make parent's statistics from > partitions' statistics without scanning the partitions would be nice, > but it will need a lot of modifications. So I tried to fix that using the > current analyze method. > > The summary of the attached patch is as follows: > * If the relation is a partitioned table, check its children if they need > vacuum or analyze. Children need to do that are added to > a table list for autovacuuum. At least one child is added to the list, > the partitioned table is also added to the list. Then, autovacuum > runs on all the tables in the list. That means that all partitions are vacuumed if only one of them needs it, right? This will result in way more vacuuming than necessary. Wouldn't it be an option to update the partitioned table's statistics whenever one of the partitions is vacuumed? Yours, Laurenz Albe
Hi Laurenz, Thanks for the comments. On Mon, Dec 2, 2019 at 6:19 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2019-12-02 at 18:02 +0900, yuzuko wrote: > > Greg reported in [1] before, autovacuum ignores partitioned tables. > > That is, even if individual partitions’ statistics are updated, its parent's > > statistics are not updated. This is TODO for declarative partitioning. > > As Amit mentioned in [2], a way to make parent's statistics from > > partitions' statistics without scanning the partitions would be nice, > > but it will need a lot of modifications. So I tried to fix that using the > > current analyze method. > > > > The summary of the attached patch is as follows: > > * If the relation is a partitioned table, check its children if they need > > vacuum or analyze. Children need to do that are added to > > a table list for autovacuuum. At least one child is added to the list, > > the partitioned table is also added to the list. Then, autovacuum > > runs on all the tables in the list. > > That means that all partitions are vacuumed if only one of them needs it, > right? This will result in way more vacuuming than necessary. > Autovacuum runs only partitions need vacuum/analyze, so unnecessary partitions stats are not updated. However, to make parent's stats, all children are scanned. It might be a waste of time. > Wouldn't it be an option to update the partitioned table's statistics > whenever one of the partitions is vacuumed? > > Yours, > Laurenz Albe > -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Hi, As Laurenz commented in this thread, I tried adding option to update parent's statistics during Autovacuum. To do that, I propose supporting 'autovacuum_enabled' option already exists on partitioned tables. In the attached patch, you can use 'autovacuum_enabled' option on partitioned table as usual, that is, a default value of this option is true. So if you don't need autovacuum on a partitioned table, you have to specify the option: CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0); I'm not sure but I wonder if a suitable value as a default of 'autovacuum_enabled' for partitioned tables might be false. Because autovacuum on *partitioned tables* requires scanning all children to make partitioned tables' statistics. But if the default value varies according to the relation, is it confusing? Any thoughts? -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya@gmail.com> wrote: > > Hi, > > As Laurenz commented in this thread, I tried adding option > to update parent's statistics during Autovacuum. To do that, > I propose supporting 'autovacuum_enabled' option already > exists on partitioned tables. > > In the attached patch, you can use 'autovacuum_enabled' option > on partitioned table as usual, that is, a default value of this option > is true. So if you don't need autovacuum on a partitioned table, > you have to specify the option: > CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0); > > I'm not sure but I wonder if a suitable value as a default of > 'autovacuum_enabled' for partitioned tables might be false. > Because autovacuum on *partitioned tables* requires scanning > all children to make partitioned tables' statistics. > But if the default value varies according to the relation, > is it confusing? Any thoughts? I don't look at the patch deeply yet but your patch seems to attempt to vacuum on partitioned table. IIUC partitioned tables don't need to be vacuumed and its all child tables are vacuumed instead if we pass the partitioned table to vacuum() function. But autovacuum on child tables is normally triggered since their statistics are updated. I think it's a good idea to have that option but I think that doing autovacuum on the parent table every time when autovacuum is triggered on one of its child tables is very high cost especially when there are a lot of child tables. Instead I thought it's more straight forward if we compare the summation of the statistics of child tables (e.g. n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we consider the needs of autovacuum on the parent table. What do you think? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, On Fri, Dec 27, 2019 at 2:02 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya@gmail.com> wrote: > > As Laurenz commented in this thread, I tried adding option > > to update parent's statistics during Autovacuum. To do that, > > I propose supporting 'autovacuum_enabled' option already > > exists on partitioned tables. > > > > In the attached patch, you can use 'autovacuum_enabled' option > > on partitioned table as usual, that is, a default value of this option > > is true. So if you don't need autovacuum on a partitioned table, > > you have to specify the option: > > CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0); > > > > I'm not sure but I wonder if a suitable value as a default of > > 'autovacuum_enabled' for partitioned tables might be false. > > Because autovacuum on *partitioned tables* requires scanning > > all children to make partitioned tables' statistics. > > But if the default value varies according to the relation, > > is it confusing? Any thoughts? > > I don't look at the patch deeply yet but your patch seems to attempt > to vacuum on partitioned table. IIUC partitioned tables don't need to > be vacuumed and its all child tables are vacuumed instead if we pass > the partitioned table to vacuum() function. But autovacuum on child > tables is normally triggered since their statistics are updated. > > I think it's a good idea to have that option but I think that doing > autovacuum on the parent table every time when autovacuum is triggered > on one of its child tables is very high cost especially when there are > a lot of child tables. Instead I thought it's more straight forward if > we compare the summation of the statistics of child tables (e.g. > n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we > consider the needs of autovacuum on the parent table. What do you > think? There's this old email where Tom outlines a few ideas about triggering auto-analyze on inheritance trees: https://www.postgresql.org/message-id/4823.1262132964%40sss.pgh.pa.us If I'm reading that correctly, the idea is to track only changes_since_analyze and none of the finer-grained stats like live/dead tuples for inheritance parents (partitioned tables) using some new pgstat infrastrcture, an idea that Hosoya-san also seems to be considering per an off-list discussion. Besides the complexity of getting that infrastructure in place, an important question is whether the current system of applying threshold and scale factor to changes_since_analyze should be used as-is for inheritance parents (partitioned tables), because if users set those parameters similarly to for regular tables, autovacuum might analyze partitioned tables more than necessary. We'll either need a different formula, or some commentary in the documentation about how partitioned tables might need different setting, or maybe both. By the way, maybe I'm misunderstanding what Sawada-san wrote above, but the only missing piece seems to be a way to trigger an *analyze* on the parent tables -- to collect optimizer statistics for the inheritance trees -- not vacuum, for which the existing system seems enough. Thanks, Amit
On Tue, 28 Jan 2020 at 17:52, Amit Langote <amitlangote09@gmail.com> wrote: > > Hello, > > On Fri, Dec 27, 2019 at 2:02 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya@gmail.com> wrote: > > > As Laurenz commented in this thread, I tried adding option > > > to update parent's statistics during Autovacuum. To do that, > > > I propose supporting 'autovacuum_enabled' option already > > > exists on partitioned tables. > > > > > > In the attached patch, you can use 'autovacuum_enabled' option > > > on partitioned table as usual, that is, a default value of this option > > > is true. So if you don't need autovacuum on a partitioned table, > > > you have to specify the option: > > > CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0); > > > > > > I'm not sure but I wonder if a suitable value as a default of > > > 'autovacuum_enabled' for partitioned tables might be false. > > > Because autovacuum on *partitioned tables* requires scanning > > > all children to make partitioned tables' statistics. > > > But if the default value varies according to the relation, > > > is it confusing? Any thoughts? > > > > I don't look at the patch deeply yet but your patch seems to attempt > > to vacuum on partitioned table. IIUC partitioned tables don't need to > > be vacuumed and its all child tables are vacuumed instead if we pass > > the partitioned table to vacuum() function. But autovacuum on child > > tables is normally triggered since their statistics are updated. > > > > I think it's a good idea to have that option but I think that doing > > autovacuum on the parent table every time when autovacuum is triggered > > on one of its child tables is very high cost especially when there are > > a lot of child tables. Instead I thought it's more straight forward if > > we compare the summation of the statistics of child tables (e.g. > > n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we > > consider the needs of autovacuum on the parent table. What do you > > think? > > There's this old email where Tom outlines a few ideas about triggering > auto-analyze on inheritance trees: > > https://www.postgresql.org/message-id/4823.1262132964%40sss.pgh.pa.us > > If I'm reading that correctly, the idea is to track only > changes_since_analyze and none of the finer-grained stats like > live/dead tuples for inheritance parents (partitioned tables) using > some new pgstat infrastrcture, an idea that Hosoya-san also seems to > be considering per an off-list discussion. Besides the complexity of > getting that infrastructure in place, an important question is whether > the current system of applying threshold and scale factor to > changes_since_analyze should be used as-is for inheritance parents > (partitioned tables), because if users set those parameters similarly > to for regular tables, autovacuum might analyze partitioned tables > more than necessary. How are you going to track changes_since_analyze of partitioned table? It's just an idea but we can accumulate changes_since_analyze of partitioned table by adding child tables's value after analyzing each child table. And compare the partitioned tables value to the threshold that is computed by (autovacuum_analyze_threshold + total rows including all child tables * autovacuum_analyze_scale_factor). > By the way, maybe I'm misunderstanding what Sawada-san wrote above, > but the only missing piece seems to be a way to trigger an *analyze* > on the parent tables -- to collect optimizer statistics for the > inheritance trees -- not vacuum, for which the existing system seems > enough. Right. We need only autoanalyze on partitioned tables. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, > Besides the complexity of > getting that infrastructure in place, an important question is whether > the current system of applying threshold and scale factor to > changes_since_analyze should be used as-is for inheritance parents > (partitioned tables), because if users set those parameters similarly > to for regular tables, autovacuum might analyze partitioned tables > more than necessary. We'll either need a different formula, or some > commentary in the documentation about how partitioned tables might > need different setting, or maybe both. > I'm not sure but I think we need new autovacuum parameters for partitioned tables (autovacuum, autovacuum_analyze_threshold, autovacuum_analyze_scale_factor) because whether it's necessary to run autovacuum on partitioned tables will depend on users. What do you think? > How are you going to track changes_since_analyze of partitioned table? > It's just an idea but we can accumulate changes_since_analyze of > partitioned table by adding child tables's value after analyzing each > child table. And compare the partitioned tables value to the threshold > that is computed by (autovacuum_analyze_threshold + total rows > including all child tables * autovacuum_analyze_scale_factor). > The idea Sawada-san mentioned is similar to mine. Also, for tracking changes_since_analyze, we have to make partitioned table's statistics. To do that, we can invent a new PgStat_StatPartitionedTabEntry based on PgStat_StatTabEntry. Through talking with Amit, I think the new structure needs the following members: tableid changes_since_analyze analyze_timestamp analyze_count autovac_analyze_timestamp autovac_analyze_count Vacuum doesn't run on partitioned tables, so I think members related to (auto) vacuum need not be contained in the structure. I'm still writing a patch. I'll send it this week. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
On Wed, Jan 29, 2020 at 11:29 AM yuzuko <yuzukohosoya@gmail.com> wrote: > > Besides the complexity of > > getting that infrastructure in place, an important question is whether > > the current system of applying threshold and scale factor to > > changes_since_analyze should be used as-is for inheritance parents > > (partitioned tables), because if users set those parameters similarly > > to for regular tables, autovacuum might analyze partitioned tables > > more than necessary. We'll either need a different formula, or some > > commentary in the documentation about how partitioned tables might > > need different setting, or maybe both. > > > I'm not sure but I think we need new autovacuum parameters for > partitioned tables (autovacuum, autovacuum_analyze_threshold, > autovacuum_analyze_scale_factor) because whether it's necessary > to run autovacuum on partitioned tables will depend on users. > What do you think? Yes, we will need to first support those parameters on partitioned tables. Currently, you get: create table p (a int) partition by list (a) with (autovacuum_analyze_scale_factor=0); ERROR: unrecognized parameter "autovacuum_analyze_scale_factor" > > How are you going to track changes_since_analyze of partitioned table? > > It's just an idea but we can accumulate changes_since_analyze of > > partitioned table by adding child tables's value after analyzing each > > child table. And compare the partitioned tables value to the threshold > > that is computed by (autovacuum_analyze_threshold + total rows > > including all child tables * autovacuum_analyze_scale_factor). > > > The idea Sawada-san mentioned is similar to mine. So if I understand this idea correctly, a partitioned table's analyze will only be triggered when partitions are analyzed. That is, inserts, updates, deletes of tuples in partitions will be tracked by pgstat, which in turn is used by autovacuum to trigger analyze on partitions. Then, partitions changes_since_analyze is added into the parent's changes_since_analyze, which in turn *may* trigger analyze parent. I said "may", because it would take multiple partition analyzes to accumulate enough changes to trigger one on the parent. Am I getting that right? > Also, for tracking > changes_since_analyze, we have to make partitioned table's statistics. > To do that, we can invent a new PgStat_StatPartitionedTabEntry based > on PgStat_StatTabEntry. Through talking with Amit, I think the new structure > needs the following members: > > tableid > changes_since_analyze > analyze_timestamp > analyze_count > autovac_analyze_timestamp > autovac_analyze_count > > Vacuum doesn't run on partitioned tables, so I think members related to > (auto) vacuum need not be contained in the structure. On second thought, maybe we don't need a new PgStat_ struct. We can just use what's used for regular tables and leave the fields that don't make sense for partitioned tables set to 0, such as those that track the counts of scans, tuples, etc. That means we don't have to mess with interfaces of existing functions, like this one: static void relation_needs_vacanalyze(Oid relid, AutoVacOpts *relopts, Form_pg_class classForm, PgStat_StatTabEntry *tabentry, ... Thanks, Amit
On Wed, Jan 29, 2020 at 05:56:40PM +0900, Amit Langote wrote: > Yes, we will need to first support those parameters on partitioned > tables. Currently, you get: > > create table p (a int) partition by list (a) with > (autovacuum_analyze_scale_factor=0); > ERROR: unrecognized parameter "autovacuum_analyze_scale_factor" Worth the note: partitioned tables support zero reloptions as of now, but there is the facility in place to allow that (see RELOPT_KIND_PARTITIONED and partitioned_table_reloptions). -- Michael
Attachment
On Wed, 29 Jan 2020 at 17:56, Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Jan 29, 2020 at 11:29 AM yuzuko <yuzukohosoya@gmail.com> wrote: > > > Besides the complexity of > > > getting that infrastructure in place, an important question is whether > > > the current system of applying threshold and scale factor to > > > changes_since_analyze should be used as-is for inheritance parents > > > (partitioned tables), because if users set those parameters similarly > > > to for regular tables, autovacuum might analyze partitioned tables > > > more than necessary. We'll either need a different formula, or some > > > commentary in the documentation about how partitioned tables might > > > need different setting, or maybe both. > > > > > I'm not sure but I think we need new autovacuum parameters for > > partitioned tables (autovacuum, autovacuum_analyze_threshold, > > autovacuum_analyze_scale_factor) because whether it's necessary > > to run autovacuum on partitioned tables will depend on users. > > What do you think? > > Yes, we will need to first support those parameters on partitioned > tables. Currently, you get: > > create table p (a int) partition by list (a) with > (autovacuum_analyze_scale_factor=0); > ERROR: unrecognized parameter "autovacuum_analyze_scale_factor" > > > > How are you going to track changes_since_analyze of partitioned table? > > > It's just an idea but we can accumulate changes_since_analyze of > > > partitioned table by adding child tables's value after analyzing each > > > child table. And compare the partitioned tables value to the threshold > > > that is computed by (autovacuum_analyze_threshold + total rows > > > including all child tables * autovacuum_analyze_scale_factor). > > > > > The idea Sawada-san mentioned is similar to mine. > > So if I understand this idea correctly, a partitioned table's analyze > will only be triggered when partitions are analyzed. That is, > inserts, updates, deletes of tuples in partitions will be tracked by > pgstat, which in turn is used by autovacuum to trigger analyze on > partitions. Then, partitions changes_since_analyze is added into the > parent's changes_since_analyze, which in turn *may* trigger analyze > parent. I said "may", because it would take multiple partition > analyzes to accumulate enough changes to trigger one on the parent. > Am I getting that right? Yeah that is what I meant. In addition, adding partition's changes_since_analyze to its parent needs to be done recursively as the parent table could also be a partitioned table. > > > Also, for tracking > > changes_since_analyze, we have to make partitioned table's statistics. > > To do that, we can invent a new PgStat_StatPartitionedTabEntry based > > on PgStat_StatTabEntry. Through talking with Amit, I think the new structure > > needs the following members: > > > > tableid > > changes_since_analyze > > analyze_timestamp > > analyze_count > > autovac_analyze_timestamp > > autovac_analyze_count > > > > Vacuum doesn't run on partitioned tables, so I think members related to > > (auto) vacuum need not be contained in the structure. > > On second thought, maybe we don't need a new PgStat_ struct. We can > just use what's used for regular tables and leave the fields that > don't make sense for partitioned tables set to 0, such as those that > track the counts of scans, tuples, etc. That means we don't have to > mess with interfaces of existing functions, like this one: > > static void relation_needs_vacanalyze(Oid relid, > AutoVacOpts *relopts, > Form_pg_class classForm, > PgStat_StatTabEntry *tabentry, ... +1 Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Feb 2, 2020 at 12:53 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > On Wed, 29 Jan 2020 at 17:56, Amit Langote <amitlangote09@gmail.com> wrote: > > On Wed, Jan 29, 2020 at 11:29 AM yuzuko <yuzukohosoya@gmail.com> wrote: > > > > How are you going to track changes_since_analyze of partitioned table? > > > > It's just an idea but we can accumulate changes_since_analyze of > > > > partitioned table by adding child tables's value after analyzing each > > > > child table. And compare the partitioned tables value to the threshold > > > > that is computed by (autovacuum_analyze_threshold + total rows > > > > including all child tables * autovacuum_analyze_scale_factor). > > > > > > > The idea Sawada-san mentioned is similar to mine. > > > > So if I understand this idea correctly, a partitioned table's analyze > > will only be triggered when partitions are analyzed. That is, > > inserts, updates, deletes of tuples in partitions will be tracked by > > pgstat, which in turn is used by autovacuum to trigger analyze on > > partitions. Then, partitions changes_since_analyze is added into the > > parent's changes_since_analyze, which in turn *may* trigger analyze > > parent. I said "may", because it would take multiple partition > > analyzes to accumulate enough changes to trigger one on the parent. > > Am I getting that right? > > Yeah that is what I meant. In addition, adding partition's > changes_since_analyze to its parent needs to be done recursively as > the parent table could also be a partitioned table. That's a good point. So, changes_since_analyze increments are essentially propagated from leaf partitions to all the way up to the root table, including any intermediate partitioned tables. We'll need to consider whether we should propagate only one level at a time (from bottom of the tree) or update all parents up to the root, every time a leaf partition is analyzed. If we we do the latter, that might end up triggering analyze on all the parents at the same time causing repeated scanning of the same child tables in close intervals, although setting analyze threshold and scale factor of the parent tables of respective levels wisely can help avoid any negative impact of that. Thanks, Amit
Hello, I'm sorry for the delay. Attach the latest patch based on discussion in this thread. > > Yeah that is what I meant. In addition, adding partition's > > changes_since_analyze to its parent needs to be done recursively as > > the parent table could also be a partitioned table. > > That's a good point. So, changes_since_analyze increments are > essentially propagated from leaf partitions to all the way up to the > root table, including any intermediate partitioned tables. We'll need > to consider whether we should propagate only one level at a time (from > bottom of the tree) or update all parents up to the root, every time a > leaf partition is analyzed. For multi-level partitioning, all parents' changes_since_analyze will be updated whenever analyzing a leaf partition in this patch. Could you please check the patch again? -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
Hosoya-san, On Thu, Feb 20, 2020 at 3:34 PM yuzuko <yuzukohosoya@gmail.com> wrote: > Attach the latest patch based on discussion in this thread. > > > > Yeah that is what I meant. In addition, adding partition's > > > changes_since_analyze to its parent needs to be done recursively as > > > the parent table could also be a partitioned table. > > > > That's a good point. So, changes_since_analyze increments are > > essentially propagated from leaf partitions to all the way up to the > > root table, including any intermediate partitioned tables. We'll need > > to consider whether we should propagate only one level at a time (from > > bottom of the tree) or update all parents up to the root, every time a > > leaf partition is analyzed. > > For multi-level partitioning, all parents' changes_since_analyze will be > updated whenever analyzing a leaf partition in this patch. > Could you please check the patch again? Thank you for the new patch. I built and confirmed that the patch works. Here are some comments: * White-space noise in the diff (space used where tab is expected); please check with git diff --check and fix. * Names changes_tuples, m_changes_tuples should be changed_tuples and m_changed_tuples, respectively? * Did you intend to make it so that we now report *all* inherited stats to the stats collector, not just those for partitioned tables? IOW, do did you intend the new feature to also cover traditional inheritance parents? I am talking about the following diff: /* - * Report ANALYZE to the stats collector, too. However, if doing - * inherited stats we shouldn't report, because the stats collector only - * tracks per-table stats. Reset the changes_since_analyze counter only - * if we analyzed all columns; otherwise, there is still work for - * auto-analyze to do. + * Report ANALYZE to the stats collector, too. If the table is a + * partition, report changes_since_analyze of its parent because + * autovacuum process for partitioned tables needs it. Reset the + * changes_since_analyze counter only if we analyzed all columns; + * otherwise, there is still work for auto-analyze to do. */ - if (!inh) - pgstat_report_analyze(onerel, totalrows, totaldeadrows, - (va_cols == NIL)); + pgstat_report_analyze(onerel, totalrows, totaldeadrows, + (va_cols == NIL)); * I may be missing something, but why doesn't do_autovacuum() fetch a partitioned table's entry from pgstat instead of fetching that for individual children and adding? That is, why do we need to do the following: + /* + * If the relation is a partitioned table, we check it using reltuples + * added up childrens' and changes_since_analyze tracked by stats collector. More later... Thanks, Amit
On Thu, Feb 20, 2020 at 4:50 PM Amit Langote <amitlangote09@gmail.com> wrote: > * I may be missing something, but why doesn't do_autovacuum() fetch a > partitioned table's entry from pgstat instead of fetching that for > individual children and adding? That is, why do we need to do the > following: > > + /* > + * If the relation is a partitioned table, we check it > using reltuples > + * added up childrens' and changes_since_analyze tracked > by stats collector. Oh, it's only adding up children's pg_class.reltuple, not pgstat stats. We need to do that because a partitioned table's pg_class.reltuples is always 0 and correctly so. Sorry for not reading the patch properly. Thanks, Amit
On Thu, Feb 20, 2020 at 5:32 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Thu, Feb 20, 2020 at 4:50 PM Amit Langote <amitlangote09@gmail.com> wrote: > > * I may be missing something, but why doesn't do_autovacuum() fetch a > > partitioned table's entry from pgstat instead of fetching that for > > individual children and adding? That is, why do we need to do the > > following: > > > > + /* > > + * If the relation is a partitioned table, we check it > > using reltuples > > + * added up childrens' and changes_since_analyze tracked > > by stats collector. > > Oh, it's only adding up children's pg_class.reltuple, not pgstat > stats. We need to do that because a partitioned table's > pg_class.reltuples is always 0 and correctly so. Sorry for not > reading the patch properly. Having read the relevant diffs again, I think this could be done without duplicating code too much. You seem to have added the same logic in two places: do_autovacuum() and table_recheck_autovac(). More importantly, part of the logic of relation_needs_vacanalyze() is duplicated in both of the aforementioned places, which I think is unnecessary and undesirable if you consider maintainability. I think we could just add the logic to compute reltuples for partitioned tables at the beginning of relation_needs_vacanalyze() and be done. I have attached a delta patch to show what I mean. Please check and tell what you think. Thanks, Amit
Attachment
Hello Amit-san, Thanks for your comments. > * White-space noise in the diff (space used where tab is expected); > please check with git diff --check and fix. Fixed it. > * Names changes_tuples, m_changes_tuples should be changed_tuples and > m_changed_tuples, respectively? Yes, I modified it. > * Did you intend to make it so that we now report *all* inherited > stats to the stats collector, not just those for partitioned tables? > IOW, do did you intend the new feature to also cover traditional > inheritance parents? I am talking about the following diff: > I modified as follows to apply this feature to only declaretive partitioning. - if (!inh) - pgstat_report_analyze(onerel, totalrows, totaldeadrows, - (va_cols == NIL)); + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + pgstat_report_analyze(onerel, totalrows, totaldeadrows, + (va_cols == NIL)); > Having read the relevant diffs again, I think this could be done > without duplicating code too much. You seem to have added the same > logic in two places: do_autovacuum() and table_recheck_autovac(). > More importantly, part of the logic of relation_needs_vacanalyze() is > duplicated in both of the aforementioned places, which I think is > unnecessary and undesirable if you consider maintainability. I think > we could just add the logic to compute reltuples for partitioned > tables at the beginning of relation_needs_vacanalyze() and be done. > Yes, indeed. Partitioned tables don't need to vacuum so I added new checking process for partitioned tables outside relation_needs_vacanalyze(). However, partitioned tables' tabentry->n_dead_tuples are always 0 so dovacuum is always false. So I think that checking both auto vacuum and analyze for partitioned tables doesn't matter. I merged v3_amit_delta.patch into the new patch and found minor bug, partitioned table's reltuples is overwritten with it's classForm->reltuples, so I fixed it. Also, I think partitioned tables' changes_since_analyze should be reported only when Autovacuum process. So I fixed it too. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
On Fri, 21 Feb 2020 at 15:14, yuzuko <yuzukohosoya@gmail.com> wrote: > > Hello Amit-san, > > Thanks for your comments. > > > * White-space noise in the diff (space used where tab is expected); > > please check with git diff --check and fix. > Fixed it. > > > * Names changes_tuples, m_changes_tuples should be changed_tuples and > > m_changed_tuples, respectively? > Yes, I modified it. > > > * Did you intend to make it so that we now report *all* inherited > > stats to the stats collector, not just those for partitioned tables? > > IOW, do did you intend the new feature to also cover traditional > > inheritance parents? I am talking about the following diff: > > > I modified as follows to apply this feature to only declaretive partitioning. > > - if (!inh) > - pgstat_report_analyze(onerel, totalrows, totaldeadrows, > - (va_cols == NIL)); > + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > + pgstat_report_analyze(onerel, totalrows, totaldeadrows, > + (va_cols == NIL)); > > > > Having read the relevant diffs again, I think this could be done > > without duplicating code too much. You seem to have added the same > > logic in two places: do_autovacuum() and table_recheck_autovac(). > > More importantly, part of the logic of relation_needs_vacanalyze() is > > duplicated in both of the aforementioned places, which I think is > > unnecessary and undesirable if you consider maintainability. I think > > we could just add the logic to compute reltuples for partitioned > > tables at the beginning of relation_needs_vacanalyze() and be done. > > > Yes, indeed. Partitioned tables don't need to vacuum so I added new > checking process for partitioned tables outside relation_needs_vacanalyze(). > However, partitioned tables' tabentry->n_dead_tuples are always 0 so > dovacuum is always false. So I think that checking both auto vacuum > and analyze for partitioned tables doesn't matter. I merged v3_amit_delta.patch > into the new patch and found minor bug, partitioned table's reltuples is > overwritten with it's classForm->reltuples, so I fixed it. > > Also, I think partitioned tables' changes_since_analyze should be reported > only when Autovacuum process. So I fixed it too. Thank you for updating the patch. I tested v4 patch. After analyze or autoanalyze on partitioned table n_live_tup and n_dead_tup are updated. However, TRUNCATE and VACUUM on the partitioned table don't change these values until invoking analyze or autoanalyze whereas in normal tables these values are reset or changed. For example, with your patch: * Before relname | n_live_tup | n_dead_tup | n_mod_since_analyze ---------+------------+------------+--------------------- c1 | 11 | 0 | 0 c2 | 11 | 0 | 0 c3 | 11 | 0 | 0 c4 | 11 | 0 | 0 c5 | 11 | 0 | 0 parent | 55 | 0 | 0 (6 rows) * After 'TRUNCATE parent' relname | n_live_tup | n_dead_tup | n_mod_since_analyze ---------+------------+------------+--------------------- c1 | 0 | 0 | 0 c2 | 0 | 0 | 0 c3 | 0 | 0 | 0 c4 | 0 | 0 | 0 c5 | 0 | 0 | 0 parent | 55 | 0 | 0 (6 rows) * Before relname | n_live_tup | n_dead_tup | n_mod_since_analyze ---------+------------+------------+--------------------- c1 | 0 | 11 | 0 c2 | 0 | 11 | 0 c3 | 0 | 11 | 0 c4 | 0 | 11 | 0 c5 | 0 | 11 | 0 parent | 0 | 55 | 0 (6 rows) * After 'VACUUM parent' relname | n_live_tup | n_dead_tup | n_mod_since_analyze ---------+------------+------------+--------------------- c1 | 0 | 0 | 0 c2 | 0 | 0 | 0 c3 | 0 | 0 | 0 c4 | 0 | 0 | 0 c5 | 0 | 0 | 0 parent | 0 | 55 | 0 (6 rows) We can make it work correctly but I think perhaps we can skip updating statistics values of partitioned tables other than n_mod_since_analyze as the first step. Because if we support also n_live_tup and n_dead_tup, user might get confused that other statistics values such as seq_scan, seq_tup_read however are not supported. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Feb 21, 2020 at 4:47 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > Thank you for updating the patch. I tested v4 patch. > > After analyze or autoanalyze on partitioned table n_live_tup and > n_dead_tup are updated. However, TRUNCATE and VACUUM on the > partitioned table don't change these values until invoking analyze or > autoanalyze whereas in normal tables these values are reset or > changed. For example, with your patch: > > * Before > relname | n_live_tup | n_dead_tup | n_mod_since_analyze > ---------+------------+------------+--------------------- > c1 | 11 | 0 | 0 > c2 | 11 | 0 | 0 > c3 | 11 | 0 | 0 > c4 | 11 | 0 | 0 > c5 | 11 | 0 | 0 > parent | 55 | 0 | 0 > (6 rows) > > * After 'TRUNCATE parent' > relname | n_live_tup | n_dead_tup | n_mod_since_analyze > ---------+------------+------------+--------------------- > c1 | 0 | 0 | 0 > c2 | 0 | 0 | 0 > c3 | 0 | 0 | 0 > c4 | 0 | 0 | 0 > c5 | 0 | 0 | 0 > parent | 55 | 0 | 0 > (6 rows) > > * Before > relname | n_live_tup | n_dead_tup | n_mod_since_analyze > ---------+------------+------------+--------------------- > c1 | 0 | 11 | 0 > c2 | 0 | 11 | 0 > c3 | 0 | 11 | 0 > c4 | 0 | 11 | 0 > c5 | 0 | 11 | 0 > parent | 0 | 55 | 0 > (6 rows) > > * After 'VACUUM parent' > relname | n_live_tup | n_dead_tup | n_mod_since_analyze > ---------+------------+------------+--------------------- > c1 | 0 | 0 | 0 > c2 | 0 | 0 | 0 > c3 | 0 | 0 | 0 > c4 | 0 | 0 | 0 > c5 | 0 | 0 | 0 > parent | 0 | 55 | 0 > (6 rows) > > We can make it work correctly but I think perhaps we can skip updating > statistics values of partitioned tables other than n_mod_since_analyze > as the first step. Because if we support also n_live_tup and > n_dead_tup, user might get confused that other statistics values such > as seq_scan, seq_tup_read however are not supported. +1, that makes sense. Thanks, Amit
Hi, Thanks for reviewing the patch. > > We can make it work correctly but I think perhaps we can skip updating > > statistics values of partitioned tables other than n_mod_since_analyze > > as the first step. Because if we support also n_live_tup and > > n_dead_tup, user might get confused that other statistics values such > > as seq_scan, seq_tup_read however are not supported. > > +1, that makes sense. > Yes, Indeed. I modified it not to update statistics other than n_mod_since_analyze. Attach the v5 patch. In this patch, pgstat_report_analyze() always reports 0 as msg.m_live_tuples and m_dead_tuples when the relation is partitioned. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
On Wed, 26 Feb 2020 at 11:33, yuzuko <yuzukohosoya@gmail.com> wrote: > > Hi, > > Thanks for reviewing the patch. > > > > We can make it work correctly but I think perhaps we can skip updating > > > statistics values of partitioned tables other than n_mod_since_analyze > > > as the first step. Because if we support also n_live_tup and > > > n_dead_tup, user might get confused that other statistics values such > > > as seq_scan, seq_tup_read however are not supported. > > > > +1, that makes sense. > > > Yes, Indeed. I modified it not to update statistics other than > n_mod_since_analyze. > Attach the v5 patch. In this patch, pgstat_report_analyze() always reports 0 as > msg.m_live_tuples and m_dead_tuples when the relation is partitioned. > Thank you for updating the patch. I'll look at it. I'd recommend to register this patch to the next commit fest so at not to forget. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Yuzuko, > + * Report ANALYZE to the stats collector, too. If the table is a > + * partition, report changes_since_analyze of its parent because > + * autovacuum process for partitioned tables needs it. Reset the > + * changes_since_analyze counter only if we analyzed all columns; > + * otherwise, there is still work for auto-analyze to do. > */ > - if (!inh) > - pgstat_report_analyze(onerel, totalrows, totaldeadrows, > - (va_cols == NIL)); > + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) > + pgstat_report_analyze(onerel, totalrows, totaldeadrows, > + (va_cols == NIL)); Hmm, I think the comment has a bug: it says "report ... of its parent" but the report is of the same rel. (The pgstat_report_analyze line is mis-indented also). > /* > + * If the relation is a partitioned table, we must add up children's > + * reltuples. > + */ > + if (classForm->relkind == RELKIND_PARTITIONED_TABLE) > + { > + List *children; > + ListCell *lc; > + > + reltuples = 0; > + > + /* Find all members of inheritance set taking AccessShareLock */ > + children = find_all_inheritors(relid, AccessShareLock, NULL); > + > + foreach(lc, children) > + { > + Oid childOID = lfirst_oid(lc); > + HeapTuple childtuple; > + Form_pg_class childclass; > + > + /* Ignore the parent table */ > + if (childOID == relid) > + continue; I think this loop counts partitioned partitions multiple times, because we add up reltuples for all levels, no? (If I'm wrong, that is, if a partitioned rel does not have reltuples, then why skip the parent?) > + /* > + * If the table is a leaf partition, tell the stats collector its parent's > + * changes_since_analyze for auto analyze > + */ > + if (IsAutoVacuumWorkerProcess() && > + rel->rd_rel->relispartition && > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) I'm not sure I understand why we do this only on autovac. Why not all analyzes? > + { > + Oid parentoid; > + Relation parentrel; > + PgStat_StatDBEntry *dbentry; > + PgStat_StatTabEntry *tabentry; > + > + /* Get its parent table's Oid and relation */ > + parentoid = get_partition_parent(RelationGetRelid(rel)); > + parentrel = table_open(parentoid, AccessShareLock); Climbing up the partitioning hierarchy acquiring locks on ancestor relations opens up for deadlocks. It's better to avoid that. (As a test, you could try what happens if you lock the topmost relation with access-exclusive and leave a transaction open, then have autoanalyze run). At the same time, I wonder if it's sensible to move one level up here, and also have pgstat_report_partanalyze move more levels up. > + * pgstat_report_partanalyze() - > + * > + * Tell the collector about the parent table of which partition just analyzed. > + * > + * Caller must provide a child's changes_since_analyze as a parents. I'm not sure what the last line is trying to say. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hosoya-san, Thanks for the new patch. On Wed, Feb 26, 2020 at 11:33 AM yuzuko <yuzukohosoya@gmail.com> wrote: > Attach the v5 patch. In this patch, pgstat_report_analyze() always reports 0 as > msg.m_live_tuples and m_dead_tuples when the relation is partitioned. Some comments: + * PgStat_MsgPartAnalyze Sent by the backend or autovacuum daemon + * after ANALYZE for partitioned tables Looking at the way this message is used, it does not seem to be an "analyze" message and also it's not sent "after ANALYZE of partitioned tables", but really after ANALYZE of leaf partitions. Analyze (for both partitioned tables and leaf partitions) is reported as a PgStat_MsgAnalyze message as before. It seems that PgStat_MsgPartAnalyze is only sent to update a leaf partition's parent's (and recursively any grandparents') changes_since_analyze counters, so maybe we should find a different name for it. Maybe, PgStat_MsgPartChanges and accordingly the message type enum value. /* - * Report ANALYZE to the stats collector, too. However, if doing - * inherited stats we shouldn't report, because the stats collector only - * tracks per-table stats. Reset the changes_since_analyze counter only - * if we analyzed all columns; otherwise, there is still work for - * auto-analyze to do. + * Report ANALYZE to the stats collector, too. If the table is a + * partition, report changes_since_analyze of its parent because + * autovacuum process for partitioned tables needs it. Reset the + * changes_since_analyze counter only if we analyzed all columns; + * otherwise, there is still work for auto-analyze to do. */ The new comment says "partitions", which we typically use to refer to a child table, but this comment really talks about parent tables. Old comment says we don't report "inherited stats", presumably because stats collector lacks the infrastructure to distinguish a table's inherited stats and own stats, at least in the case of traditional inheritance. With this patch, we are making an exception for partitioned tables, because we are also teaching the stats collector to maintain at least changes_since_analyze for them that accumulates counts of changed tuples from partitions. It seems Alvaro already reported some of the other issues I had with the patch, such as why partanalyze messages are only sent from a autovacuum worker. Thanks, Amit
On Fri, Feb 28, 2020 at 11:25 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > /* > > + * If the relation is a partitioned table, we must add up children's > > + * reltuples. > > + */ > > + if (classForm->relkind == RELKIND_PARTITIONED_TABLE) > > + { > > + List *children; > > + ListCell *lc; > > + > > + reltuples = 0; > > + > > + /* Find all members of inheritance set taking AccessShareLock */ > > + children = find_all_inheritors(relid, AccessShareLock, NULL); > > + > > + foreach(lc, children) > > + { > > + Oid childOID = lfirst_oid(lc); > > + HeapTuple childtuple; > > + Form_pg_class childclass; > > + > > + /* Ignore the parent table */ > > + if (childOID == relid) > > + continue; > > I think this loop counts partitioned partitions multiple times, because > we add up reltuples for all levels, no? (If I'm wrong, that is, if > a partitioned rel does not have reltuples, then why skip the parent?) +1, no need to skip partitioned tables here a their reltuples is always 0. > > + /* > > + * If the table is a leaf partition, tell the stats collector its parent's > > + * changes_since_analyze for auto analyze Maybe write: For a leaf partition, add its current changes_since_analyze into its ancestors' counts. This must be done before sending the ANALYZE message as it resets the partition's changes_since_analyze counter. > > + */ > > + if (IsAutoVacuumWorkerProcess() && > > + rel->rd_rel->relispartition && > > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) > > I'm not sure I understand why we do this only on autovac. Why not all > analyzes? +1. If there is a reason, it should at least be documented in the comment above. > > + { > > + Oid parentoid; > > + Relation parentrel; > > + PgStat_StatDBEntry *dbentry; > > + PgStat_StatTabEntry *tabentry; > > + > > + /* Get its parent table's Oid and relation */ > > + parentoid = get_partition_parent(RelationGetRelid(rel)); > > + parentrel = table_open(parentoid, AccessShareLock); > > Climbing up the partitioning hierarchy acquiring locks on ancestor > relations opens up for deadlocks. It's better to avoid that. (As a > test, you could try what happens if you lock the topmost relation with > access-exclusive and leave a transaction open, then have autoanalyze > run). At the same time, I wonder if it's sensible to move one level up > here, and also have pgstat_report_partanalyze move more levels up. Maybe fetch all ancestors here and process from the top. But as we'd have locked the leaf partition long before we got here, maybe we should lock ancestors even before we start analyzing the leaf partition? AccessShareLock should be enough on the ancestors because we're not actually analyzing them. (It appears get_partition_ancestors() returns a list where the root parent is the last element, so need to be careful with that.) Thanks, Amit
Hello, Thank you for reviewing. > > > + */ > > > + if (IsAutoVacuumWorkerProcess() && > > > + rel->rd_rel->relispartition && > > > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) > > > > I'm not sure I understand why we do this only on autovac. Why not all > > analyzes? > > +1. If there is a reason, it should at least be documented in the > comment above. > When we analyze partitioned table by ANALYZE command, all inheritors including partitioned table are analyzed at the same time. In this case, if we call pgstat_report_partanalyze, partitioned table's changes_since_analyze is updated according to the number of analyzed tuples of partitions as follows. But I think it should be 0. \d+ p Partitioned table "public.p" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+---------+--------------+------------- i | integer | | | | plain | | Partition key: RANGE (i) Partitions: p_1 FOR VALUES FROM (0) TO (100), p_2 FOR VALUES FROM (100) TO (200) insert into p select * from generate_series(0,199); INSERT 0 200 (before analyze) -[ RECORD 1 ]-------+------------------ relname | p n_mod_since_analyze | 0 -[ RECORD 2 ]-------+------------------ relname | p_1 n_mod_since_analyze | 100 -[ RECORD 3 ]-------+------------------ relname | p_2 n_mod_since_analyze | 100 (after analyze) -[ RECORD 1 ]-------+------------------ relname | p n_mod_since_analyze | 200 -[ RECORD 2 ]-------+------------------ relname | p_1 n_mod_since_analyze | 0 -[ RECORD 3 ]-------+------------------ relname | p_2 n_mod_since_analyze | 0 I think if we analyze partition tree in order from leaf partitions to root table, this problem can be fixed. What do you think about it? -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Hello, > > > > + */ > > > > + if (IsAutoVacuumWorkerProcess() && > > > > + rel->rd_rel->relispartition && > > > > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)) > > > > > > I'm not sure I understand why we do this only on autovac. Why not all > > > analyzes? > > > > +1. If there is a reason, it should at least be documented in the > > comment above. > > > When we analyze partitioned table by ANALYZE command, > all inheritors including partitioned table are analyzed > at the same time. In this case, if we call pgstat_report_partanalyze, > partitioned table's changes_since_analyze is updated > according to the number of analyzed tuples of partitions > as follows. But I think it should be 0. > > \d+ p > Partitioned table "public.p" > Column | Type | Collation | Nullable | Default | Storage | Stats > target | Description > --------+---------+-----------+----------+---------+---------+--------------+------------- > i | integer | | | | plain | | > Partition key: RANGE (i) > Partitions: p_1 FOR VALUES FROM (0) TO (100), > p_2 FOR VALUES FROM (100) TO (200) > > insert into p select * from generate_series(0,199); > INSERT 0 200 > > (before analyze) > -[ RECORD 1 ]-------+------------------ > relname | p > n_mod_since_analyze | 0 > -[ RECORD 2 ]-------+------------------ > relname | p_1 > n_mod_since_analyze | 100 > -[ RECORD 3 ]-------+------------------ > relname | p_2 > n_mod_since_analyze | 100 > > (after analyze) > -[ RECORD 1 ]-------+------------------ > relname | p > n_mod_since_analyze | 200 > -[ RECORD 2 ]-------+------------------ > relname | p_1 > n_mod_since_analyze | 0 > -[ RECORD 3 ]-------+------------------ > relname | p_2 > n_mod_since_analyze | 0 > > > I think if we analyze partition tree in order from leaf partitions > to root table, this problem can be fixed. > What do you think about it? > Attach the new patch fixes the above problem. Also, This patch includes modifications accoring to all comments Alvaro and Amit mentioned before in this thread. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
On 2020-Mar-18, yuzuko wrote: > > I think if we analyze partition tree in order from leaf partitions > > to root table, this problem can be fixed. > > What do you think about it? > > Attach the new patch fixes the above problem. Thanks for the new version. I'm confused about some error messages in the regression test when a column is mentioned twice, that changed from mentioning the table named in the vacuum command, to mentioning the first partition. Is that because you changed an lappend() to lcons()? I think you do this so that the counters accumulate for the topmost parent that will be processed at the end. I'm not sure I like that too much ... I think that needs more thought. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Regarding this patch: + * the ANALYZE message as it resets the partition's changes_since_analze => analyze + * If the relation is a partitioned table, we must add up children's childrens' The approach in general: I see an issue for timeseries data, where only the most recent partition is being inserted into, and the histogram endpoint is being continuously extended (this is our use-case). The most recent partition will be analyzed pretty often, and I think it'll be problematic if its parent doesn't get similar treatment. Let's say there are 12 historic, monthly children with 1e6 tuples each, and the 13th child has 2e5 tuples (6 days into the month). It's analyzed when it grows by 20% (1.2 days), but at that point the parent has only grown by 12x less (~2%) and won't be analyzed until 12x further into the future (14 days). Its histogram is 12x longer (geometrically), but the histogram changed by just as much (arithmetically). That's an issue for a query over "the last few days"; if that's past the end of the histogram bound, the query planner will estimate about ~0 tuples, and tend to give cascades of nested loops. I'm biased, but I'm guessing that's too common a use case to answer that the proper fix is to set the parent's analyze_scale_factor=0.0005. I think that suggests that the parent might sometimes need to be analyzed every time any of its children are. In other cases (like probably any hash partitioning), that'd be excessive, and maybe the default settings shouldn't do that, but I think that behavior ought to be possible, and I think this patch doesn't allow that. In the past, I think there's was talk that maybe someone would invent a clever way to dynamically combine all the partitions' statistics, so analyzing the parent wasn't needed. I think that's easy enough for reltuples, MCV, and I think histogram, but ISTM that ndistinct is simultaneously important to get right and hard to do so. It depends on whether it's the partition key, which now can be an arbitrary expression. Extended stats further complicates it, even if we didn't aim to dynamically compute extended stats for a parent. While writing this, it occured to me that we could use "CREATE STATISTICS" as a way to mark a partitioned table (or certain columns) as needing to be handled by analyze. I understand "CREATE STATs" was intended to (eventually) allow implementing stats on expressions without using "create index" as a hack. So if it's excessive to automatically analyze a parent table when any of its children are analyzed, maybe it's less excessive to only do that for parents with a stats object, and only on the given colums. I realize this patch is alot less useful if it requires to do anything extra/nondefault, and it's desirable to work without creating a stats object at all. Also, using CREATE STATs would reduces the CPU cost of re-analyzing the entire heirarchy, but doesn't help to reduce the I/O cost, which is significant. -- Justin
Hi Alvaro, Thank you for your comments. > I'm confused about some error messages in the regression test when a > column is mentioned twice, that changed from mentioning the table named > in the vacuum command, to mentioning the first partition. Is that > because you changed an lappend() to lcons()? I think you do this so > that the counters accumulate for the topmost parent that will be > processed at the end. I'm not sure I like that too much ... I think > that needs more thought. > I couldn't come up with a solution that counts changes_since_analyze precisely when analyzing partitioned trees by ANALYZE command based on this approach (update all ancestor's changes_since_analyze according to the number of analyzed tuples of leaf partitions). So I tried another approach to run autovacuum on partitioned tables. In this approach, all ancestors' changed_tuples are updated when commiting transactions (at AtEOXact_PgStat) according to the number of inserted/updated/ deleted tuples of leaf partitions. Attach the latest patch. What do you think? -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
Not sure if you saw my earlier message ? I think it ought to be possible to configure this feature such that an auto-analyze on any child partition would trigger analyze of the parent. I think that would be important for maintaining accurate stats of the partition key column for many cases involving RANGE-partitioned tables, which are likely to rely on histogram rather than MCVs. On Wed, Mar 18, 2020 at 11:30:39AM -0500, Justin Pryzby wrote: > Regarding this patch: > > + * the ANALYZE message as it resets the partition's changes_since_analze > => analyze > > + * If the relation is a partitioned table, we must add up children's > childrens' > > The approach in general: > > I see an issue for timeseries data, where only the most recent partition is > being inserted into, and the histogram endpoint is being continuously extended > (this is our use-case). The most recent partition will be analyzed pretty > often, and I think it'll be problematic if its parent doesn't get similar > treatment. Let's say there are 12 historic, monthly children with 1e6 tuples > each, and the 13th child has 2e5 tuples (6 days into the month). It's analyzed > when it grows by 20% (1.2 days), but at that point the parent has only grown by > 12x less (~2%) and won't be analyzed until 12x further into the future (14 > days). Its histogram is 12x longer (geometrically), but the histogram changed > by just as much (arithmetically). That's an issue for a query over "the last > few days"; if that's past the end of the histogram bound, the query planner > will estimate about ~0 tuples, and tend to give cascades of nested loops. I'm > biased, but I'm guessing that's too common a use case to answer that the proper > fix is to set the parent's analyze_scale_factor=0.0005. I think that suggests > that the parent might sometimes need to be analyzed every time any of its > children are. In other cases (like probably any hash partitioning), that'd be > excessive, and maybe the default settings shouldn't do that, but I think that > behavior ought to be possible, and I think this patch doesn't allow that. > > In the past, I think there's was talk that maybe someone would invent a clever > way to dynamically combine all the partitions' statistics, so analyzing the > parent wasn't needed. I think that's easy enough for reltuples, MCV, and I > think histogram, but ISTM that ndistinct is simultaneously important to get > right and hard to do so. It depends on whether it's the partition key, which > now can be an arbitrary expression. Extended stats further complicates it, > even if we didn't aim to dynamically compute extended stats for a parent. > > While writing this, it occured to me that we could use "CREATE STATISTICS" as a > way to mark a partitioned table (or certain columns) as needing to be handled > by analyze. I understand "CREATE STATs" was intended to (eventually) allow > implementing stats on expressions without using "create index" as a hack. So > if it's excessive to automatically analyze a parent table when any of its > children are analyzed, maybe it's less excessive to only do that for parents > with a stats object, and only on the given colums. I realize this patch is > alot less useful if it requires to do anything extra/nondefault, and it's > desirable to work without creating a stats object at all. Also, using CREATE > STATs would reduces the CPU cost of re-analyzing the entire heirarchy, but > doesn't help to reduce the I/O cost, which is significant. > > -- > Justin -- Justin Pryzby System Administrator Telsasoft +1-952-707-8581
Hi Justin, Thank you for commens. On Tue, Apr 7, 2020 at 12:32 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Not sure if you saw my earlier message ? > I'm sorry, I didn't notice for a while. > I think it ought to be possible to configure this feature such that an > auto-analyze on any child partition would trigger analyze of the parent. I > think that would be important for maintaining accurate stats of the partition > key column for many cases involving RANGE-partitioned tables, which are likely > to rely on histogram rather than MCVs. > I read your previous email and understand that it would be neccesary to analyze partitioned tables automatically when any of its children are analyzed. In my first patch, auto-analyze on partitioned tables worked like this but there were some comments about performance of autovacuum, especially when partitioned tables have a lot of children. The latest patch lets users set different autovacuum configuration for each partitioned tables like this, create table p3(i int) partition by range(i) with (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100); so users can configure those parameters according to partitioning strategies and other requirements. So I think this patch can solve problem you mentioned. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote: > > I think it ought to be possible to configure this feature such that an > > auto-analyze on any child partition would trigger analyze of the parent. I > > think that would be important for maintaining accurate stats of the partition > > key column for many cases involving RANGE-partitioned tables, which are likely > > to rely on histogram rather than MCVs. > > I read your previous email and understand that it would be neccesary to analyze > partitioned tables automatically when any of its children are analyzed. In my > first patch, auto-analyze on partitioned tables worked like this but there were > some comments about performance of autovacuum, especially when partitioned > tables have a lot of children. I reread that part. There was also confusion between autovacuum vacuum and autovacuum analyze. I agree that it *might* be a problem to analyze the parent every time any child is analyzed. But it might also be what's needed for this feature to be useful. > The latest patch lets users set different autovacuum configuration for > each partitioned > tables like this, > create table p3(i int) partition by range(i) with > (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100); > so users can configure those parameters according to partitioning strategies > and other requirements. > > So I think this patch can solve problem you mentioned. I don't think that adequately allows what's needed. I think it out to be possible to get the "analyze parent whenever a child is analyzed" behavior easily, without having to compute new thershold parameters every time one adds partitions, detaches partitions, loades 10x more data into one of the partitions, load only 10% as much data into the latest partition, etc. For example, say a new customer has bunch of partitioned tables which each currently have only one partition (for the current month), and that's expected to grow to at least 20+ partitions (2+ years of history). How does one set the partitioned table's auto-analyze parameters to analyze whenever any child is analyzed ? I don't think it should be needed to update it every month after computing sum(child tuples). Possibly you could allow that behavior for some special values of the threshold. Like if autovacuum_analyze_threshold=-2, then analyze the parent whenever any of its children are analyzed. I think that use case and that need would be common, but I'd like to hear what others think. -- Justin
On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote: > > The latest patch lets users set different autovacuum configuration for > > each partitioned > > tables like this, > > create table p3(i int) partition by range(i) with > > (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100); > > so users can configure those parameters according to partitioning strategies > > and other requirements. > > > > So I think this patch can solve problem you mentioned. > > I don't think that adequately allows what's needed. > > I think it out to be possible to get the "analyze parent whenever a child is > analyzed" behavior easily, without having to compute new thershold parameters > every time one adds partitions, detaches partitions, loades 10x more data into > one of the partitions, load only 10% as much data into the latest partition, > etc. > > For example, say a new customer has bunch of partitioned tables which each > currently have only one partition (for the current month), and that's expected > to grow to at least 20+ partitions (2+ years of history). How does one set the > partitioned table's auto-analyze parameters to analyze whenever any child is > analyzed ? I don't think it should be needed to update it every month after > computing sum(child tuples). > > Possibly you could allow that behavior for some special values of the > threshold. Like if autovacuum_analyze_threshold=-2, then analyze the parent > whenever any of its children are analyzed. > > I think that use case and that need would be common, but I'd like to hear what > others think. Having to constantly pay attention to whether a parent's analyze_threshold/scale_factor is working as intended would surely be an annoyance, so I tend to agree that we might need more than just the ability to set analyze_threshold/scale_factor on parent tables. However, I think we can at least start with being able to do *something* here. :) Maybe others think that this shouldn't be considered committable until we figure out a good analyze threshold calculation formula to apply to parent tables. For the cases in which parent's tuple count grows at about the same rate as partitions (hash mainly), I guess the existing formula more or less works. That is, we can set the parent's threshold/scale_factor same as partitions' and the autovacuum's existing formula will ensure that the parent is auto-analyzed in time and not more than needed. For time-series partitioning, the same formula won't work, as you have detailed in your comments. Is there any other partitioning pattern for which the current formula won't work? Considering that, how about having, say, a autovacuum_analyze_partition_parent_frequency, with string values 'default', 'partition'? -- 'default' assumes the same formula as regular tables, whereas with 'partition', parent is analyzed as soon as a partition is. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote: > On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote: > > I don't think that adequately allows what's needed. ...(paragraph with my typos elided)... > > For example, say a new customer has bunch of partitioned tables which each > > currently have only one partition (for the current month), and that's expected > > to grow to at least 20+ partitions (2+ years of history). How does one set the > > partitioned table's auto-analyze parameters to analyze whenever any child is > > analyzed ? I don't think it should be needed to update it every month after > > computing sum(child tuples). > > > > Possibly you could allow that behavior for some special values of the > > threshold. Like if autovacuum_analyze_threshold=-2, then analyze the parent > > whenever any of its children are analyzed. > > > > I think that use case and that need would be common, but I'd like to hear what > > others think. > > Having to constantly pay attention to whether a parent's > analyze_threshold/scale_factor is working as intended would surely be > an annoyance, so I tend to agree that we might need more than just the > ability to set analyze_threshold/scale_factor on parent tables. > However, I think we can at least start with being able to do > *something* here. :) Maybe others think that this shouldn't be > considered committable until we figure out a good analyze threshold > calculation formula to apply to parent tables. > > Considering that, how about having, say, a > autovacuum_analyze_partition_parent_frequency, with string values > 'default', 'partition'? -- 'default' assumes the same formula as > regular tables, whereas with 'partition', parent is analyzed as soon > as a partition is. I assume you mean a reloption to be applied only to partitioned tables, Your "partition" setting would mean that the scale/threshold values would have no effect, which seems kind of unfortunate. I think it should be called something else, and done differently, like maybe: autovacuum_analyze_mode = {off,sum,max,...} The threshold would be threshold + scale*tuples, as always, but would be compared with f(changes) as determined by the relopt. sum(changes) would do what you called "default", comparing the sum(changes) across all partitions to the threshold, which is itself computed using sum(reltuples) AS reltuples. max(changes) would compute max(changes) compared to the threshold, and the threshold would be computed separately for each partition's reltuples: threshold_N = parent_threshold + parent_scale * part_N_tuples. If *any* partition exceeds that threshold, the partition itself is analyzed. This allows what I want for time-series. Maybe this would have an alias called "any". I'm not sure if there's any other useful modes, like avg(changes)? I guess we can add them later if someone thinks of a good use case. Also, for me, the v7 patch warns: |src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be used uninitialized in this function [-Wmaybe-uninitialized] | vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples; ..which seems to be a false positive, but easily avoided. This patch includes partitioned tables in pg_stat_*_tables, which is great; I complained awhile ago that they were missing [0]. It might be useful if that part was split out into a separate 0001 patch (?). Thanks, -- Justin [0] https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com
Hello, On Sat, Apr 18, 2020 at 2:08 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote: > > On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote: > > > I don't think that adequately allows what's needed. > ...(paragraph with my typos elided)... > > > For example, say a new customer has bunch of partitioned tables which each > > > currently have only one partition (for the current month), and that's expected > > > to grow to at least 20+ partitions (2+ years of history). How does one set the > > > partitioned table's auto-analyze parameters to analyze whenever any child is > > > analyzed ? I don't think it should be needed to update it every month after > > > computing sum(child tuples). > > > > > > Possibly you could allow that behavior for some special values of the > > > threshold. Like if autovacuum_analyze_threshold=-2, then analyze the parent > > > whenever any of its children are analyzed. > > > > > > I think that use case and that need would be common, but I'd like to hear what > > > others think. > > > > Having to constantly pay attention to whether a parent's > > analyze_threshold/scale_factor is working as intended would surely be > > an annoyance, so I tend to agree that we might need more than just the > > ability to set analyze_threshold/scale_factor on parent tables. > > However, I think we can at least start with being able to do > > *something* here. :) Maybe others think that this shouldn't be > > considered committable until we figure out a good analyze threshold > > calculation formula to apply to parent tables. > > > > Considering that, how about having, say, a > > autovacuum_analyze_partition_parent_frequency, with string values > > 'default', 'partition'? -- 'default' assumes the same formula as > > regular tables, whereas with 'partition', parent is analyzed as soon > > as a partition is. > > I assume you mean a reloption to be applied only to partitioned tables, > > Your "partition" setting would mean that the scale/threshold values would have > no effect, which seems kind of unfortunate. > > I think it should be called something else, and done differently, like maybe: > autovacuum_analyze_mode = {off,sum,max,...} > The above reloption you suggested will be applied all tables? Users might not use it for partitions, so I think we should add "parent" to reloption's name, like Amit's suggestion. > The threshold would be threshold + scale*tuples, as always, but would be > compared with f(changes) as determined by the relopt. > > sum(changes) would do what you called "default", comparing the sum(changes) > across all partitions to the threshold, which is itself computed using > sum(reltuples) AS reltuples. > > max(changes) would compute max(changes) compared to the threshold, and the > threshold would be computed separately for each partition's reltuples: > threshold_N = parent_threshold + parent_scale * part_N_tuples. If *any* > partition exceeds that threshold, the partition itself is analyzed. This > allows what I want for time-series. Maybe this would have an alias called > "any". > I may be wrong but I think the fomula, > threshold_N = parent_threshold + parent_scale * part_N_tuples would use orginary table's threshold, not parent's. If it use parent_threshold, parent might not be analyzed even if its any partition is analyzed when parent_threshold is larger than normal threshold. I'm worried that this case meets requirements for time-series. > I'm not sure if there's any other useful modes, like avg(changes)? I guess we > can add them later if someone thinks of a good use case. > > Also, for me, the v7 patch warns: > |src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be used uninitialized in this function [-Wmaybe-uninitialized] > | vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples; > ..which seems to be a false positive, but easily avoided. > Thank you for testing the patch. I got it. I'll update the patch soon. > > This patch includes partitioned tables in pg_stat_*_tables, which is great; I > complained awhile ago that they were missing [0]. It might be useful if that > part was split out into a separate 0001 patch (?). > If partitioned table's statistics is used for other purposes, I think it would be better to split the patch. Does anyone have any opinion? --- Best regards, Yuzuko Hosoya NTT Open Source Software Center
On Wed, Mar 18, 2020 at 11:30:39AM -0500, Justin Pryzby wrote: > In the past, I think there's was talk that maybe someone would invent a clever > way to dynamically combine all the partitions' statistics, so analyzing the > parent wasn't needed. [...] I happened across the thread I was referring to: https://www.postgresql.org/message-id/7363.1426537103@sss.pgh.pa.us I'm not opposed to doing things the currently-proposed way (trigger analyze of partitioned tables based on updates, same as nonpartitioned tables), but we should think if it's worth doing something totally different, like what Tom proposed. Robert had concerns that it would increase planning time. I imagine that argument is even stronger now, since PG12 has *less* planning time for large heirarchies (428b260f8) and advertizes support for "thousands" of partitions. Tom said: > we would automatically get statistics that account for > partitions being eliminated by constraint exclusion, because only the > non-eliminated partitions are present in the appendrel. And second, That's a pretty strong benefit. I don't know if there's a good way to support both(either) ways of doing things. Like maybe a reloption that allows triggering autovacuum on partitioned tables, but if no statistics exist on a partitioned table, then the planner would dynamically determine the selectivity by decending into child statistics (Tom's way). I think the usual way this would play out is that someone with a small partition heirarchies would eventually complain about high planning time and then we'd suggest implementing a manual ANALYZE job. I'm not sure it's good to support two ways anyway, since 1) I think that gives different (better) statistics Tom's way (due to excluding stats of excluded partitions); 2) there's not a good way to put an ANALYZE job in place and then get rid of parent stats (have to DELETE FROM pg_statistic WHERE starelid='...'::regclass; 3) if someone implements an ANALYZE job, but they disable it or it stops working then they have outdated stats forever; -- Justin
On Sat, Apr 25, 2020 at 11:13 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Wed, Mar 18, 2020 at 11:30:39AM -0500, Justin Pryzby wrote: > > In the past, I think there's was talk that maybe someone would invent a clever > > way to dynamically combine all the partitions' statistics, so analyzing the > > parent wasn't needed. [...] > > I happened across the thread I was referring to: > https://www.postgresql.org/message-id/7363.1426537103@sss.pgh.pa.us > > I'm not opposed to doing things the currently-proposed way (trigger analyze of > partitioned tables based on updates, same as nonpartitioned tables), but we > should think if it's worth doing something totally different, like what Tom > proposed. > > Robert had concerns that it would increase planning time. I imagine that > argument is even stronger now, since PG12 has *less* planning time for large > heirarchies (428b260f8) and advertizes support for "thousands" of partitions. > > Tom said: > > we would automatically get statistics that account for > > partitions being eliminated by constraint exclusion, because only the > > non-eliminated partitions are present in the appendrel. And second, > > That's a pretty strong benefit. I don't know if there's a good way to support > both(either) ways of doing things. Like maybe a reloption that allows > triggering autovacuum on partitioned tables, but if no statistics exist on a > partitioned table, then the planner would dynamically determine the selectivity > by decending into child statistics (Tom's way). I think the usual way this > would play out is that someone with a small partition heirarchies would > eventually complain about high planning time and then we'd suggest implementing > a manual ANALYZE job. > > I'm not sure it's good to support two ways anyway, since 1) I think that gives > different (better) statistics Tom's way (due to excluding stats of excluded > partitions); 2) there's not a good way to put an ANALYZE job in place and then > get rid of parent stats (have to DELETE FROM pg_statistic WHERE > starelid='...'::regclass; 3) if someone implements an ANALYZE job, but they > disable it or it stops working then they have outdated stats forever; Thanks for sharing that thread, had not seen it before. I remember discussing with Alvaro and Hosoya-san an approach of generating the whole-tree pg_statistics entries by combining the children's entries, not during planning as the linked thread discusses, but inside autovacuum. The motivation for that design was the complaint that we scan the children twice with the current method of generating whole-tree statistics -- first to generate their own statistics and then again to generate the parent's. Aside from how hard it would be to actually implement, that approach also doesn't address the concern about when to generate the whole-tree statistics. Because the linked thread mentions getting rid of the whole-tree statistics altogether, there is no such concern if we go its way. Although I do agree with Robert's assertion on that thread that making every query on a parent a bit slower would not be a good compromise. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
> On 21 Apr 2020, at 18:21, yuzuko <yuzukohosoya@gmail.com> wrote: > I'll update the patch soon. Do you have an updated version to submit? The previous patch no longer applies to HEAD, so I'm marking this entry Waiting on Author in the meantime. cheers ./daniel
> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 21 Apr 2020, at 18:21, yuzuko <yuzukohosoya@gmail.com> wrote: > > > I'll update the patch soon. > > Do you have an updated version to submit? The previous patch no longer applies > to HEAD, so I'm marking this entry Waiting on Author in the meantime. > Thank you for letting me know. I attach the latest patch applies to HEAD. I think there are other approaches like Tom's idea that Justin previously referenced, but this patch works the same way as previous patches. (tracks updated/inserted/deleted tuples and checks whether the partitioned tables needs auto-analyze, same as nonpartitioned tables) Because I wanted to be able to analyze partitioned tables by autovacuum as a first step, and I think this approach is the simplest way to do it. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
> On 6 Jul 2020, at 12:35, yuzuko <yuzukohosoya@gmail.com> wrote: > >> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson <daniel@yesql.se> wrote: >> >>> On 21 Apr 2020, at 18:21, yuzuko <yuzukohosoya@gmail.com> wrote: >> >>> I'll update the patch soon. >> >> Do you have an updated version to submit? The previous patch no longer applies >> to HEAD, so I'm marking this entry Waiting on Author in the meantime. >> > Thank you for letting me know. > I attach the latest patch applies to HEAD. This version seems to fail under Werror which is used in the Travis builds: autovacuum.c: In function ‘relation_needs_vacanalyze’: autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized] anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples; ^ autovacuum.c:2972:9: note: ‘reltuples’ was declared here float4 reltuples; /* pg_class.reltuples */ ^ I've moved this patch to the next commitfest, but kept the status as Waiting on Author. Please submit a new version of the patch. cheers ./daniel
I'm sorry for the late reply. > This version seems to fail under Werror which is used in the Travis builds: > > autovacuum.c: In function ‘relation_needs_vacanalyze’: > autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized] > anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples; > ^ > autovacuum.c:2972:9: note: ‘reltuples’ was declared here > float4 reltuples; /* pg_class.reltuples */ > ^ > I attach the latest patch that solves the above Werror. Could you please check it again? -- Best regards, Yuzuko Hosoya NTT Open Source Software Center
Attachment
> On 17 Aug 2020, at 08:11, yuzuko <yuzukohosoya@gmail.com> wrote: > > I'm sorry for the late reply. > >> This version seems to fail under Werror which is used in the Travis builds: >> >> autovacuum.c: In function ‘relation_needs_vacanalyze’: >> autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized] >> anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples; >> ^ >> autovacuum.c:2972:9: note: ‘reltuples’ was declared here >> float4 reltuples; /* pg_class.reltuples */ >> ^ > > I attach the latest patch that solves the above Werror. > Could you please check it again? This version now pass the tests in the Travis pipeline as can be seen in the link below, and is ready to be reviewed in the upcoming commitfest: http://cfbot.cputube.org/yuzuko-hosoya.html cheers ./daniel