Thread: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17942 Logged by: Henri Chapelle Email address: henri.chapelle@dbandmore.com PostgreSQL version: 13.8 Operating system: Centos 7 Description: Hi, We recently added extended statistics on big partitioned tables in Postgresql 13.8. While runing analyze tablename or vacuumdb -Z -t tablename populate extended statistics, the vacuumdb -Z command (without -t) doesn't process the parent table during the analyze. We created a script to collecte table names and run the analyze table by table. Is it expected ? The documentation mentions that autovacuum will not process the parent table due to no data in the parent table. But vacuumdb should process all tables, right ? Here is an example : CREATE TABLE people ( id int not null, birth_date date not null, country_code character(2) not null, name text ) PARTITION BY RANGE (birth_date); CREATE TABLE people_y2000 PARTITION OF people FOR VALUES FROM ('2000-01-01') TO ('2001-01-01'); CREATE TABLE people_y2001 PARTITION OF people FOR VALUES FROM ('2001-01-01') TO ('2002-01-01'); CREATE TABLE people_y2002 PARTITION OF people FOR VALUES FROM ('2002-01-01') TO ('2003-01-01'); INSERT INTO people (id, birth_date, country_code, name) VALUES (1, '2000-01-01', 'US', 'John'), (2, '2000-02-02', 'IT', 'Jane'), (3, '2001-03-03', 'FR', 'Bob'); create statistics people_stat_001 on birth_date, name from people; vacuumdb -Z -t people : -> ok extende stats collected vacuumdb -Z : -> no extended stats and analyze much faster
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
Kyotaro Horiguchi
Date:
At Wed, 24 May 2023 06:41:23 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in > vacuumdb -Z -t people : -> ok extende stats collected > vacuumdb -Z : -> no extended stats and analyze much faster ANALYZE processes inhertance parents. By a quick search, I found that commit 3c3bb99330 modified get_rel_oids (which hasnow been relocated to get_all_vacuum_rels) to include RELKIND_PARTITIONED_TABLE. However, it seems like it overlookedupdating the same in vacuumdb.c. This dates back to 10. diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index 4b17a07089..1369c37504 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -686,7 +686,8 @@ vacuum_one_database(ConnParams *cparams, { appendPQExpBufferStr(&catalog_query, " WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array[" CppAsString2(RELKIND_RELATION) ", " - CppAsString2(RELKIND_MATVIEW) "])\n"); + CppAsString2(RELKIND_MATVIEW) ", " + CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n"); has_where = true; } regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
Michael Paquier
Date:
On Thu, May 25, 2023 at 01:07:59PM +0900, Kyotaro Horiguchi wrote: > ANALYZE processes inhertance parents. By a quick search, I found > that commit 3c3bb99330 modified get_rel_oids (which has now been > relocated to get_all_vacuum_rels) to include > RELKIND_PARTITIONED_TABLE. However, it seems like it overlooked > updating the same in vacuumdb.c. This dates back to 10. You cannot do that, because this would make a single run of vacuumdb process more than once each partition. Am I missing something? -- Michael
Attachment
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
Kyotaro Horiguchi
Date:
At Thu, 25 May 2023 16:45:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Thu, May 25, 2023 at 01:07:59PM +0900, Kyotaro Horiguchi wrote: > > ANALYZE processes inhertance parents. By a quick search, I found > > that commit 3c3bb99330 modified get_rel_oids (which has now been > > relocated to get_all_vacuum_rels) to include > > RELKIND_PARTITIONED_TABLE. However, it seems like it overlooked > > updating the same in vacuumdb.c. This dates back to 10. > > You cannot do that, because this would make a single run of vacuumdb > process more than once each partition. Am I missing something? It seems to be exactly the same as ANALYZE, though. I'm a bit unclear about our perspective on this SQL command's behavior. We had a patch proposed to prevent such duplication, but it was withdrawn for a reason that's I don't remember clearly. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
Michael Paquier
Date:
On Fri, May 26, 2023 at 09:15:36AM +0900, Kyotaro Horiguchi wrote: > It seems to be exactly the same as ANALYZE, though. I'm a bit unclear > about our perspective on this SQL command's behavior. Sorry for being a bit unclear here. When dealing with partitioned tables, a database-wide ANALYZE processes the partitions individually as well as a full partition/inheritance tree. My point is slightly different though: your suggestion of adding RELKIND_PARTITIONED_TABLE to the filter added in vacuumdb would work for -Z, but it would cause the vacuum code path of vacuumdb to process more than once all the partitions in a single run. For instance, take this schema: CREATE TABLE parent_list (id int) PARTITION BY LIST (id); CREATE TABLE child_list PARTITION OF parent_list FOR VALUES IN (1, 2); `vacuumdb` would now list both parent_list and child_list, making child_list being vacuumed twice, which is not necessary. In order to get a behavior in parity with the SQL commands ANALYZE, VACUUM and VACUUM ANALYZE, we need to be more careful about the addition of RELKIND_PARTITIONED_TABLE to the filtering clause. -- Michael
Attachment
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
Kyotaro Horiguchi
Date:
At Fri, 26 May 2023 14:48:30 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Fri, May 26, 2023 at 09:15:36AM +0900, Kyotaro Horiguchi wrote: > > It seems to be exactly the same as ANALYZE, though. I'm a bit unclear > > about our perspective on this SQL command's behavior. > > Sorry for being a bit unclear here. When dealing with partitioned > tables, a database-wide ANALYZE processes the partitions individually > as well as a full partition/inheritance tree. > > My point is slightly different though: your suggestion of adding > RELKIND_PARTITIONED_TABLE to the filter added in vacuumdb would work > for -Z, but it would cause the vacuum code path of vacuumdb to process > more than once all the partitions in a single run. For instance, take > this schema: .. > `vacuumdb` would now list both parent_list and child_list, making > child_list being vacuumed twice, which is not necessary. In order to > get a behavior in parity with the SQL commands ANALYZE, VACUUM and > VACUUM ANALYZE, we need to be more careful about the addition of > RELKIND_PARTITIONED_TABLE to the filtering clause. Ah, thanks. The difference lies in how VACUUM and vacuumdb handle table names. VACUUM collects all names automatically, while the vacuumdb specifies individual table names. The difference in handling table names seems to be due to vacuumdb's certain options that need to be checked against each table at the client side, specifically --min-xid-age and min-mxid-age. It might be nice if we included these options in VACUUM/ANALYZE's syntax. Then vacuumdb wouldn't have to explicity gather table names. On the other hand, regarding the existing versions. It would make sense to allow partitioned tables only for analyze-only cases. However, in the case of vacuum-analyze, we rather need to exclude children. Therefore, it might be a better approach to always exlclude children from the target relation list only if the parent is present in the list. Anyway, I don't find a simple way to do that for now. So, the simplest measure for the issue would be to add the description about the restriction to the documentation.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
Kyotaro Horiguchi
Date:
At Fri, 26 May 2023 16:49:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > present in the list. Anyway, I don't find a simple way to do that for > now. For 12 or later, pg_partition_ancestors() is available. Thus something like the following query would work. +WITH inh_children AS + (SELECT tc.relid::name, t.nspname FROM + (SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c + JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid + LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid + WHERE c.relkind OPERATOR(pg_catalog.=) 'p') t, + LATERAL + (SELECT tt.relid, t.nspname + FROM pg_catalog.pg_partition_tree(t.relname::text) tt WHERE isleaf) tc) SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm', 'p']) + AND NOT EXISTS (SELECT * FROM inh_children i WHERE i.relid = c.relname AND i.nspname = ns.nspname) ORDER BY c.relpages DESC; The lines prefixed by a '+' removes other than the topmost-parents of trees. For 11, we need to do the same withouth using the function. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables
From
Michael Paquier
Date:
On Fri, May 26, 2023 at 06:02:07PM +0900, Kyotaro Horiguchi wrote: > For 12 or later, pg_partition_ancestors() is available. Thus something > like the following query would work. Yes, we will need to be a bit more imaginative for older servers :( Any pre-11 logic added had better mention the compatibility tweak it required, and that pg_partition_ancestors() could be used to simplify the filtering a lot for the analyze case. It would be good to add some test cases to check all the patterns we would need to look after, for both the VACUUM and ANALYZE cases. Perhaps with vacuumdb --verbose and a pattern match check of the logs produced, with at least 2 level of partitioning to check after the case of ANALYZE run across a rather complex partitioning tree? -- Michael