Thread: BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

BUG #17942: vacuumdb doesn't populate extended statistics on partitioned tables

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17942
Logged by:          Henri Chapelle
Email address:      henri.chapelle@dbandmore.com
PostgreSQL version: 13.8
Operating system:   Centos 7
Description:

Hi,
We recently added extended statistics on big partitioned tables in
Postgresql 13.8.
While runing analyze tablename or vacuumdb -Z -t tablename populate extended
statistics, the vacuumdb -Z command (without -t) doesn't process the parent
table during the analyze.
We created a script to collecte table names and run the analyze table by
table.
Is it expected ?
The documentation mentions that autovacuum will not process the parent table
due to no data in the parent table.
But vacuumdb should process all tables, right ?

Here is an example :
CREATE TABLE people (
    id int not null,
    birth_date date not null,
    country_code character(2) not null,
    name text
) PARTITION BY RANGE (birth_date);

CREATE TABLE people_y2000 PARTITION OF people
    FOR VALUES FROM ('2000-01-01') TO ('2001-01-01');

CREATE TABLE people_y2001 PARTITION OF people
    FOR VALUES FROM ('2001-01-01') TO ('2002-01-01');

CREATE TABLE people_y2002 PARTITION OF people
    FOR VALUES FROM ('2002-01-01') TO ('2003-01-01');


INSERT INTO people (id, birth_date, country_code, name) VALUES (1,
'2000-01-01', 'US', 'John'), (2, '2000-02-02', 'IT', 'Jane'), (3,
'2001-03-03', 'FR', 'Bob');

create statistics people_stat_001 on birth_date, name from people;

vacuumdb -Z -t people : -> ok extende stats collected
vacuumdb -Z : -> no extended stats and analyze much faster


At Wed, 24 May 2023 06:41:23 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in 
> vacuumdb -Z -t people : -> ok extende stats collected
> vacuumdb -Z : -> no extended stats and analyze much faster

ANALYZE processes inhertance parents. By a quick search, I found that commit 3c3bb99330 modified get_rel_oids (which
hasnow been relocated to get_all_vacuum_rels) to include RELKIND_PARTITIONED_TABLE. However, it seems like it
overlookedupdating the same in vacuumdb.c. This dates back to 10.
 

diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4b17a07089..1369c37504 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -686,7 +686,8 @@ vacuum_one_database(ConnParams *cparams,
     {
         appendPQExpBufferStr(&catalog_query, " WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array["
                              CppAsString2(RELKIND_RELATION) ", "
-                             CppAsString2(RELKIND_MATVIEW) "])\n");
+                             CppAsString2(RELKIND_MATVIEW) ", "
+                             CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
         has_where = true;
     }

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Thu, May 25, 2023 at 01:07:59PM +0900, Kyotaro Horiguchi wrote:
> ANALYZE processes inhertance parents. By a quick search, I found
> that commit 3c3bb99330 modified get_rel_oids (which has now been
> relocated to get_all_vacuum_rels) to include
> RELKIND_PARTITIONED_TABLE. However, it seems like it overlooked
> updating the same in vacuumdb.c. This dates back to 10.

You cannot do that, because this would make a single run of vacuumdb
process more than once each partition.  Am I missing something?
--
Michael

Attachment
At Thu, 25 May 2023 16:45:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Thu, May 25, 2023 at 01:07:59PM +0900, Kyotaro Horiguchi wrote:
> > ANALYZE processes inhertance parents. By a quick search, I found
> > that commit 3c3bb99330 modified get_rel_oids (which has now been
> > relocated to get_all_vacuum_rels) to include
> > RELKIND_PARTITIONED_TABLE. However, it seems like it overlooked
> > updating the same in vacuumdb.c. This dates back to 10.
> 
> You cannot do that, because this would make a single run of vacuumdb
> process more than once each partition.  Am I missing something?

It seems to be exactly the same as ANALYZE, though. I'm a bit unclear
about our perspective on this SQL command's behavior.

We had a patch proposed to prevent such duplication, but it was
withdrawn for a reason that's I don't remember clearly.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Fri, May 26, 2023 at 09:15:36AM +0900, Kyotaro Horiguchi wrote:
> It seems to be exactly the same as ANALYZE, though. I'm a bit unclear
> about our perspective on this SQL command's behavior.

Sorry for being a bit unclear here.  When dealing with partitioned
tables, a database-wide ANALYZE processes the partitions individually
as well as a full partition/inheritance tree.

My point is slightly different though: your suggestion of adding
RELKIND_PARTITIONED_TABLE to the filter added in vacuumdb would work
for -Z, but it would cause the vacuum code path of vacuumdb to process
more than once all the partitions in a single run.  For instance, take
this schema:
CREATE TABLE parent_list (id int) PARTITION BY LIST (id);
                                                   
CREATE TABLE child_list PARTITION OF parent_list
  FOR VALUES IN (1, 2);
                                                 

`vacuumdb` would now list both parent_list and child_list, making
child_list being vacuumed twice, which is not necessary.  In order to
get a behavior in parity with the SQL commands ANALYZE, VACUUM and
VACUUM ANALYZE, we need to be more careful about the addition of
RELKIND_PARTITIONED_TABLE to the filtering clause.
--
Michael

Attachment
At Fri, 26 May 2023 14:48:30 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Fri, May 26, 2023 at 09:15:36AM +0900, Kyotaro Horiguchi wrote:
> > It seems to be exactly the same as ANALYZE, though. I'm a bit unclear
> > about our perspective on this SQL command's behavior.
> 
> Sorry for being a bit unclear here.  When dealing with partitioned
> tables, a database-wide ANALYZE processes the partitions individually
> as well as a full partition/inheritance tree.
> 
> My point is slightly different though: your suggestion of adding
> RELKIND_PARTITIONED_TABLE to the filter added in vacuumdb would work
> for -Z, but it would cause the vacuum code path of vacuumdb to process
> more than once all the partitions in a single run.  For instance, take
> this schema: 
..
> `vacuumdb` would now list both parent_list and child_list, making
> child_list being vacuumed twice, which is not necessary.  In order to
> get a behavior in parity with the SQL commands ANALYZE, VACUUM and
> VACUUM ANALYZE, we need to be more careful about the addition of
> RELKIND_PARTITIONED_TABLE to the filtering clause.

Ah, thanks. The difference lies in how VACUUM and vacuumdb handle
table names. VACUUM collects all names automatically, while the
vacuumdb specifies individual table names. The difference in handling
table names seems to be due to vacuumdb's certain options that need to
be checked against each table at the client side, specifically
--min-xid-age and min-mxid-age.

It might be nice if we included these options in VACUUM/ANALYZE's
syntax. Then vacuumdb wouldn't have to explicity gather table names.

On the other hand, regarding the existing versions. It would make
sense to allow partitioned tables only for analyze-only
cases. However, in the case of vacuum-analyze, we rather need to
exclude children. Therefore, it might be a better approach to always
exlclude children from the target relation list only if the parent is
present in the list. Anyway, I don't find a simple way to do that for
now.

So, the simplest measure for the issue would be to add the description
about the restriction to the documentation..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



At Fri, 26 May 2023 16:49:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> present in the list. Anyway, I don't find a simple way to do that for
> now.

For 12 or later, pg_partition_ancestors() is available. Thus something
like the following query would work.

+WITH inh_children AS
+ (SELECT tc.relid::name, t.nspname FROM
+  (SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
+   JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid
+   LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid
+   WHERE c.relkind OPERATOR(pg_catalog.=) 'p') t,
+  LATERAL
+   (SELECT tt.relid, t.nspname
+    FROM pg_catalog.pg_partition_tree(t.relname::text) tt WHERE isleaf) tc)

SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
 JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid 
 LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid 
 WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm', 'p'])
+  AND NOT EXISTS (SELECT * FROM inh_children i WHERE i.relid = c.relname AND i.nspname = ns.nspname)

ORDER BY c.relpages DESC;

The lines prefixed by a '+' removes other than the topmost-parents of trees.

For 11, we need to do the same withouth using the function.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Fri, May 26, 2023 at 06:02:07PM +0900, Kyotaro Horiguchi wrote:
> For 12 or later, pg_partition_ancestors() is available. Thus something
> like the following query would work.

Yes, we will need to be a bit more imaginative for older servers :(

Any pre-11 logic added had better mention the compatibility tweak it
required, and that pg_partition_ancestors() could be used to simplify
the filtering a lot for the analyze case.  It would be good to add
some test cases to check all the patterns we would need to look after,
for both the VACUUM and ANALYZE cases.  Perhaps with vacuumdb
--verbose and a pattern match check of the logs produced, with at
least 2 level of partitioning to check after the case of ANALYZE run
across a rather complex partitioning tree?
--
Michael

Attachment