Thread: analyze-in-stages post upgrade questions

analyze-in-stages post upgrade questions

From
"Zechman, Derek S"
Date:

 

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and performed the analyze-in-stages post upgrade.  It has been noticed that some plans changed to use hash joins instead of nested loops.  Further investigation found it was because the parent table of partitioned tables did not have stats.  After running an ANALYZE on the parent tables we got similar plan an execution times as before.

 

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables? 

2 – What happens if we do not run analyze-in-stages post upgrade and just run an analyze?

 

Thanks,

Sean

Re: analyze-in-stages post upgrade questions

From
Laurenz Albe
Date:
On Fri, 2025-06-27 at 08:31 -0700, Adrian Klaver wrote:
> On 6/27/25 06:35, Zechman, Derek S wrote:
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
> > performed the analyze-in-stages post upgrade.  It has been noticed that
> > some plans changed to use hash joins instead of nested loops.  Further
> > investigation found it was because the parent table of partitioned
> > tables did not have stats.  After running an ANALYZE on the parent
> > tables we got similar plan an execution times as before.
> >
> > I have two questions
> >
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> >
> > 2 – What happens if we do not run analyze-in-stages post upgrade and
> > just run an analyze?
>
> It is spelled out in the docs:
>
> https://www.postgresql.org/docs/current/pgupgrade.html
>
> Emphasis added
>
> "Using vacuumdb --all --analyze-only can efficiently generate such
> statistics, and the use of --jobs can speed it up. Option
> --analyze-in-stages can be used to generate **minimal statistics**
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be
> overridden to speed up statistics generation using PGOPTIONS, e.g.,
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
>
> and from here:
>
> https://www.postgresql.org/docs/current/app-vacuumdb.html
>
> "--analyze-in-stages
>
>      Only calculate statistics for use by the optimizer (no vacuum),
> like --analyze-only. Run three stages of analyze; the first stage uses
> the lowest possible statistics target (see default_statistics_target) to
> produce usable statistics faster, and subsequent stages build the full
> statistics.
>
>      This option is only useful to analyze a database that currently has
> no statistics or has wholly incorrect ones, such as if it is newly
> populated from a restored dump or by pg_upgrade. Be aware that running
> with this option in a database with existing statistics may cause the
> query optimizer choices to become transiently worse due to the low
> statistics targets of the early stages.

Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.

Derek, can cou share the pg_stats entries for the partitioned table?

Yours,
Laurenz Albe



RE: analyze-in-stages post upgrade questions

From
"Zechman, Derek S"
Date:

 

 

 

We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and performed the analyze-in-stages post upgrade.  It has been noticed that some plans changed to use hash joins instead of nested loops.  Further investigation found it was because the parent table of partitioned tables did not have stats.  After running an ANALYZE on the parent tables we got similar plan an execution times as before.

 

I have two questions

1 - Why does analyze-in-stages not analyze the parent tables? 

2 – What happens if we do not run analyze-in-stages post upgrade and just run an analyze?

 

“It takes more time, and you don't have any statistics on a given table until the ANALYZE on that table completes.

 

How long did "vacuumdb --analyze-only --jobs=$mumble your_db" take?”


Thanks – that makes sense.  I understand what analyze in stages does just wish it would include parent tables.

 

"vacuumdb --all --analyze-only --jobs=7" took about 75 minutes where the analyze-in-stages after upgrade took 115 minutes.  Neither of these activities analyzed the parent tables. 

Reading more and it seems vacuumdb doesn’t analyze parent tables and a manual analyze on those is needed if we want better planner statistics.

RE: analyze-in-stages post upgrade questions

From
"Zechman, Derek S"
Date:
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and 
> > performed the analyze-in-stages post upgrade.  It has been noticed that 
> > some plans changed to use hash joins instead of nested loops.  Further 
> > investigation found it was because the parent table of partitioned 
> > tables did not have stats.  After running an ANALYZE on the parent 
> > tables we got similar plan an execution times as before.
> > 
> > I have two questions
> > 
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> > 
> > 2 – What happens if we do not run analyze-in-stages post upgrade and 
> > just run an analyze?
> 
> It is spelled out in the docs:
> 
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/pgupgrade.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvnFFR6Ws$
> 
> Emphasis added
> 
> "Using vacuumdb --all --analyze-only can efficiently generate such 
> statistics, and the use of --jobs can speed it up. Option 
> --analyze-in-stages can be used to generate **minimal statistics** 
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be 
> overridden to speed up statistics generation using PGOPTIONS, e.g., 
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
> 
> and from here:
> 
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/app-vacuumdb.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvvC7gfd0$
> 
> "--analyze-in-stages
> 
>      Only calculate statistics for use by the optimizer (no vacuum), 
> like --analyze-only. Run three stages of analyze; the first stage uses 
> the lowest possible statistics target (see default_statistics_target) to 
> produce usable statistics faster, and subsequent stages build the full 
> statistics.
> 
>      This option is only useful to analyze a database that currently has 
> no statistics or has wholly incorrect ones, such as if it is newly 
> populated from a restored dump or by pg_upgrade. Be aware that running 
> with this option in a database with existing statistics may cause the 
> query optimizer choices to become transiently worse due to the low 
> statistics targets of the early stages.
 
Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.
 
Derek, can cou share the pg_stats entries for the partitioned table?
 
Yours,
Laurenz Albe
 
 
There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below
 
=> select relname, reltuples, relkind from pg_class where relname ~ '^chapter_[0-9]+$' or relname='chapter' order by 1;
   relname   | reltuples | relkind
-------------+-----------+---------
 chapter     |        -1 | p
 chapter_1   |         4 | r
 chapter_10  |         4 | r
 chapter_100 |        30 | r
 chapter_101 |        15 | r
 chapter_102 |        15 | r
=> select count(*) from pg_stats where tablename='chapter';
 count
-------
     0
(1 row)
 
=> analyze chapter;
ANALYZE
=> select relname, reltuples, relkind from pg_class where relkind ='p' and relname='chapter';
 relname | reltuples | relkind
---------+-----------+---------
 chapter |      7589 | p
(1 row)
 
=> select count(*) from pg_stats where tablename='chapter';
 count
-------
    49
(1 row)
 
toy_epc_stg_1_db=>

Re: analyze-in-stages post upgrade questions

From
Laurenz Albe
Date:
On Sat, 2025-06-28 at 01:23 +0000, Zechman, Derek S wrote:
> > Well, that wouldn't explain why it doesn't work on partitioned tables.
> > I am under the impression that it should.
> >
> > Derek, can cou share the pg_stats entries for the partitioned table?
>
> There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below

You are right.  I looked at the code, and "vacuumdb" does not process
partitiond tables, even if --analyze-only is specified.  I find that
surprising, aince the SQL command ANALYZE (without a table name) will
also collect statistics for partitioned tables.

I think that it would be a good idea to change that behavior.
In particular, it makes a lot of sense to collect statistics for
partitioned tables after a "pg_upgrade".

Attached is a patch to make "vacuumdb --analyze-only" consider
partitioned tables as well.

Yours,
Laurenz Albe

Attachment