RE: analyze-in-stages post upgrade questions - Mailing list pgsql-general

From Zechman, Derek S
Subject RE: analyze-in-stages post upgrade questions
Date
Msg-id PH0PR04MB82943D67CD179002714AA503C044A@PH0PR04MB8294.namprd04.prod.outlook.com
Whole thread Raw
In response to Re: analyze-in-stages post upgrade questions  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: analyze-in-stages post upgrade questions
List pgsql-general
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and 
> > performed the analyze-in-stages post upgrade.  It has been noticed that 
> > some plans changed to use hash joins instead of nested loops.  Further 
> > investigation found it was because the parent table of partitioned 
> > tables did not have stats.  After running an ANALYZE on the parent 
> > tables we got similar plan an execution times as before.
> > 
> > I have two questions
> > 
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> > 
> > 2 – What happens if we do not run analyze-in-stages post upgrade and 
> > just run an analyze?
> 
> It is spelled out in the docs:
> 
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/pgupgrade.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvnFFR6Ws$
> 
> Emphasis added
> 
> "Using vacuumdb --all --analyze-only can efficiently generate such 
> statistics, and the use of --jobs can speed it up. Option 
> --analyze-in-stages can be used to generate **minimal statistics** 
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be 
> overridden to speed up statistics generation using PGOPTIONS, e.g., 
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
> 
> and from here:
> 
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/app-vacuumdb.html__;!!Lf_9VycLqA!jU5QgA1re0Txg_h2dD1N3XvK_l8hYdyMvpcxrLL5GDyQ5qN4aGQcxmDE8qmaV_p5telTzYmOL6S3fR8eRc0s_8UvvC7gfd0$
> 
> "--analyze-in-stages
> 
>      Only calculate statistics for use by the optimizer (no vacuum), 
> like --analyze-only. Run three stages of analyze; the first stage uses 
> the lowest possible statistics target (see default_statistics_target) to 
> produce usable statistics faster, and subsequent stages build the full 
> statistics.
> 
>      This option is only useful to analyze a database that currently has 
> no statistics or has wholly incorrect ones, such as if it is newly 
> populated from a restored dump or by pg_upgrade. Be aware that running 
> with this option in a database with existing statistics may cause the 
> query optimizer choices to become transiently worse due to the low 
> statistics targets of the early stages.
 
Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.
 
Derek, can cou share the pg_stats entries for the partitioned table?
 
Yours,
Laurenz Albe
 
 
There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below
 
=> select relname, reltuples, relkind from pg_class where relname ~ '^chapter_[0-9]+$' or relname='chapter' order by 1;
   relname   | reltuples | relkind
-------------+-----------+---------
 chapter     |        -1 | p
 chapter_1   |         4 | r
 chapter_10  |         4 | r
 chapter_100 |        30 | r
 chapter_101 |        15 | r
 chapter_102 |        15 | r
=> select count(*) from pg_stats where tablename='chapter';
 count
-------
     0
(1 row)
 
=> analyze chapter;
ANALYZE
=> select relname, reltuples, relkind from pg_class where relkind ='p' and relname='chapter';
 relname | reltuples | relkind
---------+-----------+---------
 chapter |      7589 | p
(1 row)
 
=> select count(*) from pg_stats where tablename='chapter';
 count
-------
    49
(1 row)
 
toy_epc_stg_1_db=>

pgsql-general by date:

Previous
From: "Zechman, Derek S"
Date:
Subject: RE: analyze-in-stages post upgrade questions
Next
From: Laurenz Albe
Date:
Subject: Re: analyze-in-stages post upgrade questions