Thread: pgsql: Transfer statistics during pg_upgrade.

pgsql: Transfer statistics during pg_upgrade.

From
Jeff Davis
Date:
Transfer statistics during pg_upgrade.

Add support to pg_dump for dumping stats, and use that during
pg_upgrade so that statistics are transferred during upgrade. In most
cases this removes the need for a costly re-analyze after upgrade.

Some statistics are not transferred, such as extended statistics or
statistics with a custom stakind.

Now pg_dump accepts the options --schema-only, --no-schema,
--data-only, --no-data, --statistics-only, and --no-statistics; which
allow all combinations of schema, data, and/or stats. The options are
named this way to preserve compatibility with the previous
--schema-only and --data-only options.

Statistics are in SECTION_DATA, unless the object itself is in
SECTION_POST_DATA.

The stats are represented as calls to pg_restore_relation_stats() and
pg_restore_attribute_stats().

Author: Corey Huinker, Jeff Davis
Reviewed-by: Jian He
Discussion: https://postgr.es/m/CADkLM=fzX7QX6r78fShWDjNN3Vcr4PVAnvXxQ4DiGy6V=0bCUA@mail.gmail.com
Discussion: https://postgr.es/m/CADkLM%3DcB0rF3p_FuWRTMSV0983ihTRpsH%2BOCpNyiqE7Wk0vUWA%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1fd1bd871012732e3c6c482667d2f2c56f1a9395

Modified Files
--------------
doc/src/sgml/ref/pg_dump.sgml                  |  75 ++++-
doc/src/sgml/ref/pg_dumpall.sgml               |  41 ++-
doc/src/sgml/ref/pg_restore.sgml               |  48 ++-
doc/src/sgml/ref/pgupgrade.sgml                |   9 +
src/bin/pg_dump/pg_backup.h                    |   2 +
src/bin/pg_dump/pg_backup_archiver.c           |  83 +++--
src/bin/pg_dump/pg_backup_archiver.h           |   3 +-
src/bin/pg_dump/pg_backup_directory.c          |   2 +-
src/bin/pg_dump/pg_dump.c                      | 415 ++++++++++++++++++++++++-
src/bin/pg_dump/pg_dump.h                      |  12 +
src/bin/pg_dump/pg_dump_sort.c                 |  37 ++-
src/bin/pg_dump/pg_dumpall.c                   |  24 +-
src/bin/pg_dump/pg_restore.c                   |  25 +-
src/bin/pg_dump/t/001_basic.pl                 |  18 ++
src/bin/pg_dump/t/002_pg_dump.pl               | 106 ++++++-
src/bin/pg_upgrade/dump.c                      |   3 +-
src/bin/pg_upgrade/option.c                    |   7 +
src/bin/pg_upgrade/pg_upgrade.h                |   1 +
src/bin/pg_upgrade/t/002_pg_upgrade.pl         |   7 +
src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm |  10 +
src/test/recovery/t/027_stream_regress.pl      |   4 +-
src/tools/pgindent/typedefs.list               |   1 +
22 files changed, 858 insertions(+), 75 deletions(-)


Re: pgsql: Transfer statistics during pg_upgrade.

From
Andrew Dunstan
Date:
On 2025-02-20 Th 4:29 AM, Jeff Davis wrote:
> Transfer statistics during pg_upgrade.


Small nit.

I notice this commit has introduced a couple of dubious uses of 
"mututally exclusive":

doc/src/sgml/ref/pg_dump.sgml:        This option is mutually exclusive 
to <option>--data-only</option>
doc/src/sgml/ref/pg_dump.sgml-        and 
<option>--statistics-only</option>.
--
doc/src/sgml/ref/pg_restore.sgml:        This option is mutually 
exclusive of <option>--data-only</option>
doc/src/sgml/ref/pg_restore.sgml-        and 
<option>--statistics-only</option>.


I don't think this is idiomatic usage, and it's possibly not correct 
usage. At best it is jarring. One does not normally follow "mutually 
exclusive" with a preposition. I would replace the these with something 
like "This option cannot be used with ...". If you want to use the 
phrase, it should be something like "This option and ... are mutually 
exclusive."


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Transfer statistics during pg_upgrade.

From
David Rowley
Date:
On Thu, 20 Feb 2025 at 22:29, Jeff Davis <jdavis@postgresql.org> wrote:
> Add support to pg_dump for dumping stats, and use that during
> pg_upgrade so that statistics are transferred during upgrade. In most
> cases this removes the need for a costly re-analyze after upgrade.

I was surprised to see when I did pg_dump -T just_this_table that I
also got statistics data for all tables in the database.

Have filtering stats for tables not in the -T list been overlooked?

David



Re: pgsql: Transfer statistics during pg_upgrade.

From
Jeff Davis
Date:
On Tue, 2025-04-08 at 17:24 +1200, David Rowley wrote:
> I was surprised to see when I did pg_dump -T just_this_table that I
> also got statistics data for all tables in the database.

-T means "exclude this pattern"

I tried a few basic variations and they seemed to do what I expected:

   CREATE TABLE mytable1(i INT);
   INSERT INTO mytable1 VALUES(1);
   CREATE TABLE mytable2(i INT);
   INSERT INTO mytable2 VALUES(2);
   CREATE TABLE mytable3(i INT);
   INSERT INTO mytable3 VALUES(3);
   ANALYZE;

   $ ./bin/pg_dump -t mytable1 postgres | grep "Name: mytable"
   -- Name: mytable1; Type: TABLE; Schema: public; Owner: jdavis
   -- Data for Name: mytable1; Type: TABLE DATA; Schema: ...
   -- Statistics for Name: mytable1; Type: STATISTICS DATA; ...

   $ ./bin/pg_dump -T mytable1 postgres | grep "Name: mytable"
   -- Name: mytable2; Type: TABLE; Schema: public; Owner: jdavis
   -- Name: mytable3; Type: TABLE; Schema: public; Owner: jdavis
   -- Data for Name: mytable2; Type: TABLE DATA; Schema: ...
   -- Data for Name: mytable3; Type: TABLE DATA; Schema: ...
   -- Statistics for Name: mytable2; Type: STATISTICS DATA; ...
   -- Statistics for Name: mytable3; Type: STATISTICS DATA; ...

If you still see a problem, please let me know.

Regards,
    Jeff Davis