Re: Statistics Import and Export - Mailing list pgsql-hackers

From Hari Krishna Sunder
Subject Re: Statistics Import and Export
Msg-id CAAeiqZ3BPCXziob2-Ldf15h0eS-0C6qbNoT3n5jiXEvMrjEW-w@mail.gmail.com
In response to Re: Statistics Import and Export  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
Thanks Nathan.
Here is the patch with a comment.
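For illustration, the rule proposed in the quoted message can be sketched as follows. This is a minimal Python sketch of the idea, not the attached patch itself. The function name `adjusted_reltuples` is hypothetical, and the sketch assumes the source server's version is available as an integer in PostgreSQL's `server_version_num` format (e.g. 130011 for 13.11).

```python
def adjusted_reltuples(server_version_num: int, reltuples: float) -> float:
    """Hypothetical helper: choose the reltuples value to import into v18.

    Before v14, a table that had never been vacuumed or analyzed reported
    reltuples = 0. Since commit 3d351d916b (v14), -1 means "never yet
    vacuumed" and 0 means "vacuumed and seen to be empty". A pre-v14 value
    of 0 is therefore ambiguous, so it is mapped to -1 to keep the planner
    from assuming the table is empty.
    """
    if server_version_num < 140000 and reltuples == 0:
        return -1.0
    # Values from v14+ servers, and nonzero pre-v14 values, pass through.
    return reltuples


print(adjusted_reltuples(130011, 0))      # -1.0: ambiguous pre-v14 zero
print(adjusted_reltuples(130011, 500.0))  # 500.0: real estimate kept
print(adjusted_reltuples(170000, 0))      # 0: v14+ zero really means empty
```

As discussed in the thread, the cost of this choice is that a genuinely empty pre-v14 table is also imported as "never yet vacuumed", which should be harmless for planning.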

On Wed, May 14, 2025 at 8:53 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, May 13, 2025 at 05:01:02PM -0700, Hari Krishna Sunder wrote:
> We found a minor issue when testing statistics import while upgrading from
> versions older than v14 (we have VACUUM and ANALYZE disabled).
> Commit 3d351d916b20534f973eda760cde17d96545d4c4
> <https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3d351d916b20534f973eda760cde17d96545d4c4>
> changed the default value for reltuples from 0 to -1. So when such tables
> are imported, they get the pg13 default of 0, which in pg18 is treated
> as "vacuumed and seen to be empty" instead of "never yet vacuumed". The
> planner then proceeds to pick seq scans even if there are indexes for these
> tables.
> This is a very narrow edge case, and the next VACUUM or ANALYZE will fix it,
> but the performance of these tables immediately after the upgrade is
> considerably affected.

There was a similar report for vacuumdb's new --missing-stats-only option.
We fixed that in commit 9879105 by removing the check for reltuples != 0,
which means that --missing-stats-only will process empty tables.

> Can we instead use -1 if the version is older than 14 and reltuples is 0?
> This will have the unintended consequence of treating a truly empty table
> as "never yet vacuumed", but that should be fine as empty tables are going
> to be fast regardless of the plan picked.

I'm inclined to agree that we should do this.  Even if it's much more
likely that 0 means empty versus not-yet-processed, the one-time cost of
processing some empty tables doesn't sound too bad.  In any case, since
this only applies to upgrades from <v14, that trade-off should dissipate
over time.

> PS: This is my first patch, so apologies for any issues with the patch.

It needs a comment, but otherwise it looks generally reasonable to me after
a quick glance.

--
nathan
Attachment
