Re: Statistics Import and Export: difference in statistics dumped - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Statistics Import and Export: difference in statistics dumped |
Date | |
Msg-id | CAExHW5sEjLx-8r85+r=mL=o4jVx2Y0D-mRzGTZxcfNm4YSeRSQ@mail.gmail.com Whole thread Raw |
In response to | Re: Statistics Import and Export: difference in statistics dumped (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
On Tue, Mar 4, 2025 at 11:45 PM Jeff Davis <pgsql@j-davis.com> wrote: > > On Tue, 2025-03-04 at 10:28 +0530, Ashutosh Bapat wrote: > > > > > > > What solution are you suggesting? The only one that comes to mind > > > is > > > moving everything to SECTION_POST_DATA, which is possible, but it > > > seems > > > like a big design change to satisfy a small detail. > > > > We don't have to do that. We can manage it by making statistics of > > index dependent upon the indexes on the table. > > The index relstats are already dependent on the index definition. If > you have a simple database like: > > CREATE TABLE t(i INT); > INSERT INTO t SELECT generate_series(1,10); > CREATE INDEX t_idx ON t (i); > ANALYZE; > > and then you dump it, you get: > > > ------- SECTION_PRE_DATA ------- > > CREATE TABLE public.t ... > > ------- SECTION_DATA ----------- > > COPY public.t (i) FROM stdin; > ... > SELECT * FROM pg_catalog.pg_restore_relation_stats( > 'version', '180000'::integer, > 'relation', 'public.t'::regclass, > 'relpages', '1'::integer, > 'reltuples', '10'::real, > 'relallvisible', '0'::integer > ); > ... > > ------- SECTION_POST_DATA ------ > > CREATE INDEX t_idx ON public.t USING btree (i); > SELECT * FROM pg_catalog.pg_restore_relation_stats( > 'version', '180000'::integer, > 'relation', 'public.t_idx'::regclass, > 'relpages', '2'::integer, > 'reltuples', '10'::real, > 'relallvisible', '0'::integer > ); > > (section annotations added for clarity) > > There is no problem with the index relstats, because they are already > dependent on the index definition, and will be restored after the > CREATE INDEX. > > The issue is when the table's restored relstats are different from what > CREATE INDEX calculates, and then the CREATE INDEX overwrites the > table's just-restored relation stats. The easiest way to see this is > when restoring with --no-data, because CREATE INDEX will see an empty > table and overwrite the table's restored relstats with zeros. > > If we view this issue as a dependency problem, then we'd have to make > the *table relstats* depend on the *index definition*. If a table has > any indexes, the relstats would need to go after the last index > definition, effectively moving most relstats to SECTION_POST_DATA. The > table's attribute stats would not be dependent on the index definition > (because CREATE INDEX doesn't touch those), so they could stay in > SECTION_DATA. And if the table doesn't have any indexes, then its > relstats could also stay in SECTION_DATA. But then we have a mess, so > we might as well just put all stats in SECTION_POST_DATA. > > But I don't see it as a dependency problem. When I look at the above > SQL, it reads nicely to me and there's no obvious problem with it. Thanks for explaining it. I > > If we want stats to be stable, we need some kind of mode to tell the > server not to apply these kind of helpful optimizations, otherwise the > issue will resurface in some form no matter what we do with pg_dump. We > could invent a new mode, but autovacuum=off seems close enough to me. Hmm. Updating the statistics without consuming more CPU is more valuable when autovacuum is off it improves query plans with no extra efforts. But if adding a new mode is some significant work, riding it on top of autovacuum=off might ok. It's not documented either way, so we could change that behaviour later if we find it troublesome. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: