Re: Statistics Import and Export: difference in statistics dumped - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Statistics Import and Export: difference in statistics dumped
Date
Msg-id CAExHW5sEjLx-8r85+r=mL=o4jVx2Y0D-mRzGTZxcfNm4YSeRSQ@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export: difference in statistics dumped  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, Mar 4, 2025 at 11:45 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2025-03-04 at 10:28 +0530, Ashutosh Bapat wrote:
>
> > >
> > > What solution are you suggesting? The only one that comes to mind
> > > is
> > > moving everything to SECTION_POST_DATA, which is possible, but it
> > > seems
> > > like a big design change to satisfy a small detail.
> >
> > We don't have to do that. We can manage it by making statistics of
> > index dependent upon the indexes on the table.
>
> The index relstats are already dependent on the index definition. If
> you have a simple database like:
>
>    CREATE TABLE t(i INT);
>    INSERT INTO t SELECT generate_series(1,10);
>    CREATE INDEX t_idx ON t (i);
>    ANALYZE;
>
> and then you dump it, you get:
>
>
>    ------- SECTION_PRE_DATA -------
>
>    CREATE TABLE public.t ...
>
>    ------- SECTION_DATA -----------
>
>    COPY public.t (i) FROM stdin;
>    ...
>    SELECT * FROM pg_catalog.pg_restore_relation_stats(
>         'version', '180000'::integer,
>         'relation', 'public.t'::regclass,
>         'relpages', '1'::integer,
>         'reltuples', '10'::real,
>         'relallvisible', '0'::integer
>    );
>    ...
>
>    ------- SECTION_POST_DATA ------
>
>    CREATE INDEX t_idx ON public.t USING btree (i);
>    SELECT * FROM pg_catalog.pg_restore_relation_stats(
>         'version', '180000'::integer,
>         'relation', 'public.t_idx'::regclass,
>         'relpages', '2'::integer,
>         'reltuples', '10'::real,
>         'relallvisible', '0'::integer
>    );
>
> (section annotations added for clarity)
>
> There is no problem with the index relstats, because they are already
> dependent on the index definition, and will be restored after the
> CREATE INDEX.
>
> The issue is when the table's restored relstats are different from what
> CREATE INDEX calculates, and then the CREATE INDEX overwrites the
> table's just-restored relation stats. The easiest way to see this is
> when restoring with --no-data, because CREATE INDEX will see an empty
> table and overwrite the table's restored relstats with zeros.
>
> If we view this issue as a dependency problem, then we'd have to make
> the *table relstats* depend on the *index definition*. If a table has
> any indexes, the relstats would need to go after the last index
> definition, effectively moving most relstats to SECTION_POST_DATA. The
> table's attribute stats would not be dependent on the index definition
> (because CREATE INDEX doesn't touch those), so they could stay in
> SECTION_DATA. And if the table doesn't have any indexes, then its
> relstats could also stay in SECTION_DATA. But then we have a mess, so
> we might as well just put all stats in SECTION_POST_DATA.
>
> But I don't see it as a dependency problem. When I look at the above
> SQL, it reads nicely to me and there's no obvious problem with it.

Thanks for explaining it. I

>
> If we want stats to be stable, we need some kind of mode to tell the
> server not to apply these kind of helpful optimizations, otherwise the
> issue will resurface in some form no matter what we do with pg_dump. We
> could invent a new mode, but autovacuum=off seems close enough to me.

Hmm. Updating the statistics without consuming more CPU is more
valuable when autovacuum is off it improves query plans with no extra
efforts. But if adding a new mode is some significant work, riding it
on top of autovacuum=off might ok. It's not documented either way, so
we could change that behaviour later if we find it troublesome.

--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Xuneng Zhou
Date:
Subject: Re: per backend WAL statistics
Next
From: Andrei Lepikhov
Date:
Subject: Re: making EXPLAIN extensible