Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Import Statistics in postgres_fdw before resorting to sampling.
Date
Msg-id aPXaeowZ6RBPlsrD@paquier.xyz
Whole thread Raw
In response to Re: Import Statistics in postgres_fdw before resorting to sampling.  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: Import Statistics in postgres_fdw before resorting to sampling.
List pgsql-hackers
On Sat, Oct 18, 2025 at 08:32:24PM -0400, Corey Huinker wrote:
> Rebased again.

Hearing an opinion from Fujita-san would be interesting here, so I am
adding him in CC.  I have been looking a little bit at this patch.

+    ImportStatistics_function ImportStatistics;

All FDW callbacks are documented in fdwhandler.sgml.  This new one is
not.

I am a bit uncomfortable regarding the design you are using here,
where the ImportStatistics callback, if defined, takes priority over
the existing AnalyzeForeignTable, especially regarding the fact that
both callbacks attempt to retrieve the same data, except that the
existing callback has a different idea of the timing to use to
retrieve reltuples and relpages.  The original callback
AnalyzeForeignTable retrieves immediately the total number of pages
via SQL, to feed ANALYZE.  reltuples is then fed to ANALYZE from a
callback function that that's defined in AnalyzeForeignTable().

My point is: rather than trying to implement a second solution with a
new callback, shouldn't we try to rework the existing callback so as
it would fit better with what you are trying to do here: feed data
that ANALYZE would then be in charge of inserting?  Relying on
pg_restore_relation_stats() for the job feels incorrect to me knowing
that ANALYZE is the code path in charge of updating the stats of a
relation.  The new callback is a shortcut that bypasses what ANALYZE
does, so the problem, at least it seems to me, is that we want to
retrieve all the data in a single step, like your new callback, not in
two steps, something that only the existing callback allows.  Hence,
wouldn't it be more natural to retrieve the total number of pages and
reltuples from one callback, meaning that for local relations we
should delay RelationGetNumberOfBlocks() inside the existing
"acquire_sample_rows" callback (renaming it would make sense)?  This
requires some redesign of AnalyzeForeignTable and the internals of
analyze.c, but it would let FDW extensions know about the potential
efficiency gain.

There is also a performance concern to be made with the patch, but as
it's an ANALYZE path that may be acceptable: if we fail the first
callback, then we may call the second callback.

Fujita-san, what do you think?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: using index to speedup add not null constraints to a table
Next
From: Michael Paquier
Date:
Subject: Re: Preserve index stats during ALTER TABLE ... TYPE ...