On Tue, Aug 12, 2025 at 10:33 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>
>
> Attached is my current work on adding remote fetching of statistics to postgres_fdw, and opening the possibility of
doingso to other foreign data wrappers.
>
> This involves adding two new options to postgres_fdw at the server and table level.
>
> The first option, fetch_stats, defaults to true at both levels. If enabled, it will cause an ANALYZE of a
postgres_fdwforeign table to first attempt to fetch relation and attribute statistics from the remote table. If this
succeeds,then those statistics are imported into the local foreign table, allowing us to skip a potentially expensive
samplingoperation.
>
> The second option, remote_analyze, defaults to false at both levels, and only comes into play if the first fetch
succeedsbut no attribute statistics (i.e. the stats from pg_stats) are found. If enabled then the function will attempt
toANALYZE the remote table, and if that is successful then a second attempt at fetching remote statistics will be made.
>
> If no statistics were fetched, then the operation will fall back to the normal sampling operation per settings.
>
> Note patches 0001 and 0002 are already a part of a separate thread
https://www.postgresql.org/message-id/flat/CADkLM%3DcpUiJ3QF7aUthTvaVMmgQcm7QqZBRMDLhBRTR%2BgJX-Og%40mail.gmail.com
regardinga bug (0001) and a nitpick (0002) that came about as a side-effect to this effort, and but I expect those to
beresolved one way or another soon. Any feedback on those two can be handled there.
I think this is very useful to avoid fetching rows from foreign server
and analyzing them locally.
This isn't a full review. I looked at the patches mainly to find out
how does it fit into the current method of analysing a foreign table.
Right now, do_analyze_rel() is called with FDW specific acquireFunc,
which collects a sample of rows. The sample is passed to attribute
specific compute_stats which fills VacAttrStats for that attribute.
VacAttrStats for all the attributes is passed to update_attstats(),
which updates pg_statistics. The patch changes that to fetch the
statistics from the foreign server and call pg_restore_attribute_stats
for each attribute. Instead I was expecting that after fetching the
stats from the foreign server, it would construct VacAttrStats and
call update_attstats(). That might be marginally faster since it
avoids SPI call and updates stats for all the attributes. Did you
consider this alternate approach and why it was discarded?
If a foreign table points to an inheritance parent on the foreign
server, we will receive two rows for that table - one with inherited =
false and other with true in that order. I think the stats with
inherited=true are relevant to the local server since querying the
parent will fetch rows from children. Since that stats is applied
last, the pg_statistics will retain the intended statistics. But why
to fetch two rows in the first place and waste computing cycles?
--
Best Wishes,
Ashutosh Bapat