Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Import Statistics in postgres_fdw before resorting to sampling.
Date
Msg-id CAPmGK14U+wiuuqJ8Qgd8P-aZ52XO=o2DyF-sRhVp8qaJ5HduAg@mail.gmail.com
Whole thread
In response to Re: Import Statistics in postgres_fdw before resorting to sampling.  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: Import Statistics in postgres_fdw before resorting to sampling.
List pgsql-hackers
On Fri, Apr 3, 2026 at 2:14 AM Corey Huinker <corey.huinker@gmail.com> wrote:
>>
>> The problem is not limited to this special case.  Consider cases when
>> 1) the remote table that has many rows are heavily updated after it
>> got analyzed, and then 2) postgres_fdw imports its stats before it
>> gets re-analyzed.  The stats postgres_fdw imports would be stale,
>> causing plan degradation.  I don't think we should enable this feature
>> by default until we guarantee stats freshness in some way.
>
> So it seems like we have the following configurations desired by at least somebody:
>
> 0. Row Sampling Only
> 1. Fetch stats and fall back to row sampling.
> 2. Always analyze remote table (assuming it is a table that can hold stats), then fetch stats, and fall back if
necessary.
> 3. Fetch stats, and if that turned up 0 attribute stats try an analyze, then try to refetch and if it still fails go
torow sampling. 
>
> With the following interpretation of reltuples = 0:
>
> a. The table is definitively empty, stop.
> b. The table is missing stats and running an analyze is cheap (assuming remote analysis is even enabled)
> c. if remote version >= 14 then a else b
>
> I'm of the opinion that 3c is the best configuration for most tables, and you have advocated for 1a without an
analyzeoption and 2a with one. Option 2 seems a bit heavy handed to me, but I could see checking the remote
pg_stat_all_tablesand making an analyze/no-analyze judgement call based on that, perhaps call that
analyze_stale_vacuum_intervalor something like that. That could be a neat feature for v20, and so whatever default we
choosefor fetch_stats, I ask that we choose values that keep our options open for all 4x3 configurations enumerated
above.

Before discussing this, let's wrap up the work for v19.  Here is a new
version of the patch.  Changes are:

* As above, I think we should disable this feature by default for now,
so I modified the patch as such.

* You are defining a remote query per version at the top of the file
that is used in fetch_attstats, but I think that that reduces
readability, making maintenance hard, so I modified that function to
build the query dynamically in it, as I proposed before.  I also
modified that function to use PQsendQuery, not PQsendQueryParams, as
in fetch_relstats, for efficiency.

* I moved the code for opening/closing the connection from
postgresImportStatistics to fetch_remote_statistics, as the connection
is only used in the latter function.  I also removed useless cleanup
processing in that function.

* I added regression tests.

* I fixed a minor bug I noticed while adding those tests.

* I did some cleanup and editorialization including re-ordering
functions in logical order.

One thing I'm wondering is: we should rename ImportStatistics to
ImportForeignStatistics, for consistency with ImportForeignSchema?
Also, we should rename fetch_stats to import_stats, for consistency
with eg, import_default?

Anyway I think the patch is now in good shape, except comments/docs,
which I will work on next.

Best regards,
Etsuro Fujita

Attachment

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: vectorized CRC on ARM64
Next
From: Imran Zaheer
Date:
Subject: Remove commented-out code in 026_overwrite_contrecord.pl