On Fri, Apr 3, 2026 at 2:14 AM Corey Huinker <corey.huinker@gmail.com> wrote:
>>
>> The problem is not limited to this special case. Consider cases when
>> 1) the remote table that has many rows are heavily updated after it
>> got analyzed, and then 2) postgres_fdw imports its stats before it
>> gets re-analyzed. The stats postgres_fdw imports would be stale,
>> causing plan degradation. I don't think we should enable this feature
>> by default until we guarantee stats freshness in some way.
>
> So it seems like we have the following configurations desired by at least somebody:
>
> 0. Row Sampling Only
> 1. Fetch stats and fall back to row sampling.
> 2. Always analyze remote table (assuming it is a table that can hold stats), then fetch stats, and fall back if
necessary.
> 3. Fetch stats, and if that turned up 0 attribute stats try an analyze, then try to refetch and if it still fails go
torow sampling.
>
> With the following interpretation of reltuples = 0:
>
> a. The table is definitively empty, stop.
> b. The table is missing stats and running an analyze is cheap (assuming remote analysis is even enabled)
> c. if remote version >= 14 then a else b
>
> I'm of the opinion that 3c is the best configuration for most tables, and you have advocated for 1a without an
analyzeoption and 2a with one. Option 2 seems a bit heavy handed to me, but I could see checking the remote
pg_stat_all_tablesand making an analyze/no-analyze judgement call based on that, perhaps call that
analyze_stale_vacuum_intervalor something like that. That could be a neat feature for v20, and so whatever default we
choosefor fetch_stats, I ask that we choose values that keep our options open for all 4x3 configurations enumerated
above.
Before discussing this, let's wrap up the work for v19. Here is a new
version of the patch. Changes are:
* As above, I think we should disable this feature by default for now,
so I modified the patch as such.
* You are defining a remote query per version at the top of the file
that is used in fetch_attstats, but I think that that reduces
readability, making maintenance hard, so I modified that function to
build the query dynamically in it, as I proposed before. I also
modified that function to use PQsendQuery, not PQsendQueryParams, as
in fetch_relstats, for efficiency.
* I moved the code for opening/closing the connection from
postgresImportStatistics to fetch_remote_statistics, as the connection
is only used in the latter function. I also removed useless cleanup
processing in that function.
* I added regression tests.
* I fixed a minor bug I noticed while adding those tests.
* I did some cleanup and editorialization including re-ordering
functions in logical order.
One thing I'm wondering is: we should rename ImportStatistics to
ImportForeignStatistics, for consistency with ImportForeignSchema?
Also, we should rename fetch_stats to import_stats, for consistency
with eg, import_default?
Anyway I think the patch is now in good shape, except comments/docs,
which I will work on next.
Best regards,
Etsuro Fujita