Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Import Statistics in postgres_fdw before resorting to sampling.
Date
Msg-id CADkLM=cU1YW4yeW-osNGLkhWQp+p6bt0MYUizYE-Vw87pG-igg@mail.gmail.com
In response to Re: Import Statistics in postgres_fdw before resorting to sampling.  (Corey Huinker <corey.huinker@gmail.com>)
List pgsql-hackers
On Thu, Jan 29, 2026 at 2:20 PM Corey Huinker <corey.huinker@gmail.com> wrote:

The way this is implemented, it will favour the usecases where foreign
tables are not child tables.

It is true that this feature does not benefit the recursive do_analyze_rel() case. But it does help when those same tables are analyzed directly.
 
That leaves out the sharding use case
which I believe is also a significant usecase. I think we need to
think, how can we make that usecase benefit from this optimization.

I agree that we should find a way to do that, but this patch handles the other case, and it doesn't prevent us from later teaching postgresAnalyzeForeignTable() to cache the rowsample locally for later use. postgresImportStatistics() could then weigh the relative benefits of using that locally cached sample versus the already-formed remote statistics. Even in that case, I'm guessing that the remote table's stats will be based on a larger, and therefore better, sample than the one we are able to pull across the wire and cache locally, so the remotely computed statistics would be better.

Not being able to use statistics available on the remote side seems a
major limitation. But I don't have a better solution than to think of
supporting some kind of partial statistics.

I'm not against trying to fetch and cache rowsamples, or cache some partially aggregated results of a rowsample, but this patch does not cover that. This patch should, at least in theory, reduce the number of table samples pulled across the wire by 50% and that seems worthwhile.
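
To make the 50% figure concrete, here is a rough counting sketch. It assumes that a foreign table which is also a partition is sampled twice today (once when analyzed directly, once by the recursive do_analyze_rel() pass over its parent), and that importing remote statistics replaces only the direct sample. The struct and function names are hypothetical and are not part of postgres_fdw or of this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one foreign table; not a PostgreSQL struct. */
typedef struct ForeignTableModel
{
    bool remote_stats_available; /* remote side has stats we could import */
    bool is_partition;           /* also sampled when its parent is analyzed */
} ForeignTableModel;

/*
 * Count the rowsamples pulled across the wire when every table is
 * analyzed directly and every parent is analyzed recursively.
 * Assumption: importing statistics only replaces the direct sample.
 */
int
count_remote_samples(const ForeignTableModel *tables, size_t ntables,
                     bool import_enabled)
{
    int samples = 0;

    for (size_t i = 0; i < ntables; i++)
    {
        /* Direct analysis: import stats when possible, else sample. */
        if (!(import_enabled && tables[i].remote_stats_available))
            samples++;

        /* Recursive analysis of the parent still samples the child. */
        if (tables[i].is_partition)
            samples++;
    }

    return samples;
}
```

Under those assumptions, four foreign partitions with importable statistics cost eight samples without the import step and four with it. Tables whose remote statistics are missing fall back to sampling, so the real saving would be somewhat lower.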
 

Rebase with some error message cleanups.
Attachment
