Home > mailing lists

Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

From	Etsuro Fujita
Subject	Re: Import Statistics in postgres_fdw before resorting to sampling.
Date	December 14 21:40:49
Msg-id	CAPmGK17v-qJL5j2Z25jD33vKLyhAnNppYsy5UTbEAwAeBP=vGQ@mail.gmail.com Whole thread Raw
In response to	Re: Import Statistics in postgres_fdw before resorting to sampling. (Corey Huinker <corey.huinker@gmail.com>)
Responses	Re: Import Statistics in postgres_fdw before resorting to sampling.
List	pgsql-hackers

Tree view

On Sun, Dec 14, 2025 at 4:12 AM Corey Huinker <corey.huinker@gmail.com> wrote:
> On Fri, Dec 12, 2025 at 2:45 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>> CCing Jonathan Katz and Nathan Bossart as they had been sounding boards for me when I started designing this
feature.

>>> Returning in the FDW_IMPORT_STATS_OK case isn't 100% correct; if the
>>> foreign table is an inheritance parent, we would fail to do inherited
>>> stats.

>> Perhaps I'm not understanding completely, but I believe what we're doing now should be ok.
>>
>> The local table of type 'f' can be a member of a partition, but can't be a partition itself, so whatever stats we
getfor it, we're storing them as inh = false. 
>>
>> On the remote side, the table could be an inheritance parent, in which case we ONLY want the inh=true stats, but we
willstill store them locally as inh = false. 
>>
>> The DISTINCT ON(a.attname)....ORDER BY a.attname, s.inherited DESC part of the query ensures that we get inh=true
statsif they're there in preference to the inh=false steps. 
>>
>> I will grant you that in an old-style inheritance (i.e. not formal partitions) situation we'd probably want some
weightedmix of the inherited and non-inherited stats, but 1) very few people use old-style inheritance anymore, 2) few
ofthose export tables via a FDW, and 3) there's no way to do that weighting so we should fall back to sampling anyway. 

Ah, I mean the case where the foreign table is an inheritance parent
on the *local* side.  In that case, the return would cause us to skip
the recursive ANALYZE (i.e., do_analyze_rel() with inh=true), leading
to no inherited stats.  I agree that the case is minor, but I don't
think that that's acceptable.

>>> void
>>> ImportStatistics(Relation relation, List *va_cols, int elevel)
>>>
>>> Imports the stats for the foreign table from the foreign server.  As
>>> mentioned in previous emails, I don't think it's a good idea to fall
>>> back to the normal processing when the attempt to import the stats
>>> fails, in which case I think we should just throw an error (or
>>> warning) so that the user can re-try to import the stats after fixing
>>> the foreign side in some way.  So I re-defined this as a void
>>> function.  Note that this re-definition removes the concern mentioned
>>> in the comment starting with "XXX:".  In the postgres_fdw case, as
>>> mentioned in a previous email, I think it would be good to implement
>>> this so that it checks whether the remote table is a view or not when
>>> importing the relation stats from the remote server, and if so, just
>>> throws an error (or warning), making the user reset the fetch_stats
>>> flag.

>> I think I'm ok with this design as the decision, as it still leaves open the fdw-specific options of how to handle
initiallyfinding no remote stats. 
>>
>> I can still see a situation where a local table expects the remote table to eventually have proper statistics on it,
butuntil that happens it will fall back to table samples. This design decision means that either the user lives without
anystatistics for a while, or alters the foreign table options and hopefully remembers to set them back. While I
understandthe desire to first implement something very simple, I think that adding the durability that fallback allows
forwill be harder to implement if we don't build it in from the start. I'm interested to hear with Nathan and/or
Jonathanhave to say about that. 

My concern about the fall-back behavior is that it reduces flexibility
in some cases, as mentioned upthread.  Maybe that could be addressed
by making the behavior an option, but my another (bigger) concern is
that considering that it's the user's responsibility to make remote
stats up-to-date when he/she uses this feature, the "no remote stats
found" result would be his/her fault; it might not be worth
complicating the code for something like that.

Anyway, I too would like to hear the opinions of them (or anyone else).

> Anyway, here's v6 incorporating both threads of feedback.

Thanks for updating the patch!  I will review the postgres_fdw changes next.

Best regards,
Etsuro Fujita

pgsql-hackers by date:

From: Isaac Morland
Date: 14 December, 21:21:40
Subject: Re: Get rid of "Section.N.N.N" on DOCs

From: Tom Lane
Date: 14 December, 22:57:48
Subject: Proposed mid-cycle update of typedefs.list

Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers

Previous

Next