Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers
| From | Etsuro Fujita |
|---|---|
| Subject | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date | |
| Msg-id | CAPmGK17v-qJL5j2Z25jD33vKLyhAnNppYsy5UTbEAwAeBP=vGQ@mail.gmail.com Whole thread Raw |
| In response to | Re: Import Statistics in postgres_fdw before resorting to sampling. (Corey Huinker <corey.huinker@gmail.com>) |
| Responses |
Re: Import Statistics in postgres_fdw before resorting to sampling.
|
| List | pgsql-hackers |
On Sun, Dec 14, 2025 at 4:12 AM Corey Huinker <corey.huinker@gmail.com> wrote: > On Fri, Dec 12, 2025 at 2:45 PM Corey Huinker <corey.huinker@gmail.com> wrote: >> CCing Jonathan Katz and Nathan Bossart as they had been sounding boards for me when I started designing this feature. >>> Returning in the FDW_IMPORT_STATS_OK case isn't 100% correct; if the >>> foreign table is an inheritance parent, we would fail to do inherited >>> stats. >> Perhaps I'm not understanding completely, but I believe what we're doing now should be ok. >> >> The local table of type 'f' can be a member of a partition, but can't be a partition itself, so whatever stats we getfor it, we're storing them as inh = false. >> >> On the remote side, the table could be an inheritance parent, in which case we ONLY want the inh=true stats, but we willstill store them locally as inh = false. >> >> The DISTINCT ON(a.attname)....ORDER BY a.attname, s.inherited DESC part of the query ensures that we get inh=true statsif they're there in preference to the inh=false steps. >> >> I will grant you that in an old-style inheritance (i.e. not formal partitions) situation we'd probably want some weightedmix of the inherited and non-inherited stats, but 1) very few people use old-style inheritance anymore, 2) few ofthose export tables via a FDW, and 3) there's no way to do that weighting so we should fall back to sampling anyway. Ah, I mean the case where the foreign table is an inheritance parent on the *local* side. In that case, the return would cause us to skip the recursive ANALYZE (i.e., do_analyze_rel() with inh=true), leading to no inherited stats. I agree that the case is minor, but I don't think that that's acceptable. >>> void >>> ImportStatistics(Relation relation, List *va_cols, int elevel) >>> >>> Imports the stats for the foreign table from the foreign server. As >>> mentioned in previous emails, I don't think it's a good idea to fall >>> back to the normal processing when the attempt to import the stats >>> fails, in which case I think we should just throw an error (or >>> warning) so that the user can re-try to import the stats after fixing >>> the foreign side in some way. So I re-defined this as a void >>> function. Note that this re-definition removes the concern mentioned >>> in the comment starting with "XXX:". In the postgres_fdw case, as >>> mentioned in a previous email, I think it would be good to implement >>> this so that it checks whether the remote table is a view or not when >>> importing the relation stats from the remote server, and if so, just >>> throws an error (or warning), making the user reset the fetch_stats >>> flag. >> I think I'm ok with this design as the decision, as it still leaves open the fdw-specific options of how to handle initiallyfinding no remote stats. >> >> I can still see a situation where a local table expects the remote table to eventually have proper statistics on it, butuntil that happens it will fall back to table samples. This design decision means that either the user lives without anystatistics for a while, or alters the foreign table options and hopefully remembers to set them back. While I understandthe desire to first implement something very simple, I think that adding the durability that fallback allows forwill be harder to implement if we don't build it in from the start. I'm interested to hear with Nathan and/or Jonathanhave to say about that. My concern about the fall-back behavior is that it reduces flexibility in some cases, as mentioned upthread. Maybe that could be addressed by making the behavior an option, but my another (bigger) concern is that considering that it's the user's responsibility to make remote stats up-to-date when he/she uses this feature, the "no remote stats found" result would be his/her fault; it might not be worth complicating the code for something like that. Anyway, I too would like to hear the opinions of them (or anyone else). > Anyway, here's v6 incorporating both threads of feedback. Thanks for updating the patch! I will review the postgres_fdw changes next. Best regards, Etsuro Fujita
pgsql-hackers by date: