Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers
| From | Corey Huinker |
|---|---|
| Subject | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date | |
| Msg-id | CADkLM=dfuT8YgXOyoXW7nuGdOFw-sYh2uR3hQS-N66Gj_k_ytw@mail.gmail.com Whole thread Raw |
| In response to | Re: Import Statistics in postgres_fdw before resorting to sampling. (Corey Huinker <corey.huinker@gmail.com>) |
| List | pgsql-hackers |
On Fri, Dec 12, 2025 at 2:45 PM Corey Huinker <corey.huinker@gmail.com> wrote:
CCing Jonathan Katz and Nathan Bossart as they had been sounding boards for me when I started designing this feature.Returning in the FDW_IMPORT_STATS_OK case isn't 100% correct; if the
foreign table is an inheritance parent, we would fail to do inherited
stats.Perhaps I'm not understanding completely, but I believe what we're doing now should be ok.The local table of type 'f' can be a member of a partition, but can't be a partition itself, so whatever stats we get for it, we're storing them as inh = false.On the remote side, the table could be an inheritance parent, in which case we ONLY want the inh=true stats, but we will still store them locally as inh = false.The DISTINCT ON(a.attname)....ORDER BY a.attname, s.inherited DESC part of the query ensures that we get inh=true stats if they're there in preference to the inh=false steps.I will grant you that in an old-style inheritance (i.e. not formal partitions) situation we'd probably want some weighted mix of the inherited and non-inherited stats, but 1) very few people use old-style inheritance anymore, 2) few of those export tables via a FDW, and 3) there's no way to do that weighting so we should fall back to sampling anyway.None of this takes away from your suggestions down below.IIUC, the FDW needs to do two things within the ImportStatistics
callback function: check the importability, and if ok, do the work. I
think that would make the API complicated. To avoid that, how about
1) splitting the callback function into the two functions shown below
and then 2) rewriting the above to something like the attached? The
attached addresses the returning issue mentioned above as well.
bool
StatisticsAreImportable(Relation relation)
Checks the importability, and if ok, returns true, in which case the
following callback function is called. In the postgres_fdw case, we
could implement this to check if the fetch_stats flag for the foreign
table is set to true, and if so, return true.+1void
ImportStatistics(Relation relation, List *va_cols, int elevel)
Imports the stats for the foreign table from the foreign server. As
mentioned in previous emails, I don't think it's a good idea to fall
back to the normal processing when the attempt to import the stats
fails, in which case I think we should just throw an error (or
warning) so that the user can re-try to import the stats after fixing
the foreign side in some way. So I re-defined this as a void
function. Note that this re-definition removes the concern mentioned
in the comment starting with "XXX:". In the postgres_fdw case, as
mentioned in a previous email, I think it would be good to implement
this so that it checks whether the remote table is a view or not when
importing the relation stats from the remote server, and if so, just
throws an error (or warning), making the user reset the fetch_stats
flag.I think I'm ok with this design as the decision, as it still leaves open the fdw-specific options of how to handle initially finding no remote stats.I can still see a situation where a local table expects the remote table to eventually have proper statistics on it, but until that happens it will fall back to table samples. This design decision means that either the user lives without any statistics for a while, or alters the foreign table options and hopefully remembers to set them back. While I understand the desire to first implement something very simple, I think that adding the durability that fallback allows for will be harder to implement if we don't build it in from the start. I'm interested to hear with Nathan and/or Jonathan have to say about that.I added two arguments to the callback function: va_cols, for
supporting the column list in the ANALYZE command, and elevel, for
proper logging. I'm not sure if VaccumParams should be added as well.Good catch, I forgot about that one.Going to think some more on this before I work incorporating your
Heh, the word "changes" got cut off.
Anyway, here's v6 incorporating both threads of feedback.
Attachment
pgsql-hackers by date: