Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers
From | Corey Huinker |
---|---|
Subject | Re: Import Statistics in postgres_fdw before resorting to sampling. |
Date | |
Msg-id | CADkLM=feCJrEDWCOpoxxg58J_6zX6nv+BnXufUOEZuP7bYTCbg@mail.gmail.com Whole thread Raw |
In response to | Re: Import Statistics in postgres_fdw before resorting to sampling. (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
List | pgsql-hackers |
This isn't a full review. I looked at the patches mainly to find out
how does it fit into the current method of analysing a foreign table.
Any degree of review is welcome. We're chasing views, reviews, etc.
Right now, do_analyze_rel() is called with FDW specific acquireFunc,
which collects a sample of rows. The sample is passed to attribute
specific compute_stats which fills VacAttrStats for that attribute.
VacAttrStats for all the attributes is passed to update_attstats(),
which updates pg_statistics. The patch changes that to fetch the
statistics from the foreign server and call pg_restore_attribute_stats
for each attribute.
That recap is accurate.
Instead I was expecting that after fetching the
stats from the foreign server, it would construct VacAttrStats and
call update_attstats(). That might be marginally faster since it
avoids SPI call and updates stats for all the attributes. Did you
consider this alternate approach and why it was discarded?
It might be marginally faster, but it would duplicate a lot of the pair-checking (must have a most-common-freqs with a most-common-vals, etc) and type-checking logic (the vals in a most-common vals must all input coerce to the correct datatype for the _destination_ column, etc), and we've already got that in pg_restore_attribute_stats. There used to be a non-fcinfo function that took a long list of Datums and isnull boolean pairs, but that wasn't pretty to look at and was replaced with the positional fcinfo version we have today. This use case might be a reason to bring that back, or expose the existing positional fcinfo function (presently static) if we want to avoid SPI badly enough. As it is, the SPI code is fairly future proof in that it isn't required to add new stat types as they are created. My first attempt at this patch attempted to make a FunctionCallInvoke() on the variadic pg_restore_attribute_stats, but that would require a filled out fn_expr, and to get that we'd have to duplicate a lot of logic from the parser, so the infrastructure isn't available to easily call a variadic function.
If a foreign table points to an inheritance parent on the foreign
server, we will receive two rows for that table - one with inherited =
false and other with true in that order. I think the stats with
inherited=true are relevant to the local server since querying the
parent will fetch rows from children. Since that stats is applied
last, the pg_statistics will retain the intended statistics. But why
to fetch two rows in the first place and waste computing cycles?
Glad you agree that we're fetching the right statistics.
That was the only way I could think of to do one client fetch and still get exactly one row back.
Anything else involved fetching the inh=true first, and if that failed fetching the inh=false statistics. That adds extra round-trips especially given that inherited statistics are more rare than non-inherited statistics. Moreoever, we're making decisions (analyze yes/no, fallback to sampling yes/no) based on whether or not we got statistics back from the foreign server at all, and having to consider the result of two fetches instead of one makes that logic more complicated.
If, however, you were referring to the work we're handing the remote server, I'm open to queries that you think would be more lightweight. However, the pg_stats view is a security barrier view, so we reduce overhead by passing through that barrier as few times as possible.
pgsql-hackers by date: