Re: Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Ideas about a better API for postgres_fdw remote estimates
Date
Msg-id CAG-ACPXUnC2WBBH=MfVZkhW98eoP6f_yRN-Y-SoWy7HTrh8giw@mail.gmail.com
Whole thread Raw
In response to Re: Ideas about a better API for postgres_fdw remote estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers


On Wed, 9 Sep 2020 at 02:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote

I think that was the topic of *this* thread as started by Tom, but I now
realize Andrey steered it in the direction to allow re-using remote
stats. Which seems useful too, but it confused me a bit.

I didn't realize that the nearby thread I am mentioning is actually this thread :). Sorry.
 

>Using EXPLAIN to get costs from the foreign server isn't efficient. It
>increases planning time a lot; sometimes planning time exceeds execution
>time. If usage of foreign tables becomes more and more common, this isn't
>ideal. I think we should move towards a model in which the optimizer can
>decide whether a subtree involving a foreign server should be evaluated
>locally or on the foreign server without the help of foreign server. One
>way to do it (I am not saying that this is the only or the best way) is to
>estimate the cost of foreign query locally based on the information
>available locally about the foreign server and foreign table. This might
>mean that we have to get that information from the foreign server and cache
>it locally and use it several times, including the indexes on foreign
>table, values of various costs etc. Though this approach doesn't solve all
>of those problems it's one step forward + it makes the current scenario
>also efficient.
>

True, but that ptoject is way more ambitious than providing a simple API
for postgres_fdw to obtain the estimates more efficiently.

Doing all of that is a big project. But what this patch aims at is a small subset which makes statistics collection efficient and automatic. So, just for that, we should consider it.
 

>I agree that the patch needs some work though, esp the code dealing with
>serialization and deserialization of statistics.

I think there's a bunch of open questions, e.g. what to do with extended
statistics - for example what should happen when the extended statistics
object is defined only on local/remote server, or when the definitions
don't match? What should happen when the definitions don't match? This
probably is not an issue for "regular" stats, because that seems pretty
stable, but for extended stats there are differences between versions.

If it is defined on the foreign server but not the local server, there is no need to fetch it from the foreign server. The other way round case is tricky. We could mark the extended statistics object invalid if it's not defined on the foreign server or the definition is different. We have to document it that way. I think that should serve most of the cases.

-- 
Best Wishes,
Ashutosh

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: VACUUM (INTERRUPTIBLE)?
Next
From: Amit Kapila
Date:
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?