Re: Join push-down support for foreign tables - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: Join push-down support for foreign tables
Date
Msg-id CAOeZVieZFg_XT3qnSfPKTQoAPPUXpaANBjy-XiqbZ32abz34PQ@mail.gmail.com
In response to Re: Join push-down support for foreign tables  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Join push-down support for foreign tables  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers



On Thu, Sep 4, 2014 at 9:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Sep  4, 2014 at 08:41:43PM +0530, Atri Sharma wrote:
> > On Thursday, September 4, 2014, Bruce Momjian <bruce@momjian.us> wrote:
> >
> >     On Thu, Sep  4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
> >     > The main problem I see here is that accurate costing may require a
> >     > round-trip to the remote server.  If there is only one path that is
> >     > probably OK; the cost of asking the question will usually be more than
> >     > paid for by hearing that the pushed-down join clobbers the other
> >     > possible methods of executing the query.  But if there are many paths,
> >     > for example because there are multiple sets of useful pathkeys, it
> >     > might start to get a bit expensive.
> >     >
> >     > Probably both the initial cost and final cost calculations should be
> >     > delegated to the FDW, but maybe within postgres_fdw, the initial cost
> >     > should do only the work that can be done without contacting the remote
> >     > server; then, let the final cost step do that if appropriate.  But I'm
> >     > not entirely sure what is best here.
> >
> >     I am thinking eventually we will need to cache the foreign server
> >     statistics on the local server.
> >
> > Wouldn't that lead to issues where the statistics get outdated and we have to
> > anyways query the foreign server before planning any joins? Or are you thinking
> > of dropping the foreign table statistics once the foreign join is complete?
>
> I am thinking we would eventually have to cache the statistics, then get
> some kind of invalidation message from the foreign server.  I am also
> thinking that cache would have to be global across all backends, I guess
> similar to our invalidation cache.



That could lead to some bloat in the stored statistics, since we may have a lot of tables spread across a lot of foreign servers. Also, would we have VACUUM take care of running ANALYZE on the foreign tables?

Also, how will we decide that the statistics are invalid? Will the FDW query the foreign server and do some sort of comparison between the statistics the foreign server has and the statistics we have locally? I am trying to understand how the idea of an invalidation message from the foreign server would work.
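
To make the question concrete, here is a rough sketch of the "compare on every lookup" variant, written in Python only for brevity. Everything in it is hypothetical: the class and function names, and in particular the idea that the foreign server exposes a cheap version token that changes whenever its statistics change. None of this is existing PostgreSQL or postgres_fdw API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (foreign server name, foreign table name)


@dataclass
class CachedStats:
    version: int         # hypothetical token bumped when remote stats change
    row_estimate: float  # stand-in for the full set of statistics


class ForeignStatsCache:
    """Local cache of remote statistics, validated by a version token."""

    def __init__(self,
                 fetch_version: Callable[[Key], int],
                 fetch_stats: Callable[[Key], float]) -> None:
        self._fetch_version = fetch_version  # cheap remote call (assumed)
        self._fetch_stats = fetch_stats      # expensive remote call (assumed)
        self._entries: Dict[Key, CachedStats] = {}
        self.full_fetches = 0                # counts expensive refreshes

    def invalidate(self, key: Key) -> None:
        # Where a pushed invalidation message would land, if one existed.
        self._entries.pop(key, None)

    def get(self, key: Key) -> CachedStats:
        # Polling variant: still one round-trip per lookup for the token.
        remote_version = self._fetch_version(key)
        entry = self._entries.get(key)
        if entry is None or entry.version != remote_version:
            entry = CachedStats(remote_version, self._fetch_stats(key))
            self._entries[key] = entry
            self.full_fetches += 1
        return entry
```

Note that the polling in get() still costs a round-trip per planning cycle, which is exactly my concern; only a message pushed from the foreign server into something like invalidate() would avoid it, and that is the part I do not see how to build.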

Regards,

Atri 
