Re: Append with naive multiplexing of FDWs - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Append with naive multiplexing of FDWs
Date
Msg-id 20191205.122637.1855053671071685153.horikyota.ntt@gmail.com
In response to Re: Append with naive multiplexing of FDWs  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Append with naive multiplexing of FDWs
List pgsql-hackers
Hello.

At Sat, 30 Nov 2019 14:26:11 -0500, Bruce Momjian <bruce@momjian.us> wrote in 
> On Sun, Nov 17, 2019 at 09:54:55PM +1300, Thomas Munro wrote:
> > On Sat, Sep 28, 2019 at 4:20 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > On Wed, Sep  4, 2019 at 06:18:31PM +1200, Thomas Munro wrote:
> > > > A few years back[1] I experimented with a simple readiness API that
> > > > would allow Append to start emitting tuples from whichever Foreign
> > > > Scan has data available, when working with FDW-based sharding.  I used
> > > > that primarily as a way to test Andres's new WaitEventSet stuff and my
> > > > kqueue implementation of that, but I didn't pursue it seriously
> > > > because I knew we wanted a more ambitious async executor rewrite and
> > > > many people had ideas about that, with schedulers capable of jumping
> > > > all over the tree etc.
> > > >
> > > > Anyway, Stephen Frost pinged me off-list to ask about that patch, and
> > > > asked why we don't just do this naive thing until we have something
> > > > better.  It's a very localised feature that works only between Append
> > > > and its immediate children.  The patch makes it work for postgres_fdw,
> > > > but it should work for any FDW that can get its hands on a socket.
> > > >
> > > > Here's a quick rebase of that old POC patch, along with a demo.  Since
> > > > 2016, Parallel Append landed, but I didn't have time to think about
> > > > how to integrate with that so I did a quick "sledgehammer" rebase that
> > > > disables itself if parallelism is in the picture.
> > >
> > > Yes, sharding has been waiting on parallel FDW scans.  Would this work
> > > for parallel partition scans if the partitions were FDWs?
> > 
> > Yeah, this works for partitions that are FDWs (as shown), but only for
> > Append, not for Parallel Append.  So you'd have parallelism in the
> > sense that your N remote shard servers are all doing stuff at the same
> > time, but it couldn't be in a parallel query on your 'home' server,
> > which is probably good for things that push down aggregation and bring
> > back just a few tuples from each shard, but bad for anything wanting
> > to ship back millions of tuples to chew on locally.  Do you think
> > that'd be useful enough on its own?
> 
> Yes, I think so.  There are many data warehouse queries that want to
> return only aggregate values, or filter for a small number of rows. 
> Even OLTP queries might return only a few rows from multiple partitions.
> This would allow for a proof-of-concept implementation so we can see how
> realistic this approach is.
> 
> > The problem is that parallel safe non-partial plans (like postgres_fdw
> > scans) are exclusively 'claimed' by one process under Parallel Append,
> > so with the patch as posted, if you modify it to allow parallelism
> > then it'll probably give correct answers but nothing prevents a single
> > process from claiming and starting all the scans and then waiting for
> > them to be ready, while the other processes miss out on doing any work
> > at all.  There's probably some kludgy solution involving not letting
> > any one worker start more than X, and some space cadet solution
> > involving passing sockets around and teaching libpq to hand over
> > connections at certain controlled phases of the protocol (due to lack
> > of threads), but nothing like that has jumped out as the right path so
> > far.
> 
> I am unclear how many queries can do any meaningful work until all
> shards have given their full results.

There's my pending (somewhat stale) patch, which lets local scans run
while waiting for remote servers.

https://www.postgresql.org/message-id/20180515.202945.69332784.horiguchi.kyotaro@lab.ntt.co.jp

I (or we) wanted to introduce the asynchronous node mechanism as the
basis of an async-capable postgres_fdw. The reason it has stalled is
that we are expecting, and I am waiting for, the executor change that
turns the executor into a push-up style, on which the async-node
mechanism would be built. If that doesn't happen soon, I'd like to
resume that work.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center