postgres_fdw & async queries - Mailing list pgsql-hackers

From Stephen Frost
Subject postgres_fdw & async queries
Date
Msg-id 20131104032604.GB2706@tamriel.snowman.net
List pgsql-hackers
Greetings,
 I was poking around the Append node and considering my earlier suggestion to build an Async method of pulling data out
of nodes under Append which support that option.  It looks workable, but I was also considering simply changing
postgres_fdw to use async queries instead, started at ExecInit time.
 A few more specifics about what I've been thinking:
 #1 - Add an Async mechanism to the executor
   - ExecAsyncCapable - Check if the node supports async.
   - ExecAsyncStart   - Begin an async request from the node
   - ExecAsyncGetFD   - Return FD to be used with select()
   - ExecAsyncConsume - Consume data from FD
   - ExecAsyncBusy    - Check if getResult would block
   - ExecAsyncScan    - Get next tuple
   - All of the above added to the FDW API also..
   This is clearly modeled off of the libpq async system and can likely
   be simplified, but it feels like we'd want at least things like
   GetFD/Consume/Busy, to allow us to build a select() inside Append
   which would wait until data is available somewhere and would then
   work through all of the nodes (dealing with ones which are done and
   don't return anything) until it's able to return a row back up.  As
   an aside- I've always wondered if we should have an explicit bulk
   mechanism instead of one-row-at-a-time and perhaps we build that
   into this API.  I was also thinking we would have an identifier of
   some kind provided through the API which would indicate which I/O
   channels are shared and we'd then set up Append with a two-level
   list of paths to walk, where it would issue one request against each
   distinct I/O channel and simply move on to the next entry for the
   same I/O channel when the prior one completes.
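   As a very rough, hand-wavy sketch (in C), this is roughly the shape
   I have in mind for the hooks and for an Append-side wait loop built
   on select(); the ExecAsync* names are the ones proposed above, while
   WaitForAsyncNode and the opaque struct stand-ins are just made up
   here for illustration:

#include <stdbool.h>
#include <sys/select.h>

typedef struct PlanState PlanState;             /* stand-in for the real executor node */
typedef struct TupleTableSlot TupleTableSlot;   /* stand-in for the real slot type */

extern bool            ExecAsyncCapable(PlanState *node);   /* does this node support async? */
extern void            ExecAsyncStart(PlanState *node);     /* kick off the async request */
extern int             ExecAsyncGetFD(PlanState *node);     /* FD to hand to select() */
extern void            ExecAsyncConsume(PlanState *node);   /* pull data off the FD */
extern bool            ExecAsyncBusy(PlanState *node);      /* would getting a tuple block? */
extern TupleTableSlot *ExecAsyncScan(PlanState *node);      /* get the next tuple */

/*
 * Wait until some async-capable child of Append has data available and
 * return the first child which can hand back a tuple without blocking.
 */
static PlanState *
WaitForAsyncNode(PlanState **children, int nchildren)
{
    fd_set      readfds;
    int         maxfd = -1;
    int         i;

    FD_ZERO(&readfds);
    for (i = 0; i < nchildren; i++)
    {
        int         fd = ExecAsyncGetFD(children[i]);

        FD_SET(fd, &readfds);
        if (fd > maxfd)
            maxfd = fd;
    }

    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
        return NULL;                    /* error handling elided */

    for (i = 0; i < nchildren; i++)
    {
        int         fd = ExecAsyncGetFD(children[i]);

        if (FD_ISSET(fd, &readfds))
        {
            ExecAsyncConsume(children[i]);
            if (!ExecAsyncBusy(children[i]))
                return children[i];     /* a tuple (or EOF) is ready here */
        }
    }
    return NULL;                        /* data arrived, but nothing ready yet */
}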
 #2 - Make postgres_fdw build/send an async query during ExecInitNode.
   Currently, we wait to set up the remote cursor and fetch records
   until we've actually been asked for a record- but then we go and try
   to get 100 of them.  It would seem like we might be able to simply
   call create_cursor() at the bottom of postgresBeginForeignScan and
   follow that up with a PQsendQuery (sketched below);
   postgresIterateForeignScan wouldn't really change except for not
   being asked to also create the cursor.  Sure, we'd end up blocking
   if there isn't data available for this node yet, but at least we'd
   get the query started during Init across all the remote servers,
   which would certainly be a massive improvement and we wouldn't need
   to modify Append or the FDW API at all.  Practically speaking, we'd
   parallelize the initial request of 100 tuples and then scan through
   the results in order, meaning we'd only pull from one machine at a
   time if the result set is larger than that initial 100 tuples per
   system.  That said, if most of the work on the remote systems is
   getting the query started, it would still be a win.
   This goes against the current documentation, which explicitly states
   that the actual scan should not be started until IterateForeignScan,
   but it's not clear, to me at least, why that's strictly necessary.
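   To make #2 a bit more concrete, here's a minimal libpq-level sketch
   of what firing off the scan early might look like; the struct and
   function names (PgFdwScanSketch, begin_remote_scan_early) are made
   up for illustration, the FETCH size of 100 just mirrors the existing
   behaviour, and the real change would of course live inside
   postgres_fdw's create_cursor()/fetch_more_data() machinery:

#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

/* Simplified stand-in for postgres_fdw's per-scan state. */
typedef struct PgFdwScanSketch
{
    PGconn     *conn;           /* connection to the remote server */
    int         cursor_number;  /* used to build the cursor name */
    bool        async_pending;  /* a FETCH was sent but not yet drained */
} PgFdwScanSketch;

/*
 * Declare the cursor and immediately send the first FETCH asynchronously,
 * so the remote server starts working during ExecInit rather than on the
 * first IterateForeignScan call.  (Error handling elided.)
 */
static void
begin_remote_scan_early(PgFdwScanSketch *fsstate, const char *remote_sql)
{
    char        buf[1024];

    snprintf(buf, sizeof(buf), "DECLARE c%d CURSOR FOR %s",
             fsstate->cursor_number, remote_sql);
    PQclear(PQexec(fsstate->conn, buf));

    snprintf(buf, sizeof(buf), "FETCH 100 FROM c%d",
             fsstate->cursor_number);
    if (PQsendQuery(fsstate->conn, buf) == 1)
        fsstate->async_pending = true;
}

   postgresIterateForeignScan would then just drain the pending result
   with PQgetResult() the first time it's called, instead of issuing
   the FETCH itself.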
 One other thought going in favor of #1 is that we could make other
 nodes, such as SeqScan, support the new API, which would allow us to
 parallelize across, say, multiple tablespaces (on the presumption that
 they are independent I/O systems underneath, which may or may not be
 true, of course; perhaps we could explicitly ask the user for the I/O
 channel relationship during tablespace creation).  With either
 suggestion, we would need to ensure that postgres_fdw works through
 the entire Async query and stores the results before trying to do
 another Async query, if we're going to keep the one-connection model,
 as we can't have two Async queries running at the same time.  I don't
 see that as a terrible issue though- we're already fetching/caching up
 to 100 rows of data for the foreign table anyway, and we'd still have
 the actual cursor.  If we get a request for a different cursor while
 we have an Async still open, we'd just finish out the current Async
 request of 100-or-fewer tuples, cache them, and then send a new
 request for the new relation.
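 For what it's worth, that "finish out the current Async request"
 handling maps pretty directly onto libpq; a minimal sketch, with a
 hypothetical cache_result() callback standing in for wherever we'd
 stash the tuples:

#include <libpq-fe.h>

/*
 * Drain whatever async FETCH is still in flight on this connection,
 * handing each result to the caller's caching routine.  PQgetResult()
 * blocks as needed and returns NULL once the pending query is fully
 * consumed, at which point the connection is free for the next
 * PQsendQuery() against the other relation's cursor.
 */
static void
switch_to_new_scan(PGconn *conn, const char *new_fetch_sql,
                   void (*cache_result)(PGresult *res))
{
    PGresult   *res;

    while ((res = PQgetResult(conn)) != NULL)
        cache_result(res);              /* cache the 100-or-fewer tuples */

    PQsendQuery(conn, new_fetch_sql);   /* error handling elided */
}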
 This is all speculation at this point as I've not gotten down into the
 details of trying to implement any of it, so please feel free to point
 out any big holes. :)
 Thoughts?
   Thanks,
       Stephen
