Re: Speed dblink using alternate libpq tuple storage - Mailing list pgsql-hackers

From Marko Kreen
Subject Re: Speed dblink using alternate libpq tuple storage
Date
Msg-id 20120401230327.GA24637@gmail.com
Whole thread Raw
In response to Re: Speed dblink using alternate libpq tuple storage  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Speed dblink using alternate libpq tuple storage  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Apr 01, 2012 at 05:51:19PM -0400, Tom Lane wrote:
> I've been thinking some more about the early-termination cases (where
> the row processor returns zero or longjmps), and I believe I have got
> proposals that smooth off most of the rough edges there.
> 
> First off, returning zero is really pretty unsafe as it stands, because
> it only works more-or-less-sanely if the connection is being used in
> async style.  If the row processor returns zero within a regular
> PQgetResult call, that will cause PQgetResult to block waiting for more
> input.  Which might not be forthcoming, if we're in the last bufferload
> of a query response from the server.  Even in the async case, I think
> it's a bad design to have PQisBusy return true when the row processor
> requested stoppage.  In that situation, there is work available for the
> outer application code to do, whereas normally PQisBusy == true means
> we're still waiting for the server.
> 
> I think we can fix this by introducing a new PQresultStatus, called say
> PGRES_SUSPENDED, and having PQgetResult return an empty PGresult with
> status PGRES_SUSPENDED after the row processor has returned zero.
> Internally, there'd also be a new asyncStatus PGASYNC_SUSPENDED,
> which we'd set before exiting from the getAnotherTuple call.  This would
> cause PQisBusy and PQgetResult to do the right things.  In PQgetResult,
> we'd switch back to PGASYNC_BUSY state after producing a PGRES_SUSPENDED
> result, so that subsequent calls would resume parsing input.
> 
> With this design, a suspending row processor can be used safely in
> either async or non-async mode.  It does cost an extra PGresult creation
> and deletion per cycle, but that's not much more than a malloc and free.

I added extra magic to PQisBusy(), you are adding extra magic to
PQgetResult().  Not much difference.

Seems we both lost sight of actual usage scenario for the early-exit
logic - that both callback and upper-level code *must* cooperate
for it to be useful.  Instead, we designed API for non-cooperating case,
which is wrong.

So the proper approach would be to have new API call, designed to
handle it, and allow early-exit only from there.

That would also avoid any breakage of old APIs.  Also it would avoid
any accidental data loss, if the user code does not have exactly
right sequence of calls.

How about PQisBusy2(), which returns '2' when early-exit is requested?
Please suggest something better...


-- 
marko



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: measuring lwlock-related latency spikes
Next
From: Tom Lane
Date:
Subject: Re: Speed dblink using alternate libpq tuple storage