Re: PATCH: Batch/pipelining support for libpq - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: PATCH: Batch/pipelining support for libpq
Date
Msg-id CAGRY4nzf20aWm_VkAOK2QYa-H7A4wbLQHzzt9VjzTwo=4CU5_Q@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: Batch/pipelining support for libpq  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: PATCH: Batch/pipelining support for libpq  (Craig Ringer <craig.ringer@enterprisedb.com>)
List pgsql-hackers
On Thu, 11 Feb 2021 at 07:51, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Jan-21, Alvaro Herrera wrote:

> As you can see in an XXX comment in the libpq test program, the current
> implementation has the behavior that PQgetResult() returns NULL after a
> batch is finished and has reported PGRES_BATCH_END.  I don't know if
> there's a hard reason to do that, but I'd like to supress it because it
> seems weird and out of place.

Hello Craig, a question for you since this API is your devising.  I've
been looking at changing the way this works to prevent those NULL
returns from PQgetResult.  That is, instead of having what seems like a
"NULL separator" of query results, you'd just get the PGRES_BATCH_END to
signify a batch end (not a NULL result after the BATCH_END); and the
normal PGRES_COMMAND_OK or PGRES_TUPLES_OK etc when the result of a
command has been sent.  It's not working yet so I'm not sending an
updated patch, but I wanted to know if you had a rationale for including
this NULL return "separator" or was it just a convenience because of how
the code grew together.

The existing API for libpq actually specifies[1] that you should call PQgetResult() until it returns NULL:

> After successfully calling PQsendQuery, call PQgetResult one or more times to obtain the results. PQsendQuery cannot be called again (on the same connection) until PQgetResult has returned a null pointer, indicating that the command is done.

Similarly, in single-row mode, the existing API specifies that you should call PQgetResult() until it returns NULL.

Also, IIRC the protocol already permits multiple result sets to be returned, and the caller cannot safely assume that a single PQsendQuery() will produce only a single result set. (I really should write a test extension that exercises that and how libpq reacts to it.)

That's why I went for the NULL return. I think. It's been a while now so it's a bit fuzzy.

I would definitely like an API that does not rely on testing for a NULL return. Especially since NULL return can be ambiguous in the context of row-at-a-time mode. New explicit enumerations for PGresult would make a lot more sense.

So +1 from me for the general idea. I need to look at the patch as it has evolved soon too.

Remember that the original patch I submitted for this was a 1-day weekend hack and proof of concept to show that libpq could be modified to support query pipelining (and thus batching too), so I could illustrate the performance benefits that could be attained by doing so. I'd been aware of the benefits and the protocol's ability to support it for some time thanks to working on PgJDBC, but couldn't get anyone interested without some C code to demonstrate it, so I wrote some. I am not going to argue that the API I added for it is ideal in any way, and happy to see improvements.

The only change I would very strongly object to would be anything that turned this into a *batch* mode without query-pipelining support. If you have to queue all the queries up in advance then send them as a batch and wait for all the results, you miss out on a lot of the potential round-trip-time optimisations and you add initial latency. So long as there is a way to "send A", "send B", "send C", "read results from A", "send D", and there's a way for the application to associate some kind of state (an application specific id or index, a pointer to an application query-queue struct, whatever)  so it can match queries to results ... then I'm happy.

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Snapbuild woes followup
Next
From: Michael Paquier
Date:
Subject: Re: [DOC] add missing "[ NO ]" to various "DEPENDS ON" synopses