Re: PATCH: Batch/pipelining support for libpq - Mailing list pgsql-hackers

From Matthieu Garrigues
Subject Re: PATCH: Batch/pipelining support for libpq
Msg-id CAJkzx4S6YTKAJrasb4hSaM8tNHNrgUk-bxB4Da7sT++D9Zq7cg@mail.gmail.com
In response to Re: PATCH: Batch/pipelining support for libpq  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
Hi David,

Thanks for the feedback. I reworked the docs a bit based on your
remarks. Here is the v24 patch.

Matthieu Garrigues

On Tue, Nov 3, 2020 at 6:21 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> On 2020-Nov-02, Alvaro Herrera wrote:
>>
>> > In v23 I've gone over docs; discovered that PQgetResult docs were
>> > missing the new values.  Added those.  No significant other changes yet.
>>
>
> Just reading the documentation of this patch, haven't been following the longer thread:
>
> Given the caveats around blocking mode connections, why not just require non-blocking mode, in a similar fashion to how synchronous functions are disallowed?
>
> "Batched operations will be executed by the server in the order the client
> sends them. The server will send the results in the order the statements
> executed."
>
> Maybe:
>
> "The server executes statements, and returns results, in the order the client sends them."
>
> Using two sentences and relying on the user to mentally link the two "in the order" descriptions together seems to add unnecessary cognitive load.
>
> +     The client <link linkend="libpq-batch-interleave">interleaves result
> +     processing</link> with sending batch queries, or for small batches may
> +     process all results after sending the whole batch.
>
> Suggest: "The client may choose to interleave result processing with sending batch queries, or wait until the complete batch has been sent."
>
> I would expect to process the results of a batch only after sending the entire batch to the server.  That I don't have to is informative, but knowing when I should avoid doing so, and why, is informative as well.  Taken to the extreme, while you can use batch mode and interleave, if you just poll getResult after every command you will make the whole batch thing pointless.  Directing the reader from here to the section "Interleaving Result Processing and Query Dispatch" seems worth considering.  The dynamics of small sizes and sockets remain a bit unclear as to what will break (if anything, or is it just process memory on the server) if interleaving is not performed and sizes are large.
>
> I would suggest placing commentary about "all transactions subsequent to a failed transaction in a batch are ignored while previous completed transactions are retained" in the "When to Use Batching" section.  Something like "Batching is less useful, and more complex, when a single batch contains multiple transactions (see Error Handling)."
>
> My imagined use case would be to open a batch, start a transaction, send all of its components, end the transaction, end the batch, check for batch failure and, if it doesn't fail, have the option to easily continue without processing individual pgResults (or, if it does fail, have the option to extract the first error pgResult and continue, ignoring the rest, knowing that the transaction as a whole was reverted and the batch unapplied).  I've never interfaced with libpq directly, though given how the existing C API works, what is implemented here seems consistent.
>
> The "queueing up queries into a pipeline to be executed as a batch on the server" can be read as a client-side behavior where nothing is sent to the server until the batch has been completed.  Reading further, it becomes clear that it is basically a server-side toggle that instructs the server to continue processing incoming commands even while prior commands have their results waiting to be ingested by the client.
>
> Batch seems like the user-visible term to describe this feature.  Pipeline seems like an implementation detail that doesn't need to be mentioned in the documentation, especially given that pipeline doesn't get a mention beyond the first two paragraphs of the chapter and never without being linked directly to "batch".  I would probably leave the indexterm and have a paragraph describing that batching is implemented using a query pipeline so that people with the implementation detail on their mind can find this chapter, but the prose for the user should just stick to batching.
>
> Sorry, that all is a bit unfocused, but the documentation for the user of the API could be cleaned up a bit, and some more words could be spent on what trade-offs are being made when using batching versus normal command-response processing.  That said, while I don't see all of this as purely a matter of style, I'm also not seeing anything demonstrably wrong with the documentation at the moment.  Hopefully my perspective helps, though, and depending on what happens next I may try to make my thoughts more concrete with an actual patch.
>
> David J.
>
