Re: PATCH: Batch/pipelining support for libpq - Mailing list pgsql-hackers
From | Matthieu Garrigues |
---|---|
Subject | Re: PATCH: Batch/pipelining support for libpq |
Date | |
Msg-id | CAJkzx4S6YTKAJrasb4hSaM8tNHNrgUk-bxB4Da7sT++D9Zq7cg@mail.gmail.com |
In response to | Re: PATCH: Batch/pipelining support for libpq ("David G. Johnston" <david.g.johnston@gmail.com>) |
List | pgsql-hackers |
Hi David,

Thanks for the feedback. I reworked the doc a bit based on your remarks. Here is the v24 patch.

Matthieu Garrigues

On Tue, Nov 3, 2020 at 6:21 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> On 2020-Nov-02, Alvaro Herrera wrote:
>>
>> > In v23 I've gone over docs; discovered that PQgetResults docs were
>> > missing the new values. Added those. No significant other changes yet.
>>
>
> Just reading the documentation of this patch, haven't been following the longer thread:
>
> Given the caveats around blocking mode connections, why not just require non-blocking mode, in a similar fashion to how synchronous functions are disallowed?
>
> "Batched operations will be executed by the server in the order the client
> sends them. The server will send the results in the order the statements
> executed."
>
> Maybe:
>
> "The server executes statements, and returns results, in the order the client sends them."
>
> Using two sentences and relying on the user to mentally link the two "in the order" descriptions together seems to add unnecessary cognitive load.
>
> + The client <link linkend="libpq-batch-interleave">interleaves result
> + processing</link> with sending batch queries, or for small batches may
> + process all results after sending the whole batch.
>
> Suggest: "The client may choose to interleave result processing with sending batch queries, or wait until the complete batch has been sent."
>
> I would expect to process the results of a batch only after sending the entire batch to the server. That I don't have to is informative, but knowing when I should avoid doing so, and why, is informative as well. Taken to the extreme, while you can use batch mode and interleave, if you just poll getResult after every command you will make the whole batch thing pointless. Directing the reader from here to the section "Interleaving Result Processing and Query Dispatch" seems worth considering. The dynamics of small sizes and sockets remain a bit unclear as to what will break (if anything, or is it just process memory on the server) if interleaving is not performed and sizes are large.
>
> I would suggest placing commentary about "all transactions subsequent to a failed transaction in a batch are ignored while previous completed transactions are retained" in the "When to Use Batching". Something like "Batching is less useful, and more complex, when a single batch contains multiple transactions (see Error Handling)."
>
> My imagined use case would be to open a batch, start a transaction, send all of its components, end the transaction, end the batch, check for batch failure and, if it doesn't fail, have the option to easily continue without processing individual pgResults (or if it does fail, have the option to extract the first error pgResult and continue, ignoring the rest, knowing that the transaction as a whole was reverted and the batch unapplied). I've never interfaced with libpq directly. Though given how the existing C API works, what is implemented here seems consistent.
>
> The "queueing up queries into a pipeline to be executed as a batch on the server" can be read as a client-side behavior where nothing is sent to the server until the batch has been completed. Reading further, it becomes clear that all it basically is is a server-side toggle that instructs the server to continue processing incoming commands even while prior commands have their results waiting to be ingested by the client.
>
> Batch seems like the user-visible term to describe this feature. Pipeline seems like an implementation detail that doesn't need to be mentioned in the documentation - especially given that pipeline doesn't get mentioned beyond the first two paragraphs of the chapter and never without being linked directly to "batch". I would probably leave the indexterm and have a paragraph describing that batching is implemented using a query pipeline so that people with the implementation detail on their mind can find this chapter, but the prose for the user should just stick to batching.
>
> Sorry, that all is a bit unfocused, but the documentation for the user of the API could be cleaned up a bit and some more words spent on what trade-offs are being made when using batching versus normal command-response processing. That said, while I don't see all of this as purely a matter of style, I'm also not seeing anything demonstrably wrong with the documentation at the moment. Hopefully my perspective helps though, and depending on what happens next I may try and make my thoughts more concrete with an actual patch.
>
> David J.
>
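
To make the error-handling point quoted above concrete ("all transactions subsequent to a failed transaction in a batch are ignored while previous completed transactions are retained"), here is a toy model in plain C. It is only a sketch and is not libpq code or anything taken from the patch: `BatchCmd`, `ResStatus` and `simulate_batch` are made-up names, and the model assumes a single batch in which the first failing command causes every later command to be skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are libpq API. */
typedef enum { CMD_BEGIN, CMD_STATEMENT, CMD_COMMIT } CmdKind;
typedef enum { RES_OK, RES_ERROR, RES_ABORTED } ResStatus;

typedef struct
{
    CmdKind kind;
    int     fails;      /* nonzero: pretend the server rejects this command */
} BatchCmd;

/*
 * Walk the batch in the order the commands were sent and record one
 * result per command, in that same order.  After the first error, every
 * remaining command in the batch is marked as aborted (assumption of
 * this sketch).  Returns the number of COMMITs that succeeded, i.e. the
 * transactions that are retained.
 */
static int
simulate_batch(const BatchCmd *cmds, size_t n, ResStatus *results)
{
    int aborted = 0;
    int committed = 0;

    assert(cmds != NULL && results != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (aborted)
        {
            results[i] = RES_ABORTED;   /* skipped, never executed */
            continue;
        }
        if (cmds[i].fails)
        {
            results[i] = RES_ERROR;     /* first failure ends the batch */
            aborted = 1;
            continue;
        }
        results[i] = RES_OK;
        if (cmds[i].kind == CMD_COMMIT)
            committed++;
    }
    return committed;
}
```

With a batch of three BEGIN/statement/COMMIT transactions where the statement of the second one fails, this model reports the first transaction as committed, the failing statement as an error, and everything after it (including the whole third transaction) as aborted. That is the situation where a client has to inspect individual results to learn how far the batch got.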