Thread: Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> > Regarding test patch, I have corrected the test suite after David Steele's
> > comments.
> > Also, I would like to mention that a companion patch was submitted by David
> > Steele up-thread.
> >
> > Attached the latest code and test patch.
>
> My impression is that this'll need a couple more rounds of review. Given
> that this'll establish API we're pretty much never going to be able to
> change/remove, I think it'd be a bad idea to rush this into v10.
> Therefore I propose moving this to the next CF.

Craig, Vaishnavi, everyone else: Are you planning to continue to work on
this for v11? I'm willing to do another round, but only if it's
worthwhile.

FWIW, I still think this needs a pgbench or similar example integration,
so we can actually properly measure the benefits.

- Andres
On Tue, Jun 20, 2017 at 8:49 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> > Regarding test patch, I have corrected the test suite after David Steele's
> > comments.
> > Also, I would like to mention that a companion patch was submitted by David
> > Steele up-thread.
> >
> > Attached the latest code and test patch.
>
> My impression is that this'll need a couple more rounds of review. Given
> that this'll establish API we're pretty much never going to be able to
> change/remove, I think it'd be a bad idea to rush this into v10.
> Therefore I propose moving this to the next CF.
> Craig, Vaishnavi, everyone else: Are you planning to continue to work on
> this for v11? I'm willing to do another round, but only if it's
> worthwhile.
Yes, I am willing to continue working on this patch for v11.
> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.
I will investigate on this further.
Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
On 20 June 2017 at 06:49, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
>> > Regarding test patch, I have corrected the test suite after David Steele's
>> > comments.
>> > Also, I would like to mention that a companion patch was submitted by David
>> > Steele up-thread.
>> >
>> > Attached the latest code and test patch.
>>
>> My impression is that this'll need a couple more rounds of review. Given
>> that this'll establish API we're pretty much never going to be able to
>> change/remove, I think it'd be a bad idea to rush this into v10.
>> Therefore I propose moving this to the next CF.
>
> Craig, Vaishnavi, everyone else: Are you planning to continue to work on
> this for v11? I'm willing to do another round, but only if it's
> worthwhile.

I'm happy to work on review, and will try to make some time, but have
to focus primarily on logical rep infrastructure. This patch was a
proof of concept and fun hack for me and while I'm glad folks are
interested, it's not something I can dedicate much time to. Especially
with a 6-week-old baby now....

> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.

I agree. I originally wanted to patch psql, but it's pretty intrusive.
pgbench is likely a better target. Also pg_restore.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 20, 2017 at 10:43 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Especially with a 6-week-old baby now....

Congratulations!
-- 
Michael
Andres Freund wrote:

> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.

Here's an updated version of the patch I made during review, adding
\beginbatch and \endbatch to pgbench. The performance improvement
appears clearly with a custom script of this kind:

\beginbatch
UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
..above repeated 1000 times...
\endbatch

versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch.

On localhost on my desktop I tend to see a 30% difference in favor of
the batch mode with that kind of test. On slower networks there are
much bigger differences.

The latest main patch (v10) must also be slightly updated for HEAD,
because of this:
error: patch failed: src/interfaces/libpq/exports.txt:171
v11 attached without any other change.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
> Andres Freund wrote:
>
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
>
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
> \beginbatch
> UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
> ..above repeated 1000 times...
> \endbatch
>
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
>
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive. Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput. And that's locally over a unix socket - the gain over
actual network will be larger.

\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\set aid random(1, :naccounts)
\set bid random(1, :nbranches)
\set tid random(1, :ntellers)
\set delta random(-5000, 5000)
\beginbatch
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
\endbatch

- Andres
On 22 Jun. 2017 07:40, "Andres Freund" <andres@anarazel.de> wrote:
On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
> Andres Freund wrote:
>
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
>
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
> \beginbatch
> UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
> ..above repeated 1000 times...
> \endbatch
>
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
>
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive. Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput. And that's locally over a unix socket - the gain over
actual network will be larger.
In my original tests I got over a 300x improvement on WAN :) . I should check if the same applies with pgbench.
On 2017-06-21 16:40:48 -0700, Andres Freund wrote:
> On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
> > Andres Freund wrote:
> >
> > > FWIW, I still think this needs a pgbench or similar example integration,
> > > so we can actually properly measure the benefits.
> >
> > Here's an updated version of the patch I made during review,
> > adding \beginbatch and \endbatch to pgbench.
> > The performance improvement appears clearly
> > with a custom script of this kind:
> > \beginbatch
> > UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
> > ..above repeated 1000 times...
> > \endbatch
> >
> > versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> >
> > On localhost on my desktop I tend to see a 30% difference in favor
> > of the batch mode with that kind of test.
> > On slower networks there are much bigger differences.
>
> This is seriously impressive. Just using the normal pgbench mixed
> workload, wrapping a whole transaction into a batch *doubles* the
> throughput. And that's locally over a unix socket - the gain over
> actual network will be larger.

I've not analyzed this further, but something with the way network is
done isn't yet quite right either in the pgbench patch or in the libpq
patch. You'll currently get IO like:

sendto(3, "B\0\0\0\22\0P1_2\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 36, MSG_NOSIGNAL, NULL, 0) = 36
sendto(3, "B\0\0\0\35\0P1_4\0\0\0\0\1\0\0\0\0073705952\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_6\0\0\0\0\1\0\0\0\0077740854\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_8\0\0\0\0\1\0\0\0\0071570280\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_10\0\0\0\0\1\0\0\0\0072634305\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_12\0\0\0\0\1\0\0\0\0078960656\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_14\0\0\0\0\1\0\0\0\0073030370\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\35\0P1_16\0\0\0\0\1\0\0\0\006376125\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_18\0\0\0\0\1\0\0\0\0072982423\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_20\0\0\0\0\1\0\0\0\0073860195\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_22\0\0\0\0\1\0\0\0\0072794433\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_24\0\0\0\0\1\0\0\0\0075475271\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\23\0P1_25\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t"..., 37, MSG_NOSIGNAL, NULL, 0) = 37
sendto(3, "S\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
recvfrom(3, "2\0\0\0\4n\0\0\0\4C\0\0\0\nBEGIN\0002\0\0\0\4T\0\0\0!\0"..., 16384, 0, NULL, NULL) = 775
recvfrom(3, 0x559a02667ff2, 15630, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667fb1, 15695, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667f6c, 15764, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667f2b, 15829, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667eea, 15894, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667ea9, 15959, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667e68, 16024, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667e24, 16092, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667de3, 16157, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667da2, 16222, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667d5d, 16291, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667d1c, 16356, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667d06, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

I.e. we're doing tiny write send() syscalls (they should be coalesced)
and then completely unnecessarily call recv() over and over again
without polling. To me it looks very much like we're just doing either
exactly once per command...

- Andres
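The pattern in the trace above (one send() per Bind/Describe/Execute message) is exactly what output buffering avoids: append each small protocol message to a buffer and hit the wire once per batch. A minimal, generic sketch of that idea — names and sizes are illustrative, not libpq's actual internals:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch only: queue small protocol messages into one
 * output buffer so a whole batch goes out in a single send(), instead
 * of one syscall per message.
 */
#define OUTBUF_SIZE 8192

typedef struct
{
	char		data[OUTBUF_SIZE];
	size_t		used;			/* bytes queued but not yet sent */
	int			flushes;		/* how many times we hit the wire */
} OutBuf;

/* Append one message; returns total bytes queued, or -1 on overflow. */
static int
outbuf_queue(OutBuf *buf, const char *msg, size_t len)
{
	if (buf->used + len > OUTBUF_SIZE)
		return -1;
	memcpy(buf->data + buf->used, msg, len);
	buf->used += len;
	return (int) buf->used;
}

/* One syscall for everything queued so far (send() stubbed out here). */
static void
outbuf_flush(OutBuf *buf)
{
	if (buf->used > 0)
	{
		/* send(fd, buf->data, buf->used, MSG_NOSIGNAL) would go here */
		buf->flushes++;
		buf->used = 0;
	}
}
```

With this shape, the thirteen sendto() calls in the trace would collapse into one flush at sync time.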
On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:

> I.e. we're doing tiny write send() syscalls (they should be coalesced)

That's likely worth doing, but can probably wait for a separate patch.
The kernel will usually do some packet aggregation unless we use
TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
is IMO not worth worrying about just yet.

> and then completely unnecessarily call recv() over and over again
> without polling. To me it looks very much like we're just doing either
> exactly once per command...

Yeah, that looks suspect.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:
>
> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
>
> That's likely worth doing, but can probably wait for a separate patch.

I don't think so, we should get this right, it could have API influence.

> The kernel will usually do some packet aggregation unless we use
> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> is IMO not worth worrying about just yet.

1)
	/*
	 * Select socket options: no delay of outgoing data for
	 * TCP sockets, nonblock mode, close-on-exec.  Fail if any
	 * of this fails.
	 */
	if (!IS_AF_UNIX(addr_cur->ai_family))
	{
		if (!connectNoDelay(conn))
		{
			pqDropConnection(conn, true);
			conn->addr_cur = addr_cur->ai_next;
			continue;
		}
	}

2) Even if nodelay weren't set, this can still lead to smaller packets
   being sent, because you start sending normal sized tcp packets,
   rather than jumbo ones, even if configured (pretty common these
   days).

3) Syscall overhead is actually quite significant.

- Andres
On 22 June 2017 at 09:07, Andres Freund <andres@anarazel.de> wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
>> On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:
>>
>> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
>>
>> That's likely worth doing, but can probably wait for a separate patch.
>
> I don't think so, we should get this right, it could have API influence.
>
>> The kernel will usually do some packet aggregation unless we use
>> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
>> is IMO not worth worrying about just yet.
>
> 1)
> 	/*
> 	 * Select socket options: no delay of outgoing data for
> 	 * TCP sockets, nonblock mode, close-on-exec.  Fail if any
> 	 * of this fails.
> 	 */
> 	if (!IS_AF_UNIX(addr_cur->ai_family))
> 	{
> 		if (!connectNoDelay(conn))
> 		{
> 			pqDropConnection(conn, true);
> 			conn->addr_cur = addr_cur->ai_next;
> 			continue;
> 		}
> 	}
>
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>    being sent, because you start sending normal sized tcp packets,
>    rather than jumbo ones, even if configured (pretty common these
>    days).
>
> 3) Syscall overhead is actually quite significant.

Fair enough, and *headdesk* re not checking NODELAY. I thought I'd
checked for our use of that before, but I must've remembered wrong.

We could use TCP_CORK but it's not portable and it'd be better to just
collect up a buffer to dispatch.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
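For reference, disabling Nagle's algorithm is a single setsockopt() call; libpq's connectNoDelay() quoted above is essentially a wrapper around it. A standalone sketch, following the same 1-on-success/0-on-failure convention:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Sketch of what connectNoDelay() does conceptually: turn off Nagle's
 * algorithm so small writes aren't delayed in the kernel waiting for
 * more data.  Returns 1 on success, 0 on failure.
 */
static int
set_nodelay(int fd)
{
	int			on = 1;

	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
				   (char *) &on, sizeof(on)) < 0)
		return 0;
	return 1;
}
```

With TCP_NODELAY set, the kernel will not coalesce the tiny writes for you, which is why coalescing in the client buffer matters.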
On 2017-06-21 18:07:21 -0700, Andres Freund wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> > On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:
> >
> > > I.e. we're doing tiny write send() syscalls (they should be coalesced)
> >
> > That's likely worth doing, but can probably wait for a separate patch.
>
> I don't think so, we should get this right, it could have API influence.
>
> > The kernel will usually do some packet aggregation unless we use
> > TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> > is IMO not worth worrying about just yet.
>
> 1)
> 	/*
> 	 * Select socket options: no delay of outgoing data for
> 	 * TCP sockets, nonblock mode, close-on-exec.  Fail if any
> 	 * of this fails.
> 	 */
> 	if (!IS_AF_UNIX(addr_cur->ai_family))
> 	{
> 		if (!connectNoDelay(conn))
> 		{
> 			pqDropConnection(conn, true);
> 			conn->addr_cur = addr_cur->ai_next;
> 			continue;
> 		}
> 	}
>
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>    being sent, because you start sending normal sized tcp packets,
>    rather than jumbo ones, even if configured (pretty common these
>    days).
>
> 3) Syscall overhead is actually quite significant.

Proof of the pudding: pgbench of 10 pgbench select statements in a batch:

as submitted by Daniel:
pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 10000 -P 1 -f ~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 24175.5 tps, lat 0.647 ms stddev 0.782
progress: 2.0 s, 27737.6 tps, lat 0.577 ms stddev 0.625
progress: 3.0 s, 28853.3 tps, lat 0.554 ms stddev 0.619
progress: 4.0 s, 26660.8 tps, lat 0.600 ms stddev 0.776
progress: 5.0 s, 30023.8 tps, lat 0.533 ms stddev 0.484
progress: 6.0 s, 29959.3 tps, lat 0.534 ms stddev 0.450
progress: 7.0 s, 29944.9 tps, lat 0.534 ms stddev 0.536
progress: 8.0 s, 30137.7 tps, lat 0.531 ms stddev 0.533
progress: 9.0 s, 30285.2 tps, lat 0.528 ms stddev 0.479
progress: 10.0 s, 30228.7 tps, lat 0.529 ms stddev 0.460
progress: 11.0 s, 29921.4 tps, lat 0.534 ms stddev 0.613
progress: 12.0 s, 29982.4 tps, lat 0.533 ms stddev 0.510
progress: 13.0 s, 29247.4 tps, lat 0.547 ms stddev 0.526
progress: 14.0 s, 28757.3 tps, lat 0.556 ms stddev 0.635
progress: 15.0 s, 29035.3 tps, lat 0.551 ms stddev 0.523
^C

sample vmstat:
 r  b swpd   free   buff    cache si so bi bo   in     cs us sy id wa st
19  0    0 488992 787332 23558676  0  0  0  0 9720 455099 65 35  0  0  0
(i.e. ~450k context switches)

hackily patched:
pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 10000 -P 1 -f ~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 40545.2 tps, lat 0.386 ms stddev 0.625
progress: 2.0 s, 48158.0 tps, lat 0.332 ms stddev 0.277
progress: 3.0 s, 50125.7 tps, lat 0.319 ms stddev 0.204
progress: 4.0 s, 50740.6 tps, lat 0.315 ms stddev 0.250
progress: 5.0 s, 50795.6 tps, lat 0.315 ms stddev 0.246
progress: 6.0 s, 51195.6 tps, lat 0.312 ms stddev 0.207
progress: 7.0 s, 50746.7 tps, lat 0.315 ms stddev 0.264
progress: 8.0 s, 50619.1 tps, lat 0.316 ms stddev 0.250
progress: 9.0 s, 50619.4 tps, lat 0.316 ms stddev 0.228
progress: 10.0 s, 46967.8 tps, lat 0.340 ms stddev 0.499
progress: 11.0 s, 50480.1 tps, lat 0.317 ms stddev 0.239
progress: 12.0 s, 50242.5 tps, lat 0.318 ms stddev 0.286
progress: 13.0 s, 49912.7 tps, lat 0.320 ms stddev 0.266
progress: 14.0 s, 49841.7 tps, lat 0.321 ms stddev 0.271
progress: 15.0 s, 49807.1 tps, lat 0.321 ms stddev 0.248
^C

sample vmstat:
 r  b swpd   free   buff    cache si so bi bo   in     cs us sy id wa st
23  0    0 482008 787312 23558996  0  0  0  0 8219 105097 87 14  0  0  0
(i.e. ~100k context switches)

That's *localhost*. It's completely possible that I've screwed something
up here, I didn't test it besides running pgbench, but the send/recv'd
data looks like it's similar amounts of data, just fewer syscalls.

Greetings,

Andres Freund
Attachment
Craig Ringer wrote:

> The kernel will usually do some packet aggregation unless we use
> TCP_NODELAY (which we don't and shouldn't)

Not sure. As a point of comparison, Oracle has it as a tunable parameter
(TCP.NODELAY), and they changed its default from No to Yes starting from
their 10g R2.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Andres Freund wrote:

-	if (pqFlush(conn) < 0)
-		goto sendFailed;
+	if (conn->batch_status == PQBATCH_MODE_OFF)
+	{
+		/*
+		 * Give the data a push.  In nonblock mode, don't complain if we're unable
+		 * to send it all; PQgetResult() will do any additional flushing needed.
+		 */
+		if (pqFlush(conn) < 0)
+			goto sendFailed;
+	}

Seems to be responsible for roughly a 1.7x speedup in tps and an
equivalent decrease in latency, based on the "progress" info.
I wonder how much of that corresponds to a decrease in the number of
packets versus the number of syscalls. Both matter, I guess.

But OTOH there are certainly batch workloads where it will be preferable
for the first query to reach the server ASAP, rather than waiting to be
coalesced with the next ones. libpq is not going to know what's best.

One option may be to leave that decision to the user by providing a
PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
function. Maybe we could even let the user set the size of the sending
buffer, so those who really want to squeeze performance may tune it for
their network and workload.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Hi,

On 2017-06-22 13:43:35 +0200, Daniel Verite wrote:
> But OTOH there are certainly batch workloads where it will be preferable
> for the first query to reach the server ASAP, rather than waiting to be
> coalesced with the next ones.

Is that really something people expect from a batch API? I suspect it's
not really, and nothing would stop one from adding PQflush() or similar
calls if desirable anyway.

FWIW, the way I did that in the hack clearly isn't ok: If you were to
send a gigabyte of queries, it'd buffer them all up in memory... So some
more intelligence is going to be needed.

> libpq is not going to know what's best.
> One option may be to leave that decision to the user by providing a
> PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> function.

What'd be the difference between PQflush() and PQbatchFlush()?

Greetings,

Andres Freund
Andres Freund wrote:

> > One option may be to leave that decision to the user by providing a
> > PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> > function.
>
> What'd be the difference between PQflush() and PQbatchFlush()?

I guess no difference, I was just not seeing that libpq already provides
this functionality...

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
> If you were to send a gigabyte of queries, it'd buffer them all up in
> memory... So some more intelligence is going to be needed.
Firstly, sorry for the delayed response as I got busy with other activities.
To buffer up the queries before flushing them to the socket, I think "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in Windows as per comments in pqSendSome() API.
Attached the code patch to replace pqFlush calls with pqBatchFlush in the asynchronous libpq function calls flow.
Still pqFlush is used in "PQbatchSyncQueue" and "PQexitBatchMode" functions.
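The policy described above — flush immediately outside batch mode, but in batch mode defer until 64kB has accumulated (the size pqSendSome()'s comments treat as safe on Windows) — can be sketched as follows. The struct and names are simplified stand-ins for illustration, not the patch's actual PGconn fields:

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_FLUSH_THRESHOLD 65536	/* 64kB, per pqSendSome() comments */

typedef enum
{
	BATCH_MODE_OFF,
	BATCH_MODE_ON
} BatchStatus;

typedef struct
{
	BatchStatus batch_status;
	int			outCount;		/* bytes waiting in the output buffer */
} FakeConn;

/*
 * Simplified pqBatchFlush()-style decision: outside batch mode keep
 * the old flush-per-query behaviour; in batch mode, defer until the
 * buffer is large enough to be worth a syscall.
 */
static bool
should_flush(const FakeConn *conn)
{
	if (conn->batch_status == BATCH_MODE_OFF)
		return true;
	return conn->outCount >= BATCH_FLUSH_THRESHOLD;
}
```

This also caps the unbounded buffering Andres warned about upthread: data is pushed out once the threshold is crossed rather than accumulating until the sync.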
I am failing to see the benefit in allowing the user to set a PQBatchAutoFlush(true|false) property. Is it really needed?
Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
Attachment
Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote:
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> > memory... So some more intelligence is going to be needed.
>
> Firstly, sorry for the delayed response as I got busy with other
> activities.

No worries - development of new features was slowed down anyway, due to
the v10 feature freeze.

> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
>
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.
> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing use case and numbers, we can add it later. Libpq API is set
in stone, so I'd rather not introduce unnecessary stuff...

> + <para>
> +  Much like asynchronous query mode, there is no performance disadvantage to
> +  using batching and pipelining. It increases client application complexity
> +  and extra caution is required to prevent client/server deadlocks but
> +  can sometimes offer considerable performance improvements.
> + </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.

> + <para>
> +  Batching is most useful when the server is distant, i.e. network latency
> +  (<quote>ping time</quote>) is high, and when many small operations are being performed in
> +  rapid sequence. There is usually less benefit in using batches when each
> +  query takes many multiples of the client/server round-trip time to execute.
> +  A 100-statement operation run on a server 300ms round-trip-time away would take
> +  30 seconds in network latency alone without batching; with batching it may spend
> +  as little as 0.3s waiting for results from the server.
> + </para>

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.

> + <para>
> +  Use batches when your application does lots of small
> +  <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +  <literal>DELETE</literal> operations that can't easily be transformed into
> +  operations on sets or into a
> +  <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> + </para>

Aren't SELECTs also a major beneficiary of this?

> + <para>
> +  Batching is less useful when information from one operation is required by the
> +  client before it knows enough to send the next operation.

s/less/not/

> + <note>
> +  <para>
> +   The batch API was introduced in PostgreSQL 10.0, but clients using PostgresSQL 10.0 version of libpq can
> +   use batches on server versions 8.4 and newer. Batching works on any server
> +   that supports the v3 extended query protocol.
> +  </para>
> + </note>

Where's the 8.4 coming from?

> + <para>
> +  The client uses libpq's asynchronous query functions to dispatch work,
> +  marking the end of each batch with <function>PQbatchSyncQueue</function>.
> +  And to get results, it uses <function>PQgetResult</function> and
> +  <function>PQbatchProcessQueue</function>. It may eventually exit
> +  batch mode with <function>PQexitBatchMode</function> once all results are
> +  processed.
> + </para>
> +
> + <note>
> +  <para>
> +   It is best to use batch mode with <application>libpq</application> in
> +   <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> +   blocking mode it is possible for a client/server deadlock to occur. The
> +   client will block trying to send queries to the server, but the server will
> +   block trying to send results from queries it has already processed to the
> +   client. This only occurs when the client sends enough queries to fill its
> +   output buffer and the server's receive buffer before switching to
> +   processing input from the server, but it's hard to predict exactly when
> +   that'll happen so it's best to always use non-blocking mode.
> +  </para>
> + </note>

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbound memory? We probably should
either document or implement some smarts about when to signal read/write
readiness. Otherwise we e.g. might be receiving tons of result data
without having sent the next query - or the other way round.

> + <sect3 id="libpq-batch-sending">
> +  <title>Issuing queries</title>
> +
> +  <para>
> +   After entering batch mode the application dispatches requests
> +   using normal asynchronous <application>libpq</application> functions such as
> +   <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,
> +   <function>PQsendQueryPrepared</function>, <function>PQsendDescribePortal</function>,
> +   <function>PQsendDescribePrepared</function>.
> +   The asynchronous requests are followed by a <link
> +   linkend="libpq-PQbatchSyncQueue"><function>PQbatchSyncQueue(conn)</function></link> call to mark
> +   the end of the batch. The client <emphasis>does not</emphasis> need to call
> +   <function>PQgetResult</function> immediately after dispatching each
> +   operation. <link linkend="libpq-batch-results">Result processing</link>
> +   is handled separately.
> +  </para>
> +
> +  <para>
> +   Batched operations will be executed by the server in the order the client
> +   sends them. The server will send the results in the order the statements
> +   executed. The server may begin executing the batch before all commands
> +   in the batch are queued and the end of batch command is sent. If any
> +   statement encounters an error the server aborts the current transaction and
> +   skips processing the rest of the batch. Query processing resumes after the
> +   end of the failed batch.
> +  </para>

Maybe note that multiple batches can be "in flight"? I.e.
PQbatchSyncQueue() is about error handling, nothing else? Don't have a
great idea, but we might want to rename...

> + <note>
> +  <para>
> +   The client must not assume that work is committed when it
> +   <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +   corresponding result is received to confirm the commit is complete.
> +   Because errors arrive asynchronously the application needs to be able to
> +   restart from the last <emphasis>received</emphasis> committed change and
> +   resend work done after that point if something goes wrong.
> +  </para>
> + </note>

This seems fairly independent of batching.

> + </sect3>
> +
> + <sect3 id="libpq-batch-interleave">
> +  <title>Interleaving result processing and query dispatch</title>
> +
> +  <para>
> +   To avoid deadlocks on large batches the client should be structured around
> +   a nonblocking I/O loop using a function like <function>select</function>,
> +   <function>poll</function>, <function>epoll</function>,
> +   <function>WaitForMultipleObjectEx</function>, etc.
> +  </para>
> +
> +  <para>
> +   The client application should generally maintain a queue of work still to
> +   be dispatched and a queue of work that has been dispatched but not yet had
> +   its results processed.

Hm. Why? If queries are just issued, no such queue is required?

> When the socket is writable it should dispatch more
> +   work. When the socket is readable it should read results and process them,
> +   matching them up to the next entry in its expected results queue. Batches
> +   should be scoped to logical units of work, usually (but not always) one
> +   transaction per batch. There's no need to exit batch mode and re-enter it
> +   between batches or to wait for one batch to finish before sending the next.
> +  </para>

This really needs to take memory usage into account.

> + </variablelist>
> +
> + </listitem>
> + </varlistentry>
> +
> + <varlistentry id="libpq-PQenterBatchMode">
> +  <term>
> +   <function>PQenterBatchMode</function>
> +   <indexterm>
> +    <primary>PQenterBatchMode</primary>
> +   </indexterm>
> +  </term>
> +
> +  <listitem>
> +   <para>
> +    Causes a connection to enter batch mode if it is currently idle or
> +    already in batch mode.
> +
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +
> +   </para>
> +   <para>
> +    Returns 1 for success. Returns 0 and has no
> +    effect if the connection is not currently idle, i.e. it has a result
> +    ready, is waiting for more input from the server, etc. This function
> +    does not actually send anything to the server, it just changes the
> +    <application>libpq</application> connection state.
> +   </para>
> +  </listitem>
> + </varlistentry>
> +
> + <varlistentry id="libpq-PQexitBatchMode">
> +  <term>
> +   <function>PQexitBatchMode</function>
> +   <indexterm>
> +    <primary>PQexitBatchMode</primary>
> +   </indexterm>
> +  </term>
> +
> +  <listitem>
> +   <para>
> +    Causes a connection to exit batch mode if it is currently in batch mode
> +    with an empty queue and no pending results.
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> +   </para>
> +   <para>Returns 1 for success.
> +    Returns 1 and takes no action if not in batch mode. If the connection has

"returns 1"?
> + <varlistentry id="libpq-PQbatchSyncQueue"> > + <term> > + <function>PQbatchSyncQueue</function> > + <indexterm> > + <primary>PQbatchSyncQueue</primary> > + </indexterm> > + </term> > + > + <listitem> > + <para> > + Delimits the end of a set of a batched commands by sending a <link > + linkend="protocol-flow-ext-query">sync message</link> and flushing > + the send buffer. The end of a batch serves as > + the delimiter of an implicit transaction and > + an error recovery point; see <link linkend="libpq-batch-errors"> > + error handling</link>. I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems to mostly make sense from a protocol POV. > + <varlistentry id="libpq-PQbatchQueueCount"> > + <term> > + <function>PQbatchQueueCount</function> > + <indexterm> > + <primary>PQbatchQueueCount</primary> > + </indexterm> > + </term> > + > + <listitem> > + <para> > + Returns the number of queries still in the queue for this batch, not > + including any query that's currently having results being processed. > + This is the number of times <function>PQbatchProcessQueue</function> has to be > + called before the query queue is empty again. > + > +<synopsis> > +int PQbatchQueueCount(PGconn *conn); > +</synopsis> > + > + </para> > + </listitem> > + </varlistentry> Given that apps are supposed to track this, I'm not sure why we have this? > +/* > + * PQrecyclePipelinedCommand > + * Push a command queue entry onto the freelist. It must be a dangling entry > + * with null next pointer and not referenced by any other entry's next pointer. > + */ > +static void > +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry) > +{ > + if (entry == NULL) > + return; > + if (entry->next != NULL) > + { > + fprintf(stderr, libpq_gettext("tried to recycle non-dangling command queue entry")); > + abort(); Don't think we use abort() in libpq like that. There's some Assert()s tho. 
> static bool > PQsendQueryStart(PGconn *conn) > @@ -1377,20 +1486,59 @@ PQsendQueryStart(PGconn *conn) > libpq_gettext("no connection to the server\n")); > return false; > } > - /* Can't send while already busy, either. */ > - if (conn->asyncStatus != PGASYNC_IDLE) > + /* Can't send while already busy, either, unless enqueuing for later */ > + if (conn->asyncStatus != PGASYNC_IDLE && conn->batch_status == PQBATCH_MODE_OFF) > { > printfPQExpBuffer(&conn->errorMessage, > libpq_gettext("another command is already in progress\n")); > return false; > } > > - /* initialize async result-accumulation state */ > - pqClearAsyncResult(conn); > + if (conn->batch_status != PQBATCH_MODE_OFF) > + { > + /* Weirdly indented. > + if (conn->batch_status != PQBATCH_MODE_OFF) > + { > + pipeCmd = PQmakePipelinedCommand(conn); > + > + if (pipeCmd == NULL) > + return 0; /* error msg already set */ > + > + last_query = &pipeCmd->query; > + queryclass = &pipeCmd->queryclass; > + } > + else > + { > + last_query = &conn->last_query; > + queryclass = &conn->queryclass; > + } The amount of complexity / branches we're adding to all of these is more than a bit unsightly. > +/* > + * PQbatchQueueCount > + * Return number of queries currently pending in batch mode > + */ > +int > +PQbatchQueueCount(PGconn *conn) > +{ > + int count = 0; > + PGcommandQueueEntry *entry; > + > + if (PQbatchStatus(conn) == PQBATCH_MODE_OFF) > + return 0; > + > + entry = conn->cmd_queue_head; > + while (entry != NULL) > + { > + ++count; > + entry = entry->next; > + } > + return count; > +} Ugh, O(N)? In that case I'd rather just remove this. > +/* > + * PQbatchBegin Mismatched w/ actual function name. > + * Put an idle connection in batch mode. Commands submitted after this > + * can be pipelined on the connection, there's no requirement to wait for > + * one to finish before the next is dispatched. > + * > + * Queuing of new query or syncing during COPY is not allowed. +"a"? 
> + * A set of commands is terminated by a PQbatchQueueSync. Multiple sets of batched > + * commands may be sent while in batch mode. Batch mode can be exited by > + * calling PQbatchEnd() once all results are processed. > + * > + * This doesn't actually send anything on the wire, it just puts libpq > + * into a state where it can pipeline work. > + */ > +int > +PQenterBatchMode(PGconn *conn) > +{ > + if (!conn) > + return false; true/false isn't quite in line with int return code. > +/* > + * PQbatchEnd wrong name. > + * End batch mode and return to normal command mode. > + * > + * Has no effect unless the client has processed all results > + * from all outstanding batches and the connection is idle. That seems wrong - will lead to hard to diagnose errors. > + * Returns true if batch mode ended. > + */ > +int > +PQexitBatchMode(PGconn *conn) > +{ > + if (!conn) > + return false; > + > + if (conn->batch_status == PQBATCH_MODE_OFF) > + return true; > + > + switch (conn->asyncStatus) > + { > + case PGASYNC_IDLE: > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext_noop("internal error, IDLE in batch mode")); > + break; > + case PGASYNC_COPY_IN: > + case PGASYNC_COPY_OUT: > + case PGASYNC_COPY_BOTH: > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext_noop("internal error, COPY in batch mode")); > + break; So we'll still check the queue in this case, that's a bit weird? > +/* > + * PQbatchQueueSync Out of sync. > + * End a batch submission by sending a protocol sync. The connection will > + * remain in batch mode and unavailable for new non-batch commands until all > + * results from the batch are processed by the client. "unavailable for new non-batch commands" - that's hard to follow, and seems pretty redundant with PQendBatchMode (or however it's called). > + * It's legal to start submitting another batch immediately, without waiting > + * for the results of the current batch. There's no need to end batch mode > + * and start it again. 
> + * > + * If a command in a batch fails, every subsequent command up to and including > + * the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED state. If the > + * whole batch is processed without error, a PGresult with PGRES_BATCH_END is > + * produced. Hm, should probably mention that that's only true for commands since the last PQbatchQueueSync? > +/* > + * PQbatchQueueProcess Out of sync. > + * Returns false if the current query isn't done yet, the connection > + * is not in a batch, or there are no more queries to process. Last complaint about this - think this forgiving mode is a mistake. > + */ > +int > +PQbatchProcessQueue(PGconn *conn) > +{ > + /* This command's results will come in immediately. > + * Initialize async result-accumulation state */ > + pqClearAsyncResult(conn); I'm not following? > /* > * PQgetResult > @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn) > + if (conn->batch_status != PQBATCH_MODE_OFF) > + { > + /* > + * batched queries aren't followed by a Sync to put us back in > + * PGASYNC_IDLE state, and when we do get a sync we could > + * still have another batch coming after this one. This needs rephrasing. > + * The connection isn't idle since we can't submit new > + * nonbatched commands. It isn't also busy since the current > + * command is done and we need to process a new one. > + */ > + conn->asyncStatus = PGASYNC_QUEUED; Not sure I like the name. > + if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != PQBATCH_MODE_OFF) > + { > + printfPQExpBuffer(&conn->errorMessage, > + libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n")); > + return false; > + } Why do we need the PGASYNC_QUEUED test here? > +/* pqBatchFlush > + * In batch mode, data will be flushed only when the out buffer reaches the threshold value. > + * In non-batch mode, data will be flushed all the time. 
> + */ > +static int > +pqBatchFlush(PGconn *conn) > +{ > + if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536)) > + return(pqFlush(conn)); > + return 0; /* Just to keep compiler quiet */ > +} This should be defined in a macro or such, rather than hardcoded. Falling over now. This seems like enough feedback for a bit of work anyway. Regards, Andres
On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund <andres@anarazel.de> wrote:
> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?
I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...
Thanks for reviewing the patch and yes ok.
> + <para>
> + Much like asynchronous query mode, there is no performance disadvantage to
> + using batching and pipelining. It increases client application complexity
> + and extra caution is required to prevent client/server deadlocks but
> + can sometimes offer considerable performance improvements.
> + </para>
That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.
Client application complexity is increased in batch mode, because the application needs to track the query queue status. Result processing can be done at any time, so the application needs to know up to which query the results have been consumed.
> + <para>
> + Use batches when your application does lots of small
> + <literal>INSERT</literal>, <literal>UPDATE</literal> and
> + <literal>DELETE</literal> operations that can't easily be transformed into
> + operations on sets or into a
> + <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> + </para>
Aren't SELECTs also a major beneficiary of this?
Hmm, though SELECTs also benefit from batch mode, doing multiple SELECTs in batch mode will fill up memory rapidly and might not be as beneficial as the other operations listed.
> + <para>
> + Batching is less useful when information from one operation is required by the
> + client before it knows enough to send the next operation.
s/less/not/
Corrected.
> + <note>
> + <para>
> + The batch API was introduced in PostgreSQL 10.0, but clients using PostgresSQL 10.0 version of libpq can
> + use batches on server versions 8.4 and newer. Batching works on any server
> + that supports the v3 extended query protocol.
> + </para>
> + </note>
Where's the 8.4 coming from?
I guess it is 7.4, where "PQsendQueryParams" was introduced, and not 8.4. Corrected.
> + <note>
> + <para>
> + It is best to use batch mode with <application>libpq</application> in
> + <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> + blocking mode it is possible for a client/server deadlock to occur. The
> + client will block trying to send queries to the server, but the server will
> + block trying to send results from queries it has already processed to the
> + client. This only occurs when the client sends enough queries to fill its
> + output buffer and the server's receive buffer before switching to
> + processing input from the server, but it's hard to predict exactly when
> + that'll happen so it's best to always use non-blocking mode.
> + </para>
> + </note>
Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbound memory? We probably
should either document or implement some smarts about when to signal
read/write readyness. Otherwise we e.g. might be receiving tons of
result data without having sent the next query - or the other way round.
Added a cautionary statement to the documentation, and again this is one of the reasons why SELECT queries are not so beneficial in batch mode.
Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...
This function not only does error handling, but also sends the "Sync" message to the backend. In batch mode, the "Sync" message is not sent with every query but will
be sent only via this function, to mark the end of the implicit transaction. Renamed it to PQbatchCommitQueue. Kindly let me know if you can think of any better name.
> + <note>
> + <para>
> + The client must not assume that work is committed when it
> + <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> + corresponding result is received to confirm the commit is complete.
> + Because errors arrive asynchronously the application needs to be able to
> + restart from the last <emphasis>received</emphasis> committed change and
> + resend work done after that point if something goes wrong.
> + </para>
> + </note>
This seems fairly independent of batching.
Yes, and the reason it is explicitly specified for batch mode is that if more than one explicit transaction is used in a single batch, the failure of one transaction will cause the subsequent transactions to be skipped until the end of the current batch is reached. This behavior is specific to batch mode, so I think a precautionary note is needed here.
> + </sect3>
> +
> + <sect3 id="libpq-batch-interleave">
> + <title>Interleaving result processing and query dispatch</title>
> +
> + <para>
> + To avoid deadlocks on large batches the client should be structured around
> + a nonblocking I/O loop using a function like <function>select</function>,
> + <function>poll</function>, <function>epoll</function>,
> + <function>WaitForMultipleObjectEx</function>, etc.
> + </para>
> +
> + <para>
> + The client application should generally maintain a queue of work still to
> + be dispatched and a queue of work that has been dispatched but not yet had
> + its results processed.
Hm. Why? If queries are just issued, no such queue is required?
This is essentially saying that the client application should remember the order of queries sent, so that result processing can be based on which results are expected. For example, if the order of queries is 1) SELECT, 2) INSERT, then during result processing a "PGRES_TUPLES_OK" check and later a "PGRES_COMMAND_OK" check are required. The above-mentioned queue of work refers to this bookkeeping of the queue of queries.
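As a minimal illustration of this bookkeeping (a standalone sketch with invented names, not code from the patch): the client records the expected result kind of each dispatched query and matches each incoming result against the head of that queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result kinds, standing in for PGRES_TUPLES_OK etc. */
typedef enum { EXPECT_TUPLES, EXPECT_COMMAND } ExpectedKind;

#define MAX_PENDING 64

/* FIFO of the results the client still expects, oldest first. */
typedef struct {
    ExpectedKind kinds[MAX_PENDING];
    size_t head;                /* index of the next expected result */
    size_t tail;                /* index of the next free slot */
} ExpectQueue;

static void eq_init(ExpectQueue *q) { q->head = q->tail = 0; }

/* Called when a query is dispatched: remember what its result will be. */
static int eq_push(ExpectQueue *q, ExpectedKind k)
{
    if (q->tail >= MAX_PENDING)
        return 0;               /* queue full: caller should drain results */
    q->kinds[q->tail++] = k;
    return 1;
}

/* Called when a result arrives: match it against the oldest expectation. */
static int eq_matches(ExpectQueue *q, ExpectedKind got)
{
    if (q->head >= q->tail)
        return 0;               /* result arrived with nothing expected */
    return q->kinds[q->head++] == got;
}

static size_t eq_pending(const ExpectQueue *q) { return q->tail - q->head; }
```

A client dispatching a SELECT followed by an INSERT would push EXPECT_TUPLES then EXPECT_COMMAND, and later verify each PGresult against the queue head as it reads results back.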
> When the socket is writable it should dispatch more
> + work. When the socket is readable it should read results and process them,
> + matching them up to the next entry in its expected results queue. Batches
> + should be scoped to logical units of work, usually (but not always) one
> + transaction per batch. There's no need to exit batch mode and re-enter it
> + between batches or to wait for one batch to finish before sending the next.
> + </para>
This really needs to take memory usage into account.
Modified the paragraph to note that the frequency of result processing should be based on available memory.
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> + </para>
> + <para>Returns 1 for success.
> + Returns 1 and takes no action if not in batch mode. If the connection has
"returns 1"?
Modified the code to return 1/0.
> + <varlistentry id="libpq-PQbatchSyncQueue">
> + <term>
> + <function>PQbatchSyncQueue</function>
> + <indexterm>
> + <primary>PQbatchSyncQueue</primary>
> + </indexterm>
> + </term>
I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.
Renamed to PQbatchCommitQueue.
> +<synopsis>
> +int PQbatchQueueCount(PGconn *conn);
> +</synopsis>
> +
Given that apps are supposed to track this, I'm not sure why we have
this?
Removed this function considering your another comment about O(N) as well.
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> + if (entry == NULL)
> + return;
> + if (entry->next != NULL)
> + {
> + fprintf(stderr, libpq_gettext("tried to recycle non-dangling command queue entry"));
> + abort();
Don't think we use abort() in libpq like that. There's some Assert()s
tho.
For out-of-memory cases we do abort(), and here the abort will happen only in case of some memory corruption. So I think it is OK to abort here. Please let me know if you think otherwise.
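For reference, here is a standalone sketch of the recoverable-error alternative being discussed, where violating the dangling-entry precondition is reported to the caller instead of killing the client process with abort(). The struct and function names are invented stand-ins, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the patch's PGcommandQueueEntry. */
typedef struct CmdEntry {
    struct CmdEntry *next;
    const char *query;
} CmdEntry;

/* Stand-in for the relevant part of PGconn. */
typedef struct {
    CmdEntry *freelist;
} Conn;

/*
 * Push a dangling entry onto the freelist.  Instead of abort(), report
 * failure when the precondition (entry->next == NULL) is violated, so a
 * client application is never terminated from inside the library.
 */
static int recycle_entry(Conn *conn, CmdEntry *entry)
{
    if (entry == NULL)
        return 1;               /* nothing to do */
    if (entry->next != NULL)
        return 0;               /* non-dangling entry: refuse to recycle */
    entry->next = conn->freelist;
    conn->freelist = entry;
    return 1;
}
```

The trade-off is the one raised above: an Assert() documents the invariant for developers, while a return code lets production builds degrade gracefully.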
> static bool
> PQsendQueryStart(PGconn *conn)
.....
Weirdly indented.
Corrected.
> +/*
> + * PQbatchBegin
Mismatched w/ actual function name.
Corrected.
> + * Put an idle connection in batch mode. Commands submitted after this
> + * can be pipelined on the connection, there's no requirement to wait for
> + * one to finish before the next is dispatched.
> + *
> + * Queuing of new query or syncing during COPY is not allowed.
+"a"?
Hmm, can you explain the question please? I don't understand.
> +/*
> + * PQbatchEnd
wrong name.
Corrected.
> + * End batch mode and return to normal command mode.
> + *
> + * Has no effect unless the client has processed all results
> + * from all outstanding batches and the connection is idle.
That seems wrong - will lead to hard to diagnose errors.
I have now added an error message for when ending the batch fails, to ease diagnosing the error. Also, this behavior is documented, so the application should not assume that batch mode ended without checking the return value.
> + * Returns true if batch mode ended.
> + */
> +int
> +PQexitBatchMode(PGconn *conn)
> +{
> + if (!conn)
> + return false;
> +
> + if (conn->batch_status == PQBATCH_MODE_OFF)
> + return true;
> +
> + switch (conn->asyncStatus)
> + {
> + case PGASYNC_IDLE:
> + printfPQExpBuffer(&conn->errorMessage,
> + libpq_gettext_noop("internal error, IDLE in batch mode"));
> + break;
> + case PGASYNC_COPY_IN:
> + case PGASYNC_COPY_OUT:
> + case PGASYNC_COPY_BOTH:
> + printfPQExpBuffer(&conn->errorMessage,
> + libpq_gettext_noop("internal error, COPY in batch mode"));
> + break;
So we'll still check the queue in this case, that's a bit weird?
Removed the above cases, as adding an error here is not of much help either.
> +/*
> + * PQbatchQueueSync
Out of sync.
Corrected.
> + * End a batch submission by sending a protocol sync. The connection will
> + * remain in batch mode and unavailable for new non-batch commands until all
> + * results from the batch are processed by the client.
"unavailable for new non-batch commands" - that's hard to follow, and
seems pretty redundant with PQendBatchMode (or however it's called).
Modified the documentation to be more precise. The difference between PQexitBatchMode and this function is that to exit batch mode, the application must have consumed all results pending in the buffer, and after exiting it can call the synchronous command execution functions. With batch sync/commit, the application cannot issue synchronous commands, but it can issue asynchronous commands without reading all the results pending in the buffer. This function basically marks the end of the implicit transaction.
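As a toy illustration of the state rules just described (purely a model with invented names, not libpq code): syncing the queue is legal at any point while in batch mode, whereas exiting batch mode refuses until every pending result has been consumed.

```c
#include <assert.h>

/* Minimal model of the connection states under discussion. */
typedef enum { BATCH_OFF, BATCH_ON } BatchStatus;

typedef struct {
    BatchStatus batch_status;
    int pending;     /* queries dispatched whose results are not yet consumed */
} ModelConn;

static int model_enter_batch(ModelConn *c)
{
    c->batch_status = BATCH_ON;     /* no-op if already in batch mode */
    return 1;
}

/* Sync/commit of the queue: legal at any time in batch mode; the client
 * may keep queuing more work afterwards without draining results first. */
static int model_sync_queue(ModelConn *c)
{
    return c->batch_status == BATCH_ON;
}

/* Exiting batch mode requires every pending result to have been consumed. */
static int model_exit_batch(ModelConn *c)
{
    if (c->batch_status == BATCH_OFF)
        return 1;
    if (c->pending > 0)
        return 0;                   /* results still pending: refuse */
    c->batch_status = BATCH_OFF;
    return 1;
}
```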
> + * It's legal to start submitting another batch immediately, without waiting
> + * for the results of the current batch. There's no need to end batch mode
> + * and start it again.
> + *
> + * If a command in a batch fails, every subsequent command up to and including
> + * the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED state. If the
> + * whole batch is processed without error, a PGresult with PGRES_BATCH_END is
> + * produced.
Hm, should probably mention that that's only true for commands since the
last PQbatchQueueSync?
"Batch" means the set of commands between PQbatchCommitQueue(newly renamed name) calls except first batch which count from beginning to PQbatchCommitQueue. And in documentation it is mentioned that "The result for <function>PQbatchSyncQueue</function> is reported as
<literal>PGRES_BATCH_END</literal> to signal the end of the aborted batch and resumption of normal result processing."
> +/*
> + * PQbatchQueueProcess
Out of sync.
Corrected .
> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.
Last complaint about this - think this forgiving mode is a mistake.
Now added an error message using printfPQExpBuffer(). I think aborting here is not really needed. Please let me know if you prefer to abort() rather than this solution.
> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{
> + /* This command's results will come in immediately.
> + * Initialize async result-accumulation state */
> + pqClearAsyncResult(conn);
I'm not following?
The first line of the comment is removed; it was a copy/paste error.
> /*
> * PQgetResult
> @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn)
> + if (conn->batch_status != PQBATCH_MODE_OFF)
> + {
> + /*
> + * batched queries aren't followed by a Sync to put us back in
> + * PGASYNC_IDLE state, and when we do get a sync we could
> + * still have another batch coming after this one.
This needs rephrasing.
Rephrased as "In batch mode, query execution state cannot be IDLE as there can be other queries or results waiting in the queue ..." . Hope this is simple and enough.
> + * The connection isn't idle since we can't submit new
> + * nonbatched commands. It isn't also busy since the current
> + * command is done and we need to process a new one.
> + */
> + conn->asyncStatus = PGASYNC_QUEUED;
Not sure I like the name.
Changed the name to PGASYNC_BATCH to align it better with batch mode, as this status will not be set anywhere in non-batch mode.
> + if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != PQBATCH_MODE_OFF)
> + {
> + printfPQExpBuffer(&conn->errorMessage,
> + libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n"));
> + return false;
> + }
Why do we need the PGASYNC_QUEUED test here?
Yes, we don't need this check here. It was removed in PQfn, and similarly it is not needed here or in pqParseInput2.
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> + if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536))
> + return(pqFlush(conn));
> + return 0; /* Just to keep compiler quiet */
> +}
This should be defined in a macro or such, rather than hardcoded.
Added macro "OUTBUFFER_THRESHOLD"
Falling over now. This seems like enough feedback for a bit of work
anyway.
Thanks once again for reviewing the patch. Attached the patch with your review comments incorporated.
Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
Attachment
On 13 September 2017 at 13:06, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund <andres@anarazel.de> wrote:
> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?
I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...

Thanks for reviewing the patch and yes ok.
> + <para>
> + Much like asynchronous query mode, there is no performance disadvantage to
> + using batching and pipelining. It increases client application complexity
> + and extra caution is required to prevent client/server deadlocks but
> + can sometimes offer considerable performance improvements.
> + </para>
That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.

Client application complexity is increased in batch mode, because the application needs to track the query queue status. Result processing can be done at any time, so the application needs to know up to which query the results have been consumed.
Yep. Also, the client/server deadlocks at issue here are a buffer management issue, and deadlock is probably not exactly the right word. Your app has to process replies from the server while it's sending queries, otherwise it can get into a state where it has no room left in its send buffer, but the server isn't consuming its receive buffer because the server's send buffer is full. To allow the system to make progress, the client must read from the client receive buffer.
This isn't an issue when using libpq normally.
PgJDBC has similar issues with its batch mode, but in PgJDBC it's much worse because there's no non-blocking send available. In libpq you can at least set your sending socket to non-blocking.
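The buffer-management stall described above can be modeled in a few lines. This is a pure simulation with invented names (real code would use nonblocking sockets and select()/poll()): the client's send buffer and the server-to-client result buffer both have finite capacity, and if the client only ever sends, both fill and no further progress is possible until the client reads.

```c
#include <assert.h>

#define SEND_CAP 4      /* client send-buffer capacity, in queries */
#define RESULT_CAP 4    /* server-to-client result-buffer capacity */

typedef struct {
    int send_used;      /* queries sitting in the client send buffer */
    int result_used;    /* results waiting for the client to read */
} Link;

static int client_send(Link *l)     /* 1 = sent, 0 = would block */
{
    if (l->send_used >= SEND_CAP)
        return 0;
    l->send_used++;
    return 1;
}

static void server_step(Link *l)    /* server turns queries into results */
{
    while (l->send_used > 0 && l->result_used < RESULT_CAP) {
        l->send_used--;
        l->result_used++;
    }
}

static int client_recv(Link *l)     /* 1 = read one result */
{
    if (l->result_used == 0)
        return 0;
    l->result_used--;
    return 1;
}
```

A client that blindly sends without reading stalls after SEND_CAP + RESULT_CAP queries; interleaving reads restores progress, which is exactly why the interleaved nonblocking loop is recommended.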
> + <para>
> + Use batches when your application does lots of small
> + <literal>INSERT</literal>, <literal>UPDATE</literal> and
> + <literal>DELETE</literal> operations that can't easily be transformed into
> + operations on sets or into a
> + <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> + </para>
Aren't SELECTs also a major beneficiary of this?
Yes, many individual SELECTs that cannot be assembled into a single more efficient query would definitely also benefit.
Hmm, though SELECTs also benefit from batch mode, doing multiple selects in batch mode will fill up the memory rapidly and might not be as beneficial as other operations listed.
Depends on the SELECT. With wide results you'll get less benefit, but even then you can gain if you're on a high latency network. With "n+1" patterns and similar, you'll see huge gains.
Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...

This function not only does error handling, but also sends the "Sync" message to the backend. In batch mode, the "Sync" message is not sent with every query but will be sent only via this function, to mark the end of the implicit transaction. Renamed it to PQbatchCommitQueue. Kindly let me know if you can think of any better name.
I really do not like calling it "commit" as that conflates with a database commit.
A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and commit, then a later part to fail, if that's the case. So that name is IMO wrong.
> + <varlistentry id="libpq-PQbatchSyncQueue">
> + <term>
> + <function>PQbatchSyncQueue</function>
> + <indexterm>
> + <primary>PQbatchSyncQueue</primary>
> + </indexterm>
> + </term>
I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.

Renamed to PQbatchCommitQueue.
Per above, strong -1 on that. But SendQueue seems OK, or FlushQueue?
> + * Put an idle connection in batch mode. Commands submitted after this
> + * can be pipelined on the connection, there's no requirement to wait for
> + * one to finish before the next is dispatched.
> + *
> + * Queuing of new query or syncing during COPY is not allowed.
+"a"?Hmm, Can you explain the question please. I don't understand.
s/of new query/of a new query/
On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
I really do not like calling it "commit" as that conflates with a database commit.

A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and commit, then a later part to fail, if that's the case. So that name is IMO wrong.
Ok, SendQueue seems ok to me as well. Will change it in next version.
+"a"?Hmm, Can you explain the question please. I don't understand.s/of new query/of a new query/
Thanks for explaining. Will change this too in next version.
Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
On 13 September 2017 at 13:44, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
Thanks for explaining. Will change this too in next version.
Thank you a lot for picking up this patch.
> On 13 Sep 2017, at 07:44, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
>
> On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> I really do not like calling it "commit" as that conflates with a database commit.
>
> A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and commit, then a later part to fail, if that's the case. So that name is IMO wrong.
>
> Ok, SendQueue seems ok to me as well. Will change it in next version.
>
> +"a"?
>
> Hmm, Can you explain the question please. I don't understand.
>
> s/of new query/of a new query/
>
> Thanks for explaining. Will change this too in next version.

Based on the discussions in this thread, and that a new version hasn’t been
submitted, I’m marking this Returned with Feedback. Please re-submit the new
version in an upcoming commitfest when ready.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 2, 2017 at 8:31 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 13 Sep 2017, at 07:44, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
>
> On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> I really do not like calling it "commit" as that conflates with a database commit.
>
> A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and commit, then a later part to fail, if that's the case. So that name is IMO wrong.
>
> Ok, SendQueue seems ok to me as well. Will change it in next version.
>
> +"a"?
>
> Hmm, Can you explain the question please. I don't understand.
>
> s/of new query/of a new query/
>
> Thanks for explaining. Will change this too in next version.
Based on the discussions in this thread, and that a new version hasn’t been
submitted, I’m marking this Returned with Feedback. Please re-submit the new
version in an upcoming commitfest when ready.
Thanks for the suggestion, and OK, I will create a new patch in the upcoming commitfest with the attached patch addressing the above review comments.
Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
Attachment
On Thu, Oct 5, 2017 at 9:58 AM, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
> Thanks for the suggestion and, OK I will create a new patch in upcoming
> commitfest with attached patch addressing above review comments.

The patch still applies and there have been no updates for the last
month, as well as no reviews. I am bumping it to the next CF.

--
Michael
On Tue, Nov 28, 2017 at 12:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Oct 5, 2017 at 9:58 AM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Thanks for the suggestion and, OK I will create a new patch in upcoming
> commitfest with attached patch addressing above review comments.
The patch still applies and there has been no updates for the last
month, as well as no reviews. I am bumping it to next CF.
Thank you. I see the patch generates a compilation error due to usage of "FALSE" with the latest postgres code, hence attaching the patch with the correction.
Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
Attachment
On Fri, Jan 5, 2018 at 4:55 PM, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
On Tue, Nov 28, 2017 at 12:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Oct 5, 2017 at 9:58 AM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Thanks for the suggestion and, OK I will create a new patch in upcoming
> commitfest with attached patch addressing above review comments.
The patch still applies and there has been no updates for the last
month, as well as no reviews. I am bumping it to next CF.

Thank you. I see the patch generates a compilation error due to usage of "FALSE" with the latest postgres code, hence attaching the patch with the correction.
Corrected compilation error in documentation portion of patch with latest postgres code. Attached the corrected patch.
Thanks & Regards,
Vaishnavi,
Fujitsu Australia.
Attachment
Hello. At Fri, 12 Jan 2018 10:12:35 +1100, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote in <CAOoUkxRRTBm8ztzVXL45EVk=BC8AjfaFM5=ZMtrOMMfm+dnbwA@mail.gmail.com> > Corrected compilation error in documentation portion of patch with latest > postgres code. Attached the corrected patch. My understanding of this patch is that it intends to do the following things without changing the protocol. A. Refrain from sending sync message and socket flush during batching. B. Set required information to PGconn before receiving a result. Multi statement query does the a similar thing but a bit different. Queries are not needed to be built and mreged as a string in advance with this feature. Since I'm not sure what is the current stage of this patch and I haven't fully recognize the disucssion so far, I might make stupid or duplicate comments but would like to make some comments. - Previous comments A disucssion on psql batch mode was held in another branch of this thread. How do we treat that? - Interface complteness I think PQenter/exitBatchMode() is undoubtedly needed. PQbatchSendQueue() seems to be required to make sure to send queries before calling PQgetResult. PQbatchProcessQueue() doesn't seem essential. If there's no case where it is called without following PQgetResult, it can be included in PQgetResult. Client code may be simpler if it is not required. The latest patch has extern definition of PQbatchQueueCount, the body and doc of which seems to have been removed, or haven't exist since the beginning. - Some comments on the code -- PQbatchProcessQueue() PQbatchProcessQueue() returns 0 for several undistinguishable reasons. If asyncStatus is PGASYNC_BUSY at the time, the connection is out of sync and doesn't accept further operations. So it should be distinguished from the end-of-queue case. (It can reutrn an error result if combining with PQgetResult.) The function handles COPY_* in inconsistent way. 
It sets an error message on conn in a hardly noticeable way but allows execution to continue. The documentation says the following:

> COPY is not recommended as it most likely will trigger failure
> in batch processing

I don't think the description is informative, but the reason for it would be the fact that we cannot detect that a command leads to a protocol failure before the failure actually happens. The test code suggests that this is the case where any command (other than Sync) following "COPY FROM" breaks the protocol. We get error messages like the following in that case. (The header part is from my test program.)

> ABORTED: (PGRES_BATCH_ABORTED) ERROR: unexpected message type 0x42 during COPY from stdin
> CONTEXT: COPY t, line 1

I haven't investigated further, but it seems impossible to restore the connection to a synchronized state short of disconnecting. Avoiding the failure would require a side channel for COPY_IN, which we don't have now. The possible choices here are:

A. Make the description more precise, by adding the condition that causes the error and how it looks.

B. The situation seems detectable in PQgetResult. Return an error result containing a meaningful error. But this doesn't stop the subsequent protocol error messages.

> ABORTED: (PGRES_FATAL_ERROR) COPY_IN cannot be followed by any command in a batch
> ABORTED: (PGRES_FATAL_ERROR) ERROR: 08P01: unexpected message type 0x42 during COPY from stdin
> ABORTED: (PGRES_BATCH_ABORTED) ERROR: unexpected message type 0x42 during COPY from stdin
> ABORTED: (PGRES_BATCH_ABORTED) ERROR: unexpected message type 0x42 during COPY from stdin

C. Cancel the COPY on the server and return an error to the client for this situation. This might let us avoid an unrecoverable desync of the protocol in this case. (I'm not sure this can be done cleanly.)

D. Wait for a protocol change that can solve this problem.

-- PQbatchFlush()

> static int pqBatchFlush(PGconn *conn)
...
> if (...)
> return(pqFlush(conn));
> return 0; /* Just to keep compiler quiet */

The comment on the last line is wrong.

+ if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_T

Isn't this line a bit too long?

-- PGcommandQueueEntry

The reason it is a linked list seems to be to allow queueing and result processing to happen alternately within one batch. But I couldn't find a description of that behavior in the documentation. (Is a state transition chart required?)

Aside from the above, the interface for handling the command queue seems to be divided into too-small pieces. In PQsendQueryGuts():

> el = PQmakePipelinedCommand(conn)
> el->members = hoge;
> ..
> PQappendPipelinedCommand(conn, el)

Isn't the following enough instead of the above?

> PQappendPipelinedCommand(conn, conn->queryclass, conn->last_query);

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro HORIGUCHI wrote:
> A discussion on psql batch mode was held in another branch of
> this thread. How do we treat that?

There's a batch mode for pgbench in a patch posted in [1], with \beginbatch and \endbatch commands, but nothing for psql AFAICS. psql is more complicated because at its core it currently uses a blocking PQexec() call. Craig mentioned psql integration in [2] and [3].

Also, a script can have inter-query dependencies, such as in:

insert into table(...) returning id \gset
update othertable set col = :id where ...;

which is a problem in batch mode, as we don't want to send the update before the right value for :id is known. Whether and how we want to support these dependencies needs discussion. For instance, we might not support them at all, or create a synchronization command that collects all results of queries sent so far, or do it implicitly when a variable is injected into a query... This looks like substantial work that might be best done separately from the libpq patch.

[1] https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da26d7@manitou-mail.org
[2] https://www.postgresql.org/message-id/CAMsr+YGLhaDkjymLuNVQy4MrSKQoA=F1vO=aN8XQf30N=aQuVA@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAMsr+YE6BK4iAaQz=nY3xDnbLhnNZ_4tp-PTJqbNNpsZMgoo8Q@mail.gmail.com

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
On Fri, Mar 23, 2018 at 02:18:09PM +0100, Daniel Verite wrote:
> There's a batch mode for pgbench in a patch posted in [1],
> with \beginbatch and \endbatch commands, but nothing
> for psql AFAICS.
> psql is more complicated because currently it uses a
> blocking PQexec() call at its core. Craig mentioned psql
> integration in [2] and [3].

This patch has been around for some time now; the last version fails to apply cleanly, and in-depth reviews have happened. I am moving it to the next CF, waiting on its author.
--
Michael
> On Mon, Oct 1, 2018 at 8:34 AM Michael Paquier <michael@paquier.xyz> wrote:
> > On Fri, Mar 23, 2018 at 02:18:09PM +0100, Daniel Verite wrote:
> > There's a batch mode for pgbench in a patch posted in [1],
> > with \beginbatch and \endbatch commands, but nothing
> > for psql AFAICS.
> > psql is more complicated because currently it uses a
> > blocking PQexec() call at its core. Craig mentioned psql
> > integration in [2] and [3].
>
> This patch has been around for some time now, the last version fails to
> apply cleanly and in-depth reviews have happened. I am moving that to
> the next CF, waiting on its author.

Unfortunately, nothing was changed since then, so there is already some amount of unaddressed review feedback. I'll move this patch to "Returned with feedback".
Hi,

> > This patch has been around for some time now, the last version fails to
> > apply cleanly and in-depth reviews have happened. I am moving that to
> > the next CF, waiting on its author.
>
> Unfortunately, nothing was changed since then, so there is already some amount
> of unaddressed review feedback. I'll move this patch to "Returned with
> feedback".

Craig Ringer mentioned this thread to me recently.

This effort has already seen decent reviews from Craig, Andres and Michael, so I thought of refreshing it to work against the latest master HEAD. PFA the main patch as well as the test patch (I named the test patch v17 to be consistent with the main patch).

The major grouse with the test patch AFAICS was the use of the non-Windows-compliant timersub() function. I have now used the INSTR_TIME_SET_CURRENT/INSTR_TIME_SUBTRACT family of portable macros instead.

Please let me know what you think of the above.

Regards,
Nikhil
--
Nikhil Sontakke
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
On Fri, Aug 30, 2019 at 7:06 PM Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:
>
> Hi,
>
> > > This patch has been around for some time now, the last version fails to
> > > apply cleanly and in-depth reviews have happened. I am moving that to
> > > the next CF, waiting on its author.
> >
> > Unfortunately, nothing was changed since then, so there is already some amount
> > of unaddressed review feedback. I'll move this patch to "Returned with
> > feedback".
>
> Craig Ringer mentioned this thread to me recently.
>
> This effort has seen decent reviews from Craig, Andres and Michael
> already. So, I thought of refreshing it to work against latest master
> HEAD.

Thanks for picking this up. However, I noticed that Horiguchi-san previously gave some comments on this patch [1] which don't seem to have been addressed, or at least not all of them. It is possible that you have already addressed them, but in that case it would be good if you responded to his email as well. If they are not addressed, it would be good to address them.

[1] - https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2019-Sep-09, Amit Kapila wrote:
> Thanks for picking this up. However, I noticed that previously
> Horiguchi-San has given some comments on this patch [1] which don't
> seem to be addressed or at least not all of them are addressed. It is
> possible that you would have already addressed those, but in that
> case, it would be good if you respond to his email as well. If those
> are not addressed, then it will be good to address those.

Totally unasked for, here's a rebase of this patch series. I didn't do anything other than rebase to current master, solve a couple of very trivial conflicts, fix some whitespace complaints from git apply, and run the tests to verify everything works.

I don't foresee working on this at all, so if anyone is interested in seeing this feature in, I encourage them to read and address Horiguchi-san's feedback.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2020-07-10 19:01:49 -0400, Alvaro Herrera wrote:
> Totally unasked for, here's a rebase of this patch series. I didn't do
> anything other than rebasing to current master, solving a couple of very
> trivial conflicts, fixing some whitespace complaints by git apply, and
> running tests to verify everything works.
>
> I don't foresee working on this at all, so if anyone is interested in
> seeing this feature in, I encourage them to read and address
> Horiguchi-san's feedback.

Nor am I planning to do so, but I do think it's a pretty important improvement.

> +/*
> + * PQrecyclePipelinedCommand
> + *	Push a command queue entry onto the freelist. It must be a dangling entry
> + *	with null next pointer and not referenced by any other entry's next pointer.
> + */

"Dangling" sounds a bit like it's already freed.

> +/*
> + * PQbatchSendQueue
> + *	End a batch submission by sending a protocol sync. The connection will
> + *	remain in batch mode and unavailable for new synchronous command execution
> + *	functions until all results from the batch are processed by the client.

I feel like the reference to the protocol sync is a bit too low-level for an external API. It should first document what the function does from a user's POV.

I think it'd also be good to document whether or not queries can already have been sent before PQbatchSendQueue is called.

> +/*
> + * PQbatchProcessQueue
> + *	In batch mode, start processing the next query in the queue.
> + *
> + * Returns 1 if the next query was popped from the queue and can
> + * be processed by PQconsumeInput, PQgetResult, etc.
> + *
> + * Returns 0 if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.
> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{
> +	PGcommandQueueEntry *next_query;
> +
> +	if (!conn)
> +		return 0;
> +
> +	if (conn->batch_status == PQBATCH_MODE_OFF)
> +		return 0;
> +
> +	switch (conn->asyncStatus)
> +	{
> +		case PGASYNC_COPY_IN:
> +		case PGASYNC_COPY_OUT:
> +		case PGASYNC_COPY_BOTH:
> +			printfPQExpBuffer(&conn->errorMessage,
> +							  libpq_gettext_noop("internal error, COPY in batch mode"));
> +			break;

Shouldn't there be a return 0 here?

> +	if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> +	{
> +		/*
> +		 * In an aborted batch we don't get anything from the server for each
> +		 * result; we're just discarding input until we get to the next sync
> +		 * from the server. The client needs to know its queries got aborted
> +		 * so we create a fake PGresult to return immediately from
> +		 * PQgetResult.
> +		 */
> +		conn->result = PQmakeEmptyPGresult(conn,
> +										   PGRES_BATCH_ABORTED);
> +		if (!conn->result)
> +		{
> +			printfPQExpBuffer(&conn->errorMessage,
> +							  libpq_gettext("out of memory"));
> +			pqSaveErrorResult(conn);
> +			return 0;

Is there any way an application can recover at this point? ISTM we'd be stuck in the previous asyncStatus, no?

> +/* pqBatchFlush
> + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> + * In non-batch mode, data will be flushed all the time.
> + */
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> +	if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_THRESHOLD))
> +		return(pqFlush(conn));
> +	return 0; /* Just to keep compiler quiet */
> +}

Unnecessarily long line.

> +/*
> + * Connection's outbuffer threshold is set to 64k as it is safe
> + * in Windows as per comments in pqSendSome() API.
> + */
> +#define OUTBUFFER_THRESHOLD	65536

I don't think the comment explains much. It's fine to send more than 64k with pqSendSome(); they'll just be sent with separate pgsecure_write() invocations. And only on Windows.
It clearly makes sense to start sending out data at a certain granularity, to avoid needing unnecessary amounts of memory and to make more efficient use of latency / server-side compute. It's not implausible that 64k is the right amount for that; I just don't think the explanation above is good.

> diff --git a/src/test/modules/test_libpq/testlibpqbatch.c b/src/test/modules/test_libpq/testlibpqbatch.c
> new file mode 100644
> index 0000000000..4d6ba266e5
> --- /dev/null
> +++ b/src/test/modules/test_libpq/testlibpqbatch.c
> @@ -0,0 +1,1456 @@
> +/*
> + * src/test/modules/test_libpq/testlibpqbatch.c
> + *
> + *
> + * testlibpqbatch.c
> + *		Test of batch execution functionality
> + */
> +
> +#ifdef WIN32
> +#include <windows.h>
> +#endif

ISTM that this shouldn't be needed in a test program like this? Shouldn't libpq abstract all of this away?

> +static void
> +simple_batch(PGconn *conn)
> +{

ISTM that all, or at least several, of these should include tests of transactional behaviour with pipelining (e.g. using both implicit and explicit transactions inside a single batch, using transactions across batches, multiple explicit transactions inside a batch).

> +	PGresult   *res = NULL;
> +	const char *dummy_params[1] = {"1"};
> +	Oid			dummy_param_oids[1] = {INT4OID};
> +
> +	fprintf(stderr, "simple batch... ");
> +	fflush(stderr);

Why do we need fflush()? IMO that shouldn't be needed in a use like this, especially not on stderr, which ought to be unbuffered.

> +	/*
> +	 * Enter batch mode and dispatch a set of operations, which we'll then
> +	 * process the results of as they come in.
> +	 *
> +	 * For a simple case we should be able to do this without interim
> +	 * processing of results since our out buffer will give us enough slush to
> +	 * work with and we won't block on sending. So blocking mode is fine.
> +	 */
> +	if (PQisnonblocking(conn))
> +	{
> +		fprintf(stderr, "Expected blocking connection mode\n");
> +		goto fail;
> +	}

Perhaps worth adding a helper for this?
#define EXPECT(condition, explanation) \
...

or such?

Greetings,

Andres Freund