Thread: Re: [HACKERS] PATCH: Batch/pipelining support for libpq

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
> Hi,
> 
> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> > Regarding test patch, I have corrected the test suite after David Steele's
> > comments.
> > Also, I would like to mention that a companion patch was submitted by David
> > Steele up-thread.
> > 
> > Attached the latest code and test patch.
> 
> My impression is that this'll need a couple more rounds of review. Given
> that this'll establish API we're pretty much never going to be able to
> change/remove, I think it'd be a bad idea to rush this into v10.
> Therefore I propose moving this to the next CF.

Craig, Vaishnavi, everyone else: Are you planning to continue to work on
this for v11?  I'm willing to do another round, but only if it's
worthwhile.

FWIW, I still think this needs a pgbench or similar example integration,
so we can actually properly measure the benefits.

- Andres



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Vaishnavi Prabakaran


On Tue, Jun 20, 2017 at 8:49 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> > Regarding test patch, I have corrected the test suite after David Steele's
> > comments.
> > Also, I would like to mention that a companion patch was submitted by David
> > Steele up-thread.
> >
> > Attached the latest code and test patch.
>
> My impression is that this'll need a couple more rounds of review. Given
> that this'll establish API we're pretty much never going to be able to
> change/remove, I think it'd be a bad idea to rush this into v10.
> Therefore I propose moving this to the next CF.

Craig, Vaishnavi, everyone else: Are you planning to continue to work on
this for v11?  I'm willing to do another round, but only if it's
worthwhile.

Yes, I am willing to continue working on this patch for v11.
 

FWIW, I still think this needs a pgbench or similar example integration,
so we can actually properly measure the benefits.

I will investigate this further.
 

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Craig Ringer
On 20 June 2017 at 06:49, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-05 15:45:26 -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
>> > Regarding test patch, I have corrected the test suite after David Steele's
>> > comments.
>> > Also, I would like to mention that a companion patch was submitted by David
>> > Steele up-thread.
>> >
>> > Attached the latest code and test patch.
>>
>> My impression is that this'll need a couple more rounds of review. Given
>> that this'll establish API we're pretty much never going to be able to
>> change/remove, I think it'd be a bad idea to rush this into v10.
>> Therefore I propose moving this to the next CF.
>
> Craig, Vaishnavi, everyone else: Are you planning to continue to work on
> this for v11?  I'm willing to do another round, but only if it's
> worthwhile.

I'm happy to work on review, and will try to make some time, but have
to focus primarily on logical rep infrastructure. This patch was a
proof of concept and fun hack for me and while I'm glad folks are
interested, it's not something I can dedicate much time to. Especially
with a 6-week-old baby now....

> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.

I agree. I originally wanted to patch psql, but it's pretty intrusive.
pgbench is likely a better target. Also pg_restore.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Michael Paquier
On Tue, Jun 20, 2017 at 10:43 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Especially with a 6-week-old baby now....

Congratulations!
-- 
Michael



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: "Daniel Verite"
    Andres Freund wrote:

> FWIW, I still think this needs a pgbench or similar example integration,
> so we can actually properly measure the benefits.

Here's an updated version of the patch I made during review,
adding \beginbatch and \endbatch to pgbench.
The performance improvement appears clearly
with a custom script of this kind:
  \beginbatch
     UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
     ..above repeated 1000 times...
  \endbatch

versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch

On localhost on my desktop I tend to see a 30% difference in favor
of the batch mode with that kind of test.
On slower networks there are much bigger differences.

The latest main patch (v10) must also be slightly updated for HEAD,
because of this:
error: patch failed: src/interfaces/libpq/exports.txt:171
v11 attached without any other change.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>     Andres Freund wrote:
> 
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
> 
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>      UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>      ..above repeated 1000 times...
>   \endbatch
> 
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> 
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.

\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\set aid random(1, :naccounts)
\set bid random(1, :nbranches)
\set tid random(1, :ntellers)
\set delta random(-5000, 5000)
\beginbatch
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
\endbatch

- Andres



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Craig Ringer


On 22 Jun. 2017 07:40, "Andres Freund" <andres@anarazel.de> wrote:
On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>       Andres Freund wrote:
>
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
>
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>      UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>      ..above repeated 1000 times...
>   \endbatch
>
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
>
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.

In my original tests I got over a 300x improvement on WAN :). I should check if the same applies with pgbench.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
On 2017-06-21 16:40:48 -0700, Andres Freund wrote:
> On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
> >     Andres Freund wrote:
> > 
> > > FWIW, I still think this needs a pgbench or similar example integration,
> > > so we can actually properly measure the benefits.
> > 
> > Here's an updated version of the patch I made during review,
> > adding \beginbatch and \endbatch to pgbench.
> > The performance improvement appears clearly
> > with a custom script of this kind:
> >   \beginbatch
> >      UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
> >      ..above repeated 1000 times...
> >   \endbatch
> > 
> > versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> > 
> > On localhost on my desktop I tend to see a 30% difference in favor
> > of the batch mode with that kind of test.
> > On slower networks there are much bigger differences.
> 
> This is seriously impressive.  Just using the normal pgbench mixed
> workload, wrapping a whole transaction into a batch *doubles* the
> throughput.  And that's locally over a unix socket - the gain over
> actual network will be larger.

I've not analyzed this further, but something with the way network is
done isn't yet quite right either in the pgbench patch or in the libpq
patch.  You'll currently get IO like:

sendto(3, "B\0\0\0\22\0P1_2\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 36, MSG_NOSIGNAL, NULL, 0) = 36
sendto(3, "B\0\0\0\35\0P1_4\0\0\0\0\1\0\0\0\0073705952\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_6\0\0\0\0\1\0\0\0\0077740854\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_8\0\0\0\0\1\0\0\0\0071570280\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_10\0\0\0\0\1\0\0\0\0072634305\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_12\0\0\0\0\1\0\0\0\0078960656\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_14\0\0\0\0\1\0\0\0\0073030370\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\35\0P1_16\0\0\0\0\1\0\0\0\006376125\0\1\0\0D\0"..., 47, MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_18\0\0\0\0\1\0\0\0\0072982423\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_20\0\0\0\0\1\0\0\0\0073860195\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_22\0\0\0\0\1\0\0\0\0072794433\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_24\0\0\0\0\1\0\0\0\0075475271\0\1\0\0D"..., 48, MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\23\0P1_25\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t"..., 37, MSG_NOSIGNAL, NULL, 0) = 37
sendto(3, "S\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
recvfrom(3, "2\0\0\0\4n\0\0\0\4C\0\0\0\nBEGIN\0002\0\0\0\4T\0\0\0!\0"..., 16384, 0, NULL, NULL) = 775
recvfrom(3, 0x559a02667ff2, 15630, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667fb1, 15695, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667f6c, 15764, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667f2b, 15829, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667eea, 15894, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667ea9, 15959, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667e68, 16024, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667e24, 16092, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667de3, 16157, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667da2, 16222, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667d5d, 16291, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667d1c, 16356, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x559a02667d06, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

I.e. we're doing tiny write send() syscalls (they should be coalesced)
and then completely unnecessarily call recv() over and over again
without polling.  To me it looks very much like we're doing each of
these exactly once per command...

- Andres



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Craig Ringer
On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:

> I.e. we're doing tiny write send() syscalls (they should be coalesced)

That's likely worth doing, but can probably wait for a separate patch.
The kernel will usually do some packet aggregation unless we use
TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
is IMO not worth worrying about just yet.

> and then completely unnecessarily call recv() over and over again
> without polling.  To me it looks very much like we're doing each of
> these exactly once per command...

Yeah, that looks suspect.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:
> 
> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
> 
> That's likely worth doing, but can probably wait for a separate patch.

I don't think so, we should get this right, it could have API influence.


> The kernel will usually do some packet aggregation unless we use
> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> is IMO not worth worrying about just yet.

1)
                    /*
                     * Select socket options: no delay of outgoing data for
                     * TCP sockets, nonblock mode, close-on-exec. Fail if any
                     * of this fails.
                     */
                    if (!IS_AF_UNIX(addr_cur->ai_family))
                    {
                        if (!connectNoDelay(conn))
                        {
                            pqDropConnection(conn, true);
                            conn->addr_cur = addr_cur->ai_next;
                            continue;
                        }
                    }

2) Even if nodelay weren't set, this can still lead to smaller packets
   being sent, because you start sending normal sized tcp packets,
   rather than jumbo ones, even if configured (pretty common these
   days).

3) Syscall overhead is actually quite significant.

- Andres



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Craig Ringer
On 22 June 2017 at 09:07, Andres Freund <andres@anarazel.de> wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
>> On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:
>>
>> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
>>
>> That's likely worth doing, but can probably wait for a separate patch.
>
> I don't think so, we should get this right, it could have API influence.
>
>
>> The kernel will usually do some packet aggregation unless we use
>> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
>> is IMO not worth worrying about just yet.
>
> 1)
>                                         /*
>                                          * Select socket options: no delay of outgoing data for
>                                          * TCP sockets, nonblock mode, close-on-exec. Fail if any
>                                          * of this fails.
>                                          */
>                                         if (!IS_AF_UNIX(addr_cur->ai_family))
>                                         {
>                                                 if (!connectNoDelay(conn))
>                                                 {
>                                                         pqDropConnection(conn, true);
>                                                         conn->addr_cur = addr_cur->ai_next;
>                                                         continue;
>                                                 }
>                                         }
>
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>    being sent, because you start sending normal sized tcp packets,
>    rather than jumbo ones, even if configured (pretty common these
>    days).
>
> 3) Syscall overhead is actually quite significant.

Fair enough, and *headdesk* re not checking NODELAY. I thought I'd
checked for our use of that before, but I must've remembered wrong.

We could use TCP_CORK but it's not portable and it'd be better to just
collect up a buffer to dispatch.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
On 2017-06-21 18:07:21 -0700, Andres Freund wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> > On 22 June 2017 at 08:29, Andres Freund <andres@anarazel.de> wrote:
> > 
> > > I.e. we're doing tiny write send() syscalls (they should be coalesced)
> > 
> > That's likely worth doing, but can probably wait for a separate patch.
> 
> I don't think so, we should get this right, it could have API influence.
> 
> 
> > The kernel will usually do some packet aggregation unless we use
> > TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> > is IMO not worth worrying about just yet.
> 
> 1)
>                     /*
>                      * Select socket options: no delay of outgoing data for
>                      * TCP sockets, nonblock mode, close-on-exec. Fail if any
>                      * of this fails.
>                      */
>                     if (!IS_AF_UNIX(addr_cur->ai_family))
>                     {
>                         if (!connectNoDelay(conn))
>                         {
>                             pqDropConnection(conn, true);
>                             conn->addr_cur = addr_cur->ai_next;
>                             continue;
>                         }
>                     }
> 
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>    being sent, because you start sending normal sized tcp packets,
>    rather than jumbo ones, even if configured (pretty common these
>    days).
> 
> 3) Syscall overhead is actually quite significant.

Proof of the pudding:

pgbench of 10 pgbench select statements in a batch:

as submitted by Daniel:

pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 10000 -P 1 -f ~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 24175.5 tps, lat 0.647 ms stddev 0.782
progress: 2.0 s, 27737.6 tps, lat 0.577 ms stddev 0.625
progress: 3.0 s, 28853.3 tps, lat 0.554 ms stddev 0.619
progress: 4.0 s, 26660.8 tps, lat 0.600 ms stddev 0.776
progress: 5.0 s, 30023.8 tps, lat 0.533 ms stddev 0.484
progress: 6.0 s, 29959.3 tps, lat 0.534 ms stddev 0.450
progress: 7.0 s, 29944.9 tps, lat 0.534 ms stddev 0.536
progress: 8.0 s, 30137.7 tps, lat 0.531 ms stddev 0.533
progress: 9.0 s, 30285.2 tps, lat 0.528 ms stddev 0.479
progress: 10.0 s, 30228.7 tps, lat 0.529 ms stddev 0.460
progress: 11.0 s, 29921.4 tps, lat 0.534 ms stddev 0.613
progress: 12.0 s, 29982.4 tps, lat 0.533 ms stddev 0.510
progress: 13.0 s, 29247.4 tps, lat 0.547 ms stddev 0.526
progress: 14.0 s, 28757.3 tps, lat 0.556 ms stddev 0.635
progress: 15.0 s, 29035.3 tps, lat 0.551 ms stddev 0.523
^C

sample vmstat:
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
19  0      0 488992 787332 23558676    0    0     0     0 9720 455099 65 35  0  0  0

(i.e. ~450k context switches)

hackily patched:
pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 10000 -P 1 -f ~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 40545.2 tps, lat 0.386 ms stddev 0.625
progress: 2.0 s, 48158.0 tps, lat 0.332 ms stddev 0.277
progress: 3.0 s, 50125.7 tps, lat 0.319 ms stddev 0.204
progress: 4.0 s, 50740.6 tps, lat 0.315 ms stddev 0.250
progress: 5.0 s, 50795.6 tps, lat 0.315 ms stddev 0.246
progress: 6.0 s, 51195.6 tps, lat 0.312 ms stddev 0.207
progress: 7.0 s, 50746.7 tps, lat 0.315 ms stddev 0.264
progress: 8.0 s, 50619.1 tps, lat 0.316 ms stddev 0.250
progress: 9.0 s, 50619.4 tps, lat 0.316 ms stddev 0.228
progress: 10.0 s, 46967.8 tps, lat 0.340 ms stddev 0.499
progress: 11.0 s, 50480.1 tps, lat 0.317 ms stddev 0.239
progress: 12.0 s, 50242.5 tps, lat 0.318 ms stddev 0.286
progress: 13.0 s, 49912.7 tps, lat 0.320 ms stddev 0.266
progress: 14.0 s, 49841.7 tps, lat 0.321 ms stddev 0.271
progress: 15.0 s, 49807.1 tps, lat 0.321 ms stddev 0.248
^C

sample vmstat:
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
23  0      0 482008 787312 23558996    0    0     0     0 8219 105097 87 14  0  0  0

(i.e. ~100k context switches)

That's *localhost*.


It's completely possible that I've screwed something up here, I didn't
test it besides running pgbench, but the send/recv'd data looks like
it's similar amounts of data, just fewer syscalls.

Greetings,

Andres Freund


Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: "Daniel Verite"
    Craig Ringer wrote:

> The kernel will usually do some packet aggregation unless we use
> TCP_NODELAY (which we don't and shouldn't)

Not sure. As a point of comparison, Oracle has it as a tunable
parameter (TCP.NODELAY), and they changed its default from
No to Yes starting from their 10g R2.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: "Daniel Verite"
    Andres Freund wrote:

-    if (pqFlush(conn) < 0)
-        goto sendFailed;
+    if (conn->batch_status == PQBATCH_MODE_OFF)
+    {
+        /*
+         * Give the data a push.  In nonblock mode, don't complain if
we're unable
+         * to send it all; PQgetResult() will do any additional
flushing needed.
+         */
+        if (pqFlush(conn) < 0)
+            goto sendFailed;
+    }

Seems to be responsible for roughly a 1.7x speedup in tps and an equivalent
decrease in latency, based on the "progress" info.
I wonder how much of that corresponds to a decrease in the number of
packets versus the number of syscalls. Both matter, I guess.

But OTOH there are certainly batch workloads where it will be preferable
for the first query to reach the server ASAP, rather than waiting to be
coalesced with the next ones.
libpq is not going to know what's best.
One option may be to leave that decision to the user by providing a
PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
function. Maybe we could even let the user set the size of the sending
buffer, so those who really want to squeeze performance may tune it
for their network and workload.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
Hi,

On 2017-06-22 13:43:35 +0200, Daniel Verite wrote:
> But OTOH there are certainly batch workloads where it will be preferable
> for the first query to reach the server ASAP, rather than waiting to be
> coalesced with the next ones.

Is that really something people expect from a batch API?  I suspect it's
not really, and nothing would stop one from adding PQflush() or similar
calls if desirable anyway.

FWIW, the way I did that in the hack clearly isn't ok: If you were to
send a gigabyte of queries, it'd buffer them all up in memory... So some
more intelligence is going to be needed.


> libpq is not going to know what's best.
> One option may be to leave that decision to the user by providing a
> PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> function.

What'd be the difference between PQflush() and PQbatchFlush()?

Greetings,

Andres Freund



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: "Daniel Verite"
    Andres Freund wrote:

> > One option may be to leave that decision to the user by providing a
> > PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> > function.
>
> What'd be the difference between PQflush() and PQbatchFlush()?

I guess there is no difference; I was just not seeing that libpq already
provides this functionality...

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Vaishnavi Prabakaran

Andres Freund wrote:
> If you were to send a gigabyte of queries, it'd buffer them all up in memory... So some
>more intelligence is going to be needed.

Firstly, sorry for the delayed response as I got busy with other activities. 

To buffer up the queries before flushing them to the socket, I think "conn->outCount>=65536" is ok to use, as 64k is considered safe on Windows per the comments in the pqSendSome() API.

Attached is the code patch replacing pqFlush calls with pqBatchFlush in the asynchronous libpq call flow.
pqFlush is still used in the "PQbatchSyncQueue" and "PQexitBatchMode" functions.

I am failing to see the benefit of allowing the user to set a PQBatchAutoFlush(true|false) property. Is it really needed?

Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 



Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Andres Freund
Hi,

On 2017-08-10 15:23:06 +1000, Vaishnavi Prabakaran wrote:
> Andres Freund wrote :
> > If you were to send a gigabyte of queries, it'd buffer them all up in
> memory... So some
> >more intelligence is going to be needed.
> 
> Firstly, sorry for the delayed response as I got busy with other
> activities.

No worries - development of new features was slowed down anyway, due to
the v10 feature freeze.


> To buffer up the queries before flushing them to the socket, I think
> "conn->outCount>=65536" is ok to use, as 64k is considered to be safe in
> Windows as per comments in pqSendSome() API.
> 
> Attached the code patch to replace pqFlush calls with pqBatchFlush in the
> asynchronous libpq function calls flow.
> Still pqFlush is used in "PQbatchSyncQueue" and
> "PQexitBatchMode" functions.


> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...



> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    can sometimes offer considerable performance improvements.
> +   </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.


> +   <para>
> +    Batching is most useful when the server is distant, i.e. network latency
> +    (<quote>ping time</quote>) is high, and when many small operations are being performed in
> +    rapid sequence. There is usually less benefit in using batches when each
> +    query takes many multiples of the client/server round-trip time to execute.
> +    A 100-statement operation run on a server 300ms round-trip-time away would take
> +    30 seconds in network latency alone without batching; with batching it may spend
> +    as little as 0.3s waiting for results from the server.
> +   </para>

I'd add a remark that this is frequently beneficial even in cases of
minimal latency - as e.g. shown by the numbers I presented upthread.


> +   <para>
> +    Use batches when your application does lots of small
> +    <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +    <literal>DELETE</literal> operations that can't easily be transformed into
> +    operations on sets or into a
> +    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> +   </para>

Aren't SELECTs also a major beneficiary of this?


> +   <para>
> +    Batching is less useful when information from one operation is required by the
> +    client before it knows enough to send the next operation.

s/less/not/


> +   <note>
> +    <para>
> +     The batch API was introduced in PostgreSQL 10.0, but clients using the PostgreSQL 10.0 version of libpq can
> +     use batches on server versions 8.4 and newer. Batching works on any server
> +     that supports the v3 extended query protocol.
> +    </para>
> +   </note>

Where's the 8.4 coming from?


> +   <para>
> +    The client uses libpq's asynchronous query functions to dispatch work,
> +    marking the end of each batch with <function>PQbatchSyncQueue</function>.
> +    And to get results, it uses <function>PQgetResult</function> and
> +    <function>PQbatchProcessQueue</function>. It may eventually exit
> +    batch mode with <function>PQexitBatchMode</function> once all results are
> +    processed.
> +   </para>
> +
> +   <note>
> +    <para>
> +     It is best to use batch mode with <application>libpq</application> in
> +     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> +     blocking mode it is possible for a client/server deadlock to occur. The
> +     client will block trying to send queries to the server, but the server will
> +     block trying to send results from queries it has already processed to the
> +     client. This only occurs when the client sends enough queries to fill its
> +     output buffer and the server's receive buffer before switching to
> +     processing input from the server, but it's hard to predict exactly when
> +     that'll happen so it's best to always use non-blocking mode.
> +    </para>
> +   </note>

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbounded memory?  We probably
should either document or implement some smarts about when to signal
read/write readyness. Otherwise we e.g. might be receiving tons of
result data without having sent the next query - or the other way round.


> +   <sect3 id="libpq-batch-sending">
> +    <title>Issuing queries</title>
> +
> +    <para>
> +     After entering batch mode the application dispatches requests
> +     using normal asynchronous <application>libpq</application> functions such as 
> +     <function>PQsendQueryParams</function>, <function>PQsendPrepare</function>,
> +     <function>PQsendQueryPrepared</function>, <function>PQsendDescribePortal</function>,
> +     <function>PQsendDescribePrepared</function>.
> +     The asynchronous requests are followed by a <link
> +     linkend="libpq-PQbatchSyncQueue"><function>PQbatchSyncQueue(conn)</function></link> call to mark
> +     the end of the batch. The client <emphasis>does not</emphasis> need to call
> +     <function>PQgetResult</function> immediately after dispatching each
> +     operation. <link linkend="libpq-batch-results">Result processing</link>
> +     is handled separately.
> +    </para>
> +    
> +    <para>
> +     Batched operations will be executed by the server in the order the client
> +     sends them. The server will send the results in the order the statements
> +     executed. The server may begin executing the batch before all commands
> +     in the batch are queued and the end of batch command is sent. If any
> +     statement encounters an error the server aborts the current transaction and
> +     skips processing the rest of the batch. Query processing resumes after the
> +     end of the failed batch.
> +    </para>

Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...


> +    <note>
> +     <para>
> +      The client must not assume that work is committed when it
> +      <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +      corresponding result is received to confirm the commit is complete.
> +      Because errors arrive asynchronously the application needs to be able to
> +      restart from the last <emphasis>received</emphasis> committed change and
> +      resend work done after that point if something goes wrong.
> +     </para>
> +    </note>

This seems fairly independent of batching.


> +   </sect3>
> +
> +   <sect3 id="libpq-batch-interleave">
> +    <title>Interleaving result processing and query dispatch</title>
> +
> +    <para>
> +     To avoid deadlocks on large batches the client should be structured around
> +     a nonblocking I/O loop using a function like <function>select</function>,
> +     <function>poll</function>, <function>epoll</function>,
> +     <function>WaitForMultipleObjectEx</function>, etc.
> +    </para>
> +
> +    <para>
> +     The client application should generally maintain a queue of work still to
> +     be dispatched and a queue of work that has been dispatched but not yet had
> +     its results processed.

Hm. Why? If queries are just issued, no such queue is required?


> When the socket is writable it should dispatch more
> +     work. When the socket is readable it should read results and process them,
> +     matching them up to the next entry in its expected results queue. Batches
> +     should be scoped to logical units of work, usually (but not always) one
> +     transaction per batch. There's no need to exit batch mode and re-enter it
> +     between batches or to wait for one batch to finish before sending the next.
> +    </para>

This really needs to take memory usage into account.
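To make that concrete: bounding memory doesn't need much machinery on the client side, just a cap on dispatched-but-unprocessed queries. A sketch of the bookkeeping (names invented here, nothing below is from the patch):

```c
#include <assert.h>

/* Stop dispatching once the number of sent-but-unprocessed queries hits a
 * cap, and resume as results are consumed.  The cap indirectly bounds how
 * much result data can pile up in the receive buffer. */
#define MAX_IN_FLIGHT 64

typedef struct
{
	int			dispatched;		/* sent, results not yet processed */
} FlightControl;

static int
can_dispatch(const FlightControl *fc)
{
	return fc->dispatched < MAX_IN_FLIGHT;
}

static void
note_dispatched(FlightControl *fc)
{
	fc->dispatched++;
}

static void
note_result_consumed(FlightControl *fc)
{
	assert(fc->dispatched > 0);
	fc->dispatched--;
}
```

The I/O loop would then consult can_dispatch() before sending the next query and otherwise keep draining results.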

> +       </variablelist>
> +
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry id="libpq-PQenterBatchMode">
> +     <term>
> +      <function>PQenterBatchMode</function>
> +      <indexterm>
> +       <primary>PQenterBatchMode</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Causes a connection to enter batch mode if it is currently idle or
> +      already in batch mode.
> +
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +
> +        </para>
> +        <para>
> +          Returns 1 for success. Returns 0 and has no 
> +          effect if the connection is not currently idle, i.e. it has a result 
> +          ready, is waiting for more input from the server, etc. This function 
> +          does not actually send anything to the server, it just changes the 
> +          <application>libpq</application> connection state.
> +
> +        </para>
> +     </listitem>
> +    </varlistentry>
> +
> +    <varlistentry id="libpq-PQexitBatchMode">
> +     <term>
> +      <function>PQexitBatchMode</function>
> +      <indexterm>
> +       <primary>PQexitBatchMode</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Causes a connection to exit batch mode if it is currently in batch mode
> +      with an empty queue and no pending results.
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> +        </para>
> +        <para>Returns 1 for success.
> +      Returns 1 and takes no action if not in batch mode. If the connection has

"returns 1"?


> +    <varlistentry id="libpq-PQbatchSyncQueue">
> +     <term>
> +      <function>PQbatchSyncQueue</function>
> +      <indexterm>
> +       <primary>PQbatchSyncQueue</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Delimits the end of a set of a batched commands by sending a <link
> +      linkend="protocol-flow-ext-query">sync message</link> and flushing
> +      the send buffer. The end of a batch serves as 
> +      the delimiter of an implicit transaction and
> +      an error recovery point; see <link linkend="libpq-batch-errors">
> +      error handling</link>.

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.


> +    <varlistentry id="libpq-PQbatchQueueCount">
> +     <term>
> +      <function>PQbatchQueueCount</function>
> +      <indexterm>
> +       <primary>PQbatchQueueCount</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Returns the number of queries still in the queue for this batch, not
> +      including any query that's currently having results being processed.
> +      This is the number of times <function>PQbatchProcessQueue</function> has to be
> +      called before the query queue is empty again.
> +
> +<synopsis>
> +int PQbatchQueueCount(PGconn *conn);
> +</synopsis>
> +
> +      </para>
> +     </listitem>
> +    </varlistentry>

Given that apps are supposed to track this, I'm not sure why we have
this?


> +/*
> + * PQrecyclePipelinedCommand
> + *    Push a command queue entry onto the freelist. It must be a dangling entry
> + *    with null next pointer and not referenced by any other entry's next pointer.
> + */
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> +    if (entry == NULL)
> +        return;
> +    if (entry->next != NULL)
> +    {
> +        fprintf(stderr, libpq_gettext("tried to recycle non-dangling command queue entry"));
> +        abort();

Don't think we use abort() in libpq like that. There's some Assert()s
tho.


>  static bool
>  PQsendQueryStart(PGconn *conn)
> @@ -1377,20 +1486,59 @@ PQsendQueryStart(PGconn *conn)
>                            libpq_gettext("no connection to the server\n"));
>          return false;
>      }
> -    /* Can't send while already busy, either. */
> -    if (conn->asyncStatus != PGASYNC_IDLE)
> +    /* Can't send while already busy, either, unless enqueuing for later */
> +    if (conn->asyncStatus != PGASYNC_IDLE && conn->batch_status == PQBATCH_MODE_OFF)
>      {
>          printfPQExpBuffer(&conn->errorMessage,
>                            libpq_gettext("another command is already in progress\n"));
>          return false;
>      }
>  
> -    /* initialize async result-accumulation state */
> -    pqClearAsyncResult(conn);
> +    if (conn->batch_status != PQBATCH_MODE_OFF)
> +    {
> +    /*

Weirdly indented.


> +    if (conn->batch_status != PQBATCH_MODE_OFF)
> +    {
> +        pipeCmd = PQmakePipelinedCommand(conn);
> +
> +        if (pipeCmd == NULL)
> +            return 0;            /* error msg already set */
> +
> +        last_query = &pipeCmd->query;
> +        queryclass = &pipeCmd->queryclass;
> +    }
> +    else
> +    {
> +        last_query = &conn->last_query;
> +        queryclass = &conn->queryclass;
> +    }

The amount of complexity / branches we're adding to all of these is more
than a bit unsightly.


> +/*
> + * PQbatchQueueCount
> + *     Return number of queries currently pending in batch mode
> + */
> +int
> +PQbatchQueueCount(PGconn *conn)
> +{
> +    int            count = 0;
> +    PGcommandQueueEntry *entry;
> +
> +    if (PQbatchStatus(conn) == PQBATCH_MODE_OFF)
> +        return 0;
> +
> +    entry = conn->cmd_queue_head;
> +    while (entry != NULL)
> +    {
> +        ++count;
> +        entry = entry->next;
> +    }
> +    return count;
> +}

Ugh, O(N)? In that case I'd rather just remove this.
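If the function is worth keeping at all, the list walk could be avoided by maintaining the length next to the queue pointers. A sketch with hypothetical field names, not code from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Keep the pending-query count on the connection object and maintain it on
 * enqueue/dequeue, so reading it is O(1). */
typedef struct CmdEntry
{
	struct CmdEntry *next;
} CmdEntry;

typedef struct Conn
{
	CmdEntry   *cmd_queue_head;
	CmdEntry   *cmd_queue_tail;
	int			cmd_queue_len;	/* updated by enqueue/dequeue below */
} Conn;

static void
cmd_enqueue(Conn *conn, CmdEntry *entry)
{
	entry->next = NULL;
	if (conn->cmd_queue_tail)
		conn->cmd_queue_tail->next = entry;
	else
		conn->cmd_queue_head = entry;
	conn->cmd_queue_tail = entry;
	conn->cmd_queue_len++;
}

static CmdEntry *
cmd_dequeue(Conn *conn)
{
	CmdEntry   *entry = conn->cmd_queue_head;

	if (entry == NULL)
		return NULL;
	conn->cmd_queue_head = entry->next;
	if (conn->cmd_queue_head == NULL)
		conn->cmd_queue_tail = NULL;
	conn->cmd_queue_len--;
	return entry;
}

/* O(1) replacement for the O(N) list walk */
static int
queue_count(const Conn *conn)
{
	return conn->cmd_queue_len;
}
```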


> +/*
> + * PQbatchBegin

Mismatched w/ actual function name.


> + *     Put an idle connection in batch mode. Commands submitted after this
> + *     can be pipelined on the connection, there's no requirement to wait for
> + *     one to finish before the next is dispatched.
> + *
> + *     Queuing of new query or syncing during COPY is not allowed.

+"a"?

> + *     A set of commands is terminated by a PQbatchQueueSync. Multiple sets of batched
> + *     commands may be sent while in batch mode. Batch mode can be exited by
> + *     calling PQbatchEnd() once all results are processed.
> + *
> + *     This doesn't actually send anything on the wire, it just puts libpq
> + *     into a state where it can pipeline work.
> + */
> +int
> +PQenterBatchMode(PGconn *conn)
> +{
> +    if (!conn)
> +        return false;

true/false isn't quite in line with int return code.


> +/*
> + * PQbatchEnd

wrong name.

> + *     End batch mode and return to normal command mode.
> + *
> + *     Has no effect unless the client has processed all results
> + *     from all outstanding batches and the connection is idle.

That seems wrong - will lead to hard to diagnose errors.


> + *     Returns true if batch mode ended.
> + */
> +int
> +PQexitBatchMode(PGconn *conn)
> +{
> +    if (!conn)
> +        return false;
> +
> +    if (conn->batch_status == PQBATCH_MODE_OFF)
> +        return true;
> +
> +    switch (conn->asyncStatus)
> +    {
> +        case PGASYNC_IDLE:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, IDLE in batch mode"));
> +            break;
> +        case PGASYNC_COPY_IN:
> +        case PGASYNC_COPY_OUT:
> +        case PGASYNC_COPY_BOTH:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, COPY in batch mode"));
> +            break;

So we'll still check the queue in this case, that's a bit weird?



> +/*
> + * PQbatchQueueSync

Out of sync.

> + *     End a batch submission by sending a protocol sync. The connection will
> + *     remain in batch mode and unavailable for new non-batch commands until all
> + *     results from the batch are processed by the client.

"unavailable for new non-batch commands" - that's hard to follow, and
seems pretty redundant with PQendBatchMode (or however it's called).

> + *     It's legal to start submitting another batch immediately, without waiting
> + *     for the results of the current batch. There's no need to end batch mode
> + *     and start it again.
> + *
> + *     If a command in a batch fails, every subsequent command up to and including
> + *     the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED state. If the
> + *     whole batch is processed without error, a PGresult with PGRES_BATCH_END is
> + *     produced.

Hm, should probably mention that that's only true for commands since the
last PQbatchQueueSync?



> +/*
> + * PQbatchQueueProcess

Out of sync.


> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.

Last complaint about this - think this forgiving mode is a mistake.


> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{

> +    /* This command's results will come in immediately.
> +     * Initialize async result-accumulation state */
> +    pqClearAsyncResult(conn);

I'm not following?


>  /*
>   * PQgetResult
> @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn)

> +            if (conn->batch_status != PQBATCH_MODE_OFF)
> +            {
> +                /*
> +                 * batched queries aren't followed by a Sync to put us back in
> +                 * PGASYNC_IDLE state, and when we do get a sync we could
> +                 * still have another batch coming after this one.

This needs rephrasing.


> +                 * The connection isn't idle since we can't submit new
> +                 * nonbatched commands. It isn't also busy since the current
> +                 * command is done and we need to process a new one.
> +                 */
> +                conn->asyncStatus = PGASYNC_QUEUED;

Not sure I like the name.


> +    if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != PQBATCH_MODE_OFF)
> +    {
> +        printfPQExpBuffer(&conn->errorMessage,
> +                          libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n"));
> +        return false;
> +    }

Why do we need the PGASYNC_QUEUED test here?

> +/* pqBatchFlush
> + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> + * In non-batch mode, data will be flushed all the time.
> + */
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> +    if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536))
> +        return(pqFlush(conn));
> +    return 0; /* Just to keep compiler quiet */
> +}

This should be defined in a macro or such, rather than hardcoded.
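i.e. something along these lines (the macro name is only a suggestion; 64kB is the value hardcoded in the patch):

```c
#include <assert.h>

/* Flush threshold for batch mode: same 64kB as the hardcoded value in the
 * patch, but named and adjustable in one place. */
#define OUTBUFFER_THRESHOLD	(64 * 1024)

/* Stand-in for the real pqBatchFlush decision: flush immediately when not
 * batching, otherwise only once the output buffer crosses the threshold. */
static int
batch_should_flush(int batch_mode_on, int out_count)
{
	return !batch_mode_on || out_count >= OUTBUFFER_THRESHOLD;
}
```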


Falling over now. This seems like enough feedback for a bit of work
anyway.

Regards,

Andres



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund <andres@anarazel.de> wrote:



> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...


Thanks for reviewing the patch and yes ok.
 


> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    can sometimes offer considerable performance improvements.
> +   </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.

Client application complexity is increased in batch mode, because the application needs to track the state of the query queue. Result processing can be done at any time, so the application needs to know up to which query the results have been consumed. 
 

> +   <para>
> +    Use batches when your application does lots of small
> +    <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +    <literal>DELETE</literal> operations that can't easily be transformed into
> +    operations on sets or into a
> +    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> +   </para>

Aren't SELECTs also a major beneficiarry of this?


Hmm, though SELECTs also benefit from batch mode, running many SELECTs in a batch fills up memory rapidly, so they may not benefit as much as the other operations listed.

 
> +   <para>
> +    Batching is less useful when information from one operation is required by the
> +    client before it knows enough to send the next operation.

s/less/not/


Corrected.
 

> +   <note>
> +    <para>
> +     The batch API was introduced in PostgreSQL 10.0, but clients using PostgresSQL 10.0 version of libpq can
> +     use batches on server versions 8.4 and newer. Batching works on any server
> +     that supports the v3 extended query protocol.
> +    </para>
> +   </note>

Where's the 8.4 coming from?



I guess it is 7.4, where "PQsendQueryParams" was introduced, and not 8.4. Corrected.


> +   <note>
> +    <para>
> +     It is best to use batch mode with <application>libpq</application> in
> +     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> +     blocking mode it is possible for a client/server deadlock to occur. The
> +     client will block trying to send queries to the server, but the server will
> +     block trying to send results from queries it has already processed to the
> +     client. This only occurs when the client sends enough queries to fill its
> +     output buffer and the server's receive buffer before switching to
> +     processing input from the server, but it's hard to predict exactly when
> +     that'll happen so it's best to always use non-blocking mode.
> +    </para>
> +   </note>

Mention that nonblocking only actually helps if send/recv is done as
required, and can essentially require unbounded memory?  We probably
should either document or implement some smarts about when to signal
read/write readiness. Otherwise we e.g. might be receiving tons of
result data without having sent the next query - or the other way round.



Added a cautionary statement to the documentation; again, this is one of the reasons why SELECT queries are less beneficial in batch mode.

 
Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...


This function not only does error handling but also sends the "Sync" message to the backend. In batch mode, the "Sync" message is not sent with every query; it is sent only via this function, to mark the end of the implicit transaction. Renamed it to PQbatchCommitQueue. Kindly let me know if you can think of a better name. 



> +    <note>
> +     <para>
> +      The client must not assume that work is committed when it
> +      <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +      corresponding result is received to confirm the commit is complete.
> +      Because errors arrive asynchronously the application needs to be able to
> +      restart from the last <emphasis>received</emphasis> committed change and
> +      resend work done after that point if something goes wrong.
> +     </para>
> +    </note>

This seems fairly independent of batching.


Yes, and the reason it is explicitly spelled out for batch mode is that if more than one explicit transaction is used in a single batch, the failure of one transaction leads to the subsequent transactions being skipped until the end of the current batch is reached. This behavior is specific to batch mode, so I think a precautionary note is needed here. 

 
> +   </sect3>
> +
> +   <sect3 id="libpq-batch-interleave">
> +    <title>Interleaving result processing and query dispatch</title>
> +
> +    <para>
> +     To avoid deadlocks on large batches the client should be structured around
> +     a nonblocking I/O loop using a function like <function>select</function>,
> +     <function>poll</function>, <function>epoll</function>,
> +     <function>WaitForMultipleObjectEx</function>, etc.
> +    </para>
> +
> +    <para>
> +     The client application should generally maintain a queue of work still to
> +     be dispatched and a queue of work that has been dispatched but not yet had
> +     its results processed.

Hm. Why? If queries are just issued, no such queue is required?


This is essentially saying that the client application should remember the order in which queries were sent, so that result processing can be based on which results are expected. For example, if the queries were sent in the order 1) SELECT, 2) INSERT, then result processing must check for "PGRES_TUPLES_OK" first and "PGRES_COMMAND_OK" after it. The queue of work mentioned above refers to this bookkeeping of the queued queries.
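A toy version of that bookkeeping, to make the idea concrete (illustrative only, not part of the patch):

```c
#include <assert.h>

/* Per dispatched query, the application records which result status it
 * expects, and pops entries in order as results come back. */
typedef enum
{
	EXPECT_TUPLES,				/* e.g. a SELECT: PGRES_TUPLES_OK */
	EXPECT_COMMAND				/* e.g. an INSERT: PGRES_COMMAND_OK */
} Expected;

#define MAXQ 32

typedef struct
{
	Expected	q[MAXQ];
	int			head;
	int			tail;
} ExpectQueue;

static void
expect_push(ExpectQueue *eq, Expected e)
{
	eq->q[eq->tail] = e;
	eq->tail = (eq->tail + 1) % MAXQ;
}

static Expected
expect_pop(ExpectQueue *eq)
{
	Expected	e = eq->q[eq->head];

	eq->head = (eq->head + 1) % MAXQ;
	return e;
}
```

When a result arrives, the application pops the next entry and knows whether to check for tuples or a command tag.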

 
> When the socket is writable it should dispatch more
> +     work. When the socket is readable it should read results and process them,
> +     matching them up to the next entry in its expected results queue. Batches
> +     should be scoped to logical units of work, usually (but not always) one
> +     transaction per batch. There's no need to exit batch mode and re-enter it
> +     between batches or to wait for one batch to finish before sending the next.
> +    </para>

This really needs to take memory usage into account.


Modified the paragraph to say that the frequency of result processing should be based on the available memory.

 
> +<synopsis>
> +int PQenterBatchMode(PGconn *conn);
> +</synopsis>
> +<synopsis>
> +int PQexitBatchMode(PGconn *conn);
> +</synopsis>
> +        </para>
> +        <para>Returns 1 for success.
> +      Returns 1 and takes no action if not in batch mode. If the connection has

"returns 1"?


Modified the code to return 1/0.

 


> +    <varlistentry id="libpq-PQbatchSyncQueue">
> +     <term>
> +      <function>PQbatchSyncQueue</function>
> +      <indexterm>
> +       <primary>PQbatchSyncQueue</primary>
> +      </indexterm>
> +     </term>

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.


Renamed to PQbatchCommitQueue. 

 
> +<synopsis>
> +int PQbatchQueueCount(PGconn *conn);
> +</synopsis>
> +

Given that apps are supposed to track this, I'm not sure why we have
this?


Removed this function considering your another comment about O(N) as well.

 
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> +     if (entry == NULL)
> +             return;
> +     if (entry->next != NULL)
> +     {
> +             fprintf(stderr, libpq_gettext("tried to recycle non-dangling command queue entry"));
> +             abort();

Don't think we use abort() in libpq like that. There's some Assert()s
tho.



We do abort() for out-of-memory cases, and here the abort would happen only on some memory corruption. So I think it is OK to abort here. Please let me know if you think otherwise.

 

>  static bool
>  PQsendQueryStart(PGconn *conn)
..... 
 
Weirdly indented.


Corrected.
 


> +/*
> + * PQbatchBegin

Mismatched w/ actual function name.


Corrected.

 

> + *   Put an idle connection in batch mode. Commands submitted after this
> + *   can be pipelined on the connection, there's no requirement to wait for
> + *   one to finish before the next is dispatched.
> + *
> + *   Queuing of new query or syncing during COPY is not allowed.

+"a"?

Hmm, can you explain the question please? I don't understand.
 

> +/*
> + * PQbatchEnd

wrong name.

Corrected.
 

> + *   End batch mode and return to normal command mode.
> + *
> + *   Has no effect unless the client has processed all results
> + *   from all outstanding batches and the connection is idle.

That seems wrong - will lead to hard to diagnose errors.

I have now added an error message for the case where ending the batch fails, to ease diagnosis. This behavior is also documented, so the application should not assume that batch mode ended without checking the return value.  

 
> + *   Returns true if batch mode ended.
> + */
> +int
> +PQexitBatchMode(PGconn *conn)
> +{
> +     if (!conn)
> +             return false;
> +
> +     if (conn->batch_status == PQBATCH_MODE_OFF)
> +             return true;
> +
> +     switch (conn->asyncStatus)
> +     {
> +             case PGASYNC_IDLE:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, IDLE in batch mode"));
> +                     break;
> +             case PGASYNC_COPY_IN:
> +             case PGASYNC_COPY_OUT:
> +             case PGASYNC_COPY_BOTH:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, COPY in batch mode"));
> +                     break;

So we'll still check the queue in this case, that's a bit weird?

Removed the above cases, as adding an error here is not of much help either. 

 

> +/*
> + * PQbatchQueueSync

Out of sync.


Corrected.

 

> + *   End a batch submission by sending a protocol sync. The connection will
> + *   remain in batch mode and unavailable for new non-batch commands until all
> + *   results from the batch are processed by the client.

"unavailable for new non-batch commands" - that's hard to follow, and
seems pretty redundant with PQendBatchMode (or however it's called).


Modified the documentation to be more precise. The difference between PQexitBatchMode and this function is that, to exit batch mode, the application must have consumed all the results pending in the buffer, and after exiting it can call the synchronous command execution functions. With a batch sync/commit, on the other hand, the application cannot issue synchronous commands but can issue further asynchronous commands without reading all the results pending in the buffer. This function basically marks the end of the implicit transaction. 

 
> + *   It's legal to start submitting another batch immediately, without waiting
> + *   for the results of the current batch. There's no need to end batch mode
> + *   and start it again.
> + *
> + *   If a command in a batch fails, every subsequent command up to and including
> + *   the PQbatchQueueSync command result gets set to PGRES_BATCH_ABORTED state. If the
> + *   whole batch is processed without error, a PGresult with PGRES_BATCH_END is
> + *   produced.

Hm, should probably mention that that's only true for commands since the
last PQbatchQueueSync?



"Batch" means the set of commands between PQbatchCommitQueue (the newly renamed name) calls, except for the first batch, which counts from the beginning up to the first PQbatchCommitQueue. And the documentation does mention that "The result for <function>PQbatchSyncQueue</function> is reported as
<literal>PGRES_BATCH_END</literal> to signal the end of the aborted batch and resumption of normal result processing." 


 

> +/*
> + * PQbatchQueueProcess

Out of sync.

Corrected .

 
> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.

Last complaint about this - think this forgiving mode is a mistake.


Now added an error message using printfPQExpBuffer(). I think aborting here is not really needed. Please let me know if you prefer to abort() rather than this solution.

 

> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{

> +     /* This command's results will come in immediately.
> +      * Initialize async result-accumulation state */
> +     pqClearAsyncResult(conn);

I'm not following?


The first line of the comment is removed; it was a copy/paste error.

 

>  /*
>   * PQgetResult
> @@ -1749,10 +2228,32 @@ PQgetResult(PGconn *conn)

> +                     if (conn->batch_status != PQBATCH_MODE_OFF)
> +                     {
> +                             /*
> +                              * batched queries aren't followed by a Sync to put us back in
> +                              * PGASYNC_IDLE state, and when we do get a sync we could
> +                              * still have another batch coming after this one.

This needs rephrasing.



Rephrased as "In batch mode, the query execution state cannot be IDLE as there can be other queries or results waiting in the queue ...". I hope this is simple enough.
 


> +                              * The connection isn't idle since we can't submit new
> +                              * nonbatched commands. It isn't also busy since the current
> +                              * command is done and we need to process a new one.
> +                              */
> +                             conn->asyncStatus = PGASYNC_QUEUED;

Not sure I like the name.


Changed the name to PGASYNC_BATCH to align it better with batch mode, as this status is never set in non-batch mode.
 

> +     if (conn->asyncStatus == PGASYNC_QUEUED || conn->batch_status != PQBATCH_MODE_OFF)
> +     {
> +             printfPQExpBuffer(&conn->errorMessage,
> +                                               libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n"));
> +             return false;
> +     }

Why do we need the PGASYNC_QUEUED test here?


Yes, we don't need this check here. It was removed in PQfn, and similarly it is not needed here or in pqParseInput2.

 
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> +     if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount>=65536))
> +             return(pqFlush(conn));
> +     return 0; /* Just to keep compiler quiet */
> +}

This should be defined in a macro or such, rather than hardcoded.


Added the macro "OUTBUFFER_THRESHOLD". 

 

Falling over now. This seems like enough feedback for a bit of work
anyway.

Thanks once again for reviewing the patch. Attached is the patch with your review comments incorporated.
 

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:


On 13 September 2017 at 13:06, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:


On Wed, Aug 23, 2017 at 7:40 PM, Andres Freund <andres@anarazel.de> wrote:



> Am failing to see the benefit in allowing user to set
> PQBatchAutoFlush(true|false) property? Is it really needed?

I'm inclined not to introduce that for now. If somebody comes up with a
convincing usecase and numbers, we can add it later. Libpq API is set in
stone, so I'd rather not introduce unnecessary stuff...


Thanks for reviewing the patch and yes ok.
 


> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    can sometimes offer considerable performance improvements.
> +   </para>

That's not necessarily true, is it? Unless you count always doing
batches of exactly size 1.

Client application complexity is increased in batch mode, because the application needs to track the state of the query queue. Result processing can be done at any time, so the application needs to know up to which query the results have been consumed. 
 

Yep. Also, the client/server deadlocks at issue here are a buffer management issue, and deadlock is probably not exactly the right word. Your app has to process replies from the server while it's sending queries, otherwise it can get into a state where it has no room left in its send buffer, but the server isn't consuming its receive buffer because the server's send buffer is full. To allow the system to make progress, the client must read from the client receive buffer.

This isn't an issue when using libpq normally.

PgJDBC has similar issues with its batch mode, but in PgJDBC it's much worse because there's no non-blocking send available. In libpq you can at least set your sending socket to non-blocking.
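That buffer-management behavior is easy to demonstrate with a toy model — no libpq involved, just two bounded pipes and the two client strategies:

```c
#include <assert.h>

/* Toy model of the two pipes between client and server.  Each query and
 * each result is one unit; each direction holds at most CAP units.  The
 * server turns a queued query into a result only when the result fits,
 * mirroring a server blocked on its own send buffer. */
#define CAP 4

typedef struct
{
	int			c2s;			/* units queued client -> server */
	int			s2c;			/* units queued server -> client */
} Pipes;

static void
server_step(Pipes *p)
{
	if (p->c2s > 0 && p->s2c < CAP)
	{
		p->c2s--;
		p->s2c++;
	}
}

/* Client that only sends and never reads: returns how many queries it got
 * out before both buffers filled and nobody could make progress. */
static int
send_only(Pipes *p, int nqueries)
{
	int			sent = 0;

	while (sent < nqueries)
	{
		if (p->c2s < CAP)
		{
			p->c2s++;
			sent++;
		}
		else
		{
			server_step(p);
			if (p->c2s == CAP && p->s2c == CAP)
				return sent;	/* wedged */
		}
	}
	return sent;
}

/* Client that interleaves sending and draining results: always finishes. */
static int
interleave(Pipes *p, int nqueries)
{
	int			sent = 0,
				got = 0;

	while (got < nqueries)
	{
		if (sent < nqueries && p->c2s < CAP)
		{
			p->c2s++;
			sent++;
		}
		server_step(p);
		if (p->s2c > 0)
		{
			p->s2c--;
			got++;
		}
	}
	return got;
}
```

With CAP-sized pipes the send-only client wedges after 2*CAP queries, while the interleaving client completes any workload — which is exactly why the docs push the nonblocking event-loop structure.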

 

> +   <para>
> +    Use batches when your application does lots of small
> +    <literal>INSERT</literal>, <literal>UPDATE</literal> and
> +    <literal>DELETE</literal> operations that can't easily be transformed into
> +    operations on sets or into a
> +    <link linkend="libpq-copy"><literal>COPY</literal></link> operation.
> +   </para>

Aren't SELECTs also a major beneficiarry of this?

Yes, many individual SELECTs that cannot be assembled into a single more efficient query would definitely also benefit.
 
Hmm, though SELECTs also benefit from batch mode, running many SELECTs in a batch fills up memory rapidly, so they may not benefit as much as the other operations listed.

Depends on the SELECT. With wide results you'll get less benefit, but even then you can gain if you're on a high latency network. With "n+1" patterns and similar, you'll see huge gains.
 
Maybe note that multiple batches can be "in flight"?
I.e. PQbatchSyncQueue() is about error handling, nothing else? Don't
have a great idea, but we might want to rename...


This function not only does error handling but also sends the "Sync" message to the backend. In batch mode, the "Sync" message is not sent with every query; it is sent only via this function, to mark the end of the implicit transaction. Renamed it to PQbatchCommitQueue. Kindly let me know if you can think of a better name.

I really do not like calling it "commit" as that conflates with a database commit.

A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and commit, and then for a later part to fail. So that name is IMO wrong.



> +    <varlistentry id="libpq-PQbatchSyncQueue">
> +     <term>
> +      <function>PQbatchSyncQueue</function>
> +      <indexterm>
> +       <primary>PQbatchSyncQueue</primary>
> +      </indexterm>
> +     </term>

I wonder why this isn't framed as PQbatchIssue/Send/...()? Syncing seems
to mostly make sense from a protocol POV.


Renamed to PQbatchCommitQueue. 


Per above, strong -1 on that. But SendQueue seems OK, or FlushQueue?



> + *   Put an idle connection in batch mode. Commands submitted after this
> + *   can be pipelined on the connection, there's no requirement to wait for
> + *   one to finish before the next is dispatched.
> + *
> + *   Queuing of new query or syncing during COPY is not allowed.

+"a"?

Hmm, can you explain the question, please? I don't understand.

s/of new query/of a new query/


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

I really do not like calling it "commit" as that conflates with a database commit.

A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and commit, then a later part to fail, if that's the case. So that name is IMO wrong.

Ok, SendQueue seems ok to me as well. Will change it in next version. 
 


+"a"?

Hmm, Can you explain the question please. I don't understand.

s/of new query/of a new query/


Thanks for explaining. Will change this too in next version. 

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 13 September 2017 at 13:44, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
 
Thanks for explaining. Will change this too in next version. 


Thank you, a lot, for picking up this patch. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Daniel Gustafsson
Date:
> On 13 Sep 2017, at 07:44, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
>
> On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer <craig@2ndquadrant.com <mailto:craig@2ndquadrant.com>> wrote:
>
> I really do not like calling it "commit" as that conflates with a database commit.
>
> A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and
commit, then a later part to fail, if that's the case. So that name is IMO wrong. 
>
> Ok, SendQueue seems ok to me as well. Will change it in next version.
>
> +"a"?
>
> Hmm, Can you explain the question please. I don't understand.
>
> s/of new query/of a new query/
>
> Thanks for explaining. Will change this too in next version.

Based on the discussions in this thread, and that a new version hasn’t been
submitted, I’m marking this Returned with Feedback.  Please re-submit the new
version in an upcoming commitfest when ready.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Mon, Oct 2, 2017 at 8:31 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 13 Sep 2017, at 07:44, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:
>
> On Wed, Sep 13, 2017 at 3:33 PM, Craig Ringer <craig@2ndquadrant.com <mailto:craig@2ndquadrant.com>> wrote:
>
> I really do not like calling it "commit" as that conflates with a database commit.
>
> A batch can embed multiple BEGINs and COMMITs. It's entirely possible for an earlier part of the batch to succeed and commit, then a later part to fail, if that's the case. So that name is IMO wrong.
>
> Ok, SendQueue seems ok to me as well. Will change it in next version.
>
> +"a"?
>
> Hmm, Can you explain the question please. I don't understand.
>
> s/of new query/of a new query/
>
> Thanks for explaining. Will change this too in next version.

Based on the discussions in this thread, and that a new version hasn’t been
submitted, I’m marking this Returned with Feedback.  Please re-submit the new
version in an upcoming commitfest when ready.


Thanks for the suggestion. OK, I will create a new entry in the upcoming commitfest with the attached patch addressing the above review comments.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 
Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Thu, Oct 5, 2017 at 9:58 AM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Thanks for the suggestion and, OK I will create a new patch in upcoming
> commitfest with attached patch addressing above review comments.

The patch still applies and there has been no updates for the last
month, as well as no reviews. I am bumping it to next CF.
-- 
Michael


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Tue, Nov 28, 2017 at 12:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Oct 5, 2017 at 9:58 AM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Thanks for the suggestion and, OK I will create a new patch in upcoming
> commitfest with attached patch addressing above review comments.

The patch still applies and there has been no updates for the last
month, as well as no reviews. I am bumping it to next CF.

Thank you. I see the patch generates a compilation error with the latest postgres code due to usage of "FALSE", hence attaching the patch with a correction. 

Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Fri, Jan 5, 2018 at 4:55 PM, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:


On Tue, Nov 28, 2017 at 12:57 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Oct 5, 2017 at 9:58 AM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Thanks for the suggestion and, OK I will create a new patch in upcoming
> commitfest with attached patch addressing above review comments.

The patch still applies and there has been no updates for the last
month, as well as no reviews. I am bumping it to next CF.

Thank you, I see the patch generates a compilation error due to usage of "FALSE" with latest postgres code, Hence attaching the patch with correction. 
 
Corrected a compilation error in the documentation portion of the patch with the latest postgres code. Attached the corrected patch.  

Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Kyotaro HORIGUCHI
Date:
Hello.

At Fri, 12 Jan 2018 10:12:35 +1100, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote in
<CAOoUkxRRTBm8ztzVXL45EVk=BC8AjfaFM5=ZMtrOMMfm+dnbwA@mail.gmail.com>
> Corrected compilation error in documentation portion of patch with latest
> postgres code. Attached the corrected patch.

My understanding of this patch is that it intends to do the
following things without changing the protocol.

A. Refrain from sending a sync message and flushing the socket during batching.
B. Set required information to PGconn before receiving a result.

A multi-statement query does a similar thing, but a bit differently.
With this feature, queries don't need to be built and merged into a
single string in advance.

Since I'm not sure what the current stage of this patch is and I
haven't fully digested the discussion so far, I might make
stupid or duplicate comments, but I would like to comment anyway.


- Previous comments

A discussion on psql batch mode was held in another branch of
this thread. How do we treat that?

- Interface completeness

I think PQenter/exitBatchMode() is undoubtedly needed.

PQbatchSendQueue() seems to be required to make sure queries are
sent before calling PQgetResult.

PQbatchProcessQueue() doesn't seem essential. If there's no case
where it is called without being followed by PQgetResult, it can be
folded into PQgetResult. Client code may be simpler if it is not
required.
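For context, the intended client flow with this patch looks roughly as follows (a sketch using the function names from the patch under review, which are not part of released libpq, so treat the details as assumptions):

```c
/* Sketch of the intended usage, per the patch under review. */
const char *vals1[] = {"1"};
const char *vals2[] = {"2"};

PQenterBatchMode(conn);

/* queue work without waiting for results */
PQsendQueryParams(conn, "SELECT $1::int", 1, NULL, vals1, NULL, NULL, 0);
PQsendQueryParams(conn, "SELECT $1::int", 1, NULL, vals2, NULL, NULL, 0);

PQbatchSendQueue(conn);					/* end the batch with a Sync */

while (PQbatchProcessQueue(conn))		/* advance to the next query's results */
{
	PGresult   *res;

	while ((res = PQgetResult(conn)) != NULL)
	{
		/* inspect PQresultStatus(res); PGRES_BATCH_ABORTED means an
		 * earlier query in the batch failed */
		PQclear(res);
	}
}

PQexitBatchMode(conn);
```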

The latest patch has an extern declaration of PQbatchQueueCount, whose
body and documentation seem to have been removed, or perhaps never
existed.


- Some comments on the code

-- PQbatchProcessQueue()

PQbatchProcessQueue() returns 0 for several indistinguishable
reasons. If asyncStatus is PGASYNC_BUSY at the time, the
connection is out of sync and doesn't accept further
operations, so that should be distinguished from the end-of-queue
case. (It could return an error result if combined with
PQgetResult.)

The function handles COPY_* in an inconsistent way. It sets an error
message on conn in a hardly noticeable way but allows processing to
continue. The documentation says the following.

> COPY is not recommended as it most likely will trigger failure
> in batch processing

I don't think the description is informative, but the reason for
it would be the fact that we cannot detect that a command leads
to protocol failure before the failure actually
happens. The test code suggests that this is the case where any
command (other than Sync) following "COPY FROM" breaks the
protocol. We get error messages like the ones below in that case. (The
header part is from my test program.)

> ABORTED: (PGRES_BATCH_ABORTED) ERROR:  unexpected message type 0x42 during COPY from stdin
> CONTEXT:  COPY t, line 1

I haven't investigated further, but it seems impossible to restore
the connection's sync short of disconnecting. Avoiding the failure
would need a side-channel for COPY_IN, but we don't have
one now. The possible choices here are:

A. Make the description more precise, by adding the condition
   that causes the error and what it looks like.

B. The situation seems detectable in PQgetResult. Return an error
   result containing a meaningful error. But this doesn't stop
   the subsequent protocol error messages.

> ABORTED: (PGRES_FATAL_ERROR) COPY_IN cannot be followed by any command in a batch
> ABORTED: (PGRES_FATAL_ERROR) ERROR:  08P01: unexpected message type 0x42 during COPY from stdin
> ABORTED: (PGRES_BATCH_ABORTED) ERROR:  unexpected message type 0x42 during COPY from stdin
> ABORTED: (PGRES_BATCH_ABORTED) ERROR:  unexpected message type 0x42 during COPY from stdin

C. Cancel the COPY on the server and return an error to the client in
   this situation. This might let us avoid
   unrecoverable desync of the protocol in this case. (I'm not
   sure this can be done cleanly.)

D. Wait for a protocol change that can solve this problem.


-- PQbatchFlush()

> static int pqBatchFlush(PGconn *conn) 
...
>  if (...)
>     return(pqFlush(conn));
>  return 0; /* Just to keep compiler quiet */

The comment in the last line is wrong.

+    if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_T

Isn't this a bit too long?


-- PGcommandQueueEntry

The resason for the fact that it is a linked list seems to me to
allow queueing and result processing happen alternately in one
batch period. But I coundn't find a description about the
behavior in the documentation. (State transition chart is
required?)

Aside from the above, the interface to handle the command queue
seems to be divided into too-small pieces.

 PQsendQueryGuts()
 >  el = PQmakePipelinedCommand(conn)
 >  el->members = hoge;
 >  ..
 >  PQappendPipelinedCommand(conn, el)

Isn't the following enough instead of the above?

 >  PQappendPipelinedCommand(conn, conn->queryclass, conn->last_query);


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Kyotaro HORIGUCHI wrote:

> A disucssion on psql batch mode was held in another branch of
> this thread. How do we treat that?

There's a batch mode for pgbench in a patch posted in [1],
with \beginbatch and \endbatch commands, but nothing
for psql AFAICS.
psql is more complicated because currently it uses a
blocking PQexec() call at its core. Craig mentioned psql
integration in [2] and [3].
Also a script can have inter-query dependencies such as in
   insert into table(...) returning id \gset
   update othertable set col= :id where ...;
which is a problem in batch mode, as we don't want
to send the update before the right value for :id is
known. Whether we want to support these dependencies
and how needs discussion.
For instance we might not support them at all, or
create a synchronization command that collects
all results of queries sent so far, or do it implicitly
when a variable is injected into a query...
This looks like substantial work that might be best
done separately from the libpq patch.

[1]
https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da26d7@manitou-mail.org
[2]
https://www.postgresql.org/message-id/CAMsr+YGLhaDkjymLuNVQy4MrSKQoA=F1vO=aN8XQf30N=aQuVA@mail.gmail.com
[3]
https://www.postgresql.org/message-id/CAMsr+YE6BK4iAaQz=nY3xDnbLhnNZ_4tp-PTJqbNNpsZMgoo8Q@mail.gmail.com


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Fri, Mar 23, 2018 at 02:18:09PM +0100, Daniel Verite wrote:
> There's a batch mode for pgbench in a patch posted in [1],
> with \beginbatch and \endbatch commands, but nothing
> for psql AFAICS.
> psql is more complicated because currently it uses a
> blocking PQexec() call at its core. Craig mentioned psql
> integration in [2] and [3].

This patch has been around for some time now, the last version fails to
apply cleanly and in-depth reviews have happened.  I am moving that to
the next CF, waiting on its author.
--
Michael

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Dmitry Dolgov
Date:
> On Mon, Oct 1, 2018 at 8:34 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 23, 2018 at 02:18:09PM +0100, Daniel Verite wrote:
> > There's a batch mode for pgbench in a patch posted in [1],
> > with \beginbatch and \endbatch commands, but nothing
> > for psql AFAICS.
> > psql is more complicated because currently it uses a
> > blocking PQexec() call at its core. Craig mentioned psql
> > integration in [2] and [3].
>
> This patch has been around for some time now, the last version fails to
> apply cleanly and in-depth reviews have happened.  I am moving that to
> the next CF, waiting on its author.

Unfortunately, nothing was changed since then, so there is already some amount
of unaddressed review feedback. I'll move this patch to "Returned with
feedback".


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Nikhil Sontakke
Date:
Hi,

> > This patch has been around for some time now, the last version fails to
> > apply cleanly and in-depth reviews have happened.  I am moving that to
> > the next CF, waiting on its author.
>
> Unfortunately, nothing was changed since then, so there is already some amount
> of unaddressed review feedback. I'll move this patch to "Returned with
> feedback".
>

Craig Ringer mentioned this thread to me recently.

This effort has seen decent reviews from Craig, Andres and Michael
already. So, I thought of refreshing it to work against the latest master
HEAD.

PFA the main patch as well as the test patch (I named the test patch v17
to be consistent with the main patch). The major grouse with the test
patch AFAICS was the use of the non-Windows-compliant timersub() function.
I have now used the INSTR_TIME_SET_CURRENT/INSTR_TIME_SUBTRACT family of
portable macros instead.

Please let me know what you think of the above.

Regards,
Nikhil
-- 
Nikhil Sontakke
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Amit Kapila
Date:
On Fri, Aug 30, 2019 at 7:06 PM Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:
>
> Hi,
>
> > > This patch has been around for some time now, the last version fails to
> > > apply cleanly and in-depth reviews have happened.  I am moving that to
> > > the next CF, waiting on its author.
> >
> > Unfortunately, nothing was changed since then, so there is already some amount
> > of unaddressed review feedback. I'll move this patch to "Returned with
> > feedback".
> >
>
> Craig Ringer mentioned about this thread to me recently.
>
> This effort has seen decent reviews from Craig, Andres and Michael
> already. So, I thought of refreshing it to work against latest master
> HEAD.
>

Thanks for picking this up.  However, I noticed that Horiguchi-San
previously gave some comments on this patch [1] which don't
seem to have been addressed, or at least not all of them.  It is
possible that you have already addressed those, but in that
case, it would be good if you respond to his email as well.  If they
are not addressed, then it would be good to address them.


[1] - https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2019-Sep-09, Amit Kapila wrote:

> Thanks for picking up this.  However, I noticed that previously
> Horiguchi-San has given some comments on this patch [1] which doesn't
> seem to be addressed or at least not all of them are addressed.  It is
> possible that you would have already addressed those, but in that
> case, it would be good if you respond to his email as well.  If those
> are not addressed, then it will be good to address those.

Totally unasked for, here's a rebase of this patch series.  I didn't do
anything other than rebasing to current master, solving a couple of very
trivial conflicts, fixing some whitespace complaints from git apply, and
running the tests to verify everything works.

I don't foresee working on this at all, so if anyone is interested in
seeing this feature in, I encourage them to read and address
Horiguchi-san's feedback.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Andres Freund
Date:
Hi,

On 2020-07-10 19:01:49 -0400, Alvaro Herrera wrote:
> Totally unasked for, here's a rebase of this patch series.  I didn't do
> anything other than rebasing to current master, solving a couple of very
> trivial conflicts, fixing some whitespace complaints by git apply, and
> running tests to verify everything works.
> 
> I don't foresee working on this at all, so if anyone is interested in
> seeing this feature in, I encourage them to read and address
> Horiguchi-san's feedback.

Nor am I planning to do so, but I do think it's a pretty important
improvement.



> +/*
> + * PQrecyclePipelinedCommand
> + *    Push a command queue entry onto the freelist. It must be a dangling entry
> + *    with null next pointer and not referenced by any other entry's next pointer.
> + */

Dangling sounds a bit like it's already freed.



> +/*
> + * PQbatchSendQueue
> + *     End a batch submission by sending a protocol sync. The connection will
> + *     remain in batch mode and unavailable for new synchronous command execution
> + *     functions until all results from the batch are processed by the client.

I feel like the reference to the protocol sync is a bit too low level
for an external API. It should first document what the function does
from a user's POV.

I think it'd also be good to document whether or not queries can
already have been sent before PQbatchSendQueue is called.


> +/*
> + * PQbatchProcessQueue
> + *     In batch mode, start processing the next query in the queue.
> + *
> + * Returns 1 if the next query was popped from the queue and can
> + * be processed by PQconsumeInput, PQgetResult, etc.
> + *
> + * Returns 0 if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.
> + */
> +int
> +PQbatchProcessQueue(PGconn *conn)
> +{
> +    PGcommandQueueEntry *next_query;
> +
> +    if (!conn)
> +        return 0;
> +
> +    if (conn->batch_status == PQBATCH_MODE_OFF)
> +        return 0;
> +
> +    switch (conn->asyncStatus)
> +    {
> +        case PGASYNC_COPY_IN:
> +        case PGASYNC_COPY_OUT:
> +        case PGASYNC_COPY_BOTH:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, COPY in batch mode"));
> +            break;

Shouldn't there be a return 0 here?



> +    if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> +    {
> +        /*
> +         * In an aborted batch we don't get anything from the server for each
> +         * result; we're just discarding input until we get to the next sync
> +         * from the server. The client needs to know its queries got aborted
> +         * so we create a fake PGresult to return immediately from
> +         * PQgetResult.
> +         */
> +        conn->result = PQmakeEmptyPGresult(conn,
> +                                           PGRES_BATCH_ABORTED);
> +        if (!conn->result)
> +        {
> +            printfPQExpBuffer(&conn->errorMessage,
> +                              libpq_gettext("out of memory"));
> +            pqSaveErrorResult(conn);
> +            return 0;

Is there any way an application can recover at this point? ISTM we'd be
stuck in the previous asyncStatus, no?


> +/* pqBatchFlush
> + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> + * In non-batch mode, data will be flushed all the time.
> + */
> +static int
> +pqBatchFlush(PGconn *conn)
> +{
> +    if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_THRESHOLD))
> +        return(pqFlush(conn));
> +    return 0; /* Just to keep compiler quiet */
> +}

unnecessarily long line.


> +/*
> + * Connection's outbuffer threshold is set to 64k as it is safe
> + * in Windows as per comments in pqSendSome() API.
> + */
> +#define OUTBUFFER_THRESHOLD    65536

I don't think the comment explains much. It's fine to send more than 64k
with pqSendSome(); they'll just be sent with separate pgsecure_write()
invocations. And only on Windows.

It clearly makes sense to start sending out data at a certain
granularity to avoid needing unnecessary amounts of memory, and to make
more efficient use of latency / server-side compute.

It's not implausible that 64k is the right amount for that, I just don't
think the explanation above is good.

> diff --git a/src/test/modules/test_libpq/testlibpqbatch.c b/src/test/modules/test_libpq/testlibpqbatch.c
> new file mode 100644
> index 0000000000..4d6ba266e5
> --- /dev/null
> +++ b/src/test/modules/test_libpq/testlibpqbatch.c
> @@ -0,0 +1,1456 @@
> +/*
> + * src/test/modules/test_libpq/testlibpqbatch.c
> + *
> + *
> + * testlibpqbatch.c
> + *        Test of batch execution functionality
> + */
> +
> +#ifdef WIN32
> +#include <windows.h>
> +#endif

ISTM that this shouldn't be needed in a test program like this?
Shouldn't libpq abstract all of this away?


> +static void
> +simple_batch(PGconn *conn)
> +{

ISTM that all or at least several of these should include tests of
transactional behaviour with pipelining (e.g. using both implicit and
explicit transactions inside a single batch, using transactions across
batches, multiple explicit transactions inside a batch).



> +    PGresult   *res = NULL;
> +    const char *dummy_params[1] = {"1"};
> +    Oid            dummy_param_oids[1] = {INT4OID};
> +
> +    fprintf(stderr, "simple batch... ");
> +    fflush(stderr);

Why do we need fflush()? IMO that shouldn't be needed in a use like
this? Especially not on stderr, which ought to be unbuffered?


> +    /*
> +     * Enter batch mode and dispatch a set of operations, which we'll then
> +     * process the results of as they come in.
> +     *
> +     * For a simple case we should be able to do this without interim
> +     * processing of results since our out buffer will give us enough slush to
> +     * work with and we won't block on sending. So blocking mode is fine.
> +     */
> +    if (PQisnonblocking(conn))
> +    {
> +        fprintf(stderr, "Expected blocking connection mode\n");
> +        goto fail;
> +    }

Perhaps worth adding a helper for this?

#define EXPECT(condition, explanation) \
...
or such?
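One possible shape for such a helper (the macro body, the "fail" label convention, and demo_check() are assumptions for illustration, not taken from the patch):

```c
#include <stdio.h>
#include <assert.h>

/* Check a condition; on failure, report the explanation and jump to
 * the calling function's cleanup label ("fail" is assumed to exist). */
#define EXPECT(condition, explanation) \
	do { \
		if (!(condition)) \
		{ \
			fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, \
					(explanation)); \
			goto fail; \
		} \
	} while (0)

/* Example: how simple_batch()'s blocking-mode check could be written.
 * Returns 0 on success, 1 on failure. */
static int
demo_check(int conn_is_nonblocking)
{
	EXPECT(!conn_is_nonblocking, "Expected blocking connection mode");
	return 0;					/* success */
fail:
	return 1;					/* failure path, after any cleanup */
}
```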

Greetings,

Andres Freund