Thread: PATCH: Batch/pipelining support for libpq

PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
Hi all

Following on from the foreign table batch inserts thread[1], here's a patch to add support for pipelining queries into asynchronous batches in libpq. 

Attached, and also available at https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch (subject to rebasing and force pushes).

It's cleaned up over the draft I posted on that thread and has error recovery implemented. I've written and included the SGML docs for it. The test program is now pretty comprehensive, more so than for anything else in libpq anyway. I'll submit it to the next CF as a 9.7/10.0 candidate.

I'm measuring 300x (not %) performance improvements doing batches on servers over the Internet, so this seems pretty worthwhile. It turned out to be way less invasive than I expected too.

(I intentionally didn't add any way for clients to annotate each work-item in a batch with their own private data. I think that'd be really useful and would make implementing clients easier, but should be a separate patch).

This should be very useful for optimising FDWs, Postgres-XC, etc.



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Andres Freund
Date:
Hi,

On 2016-05-23 17:19:09 +0800, Craig Ringer wrote:
> Hi all
> Following on from the foreign table batch inserts thread[1], here's a patch
> to add support for pipelining queries into asynchronous batches in libpq.

Yay!


> I'm measuring 300x (not %) performance improvements doing batches on
> servers over the Internet, so this seems pretty worthwhile. It turned out
> to be way less invasive than I expected too.

yay^2.


> (I intentionally didn't add any way for clients to annotate each work-item
> in a batch with their own private data. I think that'd be really useful and
> would make implementing clients easier, but should be a separate patch).
> 
> This should be very useful for optimising FDWs, Postgres-XC, etc.

And optimizing normal clients.

Not easy, but I'd be very curious how much psql's performance improves
when replaying a .sql style dump, and replaying from a !tty fd, after
hacking it up to use the batch API.

- Andres



Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Mon, May 23, 2016 at 8:50 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-05-23 17:19:09 +0800, Craig Ringer wrote:
>> Following on from the foreign table batch inserts thread[1], here's a patch
>> to add support for pipelining queries into asynchronous batches in libpq.
>
> Yay!
>> I'm measuring 300x (not %) performance improvements doing batches on
>> servers over the Internet, so this seems pretty worthwhile. It turned out
>> to be way less invasive than I expected too.
>
> yay^2.

I'll follow this mood. Yeha.

>> (I intentionally didn't add any way for clients to annotate each work-item
>> in a batch with their own private data. I think that'd be really useful and
>> would make implementing clients easier, but should be a separate patch).
>>
>> This should be very useful for optimising FDWs, Postgres-XC, etc.
>
> And optimizing normal clients.
>
> Not easy, but I'd be very curious how much psql's performance improves
> when replaying a .sql style dump, and replaying from a !tty fd, after
> hacking it up to use the batch API.

Did you consider the use of simple_list.c instead of introducing a new
mimic as PGcommandQueueEntry? It would be cool avoiding adding new
list emulations on frontends.
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 24 May 2016 at 00:00, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, May 23, 2016 at 8:50 AM, Andres Freund <andres@anarazel.de> wrote: 
>> This should be very useful for optimising FDWs, Postgres-XC, etc.
>
> And optimizing normal clients.
>
> Not easy, but I'd be very curious how much psql's performance improves
> when replaying a .sql style dump, and replaying from a !tty fd, after
> hacking it up to use the batch API.

I didn't, but agree it'd be interesting. So would pg_restore, for that matter, though its use of COPY for the bulk of its work means it wouldn't make tons of difference.

I think it'd be safe to enable it automatically in psql's --single-transaction mode. It's also safe to send anything after an explicit BEGIN and until the next COMMIT as a batch from libpq, and since psql already parses the SQL enough to detect statement boundaries, that shouldn't be too hard to handle.

However: psql is synchronous, using the PQexec family of blocking calls. It's all fairly well contained in SendQuery and PSQLexec, but making it use batching would still require restructuring those to use the asynchronous nonblocking API and append each query to a pending-list, plus adding a select() loop to handle results and dispatch more work. MainLoop() isn't structured around a select or poll; it loops over gets. So while it'd be interesting to see what difference batching makes, the changes to make psql use it would be a bit more invasive. Far from a rewrite, but to avoid lots of code duplication it'd have to change everything to use nonblocking mode and a select loop, which is a big change for such a core tool.

This is a bit of a side-project and I've got to get back to "real work" so I don't expect to do a proper patch for psql any time soon. I'd rather not try to build too much on this until it's seen some review and I know the API won't need a drastic rewrite anyway. I'll see if I can do a hacked-up version one evening to see what it does for performance though.

Did you consider the use of simple_list.c instead of introducing a new
mimic as PGcommandQueueEntry? It would be cool avoiding adding new
list emulations on frontends.

Nope. I didn't realise it was there; I've done very little on the C client and library side so far. So thanks, I'll update it accordingly.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:

On 24 May 2016 at 00:00, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, May 23, 2016 at 8:50 AM, Andres Freund <andres@anarazel.de> wrote:
 
> yay^2.

I'll follow this mood. Yeha.

 
BTW, I've published the HTML-ified SGML docs to http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 24 May 2016 at 00:00, Michael Paquier <michael.paquier@gmail.com> wrote:
 

Did you consider the use of simple_list.c instead of introducing a new
mimic as PGcommandQueueEntry? It would be cool avoiding adding new
list emulations on frontends.

I'd have to extend simple_list to add a generic object version, like


struct my_list_elem
{
    PG_SIMPLE_LIST_ATTRS;
    mytype mycol;
    myothertype myothercol;
};

Objections?

I could add a void* version that's a simple clone of the string version, but having to malloc both a list cell and its contents separately is annoying.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PATCH: Batch/pipelining support for libpq

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 24 May 2016 at 00:00, Michael Paquier <michael.paquier@gmail.com> wrote:
>> Did you consider the use of simple_list.c instead of introducing a new
>> mimic as PGcommandQueueEntry? It would be cool avoiding adding new
>> list emulations on frontends.

> I'd have to extend simple_list to add a generic object version, like

> struct my_list_elem
> {
>     PG_SIMPLE_LIST_ATTRS;
>     mytype mycol;
>     myothertype myothercol;
> }

> Objections?

That doesn't look exactly "generic".

> I could add a void* version that's a simple clone of the string version,
> but having to malloc both a list cell and its contents separately is
> annoying.

I'd be okay with a void* version, but I'm not sure about this.
        regards, tom lane



Re: PATCH: Batch/pipelining support for libpq

From
Jim Nasby
Date:
On 5/23/16 4:19 AM, Craig Ringer wrote:
> +    Batching less useful when information from one operation is required by the

Should be "Batching is less useful".
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: PATCH: Batch/pipelining support for libpq

From
"Tsunakawa, Takayuki"
Date:

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Craig Ringer
> I'll follow this mood. Yeha.
>
> BTW, I've published the HTML-ified SGML docs to http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.

Sorry for my late reply.  Fantastic performance improvement, nice documentation, and amazingly rapid development!  I think I’ll join the review & testing in the 2016/9 CF.

Regards
Takayuki Tsunakawa

Re: PATCH: Batch/pipelining support for libpq

From
Dmitry Igrishin
Date:
2016-05-24 5:01 GMT+03:00 Craig Ringer <craig@2ndquadrant.com>:
>
> On 24 May 2016 at 00:00, Michael Paquier <michael.paquier@gmail.com> wrote:
>>
>> On Mon, May 23, 2016 at 8:50 AM, Andres Freund <andres@anarazel.de> wrote:
>
>
>>
>> > yay^2.
>>
>> I'll follow this mood. Yeha.
>
>
>
> BTW, I've published the HTML-ified SGML docs to
> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.
Typo detected: "Returns 1 if the batch curently being received" -- "curently".

-- 
// Dmitry.



Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Fri, Jun 3, 2016 at 8:51 PM, Dmitry Igrishin <dmitigr@gmail.com> wrote:
>> BTW, I've published the HTML-ified SGML docs to
>> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.
> Typo detected: "Returns 1 if the batch curently being received" -- "curently".

I am looking a bit more seriously at this patch and assigned myself as
a reviewer.

testlibpqbatch.c:1239:73: warning: format specifies type 'long' but
the argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
        printf("batch insert elapsed:      %ld.%06lds\n",
               elapsed_time.tv_sec, elapsed_time.tv_usec);
macOS complains here. You may want to replace %06lds by just %06d.

(lldb) bt
* thread #1: tid = 0x0000, 0x0000000108bcdd56 libpq.5.dylib`closePGconn(conn=0x00007fd642d002d0) + 438 at fe-connect.c:3010, stop reason = signal SIGSTOP
  * frame #0: 0x0000000108bcdd56 libpq.5.dylib`closePGconn(conn=0x00007fd642d002d0) + 438 at fe-connect.c:3010
    frame #1: 0x0000000108bc9db0 libpq.5.dylib`PQfinish(conn=0x00007fd642d002d0) + 32 at fe-connect.c:3072
    frame #2: 0x0000000108bc9ede libpq.5.dylib`PQping(conninfo="dbname=postgres port=5432 host='/tmp' connect_timeout=5") + 46 at fe-connect.c:539
    frame #3: 0x0000000108bb5210 pg_ctl`test_postmaster_connection(pm_pid=78218, do_checkpoint='\0') + 976 at pg_ctl.c:681
    frame #4: 0x0000000108bb388e pg_ctl`do_start + 302 at pg_ctl.c:915
    frame #5: 0x0000000108bb29b4 pg_ctl`main(argc=5, argv=0x00007fff5704e5c0) + 2836 at pg_ctl.c:2416
    frame #6: 0x00007fff8b8b65ad libdyld.dylib`start + 1
(lldb) down 1
frame #0: 0x0000000108bcdd56 libpq.5.dylib`closePGconn(conn=0x00007fd642d002d0) + 438 at fe-connect.c:3010
   3007        queue = conn->cmd_queue_recycle;
   3008        {
   3009            PGcommandQueueEntry *prev = queue;
-> 3010            queue = queue->next;
   3011            free(prev);
   3012        }
   3013        conn->cmd_queue_recycle = NULL;
 
This patch generates a core dump, use for example pg_ctl start -w and
you'll bump into the trace above. There is something wrong with the
queue handling.

Do you have plans for a more generic structure for the command queue list?

+   <application>libpq</application> supports queueing up mulitiple queries into
s/mulitiple/multiple/.

+  <para>
+   An example of batch use may be found in the source distribution in
+   <filename>src/test/examples/libpqbatch.c</filename>.
+  </para>
You mean testlibpqbatch.c here.

+   <para>
+    Batching less useful when information from one operation is required by the
+    client before it knows enough to send the next operation. The client must
"Batching *is* less useful".

src/test/examples/.gitignore needs a new entry for the new test binary.

+           fprintf(stderr, "internal error, COPY in batch mode");
+           abort();
I don't think that's a good idea. defaultNoticeProcessor can be
overridden to allow applications to have error messages sent
elsewhere. Error messages should also use libpq_gettext, and perhaps
be stored in conn->errorMessage as we do so for OOMs happening on
client-side and reporting them back even if they are not expected
(those are blocked by PQsendQueryStart in your patch).

src/test/examples is a good idea to show people what this new API can
do, but this is never getting compiled. It could as well be possible
to include tests in src/test/modules/, in the same shape as what
postgres_fdw is doing by connecting to itself and link it to libpq. As
this patch complicates fe-exec.c quite a lot, I think that this would
be worth it. Thoughts?
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 10 August 2016 at 14:44, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jun 3, 2016 at 8:51 PM, Dmitry Igrishin <dmitigr@gmail.com> wrote:
>> BTW, I've published the HTML-ified SGML docs to
>> http://2ndquadrant.github.io/postgres/libpq-batch-mode.html as a preview.
> Typo detected: "Returns 1 if the batch curently being received" -- "curently".

I am looking a bit more seriously at this patch and assigned myself as
a reviewer.

Much appreciated.
 
testlibpqbatch.c:1239:73: warning: format specifies type 'long' but
the argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
        printf("batch insert elapsed:      %ld.%06lds\n",
elapsed_time.tv_sec, elapsed_time.tv_usec);
macos complains here. You may want to replace %06lds by just %06d.

Yeah, or cast to a type known to be big enough. Will amend.
  
This patch generates a core dump, use for example pg_ctl start -w and
you'll bump into the trace above. There is something wrong with the
queue handling.

Huh. I didn't see that here (Fedora 23). I'll look more closely.
 
Do you have plans for a more generic structure for the command queue list?

No plans, no. This was a weekend experiment that turned into a useful patch and I'm having to scrape up time for it amongst much more important things like logical failover / sequence decoding and various other replication work.
 
Thanks for the docs review too, will amend.
 
+           fprintf(stderr, "internal error, COPY in batch mode");
+           abort();
I don't think that's a good idea. defaultNoticeProcessor can be
overridden to allow applications to have error messages sent
elsewhere. Error messages should also use libpq_gettext, and perhaps
be stored in conn->errorMessage as we do so for OOMs happening on
client-side and reporting them back even if they are not expected
(those are blocked by PQsendQueryStart in your patch)

src/test/examples is a good idea to show people what this new API can
do, but this is never getting compiled. It could as well be possible
to include tests in src/test/modules/, in the same shape as what
postgres_fdw is doing by connecting to itself and link it to libpq. As
this patch complicates fe-exec.c quite a lot, I think that this would
be worth it. Thoughts?

I didn't think it added much complexity to fe-exec.c personally. A lot of the appeal is that it has very minor impact on anything that isn't using it.

I think it makes sense to (ab)use the recovery module tests for this, invoking the test program from there.

Ideally I'd like to teach psql and pg_restore how to use async mode, but that's a whole separate patch.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 23 August 2016 at 08:27, Craig Ringer <craig@2ndquadrant.com> wrote:
On 10 August 2016 at 14:44, Michael Paquier <michael.paquier@gmail.com> wrote:
 
I am looking a bit more seriously at this patch and assigned myself as
a reviewer.

Much appreciated.
 
 
macos complains here. You may want to replace %06lds by just %06d.

Yeah, or cast to a type known to be big enough. Will amend.

I used an upcast to (long), because on Linux it's a long. I don't see the point of messing about adding a configure test for something this trivial.
 
This patch generates a core dump, use for example pg_ctl start -w and
you'll bump into the trace above. There is something wrong with the
queue handling.

Huh. I didn't see that here (Fedora 23). I'll look more closely.
 
Do you have plans for a more generic structure for the command queue list?

No plans, no. This was a weekend experiment that turned into a useful patch and I'm having to scrape up time for it amongst much more important things like logical failover / sequence decoding and various other replication work.
 
Thanks for the docs review too, will amend.
 
+           fprintf(stderr, "internal error, COPY in batch mode");
+           abort();
I don't think that's a good idea.

My thinking there was that it's a "shouldn't happen" case. It's a problem in library logic. I'd use an Assert() here in the backend.

I could printfPQExpBuffer(...) an error and return failure instead if you think that's more appropriate. I'm not sure how the app would handle it correctly, but OTOH it's generally better for libraries not to call abort(). So I'll do that, but since it's an internal error that's not meant to happen I'll skip the gettext calls.
 
Error messages should also use libpq_gettext, and perhaps
be stored in conn->errorMessage as we do so for OOMs happening on
client-side and reporting them back even if they are not expected
(those are blocked PQsendQueryStart in your patch).

I didn't get that last part, re PQsendQueryStart.
 
src/test/examples is a good idea to show people what this new API can
do, but this is never getting compiled. It could as well be possible
to include tests in src/test/modules/, in the same shape as what
postgres_fdw is doing by connecting to itself and link it to libpq. As
this patch complicates fe-exec.c quite a lot, I think that this would
be worth it. Thoughts?

I think it makes sense to use the TAP framework. Added src/test/modules/test_libpq/ with a test for async mode. Others can be added/migrated based on that. I thought it made more sense for the tests to live there than in src/interfaces/libpq/ since they need test client programs and shouldn't pollute the library directory.

I've made the docs changes too. Thanks.

I fixed the list handling error. I'm amazed it appears to run fine, and without complaint from valgrind, here, since it was an accidentally _deleted_ line.

Re lists, I looked at simple_list.c and it's exceedingly primitive. Using it would mean more malloc()ing since we'll have a list cell and then a struct pointed to by it, and won't recycle members, but... whatever. It's not going to matter a great deal. The reason I did it with an embedded list originally was because that's how it's done for PGnotify, but that's not exactly new code.

The bigger problem is that simple_list also uses pg_malloc, which won't set conn->errorMessage, it'll just fprintf() and exit(1). I'm not convinced it's appropriate to use that for libpq.

For now I've left list handling unchanged. If it's to move to a generic list, it should probably be one that knows how to use pg_malloc_extended(size, MCXT_ALLOC_NO_OOM) and emit its own libpq-error-handling-aware error. I'm not sure whether that list should use cell heads embedded in the structures it manages or pointing to them, either.

Updated patch attached.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachment

Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Craig Ringer wrote:

> Updated patch attached.

Please find attached a couple fixes for typos I've come across in
the doc part.

Also it appears that PQqueriesInBatch() doesn't work as documented.
It says:
 "Returns the number of queries still in the queue for this batch"
but in fact it's implemented as a boolean:

+/* PQqueriesInBatch
+ *   Return true if there are queries currently pending in batch mode
+ */
+int
+PQqueriesInBatch(PGconn *conn)
+{
+    if (!PQisInBatchMode(conn))
+        return false;
+
+    return conn->cmd_queue_head != NULL;
+}

However, is this function really needed? It doesn't seem essential to
the API. You don't call it in the test program either.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 6 September 2016 at 16:10, Daniel Verite <daniel@manitou-mail.org> wrote:
>         Craig Ringer wrote:
>
>> Updated patch attached.
>
> Please find attached a couple fixes for typos I've come across in
> the doc part.

Thanks, will apply and post a rebased patch soon, or if someone picks
this up in the mean time they can apply your diff on top of the patch.

> Also it appears that PQqueriesInBatch() doesn't work as documented.
> It says:
>  "Returns the number of queries still in the queue for this batch"
> but in fact it's implemented as a boolean:

Whoops. Will fix.

I think the function is useful and necessary. There's no reason not to
expose that, but also it's good for when your query dispatch isn't as
tightly coupled to your query handling as in the example, so your app
might need to keep processing until it sees the end of queued results.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Tue, Sep 6, 2016 at 8:01 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 6 September 2016 at 16:10, Daniel Verite <daniel@manitou-mail.org> wrote:
>>         Craig Ringer wrote:
>>
>>> Updated patch attached.
>>
>> Please find attached a couple fixes for typos I've come across in
>> the doc part.
>
> Thanks, will apply and post a rebased patch soon, or if someone picks
> this up in the mean time they can apply your diff on top of the patch.

Could you send an updated patch then? At the same time I am noticing
that git diff --check is complaining... This patch has tests and a
well-documented feature, so I'll take a look at it soon at the top of
my list. Moved to next CF for now.
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 3 October 2016 at 10:10, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 8:01 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 6 September 2016 at 16:10, Daniel Verite <daniel@manitou-mail.org> wrote:
>>>         Craig Ringer wrote:
>>>
>>>> Updated patch attached.
>>>
>>> Please find attached a couple fixes for typos I've come across in
>>> the doc part.
>>
>> Thanks, will apply and post a rebased patch soon, or if someone picks
>> this up in the mean time they can apply your diff on top of the patch.
>
> Could you send an updated patch then? At the same time I am noticing
> that git diff --check is complaining... This patch has tests and a
> well-documented feature, so I'll take a look at it soon at the top of
> my list. Moved to next CF for now.

Thanks.

I'd really like to teach psql in non-interactive mode to use it, but
(a) I'm concerned about possible subtle behaviour differences arising
if we do that and (b) I won't have the time. I think it's mostly of
interest to app authors and driver developers and that's what it's
aimed at. pg_restore could benefit a lot too.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Craig Ringer wrote:

> I think it's mostly of interest to app authors and driver developers
> and that's what it's aimed at. pg_restore could benefit a lot too.

Wouldn't pgbench benefit from it?
It was mentioned some time ago [1], in relationship to the
\into construct, how client-server latency was important enough to
justify the use of a "\;" separator between statements, to send them
as a group.

But with the libpq batch API, maybe this could be modernized
with meta-commands like this: \startbatch ... \endbatch
which would essentially call PQbeginBatchMode() and PQsendEndBatch().
Inside the batch section, collecting results would be interleaved
with sending queries.
Interdependencies between results and subsequent queries could
be handled or ignored, depending on how sophisticated we'd want
this.

This might also draw more users to the batch API, because
it would make it easier to check how exactly it affects
the performance of specific sequences of SQL statements to be
grouped in a batch.
For instance it would make sense for programmers to benchmark mock-ups
of their code with pgbench with/without batching, before embarking on
adapting it from blocking mode to asynchronous/non-blocking mode.


[1]
https://www.postgresql.org/message-id/alpine.DEB.2.20.1607140925400.1962%40sto

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
> Wouldn't pgbench benefit from it?
> It was mentioned some time ago [1], in relationship to the
> \into construct, how client-server latency was important enough to
> justify the use of a "\;" separator between statements, to send them
> as a group.
>
> But with the libpq batch API, maybe this could be modernized
> with meta-commands like this:
>   \startbatch
>   ...
>   \endbatch

Or just \batch [on|off], which sounds like a damn good idea to me for
some users willing to test some workloads before integrating it in an
application.
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 4 Oct. 2016 15:15, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>
> On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
> > Wouldn't pgbench benefit from it?
> > It was mentioned some time ago [1], in relationship to the
> > \into construct, how client-server latency was important enough to
> > justify the use of a "\;" separator between statements, to send them
> > as a group.
> >
> > But with the libpq batch API, maybe this could be modernized
> > with meta-commands like this:
> >   \startbatch
> >   ...
> >   \endbatch
>
> Or just \batch [on|off], which sounds like a damn good idea to me for
> some users willing to test some workloads before integrating it in an
> application.

A batch isn't necessarily terminated by a commit, so I'm more keen on start/end batch. It's more in line with begin/commit. Batch is not only a mode, you also have to delineate batches.

Re: PATCH: Batch/pipelining support for libpq

From
Gavin Flower
Date:
On 04/10/16 20:15, Michael Paquier wrote:
> On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
>> Wouldn't pgbench benefit from it?
>> It was mentioned some time ago [1], in relationship to the
>> \into construct, how client-server latency was important enough to
>> justify the use of a "\;" separator between statements, to send them
>> as a group.
>>
>> But with the libpq batch API, maybe this could be modernized
>> with meta-commands like this:
>>    \startbatch
>>    ...
>>    \endbatch
> Or just \batch [on|off], which sounds like a damn good idea to me for
> some users willing to test some workloads before integrating it in an
> application.

+1

'\batch' is a bit easier to find, and to remember, than '\startbatch'




Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Mon, Oct 3, 2016 at 12:48 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 3 October 2016 at 10:10, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Tue, Sep 6, 2016 at 8:01 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> On 6 September 2016 at 16:10, Daniel Verite <daniel@manitou-mail.org> wrote:
>>>>         Craig Ringer wrote:
>>>>
>>>>> Updated patch attached.
>>>>
>>>> Please find attached a couple fixes for typos I've come across in
>>>> the doc part.
>>>
>>> Thanks, will apply and post a rebased patch soon, or if someone picks
>>> this up in the mean time they can apply your diff on top of the patch.
>>
>> Could you send an updated patch then? At the same time I am noticing
>> that git diff --check is complaining... This patch has tests and a
>> well-documented feature, so I'll take a look at it soon at the top of
>> my list. Moved to next CF for now.
>
> Thanks.
>
> I'd really like to teach psql in non-interactive mode to use it, but
> (a) I'm concerned about possible subtle behaviour differences arising
> if we do that and (b) I won't have the time. I think it's mostly of
> interest to app authors and driver developers and that's what it's
> aimed at. pg_restore could benefit a lot too.

Looking at it now... The most interesting comments are first.

I wanted to congratulate you. I rarely see a patch with this level of
detail in its first versions. Anybody can review the patch
just by looking at the code and especially the docs without looking at
the thread. There are even tests to show what this does, for the
client.

+PQisInBatchMode           172
+PQqueriesInBatch          173
+PQbeginBatchMode          174
+PQendBatchMode            175
+PQsendEndBatch            176
+PQgetNextQuery            177
+PQbatchIsAborted          178
This set of routines is a bit inconsistent. Why not just prefix
them with PQbatch? For example:
PQbatchStatus(): consists of disabled/inactive/none, active, error.
This covers both PQbatchIsAborted() and PQisInBatchMode().
PQbatchBegin()
PQbatchEnd()
PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to
add and process a sync message into the queue.
PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,
-1 on failure
PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on
failure (OOM)

+   <para>
+    Much like asynchronous query mode, there is no performance disadvantage to
+    using batching and pipelining. It somewhat increased client application
+    complexity and extra caution is required to prevent client/server network
+    deadlocks, but can offer considerable performance improvements.
+   </para>
I would reword that a bit "it increases client application complexity
and extra caution is required to prevent client/server deadlocks but
offers considerable performance improvements".

+    ("ping time") is high, and when many small operations are being performed in
Nit: should use <quote> here. Still, not quoting it would be fine.

+     After entering batch mode the application dispatches requests
+     using normal asynchronous <application>libpq</application> functions like
+     <function>PQsendQueryParams</function>,
<function>PQsendPrepare</function>,
+     etc. The asynchronous requests are followed by a <link
It may be better to list all the functions here: PQsendQuery, etc.
 *   query work remains or an error has occurred (e.g. out of *   memory). */
-PGresult *PQgetResult(PGconn *conn)
Some noise in the patch.

git diff --check complains:
              usage_exit(argv[0]);
warning: 1 line adds whitespace errors.

+++ b/src/interfaces/libpq/t/001_libpq_async.pl
@@ -0,0 +1,15 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+
+my $port = $node->port;
+
+$node->stop('fast');
Er... this does nothing.

The changes in src/test/examples/ are not necessary anymore. You moved
all the tests to test_libpq (for the best actually).

+   while (queue != NULL)
+   {
+       PGcommandQueueEntry *prev = queue;
+       queue = queue->next;
+       free(prev);
+   }
This should free prev->query.

/* blah comment
* blah2 comment
*/
A lot of comment blocks are like that; those should be reformatted.

Running directly make check in src/test/modules/test_libpq/ does not work:
# Postmaster PID for node "main" is 10225
# Running: testlibpqbatch dbname=postgres 10000 disallowed_in_batch
Command 'testlibpqbatch' not found in [...PATH list ...]
The problem is that testlibpqbatch is not getting compiled but I think
it should.

+ * testlibpqbatch.c
+ *     Test of batch execution funtionality
[...]
+               fprintf(stderr, "%s is not a recognised test name\n", argv[3]);
s/funtionality/functionality/
s/recognised/recognized/

+   if (!PQsendQueryParams(conn, "SELECT $1", 1, dummy_param_oids,
+                          dummy_params, NULL, NULL, 0))
+   {
+       fprintf(stderr, "dispatching first SELECT failed: %s\n",
PQerrorMessage(conn));
+       goto fail;
+   }
+
+   if (!PQsendEndBatch(conn))
+   {
+       fprintf(stderr, "Ending first batch failed: %s\n",
PQerrorMessage(conn));
+       goto fail;
+   }
+
+   if (!PQsendQueryParams(conn, "SELECT $1", 1, dummy_param_oids,
+                          dummy_params, NULL, NULL, 0))
+   {
+       fprintf(stderr, "dispatching second SELECT failed: %s\n",
PQerrorMessage(conn));
+       goto fail;
+   }
It may be better to use a loop here to queue a batch of queries.
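Something along these lines, for instance — a sketch only, not compiled, reusing the patch's own test variables (dummy_param_oids, dummy_params) and its proposed PQsendEndBatch():

```c
/* Sketch: dispatch several queries in a loop instead of repeating
 * each PQsendQueryParams() block by hand. */
for (i = 0; i < nqueries; i++)
{
    if (!PQsendQueryParams(conn, "SELECT $1", 1, dummy_param_oids,
                           dummy_params, NULL, NULL, 0))
    {
        fprintf(stderr, "dispatching SELECT %d failed: %s\n",
                i, PQerrorMessage(conn));
        goto fail;
    }
}

if (!PQsendEndBatch(conn))
{
    fprintf(stderr, "ending batch failed: %s\n", PQerrorMessage(conn));
    goto fail;
}
```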

You could just remove the VERBOSE flag in the tests; having a more
talkative test is always better.

+    <para>
+     The batch API was introduced in PostgreSQL 9.6, but clients using it can
+     use batches on server versions 8.4 and newer. Batching works on any server
+     that supports the v3 extended query protocol.
+    </para>
Postgres 10, not 9.6.

It may be a good idea to check for PG_PROTOCOL_MAJOR < 3 and issue an
error for the new routines.

+int
+PQgetNextQuery(PGconn *conn)
[...]
+       return false;
The new routines should explicitly return 0 or 1, not a boolean, for
consistency with the rest.

+      for processing. Returns 0 and has no effect if there are no query results
+      pending, batch mode is not enabled, or if the query currently processed
+      is incomplete or still has pending results. See <link
It would be useful for the user to make a difference between those
different statuses. As of your patch, PQgetNextQuery() returns false/0
in all those cases so it is a bit hard to know what's going on. Maybe
PQgetNextQuery() could be renamed to PQbatchGetNext, PQbatchQueueNext
to outline the fact that it is beginning the next query in the queue.
This helps also to understand that this can just be used with the
batch mode. Actually no, I am wrong. It is possible to guess each one
of those errors with respectively PQgetResult == NULL,
PQisInBatchMode() and PQqueriesInBatch(). Definitely it should be
mentioned in the docs that it is possible to make a difference between
all those states.

+   entry = PQmakePipelinedCommand(conn);
+   entry->queryclass = PGQUERY_SYNC;
+   entry->query = NULL;
PQmakePipelinedCommand() returns NULL, and boom.

+   bool        in_batch;       /* connection is in batch (pipelined) mode */
+   bool        batch_aborted;  /* current batch is aborted,
discarding until next Sync */
Having only one flag would be fine: batch_aborted is set or used only
when in_batch is set, so the two are strongly linked.
   /* OK, it's launched! */
-   conn->asyncStatus = PGASYNC_BUSY;
+   if (conn->in_batch)
+       PQappendPipelinedCommand(conn, pipeCmd);
+   else
+       conn->asyncStatus = PGASYNC_BUSY;
No, in batch mode it is just put in the queue, not launched.

+      <para>
+      Returns the number of queries still in the queue for this batch, not
+      including any query that's currently having results being processsed.
+      This is the number of times <function>PQgetNextQuery</function> has to be
+      called before the query queue is empty again.
+
+<synopsis>
+int PQqueriesInBatch(PGconn *conn);
+</synopsis>
This is not true. It does not return a count, just 0 or 1.

It may be a good idea to add a test for COPY and trigger a failure.

If I read the code correctly, it seems to me that it is possible to
enable batch mode and then use PQexec(); PQsendPrepare will just
happily queue the command. Shouldn't PQexec() be prevented in batch
mode? Similar remark for PQexecParams(), PQexecPrepared(),
PQdescribePrepared() and PQprepare(). In short, everything calling
PQexecFinish().

I haven't run the tests directly on Windows with MSVC, but they won't
run, as vcregress modulescheck lacks knowledge about modules using TAP.
I cannot blame you for that.
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Shay Rojansky
Date:
Hi all. I thought I'd share some experience from Npgsql regarding batching/pipelining - hope this isn't off-topic.

Npgsql has supported batching for quite a while, similar to what this patch proposes - with a single Sync message sent at the end.

It has recently come to my attention that this implementation is problematic because it forces the batch to occur within a transaction; in other words, there's no option for a non-transactional batch. This can be a problem for several reasons: users may want to send off a batch of inserts, not caring whether one of them fails (e.g. because of a unique constraint violation). In other words, in some scenarios it may be appropriate for later batched statements to be executed when an earlier batched statement raised an error. If Sync is only sent at the very end, this isn't possible. Another example of a problem (which actually happened) is that transactions acquire row-level locks, and so may trigger deadlocks if two different batches update the same rows in reverse order. Both of these issues wouldn't occur if the batch weren't implicitly transactional.

My current plan is to modify the batch implementation based on whether we're in an (explicit) transaction or not. If we're in a transaction, then it makes perfect sense to send a single Sync at the end as is being proposed here - any failure would cause the transaction to fail anyway, so skipping all subsequent statements until the batch's end makes sense. However, if we're not in an explicit transaction, I plan to insert a Sync message after each individual Execute, making non-transactional batched statements more or less identical in behavior to non-transactional unbatched statements. Note that this means that a batch can generate multiple errors, not just one.

I'm sharing this since it may be relevant to the libpq batching implementation as well, and also to get any feedback regarding how Npgsql should act.

Re: PATCH: Batch/pipelining support for libpq

From
Jim Nasby
Date:
On 10/4/16 11:54 PM, Michael Paquier wrote:
> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It somewhat increased client application
> +    complexity and extra caution is required to prevent client/server network
> +    deadlocks, but can offer considerable performance improvements.
> +   </para>
> I would reword that a bit "it increases client application complexity
> and extra caution is required to prevent client/server deadlocks but
> offers considerable performance improvements".

Unrelated, but another doc bug, on line 4647:
+     The batch API was introduced in PostgreSQL 9.6, but clients using it can

That should read 10.0 (or just 10?)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 12 October 2016 at 19:51, Shay Rojansky <roji@roji.org> wrote:
> Hi all. I thought I'd share some experience from Npgsql regarding
> batching/pipelining - hope this isn't off-topic.

Not at all.

> Npgsql has supported batching for quite a while, similar to what this patch
> proposes - with a single Sync message sent at the end.

PgJDBC does too. The benefits of it there are what prompted me to do
this, not only so libpq users can use it directly but so psqlODBC,
psycopg2, etc can benefit from it if they choose to expose it.

> It has recently come to my attention that this implementation is problematic
> because it forces the batch to occur within a transaction; in other words,
> there's no option for a non-transactional batch.

That's not strictly the case. If you explicitly BEGIN and COMMIT,
those operations are honoured within the batch.

What's missing is implicit transactions. Usually if you send a series
of messages they will each get their own implicit transaction. If you
batch them, the whole lot gets an implicit transaction. This is
because the PostgreSQL v3 protocol conflates transaction delineation
with protocol error recovery into a single Sync message type.

It's very similar to the behaviour of multi-statements, where:

psql -c "CREATE TABLE fred(x integer); DROP TABLE notexists;"

doesn't create "fred", but

psql -c "BEGIN; CREATE TABLE fred(x integer); COMMIT; BEGIN; DROP
TABLE notexists; COMMIT;"

... does. And in fact it's for almost the same reason. They're sent as
a single SimpleQuery message by psql and split up client side, but the
effect is the same as two separate Query messages followed by a Sync.

It isn't simple to manage this client-side, because libpq doesn't know
whether a given command string may contain transaction delimiting
statements or not and can't reliably look for them without client-side
parsing of the SQL. So it can't dispatch its own BEGIN/COMMIT around
statements in a batch that it thinks might be intended to be
autocommit, and anyway that'd result in sending 3 queries for every 1
client query, which would suck.

If the mythical v4 protocol ever happens I'd want to split Sync into
two messages, one which is a protocol synchronisation message and
another that is a transaction delimiter. Or give it flags or whatever.

In the mean time:

> This can be a problem for
> several reasons: users may want to send off a batch of inserts, not caring
> whether one of them fails (e.g. because of a unique constraint violation).
> In other words, in some scenarios it may be appropriate for later batched
> statements to be executed when an earlier batched statement raised an error.
> If Sync is only sent at the very end, this isn't possible.

Right, and that remains the case even with explicit transaction
delineation, because the first ERROR causes processing of all
subsequent messages to be skipped.

The design I have in libpq allows for this by allowing the client to
delimit batches without ending batch mode, concurrently consuming a
stream of multiple batches. Each endbatch is a Sync. So a client that
wants autocommit-like behaviour can send a series of 1-query batches.

I think I'll need to document this a bit better since it's more subtle
than I properly explained.
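A rough sketch of that autocommit-like pattern, using the patch's proposed function names (not a shipped libpq API, and error handling omitted): each query gets its own batch, so each end-of-batch emits a Sync and delimits an implicit transaction, and later queries still run even if an earlier one fails.

```c
/* Sketch: autocommit-like behaviour via a series of 1-query batches.
 * PQbeginBatchMode/PQsendEndBatch/PQgetNextQuery are the patch's
 * proposed functions. */
PQbeginBatchMode(conn);

for (i = 0; i < nqueries; i++)
{
    PQsendQueryParams(conn, queries[i], 0, NULL, NULL, NULL, NULL, 0);
    PQsendEndBatch(conn);        /* one Sync per query: one implicit xact */
}

/* ... then consume each query's results in turn with
 * PQgetNextQuery() and PQgetResult(), clearing each PGresult ... */
```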

> Another example
> of a problem (which actually happened) is that transactions acquire
> row-level locks, and so may trigger deadlocks if two different batches
> update the same rows in reverse order. Both of these issues wouldn't occur
> if the batch weren't implicitly batched.

Same solution as above.

> My current plan is to modify the batch implementation based on whether we're
> in an (explicit) transaction or not. If we're in a transaction, then it
> makes perfect sense to send a single Sync at the end as is being proposed
> here - any failure would cause the transaction to fail anyway, so skipping
> all subsequent statements until the batch's end makes sense. However, if
> we're not in an explicit transaction, I plan to insert a Sync message after
> each individual Execute, making non-transactional batched statements more or
> less identical in behavior to non-transactional unbatched statements. Note
> that this mean that a batch can generate multiple errors, not just one.

Yes, that's what I suggest, and basically what the libpq batch
interface does, though it expects the client to deal with the
transaction boundaries.

You will need to think hard about transaction boundaries as they
relate to multi-statements unless Npgsql parses out each statement
from multi-statement strings like PgJDBC does. Otherwise a user can
sneak in:

somestatement; BEGIN; someotherstatement;

or

somestatement; CoMMiT; otherstatement;

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Shay Rojansky
Date:
> It has recently come to my attention that this implementation is problematic
> because it forces the batch to occur within a transaction; in other words,
> there's no option for a non-transactional batch.

That's not strictly the case. If you explicitly BEGIN and COMMIT,
those operations are honoured within the batch.

I wasn't precise in my formulation (although I think we understand each other)... The problem I'm trying to address here is the fact that in the "usual" batching implementation (i.e. where a single Sync message is sent at the end of the batch), there's no support for batches which have no transactions whatsoever (i.e. where each statement is auto-committed and errors in earlier statements don't trigger skipping of later statements).

What's missing is implicit transactions. Usually if you send a series
of messages they will each get their own implicit transaction. If you
batch them, the whole lot gets an implicit transaction. This is
because the PostgreSQL v3 protocol conflates transaction delineation
with protocol error recovery into a single Sync message type.

That's right. I sent a message complaining about this conflation a while ago: https://www.postgresql.org/message-id/CADT4RqDdo9EcFbxwB_YO2H3BVZ0t-1qqZ%3D%2B%2BdVMnYaN6BpyUGQ%40mail.gmail.com. There weren't any responses, although I'll add a note on the wiki on this as a requested feature for the v4 protocol.

If the mythical v4 protocol ever happens I'd want to split Sync into
two messages, one which is a protocol synchronisation message and
another that is a transaction delimiter. Or give it flags or whatever.

Totally agree.
 
In the mean time:

> This can be a problem for
> several reasons: users may want to send off a batch of inserts, not caring
> whether one of them fails (e.g. because of a unique constraint violation).
> In other words, in some scenarios it may be appropriate for later batched
> statements to be executed when an earlier batched statement raised an error.
> If Sync is only sent at the very end, this isn't possible.

Right, and that remains the case even with explicit transaction
delineation, because the first ERROR causes processing of all
subsequent messages to be skipped.

The design I have in libpq allows for this by allowing the client to
delimit batches without ending batch mode, concurrently consuming a
stream of multiple batches. Each endbatch is a Sync. So a client that
wants autocommit-like behaviour can send a series of 1-query batches. 

I think I'll need to document this a bit better since it's more subtle
than I properly explained.

Ah, I see. libpq's API is considerably more low-level than what Npgsql needs to provide. If I understand correctly, you allow users to specify exactly where to insert Sync messages (if at all), so that any number of statements arbitrarily interleaved with Sync messages can be sent without starting to read any results. If so, then the user indeed has everything they need to control the exact transactional behavior they want (including full auto-commit) without compromising on performance in any way (i.e. by increasing roundtrips).

The only minor problem I can see is that PQsendEndBatch not only adds a Sync message to the buffer, but also flushes it. This means that you may be forcing users to needlessly flush the buffer just because they wanted to insert a Sync. In other words, users can't send the following messages in a single buffer/packet: Prepare1/Bind1/Describe1/Execute1/Sync1/Prepare2/Bind2/Describe2/Execute2/Sync2 - they have to be split into different packets. Of course, this is a relatively minor performance issue (especially when compared to the overall performance benefits provided by batching), and providing an API distinction between adding a Sync and flushing the buffer may over-complicate the API. I just thought I'd mention it.
 
Yes, that's what I suggest, and basically what the libpq batch
interface does, though it expects the client to deal with the
transaction boundaries.

You will need to think hard about transaction boundaries as they
relate to multi-statements unless nPgSQL parses out each statement
from multi-statement strings like PgJDBC does. Otherwise a user can
sneak in:

somestatement; BEGIN; someotherstatement;

or

somestatement; CoMMiT; otherstatement;

That's a good point. I definitely don't want to depend on client-side parsing of SQL in any way (in fact a planned feature is to allow using Npgsql without any sort of client-side parsing/manipulation of SQL).  However, the fact that BEGIN/COMMIT can be sent in batches doesn't appear too problematic to me.

When it's about to send a batch, Npgsql knows whether it's in an (explicit) transaction or not (by examining the transaction indicator on the last ReadyForQuery message it received). If it's not in an (explicit) transaction, it automatically inserts a Sync message after every Execute. If some statement happens to be a BEGIN, it will be executed as a normal statement and so on. The only issue is that if an error occurs after a sneaked-in BEGIN, all subsequent statements will fail, and all have the Sync messages Npgsql inserted. The end result will be a series of errors that will be raised up to the user, but this isn't fundamentally different from the case of a simple auto-commit batch containing multiple failures (because of unique constraint violation or whatever) - multiple errors is something that will have to be handled in any case.

Thanks for all your comments. Npgsql's support of batches needs to be more complicated than libpq's since it's a more high-level interface - whereas libpq offloads some of the sending/processing complexity to the user, Npgsql needs to take care of most of it internally (another good example is deadlock avoidance...).

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 14 October 2016 at 18:09, Shay Rojansky <roji@roji.org> wrote:
>> > It has recently come to my attention that this implementation is
>> > problematic
>> > because it forces the batch to occur within a transaction; in other
>> > words,
>> > there's no option for a non-transactional batch.
>>
>> That's not strictly the case. If you explicitly BEGIN and COMMIT,
>> those operations are honoured within the batch.
>
>
> I wasn't precise in my formulation (although I think we understand each
> other)... The problem I'm trying to address here is the fact that in the
> "usual" batching implementation (i.e. where a single Sync message is sent at
> the end of the batch), there's no support for batches which have no
> transactions whatsoever (i.e. where each statement is auto-committed and
> errors in earlier statements don't trigger skipping of later statements).

Right, you can't use implicit transactions delimited by protocol
message boundaries when batching.

>> The design I have in libpq allows for this by allowing the client to
>> delimit batches without ending batch mode, concurrently consuming a
>> stream of multiple batches. Each endbatch is a Sync. So a client that
>> wants autocommit-like behavour can send a series of 1-query batches.
>
> Ah, I see. libpq's API is considerably more low-level than what Npgsql needs
> to provide. If I understand correctly, you allow users to specify exactly
> where to insert Sync messages (if at all)

They don't get total control, but they can cause a Sync to be emitted
after any given statement when in batch mode.

> so that any number of statements
> arbitrarily interleaved with Sync messages can be sent without starting to
> read any results.

Right.

> if so, then the user indeed has everything they need to
> control the exact transactional behavior they want (including full
> auto-commit) without compromising on performance in any way (i.e. by
> increasing roundtrips).

Right.

> The only minor problem I can see is that PQsendEndBatch not only adds a Sync
> message to the buffer, but also flushes it.

It only does a non-blocking flush.

> This means that you may be
> forcing users to needlessly flush the buffer just because they wanted to
> insert a Sync. In other words, users can't send the following messages in a
> single buffer/packet:
> Prepare1/Bind1/Describe1/Execute1/Sync1/Prepare2/Bind2/Describe2/Execute2/Sync2
> - they have to be split into different packets.

Oh, right. That's true, but I'm not really sure we care. I suspect
that the OS will tend to coalesce them anyway, since we're not
actually waiting until the socket sends the message. At least when the
socket isn't able to send as fast as input comes in. It might matter
for local performance in incredibly high throughput applications, but
I suspect there will be a great many other things that come first.

Anyway, the client can already control this with TCP_CORK.

>  Of course, this is a
> relatively minor performance issue (especially when compared to the overall
> performance benefits provided by batching), and providing an API distinction
> between adding a Sync and flushing the buffer may over-complicate the API. I
> just thought I'd mention it.

Unnecessary IMO. If we really want to add it later we'd probably do so
by setting a "no flush on endbatch" mode and adding an explicit flush
call. But I expect TCP_CORK will satisfy all realistic needs.

> That's a good point. I definitely don't want to depend on client-side
> parsing of SQL in any way (in fact a planned feature is to allow using
> Npgsql without any sort of client-side parsing/manipulation of SQL).
> However, the fact that BEGIN/COMMIT can be sent in batches doesn't appear
> too problematic to me.
>
> When it's about to send a batch, Npgsql knows whether it's in an (explicit)
> transaction or not (by examining the transaction indicator on the last
> ReadyForQuery message it received). If it's not in an (explicit)
> transaction, it automatically inserts a Sync message after every Execute. If
> some statement happens to be a BEGIN, it will be executed as a normal
> statement and so on. The only issue is that if an error occurs after a
> sneaked-in BEGIN, all subsequent statements will fail, and all have the Sync
> messages Npgsql inserted. The end result will be a series of errors that
> will be raised up to the user, but this isn't fundamentally different from
> the case of a simple auto-commit batch containing multiple failures (because
> of unique constraint violation or whatever) - multiple errors is something
> that will have to be handled in any case.

I'm a bit hesitant about how this will interact with multi-statements
containing embedded BEGIN or COMMIT, where a single protocol message
contains multiple ; delimited queries. But at least at this time of
night I can't give a concrete problematic example.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Shay Rojansky
Date:
>  Of course, this is a
> relatively minor performance issue (especially when compared to the overall
> performance benefits provided by batching), and providing an API distinction
> between adding a Sync and flushing the buffer may over-complicate the API. I
> just thought I'd mention it.

Unnecessary IMO. If we really want to add it later we'd probably do so
by setting a "no flush on endbatch" mode and adding an explicit flush
call. But I expect TCP_CORK will satisfy all realistic needs.

Unless I'm mistaken TCP_CORK is not necessarily going to work across all platforms (e.g. Windows), although SO_LINGER (which is more standard) also helps here.

> When it's about to send a batch, Npgsql knows whether it's in an (explicit)
> transaction or not (by examining the transaction indicator on the last
> ReadyForQuery message it received). If it's not in an (explicit)
> transaction, it automatically inserts a Sync message after every Execute. If
> some statement happens to be a BEGIN, it will be executed as a normal
> statement and so on. The only issue is that if an error occurs after a
> sneaked-in BEGIN, all subsequent statements will fail, and all have the Sync
> messages Npgsql inserted. The end result will be a series of errors that
> will be raised up to the user, but this isn't fundamentally different from
> the case of a simple auto-commit batch containing multiple failures (because
> of unique constraint violation or whatever) - multiple errors is something
> that will have to be handled in any case.

I'm a bit hesitant about how this will interact with multi-statements
containing embedded BEGIN or COMMIT, where a single protocol message
contains multiple ; delimited queries. But at least at this time of
night I can't give a concrete problematic example.

Unless I'm mistaken, in the extended protocol you can't combine multiple ; delimited queries in a single Parse - that's a feature supported only by the Query message of the simple protocol. But then, if you're in the simple protocol Sync doesn't play any role, does it? I mean, a Query message behaves as though it's implicitly followed by a Sync message - an error in a previous Query message doesn't cause later messages to be skipped...

Note that Npgsql only rarely uses the simple protocol for user messages. Npgsql is a binary-only driver, and the simple protocol doesn't support requesting binary results. So only the extended protocol is used, except for some edge cases where it's possible to use the simple protocol for efficiency - statements with no parameters and where the ExecuteNonQuery API is used (i.e. the user won't access any results).

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 14 October 2016 at 22:15, Shay Rojansky <roji@roji.org> wrote:

> Unless I'm mistaken TCP_CORK is not necessarily going to work across all
> platforms (e.g. Windows), although SO_LINGER (which is more standard) also
> helps here.

Yeah, true.

You can also twiddle TCP_NODELAY on and off on other platforms.

Anyway, my point is that it's not likely to be crucial, especially
since even without socket options the host will often do packet
combining if the queue isn't empty.

> Unless I'm mistaken, in the extended protocol you can't combine multiple ;
> delimited queries in a single Parse - that's a feature supported only by the
> Query message of the simple protocol. But then, if you're in the simple
> protocol Sync doesn't play any role, does it? I mean, a Query message
> behaves as though it's implicitly followed by a Sync message - an error in a
> previous Query message doesn't cause later messages to be skipped.

Right, good point. So that concern isn't relevant.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
"Tsunakawa, Takayuki"
Date:
Hello, Craig,

I'm sorry to be late to review your patch.  I've just been able to read the HTML doc first.  Can I get the latest
.patch file for reading and running the code?
 

Here are some comments and questions.  I tried to avoid the same point as other reviewers, but there may be an
overlap.


(1)
The example    UPDATE mytable SET x = x + 1;
should be    UPDATE mytable SET x = x + 1 WHERE id = 42;


(2)
"The server usually begins executing the batch before all commands in the batch are queued and the end of batch command
is sent."
 

Does this mean that the app developer cannot control or predict how many TCP transmissions a batch is sent with?  For
example, if I want to insert 10 rows into a table in bulk, can I send those 10 rows (and the end of batch command)
efficiently in one TCP transmission, or are they split by libpq into multiple TCP transmissions?
 


(3)
"To avoid deadlocks on large batches the client should be structured around a nonblocking I/O loop using a function
like select, poll, epoll, WaitForMultipleObjectEx, etc."
 

Can't we use some (new) platform-independent API instead of using poll() or WaitForMultipleObject()?  e.g. some thin
wrapper around pqWait().  It seems a bit burdensome to have to use an OS-specific API to just wait for libpq.  Apart
from that, it does not seem possible to wait for the socket in 64-bit apps on Windows, because SOCKET is 64-bit while
PQsocket() returns int.
 

[winsock2.h]
/*
 * The new type to be used in all
 * instances which refer to sockets.
 */
typedef UINT_PTR        SOCKET;

Regards
Takayuki Tsunakawa


Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 18 November 2016 at 14:04, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> Hello, Craig,
>
> I'm sorry to be late to review your patch.  I've just been able to read the HTML doc first.  Can I get the latest
.patch file for reading and running the code?

The latest is what's attached upthread and what's in the git repo also
referenced upthread.

I haven't been able to update it in response to more recent review due
to other development commitments. At this point I doubt I'll be able to
update it again in time for v10, so anyone who wants to adopt it is
welcome.

> Here are some comments and questions.  I tried to avoid the same point as other reviewers, but there may be an
overlap.
>
>
> (1)
> The example
>      UPDATE mytable SET x = x + 1;
> should be
>      UPDATE mytable SET x = x + 1 WHERE id = 42;

Good catch.

> (2)
> "The server usually begins executing the batch before all commands in the batch are queued and the end of batch
command is sent."
>
> Does this mean that the app developer cannot control or predict how many TCP transmissions a batch is sent with?

That's not what that sentence means since the TCP layer is much lower
level, but what you say is also true.

All the docs are saying there is that there's no explicit control over
when we start sending the batch to the server. How that translates to
individual TCP packets, etc, is not its problem.

>  For example, if I want to insert 10 rows into a table in bulk, can I send those 10 rows (and the end of batch
command) efficiently in one TCP transmission, or are they split by libpq into multiple TCP transmissions?

libpq neither knows nor cares about individual TCP packets. It sends
things to the kernel and lets the kernel deal with that.

That said, you can use socket options TCP_CORK and TCP_NODELAY to get
some control over how and when data is sent. If you know you're about
to send more, you might cork the socket to give the kernel a hint that
it should expect more data to send.

> (3)
> "To avoid deadlocks on large batches the client should be structured around a nonblocking I/O loop using a function
like select, poll, epoll, WaitForMultipleObjectEx, etc."
>
> Can't we use some (new) platform-independent API instead of using poll() or WaitForMultipleObject()?  e.g. some thin
wrapper around pqWait().  It seems a bit burdensome to have to use an OS-specific API just to wait for libpq.  Apart
from that, it does not seem possible to wait for the socket in 64-bit apps on Windows, because SOCKET is 64-bit while
PQsocket() returns int.

IMO this problem is out of scope for this patch. A wait abstraction
might be nice, but next thing we know we'll be reinventing APR or
NSPR, I think that's a totally different problem.

Not being able to get a win32 SOCKET from libpq seems like a bit of an
oversight, and it'd definitely be good to address that to make async
mode more usable on Windows. There's some other ugliness in PQsocket
already per the comments there. I think it should be a separate patch,
though.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Haribabu Kommi
Date:

On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
The latest is what's attached upthread and what's in the git repo also
referenced upthread.

I haven't been able to update in response to more recent review due to
other development commitments. At this point I doubt I'll be able to
update it again in time for v10, so anyone who wants to adopt it
is welcome.

Currently patch status is marked as "returned with feedback" in the 11-2016 
commitfest. Anyone who wants to work on it can submit the updated patch
by taking care of all review comments and change the status of the patch
at any time.

Thanks for the patch.
 
Regards,
Hari Babu
Fujitsu Australia

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 22 November 2016 at 15:14, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>
>> The latest is what's attached upthread and what's in the git repo also
>> referenced upthread.
>>
>> I haven't been able to update in response to more recent review due to
>> other development commitments. At this point I doubt I'll be able to
>> update it again in time for v10, so anyone who wants to adopt it
>> is welcome.
>
>
> Currently patch status is marked as "returned with feedback" in the 11-2016
> commitfest. Anyone who wants to work on it can submit the updated patch
> by taking care of all review comments and change the status of the patch
> at any time.
>
> Thanks for the patch.

Thanks. Sorry I haven't had time to work on it. Priorities.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Iwata, Aya"
Date:

Hi,

 

On 18 November 2016 at 08:18, Craig Ringer wrote:

>At this point I doubt I'll be able to

>update it again in time for v10, so anyone who wants to adopt it

>is welcome.

I am interested in pipeline/batch support for ECPG, and found this thread.

I updated Craig's patch [1] so that it applies to HEAD. I also fixed an easy typo.

 

First, I'm changing PQqueriesInBatch() to work as documented.

After that, I plan to reflect contents of reviews in the patch.

 

On Mon, Oct 3, 2016 at 11:52 PM, Daniel Verite wrote:

> Wouldn't pgbench benefit from it?

> It was mentioned some time ago [1], in relationship to the

> \into construct, how client-server latency was important enough to

> justify the use of a "\;" separator between statements, to send them

> as a group.

> But with the libpq batch API, maybe this could be modernized

> with meta-commands like this:

>   \startbatch

>   ...

>   \endbatch

I'm planning to work on meta-commands to support batch mode once this patch is committed.

 

 

[1]https://github.com/2ndQuadrant/postgres/tree/dev/libpq-async-batch

Regards,

Aya Iwata

FUJITSU

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Prabakaran, Vaishnavi"
Date:
On 22 November 2016 at 18:32, Craig Ringer<craig@2ndquadrant.com> wrote:
> On 22 November 2016 at 15:14, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
> >
> > On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer <craig@2ndquadrant.com>
> wrote:
> >>
> >> The latest is what's attached upthread and what's in the git repo
> >> also referenced upthread.
> >>
> >> I haven't been able to update in response to more recent review due
> >> to other development commitments. At this point I doubt I'll be able
> >> to update it again in time for v10, so anyone who wants to adopt
> >> it is welcome.
> >
> >
> > Currently patch status is marked as "returned with feedback" in the
> > 11-2016 commitfest. Anyone who wants to work on it can submit the
> > updated patch by taking care of all review comments and change the
> > status of the patch at any time.
> >
> > Thanks for the patch.
>
> Thanks. Sorry I haven't had time to work on it. Priorities.
Hi,
I am interested in this patch and addressed the various comments from reviewers below. And, I have separated out the code and
test module into 2 patches. So, if needed, the test patch can be enhanced further, while the code patch can be committed.
 

>Renaming and refactoring new APIs
> +PQisInBatchMode           172
>+PQqueriesInBatch          173
>+PQbeginBatchMode          174
>+PQendBatchMode            175
>+PQsendEndBatch            176
>+PQgetNextQuery            177
>+PQbatchIsAborted          178
>This set of routines is a bit inconsistent. Why not just prefixing them with PQbatch? Like that for example:
> PQbatchStatus(): consists of disabled/inactive/none, active, error. This covers both PQbatchIsAborted() and
PQisInBatchMode().
>PQbatchBegin()
>PQbatchEnd()
>PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to add and process a sync message into the queue.

Renamed and modified batch status APIs as below
PQisInBatchMode & PQbatchIsAborted ==> PQbatchStatus
PQqueriesInBatch ==> PQbatchQueueCount
PQbeginBatchMode ==> PQbatchBegin
PQendBatchMode ==> PQbatchEnd
PQsendEndBatch ==> PQbatchQueueSync
PQgetNextQuery ==> PQbatchQueueProcess

>PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,-1 on failure
>PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on failure (OOM)
I think it is still ok to keep the current behaviour like other ones present in the same file. E.g: "PQsendPrepare",
"PQsendQueryGuts"

>PQqueriesInBatch() (new name (NN): PQbatchQueueCount) doesn't work as documented.
>It says:
>"Returns the number of queries still in the queue for this batch"
>but in fact it's implemented as a Boolean.
Modified the logic to count the number of entries in the pending queue and return the count

>The changes in src/test/examples/ are not necessary anymore. You moved all the tests to test_libpq (for the best
actually).
Removed these unnecessary changes from src/test/examples folder and corrected the path mentioned in comments section of
testlibpqbatch.c

> +   while (queue != NULL)
>+  {
>       PGcommandQueueEntry *prev = queue;
>+       queue = queue->next;
>+       free(prev);
>+   }
>This should free prev->query.
Both prev->query and prev are now freed. This applies to "cmd_queue_recycle" too.

>Running directly make check in src/test/modules/test_libpq/ does not work
Modified "check" rule in makefile

>You could just remove the VERBOSE flag in the tests, having a test more talkative is always better.
Removed ifdef VERBOSE checks.

>But with the libpq batch API, maybe this could be modernized
>with meta-commands like this:
>\startbatch
>...
>\endbatch
I think it is a separate patch candidate.

> It is possible to guess each one of those errors(occurred in PQbatchQueueProcess API) with respectively
> PQgetResult == NULL, PQisInBatchMode() and PQqueriesInBatch().
> Definitely it should be mentioned in the docs that it is possible to make a difference between all those states.
Updated documentation section of PQbatchQueueProcess() with these details.

> +   entry = PQmakePipelinedCommand(conn);
>+   entry->queryclass = PGQUERY_SYNC;
>+   entry->query = NULL;
>PQmakePipelinedCommand() returns NULL, and boom.
Corrected to return false if PQmakePipelinedCommand() returns NULL.

> +   bool        in_batch;       /* connection is in batch (pipelined) mode */
>+   bool        batch_aborted;  /* current batch is aborted, discarding until next Sync */
>Having only one flag would be fine. batch_aborted is set of used only
>when in_batch is used, so both have a strong link
Yes, I agree that tracking the batch status via one flag is cleaner.
So, added a new enum:
typedef enum
{
        PQBATCH_MODE_OFF,
        PQBATCH_MODE_ON,
        PQBATCH_MODE_ABORTED
} PQBatchStatus;
and a new "PQBatchStatus batch_status" member of pg_conn is used to track the status of batch mode.

>/* OK, it's launched! */
>-   conn->asyncStatus = PGASYNC_BUSY;
>+   if (conn->in_batch)
>+       PQappendPipelinedCommand(conn, pipeCmd);
>+   else
>+       conn->asyncStatus = PGASYNC_BUSY;
>No, it is put in the queue
Though it is put in the queue, technically the command is sent to the server, which might have started executing it.
So I think it is ok to say "launched".

> It may be a good idea to add a test for COPY and trigger a failure.
Added new test - "copy_failure" to trigger failures during copy state.
Please note that COPY is allowed in batch mode, but syncing the batch or queuing any command while a copy is in progress is
blocked, due to potential failure of the entire batch.
 
So updated the documentation for more clarity in this case.

>If I read the code correctly, it seems to me that it is possible to enable the batch mode, and then to use PQexec();
PQsendPrepare will
>just happily queue the command. Shouldn't PQexec() be prevented in batch mode? Similar remark for
PQexecParams(),
>PQexecPrepared(), PQdescribePrepared() and PQprepare(). In short everything calling PQexecFinish().
A check to verify whether the connection is in batch mode is already present in PQexecStart().
And, all the functions calling PQexecFinish() will not be allowed in batch mode anyway. So, I think no new check is
needed.

> It may be a good idea to check for PG_PROTOCOL_MAJOR < 3 and issue an error for the new routines.
All the APIs which support asynchronous command processing can be executed only if PG_PROTOCOL_MAJOR >= 3. So, adding
it to the new routines is not really needed.
 

> +     After entering batch mode the application dispatches requests
>+     using normal asynchronous <application>libpq</application> functions like
>+     <function>PQsendQueryParams</function>,
><function>PQsendPrepare</function>,
>+     etc.
>It may be better to list all the functions here, PQSendQuery
Documentation updated with the list of functions: PQsendQueryParams, PQsendPrepare,
PQsendQueryPrepared, PQdescribePortal, PQdescribePrepared,
PQsendDescribePortal, PQsendDescribePrepared.

>1. Typos in document part - diff-typos.txt file contains the typos specified here
>2. git --check is complaining
>3. s/funtionality/functionality/ and s/recognised/recognized/ typos in testlibpqbatch.c file
>4. s/Batching less useful/Batching is less useful in libpq.sgml
>5.  +   <para>
>+    Much like asynchronous query mode, there is no performance disadvantage to
>+    using batching and pipelining. It somewhat increased client application
>+    complexity and extra caution is required to prevent client/server network
>+    deadlocks, but can offer considerable performance improvements.
>+   </para>
>I would reword that a bit "it increases client application complexity
>and extra caution is required to prevent client/server deadlocks but
>offers considerable performance improvements".
>6. +    ("ping time") is high, and when many small operations are being performed in
>Nit: should use <quote> here. Still not quoting it would be fine.
>7. Postgres 10, not 9.6.
Corrected.


Thanks & Regards,
Vaishnavi
Fujitsu Australia

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Fri, Feb 17, 2017 at 2:17 PM, Prabakaran, Vaishnavi
<VaishnaviP@fast.au.fujitsu.com> wrote:
> On 22 November 2016 at 18:32, Craig Ringer<craig@2ndquadrant.com> wrote:
> I am interested in this patch and addressed various below comments from reviewers. And, I have separated out code and
test module into 2 patches. So, If needed, test patch can be enhanced more, meanwhile code patch can be committed.
 

Cool. Nice to see a new version of this patch.
(You may want to let your replies breathe a bit, for example by adding an
extra newline after the paragraph you are answering.)

>>Renaming and refactoring new APIs
>> +PQisInBatchMode           172
>>+PQqueriesInBatch          173
>>+PQbeginBatchMode          174
>>+PQendBatchMode            175
>>+PQsendEndBatch            176
>>+PQgetNextQuery            177
>>+PQbatchIsAborted          178
>>This set of routines is a bit inconsistent. Why not just prefixing them with PQbatch? Like that for example:
>> PQbatchStatus(): consists of disabled/inactive/none, active, error. This covers both PQbatchIsAborted() and
PQisInBatchMode().
>>PQbatchBegin()
>>PQbatchEnd()
>>PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to add and process a sync message into the queue.
>
> Renamed and modified batch status APIs as below
> PQisInBatchMode & PQbatchIsAborted ==> PQbatchStatus
> PQqueriesInBatch ==> PQbatchQueueCount
> PQbeginBatchMode ==> PQbatchBegin
> PQendBatchMode ==> PQbatchEnd
> PQsendEndBatch ==> PQbatchQueueSync
> PQgetNextQuery ==> PQbatchQueueProcess

Thanks. This seems a way cleaner interface to me. Others may have a
different opinion so let's see if there are some.

>>PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,-1 on failure
>>PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on failure (OOM)
>
> I think it is still ok to keep the current behaviour like other ones present in the same file. E.g: "PQsendPrepare",
"PQsendQueryGuts"
>
>>PQqueriesInBatch() (Newname(NN):PQbatchQueueCount)doesn't work as documented.
>>It says:
>>"Returns the number of queries still in the queue for this batch"
>>but in fact it's implemented as a Boolean.
>
> Modified the logic to count number of entries in pending queue and return the count
>
>>The changes in src/test/examples/ are not necessary anymore. You moved all the tests to test_libpq (for the best
actually).
> Removed these unnecessary changes from src/test/examples folder and corrected the path mentioned in comments section
of testlibpqbatch.c
 

>>But with the libpq batch API, maybe this could be modernized
>>with meta-commands like this:
>>\startbatch
>>...
>>\endbatch
>
> I think it is a separate patch candidate.

Definitely agreed on that. Let's not complicate things further more.

>> It may be a good idea to check for PG_PROTOCOL_MAJOR < 3 and issue an error for the new routines.
>
> All the APIs which support asynchronous command processing can be executed only if PG_PROTOCOL_MAJOR >= 3. So,
adding it to the new routines is not really needed.
 

OK. Let's not complicate the patch more than necessary.

After an extra lookup at the patch, here are some comments.

In the docs when listing any of the PQBATCH_MODE_* variables, you
should have a markup <literal> around them. It would be cleaner to
make a list of the potential values that can be returned by
PQbatchStatus() using <variablelist>.

+   while (queue != NULL)
+   {
+       PGcommandQueueEntry *prev = queue;
+
+       queue = queue->next;
+       if (prev->query)
+           free(prev->query);
+       free(prev);
+   }
+   conn->cmd_queue_recycle = NULL
This could be useful as a separate routine.

+/* PQmakePipelinedCommand
+ * Get a new command queue entry, allocating it if required. Doesn't add it to
+ * the tail of the queue yet, use PQappendPipelinedCommand once the command has
+ * been written for that. If a command fails once it's called this, it should
+ * use PQrecyclePipelinedCommand to put it on the freelist or release it.
+ *
This comment block is not project-like. Please use an empty line as
first line with only "/*". The same comment applies to a bunch of the
routines introduced by this patch.

Not sure there is much point in having PQstartProcessingNewQuery. It
only does two things in two places, so that's not worth the loss in
readability.

+   if (conn->batch_status != PQBATCH_MODE_OFF)
+   {
+       pipeCmd = PQmakePipelinedCommand(conn);
+
+       if (pipeCmd == NULL)
+           return 0;           /* error msg already set */
+
+       last_query = &pipeCmd->query;
+       queryclass = &pipeCmd->queryclass;
+   }
+   else
+   {
+       last_query = &conn->last_query;
+       queryclass = &conn->queryclass;
+   }
This setup should happen further down.

conn->in_batch is undefined, causing the patch to fail to compile. And
actually you should not need it.

-                       conn->asyncStatus = PGASYNC_READY;
+                       conn->asyncStatus = PGASYNC_READY_MORE;
                        return;
This should really not be changed, or it should be changed to a status
dedicated to batching only if batch mode is activated.

If you are planning for integrating this patch into Postres 10, please
be sure that this is registered in the last commit fest that will
begin next week:
https://commitfest.postgresql.org/13/
I'll try to book a couple of cycles to look at it if you are able to
register it into the CF and provide a new version.
-- 
Michael



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:
On Tue, Feb 21, 2017 at 6:51 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Feb 17, 2017 at 2:17 PM, Prabakaran, Vaishnavi
<VaishnaviP@fast.au.fujitsu.com> wrote:
> On 22 November 2016 at 18:32, Craig Ringer<craig@2ndquadrant.com> wrote:
> I am interested in this patch and addressed various below comments from reviewers. And, I have separated out code and test module into 2 patches. So, If needed, test patch can be enhanced more, meanwhile code patch can be committed.

Cool. Nice to see a new version of this patch.
(You may want to let your replies breathe a bit, for example by adding an
extra newline after the paragraph you are answering.)

>>Renaming and refactoring new APIs
>> +PQisInBatchMode           172
>>+PQqueriesInBatch          173
>>+PQbeginBatchMode          174
>>+PQendBatchMode            175
>>+PQsendEndBatch            176
>>+PQgetNextQuery            177
>>+PQbatchIsAborted          178
>>This set of routines is a bit inconsistent. Why not just prefixing them with PQbatch? Like that for example:
>> PQbatchStatus(): consists of disabled/inactive/none, active, error. This covers both PQbatchIsAborted() and PQisInBatchMode().
>>PQbatchBegin()
>>PQbatchEnd()
>>PQbatchQueueSync() or PQbatchQueueFlush, same as PQsendEndBatch() to add and process a sync message into the queue.
>
> Renamed and modified batch status APIs as below
> PQisInBatchMode & PQbatchIsAborted ==> PQbatchStatus
> PQqueriesInBatch ==> PQbatchQueueCount
> PQbeginBatchMode ==> PQbatchBegin
> PQendBatchMode ==> PQbatchEnd
> PQsendEndBatch ==> PQbatchQueueSync
> PQgetNextQuery ==> PQbatchQueueProcess

Thanks. This seems a way cleaner interface to me. Others may have a
different opinion so let's see if there are some. 
 
>>PQbatchQueueCount(): returns N>0 if there are N entries, 0 if empty,-1 on failure
>>PQbatchQueueProcess(): returns 1 if process can begin, 0 if not, -1 on failure (OOM)
>
> I think it is still ok to keep the current behaviour like other ones present in the same file. E.g:"PQsendPrepare" "PQsendQueryGuts"
>
>>PQqueriesInBatch() (Newname(NN):PQbatchQueueCount)doesn't work as documented.
>>It says:
>>"Returns the number of queries still in the queue for this batch"
>>but in fact it's implemented as a Boolean.
>
> Modified the logic to count number of entries in pending queue and return the count
>
>>The changes in src/test/examples/ are not necessary anymore. You moved all the tests to test_libpq (for the best actually).
> Removed these unnecessary changes from src/test/examples folder and corrected the path mentioned in comments section of testlibpqbatch.c

>>But with the libpq batch API, maybe this could be modernized
>>with meta-commands like this:
>>\startbatch
>>...
>>\endbatch
>
> I think it is a separate patch candidate.

Definitely agreed on that. Let's not complicate things further more.

>> It may be a good idea to check for PG_PROTOCOL_MAJOR < 3 and issue an error for the new routines.
>
> All the APIs which supports asynchronous command processing can be executed only if PG_PROTOCOL_MAJOR >= 3. So, adding it to new routines are not really needed.

OK. Let's not complicate the patch more than necessary.

After an extra lookup at the patch, here are some comments.

Thanks for reviewing the patch.

 

In the docs when listing any of the PQBATCH_MODE_* variables, you
should have a markup <literal> around them. It would be cleaner to
make a list of the potential values that can be returned by
PQbatchStatus() using <variablelist>.

Modified the format of PQbatchStatus() and other batch APIs too in documentation along with addition of <literal> tags wherever required. 

 

+   while (queue != NULL)
+   {
+       PGcommandQueueEntry *prev = queue;
+
+       queue = queue->next;
+       if (prev->query)
+           free(prev->query);
+       free(prev);
+   }
+   conn->cmd_queue_recycle = NULL
This could be useful as a separate routine.

Ok, Moved this code to new function - PQfreeCommandQueue(). 

 

+/* PQmakePipelinedCommand
+ * Get a new command queue entry, allocating it if required. Doesn't add it to
+ * the tail of the queue yet, use PQappendPipelinedCommand once the command has
+ * been written for that. If a command fails once it's called this, it should
+ * use PQrecyclePipelinedCommand to put it on the freelist or release it.
+ *
This comment block is not project-like. Please use an empty line as
first line with only "/*". The same comment applies to a bunch of the
routines introduced by this patch.

Corrected this.

 

Not sure there is much point in having PQstartProcessingNewQuery. It
only does two things in two places, so that's not worth the loss in
readability.

Yes, removed this function and placed those two things in line. 

 

+   if (conn->batch_status != PQBATCH_MODE_OFF)
+   {
+       pipeCmd = PQmakePipelinedCommand(conn);
+
+       if (pipeCmd == NULL)
+           return 0;           /* error msg already set */
+
+       last_query = &pipeCmd->query;
+       queryclass = &pipeCmd->queryclass;
+   }
+   else
+   {
+       last_query = &conn->last_query;
+       queryclass = &conn->queryclass;
+   }
This setup should happen further down.


Moved it down post other validations in this function.


 

conn->in_batch is undefined, causing the patch to fail to compile. And
actually you should not need it.

That is an update missed in my previous patch, corrected in the new patch.

 

-                       conn->asyncStatus = PGASYNC_READY;
+                       conn->asyncStatus = PGASYNC_READY_MORE;
                        return;
This should really not be changed, or it should be changed to a status
dedicated to batching only if batch mode is activated.

Since pqParseInput3() reads all the input data post "Row description" message, yes, this change is not needed here. 

 

If you are planning for integrating this patch into Postres 10, please
be sure that this is registered in the last commit fest that will
begin next week:
https://commitfest.postgresql.org/13/

Yes, I have created a new patch entry into the commitfest 2017-03 and attached the latest patch with this e-mail. 

 

I'll try to book a couple of cycles to look at it if you are able to
register it into the CF and provide a new version.


Thanks again.

Regards,
Vaishnavi,
Fujitsu Australia. 
 

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:


On 22 Feb. 2017 14:14, "Vaishnavi Prabakaran" <vaishnaviprabakaran@gmail.com> wrote:
Thanks for reviewing the patch.

Thanks for picking it up! I've wanted to see this process for some time, but just haven't had the bandwidth for it.

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Vaishnavi Prabakaran wrote:

> Yes, I have created a new patch entry into the commitfest 2017-03 and
> attached the latest patch with this e-mail.

Please find attached a companion patch implementing the batch API in
pgbench, exposed as \beginbatch and \endbatch meta-commands
(without documentation).

The idea for now is to make it easier to exercise the API and test
how batching performs. I guess I'll submit the patch separately in
a future CF, depending on when/if the libpq patch goes in.

While developing this, I noted a few things with 0001-v4:

1. lack of initialization for count in PQbatchQueueCount.
Trivial fix:

--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1874,7 +1874,7 @@ PQisBusy(PGconn *conn)
 int
 PQbatchQueueCount(PGconn *conn)
 {
-    int            count;
+    int            count = 0;
    PGcommandQueueEntry *entry;

2. misleading error message in PQexecStart. It gets called by a few other
functions than PQexec, such as PQprepare. As I understand it, the intent
here is to forbid the synchronous functions in batch mode, so this error
message should not single out PQexec.

@@ -1932,6 +2425,13 @@ PQexecStart(PGconn *conn)
    if (!conn)
        return false;

+    if (conn->asyncStatus == PGASYNC_QUEUED ||
+        conn->batch_status != PQBATCH_MODE_OFF)
+    {
+        printfPQExpBuffer(&conn->errorMessage,
+                          libpq_gettext("cannot PQexec in batch mode\n"));
+        return false;
+    }
+

3. In relation to #2, PQsendQuery() is not forbidden in batch mode
although I don't think it can work with it, as it's based on the old
protocol.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 8 March 2017 at 00:52, Daniel Verite <daniel@manitou-mail.org> wrote:
>         Vaishnavi Prabakaran wrote:
>
>> Yes, I have created a new patch entry into the commitfest 2017-03 and
>> attached the latest patch with this e-mail.
>
> Please find attached a companion patch implementing the batch API in
> pgbench, exposed as \beginbatch and \endbatch meta-commands
> (without documentation).
>
> The idea for now is to make it easier to exercise the API and test
> how batching performs. I guess I'll submit the patch separately in
> a future CF, depending on when/if the libpq patch goes in.

That's great, thanks, and thanks also for the fixes. Any chance you
can attach your updated patch?

I looked at modifying psql to support batching when run
non-interactively, but it would've required major restructuring of its
control loop and I ran out of time. I didn't think of modifying
pgbench. Great to see.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Wed, Mar 8, 2017 at 3:52 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
        Vaishnavi Prabakaran wrote:

> Yes, I have created a new patch entry into the commitfest 2017-03 and
> attached the latest patch with this e-mail.

Please find attached a companion patch implementing the batch API in
pgbench, exposed as \beginbatch and \endbatch meta-commands
(without documentation).

The idea for now is to make it easier to exercise the API and test
how batching performs. I guess I'll submit the patch separately in
a future CF, depending on when/if the libpq patch goes in.


 
Thanks for the companion patch and here are some comments:

1. I see, below check is used to verify if the connection is not in batch mode:
    if (PQbatchStatus(st->con) != PQBATCH_MODE_ON)

    But, it is better to use if (PQbatchStatus(st->con) == PQBATCH_MODE_OFF) for this verification. Reason is there is one more state in batch mode - PQBATCH_MODE_ABORTED. So, if the batch mode is in aborted status, this check will still assume that the connection is not in batch mode.  
 
In a same way, "if(PQbatchStatus(st->con) == PQBATCH_MODE_ON)" check is better to be modified as "PQbatchStatus(st->con) != PQBATCH_MODE_OFF". 


2.  @@ -2207,7 +2227,47 @@ doCustom(TState *thread, CState *st, StatsData *agg)
+ if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
+ {
+ commandFailed(st, "already in batch mode");
+ break;
+ }

This is not required as below PQbatchBegin() internally does this verification.

+ PQbatchBegin(st->con);


3. It is better to check the return value of PQbatchBegin() rather than assuming success. E.g: PQbatchBegin() will return false if the connection is in copy in/out/both states. 

 
While developing this, I noted a few things with 0001-v4:

1. lack of initialization for count in PQbatchQueueCount.
Trivial fix:

--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1874,7 +1874,7 @@ PQisBusy(PGconn *conn)
 int
 PQbatchQueueCount(PGconn *conn)
 {
-       int                     count;
+       int                     count = 0;
        PGcommandQueueEntry *entry;


Thanks for your review and yes, Corrected.
 
2. misleading error message in PQexecStart. It gets called by a few other
functions than PQexec, such as PQprepare. As I understand it, the intent
here is to forbid the synchronous functions in batch mode, so this error
message should not single out PQexec.

@@ -1932,6 +2425,13 @@ PQexecStart(PGconn *conn)
        if (!conn)
                return false;

+       if (conn->asyncStatus == PGASYNC_QUEUED ||
+           conn->batch_status != PQBATCH_MODE_OFF)
+       {
+               printfPQExpBuffer(&conn->errorMessage,
+                                 libpq_gettext("cannot PQexec in batch mode\n"));
+               return false;
+       }
+



Hmm, this error message goes with the flow of other error messages in the same function, e.g. "PQexec not allowed during COPY BOTH". But anyway, I modified the
error message to be more generic.

 
3. In relation to #2, PQsendQuery() is not forbidden in batch mode
although I don't think it can work with it, as it's based on the old
protocol.



It is actually forbidden, and you can see the error message "cannot PQsendQuery in batch mode, use PQsendQueryParams" when you use PQsendQuery().
The updated patch is attached.

Thanks & Regards,
Vaishnavi
Fujitsu Australia. 
Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Vaishnavi Prabakaran wrote:

>    if (PQbatchStatus(st->con) != PQBATCH_MODE_ON)
>
>    But, it is better to use if (PQbatchStatus(st->con) ==
> PQBATCH_MODE_OFF) for this verification. Reason is there is one more state
> in batch mode - PQBATCH_MODE_ABORTED. So, if the batch mode is in aborted
> status, this check will still assume that the connection is not in batch
> mode.
> In a same way, "if(PQbatchStatus(st->con) == PQBATCH_MODE_ON)" check is
> better to be modified as "PQbatchStatus(st->con) != PQBATCH_MODE_OFF".

Agreed, these two tests need to be changed to account for the
PQBATCH_MODE_ABORTED case. Fixed in new patch.

> 2.  @@ -2207,7 +2227,47 @@ doCustom(TState *thread, CState *st, StatsData
> *agg)
> + if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
> + {
> + commandFailed(st, "already in batch mode");
> + break;
> + }
>
> This is not required as below PQbatchBegin() internally does this
> verification.
>
> + PQbatchBegin(st->con);

The point of this test is to specifically forbid a sequence like this
\beginbatch
...(no \endbatch)
\beginbatch
...
and according to the doc for PQbatchBegin, it looks like the return
code won't tell us:
   "Causes a connection to enter batch mode if it is currently idle or
   already in batch mode.
    int PQbatchBegin(PGconn *conn);
   Returns 1 for success"

My understanding is that "already in batch mode" is not an error.

   "Returns 0 and has no effect if the connection is not currently
   idle, i.e. it has a result ready, is waiting for more input from
   the server, etc. This function does not actually send anything to
   the server, it just changes the libpq connection state"

> 3. It is better to check the return value of PQbatchBegin() rather than
> assuming success. E.g: PQbatchBegin() will return false if the connection
> is in copy in/out/both states.

Given point #2 above, I think both tests are needed:
   if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
    {
       commandFailed(st, "already in batch mode");
       break;
    }
    if (PQbatchBegin(st->con) == 0)
    {
       commandFailed(st, "failed to start a batch");
       break;
    }

> > 3. In relation to #2, PQsendQuery() is not forbidden in batch mode
> > although I don't think it can work with it, as it's based on the old
> > protocol.
> >
> It is actually forbidden and you can see the error message "cannot
> PQsendQuery in batch mode, use PQsendQueryParams" when you use
> PQsendQuery().

Sorry for blaming an innocent piece of code; looking closer,
it's pgbench that does not display the message.
With the simple query protocol, sendCommand() does essentially:

  r = PQsendQuery(st->con, sql);

... other stuff...

  if (r == 0)
  {
    if (debug)
      fprintf(stderr, "client %d could not send %s\n",
      st->id, command->argv[0]);
    st->ecnt++;
    return false;
  }

So only in debug mode does it inform the user that it failed, and
even then, it does not display PQerrorMessage(st->con).

In non-debug mode it's opaque to the user. If it always fails, it appears
to loop forever. The caller has this comment that is relevant to the problem:

    if (!sendCommand(st, command))
      {
        /*
         * Failed. Stay in CSTATE_START_COMMAND state, to
         * retry. ??? What the point or retrying? Should
         * rather abort?
         */
        return;
      }

I think it doesn't bother anyone up to now because the immediate
failure modes of PQsendQuery() are resource allocation or protocol
failures that tend to never happen.

Anyway currently \beginbatch avoids the problem by checking
whether querymode == QUERY_SIMPLE, to fail gracefully
instead of letting the endless loop happen.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
  Hi,

I notice that PQsetSingleRowMode() doesn't work when getting batch results.

The function is documented as:
" int PQsetSingleRowMode(PGconn *conn);

  This function can only be called immediately after PQsendQuery or one
  of its sibling functions, before any other operation on the connection
  such as PQconsumeInput or PQgetResult"

But PQbatchQueueProcess() unconditionally clears conn->singleRowMode,
so whatever it was when sending the query gets lost, and besides
other queries might have been submitted in the meantime.

Also if trying to set that mode when fetching like this

     while (PQbatchQueueProcess(conn)) {
       r = PQsetSingleRowMode(conn);
       if (r!=1) {
      fprintf(stderr, "PQsetSingleRowMode() failed");
       }
       ..

it might work the first time, but on the next iterations, conn->asyncStatus
might be PGASYNC_READY, which is a failure condition for
PQsetSingleRowMode(), so that won't do.

ISTM that the simplest fix would be that when in batch mode,
PQsetSingleRowMode() should register that the last submitted query
requests that mode, and when PQbatchQueueProcess() later dequeues
the batch entry for the corresponding query, this flag would be popped off
and set as the current mode.

Please find attached the suggested fix, against the v5 of the patch.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Sat, Mar 11, 2017 at 12:52 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
  Hi,

I notice that PQsetSingleRowMode() doesn't work when getting batch results.

The function is documented as:
" int PQsetSingleRowMode(PGconn *conn);

  This function can only be called immediately after PQsendQuery or one
  of its sibling functions, before any other operation on the connection
  such as PQconsumeInput or PQgetResult"

But PQbatchQueueProcess() unconditionally clears conn->singleRowMode,
so whatever it was when sending the query gets lost, and besides
other queries might have been submitted in the meantime.

Also if trying to set that mode when fetching like this

     while (PQbatchQueueProcess(conn)) {
       r = PQsetSingleRowMode(conn);
       if (r!=1) {
          fprintf(stderr, "PQsetSingleRowMode() failed");
       }
       ..

it might work the first time, but on the next iterations, conn->asyncStatus
might be PGASYNC_READY, which is a failure condition for
PQsetSingleRowMode(), so that won't do.


Thanks for investigating the problem. Could you kindly explain which "next iteration" you mean here? I don't see any problem with the following sequence of calls: PQbatchQueueProcess(), PQsetSingleRowMode(), PQgetResult(). Am I missing anything?
Please note that it is a MUST to call PQgetResult immediately after PQbatchQueueProcess(), as documented.

 

ISTM that the simplest fix would be that when in batch mode,
PQsetSingleRowMode() should register that the last submitted query
does request that mode, and when later PQbatchQueueProcess dequeues
the batch entry for the corresponding query, this flag would be popped off
and set as the current mode.

Please find attached the suggested fix, against the v5 of the patch.

Before going with this fix, I would like you to consider the option of asking batch-processing users (via documentation) to set single-row mode before calling PQgetResult().
Either way, we need to fix the documentation, letting users know how they can activate single-row mode while using batch processing.
Let me know your thoughts.

Best Regards,
Vaishnavi,
Fujitsu Australia. 

 

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 13 March 2017 at 08:54, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:

> Before going with this fix, I would like you to consider the option of
> asking batch processing users(via documentation) to set single-row mode
> before calling PQgetResult().
> Either way we need to fix the documentation part, letting users know how
> they can activate single-row mode while using batch processing.
> Let me know your thoughts.

Thanks for looking into this, it's clear that I didn't cover the combo
of single-row-mode and batch mode adequately.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Vaishnavi Prabakaran wrote:

> >      while (PQbatchQueueProcess(conn)) {
> >        r = PQsetSingleRowMode(conn);
> >        if (r!=1) {
> >           fprintf(stderr, "PQsetSingleRowMode() failed");
> >        }
> >        ..

> Thanks for investigating the problem, and could you kindly explain what
> "next iteration" you mean here? Because I don't see any problem in
> following sequence of calls - PQbatchQueueProcess(),PQsetSingleRowMode()
> , PQgetResult()

I mean the next iteration of the above while statement. Referring
to the doc, that would be the "next batch entry":

  " To get the result of the first batch entry the client must call
   PQbatchQueueProcess. It must then call PQgetResult and handle the
   results until PQgetResult returns null (or would return null if
   called). The result from the next batch entry may then be retrieved
   using PQbatchQueueProcess and the cycle repeated"

Attached is a bare-bones testcase showing the problem.
As it is, it works, retrieving results for three "SELECT 1"
in the same batch.  Now in order to use the single-row
fetch mode, consider adding this:

    r = PQsetSingleRowMode(conn);
    if (r!=1) {
      fprintf(stderr, "PQsetSingleRowMode() failed for i=%d\n", i);
    }

When inserted after the call to PQbatchQueueProcess,
which is what I understand you're saying works for you,
it fails for me when starting to get the results of the 2nd query
and after.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Tue, Mar 14, 2017 at 4:19 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
 
I mean the next iteration of the above while statement. Referring
to the doc, that would be the "next batch entry":

  " To get the result of the first batch entry the client must call
   PQbatchQueueProcess. It must then call PQgetResult and handle the
   results until PQgetResult returns null (or would return null if
   called). The result from the next batch entry may then be retrieved
   using PQbatchQueueProcess and the cycle repeated"

Attached is a bare-bones testcase showing the problem.
As it is, it works, retrieving results for three "SELECT 1"
in the same batch.  Now in order to use the single-row
fetch mode, consider adding this:

    r = PQsetSingleRowMode(conn);
    if (r!=1) {
      fprintf(stderr, "PQsetSingleRowMode() failed for i=%d\n", i);
    }

When inserted after the call to PQbatchQueueProcess,
which is what I understand you're saying works for you,
it fails for me when starting to get the results of the 2nd query
and after.



Thanks for explaining the issue in detail and yes, I do see the issue using your attached test file.

And I think the problem is with the "parseInput" call at the end of PQbatchQueueProcess().
I don't see the need for "parseInput" to be within the scope of PQbatchQueueProcess(); in any case, we do it again in PQgetResult().
This call changes the connection's asyncStatus to READY (following the command-complete message from the backend), so PQsetSingleRowMode() eventually fails. The alternative fix for this issue is attached.
Please share your thoughts.

I would also like to hear Craig's opinion on it before applying this fix to the original patch, just to make sure am not missing anything here. 

Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 



Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Tue, Mar 14, 2017 at 5:50 PM, Vaishnavi Prabakaran <vaishnaviprabakaran@gmail.com> wrote:


On Tue, Mar 14, 2017 at 4:19 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
 
I mean the next iteration of the above while statement. Referring
to the doc, that would be the "next batch entry":

  " To get the result of the first batch entry the client must call
   PQbatchQueueProcess. It must then call PQgetResult and handle the
   results until PQgetResult returns null (or would return null if
   called). The result from the next batch entry may then be retrieved
   using PQbatchQueueProcess and the cycle repeated"

Attached is a bare-bones testcase showing the problem.
As it is, it works, retrieving results for three "SELECT 1"
in the same batch.  Now in order to use the single-row
fetch mode, consider adding this:

    r = PQsetSingleRowMode(conn);
    if (r!=1) {
      fprintf(stderr, "PQsetSingleRowMode() failed for i=%d\n", i);
    }

When inserted after the call to PQbatchQueueProcess,
which is what I understand you're saying works for you,
it fails for me when starting to get the results of the 2nd query
and after.



Thanks for explaining the issue in detail and yes, I do see the issue using your attached test file.

And I think the problem is with the "parseInput" call at the end of PQbatchQueueProcess(). 
I don't see the need for "parseInput" to cover the scope of PQbatchQueueProcess(), also anyways we do it in PQgetResult(). 
This function call changes the asyncstatus of connection to READY(following command complete message from backend), so eventually PQsetSingleRowMode() is failing. So, attached the alternative fix for this issue.  
Please share me your thoughts.

I would also like to hear Craig's opinion on it before applying this fix to the original patch, just to make sure am not missing anything here. 


Attached is the code patch with the fix explained above; moving the CF status to seeking review.

Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 


Attachment

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
Vaishnavi Prabakaran wrote:

> So, attached the alternative fix for this issue.
> Please share me your thoughts.

I assume you prefer the alternative fix because it's simpler.

> I would also like to hear Craig's opinion on it before applying this fix
> to the original patch, just to make sure am not missing anything here.

+1

The main question is whether the predicates enforced
by PQsetSingleRowMode() apply in batch mode in all cases
when it's legit to call that function. Two predicates
that may be problematic are:

        if (conn->asyncStatus != PGASYNC_BUSY)
                return 0;

and

        if (conn->result)
                return 0;

The general case with batch mode is that, from the doc:
"The client interleaves result processing with sending batch queries"
Note that I've not even tested that here, I've tested
batching a bunch of queries in a first step and getting the results
in a second step.
I am not confident that the above predicates will be true
in all cases. Also your alternative fix assumes that we add
a user-visible exception to PQsetSingleRowMode in batch mode,
whereby it must not be called as currently documented:
   "call PQsetSingleRowMode immediately after a successful call of
    PQsendQuery (or a sibling function)"
My gut feeling is that it's not the right direction; I prefer making
single-row mode a per-query attribute internally and keeping
PQsetSingleRowMode's contract unchanged from the
user's perspective.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
        Vaishnavi Prabakaran wrote:

> So, attached the alternative fix for this issue.
> Please share me your thoughts.

I assume you prefer the alternative fix because it's simpler.

I would like to add one more reason for this fix: I think "PQsetSingleRowMode" should be called only when the result is ready to be processed, before starting to consume the result, as it is currently documented:
"To enter single-row mode, call PQsetSingleRowMode immediately after a successful call of PQsendQuery (or a sibling function). This mode selection is effective only for the currently executing query. Then call PQgetResult repeatedly..."


I agree that the first fix (you shared) will allow the user to set single-row mode after PQsendQuery, but it also allows the user to set this mode at any time during batch processing (not necessarily "immediately after PQsendQuery"), and "mode selection is effective only for the currently executing query" will then be false. Please note that I don't see any problem with this deviation; I just want to point out that the documentation here needs an update/note anyway.


Before going further, I would like to mention that I have modified the documentation of batch processing (in the v6 code patch) as below:
"To enter single-row mode, call PQsetSingleRowMode immediately after a successful call of PQbatchQueueProcess. This mode selection is effective only for the currently executing query. For more information on the use of PQsetSingleRowMode, refer to Section 33.6, “Retrieving Query Results Row-By-Row”."

Please let me know if you think this is not enough and section 33.6 should also be updated.

 

> I would also like to hear Craig's opinion on it before applying this fix
> to the original patch, just to make sure am not missing anything here.

+1

The main question is whether the predicates enforced
by PQsetSingleRowMode() apply in batch mode in all cases
when it's legit to call that function. Two predicates
that may be problematic are:
        if (conn->asyncStatus != PGASYNC_BUSY)
                return 0;
and
        if (conn->result)
                return 0;

The general case with batch mode is that, from the doc:
"The client interleaves result processing with sending batch queries"

While sending batch queries in the middle of result processing, the query is only appended to the list of queries maintained for batch processing; no connection attribute affecting result processing is changed. So, calling PQsetSingleRowMode in between result processing will fail, as it tries to set single-row mode for the query currently being processed, for which result processing has already started.


Note that I've not even tested that here,
I've tested
batching a bunch of queries in a first step and getting the results
in a second step.
I am not confident that the above predicates will be true
in all cases.
Also your alternative fix assumes that we add
a user-visible exception to PQsetSingleRowMode in batch mode,
whereby it must not be called as currently documented:
  "call PQsetSingleRowMode immediately after a successful call of
   PQsendQuery (or a sibling function)"
My gut feeling is that it's not the right direction, I prefer making
the single-row a per-query attribute internally and keep
PQsetSingleRowMode's contract unchanged from the
user's perspective.


I am going to include the test you shared in the test patch. Please let me know if you want any more specific cases covered to gain confidence.
 
Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 

Re: PATCH: Batch/pipelining support for libpq

From
David Steele
Date:
Hi Vaishnavi,

On 3/19/17 9:32 PM, Vaishnavi Prabakaran wrote:
> On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite <daniel@manitou-mail.org
>
> Please let me know if you think this is not enough but wanted to update
> section 33.6 also?

Daniel, any input here?

>     > I would also like to hear Craig's opinion on it before applying this fix
>     > to the original patch, just to make sure am not missing anything here.

Craig?

>
> Am going to include the test which you shared in the test patch. Please
> let me know if you want to cover anymore specific cases to gain
> confidence.

I have marked this submission "Waiting for Author" since it appears a 
new patch is required based on input and adding a new test.

-- 
-David
david@pgmasters.net



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 24 March 2017 at 23:21, David Steele <david@pgmasters.net> wrote:
> Hi Vaishnavi,
>
> On 3/19/17 9:32 PM, Vaishnavi Prabakaran wrote:
>>
>> On Fri, Mar 17, 2017 at 12:37 AM, Daniel Verite <daniel@manitou-mail.org
>>
>> Please let me know if you think this is not enough but wanted to update
>> section 33.6 also?
>
>
> Daniel, any input here?
>
>>     > I would also like to hear Craig's opinion on it before applying this
>> fix
>>     > to the original patch, just to make sure am not missing anything
>> here.
>
>
> Craig?

I'm fairly confident that I overlooked single row mode entirely in the
original patch, though it's long enough ago that it's hard for me to
remember exactly.

I don't really have much of an opinion on the best handling of it.

I would expect to be setting single-row mode just before I requested
the *result* from the next pending query, since it relates to result
processing rather than query dispatch. But that's about all the
opinion I have here.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I'm fairly confident that I overlooked single row mode entirely in the
> original patch, though it's long enough ago that it's hard for me to
> remember exactly.
>
> I don't really have much of an opinion on the best handling of it.
>
> I would expect to be setting single-row mode just before I requested
> the *result* from the next pending query, since it relates to result
> processing rather than query dispatch. But that's about all the
> opinion I have here.

Yeah, I think that it makes sense to allow users to switch to single
row mode before requesting a result in the queue. It seems to me that
this should also be effective only during the fetching of one single
result set. When the client moves on to the next item in the queue, a
new call to PQsetSingleRowMode() should be necessary again.

Regarding the patch, I have spotted the following things in the last version:
- src/test/modules/Makefile should include test_libpq.
- Regression tests from 0002 are failing:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On 27 March 2017 at 15:26, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'm fairly confident that I overlooked single row mode entirely in the
>> original patch, though it's long enough ago that it's hard for me to
>> remember exactly.
>>
>> I don't really have much of an opinion on the best handling of it.
>>
>> I would expect to be setting single-row mode just before I requested
>> the *result* from the next pending query, since it relates to result
>> processing rather than query dispatch. But that's about all the
>> opinion I have here.
>
> Yeah, I think that it makes sense to allow users to switch to single
> row mode before requesting a result in the queue. It seems to me that
> this should also be effective only during the fetching of one single
> result set. When the client moves on to the next item in the queue we
> should make necessary again a call to PQsetSingleRowMode().
>
> Regarding the patch, I have spotted the following things in the last version:
> - src/test/modules/Makefile should include test_libpq.
> - Regression tests from 0002 are failing:
> ok 1 - testlibpqbatch disallowed_in_batch
> ok 2 - testlibpqbatch simple_batch
> ok 3 - testlibpqbatch multi_batch
> ok 4 - testlibpqbatch batch_abort
> ok 5 - testlibpqbatch timings
> not ok 6 - testlibpqbatch copyfailure

There are only a few more days left of this commit fest.

Things are sounding pretty ready here, with some final cleanups
pending. It'd be cool to get this into Pg 10 :)

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Mon, Mar 27, 2017 at 4:42 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 27 March 2017 at 15:26, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> I'm fairly confident that I overlooked single row mode entirely in the
>>> original patch, though it's long enough ago that it's hard for me to
>>> remember exactly.
>>>
>>> I don't really have much of an opinion on the best handling of it.
>>>
>>> I would expect to be setting single-row mode just before I requested
>>> the *result* from the next pending query, since it relates to result
>>> processing rather than query dispatch. But that's about all the
>>> opinion I have here.
>>
>> Yeah, I think that it makes sense to allow users to switch to single
>> row mode before requesting a result in the queue. It seems to me that
>> this should also be effective only during the fetching of one single
>> result set. When the client moves on to the next item in the queue we
>> should make necessary again a call to PQsetSingleRowMode().
>>
>> Regarding the patch, I have spotted the following things in the last version:
>> - src/test/modules/Makefile should include test_libpq.
>> - Regression tests from 0002 are failing:
>> ok 1 - testlibpqbatch disallowed_in_batch
>> ok 2 - testlibpqbatch simple_batch
>> ok 3 - testlibpqbatch multi_batch
>> ok 4 - testlibpqbatch batch_abort
>> ok 5 - testlibpqbatch timings
>> not ok 6 - testlibpqbatch copyfailure
>
> There are only a few more days left of this commit fest.
>
> Things are sounding pretty ready here, with some final cleanups
> pending. It'd be cool to get this into Pg 10 :)

Yes, that's one of the items I'd like to help with by the end of the CF.
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Vaishnavi Prabakaran wrote:

> Please let me know if you think this is not enough but wanted to update
> section 33.6 also?

Yes, if the right place to call PQsetSingleRowMode() is immediately
after PQbatchQueueProcess(), I think we need to update
"33.6. Retrieving Query Results Row-By-Row"
with that information, otherwise what it says is just wrong
when in batch mode.

Also, in 33.5, I suggest to not use the "currently executing
query" as a synonym for the "query currently being processed"
to avoid any confusion between what the backend is executing
and a prior query whose results are being collected by the client
at the same moment.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Vaishnavi Prabakaran wrote:

> Am going to include the test which you shared in the test patch. Please let
> me know if you want to cover anymore specific cases to gain confidence.

One bit of functionality that does not work in batch mode and is left
as is by the patch is the PQfn() fast path interface and the large object
functions that use it.

Currently, calling lo_* functions inside a batch will fail with a message
that depends on whether the internal lo_initialize() has been successfully
called before.

If it hasn't, PQerrorMessage() will be:
  "Synchronous command execution functions are not allowed in batch mode"
which is fine, but it comes by happenstance from lo_initialize()
calling PQexec() to fetch the function OIDs from pg_catalog.pg_proc.

OTOH, if lo_initialize() has succeeded before, a call to a large
object function will fail with a different message:
  "connection in wrong state"
which is emitted by PQfn() based on conn->asyncStatus != PGASYNC_IDLE

Having an unified error message would be more user friendly.

Concerning the doc, when saying in 33.5.2:
  "In batch mode only asynchronous operations are permitted"
the server-side functions are indeed ruled out, since PQfn() is
synchronous, but maybe we should be a bit more explicit
about that?

Chapter 34.3 (Large Objects / Client Interfaces) could also
mention the incompatibility with batch mode.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:
Thanks Craig and Michael for confirming that "PQsetSingleRowMode" should be called right after "PQbatchQueueProcess". 

Michael Paquier wrote:

> It seems to me that
>this should also be effective only during the fetching of one single
>result set. When the client moves on to the next item in the queue we
>should make necessary again a call to PQsetSingleRowMode().

Yes, the current behavior (with the v6 code patch) is exactly as you described above: PQsetSingleRowMode() should be called after each PQbatchQueueProcess() call to set result processing to single-row mode.

>src/test/modules/Makefile should include test_libpq.

Yes, added test_libpq to the list in Makefile. 

Daniel Verite wrote:

>> Please let me know if you think this is not enough but wanted to update
>> section 33.6 also?

>Yes, if the right place to call PQsetSingleRowMode() is immediately
>after PQbatchQueueProcess(), I think we need to update
>"33.6. Retrieving Query Results Row-By-Row"
>with that information, otherwise what it says is just wrong
>when in batch mode.

Yes, I have updated Chapter 33.6 by adding note for batch mode. 

>Also, in 33.5, I suggest to not use the "currently executing
>query" as a synonym for the "query currently being processed"
>to avoid any confusion between what the backend is executing
>and a prior query whose results are being collected by the client
>at the same moment.

Yes correct, I modified the words to "query currently being processed" as suggested. 

 
One bit of functionality that does not work in batch mode and is left
as is by the patch is the PQfn() fast path interface and the large object
functions that use it.

Currently, calling lo_* functions inside a batch will fail with a message
that depends on whether the internal lo_initialize() has been successfully
called before.

If it hasn't, PQerrorMessage() will be:
   "Synchronous command execution functions are not allowed in batch mode"
which is fine, but it comes by happenstance from lo_initialize()
calling PQexec() to fetch the function OIDs from pg_catalog.pg_proc.

OTOH, if lo_initialize() has succeeded before, a call to a large
object function will fail with a different message:
  "connection in wrong state"
which is emitted by PQfn() based on conn->asyncStatus != PGASYNC_IDLE

Having a unified error message would be more user-friendly.


Thanks for finding this, and yes, I added a new check in PQfn() so that it gives the same error message - "Synchronous command execution functions are not allowed in batch mode" - when called in batch mode.

 
Concerning the doc, when saying in 33.5.2:
 "In batch mode only asynchronous operations are permitted"
the server-side functions are indeed ruled out, since PQfn() is
synchronous, but maybe we should be a bit more explicit
about that? 

Chapter 34.3 (Large Objects / Client Interfaces) could also
mention the incompatibility with batch mode.


Updated 33.5.2 to be clearer about which functions are allowed and which are not. Updated Chapter 34.3 (Large Objects / Client Interfaces) to let the user know about the incompatibility with batch mode. 
 

Attached is the latest patch, and here is the regression test run result:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
ok 6 - testlibpqbatch copyfailure
ok 7 - testlibpqbatch test_singlerowmode
ok
All tests successful.
Files=1, Tests=7,  5 wallclock secs ( 0.01 usr  0.00 sys +  1.79 cusr  0.35 csys =  2.15 CPU)
Result: PASS

Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Tue, Mar 28, 2017 at 1:57 PM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Thanks Craig and Michael for confirming that "PQsetSingleRowMode" should be
> called right after "PQbatchQueueProcess".
>
> Michael Paquier wrote:
>
>> It seems to me that
>>this should also be effective only during the fetching of one single
>>result set. When the client moves on to the next item in the queue we
>>should make necessary again a call to PQsetSingleRowMode().
>
> Yes, the current behavior(with V6 code patch) is exactly the same as you
> described above. PQsetSingleRowMode() should be called each time after
> "PQbatchQueueProcess" to set result processing to single-row mode.

Okay, that's fine for me then.

> Attached the latest patch and here is the RT run result:
> ok 1 - testlibpqbatch disallowed_in_batch
> ok 2 - testlibpqbatch simple_batch
> ok 3 - testlibpqbatch multi_batch
> ok 4 - testlibpqbatch batch_abort
> ok 5 - testlibpqbatch timings
> ok 6 - testlibpqbatch copyfailure
> ok 7 - testlibpqbatch test_singlerowmode
> ok
> All tests successful.
> Files=1, Tests=7,  5 wallclock secs ( 0.01 usr  0.00 sys +  1.79 cusr  0.35
> csys =  2.15 CPU)
> Result: PASS

ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure
Hm... Not here. I saw something like that a couple of days ago on my
macos laptop and that was related to a variable not initialized. From
001_libpq_async_main.log:
2017-03-28 17:05:49.159 JST [31553] t/001_libpq_async.pl LOG:  execute
<unnamed>: copy batch_demo(id) to stdout;
2017-03-28 17:05:49.159 JST [31553] t/001_libpq_async.pl LOG:  execute
<unnamed>: copy batch_demo(itemno) FROM stdin;
2017-03-28 17:05:49.160 JST [31553] t/001_libpq_async.pl ERROR:
unexpected message type 0x50 during COPY from stdin
2017-03-28 17:05:49.160 JST [31553] t/001_libpq_async.pl CONTEXT:
COPY batch_demo, line 1

From regress_log_001_libpq_async :
ok 5 - testlibpqbatch timings
# Running: testlibpqbatch dbname=postgres 10000 copyfailure
dispatching SELECT query failed: cannot queue commands during COPY

COPYBUF: 5

Error status and message got from server due to COPY command failure
are : PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during
COPY from stdin
CONTEXT:  COPY batch_demo, line 1

So it seems to me that you are still missing something..
 src/test/modules/Makefile                        |    1 +
 src/test/modules/test_libpq/.gitignore           |    5 +
 src/test/modules/test_libpq/Makefile             |   25 +
 src/test/modules/test_libpq/README               |    1 +
 src/test/modules/test_libpq/t/001_libpq_async.pl |   26 +
 src/test/modules/test_libpq/testlibpqbatch.c     | 1661 ++++++++++++++++++++++
 6 files changed, 1719 insertions(+)
 
Could you also update src/tools/msvc/vcregress.pl, specifically the routine
modulescheck, so that this new test is skipped? I am sure that nobody
will scream if this test is not run on Windows, but the buildfarm will
complain if the patch is left in its current state.
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
Michael Paquier wrote:

> # Running: testlibpqbatch dbname=postgres 10000 copyfailure
> dispatching SELECT query failed: cannot queue commands during COPY
>
> COPYBUF: 5
>
> Error status and message got from server due to COPY command failure
> are : PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during
> COPY from stdin
> CONTEXT:  COPY batch_demo, line 1

Same result here.

BTW the doc says:
  "In batch mode only asynchronous operations are permitted, and COPY is
   not recommended as it most likely will trigger failure in batch
   processing"
Yet it seems that the test assumes that it should work nonetheless.
I don't quite understand what we expect of this test, given what's
documented. Or what am I missing?

While looking at the regress log, I noticed multiple spurious
test_singlerowmode tests among the others. Should be fixed with:

--- a/src/test/modules/test_libpq/testlibpqbatch.c
+++ b/src/test/modules/test_libpq/testlibpqbatch.c
@@ -1578,6 +1578,7 @@ main(int argc, char **argv)
        run_batch_abort = 0;
        run_timings = 0;
        run_copyfailure = 0;
+       run_singlerowmode = 0;
        if (strcmp(argv[3], "disallowed_in_batch") == 0)
                run_disallowed_in_batch = 1;
        else if (strcmp(argv[3], "simple_batch") == 0)

There's also the fact that this test doesn't care whether it fails
(compare with all the "goto fail" of the other tests).

And it happens that PQsetSingleRowMode() fails after the call to
PQbatchQueueProcess() that instantiates a PGresult
of status PGRES_BATCH_END to indicate the end of the batch.

To me, this shows that this way of setting single-row mode is not ideal.
In order to avoid the failure, the loop should know in advance
what kind of result it's going to dequeue before calling PQgetResult(),
which doesn't feel right.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:

Michael Paquier wrote:

>Could you as well update src/tools/msvc/
vcregress.pl, aka the routine
>modulescheck so as this new test is skipped. I am sure that nobody
>will scream if this test is not run on Windows, but the buildfarm will
>complain if the patch is let in its current state.

Thanks for the finding. Hmm, modulescheck will not pick up test_libpq, as "subdircheck" will fail for this module (because it does not have "sql"/"expected" folders in it) and hence it will be skipped. 
But I modified "Mkvcbuild.pm" to exclude the "test_libpq" module, as this test will not be run on Windows anyway, and also because it uses Linux-specific APIs (gettimeofday, timersub). 

On Wed, Mar 29, 2017 at 12:28 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
        Michael Paquier wrote:

> # Running: testlibpqbatch dbname=postgres 10000 copyfailure
> dispatching SELECT query failed: cannot queue commands during COPY
>
> COPYBUF: 5
>
> Error status and message got from server due to COPY command failure
> are : PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during
> COPY from stdin
> CONTEXT:  COPY batch_demo, line 1

Same result here.

BTW the doc says:
  "In batch mode only asynchronous operations are permitted, and COPY is
   not recommended as it most likely will trigger failure in batch
processing"
Yet it seems that the test assumes that it should work nonetheless.
I don't quite understand what we expect of this test, given what's
documented. Or what am I missing?


The copyfailure test tests the error scenarios that arise in batch mode when using the COPY command. The test expects an error (not that COPY should work) and is made to pass if the expected error message is received.
So it is error-case testing, which behaves like:
expects error -> receives error = test pass
expects error -> no error = test fail 

There are 2 cases tested here:

1. An example of the case where using the COPY command in batch mode will trigger failure (as specified in the documentation).
The failure can be seen only when processing the COPY command's results. That is, after the client dispatches a COPY command the server goes into COPY state, but the client then dispatches another query in batch mode. 

The error messages below belong to this case:
Error status and message got from server due to COPY command failure are : PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during COPY from stdin
CONTEXT:  COPY batch_demo, line 1

message type 0x5a arrived from server while idle


2. When the connection is in COPY state, that is, while processing the COPY command's results, the client cannot queue any query in batch mode. 

The error message below belongs to this case:
dispatching SELECT query failed: cannot queue commands during COPY

I added this test following Michael's review comment - "It may be a good idea to add a test for COPY and trigger a failure."
Hope this clarifies the presence of the error messages in the log file and why the test is made to pass. 

 

While looking at the regress log, I noticed multiple spurious
test_singlerowmode tests among the others. Should be fixed with:

--- a/src/test/modules/test_libpq/testlibpqbatch.c
+++ b/src/test/modules/test_libpq/testlibpqbatch.c
@@ -1578,6 +1578,7 @@ main(int argc, char **argv)
                        run_batch_abort = 0;
                        run_timings = 0;
                        run_copyfailure = 0;
+                       run_singlerowmode = 0;
                        if (strcmp(argv[3], "disallowed_in_batch") == 0)
                                run_disallowed_in_batch = 1;
                        else if (strcmp(argv[3], "simple_batch") == 0)


Thanks for finding it, and yes, corrected.

 
There's also the fact that this test doesn't care whether it fails
(compare with all the "goto fail" of the other tests).


Modified the test to check for the failure condition. 
 

And it happens that PQsetSingleRowMode() fails after the call to
PQbatchQueueProcess() that instantiates a PGresult
of status PGRES_BATCH_END to indicate the end of the batch.

To me, this shows that this way of setting single-row mode is not ideal.
In order to avoid the failure, the loop should know in advance
what kind of result it's going to dequeue before calling PQgetResult(),
which doesn't feel right.


Yes, it is one of the requirements that a batch mode application should know the order of the queries it sent and the results to expect. (Specified in the "Processing results" section of the batch mode documentation.)

Attached the latest code and test patches. 

Thanks & Regards,
Vaishnavi
Fujitsu Australia.
Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Wed, Mar 29, 2017 at 12:40 PM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Michael Paquier wrote:
>>Could you as well update src/tools/msvc/vcregress.pl, aka the routine
>>modulescheck so as this new test is skipped. I am sure that nobody
>>will scream if this test is not run on Windows, but the buildfarm will
>>complain if the patch is let in its current state.
>
> Thanks for the finding. Hmm, modulescheck will not pick up test_libpq as
> "subdircheck" will fail for this module(because it does not have
> "sql"/"expected" folders in it) and hence it will be skipped.
> But, Modified "Mkvcbuild.pm" to exclude "test_libpq" module, as this test
> anyways will not be run in Windows and also because it uses linux specific
> APIs(gettimeofday , timersub).

src/port/gettimeofday.c extends things on Windows, and we could
consider to just drop the timing portion for simplicity (except
Windows I am not seeing any platform missing timersub). But that's
just a point of detail. Or we could just drop those tests from the
final patch, and re-implement them after having some psql-level
subcommands. That would be far better for portability.

> There are 2 cases tested here:
>
> 1. Example for the case - Using COPY command in batch mode will trigger
> failure. (Specified in documentation)
> Failure can be seen only when processing the copy command's results. That
> is, after dispatching COPY command, server goes into COPY state , but the
> client dispatched another query in batch mode.
>
> Below error messages belongs to this case :
> Error status and message got from server due to COPY command failure are :
> PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during COPY from
> stdin
> CONTEXT:  COPY batch_demo, line 1
>
> message type 0x5a arrived from server while idle

Such messages are really confusing to the user as they refer to the
protocol going out of sync. I would think that something like "cannot
process COPY results during a batch processing" would be cleaner.
Isn't some more error handling necessary in GetCopyStart()?

> Attached the latest code and test patches.

For me the test is still broken:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure

Still broken here. I can see that this patch has enough
safeguards in PQbatchBegin() and PQsendQueryStart(), but this issue is
really pointing at something...
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


On Thu, Mar 30, 2017 at 12:08 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Mar 29, 2017 at 12:40 PM, Vaishnavi Prabakaran
<vaishnaviprabakaran@gmail.com> wrote:
> Michael Paquier wrote:
>>Could you as well update src/tools/msvc/vcregress.pl, aka the routine
>>modulescheck so as this new test is skipped. I am sure that nobody
>>will scream if this test is not run on Windows, but the buildfarm will
>>complain if the patch is let in its current state.
>
> Thanks for the finding. Hmm, modulescheck will not pick up test_libpq as
> "subdircheck" will fail for this module(because it does not have
> "sql"/"expected" folders in it) and hence it will be skipped.
> But, Modified "Mkvcbuild.pm" to exclude "test_libpq" module, as this test
> anyways will not be run in Windows and also because it uses linux specific
> APIs(gettimeofday , timersub).

src/port/gettimeofday.c extends things on Windows, and we could
consider to just drop the timing portion for simplicity (except
Windows I am not seeing any platform missing timersub). But that's
just a point of detail. Or we could just drop those tests from the
final patch, and re-implement them after having some psql-level
subcommands. That would be far better for portability.

Yes, agreed. Having tests with psql meta-commands will also be easier to understand. That is one of the reasons I separated the patch into two - code and test. So yes, we can drop the test patch for now. 
 

> There are 2 cases tested here:
>
> 1. Example for the case - Using COPY command in batch mode will trigger
> failure. (Specified in documentation)
> Failure can be seen only when processing the copy command's results. That
> is, after dispatching COPY command, server goes into COPY state , but the
> client dispatched another query in batch mode.
>
> Below error messages belongs to this case :
> Error status and message got from server due to COPY command failure are :
> PGRES_FATAL_ERROR ERROR:  unexpected message type 0x50 during COPY from
> stdin
> CONTEXT:  COPY batch_demo, line 1
>
> message type 0x5a arrived from server while idle

Such messages are really confusing to the user as they refer to the
protocol going out of sync. I would think that something like "cannot
process COPY results during a batch processing" would be cleaner.
Isn't some more error handling necessary in GetCopyStart()?

Hmm. In batch mode, after sending a COPY command to the server (which starts processing the query and goes into COPY state), the client does not immediately read the result but keeps sending other queries to the server. By that time the server has already encountered the error scenario (receiving a different message type while in COPY state) and sent its error messages. So when the client starts reading the result, the error messages sent by the server are already present in the socket. I think trying to handle this during result processing amounts to masking the server's error message with a new one, which I think is not appropriate. 

 
 
> Attached the latest code and test patches.

For me the test is still broken:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure

Still broken here. I can see that this patch has enough
safeguards in PQbatchBegin() and PQsendQueryStart(), but this issue is
really pointing at something...


I will check this one again with a fresh setup, and I guess this should not block the code patch. 

Thanks & Regards,
Vaishnavi
Fujitsu Australia. 

Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Vaishnavi Prabakaran wrote:

> Hmm, With batch mode, after sending COPY command to server(and server
> started processing the query and goes into COPY state) , client does not
> immediately read the result , but it keeps sending other queries to the
> server. By this time, server already encountered the error
> scenario(Receiving different message during COPY state) and sent error
> messages

IOW, the test intentionally violates the protocol and then all goes wonky
because of that.
That's why I was wondering upthread what it's supposed to test.
I mean, regression tests are meant to warn against a desirable behavior
being unknowingly changed by new code into an undesirable behavior.
Here we have the undesirable behavior to start with.
What kind of regression could we fear from that?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: PATCH: Batch/pipelining support for libpq

From
David Steele
Date:
Hi,

On 3/30/17 2:12 PM, Daniel Verite wrote:
>     Vaishnavi Prabakaran wrote:
>
>> Hmm, With batch mode, after sending COPY command to server(and server
>> started processing the query and goes into COPY state) , client does not
>> immediately read the result , but it keeps sending other queries to the
>> server. By this time, server already encountered the error
>> scenario(Receiving different message during COPY state) and sent error
>> messages
>
> IOW, the test intentionally violates the protocol and then all goes wonky
> because of that.
> That's why I was wondering upthread what's it's supposed to test.
> I mean, regression tests are meant to warn against a desirable behavior
> being unknowingly changed by new code into an undesirable behavior.
> Here we have the undesirable behavior to start with.
> What kind of regression could we fear from that?

The CF has been extended until April 7 but time is still growing short. 
Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this 
submission will be marked "Returned with Feedback".

Thanks,
-- 
-David
david@pgmasters.net



Re: PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:

On Sat, Apr 1, 2017 at 2:03 AM, David Steele <david@pgmasters.net> wrote:
Hi,

On 3/30/17 2:12 PM, Daniel Verite wrote:
        Vaishnavi Prabakaran wrote:

Hmm, With batch mode, after sending COPY command to server(and server
started processing the query and goes into COPY state) , client does not
immediately read the result , but it keeps sending other queries to the
server. By this time, server already encountered the error
scenario(Receiving different message during COPY state) and sent error
messages

IOW, the test intentionally violates the protocol and then all goes wonky
because of that.
That's why I was wondering upthread what it's supposed to test.
I mean, regression tests are meant to warn against a desirable behavior
being unknowingly changed by new code into an undesirable behavior.
Here we have the undesirable behavior to start with.
What kind of regression could we fear from that?

Yes, completely agree; demonstrating the undesirable behavior is not needed, as the documentation gives enough warning to the user. 
It has been decided that the test patch will not go in for now, but it will be re-implemented with psql commands later. During that re-implementation I will remove this test. Thanks. 
 

The CF has been extended until April 7 but time is still growing short. Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this submission will be marked "Returned with Feedback".


Thanks for the information. Attached is the latest patch, resolving one compilation warning. And please discard the test patch, as it will be re-implemented separately later. 


Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Andres Freund
Date:
On 2017-04-03 14:10:47 +1000, Vaishnavi Prabakaran wrote:
> > The CF has been extended until April 7 but time is still growing short.
> > Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this
> > submission will be marked "Returned with Feedback".
> >
> >
> Thanks for the information, attached the latest patch resolving one
> compilation warning. And, please discard the test patch as it will be
> re-implemented later separately.

Hm.  If the tests aren't ready yet, it seems we'll have to move this to
the next CF.


> + <sect1 id="libpq-batch-mode">
> +  <title>Batch mode and query pipelining</title>
> +
> +  <indexterm zone="libpq-batch-mode">
> +   <primary>libpq</primary>
> +   <secondary>batch mode</secondary>
> +  </indexterm>
> +
> +  <indexterm zone="libpq-batch-mode">
> +   <primary>libpq</primary>
> +   <secondary>pipelining</secondary>
> +  </indexterm>
> +
> +  <para>
> +   <application>libpq</application> supports queueing up multiple queries into
> +   a pipeline to be executed as a batch on the server. Batching queries allows
> +   applications to avoid a client/server round-trip after each query to get
> +   the results before issuing the next query.
> +  </para>

"queueing .. into a pipeline" sounds weird to me - but I'm not a native
speaker.  Also batching != pipelining.


> +  <sect2>
> +   <title>When to use batching</title>
> +
> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    offers considerable performance improvements.
> +   </para>

s/offers/can sometimes offer/


> +  <sect2 id="libpq-batch-using">
> +   <title>Using batch mode</title>
> +
> +   <para>
> +    To issue batches the application must switch
> +    <application>libpq</application> into batch mode.

s/libpq/a connection/?


> Enter batch mode with <link
> +    linkend="libpq-PQbatchBegin"><function>PQbatchBegin(conn)</function></link> or test
> +    whether batch mode is active with <link
> +    linkend="libpq-PQbatchStatus"><function>PQbatchStatus(conn)</function></link>. In batch mode only <link
> +    linkend="libpq-async">asynchronous operations</link> are permitted, and
> +    <literal>COPY</literal> is not recommended as it most likely will trigger failure in batch processing. 
> +    (The restriction on <literal>COPY</literal> is an implementation
> +    limit; the PostgreSQL protocol and server can support batched
> <literal>COPY</literal>).

Hm, I'm unconvinced that that's a useful parenthetical in the libpq
docs.


> +   <para>
> +    The client uses libpq's asynchronous query functions to dispatch work,
> +    marking the end of each batch with <function>PQbatchQueueSync</function>.
> +    Concurrently, it uses <function>PQgetResult</function> and
> +    <function>PQbatchQueueProcess</function> to get results.

"Concurrently" imo is a dangerous word, somewhat implying multi-threading.


> +   <note>
> +    <para>
> +     It is best to use batch mode with <application>libpq</application> in
> +     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> +     blocking mode it is possible for a client/server deadlock to occur. The
> +     client will block trying to send queries to the server, but the server will
> +     block trying to send results from queries it has already processed to the
> +     client. This only occurs when the client sends enough queries to fill its
> +     output buffer and the server's receive buffer before switching to
> +     processing input from the server, but it's hard to predict exactly when
> +     that'll happen so it's best to always use non-blocking mode.
> +    </para>
> +   </note>

Such deadlocks are possible just as well with non-blocking mode, unless
one can avoid sending queries and switching to receiving results anytime
in the application code.


> +    <para>
> +     Batched operations will be executed by the server in the order the client
> +     sends them. The server will send the results in the order the statements
> +     executed. The server may begin executing the batch before all commands
> +     in the batch are queued and the end of batch command is sent. If any
> +     statement encounters an error the server aborts the current transaction and
> +     skips processing the rest of the batch. Query processing resumes after the
> +     end of the failed batch.
> +    </para>

What if a batch contains transaction boundaries?


> +   <sect3 id="libpq-batch-results">
> +    <title>Processing results</title>
> +
> +    <para>
> +     The client <link linkend="libpq-batch-interleave">interleaves result
> +     processing with sending batch queries</link>, or for small batches may
> +     process all results after sending the whole batch.
> +    </para>

That's a very long <link> text, is it not?


> +    <para>
> +     To get the result of the first batch entry the client must call <link
> +     linkend="libpq-PQbatchQueueProcess"><function>PQbatchQueueProcess</function></link>. It must then call

What does 'QueueProcess' mean? Shouldn't it be 'ProcessQueue'? You're
not enquing a process or processing, right?


> +     <function>PQgetResult</function> and handle the results until
> +     <function>PQgetResult</function> returns null (or would return null if
> +     called).

What is that parenthetical referring to?  IIRC we don't provide any
external way to determine PQgetResult would return NULL.


Have you checked how this API works with PQsetSingleRowMode()?

> +    <para>
> +     To enter single-row mode, call <function>PQsetSingleRowMode</function> immediately after a
> +     successful call of <function>PQbatchQueueProcess</function>. This mode selection is effective 
> +     only for the query currently being processed. For more information on the use of <function>PQsetSingleRowMode
> +     </function>, refer to <xref linkend="libpq-single-row-mode">.

Hah ;).


> +    <para>
> +     The client must not assume that work is committed when it
> +     <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +     corresponding result is received to confirm the commit is complete.
> +     Because errors arrive asynchronously the application needs to be able to
> +     restart from the last <emphasis>received</emphasis> committed change and
> +     resend work done after that point if something goes wrong.
> +    </para>

That seems like a batch independent thing, right?  If so, maybe make it
a <note>?


> +     <listitem>
> +      <para>
> +      Causes a connection to enter batch mode if it is currently idle or
> +      already in batch mode.
> +
> +<synopsis>
> +int PQbatchBegin(PGconn *conn);
> +</synopsis>
> +
> +        </para>
> +        <para>
> +          Returns 1 for success. Returns 0 and has no 
> +          effect if the connection is not currently idle, i.e. it has a result 
> +          ready, is waiting for more input from the server, etc. This function 
> +          does not actually send anything to the server, it just changes the 
> +          <application>libpq</application> connection state.
> +
> +        </para>
> +     </listitem>
> +    </varlistentry>

That function name sounds a bit too much like it'd be relevant for a
single batch, not something that can send many batches. enterBatchMode?

> +    <varlistentry id="libpq-PQbatchEnd">
> +     <term>
> +      <function>PQbatchEnd</function>
> +      <indexterm>
> +       <primary>PQbatchEnd</primary>
> +      </indexterm>
> +     </term>
> +
> +     <listitem>
> +      <para>
> +      Causes a connection to exit batch mode if it is currently in batch mode
> +      with an empty queue and no pending results.
> +<synopsis>
> +int PQbatchEnd(PGconn *conn);
> +</synopsis>
> +        </para>
> +        <para>Returns 1 for success.
> +      Returns 1 and takes no action if not in batch mode. If the connection has
> +      pending batch items in the queue for reading with
> +      <function>PQbatchQueueProcess</function>, the current statement isn't finished
> +      processing or there are results pending for collection with
> +      <function>PQgetResult</function>, returns 0 and does nothing.
> +
> +      </para>
> +     </listitem>
> +    </varlistentry>

""


> +    <varlistentry id="libpq-PQbatchQueueSync">
> +     <term>
> +      <function>PQbatchQueueSync</function>
> +      <function>PQbatchQueueProcess</function>

As said above, I'm not a fan of these, because it sounds like you're
queueing a sync/process.


>  /* ----------------
> @@ -1108,7 +1113,7 @@ pqRowProcessor(PGconn *conn, const char **errmsgp)
>          conn->next_result = conn->result;
>          conn->result = res;
>          /* And mark the result ready to return */
> -        conn->asyncStatus = PGASYNC_READY;
> +        conn->asyncStatus = PGASYNC_READY_MORE;
>      }

Uhm, isn't that an API/ABI breakage issue?



>  /*
> - * Common startup code for PQsendQuery and sibling routines
> + * PQmakePipelinedCommand
> + *    Get a new command queue entry, allocating it if required. Doesn't add it to
> + *    the tail of the queue yet, use PQappendPipelinedCommand once the command has
> + *    been written for that. If a command fails once it's called this, it should
> + *    use PQrecyclePipelinedCommand to put it on the freelist or release it.

"command fails once it's called this"?


> +/*
> + * PQrecyclePipelinedCommand
> + *    Push a command queue entry onto the freelist. It must be a dangling entry
> + *    with null next pointer and not referenced by any other entry's next pointer.
> + */
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> +    if (entry == NULL)
> +        return;
> +    if (entry->next != NULL)
> +    {
> +        fprintf(stderr, "tried to recycle non-dangling command queue entry");
> +        abort();

Needs a libpq_gettext()?



> +/*
> + * PQbatchEnd
> + *     End batch mode and return to normal command mode.
> + *
> + *     Has no effect unless the client has processed all results
> + *     from all outstanding batches and the connection is idle.
> + *
> + *     Returns true if batch mode ended.
> + */
> +int
> +PQbatchEnd(PGconn *conn)
> +{
> +    if (!conn)
> +        return false;
> +
> +    if (conn->batch_status == PQBATCH_MODE_OFF)
> +        return true;
> +
> +    switch (conn->asyncStatus)
> +    {
> +        case PGASYNC_IDLE:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, IDLE in batch mode"));
> +            break;
> +        case PGASYNC_COPY_IN:
> +        case PGASYNC_COPY_OUT:
> +        case PGASYNC_COPY_BOTH:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, COPY in batch mode"));
> +            break;

Why aren't you returning false here,

> +        case PGASYNC_READY:
> +        case PGASYNC_READY_MORE:
> +        case PGASYNC_BUSY:
> +            /* can't end batch while busy */
> +            return false;

but are here?

> +        case PGASYNC_QUEUED:
> +            break;
> +    }
> +


> +int
> +PQbatchQueueSync(PGconn *conn)
> +{
> +    PGcommandQueueEntry *entry;
> +
> +    if (!conn)
> +        return false;
> +
> +    if (conn->batch_status == PQBATCH_MODE_OFF)
> +        return false;
> +
> +    switch (conn->asyncStatus)
> +    {
> +        case PGASYNC_IDLE:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, IDLE in batch mode"));
> +            break;
> +        case PGASYNC_COPY_IN:
> +        case PGASYNC_COPY_OUT:
> +        case PGASYNC_COPY_BOTH:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, COPY in batch mode"));
> +            break;
> +        case PGASYNC_READY:
> +        case PGASYNC_READY_MORE:
> +        case PGASYNC_BUSY:
> +        case PGASYNC_QUEUED:
> +            /* can send sync to end this batch of cmds */
> +            break;
> +    }

Uhm, what is that switch actually achieving? We're not returning an
error code, so ...?

> +    /* Should try to flush immediately if there's room */
> +    PQflush(conn);

"room"?

Also, don't we need to process PQflush's return value?


> +/*
> + * PQbatchQueueProcess
> + *     In batch mode, start processing the next query in the queue.
> + *
> + * Returns true if the next query was popped from the queue and can
> + * be processed by PQconsumeInput, PQgetResult, etc.
> + *
> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.
> + */
> +int
> +PQbatchQueueProcess(PGconn *conn)
> +{
> +    PGcommandQueueEntry *next_query;
> +
> +    if (!conn)
> +        return false;
> +
> +    if (conn->batch_status == PQBATCH_MODE_OFF)
> +        return false;
> +
> +    switch (conn->asyncStatus)
> +    {
> +        case PGASYNC_COPY_IN:
> +        case PGASYNC_COPY_OUT:
> +        case PGASYNC_COPY_BOTH:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, COPY in batch mode"));
> +            break;
> +        case PGASYNC_READY:
> +        case PGASYNC_READY_MORE:
> +        case PGASYNC_BUSY:
> +            /* client still has to process current query or results */
> +            return false;
> +            break;
> +        case PGASYNC_IDLE:
> +            printfPQExpBuffer(&conn->errorMessage,
> +                   libpq_gettext_noop("internal error, IDLE in batch mode"));
> +            break;
> +        case PGASYNC_QUEUED:
> +            /* next query please */
> +            break;
> +    }

Once more, I'm very unconvinced by the switch. Unless you do anything
with the errors, this seems pointless.


> +    if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> +    {
> +        /*
> +         * In an aborted batch we don't get anything from the server for each
> +         * result; we're just discarding input until we get to the next sync
> +         * from the server. The client needs to know its queries got aborted
> +         * so we create a fake PGresult to return immediately from
> +         * PQgetResult.
> +         */
> +        conn->result = PQmakeEmptyPGresult(conn,
> +                                           PGRES_BATCH_ABORTED);
> +        if (!conn->result)
> +        {
> +            printfPQExpBuffer(&conn->errorMessage,
> +                              libpq_gettext("out of memory"));
> +            pqSaveErrorResult(conn);
> +        }
> +        conn->asyncStatus = PGASYNC_READY;

So we still return true in the OOM case?



Greetings,

Andres Freund



Re: PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:

Hi, 
On Tue, Apr 4, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-04-03 14:10:47 +1000, Vaishnavi Prabakaran wrote:
> > The CF has been extended until April 7 but time is still growing short.
> > Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or this
> > submission will be marked "Returned with Feedback".
> >
> >
> Thanks for the information, attached the latest patch resolving one
> compilation warning. And, please discard the test patch as it will be
> re-implemented later separately.

Hm.  If the tests aren't ready yet, it seems we'll have to move this to
the next CF.


Thanks for your review; I will address your review comments and send a newer version of the patch shortly.
Just quickly, is it not OK to consider only the code patch for this CF, without the test patch?

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.  
 

Re: PATCH: Batch/pipelining support for libpq

From
Andres Freund
Date:
On 2017-04-04 09:24:23 +1000, Vaishnavi Prabakaran wrote:
> Hi,
> On Tue, Apr 4, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2017-04-03 14:10:47 +1000, Vaishnavi Prabakaran wrote:
> > > > The CF has been extended until April 7 but time is still growing short.
> > > > Please respond with a new patch by 2017-04-04 00:00 AoE (UTC-12) or
> > this
> > > > submission will be marked "Returned with Feedback".
> > > >
> > > >
> > > Thanks for the information, attached the latest patch resolving one
> > > compilation warning. And, please discard the test patch as it will be
> > > re-implemented later separately.
> >
> > Hm.  If the tests aren't ready yet, it seems we'll have to move this to
> > the next CF.
> >
> >
> Thanks for your review and I will address your review comments and send the
> newer version of patch shortly.

Cool.

> Just quickly, Is it not ok to consider only the code patch for this CF
> without test patch?

I'd say no, it's not acceptable.  This is too much new code for it not
to be tested.

Andres



Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Tue, Apr 4, 2017 at 8:26 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-04 09:24:23 +1000, Vaishnavi Prabakaran wrote:
>> Just quickly, Is it not ok to consider only the code patch for this CF
>> without test patch?
>
> I'd say no, it's not acceptable.  This is too much new code for it not
> to be tested.

Doesn't it depend actually? I would think that the final patch may not
include all the tests implemented if:
- The thread on which a patch has been developed had a set of tests
done and posted with it.
- Including them does not make sense if we have a way to run those
tests more efficiently. Sometimes a bunch of benchmarks or tests are
run on a patch, but for the final result keeping them around does not
make much sense.

In the case of this patch, it seems to me that we would have a far
better portable set of tests if we had a dedicated set of subcommands
available at psql level, particularly for Windows/MSVC. If that's a
requirement for this patch, so be it. I am not saying that tests
are not necessary. They are of course, but in this case having a bit
more infrastructure would be more helpful for users and the
tests themselves.

Note that I won't complain either if this set of C tests are included
at the end.
-- 
Michael



Re: PATCH: Batch/pipelining support for libpq

From
Andres Freund
Date:
On 2017-04-04 08:57:33 +0900, Michael Paquier wrote:
> On Tue, Apr 4, 2017 at 8:26 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-04-04 09:24:23 +1000, Vaishnavi Prabakaran wrote:
> >> Just quickly, Is it not ok to consider only the code patch for this CF
> >> without test patch?
> >
> > I'd say no, it's not acceptable.  This is too much new code for it not
> > to be tested.
> 
> Doesn't it depend actually?

Well, I didn't make a general statement, I made one about this patch.
And this would add a significant amount of untested code, and it'll likely
take years till it gets decent coverage in the field.


> In the case of this patch, it seems to me that we would have a far
> better portable set of tests if we had a dedicated set of subcommands
> available at psql level, particularly for Windows/MSVC.

That's a really large scope creep imo.  Adding a bunch of user-facing
psql stuff doesn't compare in complexity to running a test across
platforms.  We can just do that from regress.c or such, if that ends up
being a problem.

> If that's a  requirement for this patch so let it be. I am not saying that tests
> are not necessary. They are of course, but in this case having a bit
> more infrastructure would be more be more helpful for users and the
> tests themselves.

I'm not following.


Greetings,

Andres Freund



Re: PATCH: Batch/pipelining support for libpq

From
Vaishnavi Prabakaran
Date:


 Andres Freund <andres@anarazel.de> wrote:
> +
> +  <para>
> +   <application>libpq</application> supports queueing up multiple queries into
> +   a pipeline to be executed as a batch on the server. Batching queries allows
> +   applications to avoid a client/server round-trip after each query to get
> +   the results before issuing the next query.
> +  </para>

"queueing .. into a pipeline" sounds weird to me - but I'm not a native
speaker.  Also batching != pipelining.

Re-phrased the sentence as "libpq supports queueing queries up in batches and pipelining them to the server, where they will be executed as a batch". Pipelining queries allows applications to avoid a client/server round-trip after each query to get the results before issuing the next query.
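The round-trip saving this describes is easy to put numbers on with a back-of-envelope model (purely illustrative; it ignores server execution time and bandwidth, and the function name is mine, not anything in the patch):

```c
/* With one query per round trip, N queries cost roughly N * rtt of
 * network latency; pipelined into a single batch they share one round
 * trip, which is where order-of-magnitude speedups over high-latency
 * links come from. */
static double
elapsed_ms(int nqueries, double rtt_ms, int batched)
{
    return batched ? rtt_ms : (double) nqueries * rtt_ms;
}
```

For example, 300 queries over a 100 ms link drop from about 30 s to about 100 ms, matching the kind of improvement reported at the start of the thread.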
 

> +  <sect2>
> +   <title>When to use batching</title>
> +
> +   <para>
> +    Much like asynchronous query mode, there is no performance disadvantage to
> +    using batching and pipelining. It increases client application complexity
> +    and extra caution is required to prevent client/server deadlocks but
> +    offers considerable performance improvements.
> +   </para>

s/offers/can sometimes offer/


Corrected. 
 

> +  <sect2 id="libpq-batch-using">
> +   <title>Using batch mode</title>
> +
> +   <para>
> +    To issue batches the application must switch
> +    <application>libpq</application> into batch mode.

s/libpq/a connection/?


Corrected. 
 

> Enter batch mode with <link
> +    linkend="libpq-PQbatchBegin"><function>PQbatchBegin(conn)</function></link> or test
> +    whether batch mode is active with <link
> +    linkend="libpq-PQbatchStatus"><function>PQbatchStatus(conn)</function></link>. In batch mode only <link
> +    linkend="libpq-async">asynchronous operations</link> are permitted, and
> +    <literal>COPY</literal> is not recommended as it most likely will trigger failure in batch processing.
> +    (The restriction on <literal>COPY</literal> is an implementation
> +    limit; the PostgreSQL protocol and server can support batched
> <literal>COPY</literal>).

Hm, I'm unconvinced that that's a useful parenthetical in the libpq
docs.


Removed the parenthetical, as the line before gives enough warning. 

 

> +   <para>
> +    The client uses libpq's asynchronous query functions to dispatch work,
> +    marking the end of each batch with <function>PQbatchQueueSync</function>.
> +    Concurrently, it uses <function>PQgetResult</function> and
> +    <function>PQbatchQueueProcess</function> to get results.

"Concurrently" imo is a dangerous word, somewhat implying multi-threading.


Corrected by replacing "concurrently" with "And to get results"
 

> +   <note>
> +    <para>
> +     It is best to use batch mode with <application>libpq</application> in
> +     <link linkend="libpq-pqsetnonblocking">non-blocking mode</link>. If used in
> +     blocking mode it is possible for a client/server deadlock to occur. The
> +     client will block trying to send queries to the server, but the server will
> +     block trying to send results from queries it has already processed to the
> +     client. This only occurs when the client sends enough queries to fill its
> +     output buffer and the server's receive buffer before switching to
> +     processing input from the server, but it's hard to predict exactly when
> +     that'll happen so it's best to always use non-blocking mode.
> +    </para>
> +   </note>

Such deadlocks are possible just as well with non-blocking mode, unless
one can avoid sending queries and switching to receiving results anytime
in the application code.


True, and the documentation explains how to interleave result processing with query dispatch. 
 

> +    <para>
> +     Batched operations will be executed by the server in the order the client
> +     sends them. The server will send the results in the order the statements
> +     executed. The server may begin executing the batch before all commands
> +     in the batch are queued and the end of batch command is sent. If any
> +     statement encounters an error the server aborts the current transaction and
> +     skips processing the rest of the batch. Query processing resumes after the
> +     end of the failed batch.
> +    </para>

What if a batch contains transaction boundaries?


Regardless of transaction boundaries, a failed command aborts the entire batch, and the server will not process any further commands until the batch sync is sent. If a command inside an explicit transaction fails, the batch is aborted but the transaction is still open on the server; the client explicitly needs to close it. This is explained further down in the documentation's "Error handling" section, in the paragraph starting with "If the batch used an implicit transaction..", and the "Note" in that section gives a warning too. 
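The skip-until-sync behaviour described here can be modelled as a tiny state machine (an illustration only; the function and its encoding are mine, not the patch's code, which tracks this via conn->batch_status):

```c
#include <string.h>

/* Toy model of batch abort semantics.  'cmds' encodes a command
 * stream: 'o' = command that succeeds, 'x' = command that errors,
 * 'S' = batch sync.  Writes one status char per input command:
 * 'R' normal result, 'E' error, 'A' aborted (skipped by the server),
 * 's' sync response.  'out' must have room for strlen(cmds) + 1. */
static void
batch_outcomes(const char *cmds, char *out)
{
    int aborted = 0;

    for (; *cmds; cmds++, out++)
    {
        if (*cmds == 'S')
        {
            *out = 's';
            aborted = 0;        /* processing resumes after the sync */
        }
        else if (aborted)
            *out = 'A';         /* skipped until the next sync */
        else if (*cmds == 'x')
        {
            *out = 'E';
            aborted = 1;        /* server skips the rest of the batch */
        }
        else
            *out = 'R';
    }
    *out = '\0';
}
```

So for a batch "ooxooSo" the client would see two results, an error, two aborted results, the sync, and then normal processing again.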
 

> +   <sect3 id="libpq-batch-results">
> +    <title>Processing results</title>
> +
> +    <para>
> +     The client <link linkend="libpq-batch-interleave">interleaves result
> +     processing with sending batch queries</link>, or for small batches may
> +     process all results after sending the whole batch.
> +    </para>

That's a very long <link> text, is it not?


Corrected. 
 

> +    <para>
> +     To get the result of the first batch entry the client must call <link
> +     linkend="libpq-PQbatchQueueProcess"><function>PQbatchQueueProcess</function></link>. It must then call

What does 'QueueProcess' mean? Shouldn't it be 'ProcessQueue'? You're
not enquing a process or processing, right?


Changed to PQbatchProcessQueue. 
 

> +     <function>PQgetResult</function> and handle the results until
> +     <function>PQgetResult</function> returns null (or would return null if
> +     called).

What is that parenthetical referring to?  IIRC we don't provide any
external way to determine PQgetResult would return NULL.


True, it is just extra knowledge the application has. Removed the parenthetical.


> +    <para>
> +     The client must not assume that work is committed when it
> +     <emphasis>sends</emphasis> a <literal>COMMIT</literal>, only when the
> +     corresponding result is received to confirm the commit is complete.
> +     Because errors arrive asynchronously the application needs to be able to
> +     restart from the last <emphasis>received</emphasis> committed change and
> +     resend work done after that point if something goes wrong.
> +    </para>

That seems like a batch independent thing, right?  If so, maybe make it
a <note>?


Yes, made it as Note. 
 
> +
> +<synopsis>
> +int PQbatchBegin(PGconn *conn);
...

That function name sounds a bit too much like it'd be relevant for a
single batch, not something that can send many batches. enterBatchMode?

> +int PQbatchEnd(PGconn *conn);
...
""


Changed the function names to PQenterBatchMode and PQexitBatchMode.

 

> +    <varlistentry id="libpq-PQbatchQueueSync">
> +     <term>
> +      <function>PQbatchQueueSync</function>
> +      <function>PQbatchQueueProcess</function>

As said above, I'm not a fan of these, because it sounds like you're
queueing a sync/process.


Changed it to PQbatchSyncQueue.
 

>  /* ----------------
> @@ -1108,7 +1113,7 @@ pqRowProcessor(PGconn *conn, const char **errmsgp)
>               conn->next_result = conn->result;
>               conn->result = res;
>               /* And mark the result ready to return */
> -             conn->asyncStatus = PGASYNC_READY;
> +             conn->asyncStatus = PGASYNC_READY_MORE;
>       }

Uhm, isn't that an API/ABI breakage issue?


I think not: asyncStatus is in libpq-int.h, which is meant to be used only by frontend libpq; applications should use libpq-fe.h. Also, we don't have any API that returns the connection's async status to the application. So I think it will not break any existing applications. 
 



>  /*
> - * Common startup code for PQsendQuery and sibling routines
> + * PQmakePipelinedCommand
> + *   Get a new command queue entry, allocating it if required. Doesn't add it to
> + *   the tail of the queue yet, use PQappendPipelinedCommand once the command has
> + *   been written for that. If a command fails once it's called this, it should
> + *   use PQrecyclePipelinedCommand to put it on the freelist or release it.

"command fails once it's called this"?


Corrected to "Command sending fails".
 

> +/*
> + * PQrecyclePipelinedCommand
> + *   Push a command queue entry onto the freelist. It must be a dangling entry
> + *   with null next pointer and not referenced by any other entry's next pointer.
> + */
> +static void
> +PQrecyclePipelinedCommand(PGconn *conn, PGcommandQueueEntry * entry)
> +{
> +     if (entry == NULL)
> +             return;
> +     if (entry->next != NULL)
> +     {
> +             fprintf(stderr, "tried to recycle non-dangling command queue entry");
> +             abort();

Needs a libpq_gettext()?



Corrected.

 

> +/*
> + * PQbatchEnd
> + *   End batch mode and return to normal command mode.
> + *
> + *   Has no effect unless the client has processed all results
> + *   from all outstanding batches and the connection is idle.
> + *
> + *   Returns true if batch mode ended.
> + */
> +int
> +PQbatchEnd(PGconn *conn)
> +{
> +     if (!conn)
> +             return false;
> +
> +     if (conn->batch_status == PQBATCH_MODE_OFF)
> +             return true;
> +
> +     switch (conn->asyncStatus)
> +     {
> +             case PGASYNC_IDLE:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, IDLE in batch mode"));
> +                     break;
> +             case PGASYNC_COPY_IN:
> +             case PGASYNC_COPY_OUT:
> +             case PGASYNC_COPY_BOTH:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, COPY in batch mode"));
> +                     break;

Why aren't you returning false here,


Using a COPY command in batch mode is an error case, so instead of giving the application an opportunity to finish the error scenario, we set the error message and allow it to quit batch mode. 



> +             case PGASYNC_READY:
> +             case PGASYNC_READY_MORE:
> +             case PGASYNC_BUSY:
> +                     /* can't end batch while busy */
> +                     return false;

but are here?

The application gets a chance to finish processing results and then exit batch mode. 

 
> +int
> +PQbatchQueueSync(PGconn *conn)
> +{
> +     PGcommandQueueEntry *entry;
> +
> +     if (!conn)
> +             return false;
> +
> +     if (conn->batch_status == PQBATCH_MODE_OFF)
> +             return false;
> +
> +     switch (conn->asyncStatus)
> +     {
> +             case PGASYNC_IDLE:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, IDLE in batch mode"));
> +                     break;
> +             case PGASYNC_COPY_IN:
> +             case PGASYNC_COPY_OUT:
> +             case PGASYNC_COPY_BOTH:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, COPY in batch mode"));
> +                     break;
> +             case PGASYNC_READY:
> +             case PGASYNC_READY_MORE:
> +             case PGASYNC_BUSY:
> +             case PGASYNC_QUEUED:
> +                     /* can send sync to end this batch of cmds */
> +                     break;
> +     }

Uhm, what is that switch actually achieving? We're not returning an
error code, so ...?

These errors are recorded here so that the application, when later reading the results of the batch queries, will have more information about what went wrong. This is additional information for the user only; it does not stop the error scenario. 

 


> +     /* Should try to flush immediately if there's room */
> +     PQflush(conn);

"room"?

Also, don't we need to process PQflush's return value?


Corrected both.

 

> +/*
> + * PQbatchQueueProcess
> + *    In batch mode, start processing the next query in the queue.
> + *
> + * Returns true if the next query was popped from the queue and can
> + * be processed by PQconsumeInput, PQgetResult, etc.
> + *
> + * Returns false if the current query isn't done yet, the connection
> + * is not in a batch, or there are no more queries to process.
> + */
> +int
> +PQbatchQueueProcess(PGconn *conn)
> +{
> +     PGcommandQueueEntry *next_query;
> +
> +     if (!conn)
> +             return false;
> +
> +     if (conn->batch_status == PQBATCH_MODE_OFF)
> +             return false;
> +
> +     switch (conn->asyncStatus)
> +     {
> +             case PGASYNC_COPY_IN:
> +             case PGASYNC_COPY_OUT:
> +             case PGASYNC_COPY_BOTH:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, COPY in batch mode"));
> +                     break;
> +             case PGASYNC_READY:
> +             case PGASYNC_READY_MORE:
> +             case PGASYNC_BUSY:
> +                     /* client still has to process current query or results */
> +                     return false;
> +                     break;
> +             case PGASYNC_IDLE:
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                libpq_gettext_noop("internal error, IDLE in batch mode"));
> +                     break;
> +             case PGASYNC_QUEUED:
> +                     /* next query please */
> +                     break;
> +     }

Once more, I'm very unconvinced by the switch. Unless you do anything
with the errors, this seems pointless.

Same answer as for the switch question above.
 




> +     if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> +     {
> +             /*
> +              * In an aborted batch we don't get anything from the server for each
> +              * result; we're just discarding input until we get to the next sync
> +              * from the server. The client needs to know its queries got aborted
> +              * so we create a fake PGresult to return immediately from
> +              * PQgetResult.
> +              */
> +             conn->result = PQmakeEmptyPGresult(conn,
> +                                                                                PGRES_BATCH_ABORTED);
> +             if (!conn->result)
> +             {
> +                     printfPQExpBuffer(&conn->errorMessage,
> +                                                       libpq_gettext("out of memory"));
> +                     pqSaveErrorResult(conn);
> +             }
> +             conn->asyncStatus = PGASYNC_READY;

So we still return true in the OOM case?


Corrected to return false, as further calls to this function would also fail. The application can also check the failure reason from PQgetResult. 

Regarding the test patch, I have corrected the test suite after David Steele's comments.  
Also, I would like to mention that a companion patch was submitted by David Steele up-thread. 

Attached the latest code and test patch. 


Thanks & Regards,
Vaishnavi,
Fujitsu Australia. 

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Andres Freund
Date:
Hi,

On 2017-04-05 17:00:42 +1000, Vaishnavi Prabakaran wrote:
> Regarding test patch, I have corrected the test suite after David Steele's
> comments.
> Also, I would like to mention that a companion patch was submitted by David
> Steele up-thread.
> 
> Attached the latest code and test patch.

My impression is that this'll need a couple more rounds of review. Given
that this'll establish API we're pretty much never going to be able to
change/remove, I think it'd be a bad idea to rush this into v10.
Therefore I propose moving this to the next CF.

- Andres



Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
Hi,

It seems like this patch is nearly finished. I fixed all the remaining
issues. I'm also asking for confirmation of the test scenarios you want
to see in the next version of the patch.

> Hi,
>
> On 2020-07-10 19:01:49 -0400, Alvaro Herrera wrote:
> > Totally unasked for, here's a rebase of this patch series.  I didn't do
> > anything other than rebasing to current master, solving a couple of very
> > trivial conflicts, fixing some whitespace complaints by git apply, and
> > running tests to verify everything works.
> >
> > I don't foresee working on this at all, so if anyone is interested in
> > seeing this feature in, I encourage them to read and address
> > Horiguchi-san's feedback.
>
> Nor am I planning to do so, but I do think its a pretty important
> improvement.
>
>
Fixed

>
>
> > +/*
> > + * PQrecyclePipelinedCommand
> > + * Push a command queue entry onto the freelist. It must be a dangling entry
> > + * with null next pointer and not referenced by any other entry's next pointer.
> > + */
>
> Dangling sounds a bit like it's already freed.
>
>
Fixed

>
>
> > +/*
> > + * PQbatchSendQueue
> > + * End a batch submission by sending a protocol sync. The connection will
> > + * remain in batch mode and unavailable for new synchronous command execution
> > + * functions until all results from the batch are processed by the client.
>
> I feel like the reference to the protocol sync is a bit too low level
> for an external API. It should first document what the function does
> from a user's POV.
>
> I think it'd also be good to document whether / whether not queries can
> already have been sent before PQbatchSendQueue is called or not.
>
Fixed

>
>
>
> > + if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> > + {
> > + /*
> > + * In an aborted batch we don't get anything from the server for each
> > + * result; we're just discarding input until we get to the next sync
> > + * from the server. The client needs to know its queries got aborted
> > + * so we create a fake PGresult to return immediately from
> > + * PQgetResult.
> > + */
> > + conn->result = PQmakeEmptyPGresult(conn,
> > +   PGRES_BATCH_ABORTED);
> > + if (!conn->result)
> > + {
> > + printfPQExpBuffer(&conn->errorMessage,
> > +  libpq_gettext("out of memory"));
> > + pqSaveErrorResult(conn);
> > + return 0;
>
> Is there any way an application can recover at this point? ISTM we'd be
> stuck in the previous asyncStatus, no?
>

conn->result is null when malloc(sizeof(PGresult)) returns null. That's
very unlikely unless the client machine is out of memory, in which case
the application is probably in trouble anyway.

I'm leaving this as it is, but if anyone has a solution that is simple to
implement I'll fix it.

>
>
> > +/* pqBatchFlush
> > + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> > + * In non-batch mode, data will be flushed all the time.
> > + */
> > +static int
> > +pqBatchFlush(PGconn *conn)
> > +{
> > + if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_THRESHOLD))
> > + return(pqFlush(conn));
> > + return 0; /* Just to keep compiler quiet */
> > +}
>
> unnecessarily long line.
>
Fixed

>
> > +/*
> > + * Connection's outbuffer threshold is set to 64k as it is safe
> > + * in Windows as per comments in pqSendSome() API.
> > + */
> > +#define OUTBUFFER_THRESHOLD 65536
>
> I don't think the comment explains much. It's fine to send more than 64k
> with pqSendSome(), they'll just be sent with separate pgsecure_write()
> invocations. And only on windows.
>
> It clearly makes sense to start sending out data at a certain
> granularity to avoid needing unnecessary amounts of memory, and to make
> more efficient use of latency / server side compute.
>
> It's not implausible that 64k is the right amount for that, I just don't
> think the explanation above is good.
>

Fixed
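For illustration, the threshold rule being discussed reduces to a standalone predicate (a sketch with my own names; in the patch the logic lives inside pqBatchFlush and uses conn->batch_status and conn->outCount):

```c
#include <stdbool.h>

#define OUTBUFFER_THRESHOLD 65536   /* flush granularity in batch mode */

/* Outside batch mode every command is flushed immediately, preserving
 * the old libpq behaviour.  In batch mode data accumulates in the out
 * buffer and is only flushed once the threshold is reached, so many
 * small queued commands can share one network write. */
static bool
should_flush(bool batch_mode, int out_count)
{
    return !batch_mode || out_count >= OUTBUFFER_THRESHOLD;
}
```

The 64 kB value itself is a tuning choice balancing memory use against write frequency, per the review comment above.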

> > diff --git a/src/test/modules/test_libpq/testlibpqbatch.c b/src/test/modules/test_libpq/testlibpqbatch.c
> > new file mode 100644
> > index 0000000000..4d6ba266e5
> > --- /dev/null
> > +++ b/src/test/modules/test_libpq/testlibpqbatch.c
> > @@ -0,0 +1,1456 @@
> > +/*
> > + * src/test/modules/test_libpq/testlibpqbatch.c
> > + *
> > + *
> > + * testlibpqbatch.c
> > + * Test of batch execution functionality
> > + */
> > +
> > +#ifdef WIN32
> > +#include <windows.h>
> > +#endif
>
> ISTM that this shouldn't be needed in a test program like this?
> Shouldn't libpq abstract all of this away?
>

Fixed.

>
> > +static void
> > +simple_batch(PGconn *conn)
> > +{
>
> ISTM that all or at least several of these should include tests of
> transactional behaviour with pipelining (e.g. using both implicit and
> explicit transactions inside a single batch, using transactions across
> batches, multiple explicit transactions inside a batch).
>

@Andres, just to make sure I understood, here are the test scenarios I'll add:

Implicit and explicit multiple transactions:
   start batch:
     create_table
     insert X
     begin transaction
       insert X
     commit transaction
     begin transaction
       insert Y
     commit transaction
  end batch

Transaction across batches:
   start batch:
     create_table
     begin transaction
       insert X
   end batch
   start batch:
       insert Y
    commit transaction
  end batch

>
>
> > + PGresult   *res = NULL;
> > + const char *dummy_params[1] = {"1"};
> > + Oid dummy_param_oids[1] = {INT4OID};
> > +
> > + fprintf(stderr, "simple batch... ");
> > + fflush(stderr);
>
> Why do we need fflush()? IMO that shouldn't be needed in a use like
> this? Especially not on stderr, which ought to be unbuffered?
>
Removed.

>
> > + /*
> > + * Enter batch mode and dispatch a set of operations, which we'll then
> > + * process the results of as they come in.
> > + *
> > + * For a simple case we should be able to do this without interim
> > + * processing of results since our out buffer will give us enough slush to
> > + * work with and we won't block on sending. So blocking mode is fine.
> > + */
> > + if (PQisnonblocking(conn))
> > + {
> > + fprintf(stderr, "Expected blocking connection mode\n");
> > + goto fail;
> > + }
>
> Perhaps worth adding a helper for this?
>
> #define EXPECT(condition, explanation) \
>

Fixed (and saved 600 lines :).
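For reference, a minimal version of such a helper might look like this (a sketch under my own assumptions about the test program's cleanup convention; the actual macro in the patch may differ):

```c
#include <stdio.h>

/* Bail out of the current test function with a diagnostic when a
 * precondition does not hold; assumes a local "fail:" label that
 * performs cleanup, as in the testlibpqbatch.c test functions. */
#define EXPECT(condition, explanation) \
    do { \
        if (!(condition)) \
        { \
            fprintf(stderr, "%s:%d: check failed: %s\n", \
                    __FILE__, __LINE__, (explanation)); \
            goto fail; \
        } \
    } while (0)

/* Example use, mirroring the blocking-mode check it replaces. */
static int
check_blocking_mode(int is_nonblocking)
{
    EXPECT(!is_nonblocking, "Expected blocking connection mode");
    return 0;                   /* success */
fail:
    return 1;                   /* PQfinish()/cleanup would go here */
}
```

Each fprintf-plus-goto pair in the test functions collapses into one line, which is where the ~600-line saving comes from.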



Once I get your confirmation of the test scenarios, I'll implement
them and share another patch.

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2020-Aug-31, Matthieu Garrigues wrote:

> It seems like this patch is nearly finished. I fixed all the remaining
> issues. I'm also asking for confirmation of the test scenarios you want
> to see in the next version of the patch.

Hmm, apparently nobody noticed that this patch is not registered in the
current commitfest.

Since it was submitted ahead of the deadline, I'm going to add it there.
(The patch has also undergone a lot of review already; it's certainly
not a newcomer.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Dave Cramer
Date:

Alvaro,


On Fri, 4 Sep 2020 at 17:26, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2020-Aug-31, Matthieu Garrigues wrote:

> It seems like this patch is nearly finished. I fixed all the remaining
> issues. I'm also asking for confirmation of the test scenarios you want
> to see in the next version of the patch.

Hmm, apparently nobody noticed that this patch is not registered in the
current commitfest.

Since it was submitted ahead of the deadline, I'm going to add it there.
(The patch has also undergone a lot of review already; it's certainly
not a newcomer.)

I am looking for this in the commitfest and can't find it. However there is an old commitfest entry


Do you have the link for the new one ?

Dave Cramer
www.postgres.rocks
 


Re: PATCH: Batch/pipelining support for libpq

From
Dave Cramer
Date:

Dave Cramer
www.postgres.rocks


On Mon, 31 Aug 2020 at 11:46, Matthieu Garrigues <matthieu.garrigues@gmail.com> wrote:
Hi,

It seems like this patch is nearly finished. I fixed all the remaining
issues. I'm also asking for confirmation of the test scenarios you want
to see in the next version of the patch.

> Hi,
>
> On 2020-07-10 19:01:49 -0400, Alvaro Herrera wrote:
> > Totally unasked for, here's a rebase of this patch series.  I didn't do
> > anything other than rebasing to current master, solving a couple of very
> > trivial conflicts, fixing some whitespace complaints by git apply, and
> > running tests to verify everything works.
> >
> > I don't foresee working on this at all, so if anyone is interested in
> > seeing this feature in, I encourage them to read and address
> > Horiguchi-san's feedback.
>
> Nor am I planning to do so, but I do think its a pretty important
> improvement.
>
>
Fixed

>
>
> > +/*
> > + * PQrecyclePipelinedCommand
> > + * Push a command queue entry onto the freelist. It must be a dangling entry
> > + * with null next pointer and not referenced by any other entry's next pointer.
> > + */
>
> Dangling sounds a bit like it's already freed.
>
>
Fixed

>
>
> > +/*
> > + * PQbatchSendQueue
> > + * End a batch submission by sending a protocol sync. The connection will
> > + * remain in batch mode and unavailable for new synchronous command execution
> > + * functions until all results from the batch are processed by the client.
>
> I feel like the reference to the protocol sync is a bit too low level
> for an external API. It should first document what the function does
> from a user's POV.
>
> I think it'd also be good to document whether / whether not queries can
> already have been sent before PQbatchSendQueue is called or not.
>
Fixed

>
>
>
> > + if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> > + {
> > + /*
> > + * In an aborted batch we don't get anything from the server for each
> > + * result; we're just discarding input until we get to the next sync
> > + * from the server. The client needs to know its queries got aborted
> > + * so we create a fake PGresult to return immediately from
> > + * PQgetResult.
> > + */
> > + conn->result = PQmakeEmptyPGresult(conn,
> > +   PGRES_BATCH_ABORTED);
> > + if (!conn->result)
> > + {
> > + printfPQExpBuffer(&conn->errorMessage,
> > +  libpq_gettext("out of memory"));
> > + pqSaveErrorResult(conn);
> > + return 0;
>
> Is there any way an application can recover at this point? ISTM we'd be
> stuck in the previous asyncStatus, no?
>

conn->result is null only when malloc(sizeof(PGresult)) returns null.
That is very unlikely unless the machine is out of memory, in which
case the application probably cannot recover cleanly anyway.

I'm leaving this as it is, but if anyone has a solution that is simple
to implement I'll fix it.

>
>
> > +/* pqBatchFlush
> > + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> > + * In non-batch mode, data will be flushed all the time.
> > + */
> > +static int
> > +pqBatchFlush(PGconn *conn)
> > +{
> > + if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_THRESHOLD))
> > + return(pqFlush(conn));
> > + return 0; /* Just to keep compiler quiet */
> > +}
>
> unnecessarily long line.
>
Fixed

>
> > +/*
> > + * Connection's outbuffer threshold is set to 64k as it is safe
> > + * in Windows as per comments in pqSendSome() API.
> > + */
> > +#define OUTBUFFER_THRESHOLD 65536
>
> I don't think the comment explains much. It's fine to send more than 64k
> with pqSendSome(), they'll just be send with separate pgsecure_write()
> invocations. And only on windows.
>
> It clearly makes sense to start sending out data at a certain
> granularity to avoid needing unnecessary amounts of memory, and to make
> more efficient use of latency / server-side compute.
>
> It's not implausible that 64k is the right amount for that, I just don't
> think the explanation above is good.
>

Fixed

> > diff --git a/src/test/modules/test_libpq/testlibpqbatch.c b/src/test/modules/test_libpq/testlibpqbatch.c
> > new file mode 100644
> > index 0000000000..4d6ba266e5
> > --- /dev/null
> > +++ b/src/test/modules/test_libpq/testlibpqbatch.c
> > @@ -0,0 +1,1456 @@
> > +/*
> > + * src/test/modules/test_libpq/testlibpqbatch.c
> > + *
> > + *
> > + * testlibpqbatch.c
> > + * Test of batch execution functionality
> > + */
> > +
> > +#ifdef WIN32
> > +#include <windows.h>
> > +#endif
>
> ISTM that this shouldn't be needed in a test program like this?
> Shouldn't libpq abstract all of this away?
>

Fixed.

>
> > +static void
> > +simple_batch(PGconn *conn)
> > +{
>
> ISTM that all or at least several of these should include tests of
> transactional behaviour with pipelining (e.g. using both implicit and
> explicit transactions inside a single batch, using transactions across
> batches, multiple explicit transactions inside a batch).
>

@Andres, just to make sure I understood, here are the test scenarios I'll add:

Implicit and explicit multiple transactions:
   start batch:
     create_table
     insert X
     begin transaction
       insert X
     commit transaction
     begin transaction
       insert Y
     commit transaction
   end batch

Transaction across batches:
   start batch:
     create_table
     begin transaction
       insert X
   end batch
   start batch:
       insert Y
     commit transaction
   end batch

>
>
> > + PGresult   *res = NULL;
> > + const char *dummy_params[1] = {"1"};
> > + Oid dummy_param_oids[1] = {INT4OID};
> > +
> > + fprintf(stderr, "simple batch... ");
> > + fflush(stderr);
>
> Why do we need fflush()? IMO that shouldn't be needed in a use like
> this? Especially not on stderr, which ought to be unbuffered?
>
Removed.

>
> > + /*
> > + * Enter batch mode and dispatch a set of operations, which we'll then
> > + * process the results of as they come in.
> > + *
> > + * For a simple case we should be able to do this without interim
> > + * processing of results since our out buffer will give us enough slush to
> > + * work with and we won't block on sending. So blocking mode is fine.
> > + */
> > + if (PQisnonblocking(conn))
> > + {
> > + fprintf(stderr, "Expected blocking connection mode\n");
> > + goto fail;
> > + }
>
> Perhaps worth adding a helper for this?
>
> #define EXPECT(condition, explanation) \
>

Fixed (and saved 600 lines :).



Once I get your confirmation of the test scenarios, I'll implement
them and share another patch.


There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San. 

From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.

The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.

Dave 

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2020-Sep-21, Dave Cramer wrote:

Hello Dave,

> I am looking for this in the commitfest and can't find it. However there is
> an old commitfest entry
> 
> https://commitfest.postgresql.org/13/1024/
> 
> Do you have the link for the new one ?

Here you go:

https://commitfest.postgresql.org/29/2724/

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Batch/pipelining support for libpq

From
Dave Cramer
Date:


On Mon, 21 Sep 2020 at 09:21, Matthieu Garrigues <matthieu.garrigues@gmail.com> wrote:
Matthieu Garrigues

On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <davecramer@postgres.rocks> wrote:
>>
> There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>
> From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>
> The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>

Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
thing: I'll keep PQgetResult returning null between the results of two
batched queries, so the user can know which result comes from which
query.

Fair enough.

There may be other things in his comments that need to be addressed. That was the big one that stuck out for me.

Thanks for working on this!


Dave Cramer
www.postgres.rocks 

Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
Matthieu Garrigues

On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <davecramer@postgres.rocks> wrote:
>>
> There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>
> From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>
> The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>

Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
thing: I'll keep PQgetResult returning null between the results of two
batched queries, so the user can know which result comes from which
query.



Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
On Mon, Sep 21, 2020 at 3:39 PM Dave Cramer <davecramer@postgres.rocks> wrote:
>
>
>
> On Mon, 21 Sep 2020 at 09:21, Matthieu Garrigues <matthieu.garrigues@gmail.com> wrote:
>>
>> Matthieu Garrigues
>>
>> On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <davecramer@postgres.rocks> wrote:
>> >>
>> > There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>> >
>> > From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>> >
>> > The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>> >
>>
>> Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
>> thing: I'll keep PQgetResult returning null between the results of two
>> batched queries, so the user can know which result comes from which
>> query.
>
>
> Fair enough.
>
> There may be other things in his comments that need to be addressed. That was the big one that stuck out for me.
>
> Thanks for working on this!
>

Yes I already addressed the other things in the v19 patch:
https://www.postgresql.org/message-id/flat/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@mail.gmail.com



Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
Hi Dave,
I merged PQbatchProcessQueue into PQgetResult.
An initial call to PQbatchProcessQueue was also required in PQsendQueue
so that PQgetResult is ready to read the first batched query.

Tests and documentation are updated accordingly.

Matthieu Garrigues

On Mon, Sep 21, 2020 at 3:39 PM Dave Cramer <davecramer@postgres.rocks> wrote:
>
>
>
> On Mon, 21 Sep 2020 at 09:21, Matthieu Garrigues <matthieu.garrigues@gmail.com> wrote:
>>
>> Matthieu Garrigues
>>
>> On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <davecramer@postgres.rocks> wrote:
>> >>
>> > There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>> >
>> > From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>> >
>> > The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>> >
>>
>> Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
>> thing: I'll keep PQgetResult returning null between the results of two
>> batched queries, so the user can know which result comes from which
>> query.
>
>
> Fair enough.
>
> There may be other things in his comments that need to be addressed. That was the big one that stuck out for me.
>
> Thanks for working on this!
>
>
> Dave Cramer
> www.postgres.rocks

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Michael Paquier
Date:
On Mon, Sep 21, 2020 at 07:55:03PM +0200, Matthieu Garrigues wrote:
> I merged PQbatchProcessQueue into PQgetResult.
> One first init call to PQbatchProcessQueue was also required in
> PQsendQueue to have
> PQgetResult ready to read the first batch query.
>
> Tests and documentation are updated accordingly.

The documentation is failing to build, and the patch does not build
correctly on Windows.  Could you address that?
--
Michael

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
On Thu, Oct 1, 2020 at 6:35 AM Michael Paquier <michael@paquier.xyz> wrote:

> The documentation is failing to build, and the patch does not build
> correctly on Windows.  Could you address that?
> --
> Michael

Yes I'm on it.

-- 
Matthieu



Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
This patch fixes compilation on Windows and the documentation build.

Matthieu Garrigues

On Thu, Oct 1, 2020 at 8:41 AM Matthieu Garrigues
<matthieu.garrigues@gmail.com> wrote:
>
> On Thu, Oct 1, 2020 at 6:35 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> > The documentation is failing to build, and the patch does not build
> > correctly on Windows.  Could you address that?
> > --
> > Michael
>
> Yes I'm on it.
>
> --
> Matthieu

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
I started reading this patch.  As a starting point I decided to post an
updated copy (v22), wherein I reverted the overwriting of
src/include/port/linux.h with the win32.h contents (?) and the inclusion
of <windows.h> in libpq-fe.h.  If those are needed, some different
solution will have to be found.

Trivial other changes (pgindent etc); nothing of substance.  With the
PQtrace() patch I posted at [1] the added test program crashes -- I
don't know if the fault lies in this patch or the other one.

[1] https://postgr.es/m/20201026162313.GA22502@alvherre.pgsql

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
In v23 I've gone over docs; discovered that PQgetResults docs were
missing the new values.  Added those.  No significant other changes yet.




Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2020-Nov-02, Alvaro Herrera wrote:

> In v23 I've gone over docs; discovered that PQgetResults docs were
> missing the new values.  Added those.  No significant other changes yet.



Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Dave Cramer
Date:
Alvaro,


On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2020-Nov-02, Alvaro Herrera wrote:

> In v23 I've gone over docs; discovered that PQgetResults docs were
> missing the new values.  Added those.  No significant other changes yet.



Thanks for looking at this.

What else does it need to get it in shape to apply?


Dave Cramer
www.postgres.rocks 

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
Hi Dave,

On 2020-Nov-03, Dave Cramer wrote:

> On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
> > On 2020-Nov-02, Alvaro Herrera wrote:
> >
> > > In v23 I've gone over docs; discovered that PQgetResults docs were
> > > missing the new values.  Added those.  No significant other changes yet.
>
> Thanks for looking at this.
> 
> What else does it need to get it in shape to apply?

I want to go over the code in depth to grok the design more fully.

It would definitely help if you (and others) could think about the API
being added: Does it fulfill the promises being made?  Does it offer the
guarantees that real-world apps want to have?  I'm not much of an
application writer myself -- particularly high-traffic apps that would
want to use this.  As a driver author I would welcome your insight in
these questions.




Re: PATCH: Batch/pipelining support for libpq

From
Dave Cramer
Date:


On Tue, 3 Nov 2020 at 08:42, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hi Dave,

On 2020-Nov-03, Dave Cramer wrote:

> On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > On 2020-Nov-02, Alvaro Herrera wrote:
> >
> > > In v23 I've gone over docs; discovered that PQgetResults docs were
> > > missing the new values.  Added those.  No significant other changes yet.
>
> Thanks for looking at this.
>
> What else does it need to get it in shape to apply?

I want to go over the code in depth to grok the design more fully.

It would definitely help if you (and others) could think about the API
being added: Does it fulfill the promises being made?  Does it offer the
guarantees that real-world apps want to have?  I'm not much of an
application writer myself -- particularly high-traffic apps that would
want to use this.  As a driver author I would welcome your insight in
these questions.


I'm sort of in the same boat as you. While I'm closer to the client, I don't personally write that much client code.

I'd really like to hear from the users here.


Dave Cramer
www.postgres.rocks 

Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
I implemented a C++ async HTTP server using this new batch mode and it
provides everything I needed to transparently batch SQL requests.
It gives a performance boost of between 2x and 3x on this benchmark:

https://www.techempower.com/benchmarks/#section=test&runid=3097dbae-5228-454c-ba2e-2055d3982790&hw=ph&test=query&a=2&f=zik0zj-zik0zj-zik0zj-zik0zj-zieepr-zik0zj-zik0zj-zik0zj-zik0zj-zik0zj-zik0zj

I'll ask other users interested in this to review the API.

Matthieu Garrigues

On Tue, Nov 3, 2020 at 4:56 PM Dave Cramer <davecramer@postgres.rocks> wrote:
>
>
>
> On Tue, 3 Nov 2020 at 08:42, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> Hi Dave,
>>
>> On 2020-Nov-03, Dave Cramer wrote:
>>
>> > On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> >
>> > > On 2020-Nov-02, Alvaro Herrera wrote:
>> > >
>> > > > In v23 I've gone over docs; discovered that PQgetResults docs were
>> > > > missing the new values.  Added those.  No significant other changes yet.
>> >
>> > Thanks for looking at this.
>> >
>> > What else does it need to get it in shape to apply?
>>
>> I want to go over the code in depth to grok the design more fully.
>>
>> It would definitely help if you (and others) could think about the API
>> being added: Does it fulfill the promises being made?  Does it offer the
>> guarantees that real-world apps want to have?  I'm not much of an
>> application writer myself -- particularly high-traffic apps that would
>> want to use this.  As a driver author I would welcome your insight in
>> these questions.
>>
>
> I'm sort of in the same boat as you. While I'm closer to the client, I don't personally write that much client code.
>
> I'd really like to hear from the users here.
>
>
> Dave Cramer
> www.postgres.rocks



Re: PATCH: Batch/pipelining support for libpq

From
"David G. Johnston"
Date:
On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2020-Nov-02, Alvaro Herrera wrote:

> In v23 I've gone over docs; discovered that PQgetResults docs were
> missing the new values.  Added those.  No significant other changes yet.


Just reading the documentation of this patch, haven't been following the longer thread:

Given the caveats around blocking mode connections why not just require non-blocking mode, in a similar fashion to how synchronous functions are disallowed?

"Batched operations will be executed by the server in the order the client
sends them. The server will send the results in the order the statements
executed."

Maybe:

"The server executes statements, and returns results, in the order the client sends them."

Using two sentences and relying on the user to mentally link the two "in the order" descriptions together seems to add unnecessary cognitive load.

+     The client <link linkend="libpq-batch-interleave">interleaves result
+     processing</link> with sending batch queries, or for small batches may
+     process all results after sending the whole batch.

Suggest: "The client may choose to interleave result processing with sending batch queries, or wait until the complete batch has been sent."

I would expect to process the results of a batch only after sending the entire batch to the server.  That I don't have to is informative, but knowing when I should avoid doing so, and why, is informative as well.  To the extreme, while you can use batch mode and interleave, if you just poll getResult after every command you will make the whole batch thing pointless.  Directing the reader from here to the section "Interleaving Result Processing and Query Dispatch" seems worth considering.  The dynamics of small sizes and sockets remain a bit unclear as to what will break (if anything, or is it just process memory on the server) if interleaving is not performed and sizes are large.

I would suggest placing commentary about "all transactions subsequent to a failed transaction in a batch are ignored while previous completed transactions are retained" in the "When to Use Batching".  Something like "Batching is less useful, and more complex, when a single batch contains multiple transactions (see Error Handling)."

My imagined use case would be to open a batch, start a transaction, send all of its components, end the transaction, end the batch, check for batch failure and if it doesn't fail have the option to easily continue without processing individual pgResults (or if it does fail, have the option to extract the first error pgResult and continue, ignoring the rest, knowing that the transaction as a whole was reverted and the batch unapplied).  I've never interfaced with libpq directly.  Though given how the existing C API works what is implemented here seems consistent.

The "queueing up queries into a pipeline to be executed as a batch on the server" can be read as a client-side behavior where nothing is sent to the server until the batch has been completed.  Reading further it becomes clear that all it basically is is a server-side toggle that instructs the server to continue processing incoming commands even while prior commands have their results waiting to be ingested by the client.

Batch seems like the user-visible term to describe this feature.  Pipeline seems like an implementation detail that doesn't need to be mentioned in the documentation - especially given that pipeline doesn't get a mention beyond the first two paragraphs of the chapter and never without being linked directly to "batch".  I would probably leave the indexterm and have a paragraph describing that batching is implemented using a query pipeline so that people with the implementation detail on their mind can find this chapter, but the prose for the user should just stick to batching.

Sorry, that all is a bit unfocused, but the documentation for the user of the API could be cleaned up a bit and some more words spent on what trade-offs are being made when using batching versus normal command-response processing.  That said, while I don't see all of this purely a matter of style I'm also not seeing anything demonstrably wrong with the documentation at the moment.  Hopefully my perspective helps though, and depending on what happens next I may try and make my thoughts more concrete with an actual patch.

David J.

Re: PATCH: Batch/pipelining support for libpq

From
Andres Freund
Date:
Hi,

On 2020-11-03 10:42:34 -0300, Alvaro Herrera wrote:
> It would definitely help if you (and others) could think about the API
> being added: Does it fulfill the promises being made?  Does it offer the
> guarantees that real-world apps want to have?  I'm not much of an
> application writer myself -- particularly high-traffic apps that would
> want to use this.

Somewhere earlier in this thread there was a patch with support for
batching in pgbench. I think it'd be good to refresh that. Both because
it shows at least some real-world-lite usage of the feature and because
we need a way to stress it to see whether it has unnecessary
bottlenecks.
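For reference, the pgbench pipeline support that eventually shipped in PostgreSQL 14 uses the \startpipeline and \endpipeline meta-commands; a custom script stressing the feature against the standard pgbench tables might look like this sketch:

```
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
\startpipeline
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
    VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
\endpipeline
```

Run with something like `pgbench -f pipeline.sql -c 8 -j 8 -T 60`, this sends the three statements of each transaction as one pipelined batch, which is the kind of stress test suggested above.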

Greetings,

Andres Freund



Re: PATCH: Batch/pipelining support for libpq

From
Matthieu Garrigues
Date:
Hi David,

Thanks for the feedback. I did rework a bit the doc based on your
remarks. Here is the v24 patch.

Matthieu Garrigues

On Tue, Nov 3, 2020 at 6:21 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> On 2020-Nov-02, Alvaro Herrera wrote:
>>
>> > In v23 I've gone over docs; discovered that PQgetResults docs were
>> > missing the new values.  Added those.  No significant other changes yet.
>>
>
> Just reading the documentation of this patch, haven't been following the longer thread:
>
> Given the caveats around blocking mode connections why not just require non-blocking mode, in a similar fashion to how synchronous functions are disallowed?
>
> "Batched operations will be executed by the server in the order the client
> sends them. The server will send the results in the order the statements
> executed."
>
> Maybe:
>
> "The server executes statements, and returns results, in the order the client sends them."
>
> Using two sentences and relying on the user to mentally link the two "in the order" descriptions together seems to add unnecessary cognitive load.
>
> +     The client <link linkend="libpq-batch-interleave">interleaves result
> +     processing</link> with sending batch queries, or for small batches may
> +     process all results after sending the whole batch.
>
> Suggest: "The client may choose to interleave result processing with sending batch queries, or wait until the complete batch has been sent."
>
> I would expect to process the results of a batch only after sending the entire batch to the server.  That I don't have to is informative, but knowing when I should avoid doing so, and why, is informative as well.  To the extreme, while you can use batch mode and interleave, if you just poll getResult after every command you will make the whole batch thing pointless.  Directing the reader from here to the section "Interleaving Result Processing and Query Dispatch" seems worth considering.  The dynamics of small sizes and sockets remain a bit unclear as to what will break (if anything, or is it just process memory on the server) if interleaving is not performed and sizes are large.
>
> I would suggest placing commentary about "all transactions subsequent to a failed transaction in a batch are ignored while previous completed transactions are retained" in the "When to Use Batching".  Something like "Batching is less useful, and more complex, when a single batch contains multiple transactions (see Error Handling)."
>
> My imagined use case would be to open a batch, start a transaction, send all of its components, end the transaction, end the batch, check for batch failure and if it doesn't fail have the option to easily continue without processing individual pgResults (or if it does fail, have the option to extract the first error pgResult and continue, ignoring the rest, knowing that the transaction as a whole was reverted and the batch unapplied).  I've never interfaced with libpq directly.  Though given how the existing C API works what is implemented here seems consistent.
>
> The "queueing up queries into a pipeline to be executed as a batch on the server" can be read as a client-side behavior where nothing is sent to the server until the batch has been completed.  Reading further it becomes clear that all it basically is is a server-side toggle that instructs the server to continue processing incoming commands even while prior commands have their results waiting to be ingested by the client.
>
> Batch seems like the user-visible term to describe this feature.  Pipeline seems like an implementation detail that doesn't need to be mentioned in the documentation - especially given that pipeline doesn't get a mention beyond the first two paragraphs of the chapter and never without being linked directly to "batch".  I would probably leave the indexterm and have a paragraph describing that batching is implemented using a query pipeline so that people with the implementation detail on their mind can find this chapter, but the prose for the user should just stick to batching.
>
> Sorry, that all is a bit unfocused, but the documentation for the user of the API could be cleaned up a bit and some more words spent on what trade-offs are being made when using batching versus normal command-response processing.  That said, while I don't see all of this purely a matter of style I'm also not seeing anything demonstrably wrong with the documentation at the moment.  Hopefully my perspective helps though, and depending on what happens next I may try and make my thoughts more concrete with an actual patch.
>
> David J.
>

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
(Adding previous reviewers to CC)

On 2020-Nov-03, David G. Johnston wrote:

> Given the caveats around blocking mode connections why not just require
> non-blocking mode, in a similar fashion to how synchronous functions are
> disallowed?

This is a very good question.  Why indeed?  Does anybody have a good
answer to this?  If not, I propose we just require that non-blocking
mode is in use in order for batch mode to be used.

I've been doing a review pass over this patch and have an updated
version, which I intend to share later today (after I fix what appears
to be a misunderstanding in the "singlerow" test in testlibpqbatch.c)



Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
Here's a v25.

I made a few more changes to the docs per David's suggestions; I also
reordered the sections quite a bit.  It's now like this:
 * Batch Mode
   * Using Batch Mode
     * Issuing Queries
     * Processing Results
     * Error Handling
     * Interleaving Result Processing and Query Dispatch
     * Ending Batch Mode
   * Functions Associated with Batch Mode
   * When to Use Batching

To me as a reader, this makes more sense, but if you disagree, I think
we should discuss further changes.  (For example, maybe we should move
the "Functions" section at the end?)  The original had "When to Use
Batching" at the start, but it seemed to me that the points it's making
are not as critical as understanding what it is.

I reworked the test program to better fit the TAP model; I found that if
one test failed in whatever way, the connection would be left in a
weird state and cause the next test to also fail.  I changed it so that it
runs one test and exits; then the t/001_libpq_async.pl file (hmm, need
to rename to 001_batch.pl I guess) calls it once for each test.
I adapted the test code to our code style.  I also removed the "timings"
stuff; I think that's something better left to pgbench.

(I haven't looked at Daniel's pgbench stuff yet, but I will do that
next.)

While looking at how the tests worked, I gave a hard stare at the new
libpq code and cleaned it up as well.  There are a lot of minor changes, but
nothing truly substantial.  I moved the code around a lot, to keep
things grouped together where they belong.

I'm not 100% clear on things like PGconn->batch_status and how
PGconn->asyncStatus works.  Currently everything seems to work
correctly, but I'm worried that because we add new status values to
asyncStatus, some existing code might not be doing everything correctly
(for example, when changing to/from ASYNC_BUSY in some cases, are we 100%
sure we shouldn't be changing to ASYNC_QUEUED?)


While looking this over I noticed a thread from 2014[1] where Matt
Newell tried to implement this stuff and apparently the main review
comment he got before abandoning the patch was that the user would like
a way to access the query that corresponded to each result.  The current
patch set does not address that need; the approach to this problem is
that it's on the application's head to keep track of this.  Honestly I
don't understand why it would be otherwise ... I'm not sure that it
makes sense to expect that the application is stupid enough that it
doesn't keep track in which order it sent things, but bright enough to
keep pointers to the queries it sent (??).  So this seems okay to me.
But added Heikki and Claudio to CC because of that old thread.


[1] https://postgr.es/m/2059418.jZQvL3lH90@obsidian


Attachment

Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
    Alvaro Herrera wrote:

> I adapted the test code to our code style.  I also removed the "timings"
> stuff; I think that's something better left to pgbench.
>
> (I haven't looked at Daniel's pgbench stuff yet, but I will do that
> next.)

The patch I posted in [1] was pretty simple, but at the time, query
results were always discarded. Now that pgbench can instantiate
variables from query results, a script can do:
  select 1 as var \gset
  select :var;
This kind of sequence wouldn't work in batch mode since it
sends queries before getting results of previous queries.

So maybe \gset should be rejected when inside a batch section.

Or alternatively pgbench should collect results before a variable is
reinjected into a query, thereby stalling the pipeline.
To do this only when necessary, it would have to track read-write
dependencies among variables, which seems overly complicated though.

[1]
https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da26d7@manitou-mail.org

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



Re: PATCH: Batch/pipelining support for libpq

From
"David G. Johnston"
Date:
On Fri, Nov 13, 2020 at 5:38 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's a v25.

I made a few more changes to the docs per David's suggestions; I also
reordered the sections quite a bit.  It's now like this:
 * Batch Mode
   * Using Batch Mode
     * Issuing Queries
     * Processing Results
     * Error Handling
     * Interleaving Result Processing and Query Dispatch
     * Ending Batch Mode
   * Functions Associated with Batch Mode
   * When to Use Batching

Thanks!

I like the new flow and changes.  I've attached a patch that covers some missing commas and typos along with a few parts that made me pause.

The impact of memory came out of nowhere under the non-blocking mode commentary.  I took a stab at incorporating it more broadly.

The "are error conditions" might be a well-known phrasing to those using libpq but that sentence reads odd to me.  I did try to make the example listing flow a bit better and added a needed comma.

Tried to clean up a few phrasings after that.  The error handling part I'm not especially happy with but I think it's closer and more useful than just "emitted during error handling" - it gets emitted upon error, handling is a client concern.

Seems odd to say the new feature was introduced in v14.0; the .0 is fine to leave implied.  I didn't actually fix it in the attached, but "the PostgreSQL 14 version of libpq" is going to become outdated quickly, so better just to drop it.

"The batch API was introduced in PostgreSQL 14, but clients can use batches on servers supporting v3 of the extended query protocol, potentially going as far back as version 7.4."

David J.



Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2020-Nov-14, Daniel Verite wrote:

> The patch I posted in [1] was pretty simple, but at the time, query
> results were always discarded. Now that pgbench can instantiate
> variables from query results, a script can do:
>   select 1 as var \gset
>   select :var;
> This kind of sequence wouldn't work in batch mode since it
> sends queries before getting results of previous queries.
> 
> So maybe \gset should be rejected when inside a batch section.

Hah.

Hacking pgbench extensively is beyond what I'm willing to do for this
feature at this time.  Making \gset rejected in a batch section sounds
simple enough and supports \beginbatch et al sufficiently to compare
performance, so I'm OK with a patch that does that.  That'd be a small
extension to your previous patch, if I understand correctly.

If you or others want to send patches to extend batch support with
read-write tracking for variables, feel free, but I hereby declare that
I'm not taking immediate responsibility for getting them committed.




Re: PATCH: Batch/pipelining support for libpq

From
"Andres Freund"
Date:
Hi

On Wed, Nov 18, 2020, at 09:51, Alvaro Herrera wrote:
> On 2020-Nov-14, Daniel Verite wrote:
> 
> > The patch I posted in [1] was pretty simple, but at the time, query
> > results were always discarded. Now that pgbench can instantiate
> > variables from query results, a script can do:
> >   select 1 as var \gset
> >   select :var;
> > This kind of sequence wouldn't work in batch mode since it
> > sends queries before getting results of previous queries.
> > 
> > So maybe \gset should be rejected when inside a batch section.
> 
> Hah.
> 
> Hacking pgbench extensively is beyond what I'm willing to do for this
> feature at this time.  Making \gset rejected in a batch section sounds
> simple enough and supports \beginbatch et al sufficiently to compare
> performance, so I'm OK with a patch that does that.  That'd be a small
> extension to your previous patch, if I understand correctly.
> 
> If you or others want to send patches to extend batch support with
> read-write tracking for variables, feel free, but I hereby declare that
> I'm not taking immediate responsibility for getting them committed.

I think minimal support is entirely sufficient initially.

Andres



Re: PATCH: Batch/pipelining support for libpq

From
"Daniel Verite"
Date:
 Hi,

Here's a new version with the pgbench support included.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2020-Nov-23, Daniel Verite wrote:

>  Hi,
> 
> Here's a new version with the pgbench support included.

Thanks, incorporated into my local copy.




Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
Thanks David Johnston and Daniel Vérité, I have incorporated your
changes into this patch, which is now v26.  Also, it's been rebased on
current sources.

I've been using the new PQtrace() stuff to verify the behavior of the
new feature.  It's not perfect, but at least it doesn't crash
immediately as it did when I tried a few weeks ago.  There are
imperfections that I think are due to bugs in the PQtrace
implementation, not in this patch.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"El conflicto es el camino real hacia la unión"

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
As you can see in an XXX comment in the libpq test program, the current
implementation has the behavior that PQgetResult() returns NULL after a
batch is finished and has reported PGRES_BATCH_END.  I don't know if
there's a hard reason to do that, but I'd like to suppress it because it
seems weird and out of place.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: PATCH: Batch/pipelining support for libpq

From
Zhihong Yu
Date:
Hi,

+                       commandFailed(st, "SQL", "\\gset and \\aset are not allowed in a batch section");

It seems '\\gset or \\aset is not ' would correspond to the check more closely.

+       if (my_command->argc != 1)
+           syntax_error(source, lineno, my_command->first_line, my_command->argv[0],

Is it possible that my_command->argc == 0 (in which case my_command->argv[0] shouldn't be accessed)?

+               appendPQExpBufferStr(&conn->errorMessage,
+                                 libpq_gettext("cannot queue commands during COPY\n"));
+               return false;
+               break;

Is the break necessary?  A similar comment applies to pqBatchProcessQueue().

+int
+PQexitBatchMode(PGconn *conn)

Since either 0 or 1 is returned, maybe change the return type to bool.
Also, the function operates on PGconn - should the function be static (pqBatchProcessQueue is)?

Cheers

On Thu, Jan 21, 2021 at 3:39 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Thanks David Johnston and Daniel Vérité, I have incorporated your
changes into this patch, which is now v26.  Also, it's been rebased on
current sources.

I've been using the new PQtrace() stuff to verify the behavior of the
new feature.  It's not perfect, but at least it doesn't crash
immediately as it did when I tried a few weeks ago.  There are
imperfections that I think are due to bugs in the PQtrace
implementation, not in this patch.

--
Álvaro Herrera                            39°49'30"S 73°17'W
"El conflicto es el camino real hacia la unión"

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2021-Jan-21, Alvaro Herrera wrote:

> As you can see in an XXX comment in the libpq test program, the current
> implementation has the behavior that PQgetResult() returns NULL after a
> batch is finished and has reported PGRES_BATCH_END.  I don't know if
> there's a hard reason to do that, but I'd like to suppress it because it
> seems weird and out of place.

Hello Craig, a question for you since this API is your devising.  I've
been looking at changing the way this works to prevent those NULL
returns from PQgetResult.  That is, instead of having what seems like a
"NULL separator" of query results, you'd just get the PGRES_BATCH_END to
signify a batch end (not a NULL result after the BATCH_END); and the
normal PGRES_COMMAND_OK or PGRES_TUPLES_OK etc when the result of a
command has been sent.  It's not working yet so I'm not sending an
updated patch, but I wanted to know if you had a rationale for including
this NULL return "separator" or was it just a convenience because of how
the code grew together.

Such a decision has obvious consequences for the test program (which
right now expects that PQgetResult() returns NULL at each step); and
naturally for libpq's internal state machine that controls how it all
works.

Thanks,

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On Thu, 11 Feb 2021 at 07:51, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Jan-21, Alvaro Herrera wrote:

> As you can see in an XXX comment in the libpq test program, the current
> implementation has the behavior that PQgetResult() returns NULL after a
> batch is finished and has reported PGRES_BATCH_END.  I don't know if
> there's a hard reason to do that, but I'd like to suppress it because it
> seems weird and out of place.

Hello Craig, a question for you since this API is your devising.  I've
been looking at changing the way this works to prevent those NULL
returns from PQgetResult.  That is, instead of having what seems like a
"NULL separator" of query results, you'd just get the PGRES_BATCH_END to
signify a batch end (not a NULL result after the BATCH_END); and the
normal PGRES_COMMAND_OK or PGRES_TUPLES_OK etc when the result of a
command has been sent.  It's not working yet so I'm not sending an
updated patch, but I wanted to know if you had a rationale for including
this NULL return "separator" or was it just a convenience because of how
the code grew together.

The existing API for libpq actually specifies[1] that you should call PQgetResult() until it returns NULL:

> After successfully calling PQsendQuery, call PQgetResult one or more times to obtain the results. PQsendQuery cannot be called again (on the same connection) until PQgetResult has returned a null pointer, indicating that the command is done.

Similarly, in single-row mode, the existing API specifies that you should call PQgetResult() until it returns NULL.

Also, IIRC the protocol already permits multiple result sets to be returned, and the caller cannot safely assume that a single PQsendQuery() will produce only a single result set. (I really should write a test extension that exercises that and how libpq reacts to it.)

That's why I went for the NULL return. I think. It's been a while now so it's a bit fuzzy.

I would definitely like an API that does not rely on testing for a NULL return. Especially since NULL return can be ambiguous in the context of row-at-a-time mode. New explicit enumerations for PGresult would make a lot more sense.

So +1 from me for the general idea. I need to look at the patch as it has evolved soon too.

Remember that the original patch I submitted for this was a 1-day weekend hack and proof of concept to show that libpq could be modified to support query pipelining (and thus batching too), so I could illustrate the performance benefits that could be attained by doing so. I'd been aware of the benefits and the protocol's ability to support it for some time thanks to working on PgJDBC, but couldn't get anyone interested without some C code to demonstrate it, so I wrote some. I am not going to argue that the API I added for it is ideal in any way, and happy to see improvements.

The only change I would very strongly object to would be anything that turned this into a *batch* mode without query-pipelining support. If you have to queue all the queries up in advance then send them as a batch and wait for all the results, you miss out on a lot of the potential round-trip-time optimisations and you add initial latency. So long as there is a way to "send A", "send B", "send C", "read results from A", "send D", and there's a way for the application to associate some kind of state (an application specific id or index, a pointer to an application query-queue struct, whatever)  so it can match queries to results ... then I'm happy.
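
That "send A, send B, send C, read results from A, send D" shape can be sketched roughly as follows, using the pipeline-mode names this thread eventually settled on.  This is a hedged fragment, not the patch itself: exact signatures should be checked against libpq-fe.h, error handling is omitted, and it of course needs a live connection (`conninfo`, `paramsA`, `paramsB` are placeholders):

```c
PGconn *conn = PQconnectdb(conninfo);

PQsetnonblocking(conn, 1);          /* avoid the send/receive deadlock */
PQenterPipelineMode(conn);

PQsendQueryParams(conn, "INSERT INTO t VALUES ($1)",
                  1, NULL, paramsA, NULL, NULL, 0);   /* send A */
PQsendQueryParams(conn, "INSERT INTO t VALUES ($1)",
                  1, NULL, paramsB, NULL, NULL, 0);   /* send B */
PQsendPipeline(conn);               /* mark a synchronization point */

/* read results from A (and B); further queries may be sent meanwhile */
PGresult *res;
while ((res = PQgetResult(conn)) != NULL)
{
    /* a PGRES_PIPELINE_END status marks the end of this segment */
    PQclear(res);
}

PQexitPipelineMode(conn);
```

The key property is that the sends and the result reads are decoupled: the application is free to interleave them as long as it keeps draining results.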

Re: PATCH: Batch/pipelining support for libpq

From
Craig Ringer
Date:
On Tue, 16 Feb 2021 at 09:19, Craig Ringer <craig.ringer@enterprisedb.com> wrote:
So long as there is a way to "send A", "send B", "send C", "read results from A", "send D", and there's a way for the application to associate some kind of state (an application specific id or index, a pointer to an application query-queue struct, whatever)  so it can match queries to results ... then I'm happy.

FWIW I'm also thinking of revising the docs to mostly use the term "pipeline" instead of "batch". Use "pipelining and batching" in the chapter topic, and mention "batch" in the index, and add a para that explains how to run batches on top of pipelining, but otherwise use the term "pipeline" not "batch".

That's because pipelining isn't actually batching, and using it as a naïve batch interface will get you in trouble. If you PQsendQuery(...) a long list of queries without consuming results, you'll block unless you're in libpq's nonblocking-send mode. You'll then be deadlocked because the server can't send results until you consume some (tx buffer full) and can't consume queries until it can send some results, but you can't consume results because you're blocked on a send that'll never end.

An actual batch interface where you can bind and send a long list of queries might be worthwhile, but should be addressed separately, and it'd be confusing if this pipelining interface claimed the term "batch". I'm not convinced enough application developers actually code directly against libpq for it to be worth creating a pretty batch interface where you can submit (say) an array of struct PQbatchEntry { const char * query, int nparams, ... } to a PQsendBatch(...) and let libpq handle the socket I/O for you. But if someone wants to add one later it'll be easier if we don't use the term "batch" for the pipelining feature.

I'll submit a reworded patch in a bit.

Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
On 2021-Feb-16, Craig Ringer wrote:

> FWIW I'm also thinking of revising the docs to mostly use the term
> "pipeline" instead of "batch". Use "pipelining and batching" in the chapter
> topic, and mention "batch" in the index, and add a para that explains how
> to run batches on top of pipelining, but otherwise use the term "pipeline"
> not "batch".

Hmm, this is a good point.  It means I have a lot of API renaming to do.
I'll get on it now and come back with a proposal.

-- 
Álvaro Herrera       Valdivia, Chile



Re: PATCH: Batch/pipelining support for libpq

From
Alvaro Herrera
Date:
Here's a new version, where I've renamed everything to "pipeline".  I
think the docs could use some additional tweaks now in order to make a
coherent story on pipeline mode, how it can be used in a batched
fashion, etc.

Here's the renames I applied.  It's mostly mechanical, except
PQbatchSendQueue is now PQsendPipeline:

    PQBatchStatus -> PGpipelineStatus (enum)
    PQBATCH_MODE_OFF -> PQ_PIPELINE_OFF
    PQBATCH_MODE_ON -> PQ_PIPELINE_ON
    PQBATCH_MODE_ABORTED -> PQ_PIPELINE_ABORTED
    PQbatchStatus -> PQpipelineStatus (function)
    PQenterBatchMode -> PQenterPipelineMode
    PQexitBatchMode -> PQexitPipelineMode
    PQbatchSendQueue -> PQsendPipeline
    PGRES_BATCH_END -> PGRES_PIPELINE_END
    PGRES_BATCH_ABORTED -> PGRES_PIPELINE_ABORTED

Also, PQbatchStatus(conn) returns enum PGpipelineStatus (it previously
returned int).

I'm tempted to rename PGASYNC_QUEUED to PGASYNC_PIPELINE_IDLE (not sure
if PGASYNC_PIPELINE_READY fits better with the existing one).


In pgbench, I changed the metacommands to be \startpipeline and
\endpipeline.  There's a failing Assert() there, which I commented out;
it needs to be fixed.
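
For illustration, a pgbench script using those metacommands might look like this (the syntax is assumed from the description above; the patch itself defines it, and per the earlier discussion \gset would be rejected inside the pipelined section):

```
\startpipeline
INSERT INTO pgbench_history (tid, bid, delta) VALUES (:tid, :bid, :delta);
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
\endpipeline
```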

-- 
Álvaro Herrera                            39°49'30"S 73°17'W

Attachment

Re: PATCH: Batch/pipelining support for libpq

From
Zhihong Yu
Date:
Hi,
+       if (querymode == QUERY_SIMPLE)
+       {
+           commandFailed(st, "startpipeline", "cannot use pipeline mode with the simple query protocol");
+           st->state = CSTATE_ABORTED;
+           return CSTATE_ABORTED;

I wonder why st->state is only assigned in this if block. The state is not set in other cases where CSTATE_ABORTED is returned.

Should PQ_PIPELINE_OFF be returned for the following case?

+PQpipelineStatus(const PGconn *conn)
+{
+   if (!conn)
+       return false;

Cheers

On Tue, Feb 16, 2021 at 3:14 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's a new version, where I've renamed everything to "pipeline".  I
think the docs could use some additional tweaks now in order to make a
coherent story on pipeline mode, how it can be used in a batched
fashion, etc.

Here's the renames I applied.  It's mostly mechanical, except
PQbatchSendQueue is now PQsendPipeline:

        PQBatchStatus -> PGpipelineStatus (enum)
        PQBATCH_MODE_OFF -> PQ_PIPELINE_OFF
        PQBATCH_MODE_ON -> PQ_PIPELINE_ON
        PQBATCH_MODE_ABORTED -> PQ_PIPELINE_ABORTED
        PQbatchStatus -> PQpipelineStatus (function)
        PQenterBatchMode -> PQenterPipelineMode
        PQexitBatchMode -> PQexitPipelineMode
        PQbatchSendQueue -> PQsendPipeline
        PGRES_BATCH_END -> PGRES_PIPELINE_END
        PGRES_BATCH_ABORTED -> PGRES_PIPELINE_ABORTED

Also, PQbatchStatus(conn) returns enum PGpipelineStatus (it previously
returned int).

I'm tempted to rename PGASYNC_QUEUED to PGASYNC_PIPELINE_IDLE (not sure
if PGASYNC_PIPELINE_READY fits better with the existing one).


In pgbench, I changed the metacommands to be \startpipeline and
\endpipeline.  There's a failing Assert() there which I commented out;
needs fixed.

--
Álvaro Herrera                            39°49'30"S 73°17'W