
libpq pipelining

From
Samuel Williams
Date:
Hello,

Using the asynchronous interface of libpq, is it possible to pipeline
multiple queries?

i.e.

PQsendQuery(query1)
PQsendQuery(query2)

followed by

query1_results = PQgetResult(...)
query2_results = PQgetResult(...)

I tried it but got an "another command is already in progress" error.

So, maybe it's not supported, or maybe I'm doing something wrong.

Thanks
Samuel



Re: libpq pipelining

From
"David G. Johnston"
Date:
On Friday, June 26, 2020, Samuel Williams <space.ship.traveller@gmail.com> wrote:
Hello,

Using the asynchronous interface of libpq, is it possible to pipeline
multiple queries?

i.e.

PQsendQuery(query1)
PQsendQuery(query2)

followed by

query1_results = PQgetResult(...)
query2_results = PQgetResult(...)

I tried it but got "another command is already in progress" error.

The documentation seems to leave zero ambiguity:

 After successfully calling PQsendQuery, call PQgetResult one or more times to obtain the results. PQsendQuery cannot be called again (on the same connection) until PQgetResult has returned a null pointer, indicating that the command is done.

David J.

Re: libpq pipelining

From
Samuel Williams
Date:
Thanks David,

You are correct.

I was giving an example of what I was hoping to achieve, not what I
expected to work with the current interface.

I found some discussion in the past relating to batch processing which
appears to support some kind of pipelining:

https://2ndquadrant.github.io/postgres/libpq-batch-mode.html

However, it seems to have been abandoned.

Kind regards,
Samuel

On Sat, 27 Jun 2020 at 16:15, David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Friday, June 26, 2020, Samuel Williams <space.ship.traveller@gmail.com> wrote:
>>
>> Hello,
>>
>> Using the asynchronous interface of libpq, is it possible to pipeline
>> multiple queries?
>>
>> i.e.
>>
>> PQsendQuery(query1)
>> PQsendQuery(query2)
>>
>> followed by
>>
>> query1_results = PQgetResult(...)
>> query2_results = PQgetResult(...)
>>
>> I tried it but got "another command is already in progress" error.
>
>
> The documentation seems to leave zero ambiguity:
>
>  After successfully calling PQsendQuery, call PQgetResult one or more times to obtain the results. PQsendQuery cannot be called again (on the same connection) until PQgetResult has returned a null pointer, indicating that the command is done.
>
> David J.
>



Re: libpq pipelining

From
"David G. Johnston"
Date:
On Friday, June 26, 2020, Samuel Williams <space.ship.traveller@gmail.com> wrote:
Thanks David,

You are correct.

I was giving an example of what I was hoping to achieve, not what I
expected to work with the current interface.

What about, as it says, sending multiple statements in a single sendQuery and then polling for multiple results?

David J.

Re: libpq pipelining

From
Samuel Williams
Date:
> What about, as it says, sending multiple statements in a single sendQuery and then polling for multiple results?

I tried this, and even in single row streaming mode, I found that
there are cases where the results would not be streamed until all the
queries were sent.

From the user's point of view, they may issue multiple queries in a loop without caring about the results, so pipeline/batch processing is ideal to avoid one round trip per loop iteration; if database access is slow, this can be a significant source of latency.

Kind regards,
Samuel



Re: libpq pipelining

From
"David G. Johnston"
Date:
On Friday, June 26, 2020, Samuel Williams <space.ship.traveller@gmail.com> wrote:
> What about, as it says, sending multiple statements in a single sendQuery and then polling for multiple results?

I tried this, and even in single row streaming mode, I found that
there are cases where the results would not be streamed until all the
queries were sent.

From the user's point of view, they may issue multiple queries in a loop without caring about the results, so pipeline/batch processing is ideal to avoid one round trip per loop iteration; if database access is slow, this can be a significant source of latency.

I don’t have any insight into the bigger picture, but I’d concur that no other option is documented, so what you desire is not possible.

David J.

Re: libpq pipelining

From
Samuel Williams
Date:
Here is a short example:

https://gist.github.com/ioquatix/2f08f78699418f65971035785c80cf18

It makes 10 queries in one "PQsendQuery" call and sets single-row mode, but all the results come back at once, as shown by the timestamps.

Next I'm planning to investigate streaming large recordsets to see if
it works better/incrementally.



Re: libpq pipelining

From
Tom Lane
Date:
Samuel Williams <space.ship.traveller@gmail.com> writes:
> Here is a short example:
> https://gist.github.com/ioquatix/2f08f78699418f65971035785c80cf18
> It makes 10 queries in one "PQsendQuery" and sets single row mode. But
> all the results come back at once as shown by the timestamps.

That looks to be less about what libpq will do than what the Ruby
interface code will do.

The volume of return data may also be an issue.  I don't think the
backend will flush data out to the client except when it (a) reaches
an idle state or (b) fills the output buffer.  Ten occurrences of
a short query result aren't gonna be enough for (b) --- from memory,
that buffer is probably 8KB.

            regards, tom lane



Re: libpq pipelining

From
Samuel Williams
Date:
I think libmariadb has a nicer interface for this.

Essentially what you do is send your query, and then read a result set
(one result set per query), and then you stream individual rows using:

mysql_fetch_row_start
mysql_fetch_row_cont

Those methods don't seem to have an equivalent in libpq - you can use
PQgetResult, but it buffers all the rows. Using single-row mode yields
many results for each query (which seems like a big overhead). The
difference between this and MariaDB is that mysql_fetch_row still
operates within one logical result set, while single-row mode breaks
that single logical result set into lots of individual results.

Maybe the statement about efficiency is incorrect, but it would be
nice if you could incrementally stream a single result set more
easily.


On Sun, 28 Jun 2020 at 02:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Samuel Williams <space.ship.traveller@gmail.com> writes:
> > Here is a short example:
> > https://gist.github.com/ioquatix/2f08f78699418f65971035785c80cf18
> > It makes 10 queries in one "PQsendQuery" and sets single row mode. But
> > all the results come back at once as shown by the timestamps.
>
> That looks to be less about what libpq will do than what the Ruby
> interface code will do.
>
> The volume of return data may also be an issue.  I don't think the
> backend will flush data out to the client except when it (a) reaches
> an idle state or (b) fills the output buffer.  Ten occurrences of
> a short query result aren't gonna be enough for (b) --- from memory,
> that buffer is probably 8KB.
>
>                         regards, tom lane



Re: libpq pipelining

From
Stephen Frost
Date:
Greetings,

* Samuel Williams (space.ship.traveller@gmail.com) wrote:
> Here is a short example:
>
> https://gist.github.com/ioquatix/2f08f78699418f65971035785c80cf18
>
> It makes 10 queries in one "PQsendQuery" and sets single row mode. But
> all the results come back at once as shown by the timestamps.

If you have 10 queries that you want to make in a given transaction and
you care about the latency then really the best option is to wrap that
all in a single pl/pgsql function on the server side and make one call.

> Next I'm planning to investigate streaming large recordsets to see if
> it works better/incrementally.

If you want to stream large data sets to/from PG, you should consider
using COPY.

Thanks,

Stephen


Re: libpq pipelining

From
Samuel Williams
Date:
Tom, I'm implementing a small abstraction layer for event-driven
result streaming on PostgreSQL and MariaDB for Ruby, and I'll endeavor
to report back with some numbers once I have enough of it working to
benchmark something meaningful.

Thanks for your patience and help.

Kind regards,
Samuel

On Tue, 30 Jun 2020 at 02:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Samuel Williams <space.ship.traveller@gmail.com> writes:
> > Those methods don't seem to have an equivalent in libpq - you can use
> > PQgetResult but it buffers all the rows. Using single row mode results
> > in many results for each query (seems like a big overhead).
>
> Have you got any actual evidence for that?  Sure, the overhead is
> more than zero, but does it mean anything in comparison to the other
> costs of data transmission?
>
> > Maybe the statement about efficiency is incorrect, but it would be
> > nice if you could incrementally stream a single result set more
> > easily.
>
> More easily than what?  If we did not construct a PGresult then we would
> need some other abstraction for access to the returned row, dealing with
> error cases, etc etc.  That would mean a lot of very duplicative API code
> in libpq, and a painful bunch of adjustments in client code.
>
>                         regards, tom lane



Re: libpq pipelining

From
Samuel Williams
Date:
Hi,

Here are some initial numbers.

DB::Client
Warming up --------------------------------------
         db-postgres   281.000  i/100ms
          db-mariadb   399.000  i/100ms
              mysql2   533.000  i/100ms
                  pg   591.000  i/100ms
Calculating -------------------------------------
         db-postgres      2.725k (± 1.8%) i/s -     13.769k in   5.053750s
          db-mariadb      3.990k (± 2.4%) i/s -     19.950k in   5.002453s
              mysql2      5.153k (± 4.7%) i/s -     26.117k in   5.079570s
                  pg      5.772k (± 4.4%) i/s -     28.959k in   5.027423s

Comparison:
                  pg:     5771.7 i/s
              mysql2:     5152.8 i/s - 1.12x  (± 0.00) slower
          db-mariadb:     3990.3 i/s - 1.45x  (± 0.00) slower
         db-postgres:     2725.5 i/s - 2.12x  (± 0.00) slower

The db-* gems are event-driven. However, that is of less interest right now.

This benchmark creates a table, inserts 1000 rows (or some other fixed
number), and then selects them all back out.

What I noticed is that `PQgetvalue` and `PQgetisnull` are not
particularly efficient, at least via FFI: reading a full result
requires rows * columns * 2 FFI calls.

libmariadb provides `mysql_fetch_row`, which returns a `char **` per
row and so requires only one FFI call per row.

Does a similar method exist for libpq? e.g. `PGgetrow(index) ->
char**` (array of strings, one for each column, may be nil to indicate
null).

Kind regards,
Samuel

On Tue, 30 Jun 2020 at 12:50, Samuel Williams
<space.ship.traveller@gmail.com> wrote:
>
> Tom, I'm implementing a small abstraction layer for event-driven
> result streaming on PostgreSQL and MariaDB for Ruby, and I'll endeavor
> to report back with some numbers once I have enough of it working to
> benchmark something meaningful.
>
> Thanks for your patience and help.
>
> Kind regards,
> Samuel
>
> On Tue, 30 Jun 2020 at 02:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Samuel Williams <space.ship.traveller@gmail.com> writes:
> > > Those methods don't seem to have an equivalent in libpq - you can use
> > > PQgetResult but it buffers all the rows. Using single row mode results
> > > in many results for each query (seems like a big overhead).
> >
> > Have you got any actual evidence for that?  Sure, the overhead is
> > more than zero, but does it mean anything in comparison to the other
> > costs of data transmission?
> >
> > > Maybe the statement about efficiency is incorrect, but it would be
> > > nice if you could incrementally stream a single result set more
> > > easily.
> >
> > More easily than what?  If we did not construct a PGresult then we would
> > need some other abstraction for access to the returned row, dealing with
> > error cases, etc etc.  That would mean a lot of very duplicative API code
> > in libpq, and a painful bunch of adjustments in client code.
> >
> >                         regards, tom lane