Thread: Re: Add Pipelining support in psql

Re: Add Pipelining support in psql

From
Jelte Fennema-Nio
Date:
On Wed, 27 Nov 2024 at 10:50, Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> With \bind, \parse, \bind_named and \close, it is possible to issue
> queries from psql using the extended protocol. However, it wasn't
> possible to send those queries using pipelining and the only way to
> test pipelined queries was through pgbench's tap tests.

Big +1. Not being able to use psql for even the most basic pipeline
tests has definitely been an annoyance of mine.

I played around quickly with this patch and it works quite well. A few
things that would be nice improvements I think. Feel free to change
the command names:
1. Add a \flush command that calls PQflush
2. Add a \flushrequest command that calls PQsendFlushRequest
3. Add a \getresult command so you can get a result from a query
without having to close the pipeline

To be clear, not having those additional commands isn't a blocker for
this patch imho, but I'd definitely miss them if they weren't there
when I would be using it.



Re: Add Pipelining support in psql

From
Jelte Fennema-Nio
Date:
On Thu, 28 Nov 2024 at 07:43, Michael Paquier <michael@paquier.xyz> wrote:
> Hmm.  The start, end and sync meta-commands are useful for testing.  I
> find the flush control a bit less interesting, TBH.
>
> What would you use these for?

I guess mostly for interactively playing around with pipelining from psql.

But I think \getresult would be useful for testing too. This would
allow us to test that we can read part of the pipeline, without
sending a sync and waiting for everything.

To be clear \flushrequest and \flush would be necessary to make
\getresult work reliably.



Re: Add Pipelining support in psql

From
Jelte Fennema-Nio
Date:
On Tue, 10 Dec 2024 at 11:44, Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> num_queries (2nd element in the pipeline status prompt) is now used to
> track queued queries that were not flushed (with a flush request or
> sync) to the server. It used to count both unflushed queries and
> flushed queries.

I skimmed the code a bit, but haven't looked closely at it yet. I did
try the patch out though. My thoughts below:

I think that new prompt is super useful, so useful in fact that I'd
suggest linking to it from the \startpipeline docs. I do think that
the wording in the docs could be a bit more precise:
1. The columns are not necessarily queries, they are messages or
commands. i.e. \parse and \bind_named both count as 1, even though
they form one query together.
2. messages not followed by \sync and \flushrequest, could very well
already "be sent to the server" (if the client buffer got full, or in
case of manual \flush). The main thing that \sync and \flushrequest do
is make sure that the server actually sends its own result back, even
if its buffer is not full yet.

The main feedback I have after playing around with this version is
that I'd like to have a \getresult (without the s), to only get a
single result. So that you can get results one by one, possibly
interleaved with some other queries again.

One thing I'm wondering is how useful the num_syncs count is in the
pipeline prompt, since you never really wait for a sync.

Regarding the usefulness of \flush. I agree it's not as useful as I
thought, because indeed \getresults already flushes everything. But
it's not completely useless either. The main way I was able to use it
interactively in a somewhat interesting way was to send it after a
long running query, and then while that's processing type up the next
query after it. Something like the following:

localhost jelte@postgres:5432-26274=
#> \startpipeline
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,0,0|> select pg_sleep(5) \bind \g
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,1,0|*> \flush
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,1,0|*> select 1 \bind \g
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,2,0|*> \syncpipeline
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|1,0,2|*> \getresults
 pg_sleep
──────────

(1 row)

 ?column?
──────────
        1
(1 row)

Time: 0.348 ms



Re: Add Pipelining support in psql

From
Michael Paquier
Date:
On Tue, Feb 18, 2025 at 06:34:20PM +0100, Anthonin Bonnefoy wrote:
> On Tue, Feb 18, 2025 at 8:23 AM Michael Paquier <michael@paquier.xyz> wrote:
>> The tests in psql.sql are becoming really long.  Perhaps it would be
>> better to split that into its own file, say psql_pipeline.sql?  The
>> input file is already 2k lines, you are adding 15% more lines to that.
>
> Agreed, I wasn't sure if this was enough to warrant a dedicated test
> file. This is now separated in psql_pipeline.sql.

You have forgotten the expected output.  Not a big issue as the input
was sent.

>> What is the reasoning here behind this restriction?  \gx is a wrapper
>> of \g with expanded mode on, but it is also possible to call \g with
>> expanded=on, bypassing this restriction.
>
> The issue is that \gx enables expanded mode for the duration of the
> query and immediately reset it in sendquery_cleanup. With pipelining,
> the command is piped and displaying is done by either \endpipeline or
> \getresults, so the flag change has no impact. Forbidding it was a way
> to make it clearer that it won't have the expected effect. If we
> wanted a similar feature, this would need to be done with something
> like \endpipelinex or \getresultsx.

Hmm, okay.  If one wants one mode or the other it is always possible
to force one with \pset expanded when getting the results.  Not sure
if there is any need for new specific commands for these two printing
the results.  Another option would be to authorize the command to run,
but perhaps your option is just better as per the enforced behavior in
the output.  So fine by me.  There is coverage so we'll know if there
are arguments in favor of authorizing the command, if need be.

> I've split the patch and created the 3 special variables:
> PIPELINE_SYNC_COUNT, PIPELINE_COMMAND_COUNT, PIPELINE_RESULT_COUNT.

Thanks.  Looks sensible now.

> For requested_results, I don't think there's value in exposing it
> since it is used as an exit condition and thus will always be 0
> outside of ExecQueryAndProcessResults.

I've been playing with this patch and this configuration:
\set PROMPT1 '=(pipeline=%P,sync=%:PIPELINE_SYNC_COUNT:,cmd=%:PIPELINE_COMMAND_COUNT:,res=%:PIPELINE_RESULT_COUNT:)%#'

That's long, but seeing the evolution of the pipeline status is pretty
cool depending on the meta-commands used.

While testing, I have been able to run into an assertion failure by
adding some tests in psql.sql to check for the case of inactive
branches for \if.  For example:
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
    \echo arg1 arg2 arg3 arg4 arg5
    \echo arg1
    \encoding arg1
+   \endpipeline
    \errverbose

And the report:
+psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' failed.

We should have tests for all new six meta-commands in psql.sql.
MainLoop() is wrong when in pipeline mode for inactive branches.
--
Michael

Attachment

Re: Add Pipelining support in psql

From
Anthonin Bonnefoy
Date:
On Thu, Feb 20, 2025 at 9:02 AM Michael Paquier <michael@paquier.xyz> wrote:
> You have forgotten the expected output.  Not a big issue as the input
> was sent.

I was writing the mail with the missing file when you sent this mail.
This is fixed.

> While testing, I have been able to run into an assertion failure by
> adding some tests in psql.sql to check for the case of inactive
> branches for \if.  For example:
> --- a/src/test/regress/sql/psql.sql
> +++ b/src/test/regress/sql/psql.sql
> @@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
>     \echo arg1 arg2 arg3 arg4 arg5
>     \echo arg1
>     \encoding arg1
> +   \endpipeline
>     \errverbose
>
> And the report:
> +psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' failed.
>
> We should have tests for all new six meta-commands in psql.sql.
> MainLoop() is wrong when in pipeline mode for inactive branches.

Ha yeah, I forgot about the inactive branches. I've added the new
commands and fixed the behaviour.

A small issue I've noticed while testing: When a pipeline has at least
one queue command, pqClearConnErrorState isn't called in
PQsendQueryStart and errors are appended. For example:

\startpipeline
select 1 \bind \g
select 1;
PQsendQuery not allowed in pipeline mode
select 1;
PQsendQuery not allowed in pipeline mode
PQsendQuery not allowed in pipeline mode

This looks more like an issue on libpq's side as there's no way to
reset or advance the errorReported from ExecQueryAndProcessResults
(plus PQerrorMessage seems to ignore errorReported). I've added an
additional test to track this behaviour for now as this would probably
be better discussed in a dedicated thread.

Attachment

Re: Add Pipelining support in psql

From
Michael Paquier
Date:
On Thu, Feb 20, 2025 at 10:29:33AM +0100, Anthonin Bonnefoy wrote:
> Ha yeah, I forgot about the inactive branches. I've added the new
> commands and fixed the behaviour.

And I did not notice that it was as simple as forcing the status in
the routines for the new meta-commands, as we do for the existing
ones.  Noted.

> This looks more like an issue on libpq's side as there's no way to
> reset or advance the errorReported from ExecQueryAndProcessResults
> (plus PQerrorMessage seems to ignore errorReported). I've added an
> additional test to track this behaviour for now as this would probably
> be better discussed in a dedicated thread.

I am not sure if we should change that, actually, as it does not feel
completely wrong to stack these errors.  That's a bit confusing,
sure.  Perhaps a new libpq API to retrieve stacked errors when we are
in pipeline mode would be more adapted?  The design would be
different.

Anyway, I've stared at the result processing code for a couple of
hours, and the branches we're taking for the pipeline modes seem to be
rather right the way you have implemented them.  The docs, comments
and tests needed quite a few tweaks and edits to be more consistent.
There were some grammar mistakes, some frenchisms.

I'm hoping that there won't be any issues, but let's be honest, I am
definitely sure there will be some more tuning required.  It comes
down to if we want this set of features, and I do to be able to expand
tests in core with the extended query protocol and pipelines, knowing
that there is an ask for out-of-core projects.  This one being
reachable with a COPY gave me a huge smile:
+message type 0x5a arrived from server while idle

So let's take one step here, I have applied the main patch.  I am
really excited by the possibilities all this stuff offers.

Attached are the remaining pieces, split here because they are
different bullet points:
- Tests for implicit transactions with various commands, with some
edits.
- Prompt support, with more edits.

I'm putting these on standby for a few days, to let the buildfarm
digest the main change.
--
Michael

Attachment