Thread: Re: Add Pipelining support in psql

Re: Add Pipelining support in psql

From
"Daniel Verite"
Date:
    Anthonin Bonnefoy wrote:

> > What is the reasoning here behind this restriction?  \gx is a wrapper
> > of \g with expanded mode on, but it is also possible to call \g with
> > expanded=on, bypassing this restriction.
>
The issue is that \gx enables expanded mode for the duration of the
query and immediately resets it in sendquery_cleanup. With pipelining,
the command is piped and the display is done by either \endpipeline or
\getresults, so the flag change has no impact. Forbidding it was a way
to make it clearer that it won't have the expected effect.

But it's not just \gx.

The following invocations don't respect the desired output destination
and format (the options are silently ignored) when a pipeline is active:

select ... \bind \g filename
select ... \bind \g |program
select ... \bind \g (format=unaligned tuples_only=on)

Just like for \gx, the problem is that in a pipeline, sending the query
is not followed by getting the results, and the output properties
of a query are lost in between.
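To make the symptom concrete, here is a minimal sketch of a session
hitting this (the file path is illustrative):

```psql
\startpipeline
SELECT 1 \bind \g /tmp/out.txt
\flushrequest
\getresults
\endpipeline
```

The filename given to \g is recorded when the query is pushed, but by
the time \getresults prints the result, that destination has been
lost, so the result ends up on stdout instead of in /tmp/out.txt.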


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/



Re: Add Pipelining support in psql

From
Michael Paquier
Date:
(My previous message did not reach the lists, so re-sending with some
edits.)

On Fri, Feb 28, 2025 at 05:31:00PM +0100, Daniel Verite wrote:
> The following invocations don't respect the desired output destination
> and formats (ignoring them), when a pipeline is active:
>
> select ... \bind \g filename
> select ... \bind \g |program
> select ... \bind \g (format=unaligned tuples_only=on)
>
> Just like for \gx the problem is that in a pipeline, sending the query
> is not followed by getting the results, and the output properties
> of a query are lost in between.

Right.  I completely forgot that these options could be applied with a
simple \g.  With the results being decoupled from the execution, one
can argue that the options defined at the moment when the query is
sent should command how the results are shaped when retrieving a batch
with \getresults during a pipeline.  However, this means that we would
need to save the set of options from printQueryOpt when running the
query, depending on the number of results we expect, and reapply them
in the order of the expected results.  We have the APIs to do that,
with savePsetInfo() and restorePsetInfo().

Anyway, can we really say that the set of printQueryOpt saved at
execution is the correct set to use?  It is possible to make the
opposite argument and say that we should just apply the printQueryOpt
at the moment where \getresults is run.  A benefit of that is to keep
the loop retrieving results simpler in ExecQueryAndProcessResults(),
because we pass down to this call *one* printQueryOpt as "opt".

There's of course the simplicity argument, which I like a lot here,
but applying the printing options at the time of \getresults also
feels more natural to me.

FWIW, I agree that HEAD is unbalanced with its handling of \gx, so we
could do one of the following two things:
1) Ignore any formatting options given to \g, but also allow \gx to
run, documenting that during a pipeline the formatting options are
ignored, and that the set of options defined when doing a \getresults
is the only thing that matters.
2) Invent a new meta-command (suggested by Daniel Verite off-list, and
not published to the lists because I don't know how to do a
reply-all), like a \send or a \push, forbidding \gx and \g entirely
when in a pipeline.  If we were to integrate options into this new
meta-command, a split with \g would make an integration easier to
think about.  One name suggestion I can come up with is \sendpipeline.

I'd consider option 2, based on Daniel's concerns.  One line I'm going
to draw here is that we should not go down to manipulations of
printQueryOpt while retrieving batches of results depending on the
style of output that was defined when sending a query in a pipeline.

Anthonin, as the primary author, any thoughts?
--
Michael


Re: Add Pipelining support in psql

From
Anthonin Bonnefoy
Date:
On Tue, Mar 4, 2025 at 1:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> Anyway, can we really say that the set of printQueryOpt saved at
> execution is the correct set to use?  It is possible to make the
> opposite argument and say that we should just apply the printQueryOpt
> at the moment where \getresults is run.  A benefit of that is to keep
> the loop retrieving results simpler in ExecQueryAndProcessResults(),
> because we pass down to this call *one* printQueryOpt as "opt".
>
> There's of course the simplicity argument that I do like a lot here,
> but applying the printing options at the time of \getresult feels also
> more natural to me.

Saving the printQueryOpt when a command is pushed was an option I had
in mind, if it was straightforward to implement. However, even with
savePsetInfo(), you would need to save values like gfname and
gset_prefix since they impact the output (it may make sense to move
those into printQueryOpt). This would also need to be done for all
commands, like \close or \parse, since we don't distinguish whether a
piped command generates output or not. So that definitely looks like
it would add a lot of complexity for limited benefit.

> FWIW, I agree that HEAD is unbalanced with its handling of \gx, so we
> could do one of the following two things:
> 1) Ignore any formatting options given to \g, but also allow \gx to
> run, documenting that during a pipeline the formatting options are
> ignored, and that the set of options defined when doing a \getresults
> is the only thing that matters.
> 2) Invent a new meta-command (suggested by Daniel Verite off-list, and
> not published to the lists because I don't know how to do a
> reply-all), like a \send or a \push, forbidding \gx and \g entirely
> when in a pipeline.  If we were to integrate options into this new
> meta-command, a split with \g would make an integration easier to
> think about.  One name suggestion I can come up with is \sendpipeline.
>
> I'd consider option 2, based on Daniel's concerns.  One line I'm going
> to draw here is that we should not go down to manipulations of
> printQueryOpt while retrieving batches of results depending on the
> style of output that was defined when sending a query in a pipeline.

Another possible option would be to directly send the command without
requiring an additional meta-command, like "SELECT 1 \bind". However,
this would make it more painful to introduce new parameters, plus it
would make \bind and \bind_named inconsistent, as it is normally
required to send the result with a separate meta-command.

I like the \sendpipeline option. It makes it clearer that formatting
options are not applicable within a pipeline (at least, in the current
implementation), and I think it would make more sense to have those
formatting options on \getresults or \endpipeline.
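A pipelined script using such a meta-command might look like this
(hypothetical syntax, since \sendpipeline is only being proposed at
this point):

```psql
\startpipeline
SELECT $1 \bind 1 \sendpipeline
SELECT $1 \bind 2 \sendpipeline
\flushrequest
\getresults
\endpipeline
```

Here \sendpipeline only pushes the query; whatever formatting is in
effect when \getresults runs decides how the results are printed.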

I took a stab at creating the \sendpipeline meta-command. I've also
realised there's a small leak where fname is currently not freed for
queries like 'select ... \bind \gx filename' within a pipeline, which
is fixed in patch 0001.


Re: Add Pipelining support in psql

From
"Daniel Verite"
Date:
    Anthonin Bonnefoy wrote:

> Another possible option would be to directly send the command without
> requiring an additional meta-command, like "SELECT 1 \bind". However,
> this would make it more painful to introduce new parameters, plus it
> makes the \bind and \bind_named inconsistent as it is normally
> required to send the result with a separate meta-command.

AFAIU the reason why \bind is required (even when there are no $N
parameters in the query) is to trigger the use of the extended query
protocol. This pre-dates the pipeline feature.

But when in a pipeline, we can't send queries with the simple query
protocol anyway, so a possible usability improvement would be to make
\bind optional in that case.

Concretely, it's not possible currently to issue:
  \startpipeline
  select 1;
It causes the error: "PQsendQuery not allowed in pipeline mode",

whereas this sequence does work:
  \startpipeline
  \bind
  select 1;
  \flushrequest
  \getresults

But if the code triggered the use of the extended query protocol
when \bind is in effect *or* a pipeline is active, then the first
sequence would just push "select 1" into the pipeline.

This would have the advantage that, to submit a pre-existing file of
SQL commands separated with ";" into a pipeline, you don't have to
pre-process it to inject meta-commands. Adding a \startpipeline at the
beginning and an \endpipeline at the end would be sufficient in cases
where the user does not need the results before the end.

\sendpipeline would not be mandatory when ";" can be used to terminate
the queries. But it makes it clearer that the script specifically
wants to push into a pipeline, and it might accept specific options in
the future, whereas obviously ";" cannot.
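For instance, under the behavior proposed here, a pre-existing
";"-terminated script could be pipelined just by wrapping it (table
and values are illustrative):

```psql
\startpipeline
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (2);
SELECT count(*) FROM t;
\endpipeline
```

No per-query meta-commands are needed; all results come back when the
pipeline ends.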


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/



Re: Add Pipelining support in psql

From
Anthonin Bonnefoy
Date:
On Tue, Mar 4, 2025 at 1:32 PM Daniel Verite <daniel@manitou-mail.org> wrote:
> But if the code triggered the use of the extended query protocol
> when \bind is in effect *or* a pipeline is active, then the first
> sequence would just push "select 1" into the pipeline.
>
> This would have the advantage that, to submit a pre-existing file of
> SQL commands separated with ";" into a pipeline, you don't have to
> pre-process it to inject meta-commands. Adding a \startpipeline at the
> beginning and an \endpipeline at the end would be sufficient in cases
> where the user does not need the results before the end.
>
> \sendpipeline would not be mandatory when ";" can be used to terminate
> the queries. But it makes it clearer that the script specifically
> wants to push into a pipeline, and it might accept specific options in
> the future, whereas obviously ";" cannot.

So if I understand correctly, you want to automatically convert a
simple query into an extended query when we're within a pipeline. That
would be doable with:

--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1668,7 +1668,16 @@ ExecQueryAndProcessResults(const char *query,
                        }
                        break;
                case PSQL_SEND_QUERY:
-                       success = PQsendQuery(pset.db, query);
+                       if (PQpipelineStatus(pset.db) != PQ_PIPELINE_OFF) {
+                               success = PQsendQueryParams(pset.db, query,
+                                               pset.bind_nparams, NULL,
+                                               (const char *const *) pset.bind_params,
+                                               NULL, NULL, 0);
+                               if (success)
+                                       pset.piped_commands++;
+                       }
+                       else
+                               success = PQsendQuery(pset.db, query);
                        break;
        }

I do see the idea of making it easier to convert existing scripts to
use pipelining. The main focus of the initial implementation was more
on protocol regression tests with psql, so that's not necessarily
something I had in mind. I have some reservations, as it will push all
parameters in the query string, which may not be the desired
behaviour. But on the other hand, if it is to convert existing psql
scripts, then everything was already pushed as simple queries. Plus,
this is similar to what pgbench does when using -Mextended or
-Mprepared.



Re: Add Pipelining support in psql

From
Michael Paquier
Date:
On Tue, Mar 04, 2025 at 10:31:46AM +0100, Anthonin Bonnefoy wrote:
> Saving the printQueryOpt when a command is pushed was an option I had
> in mind, if it was straightforward to implement. However, even with
> savePsetInfo(), you would need to save values like gfname and
> gset_prefix since they impact the output (it may make sense to move
> those into printQueryOpt). This would also need to be done for all
> commands, like \close or \parse, since we don't distinguish whether a
> piped command generates output or not. So that definitely looks like
> it would add a lot of complexity for limited benefit.

Yeah, same opinion here.  I don't want this level of complexity, with
extra manipulations of printQueryOpt when fetching the results,
either.  I'm all for making these meta-commands do what we think is
more natural, but not at the cost of more complex logic in the result
printing depending on what's been given by a meta-command when a query
is pushed to a pipeline.

> I took a stab at creating the \sendpipeline meta-command. I've also
> realised there's a small leak where fname is currently not freed on
> queries like 'select ... \bind \gx filename' when within a pipeline,
> which is fixed in patch 0001.

Indeed.  Fixed this one for now.
--
Michael


Re: Add Pipelining support in psql

From
Michael Paquier
Date:
On Tue, Mar 04, 2025 at 06:37:09PM +0100, Anthonin Bonnefoy wrote:
> I do see the idea to make it easier to convert existing scripts into
> using pipelining. The main focus of the initial implementation was
> more on protocol regression tests with psql, so that's not necessarily
> something I had in mind. I have some reservation as it will push all
> parameters in the query string which may not be the desired behaviour.
> But on the other hand, if it is to convert existing psql scripts, then
> everything was already pushed as simple queries. Plus, this is similar
> to what pgbench is doing when using -Mextended or -Mprepared.

Hmm.  Simplicity is tempting here because we know the status of the
pipeline when sending the query.  If we do something like what you are
suggesting here, do we actually need \sendpipeline at all?  We should
still prevent \g, \gx and others from running in a pipeline because of
the format argument raised by Daniel, so as to restrict the use of
meta-commands that can manipulate the output format, right?
--
Michael


Re: Add Pipelining support in psql

From
"Daniel Verite"
Date:
    Anthonin Bonnefoy wrote:

> So if I understand correctly, you want to automatically convert a
> simple query into an extended query when we're within a pipeline. That
> would be doable with:
>
> --- a/src/bin/psql/common.c
> +++ b/src/bin/psql/common.c
> @@ -1668,7 +1668,16 @@ ExecQueryAndProcessResults(const char *query,
>                        }
>                        break;
>                case PSQL_SEND_QUERY:
> -                       success = PQsendQuery(pset.db, query);
> +                       if (PQpipelineStatus(pset.db) != PQ_PIPELINE_OFF) {
> +                               success = PQsendQueryParams(pset.db, query,
> +                                               pset.bind_nparams, NULL,
> +                                               (const char *const *) pset.bind_params,
> +                                               NULL, NULL, 0);
> +                               if (success)
> +                                       pset.piped_commands++;
> +                       }
> +                       else
> +                               success = PQsendQuery(pset.db, query);
>                        break;
>        }

Yes, except that the bind parameters need to be cleared after this, as
done in clean_extended_state().

> I do see the idea to make it easier to convert existing scripts into
> using pipelining. The main focus of the initial implementation was
> more on protocol regression tests with psql, so that's not necessarily
> something I had in mind.

Understood. Yet pipelining can considerably accelerate certain scripts
when client-server latency is an issue. We should expect end users to
benefit from it too.

> I have some reservation as it will push all
> parameters in the query string which may not be the desired
> behaviour.

I don't follow. For me, the change discussed here is about simplifying
the syntax when there are no out-of-query $N-style parameters; it does
not change anything for queries that actually use them, nor does it
forbid a \bind without parameters.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/



Re: Add Pipelining support in psql

From
Michael Paquier
Date:
On Wed, Mar 05, 2025 at 03:25:12PM +0100, Daniel Verite wrote:
>     Anthonin Bonnefoy wrote:
>> I do see the idea to make it easier to convert existing scripts into
>> using pipelining. The main focus of the initial implementation was
>> more on protocol regression tests with psql, so that's not necessarily
>> something I had in mind.
>
> Understood. Yet pipelining can accelerate considerably certain scripts
> when client-server latency is an issue. We should expect end users to
> benefit from it too.

That was not a test case we had in mind originally here, but if it is
possible to keep the implementation simple while supporting your
request, well, let's do it.  If it's not that straightforward, let's
use the new meta-command, forbidding \g and \gx based on your
arguments from upthread.
--
Michael


Re: Add Pipelining support in psql

From
"a.kozhemyakin"
Date:
Hello,

After commit 2cce0fe on master, when executing this query:
psql postgres <<EOF
CREATE TABLE psql_pipeline();
\startpipeline
COPY psql_pipeline FROM STDIN;
SELECT 'val1';
\syncpipeline
\getresults
EOF


ERROR:  unexpected message type 0x50 during COPY from stdin
CONTEXT:  COPY psql_pipeline, line 1
Pipeline aborted, command did not run
psql: common.c:1510: discardAbortedPipelineResults: Assertion `res == ((void *)0) || result_status == PGRES_PIPELINE_ABORTED' failed.
Aborted (core dumped)


psql crashes with the following stack trace:
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000760edd24527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x0000760edd2288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000760edd22881b in __assert_fail_base (fmt=0x760edd3d01e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5ba46ab79850 "res == ((void *)0) || result_status == PGRES_PIPELINE_ABORTED", file=file@entry=0x5ba46ab6fcad "common.c", line=line@entry=1510, function=function@entry=0x5ba46ab9c780 <__PRETTY_FUNCTION__.3> "discardAbortedPipelineResults") at ./assert/assert.c:96
#6  0x0000760edd23b517 in __assert_fail (assertion=assertion@entry=0x5ba46ab79850 "res == ((void *)0) || result_status == PGRES_PIPELINE_ABORTED", file=file@entry=0x5ba46ab6fcad "common.c", line=line@entry=1510, function=function@entry=0x5ba46ab9c780 <__PRETTY_FUNCTION__.3> "discardAbortedPipelineResults") at ./assert/assert.c:105
#7  0x00005ba46ab2bd40 in discardAbortedPipelineResults () at common.c:1510
#8  ExecQueryAndProcessResults (query=query@entry=0x5ba4a2ec1e10 "SELECT 'val1';", elapsed_msec=elapsed_msec@entry=0x7ffeb07262a8, svpt_gone_p=svpt_gone_p@entry=0x7ffeb07262a7, is_watch=is_watch@entry=false, min_rows=min_rows@entry=0, opt=opt@entry=0x0, printQueryFout=0x0) at common.c:1811
#9  0x00005ba46ab2983f in SendQuery (query=0x5ba4a2ec1e10 "SELECT 'val1';") at common.c:1212
#10 0x00005ba46ab3f66a in MainLoop (source=source@entry=0x760edd4038e0 <_IO_2_1_stdin_>) at mainloop.c:515
#11 0x00005ba46ab23f2a in process_file (filename=0x0, use_relative_path=use_relative_path@entry=false) at command.c:4870
#12 0x00005ba46ab1e9d9 in main (argc=<optimized out>, argv=0x7ffeb07269d8) at startup.c:420




On 06.03.2025 11:20, Michael Paquier wrote:
> On Wed, Mar 05, 2025 at 03:25:12PM +0100, Daniel Verite wrote:
>>     Anthonin Bonnefoy wrote:
>>> I do see the idea to make it easier to convert existing scripts into
>>> using pipelining. The main focus of the initial implementation was
>>> more on protocol regression tests with psql, so that's not necessarily
>>> something I had in mind.
>> Understood. Yet pipelining can accelerate considerably certain scripts
>> when client-server latency is an issue. We should expect end users to
>> benefit from it too.
> That was not a test case we had in mind originally here, but if it is
> possible to keep the implementation simple while supporting your
> demand, well, let's do it.  If it's not that straight-forward, let's
> use the new meta-command, forbidding \g and \gx based on your
> arguments from upthread.
> --
> Michael



Re: Add Pipelining support in psql

From
Michael Paquier
Date:
On Wed, Apr 16, 2025 at 09:31:59PM +0700, a.kozhemyakin wrote:
> After commit 2cce0fe on master
>
> ERROR:  unexpected message type 0x50 during COPY from stdin
> CONTEXT:  COPY psql_pipeline, line 1
> Pipeline aborted, command did not run
> psql: common.c:1510: discardAbortedPipelineResults: Assertion `res == ((void *)0) || result_status == PGRES_PIPELINE_ABORTED' failed.
> Aborted (core dumped)

Reproduced here, thanks for the report.  I'll look at that at the
beginning of next week, adding an open item for now.
--
Michael
