Thread: Re: Add Pipelining support in psql
On Wed, 27 Nov 2024 at 10:50, Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
> With \bind, \parse, \bind_named and \close, it is possible to issue
> queries from psql using the extended protocol. However, it wasn't
> possible to send those queries using pipelining and the only way to
> test pipelined queries was through pgbench's tap tests.

Big +1. Not being able to use psql for even the most basic pipeline tests has definitely been an annoyance of mine. I played around quickly with this patch and it works quite well.

A few things that would be nice improvements I think. Feel free to change the command names:

1. Add a \flush command that calls PQflush
2. Add a \flushrequest command that calls PQsendFlushRequest
3. Add a \getresult command so you can get a result from a query without having to close the pipeline

To be clear, not having those additional commands isn't a blocker for this patch imho, but I'd definitely miss them if they weren't there when I would be using it.

On Thu, 28 Nov 2024 at 07:43, Michael Paquier <michael@paquier.xyz> wrote:
> Hmm. The start, end and sync meta-commands are useful for testing. I
> find the flush control a bit less interesting, TBH.
>
> What would you use these for?

I guess mostly for interactively playing around with pipelining from psql. But I think \getresult would be useful for testing too. This would allow us to test that we can read part of the pipeline, without sending a sync and waiting for everything. To be clear \flushrequest and \flush would be necessary to make \getresult work reliably.

On Tue, 10 Dec 2024 at 11:44, Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
> num_queries (2nd element in the pipeline status prompt) is now used to
> track queued queries that were not flushed (with a flush request or
> sync) to the server. It used to count both unflushed queries and
> flushed queries.

I skimmed the code a bit, but haven't looked closely at it yet. I did try the patch out though. My thoughts below:

I think that new prompt is super useful, so useful in fact that I'd suggest linking to it from the \startpipeline docs. I do think that the wording in the docs could be a bit more precise:

1. The columns are not necessarily queries, they are messages or commands. i.e. \parse and \bind_named both count as 1, even though they form one query together.
2. messages not followed by \sync and \flushrequest, could very well already "be sent to the server" (if the client buffer got full, or in case of manual \flush). The main thing that \sync and \flushrequest do is make sure that the server actually sends its own result back, even if its buffer is not full yet.

The main feedback I have after playing around with this version is that I'd like to have a \getresult (without the s), to only get a single result. So that you can get results one by one, possibly interleaved with some other queries again.

One thing I'm wondering is how useful the num_syncs count is in the pipeline prompt, since you never really wait for a sync.

Regarding the usefulness of \flush. I agree it's not as useful as I thought, because indeed \getresults already flushes everything. But it's not completely useless either. The main way I was able to use it interactively in a somewhat interesting way was to send it after a long running query, and then while that's processing type up the next query after it.

Something like the following:

localhost jelte@postgres:5432-26274= #> \startpipeline
Time: 0.000 ms
localhost jelte@postgres:5432-26274= #|0,0,0|> select pg_sleep(5) \bind \g
Time: 0.000 ms
localhost jelte@postgres:5432-26274= #|0,1,0|*> \flush
Time: 0.000 ms
localhost jelte@postgres:5432-26274= #|0,1,0|*> select 1 \bind \g
Time: 0.000 ms
localhost jelte@postgres:5432-26274= #|0,2,0|*> \syncpipeline
Time: 0.000 ms
localhost jelte@postgres:5432-26274= #|1,0,2|*> \getresults
 pg_sleep
──────────

(1 row)

 ?column?
──────────
        1
(1 row)

Time: 0.348 ms

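The prompt values in the session above can be replayed with a small model. This is only a sketch of the counter arithmetic as the thread describes it, not psql's code: the class and method names are made up for illustration, and the behaviour of a flush request (moving queued commands to the result count without touching the sync count) is an assumption drawn from the description of num_queries.

```python
from dataclasses import dataclass


@dataclass
class PipelinePrompt:
    """Toy model of the |syncs,commands,results| prompt (hypothetical names)."""
    syncs: int = 0      # syncs sent and not yet consumed
    commands: int = 0   # commands queued since the last flush request or sync
    results: int = 0    # results the server has been asked to send back

    def send(self) -> None:
        # A query pushed with \bind \g is queued as one command.
        self.commands += 1

    def flush(self) -> None:
        # \flush only pushes the client buffer; the counters do not move.
        pass

    def flush_request(self) -> None:
        # Assumption: \flushrequest makes queued commands' results available.
        self.results += self.commands
        self.commands = 0

    def sync(self) -> None:
        # \syncpipeline counts a sync and makes queued results available.
        self.syncs += 1
        self.results += self.commands
        self.commands = 0

    def prompt(self) -> str:
        return f"|{self.syncs},{self.commands},{self.results}|"


def replay_session() -> list:
    """Replay the steps of the session: send, \\flush, send, \\syncpipeline."""
    p = PipelinePrompt()
    seen = [p.prompt()]  # prompt right after \startpipeline
    for step in (p.send, p.flush, p.send, p.sync):
        step()
        seen.append(p.prompt())
    return seen


if __name__ == "__main__":
    print(replay_session())
```

The model makes the point from item 2 visible: \flush leaves the prompt unchanged, while a sync (or, by assumption, a flush request) is what moves commands into the result column.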
On Tue, Feb 18, 2025 at 06:34:20PM +0100, Anthonin Bonnefoy wrote:
> On Tue, Feb 18, 2025 at 8:23 AM Michael Paquier <michael@paquier.xyz> wrote:
>> The tests in psql.sql are becoming really long. Perhaps it would be
>> better to split that into its own file, say psql_pipeline.sql? The
>> input file is already 2k lines, you are adding 15% more lines to that.
>
> Agreed, I wasn't sure if this was enough to warrant a dedicated test
> file. This is now separated in psql_pipeline.sql.

You have forgotten the expected output. Not a big issue as the input was sent.

>> What is the reasoning here behind this restriction? \gx is a wrapper
>> of \g with expanded mode on, but it is also possible to call \g with
>> expanded=on, bypassing this restriction.
>
> The issue is that \gx enables expanded mode for the duration of the
> query and immediately reset it in sendquery_cleanup. With pipelining,
> the command is piped and displaying is done by either \endpipeline or
> \getresults, so the flag change has no impact. Forbidding it was a way
> to make it clearer that it won't have the expected effect. If we
> wanted a similar feature, this would need to be done with something
> like \endpipelinex or \getresultsx.

Hmm, okay. If one wants one mode or the other it is always possible to force one with \pset expanded when getting the results. Not sure if there is any need for new specific commands for these two printing the results. Another option would be to authorize the command to run, but perhaps your option is just better as per the enforced behavior in the output. So fine by me. There is coverage so we'll know if there are arguments in favor of authorizing the command, if need be.

> I've split the patch and created the 3 special variables:
> PIPELINE_SYNC_COUNT, PIPELINE_COMMAND_COUNT, PIPELINE_RESULT_COUNT.

Thanks. Looks sensible now.

> For requested_results, I don't think there's value in exposing it
> since it is used as an exit condition and thus will always be 0
> outside of ExecQueryAndProcessResults.

I've been playing with this patch and this configuration:

\set PROMPT1 '=(pipeline=%P,sync=%:PIPELINE_SYNC_COUNT:,cmd=%:PIPELINE_COMMAND_COUNT:,res=%:PIPELINE_RESULT_COUNT:)%#'

That's long, but seeing the evolution of the pipeline status is pretty cool depending on the meta-commands used.

While testing, I have been able to run into an assertion failure by adding some tests in psql.sql to check for the case of inactive branches for \if. For example:

--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
        \echo arg1 arg2 arg3 arg4 arg5
        \echo arg1
        \encoding arg1
+       \endpipeline
        \errverbose

And the report:

+psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' failed.

We should have tests for all new six meta-commands in psql.sql. MainLoop() is wrong when in pipeline mode for inactive branches.

--
Michael

On Thu, Feb 20, 2025 at 9:02 AM Michael Paquier <michael@paquier.xyz> wrote:
> You have forgotten the expected output. Not a big issue as the input
> was sent.

I was writing the mail with the missing file when you sent this mail. This is fixed.

> While testing, I have been able to run into an assertion failure by
> adding some tests in psql.sql to check for the case of inactive
> branches for \if. For example:
> --- a/src/test/regress/sql/psql.sql
> +++ b/src/test/regress/sql/psql.sql
> @@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
>         \echo arg1 arg2 arg3 arg4 arg5
>         \echo arg1
>         \encoding arg1
> +       \endpipeline
>         \errverbose
>
> And the report:
> +psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' failed.
>
> We should have tests for all new six meta-commands in psql.sql.
> MainLoop() is wrong when in pipeline mode for inactive branches.

Ha yeah, I forgot about the inactive branches. I've added the new commands and fixed the behaviour.

A small issue I've noticed while testing: when a pipeline has at least one queued command, pqClearConnErrorState isn't called in PQsendQueryStart and errors are appended. For example:

\startpipeline
select 1 \bind \g
select 1;
PQsendQuery not allowed in pipeline mode
select 1;
PQsendQuery not allowed in pipeline mode
PQsendQuery not allowed in pipeline mode

This looks more like an issue on libpq's side as there's no way to reset or advance the errorReported from ExecQueryAndProcessResults (plus PQerrorMessage seems to ignore errorReported). I've added an additional test to track this behaviour for now as this would probably be better discussed in a dedicated thread.

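The stacking of error messages reported above can be reproduced with a toy model. This is a sketch under stated assumptions and not libpq's implementation: ToyConn and its methods are hypothetical stand-ins, and the only rule modelled is the one described in the mail, namely that the error buffer is cleared at the start of a send only when nothing is queued in the pipeline.

```python
class ToyConn:
    """Toy stand-in for a libpq connection; not the real PGconn."""

    def __init__(self):
        self.in_pipeline = False
        self.queued_commands = 0
        self.error_message = ""

    def send_query_start(self):
        # Modelled on the report: the error buffer is only cleared when
        # no command is queued in the pipeline.
        if self.queued_commands == 0:
            self.error_message = ""

    def send_query_params(self, query):
        # Extended-protocol send (what \bind \g uses): always accepted here.
        self.send_query_start()
        self.queued_commands += 1
        return True

    def send_query(self, query):
        # Simple-protocol send: rejected while in pipeline mode.
        self.send_query_start()
        if self.in_pipeline:
            self.error_message += "PQsendQuery not allowed in pipeline mode\n"
            return False
        return True


def demo(queue_one_first):
    """Return the error text seen after each of two rejected sends."""
    conn = ToyConn()
    conn.in_pipeline = True
    if queue_one_first:
        conn.send_query_params("select 1")
    seen = []
    for _ in range(2):
        conn.send_query("select 1;")
        seen.append(conn.error_message)
    return seen


if __name__ == "__main__":
    for text in demo(queue_one_first=True):
        print(text, end="")
```

With one command queued, the second rejected send reports the message twice, as in the transcript; with an empty pipeline the buffer is reset each time and only one line is shown.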
On Thu, Feb 20, 2025 at 10:29:33AM +0100, Anthonin Bonnefoy wrote:
> Ha yeah, I forgot about the inactive branches. I've added the new
> commands and fixed the behaviour.

And I did not notice that it was as simple as forcing the status in the routines for the new meta-commands, as we do for the existing ones. Noted.

> This looks more like an issue on libpq's side as there's no way to
> reset or advance the errorReported from ExecQueryAndProcessResults
> (plus PQerrorMessage seems to ignore errorReported). I've added an
> additional test to track this behaviour for now as this would probably
> be better discussed in a dedicated thread.

I am not sure if we should change that, actually, as it does not feel completely wrong to stack these errors. That's a bit confusing, sure. Perhaps a new libpq API to retrieve stacked errors when we are in pipeline mode would be more adapted? The design would be different.

Anyway, I've stared at the result processing code for a couple of hours, and the branches we're taking for the pipeline modes seem to be rather right the way you have implemented them. The docs, comments and tests needed quite a few tweaks and edits to be more consistent. There were some grammar mistakes, some frenchisms.

I'm hoping that there won't be any issues, but let's be honest, I am definitely sure there will be some more tuning required. It comes down to if we want this set of features, and I do to be able to expand tests in core with the extended query protocol and pipelines, knowing that there is an ask for out-of-core projects. This one being reachable with a COPY gave me a huge smile:

+message type 0x5a arrived from server while idle

So let's take one step here, I have applied the main patch. I am really excited by the possibilities all this stuff offers.

Attached are the remaining pieces, split here because they are different bullet points:
- Tests for implicit transactions with various commands, with some edits.
- Prompt support, with more edits.

I'm putting these on standby for a few days, to let the buildfarm digest the main change.

--
Michael

On Fri, Feb 21, 2025 at 11:33:41AM +0900, Michael Paquier wrote:
> Attached are the remaining pieces, split here because they are
> different bullet points:
> - Tests for implicit transactions with various commands, with some
> edits.
> - Prompt support, with more edits.
>
> I'm putting these on standby for a few days, to let the buildfarm
> digest the main change.

Initial digestion has gone well. The remaining pieces have been done as 3ce357584e79 and a4e986ef5a46. For the prompt part, I have added a couple of tests with \echo and the variables. The patch felt incomplete without these. Perhaps we could extend them more, at least we have a start point.

--
Michael

On Tue, Feb 25, 2025 at 2:11 AM Michael Paquier <michael@paquier.xyz> wrote:
> For the prompt part, I have added a
> couple of tests with \echo and the variables. The patch felt
> incomplete without these. Perhaps we could extend them more, at least
> we have a start point.

Good catch. I guess that's also another benefit of having special variables, as it makes it easier to check the pipeline state.

On Tue, 25 Feb 2025 at 02:11, Michael Paquier <michael@paquier.xyz> wrote:
> Initial digestion has gone well.

One thing I've noticed is that \startpipeline throws warnings when copy pasting multiple lines. It seems to still execute everything as expected though. As an example you can copy paste this tiny script:

\startpipeline
select pg_sleep(5) \bind \g
\endpipeline

And then it will show these "extra argument ... ignored" warnings

\startpipeline: extra argument "select" ignored
\startpipeline: extra argument "pg_sleep(5)" ignored

On Thu, Mar 6, 2025 at 5:20 AM Michael Paquier <michael@paquier.xyz> wrote:
> That was not a test case we had in mind originally here, but if it is
> possible to keep the implementation simple while supporting your
> demand, well, let's do it. If it's not that straight-forward, let's
> use the new meta-command, forbidding \g and \gx based on your
> arguments from upthread.

I think the new meta-command is a separate issue from allowing ';' to push in a pipeline. Any time there's a change or an additional format option added to \g, it will need to be forbidden for pipelining. The \sendpipeline meta-command will help keep those exceptions low since the whole \g will be forbidden.

Another possible option would be to allow both \g and \gx, but send a warning like "printing options within a pipeline will be ignored" if those options are used, similar to "SET LOCAL" warning when done outside of a transaction block. That would have the benefit of making existing scripts using \g and \gx compatible.

For using ';' to push commands in a pipeline, I think it should be fairly straightforward. I can try to work on that next week (I'm currently chasing a weird memory context bug that I need to finish first).

On Fri, Mar 7, 2025 at 1:05 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> One thing I've noticed is that \startpipeline throws warnings when
> copy pasting multiple lines. It seems to still execute everything as
> expected though. As an example you can copy paste this tiny script:
>
> \startpipeline
> select pg_sleep(5) \bind \g
> \endpipeline
>
> And then it will show these "extra argument ... ignored" warnings
>
> \startpipeline: extra argument "select" ignored
> \startpipeline: extra argument "pg_sleep(5)" ignored

It looks like an issue with libreadline. At least, I've been able to reproduce the warnings and 'readline(prompt);' returns everything as a single line, with the \n inside the string. This explains why what is after \startpipeline is processed as arguments.

This can also be done with:

select 1 \bind \g
select 2 \bind \g

And somehow, I couldn't reproduce the issue anymore once I've compiled and installed libreadline with debug symbols.

Jelte Fennema-Nio wrote:
> As an example you can copy paste this tiny script:
>
> \startpipeline
> select pg_sleep(5) \bind \g
> \endpipeline
>
> And then it will show these "extra argument ... ignored" warnings
>
> \startpipeline: extra argument "select" ignored
> \startpipeline: extra argument "pg_sleep(5)" ignored

It happens with other metacommands as well, and appears to depend on a readline option that is "on" by default since readline-8.1 [1]

  enable-bracketed-paste

    When set to ‘On’, Readline configures the terminal to insert each
    paste into the editing buffer as a single string of characters,
    instead of treating each character as if it had been read from the
    keyboard. This is called putting the terminal into bracketed paste
    mode; it prevents Readline from executing any editing commands
    bound to key sequences appearing in the pasted text. The default
    is ‘On’.

This behavior of the metacommand complaining about arguments on the next line also happens if using \e and typing this sequence of commands in the editor. In that case readline is not involved.

There might be something to improve here, because a metacommand cannot take its argument from the next line, and yet that's what the error messages somewhat imply. But that issue is not related to the new pipeline metacommands.

[1] https://tiswww.case.edu/php/chet/readline/readline.html#index-enable_002dbracketed_002dpaste

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/

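The effect described above, where a buffer with embedded newlines makes a metacommand pick up words from the next line, can be imitated in a few lines. This is a rough sketch and not psql's lexer: parse_meta_command is a hypothetical helper, and the rule that arguments run until the next backslash token is an assumption inferred from the two warnings shown.

```python
def parse_meta_command(buffer):
    """Split one input buffer into a metacommand and its extra arguments.

    Assumed rule: a newline counts as ordinary whitespace, and arguments
    run until the next token that starts with a backslash.
    """
    tokens = buffer.split()  # str.split() treats a newline like a space
    command, extra = tokens[0], []
    for tok in tokens[1:]:
        if tok.startswith("\\"):  # the next metacommand starts here
            break
        extra.append(tok)
    return command, extra


# What a bracketed paste (or \e) hands over: one string with newlines inside.
pasted = "\\startpipeline\nselect pg_sleep(5) \\bind \\g\n\\endpipeline"

if __name__ == "__main__":
    command, extra = parse_meta_command(pasted)
    for arg in extra:
        print(f'{command}: extra argument "{arg}" ignored')
```

Feeding the same script one line at a time gives \startpipeline no arguments at all, which is why typing the lines by hand does not trigger the warnings.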
Here is a new patch set:

0001: This introduces the \sendpipeline meta-command and forbids \g in a pipeline. This is to fix the formatting options of \g that are not supported in a pipeline.

0002: Allows ';' to send a query using the extended protocol when within a pipeline by using PQsendQueryParams with 0 parameters. It is not possible to send parameters with the extended protocol this way and everything will be propagated through the query string, similar to a simple query.

On Mon, Mar 17, 2025 at 10:50:50AM +0100, Anthonin Bonnefoy wrote:
> 0001: This introduces the \sendpipeline meta-command and forbid \g in
> a pipeline. This is to fix the formatting options of \g that are not
> supported in a pipeline.
>
> 0002: Allows ';' to send a query using extended protocol when within a
> pipeline by using PQsendQueryParams with 0 parameters. It is not
> possible to send parameters with extended protocol this way and
> everything will be propagated through the query string, similar to a
> simple query.

Thanks for sending a new patch set. I was planning to look at the situation tomorrow, and you have beaten me to it.

The split makes sense, and I'm OK with 0001. 0002 is going to require a much closer lookup.

--
Michael

On Mon, Mar 17, 2025 at 10:50:50AM +0100, Anthonin Bonnefoy wrote:
> 0001: This introduces the \sendpipeline meta-command and forbid \g in
> a pipeline. This is to fix the formatting options of \g that are not
> supported in a pipeline.

- count
--------
-     4
-(1 row)

This removal done in the regression tests was not intentional.

I have done some reordering of the code around the new meta-command so as things are ordered alphabetically, and applied the result.

> 0002: Allows ';' to send a query using extended protocol when within a
> pipeline by using PQsendQueryParams with 0 parameters. It is not
> possible to send parameters with extended protocol this way and
> everything will be propagated through the query string, similar to a
> simple query.

I like the simplicity of what you are doing here, relying on PSQL_SEND_QUERY being the default so as we use PQsendQueryParams() with no parameters rather than PQsendQuery() when the pipeline mode is not off.

How about adding a check on PIPELINE_COMMAND_COUNT when sending a query through this path? Should we check for more scenarios with syncs and flushes as well when sending these queries?

--
Michael

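The dispatch discussed for 0002 can be summarised as a small decision function. This is a sketch of the rule as stated in the thread, not psql's code: choose_send_call is a hypothetical name, it returns libpq function names as plain strings without calling libpq, and the \bind branch is an assumption based on how \bind is described elsewhere in the thread.

```python
def choose_send_call(in_pipeline, bind_params=None):
    """Return (libpq call name, parameter count) for one query."""
    if bind_params is not None:
        # \bind was used: extended protocol with the bound values.
        return ("PQsendQueryParams", len(bind_params))
    if in_pipeline:
        # Plain ';' inside a pipeline: extended protocol, zero parameters.
        return ("PQsendQueryParams", 0)
    # Plain ';' outside a pipeline: simple query protocol.
    return ("PQsendQuery", 0)


if __name__ == "__main__":
    print(choose_send_call(in_pipeline=False))
    print(choose_send_call(in_pipeline=True))
    print(choose_send_call(in_pipeline=True, bind_params=["foo"]))
```

The middle branch is the one 0002 adds: no new send mode is needed, only a different libpq call when the pipeline is active.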
On Tue, Mar 18, 2025 at 1:50 AM Michael Paquier <michael@paquier.xyz> wrote:
> - count
> --------
> -     4
> -(1 row)
>
> This removal done in the regression tests was not intentional.

Yes, thanks for fixing that.

> How about adding a check on PIPELINE_COMMAND_COUNT when sending a
> query through this path? Should we check for more scenarios with
> syncs and flushes as well when sending these queries?

I've added additional tests when piping queries with ';':
- I've reused the same scenario with \sendpipeline: single query, multiple queries, flushes, syncs, using COPY...
- Using ';' will replace the unnamed prepared statement. It's a bit different from expected as a simple query will delete the unnamed prepared statement.
- Sending an extended query prepared with \bind using a ';' on a newline, though this is not specific to pipelining. The scanned semicolon triggers the call to SendQuery, processing the buffered extended query. It's a bit unusual but that's the current behaviour.

On Tue, 18 Mar 2025 at 09:55, Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
> I've added additional tests when piping queries with ';':
> - I've reused the same scenario with \sendpipeline: single query,
> multiple queries, flushes, syncs, using COPY...
> - Using ';' will replace the unnamed prepared statement. It's a bit
> different from expected as a simple query will delete the unnamed
> prepared statement.
> - Sending an extended query prepared with \bind using a ';' on a
> newline, though this is not specific to pipelining. The scanned
> semicolon triggers the call to SendQuery, processing the buffered
> extended query. It's a bit unusual but that's the current behaviour.

One thing that comes to mind that I think would be quite useful and pretty easy to implement if we have this functionality within a pipeline: An \extended command. That puts psql in "extended protocol mode" (without enabling pipelining). In "extended protocol mode" all queries would automatically be sent using PQsendQueryParams. That would remove the need to use \bind anymore outside of a pipeline either.

Anthonin Bonnefoy wrote:
> 0002: Allows ';' to send a query using extended protocol when within a
> pipeline by using PQsendQueryParams

It's a nice improvement!

> with 0 parameters. It is not
> possible to send parameters with extended protocol this way and
> everything will be propagated through the query string, similar to a
> simple query.

It's actually possible to use parameters

\startpipeline
\bind 'foo'
select $1;
\endpipeline

 ?column?
----------
 foo
(1 row)

I suspect there's a misunderstanding that \bind can only be placed after the query, because it's always written like that in the regression tests, and in the documentation. But that's just a notational preference.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/

On Tue, Mar 18, 2025 at 10:36:28AM +0100, Daniel Verite wrote:
> It's actually possible to use parameters
>
> \startpipeline
> \bind 'foo'
> select $1;
> \endpipeline
>
>  ?column?
> ----------
>  foo
> (1 row)
>
> I suspect there's a misunderstanding that \bind can only be placed
> after the query, because it's always written like that in the regression
> tests, and in the documentation.
> But that's just a notational preference.

Nice trick, unrelated to pipelines. I don't think that we have anything testing this specific pattern for \bind. At a quick glance all our tests use \bind at the end of a query string. Perhaps we should?

--
Michael

On Tue, Mar 18, 2025 at 10:27:38AM +0100, Jelte Fennema-Nio wrote:
> One thing that comes to mind that I think would be quite useful and
> pretty easy to implement if we have this functionality within a
> pipeline: An \extended command. That puts psql in "extended protocol
> mode" (without enabling pipelining). In "extended protocol mode" all
> queries would automatically be sent using PQsendQueryParams. That
> would remove the need to use \bind anymore outside of a pipeline
> either.

How does that help when passing parameter values? \bind is here to be able to pass down parameter values to queries that are prepared, so we cannot bypass it as the parameter values need to be passed to the \bind meta-command itself.

Perhaps an \extended command that behaves outside a pipeline makes sense to force the use of queries without parameters to use the extended mode, but I cannot get much excited about the concept knowing all the meta-commands we have now (not talking about the pipeline part, which is different, as we can treat queries in batches).

--
Michael

On Tue, Mar 18, 2025 at 09:55:21AM +0100, Anthonin Bonnefoy wrote:
> I've added additional tests when piping queries with ';':
> - I've reused the same scenario with \sendpipeline: single query,
> multiple queries, flushes, syncs, using COPY...
> - Using ';' will replace the unnamed prepared statement. It's a bit
> different from expected as a simple query will delete the unnamed
> prepared statement.
> - Sending an extended query prepared with \bind using a ';' on a
> newline, though this is not specific to pipelining. The scanned
> semicolon triggers the call to SendQuery, processing the buffered
> extended query. It's a bit unusual but that's the current behaviour.

The tests could be much more organized, particularly for the "simple" and "multiple" and COPY cases, rather than being treated as two different groups at different locations of psql_pipeline.sql. I've spent some time reorganizing all that.

A second thing that was a bit itchy is the use of ";" for what's a semicolon, and we use this term in the psql docs to refer to queries terminated by that. The whole paragraph could be simplified a bit more, mentioning that everything in a pipeline uses the extended protocol, while \bind & co are more like options. The description of PIPELINE_COMMAND_COUNT could be simpler, and the part about the pending results can be more general now so I've removed it.

With all that set, I've applied the patch. If you have more suggestions, please feel free to mention them.

--
Michael

Michael Paquier wrote:
> Perhaps an \extended command that behaves outside a pipeline makes
> sense to force the use of queries without parameters to use the
> extended mode, but I cannot get much excited about the concept knowing
> all the meta-commands we have now (not talking about the pipeline
> part, which is different, as we can treat queries in batches).

When psql started supporting the extended query protocol, the question of enabling it more globally rather than query-by-query was discussed a bit [1]. The idea was to switch to it with a setting or a variable rather than a metacommand. Some pros and cons were mentioned in the thread, but on the whole it was not convincing enough to get implemented.

[1] https://www.postgresql.org/message-id/e8dd1cd5-0e04-3598-0518-a605159fe314%40enterprisedb.com

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/