Thread: Make psql ignore trailing semicolons in \sf, \ef, etc
We had a complaint (see [1], but it's not the first IIRC) about how psql doesn't behave very nicely if one ends \sf or allied commands with a semicolon: regression=# \sf sin(float8); ERROR: expected a right parenthesis This is a bit of a usability gotcha, since many other backslash commands are forgiving about trailing semicolons. I looked at the code and found that it's actually trying to ignore semicolons, by passing semicolon = true to psql_scan_slash_option. But that fails to work because it's also passing type = OT_WHOLE_LINE, and the whole-line code path ignores the semicolon flag. Probably that's just because nobody needed to use that combination back in the day. There's another user of OT_WHOLE_LINE, exec_command_help, which also wants this behavior and has written its own stripping code to get it. That seems pretty silly, so here's a quick finger exercise to move that logic into psql_scan_slash_option. Is this enough of a bug to deserve back-patching? I'm not sure. regards, tom lane [1] https://www.postgresql.org/message-id/CAEs%3D6D%3DnwX2wm0hjkaw6C_LnqR%2BNFtnnzbSzeZq-xcfi_ooKSw%40mail.gmail.com diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 9103bc3465..2f74bfdc96 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1640,18 +1640,7 @@ exec_command_help(PsqlScanState scan_state, bool active_branch) if (active_branch) { char *opt = psql_scan_slash_option(scan_state, - OT_WHOLE_LINE, NULL, false); - size_t len; - - /* strip any trailing spaces and semicolons */ - if (opt) - { - len = strlen(opt); - while (len > 0 && - (isspace((unsigned char) opt[len - 1]) - || opt[len - 1] == ';')) - opt[--len] = '\0'; - } + OT_WHOLE_LINE, NULL, true); helpSQL(opt, pset.popt.topt.pager); free(opt); @@ -3165,6 +3154,10 @@ ignore_slash_filepipe(PsqlScanState scan_state) * This *MUST* be used for inactive-branch processing of any slash command * that takes an OT_WHOLE_LINE option. Otherwise we might consume a different * amount of option text in active and inactive cases. + * + * Note: although callers might pass "semicolon" as either true or false, + * we need not duplicate that here, since it doesn't affect the amount of + * input text consumed. */ static void ignore_slash_whole_line(PsqlScanState scan_state) diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l index 514977e59d..10a35dffc4 100644 --- a/src/bin/psql/psqlscanslash.l +++ b/src/bin/psql/psqlscanslash.l @@ -18,6 +18,8 @@ */ #include "postgres_fe.h" +#include <ctype.h> + #include "common.h" #include "psqlscanslash.h" @@ -640,7 +642,22 @@ psql_scan_slash_option(PsqlScanState state, termPQExpBuffer(&mybuf); return NULL; case xslashwholeline: - /* always okay */ + /* + * In whole-line mode, we interpret semicolon = true as stripping + * trailing whitespace as well as semi-colons; this gives the + * nearest equivalent to what semicolon = true does in normal + * mode. Note there's no concept of quoting in this mode. + */ + if (semicolon) + { + while (mybuf.len > 0 && + (mybuf.data[mybuf.len - 1] == ';' || + (isascii((unsigned char) mybuf.data[mybuf.len - 1]) && + isspace((unsigned char) mybuf.data[mybuf.len - 1])))) + { + mybuf.data[--mybuf.len] = '\0'; + } + } break; default: /* can't get here */ diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out index 5d61e4c7bb..4f3fd46420 100644 --- a/src/test/regress/expected/psql.out +++ b/src/test/regress/expected/psql.out @@ -5323,7 +5323,7 @@ END LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT COST 1 1 RETURN ($2 + $1) -\sf ts_debug(text) +\sf ts_debug(text); CREATE OR REPLACE FUNCTION pg_catalog.ts_debug(document text, OUT alias text, OUT description text, OUT token text, OUTdictionaries regdictionary[], OUT dictionary regdictionary, OUT lexemes text[]) RETURNS SETOF record LANGUAGE sql diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql index f199d624d3..c997106b9f 100644 --- a/src/test/regress/sql/psql.sql +++ b/src/test/regress/sql/psql.sql @@ -1315,7 +1315,7 @@ drop role regress_psql_user; \sf information_schema._pg_index_position \sf+ information_schema._pg_index_position \sf+ interval_pl_time -\sf ts_debug(text) +\sf ts_debug(text); \sf+ ts_debug(text) -- AUTOCOMMIT
On Mon Jan 8, 2024 at 2:48 PM CST, Tom Lane wrote: > We had a complaint (see [1], but it's not the first IIRC) about how > psql doesn't behave very nicely if one ends \sf or allied commands > with a semicolon: > > regression=# \sf sin(float8); > ERROR: expected a right parenthesis > > This is a bit of a usability gotcha, since many other backslash > commands are forgiving about trailing semicolons. I looked at > the code and found that it's actually trying to ignore semicolons, > by passing semicolon = true to psql_scan_slash_option. But that > fails to work because it's also passing type = OT_WHOLE_LINE, > and the whole-line code path ignores the semicolon flag. Probably > that's just because nobody needed to use that combination back in > the day. There's another user of OT_WHOLE_LINE, exec_command_help, > which also wants this behavior and has written its own stripping > code to get it. That seems pretty silly, so here's a quick finger > exercise to move that logic into psql_scan_slash_option. > > Is this enough of a bug to deserve back-patching? I'm not sure. > > regards, tom lane > > [1] https://www.postgresql.org/message-id/CAEs%3D6D%3DnwX2wm0hjkaw6C_LnqR%2BNFtnnzbSzeZq-xcfi_ooKSw%40mail.gmail.com > + /* > + * In whole-line mode, we interpret semicolon = true as stripping > + * trailing whitespace as well as semi-colons; this gives the > + * nearest equivalent to what semicolon = true does in normal > + * mode. Note there's no concept of quoting in this mode. > + */ > + if (semicolon) > + { > + while (mybuf.len > 0 && > + (mybuf.data[mybuf.len - 1] == ';' || > + (isascii((unsigned char) mybuf.data[mybuf.len - 1]) && > + isspace((unsigned char) mybuf.data[mybuf.len - 1])))) > + { > + mybuf.data[--mybuf.len] = '\0'; > + } > + } Seems like if there was going to be any sort of casting, it would be to an int, which is what the man page says for these two function, though isascii(3) explicitly mentions "unsigned char." Small English nit-pick: I would drop the hyphen between semi and colons. As for backpatching, seems useful in the sense that people can write the same script for all supported version of Postgres using the relaxed syntax. -- Tristan Partin Neon (https://neon.tech)
"Tristan Partin" <tristan@neon.tech> writes: > On Mon Jan 8, 2024 at 2:48 PM CST, Tom Lane wrote: >> + (isascii((unsigned char) mybuf.data[mybuf.len - 1]) && >> + isspace((unsigned char) mybuf.data[mybuf.len - 1])))) > Seems like if there was going to be any sort of casting, it would be to > an int, which is what the man page says for these two function, though > isascii(3) explicitly mentions "unsigned char." Casting to unsigned char is our standard pattern for using these functions. If "char" is signed (which is the only case in which this changes anything) then casting to int would imply sign-extension of the char's high-order bit, which is exactly what must not happen in order to produce a legal value to be passed to these functions. POSIX says: The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined. If we cast to unsigned char, then the subsequent implicit cast to int will do zero-extension which is what we need. > Small English nit-pick: I would drop the hyphen between semi and colons. Me too, except that it's spelled like that in nearby comments. Shall I change them all? regards, tom lane
On Mon, 2024-01-08 at 15:48 -0500, Tom Lane wrote: > Is this enough of a bug to deserve back-patching? I'm not sure. I like the patch, but I wouldn't back-patch it. I'd call the current behavior a slight inconsistency rather than an outright bug, and I think that we should be conservative with back-patching. Yours, Laurenz Albe
On Mon Jan 8, 2024 at 6:08 PM CST, Tom Lane wrote: > "Tristan Partin" <tristan@neon.tech> writes: > > On Mon Jan 8, 2024 at 2:48 PM CST, Tom Lane wrote: > >> + (isascii((unsigned char) mybuf.data[mybuf.len - 1]) && > >> + isspace((unsigned char) mybuf.data[mybuf.len - 1])))) > > > Seems like if there was going to be any sort of casting, it would be to > > an int, which is what the man page says for these two function, though > > isascii(3) explicitly mentions "unsigned char." > > Casting to unsigned char is our standard pattern for using these > functions. If "char" is signed (which is the only case in which > this changes anything) then casting to int would imply sign-extension > of the char's high-order bit, which is exactly what must not happen > in order to produce a legal value to be passed to these functions. > POSIX says: > > The c argument is an int, the value of which the application shall > ensure is a character representable as an unsigned char or equal > to the value of the macro EOF. If the argument has any other > value, the behavior is undefined. > > If we cast to unsigned char, then the subsequent implicit cast to int > will do zero-extension which is what we need. Thanks for the explanation. > > Small English nit-pick: I would drop the hyphen between semi and colons. > > Me too, except that it's spelled like that in nearby comments. > Shall I change them all? I'll leave it up to you. Patch looks good as-is. -- Tristan Partin Neon (https://neon.tech)
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Mon, 2024-01-08 at 15:48 -0500, Tom Lane wrote: >> Is this enough of a bug to deserve back-patching? I'm not sure. > I like the patch, but I wouldn't back-patch it. I'd call the current > behavior a slight inconsistency rather than an outright bug, and I think > that we should be conservative with back-patching. Nobody spoke in favor of back-patching, so committed to HEAD only. regards, tom lane