Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote - Mailing list pgsql-hackers

From Álvaro Herrera
Subject Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote
Msg-id 202504061734.mtvroeo3gn33@alvherre.pgsql
In response to Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote
List pgsql-hackers
On 2025-Apr-06, Tom Lane wrote:

> I'd be 100% behind forbidding all ASCII control characters in all
> identifiers.  I can't see any situation in which that's a good thing,
> and I can think of plenty where it's a mistake (eg your editor
> decided to change space to tab) or done with underhanded intent.

Right.

> If we can cite the SQL standard then it's an entirely defensible
> restriction.

We can.  It says (in 5.2 <token> and <separator>)

<regular identifier> ::= <identifier body>
<identifier body> ::= <identifier start> [ <identifier part>... ]
<identifier part> ::= <identifier start> | <identifier extend>
<identifier start> ::= !! See the Syntax Rules.
<identifier extend> ::= !! See the Syntax Rules.

Syntax Rules
1) An <identifier start> is any character in the Unicode General Category
   classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, or “Nl”.
      NOTE 112 — The Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”,
      “Lo”, and “Nl” are assigned to Unicode characters that are, respectively,
      upper-case letters, lower-case letters, title-case letters, modifier
      letters, other letters, and letter numbers.
2) An <identifier extend> is U+00B7, “Middle Dot”, or any character in the
   Unicode General Category classes “Mn”, “Mc”, “Nd”, or “Pc”.
      NOTE 113 — The Unicode General Category classes “Mn”, “Mc”, “Nd”, and
      “Pc”, are assigned to Unicode characters that are, respectively,
      non-spacing marks, spacing combining marks, decimal numbers, and connector
      punctuations.

The class for control characters is "Cc" (part of the major class "C"),
which appears in neither list, so they are allowed nowhere.

https://www.unicode.org/charts/script/
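
To make the two Syntax Rules concrete, here is a minimal sketch in Python
(not anything from the PostgreSQL tree) that classifies characters by their
Unicode General Category. The function names are hypothetical, and it covers
only the <regular identifier> production quoted above:

```python
import unicodedata

# General Category classes named in Syntax Rules 1) and 2) quoted above.
START_CLASSES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
EXTEND_CLASSES = {"Mn", "Mc", "Nd", "Pc"}
MIDDLE_DOT = "\u00b7"  # U+00B7 is allowed explicitly by rule 2)


def is_identifier_start(ch: str) -> bool:
    """Rule 1): <identifier start>."""
    return unicodedata.category(ch) in START_CLASSES


def is_identifier_extend(ch: str) -> bool:
    """Rule 2): <identifier extend>."""
    return ch == MIDDLE_DOT or unicodedata.category(ch) in EXTEND_CLASSES


def is_regular_identifier(name: str) -> bool:
    """<identifier start> followed by any number of <identifier part>."""
    if not name or not is_identifier_start(name[0]):
        return False
    return all(
        is_identifier_start(c) or is_identifier_extend(c) for c in name[1:]
    )
```

A newline or tab has category "Cc", which is in neither set, so a name such
as "a\nb" fails this check under these rules.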

> Having said that, I'm not quite sure where we ought to implement
> the restriction, and it's possible that there are multiple places
> that would need to check.

Yeah, a general ban on control characters for all identifiers is harder
to implement than a restricted ban, because it probably involves the
lexer, and I'm not sure the resulting "syntax error" type of rejections
are going to be nice enough to users.  A C-function based rejection
seems more convenient at this stage.
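
As a rough illustration of the function-based approach, here is a sketch in
Python for brevity (the real check would be a C function in the backend).
The function name and error wording are hypothetical, and it assumes
"control character" means the ASCII range 0x00-0x1F plus 0x7F:

```python
def check_no_control_chars(name: str) -> None:
    """Raise ValueError if name contains an ASCII control character."""
    for pos, ch in enumerate(name):
        code = ord(ch)
        # Assumption: only ASCII control characters (C0 range and DEL) are banned.
        if code < 0x20 or code == 0x7F:
            raise ValueError(
                f"name contains control character U+{code:04X} at position {pos}"
            )
```

Called where a name is created, a check like this can report the offending
character and its position, rather than leaving the user with a bare syntax
error from the lexer.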

> I concur that the day before feature freeze is not a good time to be
> designing this.  Let's defer.

Augh.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)


