On 2025-Apr-06, Tom Lane wrote:
> I'd be 100% behind forbidding all ASCII control characters in all
> identifiers. I can't see any situation in which that's a good thing,
> and I can think of plenty where it's a mistake (eg your editor
> decided to change space to tab) or done with underhanded intent.
Right.
> If we can cite the SQL standard then it's an entirely defensible
> restriction.
We can. It says (in 5.2 <token> and <separator>)
<regular identifier> ::= <identifier body>
<identifier body> ::= <identifier start> [ <identifier part>... ]
<identifier part> ::= <identifier start> | <identifier extend>
<identifier start> ::= !! See the Syntax Rules.
<identifier extend> ::= !! See the Syntax Rules.
Syntax Rules
1) An <identifier start> is any character in the Unicode General Category
classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, or “Nl”.
NOTE 112 — The Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”,
“Lo”, and “Nl” are assigned to Unicode characters that are, respectively,
upper-case letters, lower-case letters, title-case letters, modifier
letters, other letters, and letter numbers.
2) An <identifier extend> is U+00B7, “Middle Dot”, or any character in the
Unicode General Category classes “Mn”, “Mc”, “Nd”, or “Pc”.
NOTE 113 — The Unicode General Category classes “Mn”, “Mc”, “Nd”, and
“Pc”, are assigned to Unicode characters that are, respectively,
non-spacing marks, spacing combining marks, decimal numbers, and connector
punctuations.
The class for control characters is "Cc" (part of the major class "C"), so they are allowed nowhere.
https://www.unicode.org/charts/script/
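
To make the two Syntax Rules above concrete, here is a minimal sketch in Python of a check for <regular identifier> as quoted. It is an illustration only, not PostgreSQL code: the function names are hypothetical, and the answers depend on the Unicode version shipped with the Python interpreter's unicodedata module.

```python
import unicodedata

# General Category classes named in Syntax Rule 1 (<identifier start>).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# General Category classes named in Syntax Rule 2 (<identifier extend>).
EXTEND_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}
# U+00B7 MIDDLE DOT is listed explicitly in rule 2 (its own category is "Po").
MIDDLE_DOT = "\u00b7"


def is_identifier_start(ch: str) -> bool:
    """True if ch qualifies as <identifier start>."""
    return unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch: str) -> bool:
    """True if ch qualifies as <identifier part> (start or extend)."""
    return (
        is_identifier_start(ch)
        or ch == MIDDLE_DOT
        or unicodedata.category(ch) in EXTEND_CATEGORIES
    )


def is_regular_identifier(name: str) -> bool:
    """True if name matches <identifier start> [ <identifier part>... ]."""
    if not name:
        return False
    return is_identifier_start(name[0]) and all(
        is_identifier_part(ch) for ch in name[1:]
    )
```

With these rules, is_regular_identifier("foo\tbar") is false because a tab has category "Cc", which appears in neither list. Note that a leading underscore is also rejected, since "Pc" is only an <identifier extend> class.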
> Having said that, I'm not quite sure where we ought to implement
> the restriction, and it's possible that there are multiple places
> that would need to check.
Yeah, a general ban on control characters for all identifiers is harder
to implement than a restricted ban, because it probably involves the
lexer, and I'm not sure the resulting "syntax error" type of rejections
are going to be nice enough to users. A C-function based rejection
seems more convenient at this stage.
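
As a rough sketch of what a function-based rejection could look like, here it is in Python for brevity rather than the C that the backend would use. The function names and the error wording are hypothetical, not anything that exists in the tree. The point is that a dedicated check can say which character was found and where, which a bare lexer syntax error would not.

```python
from typing import Optional, Tuple


def find_control_character(name: str) -> Optional[Tuple[int, int]]:
    """Return (position, code point) of the first ASCII control
    character in name, or None if there is none."""
    for i, ch in enumerate(name):
        cp = ord(ch)
        # ASCII control characters are U+0000..U+001F plus U+007F (DEL).
        if cp < 0x20 or cp == 0x7F:
            return i, cp
    return None


def check_identifier(name: str) -> str:
    """Return name unchanged, or raise ValueError with a specific
    message if it contains an ASCII control character."""
    hit = find_control_character(name)
    if hit is not None:
        pos, cp = hit
        raise ValueError(
            f"identifier contains control character U+{cp:04X} "
            f"at position {pos}"
        )
    return name
```

For example, check_identifier("foo\tbar") raises an error that names U+0009 at position 3, while "foo bar" passes, since a space is not a control character.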
> I concur that the day before feature freeze is not a good time to be
> designing this. Let's defer.
Augh.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)