Thread: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE
The doc for "quote_ident()" says this: « https://www.postgresql.org/docs/14/functions-string.html Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only ifnecessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properlydoubled. » B.t.w, the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's actualargument) and an identifier (what it returns). Some of you flatly reject (borrowing a phrase from Tom) the distinctionbetween these two terms of art. Oh well… Try this: create table dog$(n int); -- OK create table $dog(n int); -- Bad create table "$dog"(n int); -- OK These outcomes are consistent with the rules that say when a proposed name needs to be double-quoted to form its identifierin a SQL statement (or PL/pgSQL source text). So it's correct for this to return FALSE: select '$dog' = quote_ident('$dog'); But it's incorrect w.r.t. "quotes are added only if necessary" for this to return FALSE: select 'dog$' = quote_ident('dog$'); "format()" shows the same error when you use the %I placeholder. I suppose that "format()" and "quote_ident()" share thesame underlying implementation. select format('What happens with %I?', 'dog'); -- double quotes are not added select format('What happens with %I?', 'dog$'); -- double quotes are added
The doc for "quote_ident()" says this:
«
https://www.postgresql.org/docs/14/functions-string.html
Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled.
»
But it's incorrect w.r.t. "quotes are added only if necessary" for this to return FALSE:
select 'dog$' = quote_ident('dog$');
"David G. Johnston" <david.g.johnston@gmail.com> writes: > So I can see an argument for the existing behavior. It doesn't seem worth > changing in any case. And I don't really see the documentation being > improved by covering this corner case in detail when the current behavior > is at least intuitive. quote_ident is a good bit more conservative than the core lexer about what is an "identifier character" --- it considers all non-ASCII characters as requiring quoting, too. For typical uses of quote_ident, I think this is good future-proofing: it makes it very much less likely that something quote_ident decides not to quote would be rejected by some future PG version (not to mention non-PG SQL databases). So I'm not really in a hurry to change the code. Maybe we should tweak the docs a bit. regards, tom lane
On 10/5/22 17:16, Bryn Llewellyn wrote: > The doc for "quote_ident()" says this: > > « > https://www.postgresql.org/docs/14/functions-string.html > Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added onlyif necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properlydoubled. > » > > B.t.w, the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's actualargument) and an identifier (what it returns). Some of you flatly reject (borrowing a phrase from Tom) the distinctionbetween these two terms of art. Oh well… What it returns is text, quoted if needed: create table "$dog"(n int); select pg_typeof(quote_ident('$dog')), quote_ident('$dog'); pg_typeof | quote_ident -----------+------------- text | "$dog" The way I see is if it where an actual identifier then this: select * from quote_ident('$dog'); quote_ident ------------- "$dog" would be equal to this: select * from "$dog"; n --- -- Adrian Klaver adrian.klaver@aklaver.com
> On Oct 5, 2022, at 17:16, Bryn Llewellyn <bryn@yugabyte.com> wrote: > B.t.w, the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's actualargument) and an identifier (what it returns). There is no first-class "identifier" type in PostgreSQL, so a function can't "return an identifier." It returns a stringwhich might, when placed into a larger string and processed as SQL, be lexically correct as an identifier. To be useful, quote_ident() can't fail to quote a string in such a way that it's not a valid identifier to PostgreSQL. Ifit quotes some strings that PostgreSQL would accept as identifiers without quotes, that's interesting, I guess, but I'mnot sure I see how it is a bug. Pragmatically, what this function is for it to assemble SQL statements as strings. Any review of its correctness needs tobe based on a situation where it can't be used for that.
david.g.johnston@gmail.com writes:So I can see an argument for the existing behavior. It doesn't seem worth changing in any case. And I don't really see the documentation being improved by covering this corner case in detail when the current behavior is at least intuitive.
quote_ident is a good bit more conservative than the core lexer about what is an "identifier character" --- it considers all non-ASCII characters as requiring quoting, too.
For typical uses of quote_ident, I think this is good future-proofing: it makes it very much less likely that something quote_ident decides not to quote would be rejected by some future PG version (not to mention non-PG SQL databases). So I'm not really in a hurry to change the code. Maybe we should tweak the docs a bit.
good future-proofing: it makes it very much less likely that something quote_ident decides not to quote would be rejected by some future PG version
adrian.klaver@aklaver.com wrote:The way I see is if it where an actual identifier then this:
select * from quote_ident('$dog');
quote_ident
-------------
"$dog"
would be equal to this:
select * from "$dog";
create database db;
grant all on database db to "my name";
\c db "my name"
create schema s;
create table s."silly name"(n int);
select relname
from pg_class c inner join pg_roles r on c.relowner = r.oid
where r.rolname = 'my name';
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
n | integer | | |
> xof@thebuild.com wrote: > > There is no first-class "identifier" type in PostgreSQL, so a function can't "return an identifier." It returns a stringwhich might, when placed into a larger string and processed as SQL, be lexically correct as an identifier. It takes huge discipline always to say "the text of an identifier" when the context of discourse is established. But, yes,I agree, when I wrote this: « ...the value of "quote_ident()" rests on the distinction between a name (what you provide with the function's actual argument)and an identifier (what it returns). » that the context of discourse was indeed established. I should have made no such assumption and written this instead: « ...the value of "quote_ident()" rests on the distinction between the text of a name (what you provide with the function'sactual argument) and the text of an identifier (what it returns). »
On 6 Oct 2022, at 16:04, Bryn Llewellyn wrote:
Does this imply a risk that a future PG version will go against the SQL standard and reject any non-latin name that is free of all punctuation characters, when used in the role of a SQL identifier, unless it's double quoted?
From my perspective this thread seems to miss the essential purposes behind quote_ident(). It is part of processing external/user input —
- Protecting from PostgreSQL which always maps everything to lower case before anything gets to the parser
- Protection against SQL injection when processing input from outside the trusted perimeter
Expecting an arbitrary string to be equal to itself after it has been through string processing code is risky unless that processing is part of the design, and quote_ident() was never designed to be part of any such arrangement.
Expanding —
- It is a complex question what happens to non-ASCII characters when they are mapped to lower case… sometimes this is a meaningful concept e.g., ∏ -> π, sometimes it is not, e.g., pick any Chinese/Korean/Japanese character. If the designer decides to use non-ASCII characters in the identifier they can… just double-quote those identifiers. If the designer wants to use camelCase ASCII they can, but the identifier will be camelcase inside the machine unless it was double quoted.
AFAIK we never really use quote_ident() except to process external input. As noted above this function is not designed to be part of an equality test when attempting system introspection, rather —
- The simple quote_ident() function can also be used to wrap untrusted input so it will not mess with the parser. It is used with quote_literal() when building dynamic SQL statements from user (i.e., untrusted) input.
From my perspective any use of these function outside their scope is just that… outside their scope, with no promise this usage will work or comply with any current or future standard, or imply anything useful about pretty much anything.
Maybe I’m oversimplifying but I believe the current functions work and do their specific jobs, and have nothing to do with anything else. So there is no surprise for me in the subject line. There is mild surprise the question was asked.
BTW this ignores whether or not PG mapping everything that’s not quoted to lower case is standards compliant. This whole topic would be simpler if the case was left alone but that’s a long road ago and I believe most of the bridges have been burnt :)
Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920
What we deal with in our ordinary professional work is SQL texts, program source texts, within these, SQL identifier texts,and then the conventional display of the results of SQL and program execution. To emphasize the point about resulstdisplay, try "\d s.*" in "\t off" mode. You'll see this: Table "s.silly name" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- n | integer | | | But this SQL text: drop table "s.silly name"; tells me that there's no such table. And, indeed, there isn't. Perhaps there's a table s."silly name". It is accidental if unfortunate that the <s.silly name> is quoted with ""'s in the \d output... Karsten
> Karsten.Hilbert@gmx.net wrote: > >> bryn@yugabyte.com wrote: >> >> What we deal with in our ordinary professional work is SQL texts, program source texts, within these, SQL identifier texts,and then the conventional display of the results of SQL and program execution. To emphasize the point about resulstdisplay, try "\d s.*" in "\t off" mode. >> >> [Allow me to re-write my bext few words, for sport.] >> >> But a SQL statement with the following text representation >> >> drop table "s.silly name” >> >> when presented to the psql CLI as a text command in its language by appending a semi-colon causes that program to respondwith some text that tells me that there's no such table. > > And, indeed, there isn't. Perhaps there's a table s."silly name”. It is accidental if unfortunate that the <s.silly name>is quoted with ""'s in the \d output... I believe that you and I see things the same way, Karsten. Yes, it does seem at times that some things in PG are accidental—andsometimes prove to be unfortunate. Naturally, my questions to this list concern what know that I don’t understand. (Who knows what I think that I understand—butdon’t. And who knows what I don’t even suspect is there waiting for me to try to understand.) To err is human—andit’s human, too, to misunderstand something when the proper understanding seems to be counter-intuitive. In my case,I developed my intuitions in the context of a different RDBMS with notions and terms of art that differ very significantlyfrom PG’s—even though the core SQL syntax and semantics are deceptively similar. Maybe I should send posts to this list like this: « I just wrote and tested a PG implementation to do something I hadn’t done before. I was very impressed at how straightforwardit was—and with how expressive of my intentions the text of my code seemed to be. Well done PG. » I do very often have that experience. But I’ve never seen a contribution along those lines in this forum—and I’ve formedthe impression that it would be out of place.
list.pg.gavan@pendari.org wrote:bryn@yugabyte.com wrote:
Does this imply a risk that a future PG version will go against the SQL standard and reject any non-latin name that is free of all punctuation characters, when used in the role of a SQL identifier, unless it's double quoted?
From my perspective this thread seems to miss the essential purposes behind quote_ident(). It is part of processing external/user input… Expecting an arbitrary string to be equal to itself after it has been through string processing code is risky unless that processing is part of the design, and quote_ident() was never designed to be part of any such arrangement.…If the designer decides to use non-ASCII characters in the identifier they can… just double-quote those identifiers.AFAIK we never really use quote_ident() except to process external input… There is no surprise for me in the subject line. There is mild surprise the question was asked.
Quotes are added only if necessary…
create table redaktør(n int); → table successfully created
(3) The PG doc on quote_ident says this in large friendly letters:Quotes are added only if necessary…Notice "only". I now know that this is very much not the case. You can compose an effectively unlimited number of different examples along these lines:select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created
david.g.johnston@gmail.com wrote:bryn@yugabyte.com wrote:
(3) The PG doc on quote_ident says this in large friendly letters:Quotes are added only if necessary…
Notice "only". I now know that this is very much not the case. You can compose an effectively unlimited number of different examples along these lines:
select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created
Yep, and that is precisely what would make for a good bug report. Pointing out that "if necessary" does not indeed match up with the behavior. I suspect it is likely to get changed - everything else being discussed just detracts attention from it.
*BRIEFLY*
execute x('Dog');
execute x('农民');
returns text
language plpgsql
as $body$
declare
i0 text not null := quote_ident(name);
i1 text not null := regexp_replace(regexp_replace(i0, '^"', ''), '"$', '');
ident text not null := case(i1 = i0)
when true then '"'||i0||'"'
else i0
end case;
begin
return ident;
end;
$body$;
everything else being discussed just detracts attention from it.
begin
assert not exists (select 1 from pg_class where is_exotic(relname));
end;
$body$;
declare
stmt constant text not null := 'create table %s(n int);';
n text not null := '';
simple_names constant text[] not null := array['farmers', 'bønder', '农民', '农民、儿童', '农民,儿童'];
exotic_names constant text[] not null := array['T', 'farmers and children', '"x', 'y"', '农民 & 儿童'];
begin
foreach n in array simple_names loop
assert not is_exotic(n), 'Problem with '||n;
end loop;
foreach n in array exotic_names loop
assert is_exotic(n), 'Problem with '||n;
end loop;
end;
$body$;
returns boolean
language plpgsql
as $body$
declare
ident constant text not null := quote_ident_2(name);
stripped_ident constant text not null := regexp_replace(regexp_replace(ident, '^"', ''), '"$', '');
stmt constant text not null := 'deallocate %s';
begin
/*
This is the easy special case. The input is entirely letters inascii(7), or some
other character set that distinguishes between upper and lower case, and has at
least one upper case letter. This literal actual argument 'Dogs' is a sufficient
example. Here, the value of "stripped_ident" will be equal to the value of the
input. It's essential to test for this special case because the general test,
below, is (famously) happy to execute "deallocate Dog" and the like.
*/
if lower(name) <> stripped_ident then
return true;
end if;
/*
This is the harder general case. The input is anything at all that passes the
first test and that causes no error when used "as is" in the role of a SQL identifier.
Without access to the actual implementation of the SQL parser, the simplest practical
way to check this property is to use an empirical test. The "deallocate" statement is
fairly lightweight. The approach seems to bring no measureable performance penalty.
*/
begin
execute format(stmt, stripped_ident);
exception when invalid_sql_statement_name then null; end;
return false;
exception when syntax_error then
return true;
end;
$body$;
On 10/7/22 17:16, Bryn Llewellyn wrote: >> david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com> wrote: >> >>> bryn@yugabyte.com <mailto:bryn@yugabyte.com> wrote: >>> >>> (3) The PG doc on quote_ident says this in large friendly letters: >>> >>>> Quotes are added only if necessary… >>> >>> Notice "only". I now know that this is very much not the case. You >>> can compose an effectively unlimited number of different examples >>> along these lines: >>> >>> *select quote_ident('redaktør'); → "redaktør" >>> create table redaktør(n int); → table successfully created >>> * >> >> Yep, and that is precisely what would make for a good bug report. >> Pointing out that "if necessary" does not indeed match up with the >> behavior. I suspect it is likely to get changed - everything else >> being discussed just detracts attention from it. > > **BRIEFLY** > > What does "make for a good bug report" mean, David? Is it: Oh for goodness sake just file a bug here: https://www.postgresql.org/account/login/?next=/account/submitbug/ with the test case you showed in your previous post. As to below: 1) If you want a guaranteed outcome then you are in the wrong business. 2) Excessive verbiage(writing for writing's sake) takes away from any argument you are trying to make. Less is more. I have come to the point where I ignore most of what you write as it really does not go anywhere other then make noise. -- Adrian Klaver adrian.klaver@aklaver.com
> On Oct 7, 2022, at 17:16, Bryn Llewellyn <bryn@yugabyte.com> wrote: > What does "make for a good bug report" mean, David? Well, first of all, brevity. :-) > Is it: > > (1.1) You, David, or somebody else who has been officially recognized as a PG Contributor (https://www.postgresql.org/community/contributors/)will file the bug, granting it credibility with their imprimatur? > > or (1.2) I, Bryn, should file the bug. That's unnecessarily snarky. You are the one who feels that there is an issue, so you are the one who should report thebug. The community documents how to file a bug report here: https://www.postgresql.org/docs/current/bug-reporting.html People completely new to the community report bugs all the time; it's often their first contact with the community. I don't think there is widespread agreement that this is as big an issue as you clearly feel it is, so it's up to you topersuade the community that it's worth changing. You may not be successful, and the implementation of quote_ident staysthe same as it is now. If that makes PostgreSQL useless to you, well, contact the bursar for a full refund. > About "I suspect it is likely to get changed", do you mean: > > (2.1) Change the doc to match quote_ident's current, unpredictable, behavior? (By all means, substitute "hard to describeaccurately, precisely, and yet still tersely" for "unpredictable".) > > (2.2) Change quote_ident's implementation—and then write new doc to describe the new behavior precisely and accurately?And for this option, the next question is "What's the spec of the changed implementation?" Make a proposal, and it can be debated. If you feel up to it, prepare a documentation patch, a code patch, or both. It'sjust SGML and C; they won't bite. But you really need to make a specific, concrete, "it should do this instead" proposal. And, with all due respect, there is a strong "you FOOLS" tone to your conversation on the list, as if the community are allmorons, stubborn, or both. I would really suggest you dial that back, because it's getting in the way of your often-reasonablepoints.
david.g.johnston@gmail.com wrote:bryn@yugabyte.com wrote:
(3) The PG doc on quote_ident says this in large friendly letters:Quotes are added only if necessary…
Notice "only". I now know that this is very much not the case. You can compose an effectively unlimited number of different examples along these lines:
select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created
Yep, and that is precisely what would make for a good bug report. Pointing out that "if necessary" does not indeed match up with the behavior. I suspect it is likely to get changed - everything else being discussed just detracts attention from it.
*BRIEFLY*What does "make for a good bug report" mean, David? Is it:(1.1) You, David, or somebody else who has been officially recognized as a PG Contributor (https://www.postgresql.org/community/contributors/) will file the bug, granting it credibility with their imprimatur?
or (1.2) I, Bryn, should file the bug.
About "I suspect it is likely to get changed", do you mean:(2.1) Change the doc to match quote_ident's current, unpredictable, behavior? (By all means, substitute "hard to describe accurately, precisely, and yet still tersely" for "unpredictable".)
(2.2) Change quote_ident's implementation—and then write new doc to describe the new behavior precisely and accurately? And for this option, the next question is "What's the spec of the changed implementation?"Notice that the issue is broader than just quote_ident, as this test shows:
I’m not convinced. The discussion has shown that some people are somewhat confused. For example, it was suggested that a name like this:
Compare this with the implementation that I thought, at first, that I could use when I simply believed the doc. (The subject line of this thread hits at the trivial SQL statement that would implement the "language SQL" function.) ANd if that's all there is to it, then when not ship is as a built-in?