Re: regclass and format('%I') - Mailing list pgsql-general

From David G. Johnston
Subject Re: regclass and format('%I')
Date
Msg-id CAKFQuwY0w-_gHcNsAEWVU83ti9joT55YaMhNRRmphVGT-pGeXA@mail.gmail.com
Whole thread Raw
In response to Re: regclass and format('%I')  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: regclass and format('%I')  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Sat, Mar 14, 2015 at 8:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jason Dusek <jason.dusek@gmail.com> writes:
> It honestly seems far more reasonable to me that %s and %I should do
> the exact same thing with regclass.

You're mistaken.  The operation of format() is first to convert the
non-format arguments to text strings, using the output functions for their
data types, and then to further process those text strings according to
the format specifiers:

%s -- no additional processing, just insert the string as-is.
%I -- apply double-quoting transformation to create a valid SQL identifier.
%L -- apply single-quoting transformation to create a valid SQL literal.

In the case of regclass, the output string is already double-quoted
as necessary, so applying %I to it produces a doubly double-quoted
string which is almost certainly not what you want.  But it's not
format()'s job to be smarter than the user.  If it tried to avoid
an extra pass of double quoting, it would get some cases wrong,
potentially creating security holes.



TBH ​I'm not all that convinced by this argument​.

First, it is not being smarter than the user but allowing the user to generalize their problem so that they do not need to take the nature of the input data into account and can write a semantically meaningful pattern string instead.  The risk of them incorrectly choosing between %s or %I and opening a security hole seems higher - if not as widespread - than any string logic we could apply.

Second, presupposing the the transformation of the input must be a single "thing", and that we are doing the %I conversion based upon our own internal (or SQL's at the matter may be) definition of what it means to "quote an identifier", we should be capable of noticing that the provided input is already a single "thing" which has been escaped according to said rules.

​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how it is possible ​for format to lay in a value at %I that is any more insecure than the current behavior.  If the input string already matches that pattern then it could be output as-is without any additional risk and with the positive benefit of making this case work as expected.  The broken case then exists when someone actually intends to name their identifier <"something"> which then correctly becomes <"""something"""> on output.

Since there is a behavior change involved there needs to be a convincing use-case for the new behavior in order to justify the effort to change it.

David J.

pgsql-general by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: regclass and format('%I')
Next
From: Seref Arikan
Date:
Subject: Re: is there a relationship between indexes and temporary file creation?