Thread: Allow COPY to use parameters
Folks, Per discussion on IRC and some test code, COPY can't take parameters in a PREPARE, which feature would make it even more useful. To make this work, we'd need to: - do parse analysis immediately - parameterize all the options This doesn't seem like a gigantic lift. What say? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > Per discussion on IRC and some test code, COPY can't take parameters > in a PREPARE, which feature would make it even more useful. Uh, what? regression=# prepare foo as copy c from stdin; ERROR: syntax error at or near "copy" LINE 1: prepare foo as copy c from stdin; ^ Passing parameters into a utility statement of any stripe is a pretty considerable project, IIRC; the infrastructure isn't there. regards, tom lane
On Tue, May 24, 2016 at 02:16:40PM -0400, Tom Lane wrote: > David Fetter <david@fetter.org> writes: > > Per discussion on IRC and some test code, COPY can't take parameters > > in a PREPARE, which feature would make it even more useful. > > Uh, what? > > regression=# prepare foo as copy c from stdin; > ERROR: syntax error at or near "copy" > LINE 1: prepare foo as copy c from stdin; > ^ > > Passing parameters into a utility statement of any stripe is a > pretty considerable project, IIRC; the infrastructure isn't there. Maybe it should be, at least for some of the utility statements. Please find attached a patch which, according to Andrew Gierth, its author, just barely qualifies as a PoC. Yes, it's had to break a couple of messages in the regression tests. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachment
On Tue, May 24, 2016 at 3:42 PM, David Fetter <david@fetter.org> wrote: > On Tue, May 24, 2016 at 02:16:40PM -0400, Tom Lane wrote: >> David Fetter <david@fetter.org> writes: >> > Per discussion on IRC and some test code, COPY can't take parameters >> > in a PREPARE, which feature would make it even more useful. >> >> Uh, what? >> >> regression=# prepare foo as copy c from stdin; >> ERROR: syntax error at or near "copy" >> LINE 1: prepare foo as copy c from stdin; >> ^ >> >> Passing parameters into a utility statement of any stripe is a >> pretty considerable project, IIRC; the infrastructure isn't there. > > Maybe it should be, at least for some of the utility statements. > > Please find attached a patch which, according to Andrew Gierth, its > author, just barely qualifies as a PoC. Yes, it's had to break a > couple of messages in the regression tests. Hm, what's the use case preparing COPY? Note, the biggest pain point I have with COPY is not being able to parameterize the filename argument. That's pretty easily worked around with a plpgsql wrapper however. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > Hm, what's the use case preparing COPY? Note, the biggest pain point > I have with COPY is not being able to parameterize the filename > argument. Yeah. But putting a parameter symbol there (or anywhere in a utility statement that's not part of an analyzable subexpression) introduces a bunch of new issues. We don't really have a notion of a specific data type associated with most arguments of utility statements, which complicates parse analysis (specifically, resolution of the data type to be attributed to the parameter symbol). And we don't have provision for applying expression evaluation to such arguments, which is what you'd expect to need to do to obtain the value of a parameter symbol. In many cases you absolutely don't want expression evaluation to happen in utility statements, because it would complicate semantics significantly. Bottom line is that there's a pretty large can of worms hiding under this, and I am not eager to open it. regards, tom lane
>>>>> "Merlin" == Merlin Moncure <mmoncure@gmail.com> writes: Merlin> Hm, what's the use case preparing COPY? Preparing it isn't necessarily the point (and SQL-level PREPARE is not addressed in that patch). The point is to allow parameters, so that a client can do COPY (select blah from whatever where thing = $1) TO ... without having to mess around with interpolations. Merlin> Note, the biggest pain point I have with COPY is not being ableMerlin> to parameterize the filename argument. Certainly that can be handled too. -- Andrew (irc:RhodiumToad)
>>>>> "Merlin" == Merlin Moncure <mmoncure@gmail.com> writes: Merlin> Note, the biggest pain point I have with COPY is not being able Merlin> to parameterize the filename argument. Second proof of concept attached. This goes so far as to allow statements like: do $$ declare t text := 'bar'; f text := '/tmp/copytest.dat'; begin copy (select t, now()) to (f) csv header; end; $$; Also "copy foo to $1" or "copy (select * from foo where x=$1) to $2" and so on should work from PQexecParams or in a plpgsql EXECUTE. (I haven't tried to parameterize anything other than the filename and query. Also, it does not accept arbitrary expressions - only $n, '...' or a columnref. $n and '...' can have parens or not, but the columnref must have them due to conflicts with unreserved keywords PROGRAM, STDIN, STDOUT. This could be hacked around in other ways, I guess, if the parens are too ugly.) -- Andrew (irc:RhodiumToad)
Attachment
On 27 May 2016 at 15:17, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Merlin" == Merlin Moncure <mmoncure@gmail.com> writes:
Merlin> Note, the biggest pain point I have with COPY is not being able
Merlin> to parameterize the filename argument.
Second proof of concept attached. This goes so far as to allow
statements like:
do $$
declare t text := 'bar'; f text := '/tmp/copytest.dat';
begin copy (select t, now()) to (f) csv header; end;
$$;
Also "copy foo to $1" or "copy (select * from foo where x=$1) to $2" and
so on should work from PQexecParams or in a plpgsql EXECUTE.
(I haven't tried to parameterize anything other than the filename and
query. Also, it does not accept arbitrary expressions - only $n, '...'
or a columnref. $n and '...' can have parens or not, but the columnref
must have them due to conflicts with unreserved keywords PROGRAM, STDIN,
STDOUT. This could be hacked around in other ways, I guess, if the
parens are too ugly.)
In addition to it being generally nice to be able to send parameters to COPY (SELECT ...), I'd personally like very basic and limited parameter support for all utility statements so clients can use the v3 protocol and parameter binding without having to figure out the type of statement.
Currently users have to know "Oh, this is a utility statement and can't be parameterised, so instead of using my client driver's parameter binding I have to do string interpolation".
SQL injection detection static analysis and trace tools might complain, and it means we have to tell users to do exactly what they should otherwise never do, but there's not really a way around it right now.
To make things more complicated, some client drivers use the simple-query protocol and do client-side in-driver parameter interpolation. Like psycopg2. Such drivers cannot easily enable use of server-side binding and extended query protocol because right now they have to look at the SQL and figure out which statements can be bound server-side and which require client-side interpolation.
For drivers like PgJDBC that do support parameter binding the application programmer has to know they can't use it for some kinds of statement, and have to do string interpolation on the SQL string. Carefully, if they're doing anything with client supplied data.
IMO this is a bit of a wart in Pg, and it'd be nice to get rid of it... but I'm aware it might not be worth the server-side complexity of handling parameter binding in utility statements.
But.
People will want to be able to parameterise identifiers too. There's just no way to do that. For plannable or utility statements. You can't write
SELECT ... FROM $1;
or
COPY FROM $1 TO 'myfilename'
... and users have to know that and deal with it. By string interpolation. So even if parameter binding for literals in utility statements becomes possible it doesn't mean users never have to interpolate stuff.
I don't think it's much worse to say "you can't use parameter binding in this statement type" than it is to say "you can't use paremeter binding for identifiers". So I'm not really that excited to solve this unless there's a way to solve it _just_ for SQL expressions within utility statements like COPY (SELECT ...).
COPY FROM $1 TO 'myfilename'
Random thought - how about at least making the following work:
For the following pretend that "STRING" has the same behavior as the "format(...)" function.
EXECUTE STRING('COPY %I TO %L', 'testtable', 'testfile.txt');
<(conceptually similar to: EXECUTE format('COPY %I TO %L', 'testtable', 'testfile.txt')>
This doesn't solve the knowledge problem but at least provides an idiomatic way to execute dynamic SQL without pl/pgsql and without forcing the client library to take responsibility for proper data massaging in order to eliminate sql injection.
As an extension making:
PREPARE name STRING('COPY %I TO %L', ?, ?);
EXECUTE name STRING USING ('testtable', 'testfile.txt');
David J.
For the following pretend that "STRING" has the same behavior as the "format(...)" function.EXECUTE STRING('COPY %I TO %L', 'testtable', 'testfile.txt');
+1
We should make string sanitization easy so that people use it by default.
In the mean time, if you're just using psql, the new \gexec command will cover that
select format('COPY %I TO %L', 'testtable', 'testfile.txt')\gexec
but it won't help with any \-commands. And it won't work for schema-qualified table names, and if you're using COPY tab FROM PROGRAM, you're going to have cases where %L finds an escape-y character in the command string (like using head -n 1 and sed to unpivot a header row) which results in an E'...' string that COPY can't handle.
For \copy, I end up doing something like
select format('\\copy %I from program %L',:'table_name','pigz -cd ' || :'file_name') as copy_command\gset:copy_command
Which won't win any beauty contests, and suffers from all the limitations I listed earlier, but works for me.
I'm indifferent to whether these commands need to be PREPARE-able so long as sanitization becomes a solved problem.
On Fri, May 27, 2016 at 11:36 AM, Corey Huinker <corey.huinker@gmail.com> wrote: >> >> For the following pretend that "STRING" has the same behavior as the >> "format(...)" function. >> >> EXECUTE STRING('COPY %I TO %L', 'testtable', 'testfile.txt'); > > +1 -1 Why use syntax to do this? If we need a function with special behaviors, let's make a function with special behaviors, hopefully not called 'string()'...for example, format_safe(). Furthermore, we already have functions to deal with safe string injection, quote_ident/literal, and I'm not sure it's a good divert users away from using those functions. Either way, I don't think we should mix improvements to EXECUTE with this patch, which I think addresses the problems I have with COPY nicely (at least as it pertains to my pain points)...so +1 to Andrew's proposal. I would rarely have to use dynamic SQL for COPY with that in place. Other utility commands (off the top of my head) tend not to be a problem for me (NOTIFY would be, but pg_notify handles that case). merlin