Thread: psql -f inconsistency with "copy from stdin"
Today I ran into an inconsistency between two versions of postgresql in how psql handles copies from stdin. At this point I am not sure how the rewrite of psql does things, but thought I'd mention the problem in case someone with it installed can check. The issue is how the command psql -f test.sql db < test.dat is treated, given the following files: -- test.sql drop table test; create table test (name text); copy test from stdin; select * from test; and test.dat a b Specifically v6.4.2 and v6.5.2 differ in the outcome, with v6.4.2 producing what I would expect and v6.5.2 producing anomalous output. Note that performing the copy as psql -c "copy test from stdin" db < test.dat works fine in either case. v6.4.2 output: The contents of test.dat are read into the table as one might expect having redirected that file to stdin and copying from stdin. v6.5.2 output: The contents of test.dat are not read into the table at all. Instead, the remainder of the test.sql file (i.e., select * ...) are read into the table. How does the current version behave when performing these copies? If it still behaves like 6.5.2, I suspect there is some bug in handling the copy command. Cheers, Brook
7.0 behaves like 6.5.* in this regard because the code is pretty much the same. Thanks for pointing this out. On 2000-01-12, Brook Milligan mentioned: > Today I ran into an inconsistency between two versions of postgresql > in how psql handles copies from stdin. At this point I am not sure > how the rewrite of psql does things, but thought I'd mention the > problem in case someone with it installed can check. > > The issue is how the command > > psql -f test.sql db < test.dat > > is treated, given the following files: > > -- test.sql > drop table test; > create table test (name text); > copy test from stdin; > select * from test; > > and > > test.dat > a > b > > Specifically v6.4.2 and v6.5.2 differ in the outcome, with v6.4.2 > producing what I would expect and v6.5.2 producing anomalous output. > Note that performing the copy as > > psql -c "copy test from stdin" db < test.dat > > works fine in either case. > > v6.4.2 output: The contents of test.dat are read into the table as > one might expect having redirected that file to stdin and copying from > stdin. > > v6.5.2 output: The contents of test.dat are not read into the table at > all. Instead, the remainder of the test.sql file (i.e., select * ...) > are read into the table. > > How does the current version behave when performing these copies? If > it still behaves like 6.5.2, I suspect there is some bug in handling > the copy command. > > Cheers, > Brook > > > ************ > > -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: > 7.0 behaves like 6.5.* in this regard because the code is pretty much the > same. Thanks for pointing this out. Of course, the question is which way is right... I can see the potential usefulness of doingpsql -f driving.script <data.file but on the other hand, it bothers me a good deal that a script containingCOPY table FROM STDIN;... data here ...\. (as generated by such unheard-of, seldom-used utilities as pg_dump) would work when sourced by psql <pgdump.script and *fail* when sourced by psql -f pgdump.script. But that's what will happen if we change it back. I suspect the change in behavior from 6.4 to 6.5 may have been a deliberate change to avoid this failure mode. It'd be worth checking the archives to see if you can find any discussion about it. It seems to me that we ought to provide both behaviors, but make sure that the one that supports data-in-the-script is the one invoked by COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy command can be set up to support the other alternative. regards, tom lane
Of course, the question is which way is right... I can see the potential usefulness of doing psql -f driving.script <data.file but on the other hand, it bothers me agood deal that a script containing COPY table FROM STDIN; ... data here ... \. (as generated by such unheard-of,seldom-used utilities as pg_dump) would work when sourced by psql <pgdump.script and *fail* when sourced bypsql -f pgdump.script. But that's what will happen if we change it back. I suspect the change in behavior from 6.4 to 6.5 may have been a deliberate change to avoid this failure mode. It'd beworth checking the archives to see if you can find any discussion about it. It seems to me that we ought to provide both behaviors, but make sure that the one that supports data-in-the-script isthe one invoked by COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy command can be set up to supportthe other alternative. But isn't there a greater difference between copy and \copy than this? Doesn't one act on the frontend and one on the backend? There needs to be a mechanism for copying data in through the front end without special permissions. Also, it seems unfortunate from a semantics point of view to have COPY FROM STDIN not actually refer to the stdin file of the process. Perhaps that is necessary to preserve compatability with old pg_dump (new versions could be changed in this regard of course), but it is not what I would naturally expect STDIN to mean in the context of 30 years of Unix development. Further, this use of STDIN clearly conflicts with the meaning of STDOUT in the analogous copy out command which doesn't insert the output into the script file but rather directs it to the stdout file. In order to maintain some compatability with these broader uses of the terms STDIN/STDOUT (while still supporting previous pg_dump scripts, at least for awhile), I think it is worth exploring some options. A few ideas are: - Introduce a new syntax for the 6.5.2 here-doc semantics. Possibilities might include COPY FROM HERE (copy ends at EOF or\.) or COPY UNTIL <tag> (copy ends at matching <tag>, like shell here-docs). pg_dump would have to be changed to correspond. - Introduce a new flag to psql to differentiate the interpretation of COPY FROM STDIN. This seems confusing to users, butmight be worthwhile (but become deprecated after a few releases) if the syntax is changed and old pg_dump scripts needsupporting. New scripts and new pg_dump needn't worry about this if they use the new syntax. Cheers, Brook
Brook Milligan <brook@biology.nmsu.edu> writes: > It seems to me that we ought to provide both behaviors, but make sure > that the one that supports data-in-the-script is the one invoked by > COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy > command can be set up to support the other alternative. > But isn't there a greater difference between copy and \copy than this? > Doesn't one act on the frontend and one on the backend? Not when it's COPY FROM STDIN or TO STDOUT --- from the backend's point of view, that means transfer data from or to the frontend. What psql does with it is psql's concern. (Actually, \copy is implemented by sending a COPY FROM STDIN/TO STDOUT command to the backend; the backend can't tell the difference between the two cases, and has no way to know where the data is really coming from or going to on the client side.) > - Introduce a new syntax for the 6.5.2 here-doc semantics. > Possibilities might include COPY FROM HERE (copy ends at EOF or \.) Changing the SQL command is the wrong thing to think about, because the parameter would only be known at the backend which is not where it needs to be known to change psql's behavior. Furthermore, from the backend's point of view it *is* sending to or from the only "user interface" it's got. So I don't think there's anything wrong with the definition of the SQL COPY command. You should be thinking about adding options to psql's \copy, instead, if you want more flexibility in controlling where psql gets or puts data. > pg_dump would have to be changed to correspond. IMHO any proposal that requires changing pg_dump is a non-starter, because it will fail when people try to load 6.5 or earlier dumps into 7.0. But fortunately, pg_dump doesn't use \copy ... regards, tom lane
On Wed, 12 Jan 2000, Tom Lane wrote: > > pg_dump would have to be changed to correspond. > > IMHO any proposal that requires changing pg_dump is a non-starter, > because it will fail when people try to load 6.5 or earlier dumps > into 7.0. But fortunately, pg_dump doesn't use \copy ... I'm confused here...why would "any proposal that requires changing pg_dump is a non-starter"? How does changing pg_dump in v7.0 affect pg_dump in v6.5? As long as I can reload my v6.5 data into a v7.0 database using the pg_dump from v6.5, confused as to why v7.0s pg_dump matters... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
After further contemplation I am not completely sure which way is the correct behaviour. Consider me doing this: --test.sql COPY foo FROM stdin; data data data data SELECT * FROM foo; and running psql -f test.sql < (anything) on it. Then I would expect it to behave the other way. The -f option is just another way of saying "get the input from there". If you use both -f and stdin you're in essence saying "get the input from there and there", and that feature does not exist in psql and would be hard to extend to the general case. On 2000-01-12, Brook Milligan mentioned: > Today I ran into an inconsistency between two versions of postgresql > in how psql handles copies from stdin. At this point I am not sure > how the rewrite of psql does things, but thought I'd mention the > problem in case someone with it installed can check. > > The issue is how the command > > psql -f test.sql db < test.dat > > is treated, given the following files: > > -- test.sql > drop table test; > create table test (name text); > copy test from stdin; > select * from test; > > and > > test.dat > a > b > > Specifically v6.4.2 and v6.5.2 differ in the outcome, with v6.4.2 > producing what I would expect and v6.5.2 producing anomalous output. > Note that performing the copy as > > psql -c "copy test from stdin" db < test.dat > > works fine in either case. > > v6.4.2 output: The contents of test.dat are read into the table as > one might expect having redirected that file to stdin and copying from > stdin. > > v6.5.2 output: The contents of test.dat are not read into the table at > all. Instead, the remainder of the test.sql file (i.e., select * ...) > are read into the table. > > How does the current version behave when performing these copies? If > it still behaves like 6.5.2, I suspect there is some bug in handling > the copy command. > > Cheers, > Brook > > > ************ > > -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
The -f option is just another way of saying "get the input from there". If you use both -f and stdin you're in essencesaying "get the input from there and there", and that feature does not exist in psql and would be hard to extendto the general case. But there are specifically two kinds of input involved here [*]: - input of SQL commands and such to psql - input of data to a COPY command To me these are conceptually very distinct (in much the same way you have distinguished already between various output streams; in fact, I'm not sure how you have matched those with the output stream from COPY, but it might be relevant to think about that in light of this discussion). Thus, to me it makes sense to say "take input from there and there," as long as it is clear that one "there" refers to one input stream and the other to the other one. For example, -f naturally refers to the first one above, while the STDIN naturally refers to the second. Saying that -f should override all other sources of input is inconsistent in its own way; after all, that doesn't override a COPY FROM "filename" command, does it? In that case, you maintain a distinction between two different input streams. It seems that dropping that distinction for the special case of "filename" == STDIN is introducing unnecessary confusion into the semantics of commands. In short, I'm not really convinced that it is unreasonable to expect a command like COPY (or \copy) to be able to associate itself with an input (or output) stream that is different from that implied by -f, given that the nature of the various I/O streams is so different and clearly defined. Cheers, Brook [*] I'm not sure what you mean by the "general case," but I can't think of any other commands, at least SQL commands, that are naturally associated with more than one input stream, namely the source of the command itself which may include embedded data. Unless I'm missing something here, I suspect the "general case" is just fine and doesn't interact with the problem I raised. What is problematical is the special case of a command (perhaps there are others?) that inherently involves more than one input stream: the source of the command itself and the source of data upon which the command operates.
On 2000-01-12, Tom Lane mentioned: > It seems to me that we ought to provide both behaviors, but make sure > that the one that supports data-in-the-script is the one invoked by > COPY FROM STDIN (since that's what pg_dump uses). Perhaps psql's \copy > command can be set up to support the other alternative. \copy from stdin is not used yet. That would work. There might be issues I'm overlooking now, but anything else that came up in this thread will most likely not work in the general case. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
The Hermit Hacker <scrappy@hub.org> writes: > On Wed, 12 Jan 2000, Tom Lane wrote: >> IMHO any proposal that requires changing pg_dump is a non-starter, >> because it will fail when people try to load 6.5 or earlier dumps >> into 7.0. But fortunately, pg_dump doesn't use \copy ... > I'm confused here...why would "any proposal that requires changing pg_dump > is a non-starter"? How does changing pg_dump in v7.0 affect pg_dump in > v6.5? Because people will be using 6.5 pg_dump to make dump scripts that they will then try to load into 7.0 with 7.0's psql. If we change the way that COPY FROM STDIN is interpreted, we risk trouble with those scripts. I like Peter's suggestion of defining "\copy from stdin" to mean "read from psql's stdin". That would leave the SQL command COPY FROM STDIN for the other case where the data is in-line in the script. regards, tom lane
Okay, this is the new law: copy x from stdin; "stdin" is whereever the actual copy from stdin command line came from. This is that way because "stdin" in that case does not really refer to stdin in the classical sense but tells the backend to get the data from the same stream the command came from (namely the network connection), and that's what we're doing. copy x from stdout; The output goes to whereever select * from x would go to, in particular \o affects this. This is purely because I said so, but I think it's reasonable. \copy x from stdin The input comes from psql's stdin. (Which is correcter in this case since it's a _frontend_ copy.) \copy x to stdout psql's stdout I hope everyone's happy now. ;) On 2000-01-12, Brook Milligan mentioned: > But there are specifically two kinds of input involved here [*]: > > - input of SQL commands and such to psql > - input of data to a COPY command > > To me these are conceptually very distinct (in much the same way you > have distinguished already between various output streams; in fact, > I'm not sure how you have matched those with the output stream from > COPY, but it might be relevant to think about that in light of this > discussion). Thus, to me it makes sense to say "take input from there > and there," as long as it is clear that one "there" refers to one > input stream and the other to the other one. For example, -f > naturally refers to the first one above, while the STDIN naturally > refers to the second. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden