Thread: [PATCH] COPY vs \copy HINT
Hi all
--
I see this sort of question quite a bit:
where the user wonders why
COPY gemeenten
FROM 'D:\CBS_woningcijfers_2014.csv'
DELIMITER ';' CSV
fails with
ERROR: could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No such file or directory'
and as usual, it's because the path is on their local host not the Pg server.
I think we should emit a HINT here, something like:
ERROR: could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No such file or directory'
HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper
as a usability improvement. Trivial patch attached.
I'm not sure how to avoid the need to translate the string multiple times, or if the tooling will automatically flatten them. I haven't bothered with the stat() failure or isdir cases, since they seem less likely to be the cause of users being confused between client and server.
Sample output:
postgres=# COPY x FROM '/tmp/somepath';
ERROR: could not open file "/tmp/somepath" for reading: No such file or directory
HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper
postgres=# COPY x TO '/root/nopermissions';
ERROR: could not open file "/root/nopermissions" for writing: Permission denied
HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper
postgres=# COPY x TO 'relpath';
ERROR: relative path not allowed for COPY to file
HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper
Attachment
Re: Craig Ringer 2016-08-12 <CAMsr+YEqtD97qPEzQDqrCt5QiqPbWP_X4hmvy2pQzWC0GWiyPA@mail.gmail.com> > I think we should emit a HINT here, something like: > > ERROR: could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No > such file or directory' > HINT: Paths for COPY are on the PostgreSQL server, not the client. You may > want psql's \copy or a driver COPY ... FROM STDIN wrapper +1 on the idea. > postgres=# COPY x TO '/root/nopermissions'; > ERROR: could not open file "/root/nopermissions" for writing: Permission > denied > HINT: Paths for COPY are on the PostgreSQL server, not the client. You may > want psql's \copy or a driver COPY ... FROM STDIN wrapper TO STDOUT. Also, I vaguely get what you wanted to say with "a driver ... wrapper", but it's pretty nonsensical if one doesn't know about the protocol details. I don't have a better suggestion now, but I think it needs rephrasing. Christoph
On 12 August 2016 at 16:34, Christoph Berg <myon@debian.org> wrote:
> postgres=# COPY x TO '/root/nopermissions';
> ERROR: could not open file "/root/nopermissions" for writing: Permission
> denied
> HINT: Paths for COPY are on the PostgreSQL server, not the client. You may
> want psql's \copy or a driver COPY ... FROM STDIN wrapper
TO STDOUT.
Ahem, whoops. It's the simple things sometimes...
Attached amended.
Also, I vaguely get what you wanted to say with "a driver ...
wrapper", but it's pretty nonsensical if one doesn't know about the
protocol details. I don't have a better suggestion now, but I think it
needs rephrasing.
I don't like it either, but didn't come up with anything better. The problem is that every driver calls it something different.
Attachment
Craig Ringer <craig@2ndquadrant.com> writes: > On 12 August 2016 at 16:34, Christoph Berg <myon@debian.org> wrote: >> Also, I vaguely get what you wanted to say with "a driver ... >> wrapper", but it's pretty nonsensical if one doesn't know about the >> protocol details. I don't have a better suggestion now, but I think it >> needs rephrasing. > I don't like it either, but didn't come up with anything better. The > problem is that every driver calls it something different. A few thoughts on this patch: 1. I don't really think the HINT is appropriate for the not-absolute-path case. 2. I don't think it's appropriate for all possible cases of AllocateFile failure either, eg surely not for EACCES or similar situations where we did find a file. Maybe print it only for ENOENT? (See for example report_newlocale_failure() for technique.) 3. As for the wording, maybe you could do it like this: HINT: COPY copies to[from] a file on the PostgreSQL server, not on the client. You may want a client-side facility such as psql's \copy. That avoids trying to invent a name for other implementations. regards, tom lane
On 2 September 2016 at 04:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig@2ndquadrant.com> writes: >> On 12 August 2016 at 16:34, Christoph Berg <myon@debian.org> wrote: >>> Also, I vaguely get what you wanted to say with "a driver ... >>> wrapper", but it's pretty nonsensical if one doesn't know about the >>> protocol details. I don't have a better suggestion now, but I think it >>> needs rephrasing. > >> I don't like it either, but didn't come up with anything better. The >> problem is that every driver calls it something different. > > A few thoughts on this patch: Thanks for the review. I'll leave the first change pending your comments, the others are made, though I'm not completely sure we should restrict it to ENOENT. > 1. I don't really think the HINT is appropriate for the not-absolute-path > case. Why? If the user runs # COPY sometable FROM 'localfile.csv' WITH (FORMAT CSV); ERROR: could not open file "localfile.csv" for reading: No such file or directory they're going to be just as confused by this error as by: # COPY batch_demo FROM '/tmp/localfile.csv' WITH (FORMAT CSV); ERROR: could not open file "/tmp/localfile.csv" for reading: No such file or directory so I don't really see the rationale for this change. > 2. I don't think it's appropriate for all possible cases of AllocateFile > failure either, eg surely not for EACCES or similar situations where we > did find a file. Maybe print it only for ENOENT? (See for example > report_newlocale_failure() for technique.) I thought about that but figured it didn't really matter too much, when thinking about examples like # COPY batch_demo FROM '/root/secret.csv' WITH (FORMAT CSV); ERROR: could not open file "/root/secret.csv" for reading: Permission denied or whatever, where the user doesn't understand why they can't read the file given that their local client has permission to do so. I don't feel strongly about this and think that the error on ENOENT is by far the most important, so I'll adjust it per your recommendation. > 3. As for the wording, maybe you could do it like this: > > HINT: COPY copies to[from] a file on the PostgreSQL server, not on the > client. You may want a client-side facility such as psql's \copy. > > That avoids trying to invent a name for other implementations. I like that wording a lot more, thanks. Adopted. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: Craig Ringer 2016-09-02 <CAMsr+YFr6Sk=bbU2yCORN7z9foHF0cqx29vk5B49DswQ6EkVxg@mail.gmail.com> > I thought about that but figured it didn't really matter too much, > when thinking about examples like > > # COPY batch_demo FROM '/root/secret.csv' WITH (FORMAT CSV); > ERROR: could not open file "/root/secret.csv" for reading: Permission denied > > or whatever, where the user doesn't understand why they can't read the > file given that their local client has permission to do so. > > I don't feel strongly about this and think that the error on ENOENT is > by far the most important, so I'll adjust it per your recommendation. Couldn't you just add EACCESS to the check as well? > > 3. As for the wording, maybe you could do it like this: > > > > HINT: COPY copies to[from] a file on the PostgreSQL server, not on the > > client. You may want a client-side facility such as psql's \copy. > > > > That avoids trying to invent a name for other implementations. > > I like that wording a lot more, thanks. Adopted. Same here, thanks! Christoph
On 2 September 2016 at 17:05, Christoph Berg <myon@debian.org> wrote: > Re: Craig Ringer 2016-09-02 <CAMsr+YFr6Sk=bbU2yCORN7z9foHF0cqx29vk5B49DswQ6EkVxg@mail.gmail.com> >> I thought about that but figured it didn't really matter too much, >> when thinking about examples like >> >> # COPY batch_demo FROM '/root/secret.csv' WITH (FORMAT CSV); >> ERROR: could not open file "/root/secret.csv" for reading: Permission denied >> >> or whatever, where the user doesn't understand why they can't read the >> file given that their local client has permission to do so. >> >> I don't feel strongly about this and think that the error on ENOENT is >> by far the most important, so I'll adjust it per your recommendation. > > Couldn't you just add EACCESS to the check as well? Yeah, probably. Those are really the only two failures I think it's worth reporting for. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes: > On 2 September 2016 at 04:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. I don't really think the HINT is appropriate for the not-absolute-path >> case. > Why? If the user runs > # COPY sometable FROM 'localfile.csv' WITH (FORMAT CSV); > ERROR: could not open file "localfile.csv" for reading: No such file > or directory > they're going to be just as confused by this error as by: > # COPY batch_demo FROM '/tmp/localfile.csv' WITH (FORMAT CSV); > ERROR: could not open file "/tmp/localfile.csv" for reading: No such > file or directory > so I don't really see the rationale for this change. I think you misinterpreted what I was complaining about; it's not the FROM case but the TO case, specifically this addition: if (!is_absolute_path(filename)) ereport(ERROR, (errcode(ERRCODE_INVALID_NAME), - errmsg("relative path not allowed for COPY to file"))); + errmsg("relative path not allowed for COPY to file"), + errhint("COPY copies to a file on the PostgreSQL server, not on the client. " + "You may want a client-side facility such as psql's \\copy."))); The reason I'm not happy about that is that the hint is completely unrelated to the error condition, and I think that such a hint is likely to be more confusing than helpful. What's likely to happen if we leave it alone is that someone who's confused about client vs. server will do =# COPY sometable TO 'localfile.csv' WITH (FORMAT CSV); ERROR: relative path not allowed for COPY to file [ user mutters something about hopeless pedants and writes an absolute path: ] =# COPY sometable TO '/home/joe/localfile.csv' WITH (FORMAT CSV); ERROR: could not open file "/home/joe/localfile.csv" for writing: No such file or directory At this point, it's appropriate to give the HINT, but I think doing it for the first case is just confusing. There's an opportunity cost to hints like this, which is that if they are in fact wildly unrelated to the real problem, they are more likely to send the user down a blind alley than lead his thoughts to the right answer. What you want to do is essentially cutting one step out of the process by jumping to the conclusion that if the user gave a relative path, it's because he's confused about client vs. server, and I don't see how that follows. >> 2. I don't think it's appropriate for all possible cases of AllocateFile >> failure either, eg surely not for EACCES or similar situations where we >> did find a file. Maybe print it only for ENOENT? > I thought about that but figured it didn't really matter too much, > when thinking about examples like > # COPY batch_demo FROM '/root/secret.csv' WITH (FORMAT CSV); > ERROR: could not open file "/root/secret.csv" for reading: Permission denied > or whatever, where the user doesn't understand why they can't read the > file given that their local client has permission to do so. Meh. Hinting in this case could be helpful only if the user in fact had identical directory structures on client and server and a data file in the right place on both. That doesn't seem terribly likely, and again I argue that the downsides of giving an irrelevant hint outweigh the occasional benefit when it happens to be right. I mean, taking this position just one step further, we should annotate every COPY data error with this same hint, arguing that maybe the user has the right data on one machine and slightly wrong data on the other machine. Once in a blue moon that would indeed be the answer, but most of the time it would be wrong, and users would soon start getting annoyed with the hint --- maybe to the point where they automatically ignore that hint even when it's right. In short, I like hints as long as they're carefully targeted, but if they're not they are likely to be seen as a net negative by users. I think giving this hint for the ENOENT case is probably fine, but I'm much less convinced about other cases. One additional case that I could believe is useful is whatever error code Windows gives when you specify a drive letter that's not in use. (Maybe that's ENOENT but I bet not.) regards, tom lane
I wrote: > Craig Ringer <craig@2ndquadrant.com> writes: >> I thought about that but figured it didn't really matter too much, >> when thinking about examples like >> # COPY batch_demo FROM '/root/secret.csv' WITH (FORMAT CSV); >> ERROR: could not open file "/root/secret.csv" for reading: Permission denied >> or whatever, where the user doesn't understand why they can't read the >> file given that their local client has permission to do so. > Meh. Hinting in this case could be helpful only if the user in fact had > identical directory structures on client and server and a data file in the > right place on both. So my consciousness was raised just now by an example of exactly this scenario over in pgsql-novice. What I forgot was that the client may in fact be on the same machine as the server, in which case EACCES is pretty much exactly what you'd expect. So we probably do want to hint for that case, but the hint wording I previously suggested no longer seems like le mot juste ... it needs to cover the idea that the client and server are different processes on the same machine. I don't suppose there's any easy way for COPY to distinguish local from remote connections --- if it could, that might influence both the errnos to hint for and the phrasing of the hint. regards, tom lane
On 4 September 2016 at 23:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So my consciousness was raised just now by an example of exactly this > scenario over in pgsql-novice. What I forgot was that the client may > in fact be on the same machine as the server, in which case EACCES > is pretty much exactly what you'd expect. Yep. Also in cases with common paths, like /root or whatever. > So we probably do want to > hint for that case, but the hint wording I previously suggested no > longer seems like le mot juste ... it needs to cover the idea that > the client and server are different processes on the same machine. Yeah, may be. *We* know that in this case "client" and "server" still applies since client and server can be on the same host, but the people who needs this hint may not understand that. Though there's only so much we can do or should try to do in a HINT. I think the most important bit is pointing them at \copy, so it's still useful. To cover the same-host case we could try something like: COPY runs on the PostgreSQL server, using the PostgreSQL server's directories and permissions, it doesn't run on the client. ... but I think that's actually less helpful for the users who'll need this. We could say "COPY runs as the PostgreSQL server" but that's the kind of hint that mostly helps people who already understand it and don't actually the hint. (BTW, whoever came up with "EACCES" needs to go spend time with the creat() system call somewhere dark and smelly). > I don't suppose there's any easy way for COPY to distinguish local > from remote connections Not that I see, since "local" can be unix socket or tcp to localhost. Not cleanly anyway. I don't think it matters. Many hosts have enough paths in common that in practice the hint on EACCES will be useful anyway. It'd be nice to work in something about running with the permissions of the PostgreSQL server, but I don't see a way to do that without making it all more complex. I don't think it's worth tons of time anyway. This will be better than what we have, lets do it. I'm fairly happy with the wording you came up with: "COPY copies to a file on the PostgreSQL server, not on the client. " "You may want a client-side facility such as psql's \\copy." and am inclined to suggest going ahead with the existing wording. I agree that removing the part for "relative path not allowed for COPY to file" is reasonable, so I've attached an update that does so and warns on EACCES too. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 5 September 2016 at 09:05, Craig Ringer <craig@2ndquadrant.com> wrote: >I've attached an update that does so and > warns on EACCES too. ... this time, with required parens. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: Craig Ringer 2016-09-05 <CAMsr+YHP=hTUKpuHK4LOAmWE_JEe-A281QL0uni_gw-V7jQv2w@mail.gmail.com> > To cover the same-host case we could try something like: > > COPY runs on the PostgreSQL server, using the PostgreSQL server's > directories and permissions, it doesn't run on the client. > > ... but I think that's actually less helpful for the users who'll need this. The part about the server permissions might be useful to hint at. > > I don't suppose there's any easy way for COPY to distinguish local > > from remote connections > > Not that I see, since "local" can be unix socket or tcp to localhost. > Not cleanly anyway. > > I don't think it matters. Many hosts have enough paths in common that > in practice the hint on EACCES will be useful anyway. It'd be nice to > work in something about running with the permissions of the PostgreSQL > server, but I don't see a way to do that without making it all more > complex. > > I don't think it's worth tons of time anyway. This will be better than > what we have, lets do it. It's probably not helpful trying to distinguish local and remote connections, the set of possible problems is diverse and this is just one of them. > I'm fairly happy with the wording you came up with: > > "COPY copies to a file on the PostgreSQL server, not on the client. " > "You may want a client-side facility such as psql's \\copy." What about "COPY TO instructs the PostgreSQL server to write to a file on the server. " "You may want a client-side facilitysuch as psql's \\copy." "COPY FROM instructs the PostgreSQL server to read from a file on the server. " "You may want a client-side facilitysuch as psql's \\copy." This stresses more explicitly that the file is opened by the server process. (I'm not particularly happy that it says "server" in there twice, but couldn't think of anything else.) Christoph
On 5 September 2016 at 16:32, Christoph Berg <myon@debian.org> wrote: > The part about the server permissions might be useful to hint at. > What about > > "COPY TO instructs the PostgreSQL server to write to a file on the server. " > "You may want a client-side facility such as psql's \\copy." > > "COPY FROM instructs the PostgreSQL server to read from a file on the server. " > "You may want a client-side facility such as psql's \\copy." > > This stresses more explicitly that the file is opened by the server > process. (I'm not particularly happy that it says "server" in there > twice, but couldn't think of anything else.) Tom, any preference here? I'm probably inclined to go for your original wording and accept that it's just too hard to hint at the client/server process split in a single short message. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes: > Tom, any preference here? > I'm probably inclined to go for your original wording and accept that > it's just too hard to hint at the client/server process split in a > single short message. I think my original wording is pretty hopeless for the single-machine case: "COPY copies to a file on the PostgreSQL server, not on the client". If the server and client are the same machine, that's content-free. Christoph's idea isn't bad. We could tweak it to: COPY TO instructs the PostgreSQL server process to write a file. COPY FROM instructs the PostgreSQL server process to read a file. This seems to cover both the wrong-machine and wrong-process-identity cases, and as a bonus it might help if you've had a brain fade about which COPY direction is write vs. read. (I think we're all in agreement that the second sentence of the hint is fine, so leaving that out of it.) Comments? regards, tom lane
Re: Tom Lane 2016-09-06 <17637.1473192771@sss.pgh.pa.us> > Christoph's idea isn't bad. We could tweak it to: > > COPY TO instructs the PostgreSQL server process to write a file. > > COPY FROM instructs the PostgreSQL server process to read a file. > > This seems to cover both the wrong-machine and wrong-process-identity > cases, and as a bonus it might help if you've had a brain fade about > which COPY direction is write vs. read. > > (I think we're all in agreement that the second sentence of the hint > is fine, so leaving that out of it.) > > Comments? I like your new version, it's crisp and transports the right message. Christoph
On 7 September 2016 at 04:19, Christoph Berg <myon@debian.org> wrote: > I like your new version, it's crisp and transports the right message. OK, updated with Tom's tweaked version of Christoph's wording per discussion. Thanks all. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Craig Ringer <craig@2ndquadrant.com> writes: > On 7 September 2016 at 04:19, Christoph Berg <myon@debian.org> wrote: >> I like your new version, it's crisp and transports the right message. > OK, updated with Tom's tweaked version of Christoph's wording per > discussion. Thanks all. Pushed with minor stylistic adjustment (narrowing the scope of save_errno). regards, tom lane