Thread: Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes
"Joel Jacobson" <joel@compiler.org> writes: > Fix by adjusting handleCopyIn() to use the binary code path also when the copy > source is a file (i.e., copystream != pset.cur_cmd_source), even in textual > copies. That seems like a hack, as it also changes the behavior w.r.t. prompts and EOF-mark detection, neither for the better. I'm not really convinced that we need to do anything at all about this. regards, tom lane
Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes
From
"Joel Jacobson"
Date:
On Sun, Nov 10, 2024, at 22:37, Tom Lane wrote: > "Joel Jacobson" <joel@compiler.org> writes: >> Fix by adjusting handleCopyIn() to use the binary code path also when the copy >> source is a file (i.e., copystream != pset.cur_cmd_source), even in textual >> copies. > > That seems like a hack, as it also changes the behavior w.r.t. > prompts and EOF-mark detection, neither for the better. Hmm, in what way does it change the behavior? I ran almost all the tests, including the TAP ones, and none of them failed with this fix. Is there some behavior that is currently not covered by the test suite? > I'm not really convinced that we need to do anything at all > about this. I think this should be fixed, because 0x00 is valid UTF-8 for the Unicode Character 'NULL' (U+0000), which makes me think there is a risk CSV files coming from other systems than PostgreSQL, could contain such text strings which would then silently cause data corruption. /Joel
"Joel Jacobson" <joel@compiler.org> writes: > On Sun, Nov 10, 2024, at 22:37, Tom Lane wrote: >> That seems like a hack, as it also changes the behavior w.r.t. >> prompts and EOF-mark detection, neither for the better. > Hmm, in what way does it change the behavior? > I ran almost all the tests, including the TAP ones, and none of them failed with > this fix. Is there some behavior that is currently not covered by the test suite? Of course. (Our regression tests are very far from covering 100% of the behavioral space.) In the case at hand, this would break the expected line-by-line prompting for "\copy ... from '/dev/tty'". Playing with this just now, I notice that the prompt you get still claims that \. works to end input, although right now it does not: regression=# \copy int8_tbl from '/dev/tty' Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1 3 >> 5 6 >> \. >> ^D COPY 2 I'm inclined to think that the prompt still describes what should happen, at least in non-binary mode, although in binary mode we probably ought to just say (and do) "End with an EOF signal". So perhaps the if-test to choose the code path could be if ((isbinary || copystream != pset.cur_cmd_source) && !showprompt) which would allow dropping the vestigial prompt logic in the first path, and we would also need to change the test in the second path that decides if we should check for \. (Likely this should be refactored a bit to make it more understandable. An intermediate flag saying whether we intend to check for \. might help.) Anyway, my point is that we need to think through the desirable behavior for each possible combination of showprompt, isbinary, and copystream != pset.cur_cmd_source, because all 8 cases are reachable. regards, tom lane
Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes
From
"Joel Jacobson"
Date:
On Sun, Nov 10, 2024, at 23:14, Tom Lane wrote: > "Joel Jacobson" <joel@compiler.org> writes: >> On Sun, Nov 10, 2024, at 22:37, Tom Lane wrote: >>> That seems like a hack, as it also changes the behavior w.r.t. >>> prompts and EOF-mark detection, neither for the better. > >> Hmm, in what way does it change the behavior? >> I ran almost all the tests, including the TAP ones, and none of them failed with >> this fix. Is there some behavior that is currently not covered by the test suite? > > Of course. (Our regression tests are very far from covering 100% of > the behavioral space.) > > In the case at hand, this would break the > expected line-by-line prompting for "\copy ... from '/dev/tty'". Yes, of course, just wondered what kind of behavior in this area that wasn't tested, thanks for explaining it. > Playing with this just now, I notice that the prompt you get > still claims that \. works to end input, although right now > it does not: > > regression=# \copy int8_tbl from '/dev/tty' > Enter data to be copied followed by a newline. > End with a backslash and a period on a line by itself, or an EOF signal. >>> 1 3 >>> 5 6 >>> \. >>> ^D > COPY 2 > > I'm inclined to think that the prompt still describes what should > happen, at least in non-binary mode, although in binary mode we > probably ought to just say (and do) "End with an EOF signal". > > So perhaps the if-test to choose the code path could be > > if ((isbinary || copystream != pset.cur_cmd_source) && !showprompt) Thanks for guidance, that seems to fix, for 6/8 cases I've figured out how to test. > which would allow dropping the vestigial prompt logic in the first > path, I assume you mean that due to "&& !showprompt" the "if (showprompt)" becomes unreachable and can therefore be dropped? > and we would also need to change the test in the second path > that decides if we should check for \. (Likely this should be > refactored a bit to make it more understandable. An intermediate > flag saying whether we intend to check for \. might help.) Maybe check_dot_command? const bool check_dot_command = (copystream == pset.cur_cmd_source); I haven't tried yet to refactor the code, except than replacing the two "copystream == pset.cur_cmd_source" occurrences with the new check_dot_command flag. First want to understand if the two remaining cases are valid, and if they can be tested: > Anyway, my point is that we need to think through the desirable > behavior for each possible combination of showprompt, isbinary, and > copystream != pset.cur_cmd_source, because all 8 cases are reachable. I guess these are the 8 cases? +--------+-------------+----------+------------------+ | CASE | showprompt | isbinary | check_dot_command | +--------+-------------+----------+------------------+ | 1 | false | false | false | | 2 | false | false | true | | 3 | false | true | false | | 4 | false | true | true | | 5 | true | false | false | | 6 | true | false | true | | 7* | true | true | false | | 8* | true | true | true | +--------+-------------+----------+------------------+ * Cases 7 and 8 not tested yet With the changed if-test, case 1-6 works, and for case 1, then binary mode branch is taken instead of the text mode branch, whereas cases 2-6 take the same branch as before. joel@Joels-MBP psql_tester % git diff --no-index -U100 /tmp/psql.log.HEAD /tmp/psql.log diff --git a/tmp/psql.log.HEAD b/tmp/psql.log index 5e44e30..1f48ac9 100644 --- a/tmp/psql.log.HEAD +++ b/tmp/psql.log @@ -1,6 +1,6 @@ -COPY case: 1 TEXT MODE +COPY case: 1 BINARY MODE COPY case: 2 TEXT MODE COPY case: 3 BINARY MODE COPY case: 4 BINARY MODE COPY case: 5 TEXT MODE COPY case: 6 TEXT MODE Here is how I tested each case: # CASE 1: # showprompt: false # isbinary: false # check_dot_command: false psql -c "\copy int8_tbl from '/tmp/int8_tbl.data'" # CASE 2: # showprompt: false # isbinary: false # check_dot_command: true psql -f /tmp/copy_stdin_text.sql # CASE 3: # showprompt: false # isbinary: true # check_dot_command: false psql -c "\copy int8_tbl from '/tmp/int8_tbl.bin' (format binary)" # CASE 4: # showprompt: false # isbinary: true # check_dot_command: true printf '\\copy int8_tbl from stdin (format binary)\n' >/tmp/copy_stdin_binary.sql cat /tmp/int8_tbl.bin >>/tmp/copy_stdin_binary.sql psql -f copy_stdin_binary.sql # CASE 5: # showprompt: true # isbinary: false # check_dot_command: false psql \copy int8_tbl from '/dev/tty' 17 18 19 20 \. # Send EOF (Ctrl+D) # CASE 6: # showprompt: true # isbinary: false # check_dot_command: true psql \copy int8_tbl from stdin 21 22 23 24 \. # CASE 7: # showprompt: true # isbinary: true # check_dot_command: false CASE 7 would be like CASE 5, with the addition of (format binary) that is: \copy int8_tbl from '/dev/tty' (format binary) but I wonder if this is a valid case? Could we really copy/paste or by some other means give psql binary data here? I tried to wrap psql and send the binary content of my '/tmp/int8_tbl.bin' file, and tried sending the EOF control code, and I see I get PGCOPY-message, but the txn isn't committed, so not sure what's happening. Before I continue trying to figure this one out, I just wanted to make sure this is a valid case, and how it is supposed to be used if so? # CASE 8: # showprompt: true # isbinary: true # check_dot_command: true CASE 8 would be like CASE 6, with the addition of (format binary) that is: \copy int8_tbl from stdin (format binary) I've had the same problem with this one, as with CASE 7. Many thanks for guidance. /Joel