Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes
Date
Msg-id 327d8dac-2321-4147-909f-6f18474ff4e1@app.fastmail.com
Whole thread Raw
In response to Re: [BUG] psql: Make \copy from 'text' and 'csv' formats fail on NUL bytes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Nov 10, 2024, at 23:14, Tom Lane wrote:
> "Joel Jacobson" <joel@compiler.org> writes:
>> On Sun, Nov 10, 2024, at 22:37, Tom Lane wrote:
>>> That seems like a hack, as it also changes the behavior w.r.t.
>>> prompts and EOF-mark detection, neither for the better.
>
>> Hmm, in what way does it change the behavior?
>> I ran almost all the tests, including the TAP ones, and none of them failed with
>> this fix. Is there some behavior that is currently not covered by the test suite?
>
> Of course.  (Our regression tests are very far from covering 100% of
> the behavioral space.)
>
> In the case at hand, this would break the
> expected line-by-line prompting for "\copy ... from '/dev/tty'".

Yes, of course, just wondered what kind of behavior in this area
that wasn't tested, thanks for explaining it.

> Playing with this just now, I notice that the prompt you get
> still claims that \. works to end input, although right now
> it does not:
>
> regression=# \copy int8_tbl from '/dev/tty'
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
>>> 1    3
>>> 5    6
>>> \.
>>> ^D
> COPY 2
>
> I'm inclined to think that the prompt still describes what should
> happen, at least in non-binary mode, although in binary mode we
> probably ought to just say (and do) "End with an EOF signal".
>
> So perhaps the if-test to choose the code path could be
>
>     if ((isbinary || copystream != pset.cur_cmd_source) && !showprompt)

Thanks for guidance, that seems to fix, for 6/8 cases I've figured out how to test.

> which would allow dropping the vestigial prompt logic in the first
> path,

I assume you mean that due to "&& !showprompt" the "if (showprompt)"
becomes unreachable and can therefore be dropped?

> and we would also need to change the test in the second path
> that decides if we should check for \.  (Likely this should be
> refactored a bit to make it more understandable.  An intermediate
> flag saying whether we intend to check for \. might help.)

Maybe check_dot_command?

    const bool    check_dot_command = (copystream == pset.cur_cmd_source);

I haven't tried yet to refactor the code,
except than replacing the two "copystream == pset.cur_cmd_source"
occurrences with the new check_dot_command flag.

First want to understand if the two remaining cases are valid,
and if they can be tested:

> Anyway, my point is that we need to think through the desirable
> behavior for each possible combination of showprompt, isbinary, and
> copystream != pset.cur_cmd_source, because all 8 cases are reachable.

I guess these are the 8 cases?

+--------+-------------+----------+------------------+
| CASE   | showprompt  | isbinary | check_dot_command |
+--------+-------------+----------+------------------+
|   1    |    false    |   false  |      false      |
|   2    |    false    |   false  |      true       |
|   3    |    false    |   true   |      false      |
|   4    |    false    |   true   |      true       |
|   5    |    true     |   false  |      false      |
|   6    |    true     |   false  |      true       |
|   7*   |    true     |   true   |      false      |
|   8*   |    true     |   true   |      true       |
+--------+-------------+----------+------------------+
* Cases 7 and 8 not tested yet

With the changed if-test, case 1-6 works,
and for case 1, then binary mode branch is taken
instead of the text mode branch,
whereas cases 2-6 take the same branch as before.

joel@Joels-MBP psql_tester % git diff --no-index -U100 /tmp/psql.log.HEAD /tmp/psql.log
diff --git a/tmp/psql.log.HEAD b/tmp/psql.log
index 5e44e30..1f48ac9 100644
--- a/tmp/psql.log.HEAD
+++ b/tmp/psql.log
@@ -1,6 +1,6 @@
-COPY case: 1 TEXT MODE
+COPY case: 1 BINARY MODE
 COPY case: 2 TEXT MODE
 COPY case: 3 BINARY MODE
 COPY case: 4 BINARY MODE
 COPY case: 5 TEXT MODE
 COPY case: 6 TEXT MODE

Here is how I tested each case:

# CASE 1:
# showprompt:        false
# isbinary:          false
# check_dot_command: false
psql -c "\copy int8_tbl from '/tmp/int8_tbl.data'"

# CASE 2:
# showprompt:        false
# isbinary:          false
# check_dot_command: true
psql -f /tmp/copy_stdin_text.sql

# CASE 3:
# showprompt:        false
# isbinary:          true
# check_dot_command: false
psql -c "\copy int8_tbl from '/tmp/int8_tbl.bin' (format binary)"

# CASE 4:
# showprompt:        false
# isbinary:          true
# check_dot_command: true
printf '\\copy int8_tbl from stdin (format binary)\n' >/tmp/copy_stdin_binary.sql
cat /tmp/int8_tbl.bin >>/tmp/copy_stdin_binary.sql
psql -f copy_stdin_binary.sql

# CASE 5:
# showprompt:        true
# isbinary:          false
# check_dot_command: false

psql
\copy int8_tbl from '/dev/tty'
17    18
19    20
\.
# Send EOF (Ctrl+D)

# CASE 6:
# showprompt:        true
# isbinary:          false
# check_dot_command: true

psql
\copy int8_tbl from stdin
21    22
23    24
\.

# CASE 7:
# showprompt:        true
# isbinary:          true
# check_dot_command: false

CASE 7 would be like CASE 5, with the addition of (format binary)
that is:

    \copy int8_tbl from '/dev/tty' (format binary)

but I wonder if this is a valid case? Could we really copy/paste
or by some other means give psql binary data here?
I tried to wrap psql and send the binary content of
my '/tmp/int8_tbl.bin' file, and tried sending the EOF control code,
and I see I get PGCOPY-message, but the txn isn't committed,
so not sure what's happening. Before I continue trying
to figure this one out, I just wanted to make sure this is
a valid case, and how it is supposed to be used if so?

# CASE 8:
# showprompt:        true
# isbinary:          true
# check_dot_command: true

CASE 8 would be like CASE 6, with the addition of (format binary)
that is:

    \copy int8_tbl from stdin (format binary)

I've had the same problem with this one, as with CASE 7.

Many thanks for guidance.

/Joel



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Separate memory contexts for relcache and catcache
Next
From: torikoshia
Date:
Subject: Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row