Re: New "raw" COPY format - Mailing list pgsql-hackers

From jian he
Subject Re: New "raw" COPY format
Date
Msg-id CACJufxG7gE4XGCQJ9UG0ki2YnyWFUhJ_QoWm0TE42QZZVyz8Hg@mail.gmail.com
Whole thread Raw
In response to Re: New "raw" COPY format  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
On Sat, Oct 19, 2024 at 1:24 AM Joel Jacobson <joel@compiler.org> wrote:
>>
> Handling of e.g. JSON and other structured text files that could contain
> newlines, in a seamless way seems important, so therefore the default is
> no delimiter for the raw format, so that the entire input is read as one data
> value for COPY FROM, and all column data is concatenated without delimiter
> for COPY TO.
>
> When specifying a delimiter for the raw format, it separates *rows*, and can be
> a multi-byte string, such as E'\r\n' to handle Windows text files.
>
> This has been documented under the DELIMITER option, as well as under the
> Raw Format section.
>
We already make RAW and can only have one column.
if RAW has no default delimiter, then COPY FROM a text file will
become one datum value;
which makes it looks like importing a Large Object.
(https://www.postgresql.org/docs/17/lo-funcs.html)

i think, most of the time, you have more than one row/value to import
and export?


> The refactoring is now in a separate first single commit, which seems
> necessary, to separate the new functionality, from the refactoring.
I agree.


ProcessCopyOptions
/* Extract options from the statement node tree */
foreach(option, options)
{
}
/* --- DELIMITER option --- */
/* --- NULL option --- */
/* --- QUOTE option --- */
Currently the regress test passed, i think that means your refactor is fine.

in ProcessCopyOptions, maybe we can rearrange the code after the
foreach loop (foreach(option, options)
based on the parameters order in
https://www.postgresql.org/docs/devel/sql-copy.html Parameters section.
so we can review it by comparing the refactoring with the
sql-copy.html Parameters section's description.


>
> > We already did column length checking at BeginCopyTo.
> > no need to  "if (list_length(cstate->attnumlist) != 1)" error check in
> > CopyOneRowTo?
>
> Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
> and DoCopyTo() which in turn calls CopyOneRowTo(),
> but CopyOneRowTo() is also being called from copy_dest_receive().
>

BeginCopyTo do the preparation work.
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);

After CopyGetAttnums, the number of attributes for COPY TO cannot be changed.
right after CopyGetAttnums call then check the length of cstate->attnumlist
seems fine for me.
I think in CopyOneRowTo, we can actually
Assert(list_length(cstate->attnumlist) == 1).
for raw format.


src10=# drop table if exists x;
create table x(a int);
COPY x from stdin (FORMAT raw);
DROP TABLE
CREATE TABLE
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 11
>> 12
>> \.
ERROR:  invalid input syntax for type integer: "11
12
"
CONTEXT:  COPY x, line 1, column a: "11
12
"

The above case means COPY FROM STDIN (FORMAT RAW) can only import one
single value (when successful).
user need to specify like:

COPY x from stdin (FORMAT raw, delimiter E'\n');

seems raw format default no delimiter is not user friendly.



pgsql-hackers by date:

Previous
From: "Joel Jacobson"
Date:
Subject: Re: New "raw" COPY format
Next
From: Dilip Kumar
Date:
Subject: Re: Make default subscription streaming option as Parallel