Re: New "raw" COPY format - Mailing list pgsql-hackers
From | jian he |
---|---|
Subject | Re: New "raw" COPY format |
Date | |
Msg-id | CACJufxG7gE4XGCQJ9UG0ki2YnyWFUhJ_QoWm0TE42QZZVyz8Hg@mail.gmail.com |
In response to | Re: New "raw" COPY format (Tatsuo Ishii <ishii@postgresql.org>) |
List | pgsql-hackers |
On Sat, Oct 19, 2024 at 1:24 AM Joel Jacobson <joel@compiler.org> wrote:
>
> Handling of e.g. JSON and other structured text files that could contain
> newlines, in a seamless way seems important, so therefore the default is
> no delimiter for the raw format, so that the entire input is read as one data
> value for COPY FROM, and all column data is concatenated without delimiter
> for COPY TO.
>
> When specifying a delimiter for the raw format, it separates *rows*, and can be
> a multi-byte string, such as E'\r\n' to handle Windows text files.
>
> This has been documented under the DELIMITER option, as well as under the
> Raw Format section.
>

We already restrict RAW to a single column. If RAW has no default delimiter, then COPY FROM a text file will produce one datum value, which makes it look like importing a Large Object (https://www.postgresql.org/docs/17/lo-funcs.html).

I think that most of the time you have more than one row/value to import and export?

> The refactoring is now in a separate first single commit, which seems
> necessary, to separate the new functionality, from the refactoring.

I agree.

ProcessCopyOptions
    /* Extract options from the statement node tree */
    foreach(option, options)
    {
    }
    /* --- DELIMITER option --- */
    /* --- NULL option --- */
    /* --- QUOTE option --- */

Currently the regression tests pass, so I think that means your refactoring is fine.

In ProcessCopyOptions, maybe we can rearrange the code after the foreach loop (foreach(option, options)) to follow the parameter order in the Parameters section of https://www.postgresql.org/docs/devel/sql-copy.html, so that we can review the refactoring by comparing it with the descriptions in that section.

> > We already did column length checking at BeginCopyTo.
> > no need to "if (list_length(cstate->attnumlist) != 1)" error check in
> > CopyOneRowTo?
>
> Hmm, not sure really, since DoCopy() calls both BeginCopyTo()
> and DoCopyTo() which in turn calls CopyOneRowTo(),
> but CopyOneRowTo() is also being called from copy_dest_receive().
>

BeginCopyTo does the preparation work:

    cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);

After CopyGetAttnums, the number of attributes for COPY TO cannot change. Checking the length of cstate->attnumlist right after the CopyGetAttnums call seems fine to me. I think in CopyOneRowTo we can actually just Assert(list_length(cstate->attnumlist) == 1) for the raw format.

src10=# drop table if exists x; create table x(a int); COPY x from stdin (FORMAT raw);
DROP TABLE
CREATE TABLE
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 11
>> 12
>> \.
ERROR: invalid input syntax for type integer: "11
12
"
CONTEXT: COPY x, line 1, column a: "11
12
"

The above case means that COPY FROM STDIN (FORMAT RAW) can only import one single value (when successful). The user needs to specify something like:

COPY x from stdin (FORMAT raw, delimiter E'\n');

It seems that having no default delimiter for the raw format is not user friendly.
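
To make the difference concrete, here is a minimal standalone C sketch of how many values the input above would yield with and without a delimiter. This is not code from the patch: raw_count_values is a hypothetical helper, and treating a trailing delimiter as a terminator of the last row (rather than the start of an empty one) is my assumption about the intended behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): count how many values a
 * raw-format COPY FROM would produce for "data".
 *
 * Assumptions: with no delimiter the whole input is one datum; with a
 * delimiter, each delimiter ends a row, and a trailing delimiter does not
 * start an extra empty row.
 */
static size_t
raw_count_values(const char *data, size_t len,
                 const char *delim, size_t delim_len)
{
    size_t      pos = 0;
    size_t      count = 0;

    assert(data != NULL);

    /* No delimiter: the entire input is read as a single value. */
    if (delim == NULL || delim_len == 0)
        return 1;

    while (pos < len)
    {
        size_t      i = pos;
        int         found = 0;

        /* Scan forward for the next (possibly multi-byte) delimiter. */
        while (i + delim_len <= len)
        {
            if (memcmp(data + i, delim, delim_len) == 0)
            {
                found = 1;
                break;
            }
            i++;
        }

        count++;                /* one row ends here, or at end of input */
        if (!found)
            break;              /* last row had no terminating delimiter */
        pos = i + delim_len;    /* continue after the delimiter */
    }

    return count;
}
```

With the input "11\n12\n", this returns 1 when no delimiter is given (the whole string becomes one value, which is what the integer input function rejected above) and 2 when the delimiter is "\n".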