Re: Emitting JSON to file using COPY TO - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Emitting JSON to file using COPY TO |
Date | |
Msg-id | CAD21AoCb02zhZM3vXb8HSw8fwOsL+iRdEFb--Kmunv8PjPAWjw@mail.gmail.com Whole thread Raw |
In response to | Re: Emitting JSON to file using COPY TO (Joe Conway <mail@joeconway.com>) |
Responses |
Re: Emitting JSON to file using COPY TO
|
List | pgsql-hackers |
On Thu, Dec 7, 2023 at 10:10 AM Joe Conway <mail@joeconway.com> wrote: > > On 12/6/23 18:09, Joe Conway wrote: > > On 12/6/23 14:47, Joe Conway wrote: > >> On 12/6/23 13:59, Daniel Verite wrote: > >>> Andrew Dunstan wrote: > >>> > >>>> IMNSHO, we should produce either a single JSON > >>>> document (the ARRAY case) or a series of JSON documents, one per row > >>>> (the LINES case). > >>> > >>> "COPY Operations" in the doc says: > >>> > >>> " The backend sends a CopyOutResponse message to the frontend, followed > >>> by zero or more CopyData messages (always one per row), followed by > >>> CopyDone". > >>> > >>> In the ARRAY case, the first messages with the copyjsontest > >>> regression test look like this (tshark output): > >>> > >>> PostgreSQL > >>> Type: CopyOut response > >>> Length: 13 > >>> Format: Text (0) > >>> Columns: 3 > >>> Format: Text (0) > >>> PostgreSQL > >>> Type: Copy data > >>> Length: 6 > >>> Copy data: 5b0a > >>> PostgreSQL > >>> Type: Copy data > >>> Length: 76 > >>> Copy data: > >>> 207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031… > >>> > >>> The first Copy data message with contents "5b0a" does not qualify > >>> as a row of data with 3 columns as advertised in the CopyOut > >>> message. Isn't that a problem? > >> > >> > >> Is it a real problem, or just a bit of documentation change that I missed? > >> > >> Anything receiving this and looking for a json array should know how to > >> assemble the data correctly despite the extra CopyData messages. > > > > Hmm, maybe the real problem here is that Columns do not equal "3" for > > the json mode case -- that should really say "1" I think, because the > > row is not represented as 3 columns but rather 1 json object. > > > > Does that sound correct? > > > > Assuming yes, there is still maybe an issue that there are two more > > "rows" that actual output rows (the "[" and the "]"), but maybe those > > are less likely to cause some hazard? > > > The attached should fix the CopyOut response to say one column. I.e. it > ought to look something like: > > PostgreSQL > Type: CopyOut response > Length: 13 > Format: Text (0) > Columns: 1 > Format: Text (0) > PostgreSQL > Type: Copy data > Length: 6 > Copy data: 5b0a > PostgreSQL > Type: Copy data > Length: 76 > Copy data: [...] > If I'm not missing, copyto_json.007.diff is the latest patch but it needs to be rebased to the current HEAD. Here are random comments: --- if (opts_out->json_mode) + { + if (is_from) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot use JSON mode in COPY FROM"))); + } + else if (opts_out->force_array) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("COPY FORCE_ARRAY requires JSON mode"))); I think that flatting these two condition make the code more readable: if (opts_out->json_mode && is_from) ereport(ERROR, ...); if (!opts_out->json_mode && opts_out->force_array) ereport(ERROR, ...); Also these checks can be moved close to other checks at the end of ProcessCopyOptions(). --- @@ -3395,6 +3395,10 @@ copy_opt_item: { $$ = makeDefElem("format", (Node *) makeString("csv"), @1); } + | JSON + { + $$ = makeDefElem("format", (Node *) makeString("json"), @1); + } | HEADER_P { $$ = makeDefElem("header", (Node *) makeBoolean(true), @1); @@ -3427,6 +3431,10 @@ copy_opt_item: { $$ = makeDefElem("encoding", (Node *) makeString($2), @1); } + | FORCE ARRAY + { + $$ = makeDefElem("force_array", (Node *) makeBoolean(true), @1); + } ; I believe we don't need to support new options in old-style syntax. --- @@ -3469,6 +3477,10 @@ copy_generic_opt_elem: { $$ = makeDefElem($1, $2, @1); } + | FORMAT_LA copy_generic_opt_arg + { + $$ = makeDefElem("format", $2, @1); + } ; I think it's not necessary. "format" option is already handled in copy_generic_opt_elem. --- +/* need delimiter to start next json array element */ +static bool json_row_delim_needed = false; I think it's cleaner to include json_row_delim_needed into CopyToStateData. --- Splitting the patch into two patches: add json format and add force_array option would make reviews easy. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: