Re: Emitting JSON to file using COPY TO - Mailing list pgsql-hackers

From Dave Cramer
Subject Re: Emitting JSON to file using COPY TO
Date
Msg-id CADK3HH+fALOQMba8Li3N9EpJ9PhW5e+Px=YWsMKuSfZaqOyHMg@mail.gmail.com
In response to Re: Emitting JSON to file using COPY TO  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Emitting JSON to file using COPY TO
List pgsql-hackers



On Thu, 7 Dec 2023 at 08:47, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thursday, December 7, 2023, Daniel Verite <daniel@manitou-mail.org> wrote:
        Joe Conway wrote:

> The attached should fix the CopyOut response to say one column. I.e. it
> ought to look something like:

Spending more time with the doc, I came to the opinion that in this bit
of the protocol, in CopyOutResponse (B)
...
Int16
The number of columns in the data to be copied (denoted N below).
...

this number must be the number of columns in the source.
That is, for COPY table(a,b,c) the number is 3, independently
of whether the result is formatted in text, csv, json or binary.
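For reference, the CopyOutResponse message quoted above consists of the type byte 'H', an Int32 length, an Int8 overall format (0 = textual, 1 = binary), the Int16 column count N, and N Int16 per-column format codes. Below is a minimal Python sketch that builds and parses such a message. The helper names build_copy_out_response and parse_copy_out_response are hypothetical and not part of any PostgreSQL client library.

```python
import struct

def build_copy_out_response(ncols: int, overall_format: int = 0) -> bytes:
    # Body: Int8 overall format, Int16 column count N, N x Int16 column formats.
    # For textual COPY every per-column format code must be 0.
    body = struct.pack("!bh", overall_format, ncols)
    body += struct.pack(f"!{ncols}h", *([overall_format] * ncols))
    # The Int32 length counts itself and the body, but not the 'H' type byte.
    return b"H" + struct.pack("!i", 4 + len(body)) + body

def parse_copy_out_response(msg: bytes) -> tuple[int, list[int]]:
    # Returns (overall format, per-column format codes).
    if msg[:1] != b"H":
        raise ValueError("not a CopyOutResponse message")
    (length,) = struct.unpack_from("!i", msg, 1)
    if len(msg) != 1 + length:
        raise ValueError("length field does not match message size")
    overall, ncols = struct.unpack_from("!bh", msg, 5)
    formats = list(struct.unpack_from(f"!{ncols}h", msg, 8))
    return overall, formats

# COPY table(a,b,c) TO STDOUT: the column count is 3 whatever the textual format.
print(parse_copy_out_response(build_copy_out_response(3)))  # (0, [0, 0, 0])
```

csv output is also reported with overall format 0, so the message by itself does not tell a client whether it is reading text or csv. Joe's patch would instead send N = 1 for the json case, i.e. what build_copy_out_response(1) produces here.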

I think that changing it for json can reasonably be interpreted
as a protocol break and we should not do it.

The fact that this value does not help parsing the CopyData
messages that come next is not a new issue. A reader that
doesn't know the field separator and whether it's text or csv
cannot parse these messages into fields anyway.
But just knowing how many columns there are in the original
data might be useful by itself, and we don't want to break that.
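To illustrate the point about separators, here is a minimal Python sketch (the helper names are hypothetical) that splits the same CopyData payload two ways. The text-format split is deliberately simplified: it ignores the backslash escapes that real text-format COPY output uses.

```python
import csv
import io

def fields_as_csv(payload: str) -> list[str]:
    # Parse one CopyData payload as a single CSV record (quoting honoured).
    return next(csv.reader(io.StringIO(payload)))

def fields_as_text(payload: str) -> list[str]:
    # Simplified text-format split: tab-separated, backslash escapes ignored.
    return payload.rstrip("\n").split("\t")

# One two-column row as csv output would carry it: the value x,y gets quoted.
payload = '1,"x,y"\n'
print(fields_as_csv(payload))   # ['1', 'x,y']
print(fields_as_text(payload))  # ['1,"x,y"']
```

Read as text, the csv row collapses into a single field, so knowing N = 2 in advance does not by itself let a reader recover the fields.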

This argument for leaving 3 as the column count makes sense to me.  I agree this content is not meant to facilitate interpreting the contents at a protocol level.

I disagree. From my POV, if the data comes back as a JSON array, this is one object and this should be reflected in the column count.
 


The other question for me is, in the CopyData message, this
bit:
" Messages sent from the backend will always correspond to single data rows"

ISTM that considering that the "[" starting the json array is a
"data row" is a stretch.
That might be interpreted as a protocol break, depending
on how strict the interpretation is.
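The framing question can be made concrete. The sketch below assumes a hypothetical framing, not necessarily what the posted patch does, in which the opening and closing brackets of the array each travel in a CopyData message of their own ('d', Int32 length, payload). copy_data and json_array_messages are hypothetical helper names.

```python
import json
import struct

def copy_data(payload: bytes) -> bytes:
    # CopyData ('d'): Int32 length (counting itself) followed by the raw bytes.
    return b"d" + struct.pack("!i", 4 + len(payload)) + payload

def json_array_messages(rows: list) -> list[bytes]:
    # HYPOTHETICAL framing: "[" and "]" get messages of their own, and every
    # row after the first is prefixed with a comma.
    msgs = [copy_data(b"[\n")]
    for i, row in enumerate(rows):
        sep = b"," if i else b" "
        msgs.append(copy_data(sep + json.dumps(row).encode() + b"\n"))
    msgs.append(copy_data(b"]\n"))
    return msgs

msgs = json_array_messages([{"a": 1}, {"a": 2}])
print([m[5:] for m in msgs])  # payloads only, 'd' and length stripped
```

With two rows this yields four CopyData messages, and the first and last carry no row data at all, which is the stretch described above. Concatenating the payloads does give a valid JSON array.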

Well, technically it is a single row if you send an array.

Regardless, I expect, per Euler's comment above, that the JSON lines format is going to be the preferred format, as the client doesn't have to wait for the entire object before starting to parse.
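A minimal sketch of the client-side difference, assuming one JSON object per CopyData message in the JSON lines case (rows_from_json_lines and rows_from_json_array are hypothetical names). With JSON lines each payload decodes on its own as it arrives; with a single array, json.loads cannot succeed until the closing bracket is in the buffer.

```python
import json
from typing import Iterable, Iterator

def rows_from_json_lines(payloads: Iterable[bytes]) -> Iterator[dict]:
    # Assumes each CopyData payload holds exactly one row as one JSON object,
    # so every row can be handed to the caller as soon as it arrives.
    for payload in payloads:
        yield json.loads(payload)

def rows_from_json_array(payloads: Iterable[bytes]) -> list:
    # With one JSON array, json.loads needs the complete text, so the whole
    # result has to be buffered before any row is available.
    return json.loads(b"".join(payloads))

lines = iter([b'{"a": 1}\n', b'{"a": 2}\n'])
first = next(rows_from_json_lines(lines))  # decoded before the rest arrives
```

A streaming JSON parser could consume an array incrementally, but that is more machinery on the client than a per-line json.loads.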

Dave
