Re: Emitting JSON to file using COPY TO - Mailing list pgsql-hackers

From Joe Conway
Subject Re: Emitting JSON to file using COPY TO
Date
Msg-id 64c24dab-cf4b-4c33-8667-540136588e9e@joeconway.com
Whole thread Raw
In response to Re: Emitting JSON to file using COPY TO  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: Emitting JSON to file using COPY TO
List pgsql-hackers
On 12/6/23 18:28, David G. Johnston wrote:
> On Wed, Dec 6, 2023 at 4:09 PM Joe Conway <mail@joeconway.com 
> <mailto:mail@joeconway.com>> wrote:
> 
>     On 12/6/23 14:47, Joe Conway wrote:
>      > On 12/6/23 13:59, Daniel Verite wrote:
>      >>      Andrew Dunstan wrote:
>      >>
>      >>> IMNSHO, we should produce either a single JSON
>      >>> document (the ARRAY case) or a series of JSON documents, one
>     per row
>      >>> (the LINES case).
>      >>
>      >> "COPY Operations" in the doc says:
>      >>
>      >> " The backend sends a CopyOutResponse message to the frontend,
>     followed
>      >>     by zero or more CopyData messages (always one per row),
>     followed by
>      >>     CopyDone".
>      >>
>      >> In the ARRAY case, the first messages with the copyjsontest
>      >> regression test look like this (tshark output):
>      >>
>      >> PostgreSQL
>      >>      Type: CopyOut response
>      >>      Length: 13
>      >>      Format: Text (0)
>      >>      Columns: 3
>      >>      Format: Text (0)
> 
>      > Anything receiving this and looking for a json array should know
>     how to
>      > assemble the data correctly despite the extra CopyData messages.
> 
>     Hmm, maybe the real problem here is that Columns do not equal "3" for
>     the json mode case -- that should really say "1" I think, because the
>     row is not represented as 3 columns but rather 1 json object.
> 
>     Does that sound correct?
> 
>     Assuming yes, there is still maybe an issue that there are two more
>     "rows" that actual output rows (the "[" and the "]"), but maybe those
>     are less likely to cause some hazard?
> 
> 
> What is the limitation, if any, of introducing new type codes for 
> these.  n = 2..N for the different variants?  Or even -1 for "raw 
> text"?  And document that columns and structural rows need to be 
> determined out-of-band.  Continuing to use 1 (text) for this non-csv 
> data seems like a hack even if we can technically make it function.  The 
> semantics, especially for the array case, are completely discarded or wrong.

I am not following you here. SendCopyBegin looks like this currently:

8<--------------------------------
SendCopyBegin(CopyToState cstate)
{
    StringInfoData buf;
    int            natts = list_length(cstate->attnumlist);
    int16        format = (cstate->opts.binary ? 1 : 0);
    int            i;

    pq_beginmessage(&buf, PqMsg_CopyOutResponse);
    pq_sendbyte(&buf, format);    /* overall format */
    pq_sendint16(&buf, natts);
    for (i = 0; i < natts; i++)
        pq_sendint16(&buf, format); /* per-column formats */
    pq_endmessage(&buf);
    cstate->copy_dest = COPY_FRONTEND;
}
8<--------------------------------

The "1" is saying are we binary mode or not. JSON mode will never be 
sending in binary in the current implementation at least. And it always 
aggregates all the columns as one json object. So the correct answer is 
(I think):
8<--------------------------------
*************** SendCopyBegin(CopyToState cstate)
*** 146,154 ****

       pq_beginmessage(&buf, PqMsg_CopyOutResponse);
       pq_sendbyte(&buf, format);    /* overall format */
!     pq_sendint16(&buf, natts);
!     for (i = 0; i < natts; i++)
!         pq_sendint16(&buf, format); /* per-column formats */
       pq_endmessage(&buf);
       cstate->copy_dest = COPY_FRONTEND;
   }
--- 150,169 ----

       pq_beginmessage(&buf, PqMsg_CopyOutResponse);
       pq_sendbyte(&buf, format);    /* overall format */
!     if (!cstate->opts.json_mode)
!     {
!         pq_sendint16(&buf, natts);
!         for (i = 0; i < natts; i++)
!             pq_sendint16(&buf, format); /* per-column formats */
!     }
!     else
!     {
!         /*
!          * JSON mode is always one non-binary column
!          */
!         pq_sendint16(&buf, 1);
!         pq_sendint16(&buf, 0);
!     }
       pq_endmessage(&buf);
       cstate->copy_dest = COPY_FRONTEND;
   }
8<--------------------------------

That still leaves the need to fix the documentation:

" The backend sends a CopyOutResponse message to the frontend, followed
    by zero or more CopyData messages (always one per row), followed by
    CopyDone"

probably "always one per row" would be changed to note that json array 
format outputs two extra rows for the start/end bracket.

In fact, as written the patch does this:
8<--------------------------------
COPY (SELECT x.i, 'val' || x.i as v FROM
       generate_series(1, 3) x(i) WHERE false)
TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY);
[
]
8<--------------------------------

Not sure if that is a problem or not.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Emitting JSON to file using COPY TO
Next
From: Joe Conway
Date:
Subject: Re: Emitting JSON to file using COPY TO