Re: Emitting JSON to file using COPY TO - Mailing list pgsql-hackers

From Davin Shearer
Subject Re: Emitting JSON to file using COPY TO
Date
Msg-id CALvfUkByf6eou6J36-2Hw7Kdro80JD_Ch1AeUDAjsX8RpcA==A@mail.gmail.com
Whole thread Raw
In response to Re: Emitting JSON to file using COPY TO  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Emitting JSON to file using COPY TO
List pgsql-hackers
I'm really glad to see this taken up as a possible new feature and will definitely use it if it gets released.  I'm impressed with how clean, understandable, and approachable the postgres codebase is in general and how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have a concern with adding the '[' at the beginning and ']' at the end of the json output.  Those are already added by `json_agg` (https://www.postgresql.org/docs/current/functions-aggregate.html) as you can see in my initial email.  Adding them in the COPY TO may be redundant (e.g., [[{"key":"value"...}....]]).

I think COPY TO makes good sense to support, though COPY FROM maybe not so much as JSON isn't necessarily flat and rectangular like CSV.

For my use-case, I'm emitting JSON files to Apache NiFi for processing, and NiFi has superior handling of JSON (via JOLT parsers) versus CSV where parsing is generally done with regex.  I want to be able to emit JSON using a postgres function and thus COPY TO.

Definitely +1 for COPY TO.

I don't think COPY FROM will work out well unless the JSON is required to be flat and rectangular.  I would vote -1 to leave it out due to the necessary restrictions making it not generally useful.

Hope it helps,
Davin

On Fri, Dec 1, 2023 at 6:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
> I did a quick PoC patch (attached) -- if there interest and no hard
> objections I would like to get it up to speed for the January commitfest.

Cool.  I would expect there to be interest, given all the other JSON
support that has been added thus far.

I noticed that, with the PoC patch, "json" is the only format that must be
quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

> 1. Is supporting JSON array format sufficient, or does it need to support
> some other options? How flexible does the support scheme need to be?

I don't presently have a strong opinion on this one.  My instinct would be
start with something simple, though.  I don't think we offer any special
options for log_destination...

> 2. This only supports COPY TO and we would undoubtedly want to support COPY
> FROM for JSON as well, but is that required from the start?

I would vote for including COPY FROM support from the start.

> !     if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in
CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
worth refactoring a bunch of stuff to make this nicer.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Refactoring backend fork+exec code
Next
From: Lev Kokotov
Date:
Subject: Re: Bug in pgbench prepared statements