Thread: psql JSON output format

psql JSON output format

From
Christoph Berg
Date:
I noticed psql was lacking JSON formatting of query results which I
need for a follow-up patch. It also seems useful generally, so here's
a patch:

postgres=# \pset format json
Output format is json.
postgres=# select * from (values ('one', 2, 'three'), ('four', 5, 'six')) as sub(a, b, c);
[
{ "a": "one", "b": "2", "c": "three" },
{ "a": "four", "b": "5", "c": "six" }
]
postgres=# \x
Expanded display is on.
postgres=# select * from (values ('one', 2, 'three'), ('four', 5, 'six')) as sub(a, b, c);
[{
  "a": "one",
  "b": "2",
  "c": "three"
},{
  "a": "four",
  "b": "5",
  "c": "six"
}]
postgres=#

Both normal and expanded output format are optimized for readability
while still saving screen space.

Both formats output the same JSON structure, an array of objects.
Other variants like array-of-arrays or line-separated objects
("jsonline") might be possible, but I didn't want to overengineer it.

On the command line, the format is selected by `psql --json` and `psql -J`.
(I'm not attached to the short option, but -J was free and it's in
line with `psql -H` to select HTML.)

Christoph

Attachment

Re: psql JSON output format

From
Christoph Berg
Date:
Re: To PostgreSQL Hackers
> On the command line, the format is selected by `psql --json` and `psql -J`.

Among other uses, it enables easy post-processing of psql output using `jq`:

$ psql -lJ | jq
[
  {
    "Name": "myon",
    "Owner": "myon",
    "Encoding": "UTF8",
    "Locale Provider": "libc",
    "Collate": "de_DE.utf8",
    "Ctype": "de_DE.utf8",
    "ICU Locale": null,
    "ICU Rules": null,
    "Access privileges": null
  },
...
]

Christoph



Re: psql JSON output format

From
Jelte Fennema-Nio
Date:
On Mon, 18 Dec 2023 at 15:56, Christoph Berg <myon@debian.org> wrote:
> I noticed psql was lacking JSON formatting of query results which I
> need for a follow-up patch.

This seems useful to me too, but my usecases would also be solved (and
possibly better solved) by adding JSON support to COPY as proposed
here:
https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com

I'm wondering if your follow-up patch would be better served by that too or not.



Re: psql JSON output format

From
Christoph Berg
Date:
Re: Jelte Fennema-Nio
> This seems useful to me too, but my usecases would also be solved (and
> possibly better solved) by adding JSON support to COPY as proposed
> here:
https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com

Thanks for the pointer, I had not scrolled back enough to see that
thread.

I'm happy to see that this patch is also settling on "array of
objects".

> I'm wondering if your follow-up patch would be better served by that too or not.

I'd need it to work on query results. Which could of course be wrapped
into "copy (select whatever) to stdout (format json)", but doing it in
psql without mangling the query is cleaner. And (see the other mail),
the psql format selection works nicely with existing queries like
`psql -l`.

And "copy format json" wouldn't support \x expanded mode.

We'd want both patches even if they do the same thing on two different
levels, I'd say.

Christoph



Re: psql JSON output format

From
Jelte Fennema-Nio
Date:
On Mon, 18 Dec 2023 at 16:38, Christoph Berg <myon@debian.org> wrote:
> We'd want both patches even if they do the same thing on two different
> levels, I'd say.

Makes sense. One thing I was still wondering is if it wouldn't be
easier to wrap all queries in "copy (select whatever) to stdout
(format json)" automatically when the -J flag is passed psql. Because
it would be nice not to have to implement this very similar logic in
two places.

But I guess that approach does not work for commands that don't work
inside COPY, i.e. DML and DDL. I'm assuming your current patch works
fine with DML/DDL. If that's indeed the case then I agree it makes
sense to have this patch. And another big benefit is that it wouldn't
require a new Postgres server function for the json functionality of
psql.



Re: psql JSON output format

From
Dean Rasheed
Date:
On Mon, 18 Dec 2023 at 16:34, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>
> On Mon, 18 Dec 2023 at 16:38, Christoph Berg <myon@debian.org> wrote:
> > We'd want both patches even if they do the same thing on two different
> > levels, I'd say.
>
> Makes sense.
>

I can see the appeal in this feature. However, as it stands, this
isn't compatible with copy format json, and I think it would need to
duplicate quite a lot of the JSON output code in client-side code to
make it compatible.

Consider, for example:

CREATE TABLE foo(col json);
INSERT INTO foo VALUES ('"str_value"');

copy foo to stdout with (format json) produces this:

{"col":"str_value"}

which is as expected. However, psql -Jc "select * from foo" produces

[
{ "col": "\"str_value\"" }
]

The problem is, various datatypes such as boolean, number types, json,
and jsonb must not be quoted and escaped, since that would change them
to strings or double-encode them in the result. And then there are
domain types built on top of those types, and arrays, etc. See, for
example, the logic in json_categorize_type(). I think that trying to
duplicate that client-side is doomed to failure.

Regards,
Dean



Re: psql JSON output format

From
Laurenz Albe
Date:
On Mon, 2024-01-08 at 18:43 +0000, Dean Rasheed wrote:
> I can see the appeal in this feature. However, as it stands, this
> isn't compatible with copy format json, and I think it would need to
> duplicate quite a lot of the JSON output code in client-side code to
> make it compatible.
>
> Consider, for example:
>
> CREATE TABLE foo(col json);
> INSERT INTO foo VALUES ('"str_value"');
>
> copy foo to stdout with (format json) produces this:
>
> {"col":"str_value"}
>
> which is as expected. However, psql -Jc "select * from foo" produces
>
> [
> { "col": "\"str_value\"" }
> ]
>
> The problem is, various datatypes such as boolean, number types, json,
> and jsonb must not be quoted and escaped, since that would change them
> to strings or double-encode them in the result.

I agree that such data types should not be double quoted.

> And then there are
> domain types built on top of those types, and arrays, etc. See, for
> example, the logic in json_categorize_type(). I think that trying to
> duplicate that client-side is doomed to failure.

Perhaps.  But maybe "printTableContent" could be extended to contain
a boolean array "quote_for_json" that is set in "printTableAddHeader"
based on the underlying data type, similar to how "aligns" is set now.
Detecting array types might be a challenge.

Domains might not be a problem, since "PQftype()" seems to return the
base data type for domain values.

Yours,
Laurenz Albe



Re: psql JSON output format

From
Christoph Berg
Date:
Re: Dean Rasheed
> I can see the appeal in this feature. However, as it stands, this
> isn't compatible with copy format json, and I think it would need to
> duplicate quite a lot of the JSON output code in client-side code to
> make it compatible.

I can see we probably wouldn't want two different output formats named
json, but the general idea of "allow psql to format results as json of
strings" makes a lot of sense, so we should try to make it work. Does
it even have to be compatible?

If the code required is the same, it could be moved to libpgcommon.

> The problem is, various datatypes such as boolean, number types, json,
> and jsonb must not be quoted and escaped, since that would change them
> to strings or double-encode them in the result. And then there are
> domain types built on top of those types, and arrays, etc. See, for
> example, the logic in json_categorize_type(). I think that trying to
> duplicate that client-side is doomed to failure.

Can we try to make it work first, before we declare the perfect the
enemy of the good?

I'll note that the current code uses PG's string representation of
strings which is meant to be round-trip safe when fed back into the
server. So quoted numeric values aren't a problem at all. (And that
part is fixable.)

The real problem here is that COPY json violates that by pulling json
values up one syntax level. "Normal" cases will be fixable by just
looking for json(b) and printing that unquoted. And composite types
with jsonb members... are these really only half-quoted?!


Re: Laurenz Albe
> > The problem is, various datatypes such as boolean, number types, json,
> > and jsonb must not be quoted and escaped, since that would change them
> > to strings or double-encode them in the result.
> 
> I agree that such data types should not be double quoted.

I left that out so far because it didn't make a practical difference,
but that's fixable.

> > And then there are
> > domain types built on top of those types, and arrays, etc. See, for
> > example, the logic in json_categorize_type(). I think that trying to
> > duplicate that client-side is doomed to failure.
> 
> Perhaps.  But maybe "printTableContent" could be extended to contain
> a boolean array "quote_for_json" that is set in "printTableAddHeader"
> based on the underlying data type, similar to how "aligns" is set now.
> Detecting array types might be a challenge.
> 
> Domains might not be a problem, since "PQftype()" seems to return the
> base data type for domain values.

Thanks, I'll give that a try.

Christoph



Re: psql JSON output format

From
Dean Rasheed
Date:
On Tue, 9 Jan 2024 at 09:43, Christoph Berg <myon@debian.org> wrote:
>
> I can see we probably wouldn't want two different output formats named
> json, but the general idea of "allow psql to format results as json of
> strings" makes a lot of sense, so we should try to make it work. Does
> it even have to be compatible?
>

I would say that they should be compatible, in the sense that an
external tool parsing the outputs should regard them as equal, but
maybe there are different expectations for the two features.

> I'll note that the current code uses PG's string representation of
> strings which is meant to be round-trip safe when fed back into the
> server. So quoted numeric values aren't a problem at all. (And that
> part is fixable.)
>

I'm not sure that being round-trip safe is a necessary goal here, but
again, it's about the expectations for the feature. I was imagining
that the goal was to produce something that an external tool would
parse, rather than something Postgres would read back in. So not
quoting numeric values seems desirable to produce output that better
reflects the semantic content of the data (though it doesn't affect it
being round-trip safe).

> The real problem here is that COPY json violates that by pulling json
> values up one syntax level. "Normal" cases will be fixable by just
> looking for json(b) and printing that unquoted. And composite types
> with jsonb members... are these really only half-quoted?!
>

What to do with composites is an interesting point in question. COPY
format json will turn a composite into a JSON object whose keys are
the field names. That's useful if you want to use an external tool to
parse the result and get at the individual fields, but it's not
round-trip safe. OTOH, this patch outputs the Postgres string
representation of the object, which might be round-trip safe, but is
not very convenient for any other tool to read.

Regards,
Dean



Re: psql JSON output format

From
Christoph Berg
Date:
Re: Dean Rasheed
> > I'll note that the current code uses PG's string representation of
> > strings which is meant to be round-trip safe when fed back into the
> > server. So quoted numeric values aren't a problem at all. (And that
> > part is fixable.)
> 
> I'm not sure that being round-trip safe is a necessary goal here, but
> again, it's about the expectations for the feature. I was imagining
> that the goal was to produce something that an external tool would
> parse, rather than something Postgres would read back in. So not
> quoting numeric values seems desirable to produce output that better
> reflects the semantic content of the data (though it doesn't affect it
> being round-trip safe).

Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?

> OTOH, this patch outputs the Postgres string representation of the
> object, which might be round-trip safe, but is not very convenient
> for any other tool to read.

For my use case, I need something that can be fed back into PG.
Reassembling all the json parts back into proper values would be a
pretty hard problem.

Perhaps there should be two output formats, one that's roundtrip-safe,
and one that represents json structures and composite values nicely.
Adding format-specific options could also be used to switch the output
between "array of json objects" and "one json object per line".

Christoph

Attachment

Re: psql JSON output format

From
Dean Rasheed
Date:
[cc'ing Joe]

On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:
>
> Getting it print numeric/boolean without quotes was actually easy, as
> well as json(b). Implemented as the attached v2 patch.
>
> But: not quoting json means that NULL and 'null'::json will both be
> rendered as 'null'. That strikes me as a pretty undesirable conflict.
> Does the COPY patch also do that?
>

Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.

> > OTOH, this patch outputs the Postgres string representation of the
> > object, which might be round-trip safe, but is not very convenient
> > for any other tool to read.
>
> For my use case, I need something that can be fed back into PG.
> Reassembling all the json parts back into proper values would be a
> pretty hard problem.
>

What is your use case? It seems like what you want is quite different
from the COPY patch.

Regards,
Dean



Re: psql JSON output format

From
Laurenz Albe
Date:
On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:
> On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:
> >
> > Getting it print numeric/boolean without quotes was actually easy, as
> > well as json(b). Implemented as the attached v2 patch.
> >
> > But: not quoting json means that NULL and 'null'::json will both be
> > rendered as 'null'. That strikes me as a pretty undesirable conflict.
> > Does the COPY patch also do that?
>
> Yes. Perhaps what needs to happen is for a NULL column to be omitted
> entirely from the output. I think the COPY TO json patch would have to
> do that if COPY FROM json were to be added later, to make it
> round-trip safe.

I think the behavior is fine as it is.  I'd expect both NULL and JSON "null"
to be rendered as "null".  I think the main use case for a feature like this
is people who need the result in JSON for further processing somewhere else.

"Round-trip safety" is not so important.  If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.

Yours,
Laurenz Albe



Re: psql JSON output format

From
Andrew Dunstan
Date:


On 2024-01-16 Tu 11:07, Laurenz Albe wrote:
On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:
On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:
Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?
Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.
I think the behavior is fine as it is.  I'd expect both NULL and JSON "null"
to be rendered as "null".  I think the main use case for a feature like this
is people who need the result in JSON for further processing somewhere else.

"Round-trip safety" is not so important.  If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.


This is absolutely not true. The docs say about CSV format:

A NULL is output as the NULL parameter string and is not quoted, while a non-NULL value matching the NULL parameter string is quoted. For example, with the default settings, a NULL is written as an unquoted empty string, while an empty string data value is written with double quotes ("").

CSV format with default settings is and has been from the beginning designed to be round trippable.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: psql JSON output format

From
Robert Haas
Date:
On Tue, Jan 16, 2024 at 11:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> "Round-trip safety" is not so important.  If you want to move data from
> PostgreSQL to PostgreSQL, you use the plain or the binary format.
> The CSV format by default renders NULL and empty strings identical, and
> I don't think anybody objects to that.

As Andrew says, the part about the CSV format is not correct, but I
also don't think I agree with the larger point, either. I believe that
round-trip safety is a really desirable property. Is it absolutely
necessary in every case? Maybe not. But, it shouldn't be lacking
without a good reason, either, at least IMHO. If you postulate that
people are moving data from A to B, it is reasonable to think that
eventually someone is going to want to move some data from B back to
A. If that turns out to be hard, they'll be sad. We shouldn't make
people sad without a good reason.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: psql JSON output format

From
Laurenz Albe
Date:
On Tue, 2024-01-16 at 11:49 -0500, Andrew Dunstan wrote:
> On 2024-01-16 Tu 11:07, Laurenz Albe wrote:
> > On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:
> > > On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:
> > > > Getting it print numeric/boolean without quotes was actually easy, as
> > > > well as json(b). Implemented as the attached v2 patch.
> > > >
> > > > But: not quoting json means that NULL and 'null'::json will both be
> > > > rendered as 'null'. That strikes me as a pretty undesirable conflict.
> > > > Does the COPY patch also do that?
> > >
> > > Yes. Perhaps what needs to happen is for a NULL column to be omitted
> > > entirely from the output. I think the COPY TO json patch would have to
> > > do that if COPY FROM json were to be added later, to make it
> > > round-trip safe.
> >
> > I think the behavior is fine as it is.  I'd expect both NULL and JSON "null"
> > to be rendered as "null".  I think the main use case for a feature like this
> > is people who need the result in JSON for further processing somewhere else.
> >
> > "Round-trip safety" is not so important.  If you want to move data from
> > PostgreSQL to PostgreSQL, you use the plain or the binary format.
> > The CSV format by default renders NULL and empty strings identical, and
> > I don't think anybody objects to that.
>
> This is absolutely not true.
>
> CSV format with default settings is and has been from the beginning designed
> to be round trippable.

Sorry for being unclear.  I wasn't talking about COPY, but about the psql
output format:

CREATE TABLE xy (a integer, b text);

INSERT INTO xy VALUES (1, 'one'), (2, NULL), (3, '');

\pset format csv
Output format is csv.

TABLE xy;
a,b
1,one
2,
3,

Yours,
Laurenz Albe



Re: psql JSON output format

From
Laurenz Albe
Date:
On Tue, 2024-01-16 at 14:12 -0500, Robert Haas wrote:
> On Tue, Jan 16, 2024 at 11:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > "Round-trip safety" is not so important.  If you want to move data from
> > PostgreSQL to PostgreSQL, you use the plain or the binary format.
> > The CSV format by default renders NULL and empty strings identical, and
> > I don't think anybody objects to that.
>
> As Andrew says, the part about the CSV format is not correct, but I
> also don't think I agree with the larger point, either. I believe that
> round-trip safety is a really desirable property. Is it absolutely
> necessary in every case? Maybe not. But, it shouldn't be lacking
> without a good reason, either, at least IMHO. If you postulate that
> people are moving data from A to B, it is reasonable to think that
> eventually someone is going to want to move some data from B back to
> A. If that turns out to be hard, they'll be sad. We shouldn't make
> people sad without a good reason.

As mentioned in my other mail, I was talking about the psql output
format "csv" rather than about COPY.

I agree that it is desirable to lose as little information as possible.
But if we want to format query output as JSON, we have a couple of
requirements that cannot all be satisfied:

1. lose no information ("round-trip safe")

2. don't double quote numbers, booleans and other JSON values

3. don't skip any table column in the output

Christoph's original patch didn't satisfy #2, and his current version
doesn't satisfy #1.  Do you think that skipping NULL columns would be
the best solution?  We don't do that in the to_json() function, which
also renders SQL NULL as JSON null.

I think the argument for round-trip safety of psql output is tenuous.
There is no way for psql to ingest JSON as input format, and the patch
to add JSON as COPY format only supports COPY TO.  And unless you can
name the exact way that the data written by psql will be loaded into
PostgreSQL again, all that remains is an (understandable) unease about
losing the distiction between SQL NULL and JSON null.

We have jsonb_populate_record() to convert JSON back to a table row,
but that function will convert both missing columns and a JSON null
to SQL NULL:

CREATE TABLE xy (id integer, j jsonb);

\pset null '∅'

SELECT * FROM jsonb_populate_record(NULL::xy, '{"id":1,"j":null}');

 id │ j
════╪═══
  1 │ ∅
(1 row)

SELECT * FROM jsonb_populate_record(NULL::xy, '{"id":1}');

 id │ j
════╪═══
  1 │ ∅
(1 row)

Indeed, there doesn't seem to be a way to generate JSON null with that
function.

So I wouldn't worry about round-trip safety too much, and my preference
is how the current patch does it.  I am not dead set against a solution
that omits NULL columns in the output, though.

Yours,
Laurenz Albe



Re: psql JSON output format

From
Andrew Dunstan
Date:
On 2024-01-17 We 03:52, Laurenz Albe wrote:
> On Tue, 2024-01-16 at 11:49 -0500, Andrew Dunstan wrote:
>> On 2024-01-16 Tu 11:07, Laurenz Albe wrote:
>>> On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:
>>>> On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:
>>>>> Getting it print numeric/boolean without quotes was actually easy, as
>>>>> well as json(b). Implemented as the attached v2 patch.
>>>>>
>>>>> But: not quoting json means that NULL and 'null'::json will both be
>>>>> rendered as 'null'. That strikes me as a pretty undesirable conflict.
>>>>> Does the COPY patch also do that?
>>>> Yes. Perhaps what needs to happen is for a NULL column to be omitted
>>>> entirely from the output. I think the COPY TO json patch would have to
>>>> do that if COPY FROM json were to be added later, to make it
>>>> round-trip safe.
>>> I think the behavior is fine as it is.  I'd expect both NULL and JSON "null"
>>> to be rendered as "null".  I think the main use case for a feature like this
>>> is people who need the result in JSON for further processing somewhere else.
>>>
>>> "Round-trip safety" is not so important.  If you want to move data from
>>> PostgreSQL to PostgreSQL, you use the plain or the binary format.
>>> The CSV format by default renders NULL and empty strings identical, and
>>> I don't think anybody objects to that.
>> This is absolutely not true.
>>
>> CSV format with default settings is and has been from the beginning designed
>> to be round trippable.
> Sorry for being unclear.  I wasn't talking about COPY, but about the psql
> output format:
>
> CREATE TABLE xy (a integer, b text);
>
> INSERT INTO xy VALUES (1, 'one'), (2, NULL), (3, '');
>
> \pset format csv
> Output format is csv.
>
> TABLE xy;
> a,b
> 1,one
> 2,
> 3,
>

I think the reason nobody's complained about it is quite possibly that 
very few people have used it. That's certainly the case with me - if I'd 
noticed it I would have complained.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: psql JSON output format

From
Robert Haas
Date:
On Wed, Jan 17, 2024 at 4:30 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> As mentioned in my other mail, I was talking about the psql output
> format "csv" rather than about COPY.

Oh. Well, I think it's sad that the psql format csv has that property.
Why doesn't it adopt COPY's handling?

> I agree that it is desirable to lose as little information as possible.
> But if we want to format query output as JSON, we have a couple of
> requirements that cannot all be satisfied:
>
> 1. lose no information ("round-trip safe")
>
> 2. don't double quote numbers, booleans and other JSON values
>
> 3. don't skip any table column in the output
>
> Christoph's original patch didn't satisfy #2, and his current version
> doesn't satisfy #1.  Do you think that skipping NULL columns would be
> the best solution?  We don't do that in the to_json() function, which
> also renders SQL NULL as JSON null.

Let me start by clarifying that I'm OK with sacrificing
round-trippability here as long as we do it thoughtfully.
"Round-trippability is important but X is more important and we cannot
have both for Y reasons" seems like a potentially fine argument to me;
I'm only objecting to an argument of the form "round-trippability
doesn't even matter." My previous comment was a bit of a drive-by
remark on that specifically rather than a strong opinion about what
exactly we ought to do here.

I guess the specifically issue here is around a json(b) column that is
null at the SQL level vs one that contains a JSON null. How do we
distinguish those cases? I think entirely omitting null columns could
be a way forward, but I don't know if that would cause other problems
for users.

I'm not quite sure that addresses all the issues, though. For
instance, consider that 1.00::numeric and 1.0::numeric are equal but
distinguishable. If those get rendered into the JSON unquoted as 1.00
and 1.0, respectively, is that going to round-trip properly? What
about float8 values where extra_float_digits=3 is needed to properly
round trip? If we take PostgreSQL's array data types and turn them
into JSON arrays, what happens with non-default bounds? I know how
we're going to turn '{1,2}'::int[] into a JSON array, or at least I
assume I do, but what in the world are we going to do about
'[-3:-2]={1,2}'?

As much as I think round-trippability is good, getting it to 100% here
is probably a good bit of work. And maybe that work isn't worth doing
or involves too much collateral damage. But I do think it has positive
value. If we produce output that could be ingested back into PG later
with the right tool, that leaves the door open for someone to build
the tool later even if we don't have it today. If we produce output
that loses information, no tool built later can make up for the loss.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: psql JSON output format

From
Laurenz Albe
Date:
On Wed, 2024-01-17 at 14:52 -0500, Robert Haas wrote:
> Let me start by clarifying that I'm OK with sacrificing
> round-trippability here as long as we do it thoughtfully.

Got you.


> I'm not quite sure that addresses all the issues, though. For
> instance, consider that 1.00::numeric and 1.0::numeric are equal but
> distinguishable. If those get rendered into the JSON unquoted as 1.00
> and 1.0, respectively, is that going to round-trip properly? What
> about float8 values where extra_float_digits=3 is needed to properly
> round trip? If we take PostgreSQL's array data types and turn them
> into JSON arrays, what happens with non-default bounds? I know how
> we're going to turn '{1,2}'::int[] into a JSON array, or at least I
> assume I do, but what in the world are we going to do about
> '[-3:-2]={1,2}'?
>
> As much as I think round-trippability is good, getting it to 100% here
> is probably a good bit of work.

I would go as far as saying that the attempt to preserve all that is
futile, if you are bound to JSON as format.

> But I do think it has positive
> value. If we produce output that could be ingested back into PG later
> with the right tool, that leaves the door open for someone to build
> the tool later even if we don't have it today. If we produce output
> that loses information, no tool built later can make up for the loss.

I am all for losing as little information as possible, but I think
that this discussion is going off on a tangent.  After all, we are not
talking about a data export tool here, we are talking about psql.
I don't see anybody complain that float8 values lose precision in
the default aligned format, or that empty strings and NULL values
look the same in aligned format.  Why do the goalposts move for the
JSON output format?  I don't think psql output should be considered
a form of backup.

I'd say that we should strive to preserve whatever information we
easily can, and we shouldn't worry about the rest.

Can we get consensus that SQL NULL columns should be omitted from the
output, and the rest left as it currently is?

Yours,
Laurenz Albe



Re: psql JSON output format

From
Christoph Berg
Date:
Re: Laurenz Albe
> > But I do think it has positive
> > value. If we produce output that could be ingested back into PG later
> > with the right tool, that leaves the door open for someone to build
> > the tool later even if we don't have it today. If we produce output
> > that loses information, no tool built later can make up for the loss.

> I am all for losing as little information as possible, but I think
> that this discussion is going off on a tangent.  After all, we are not
> talking about a data export tool here, we are talking about psql.

I've just posted the other patch where I need the JSON format:
https://www.postgresql.org/message-id/flat/Za6EfXeewwLRS_fs%40msg.df7cb.de

There, I need to be able to read back the query output into psql,
while at the same time being human-readable so the user can sanely
edit the data in an editor. The default "aligned" format is only
human-readable, while CSV is mostly only machine-readable. JSON is the
best option between the two, I think.

What I did now in v3 of this patch is to print boolean and numeric
values (ints, floats, numeric) without quotes, while adding the quotes
back to json. This solves the NULL vs 'null'::json problem.

> I don't see anybody complain that float8 values lose precision in
> the default aligned format, or that empty strings and NULL values
> look the same in aligned format.  Why do the goalposts move for the
> JSON output format?  I don't think psql output should be considered
> a form of backup.

Fwiw, not quoting numbers in JSON won't have any of these problems if
the JSON reader just passes the strings read through. (Which PG's JSON
parser does.)

> Can we get consensus that SQL NULL columns should be omitted from the
> output, and the rest left as it currently is?

I think that would be an interesting option for a JSON export format.
The psql JSON format is more for human inspection, where omitting the
columns might create confusion. (We could make it a pset parameter of
the format, but I think the default should be to show NULL columns.)

Christoph

Attachment

Re: psql JSON output format

From
Laurenz Albe
Date:
On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote:
> What I did now in v3 of this patch is to print boolean and numeric
> values (ints, floats, numeric) without quotes, while adding the quotes
> back to json. This solves the NULL vs 'null'::json problem.

The patch is working as advertised.

I am kind of unhappy about this change.  It seems awkward and undesirable
so have JSON values decorated with weird quoting in JSON output.
I understand the motivation, but I bet it's not what will make users
happy.

If you need to disambiguate between SQL NULL and JSON null, my
preferred solution would be to omit SQL NULL columns from the output
altogether.

Yours,
Laurenz Albe



Re: psql JSON output format

From
Stefan Keller
Date:
Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
<laurenz.albe@cybertec.at>:
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.

I fully support Laurenz's proposal and argumentation. The main use
case for such a JSON output feature is further processing somewhere
else.

--Stefan

Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
<laurenz.albe@cybertec.at>:
>
> On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote:
> > What I did now in v3 of this patch is to print boolean and numeric
> > values (ints, floats, numeric) without quotes, while adding the quotes
> > back to json. This solves the NULL vs 'null'::json problem.
>
> The patch is working as advertised.
>
> I am kind of unhappy about this change.  It seems awkward and undesirable
> so have JSON values decorated with weird quoting in JSON output.
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.
>
> Yours,
> Laurenz Albe
>
>



Re: psql JSON output format

From
"David G. Johnston"
Date:
On Tue, Jan 23, 2024 at 7:35 AM Stefan Keller <sfkeller@gmail.com> wrote:
Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
<laurenz.albe@cybertec.at>:
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.

I fully support Laurenz's proposal and argumentation. The main use
case for such a JSON output feature is further processing somewhere
else.

--Stefan

Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
<laurenz.albe@cybertec.at>:
>
> On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote:
> > What I did now in v3 of this patch is to print boolean and numeric
> > values (ints, floats, numeric) without quotes, while adding the quotes
> > back to json. This solves the NULL vs 'null'::json problem.
>
> The patch is working as advertised.
>
> I am kind of unhappy about this change.  It seems awkward and undesirable
> so have JSON values decorated with weird quoting in JSON output.
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.
>

I agree on distinguishing SQL via omission but I do think, almost regardless, that the output should include a metadata section that lists all of the actual columns in the result, the column position, and since we have the info available, the data type name and possibly OID.  Then any column name present in the metadata but that isn't a key name for a given object is known to have an SQL NULL as the value of that column in that row.

David J.

Re: psql JSON output format

From
Christoph Berg
Date:
Re: Laurenz Albe
> I am kind of unhappy about this change.  It seems awkward and undesirable
> so have JSON values decorated with weird quoting in JSON output.
> I understand the motivation, but I bet it's not what will make users
> happy.

Well, why stop at JSON, and not represent any array type as a JSON
array? Compound types could be transformed into JSON objects. Custom
data types could add hooks for their own custom JSON representation.

I'd just stop the flood right before it starts...

The real reason I'd not want to go that route is because I need the
format to be round-trip safe for the "\gedit" patch, see the other
thread. We would have to transform JSON sub-parts back into PG's text
format. But there were already complaints that other patch is complex.
At the moment it's just strcmp-ing the text value before and after,
which is straighforward.

> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.

That works, but only by convention only.


Re: David G. Johnston
> I agree on distinguishing SQL via omission but I do think, almost
> regardless, that the output should include a metadata section that lists
> all of the actual columns in the result, the column position, and since we
> have the info available, the data type name and possibly OID.  Then any
> column name present in the metadata but that isn't a key name for a given
> object is known to have an SQL NULL as the value of that column in that row.

There are no comments in JSON.

Adding the info in-band would break the simple use case of using the
data as-is for further processing.

Christoph



Re: psql JSON output format

From
Laurenz Albe
Date:
On Tue, 2024-01-23 at 08:01 -0700, David G. Johnston wrote:
> I do think that the output should include a metadata section that lists all of
> the actual columns in the result, the column position, and since we have the
> info available, the data type name and possibly OID.  Then any column name
> present in the metadata but that isn't a key name for a given object is known
> to have an SQL NULL as the value of that column in that row.

Sounds attractive, but I'm a bit worried that that additional information
might make life harder for some consumers.

My crystal ball is as cloudy as anybody's when it comes to guessing the
most likely use cases for the feature, but I'd rather keep it simple and add
features like that later, if there is a demand.

If you import the data into an existing structure, you don't need the metadata.

Yours,
Laurenz Albe



Re: psql JSON output format

From
Christoph Berg
Date:
Re: Laurenz Albe
> If you import the data into an existing structure, you don't need the metadata.

Also, since you were running the query yourself, you should know what
columns you were expecting.

Christoph



Re: psql JSON output format

From
Laurenz Albe
Date:
On Tue, 2024-01-23 at 16:36 +0100, Christoph Berg wrote:
> Re: Laurenz Albe
> > I am kind of unhappy about this change.  It seems awkward and undesirable
> > so have JSON values decorated with weird quoting in JSON output.
> > I understand the motivation, but I bet it's not what will make users
> > happy.
>
> Well, why stop at JSON, and not represent any array type as a JSON
> array? Compound types could be transformed into JSON objects. Custom
> data types could add hooks for their own custom JSON representation.
>
> I'd just stop the flood right before it starts...

I'd stop the flood right after json/jsonb.

Arrays as database columns are probably too rare to be a real issue.

> The real reason I'd not want to go that route is because I need the
> format to be round-trip safe for the "\gedit" patch, see the other
> thread. We would have to transform JSON sub-parts back into PG's text
> format. But there were already complaints that other patch is complex.
> At the moment it's just strcmp-ing the text value before and after,
> which is straighforward.

I don't particularly care about \gedit, any I think \gedit shouldn't
be the raison d'être for this patch.

I cannot imagine that anybody who wants to move data from PostgreSQL
to PostgreSQL will use \gedit, load the output from psql into the
editor and save.  After all, there is COPY and pg_dump.

I'm pretty certain that people are more likely to use psql's JSON
output format to move data somewhere else.

Yours,
Laurenz Albe



Re: psql JSON output format

From
Christoph Berg
Date:
Re: Laurenz Albe
> > I'd just stop the flood right before it starts...
> 
> I'd stop the flood right after json/jsonb.

Nod. I do see a point here, given it's "json in json", and not
"something else in json". Will try to make it work with \gedit.

> Arrays as database columns are probably too rare to be a real issue.

Ack.

> I'm pretty certain that people are more likely to use psql's JSON
> output format to move data somewhere else.

Well, there's also the other patch to add JSON support to COPY, that
would be even more suitable.

Christoph



Re: psql JSON output format

From
Robert Haas
Date:
On Tue, Jan 23, 2024 at 11:35 AM Christoph Berg <myon@debian.org> wrote:
> Ack.

The last version of this patch was posted on January 22nd and got a
bunch of replies, so I'm marking
https://commitfest.postgresql.org/48/4707/ as Returned with Feedback
for now. Please feel free to update the status of the patch when the
situation changes.

IMHO, the big problem here is that different people want different
corner-case behaviors and it's not clear what to do about that. I
don't think there's a single vote for "don't do this at all". So if
there is a desire to take this work forward, the goal probably ought
to be to try to either (a) figure out one behavior that everyone can
live with or (b) figure out a short list of options that can be used
to customize the behavior to a degree that lets everyone get something
reasonably close to what they want. For instance, "what to do if you
find a SQL null" and "whether to include json values as strings or
json objects" seem like they could potentially be customizable. That's
probably not a silver bullet because (1) that's more work and (2)
there might be more behaviors than we want to code, or maintain the
code for, and (3) if it gets too complicated that can itself become a
source of objections. But it's an idea.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: psql JSON output format

From
Christoph Berg
Date:
Re: Robert Haas
> IMHO, the big problem here is that different people want different
> corner-case behaviors and it's not clear what to do about that. I
> don't think there's a single vote for "don't do this at all". So if
> there is a desire to take this work forward, the goal probably ought
> to be to try to either (a) figure out one behavior that everyone can
> live with or (b) figure out a short list of options that can be used
> to customize the behavior to a degree that lets everyone get something
> reasonably close to what they want. For instance, "what to do if you
> find a SQL null" and "whether to include json values as strings or
> json objects" seem like they could potentially be customizable. That's
> probably not a silver bullet because (1) that's more work and (2)
> there might be more behaviors than we want to code, or maintain the
> code for, and (3) if it gets too complicated that can itself become a
> source of objections. But it's an idea.

Thanks for summarizing the thread.

Things mentioned in the thread:

1) rendering of SQL NULLs - include or omit the column

2) rendering of JSON values - both "quoted string" and "inline as
   JSON" make sense

3) not quoting numeric values and booleans

4) no special treatment of other datatypes like arrays or compound
   values, just quote them

5) row format: JSON object or array (array would be close to CSV
   format)

6) overall format: array of rows, or simply print each row separately
   ("JSON Lines" format, https://jsonlines.org/)

I think 1, 2 and perhaps 6 make sense to have configurable. Two or
three \pset options (or one option with a list of flags) don't sound
too bad complexity-wise.

Or maybe just default to "omit NULL columns" and "inline JSON" (and
render null as NULL).

Thoughts?

Christoph



Re: psql JSON output format

From
Robert Haas
Date:
On Fri, May 17, 2024 at 9:42 AM Christoph Berg <myon@debian.org> wrote:
> Thanks for summarizing the thread.
>
> Things mentioned in the thread:
>
> 1) rendering of SQL NULLs - include or omit the column
>
> 2) rendering of JSON values - both "quoted string" and "inline as
>    JSON" make sense
>
> 3) not quoting numeric values and booleans
>
> 4) no special treatment of other datatypes like arrays or compound
>    values, just quote them
>
> 5) row format: JSON object or array (array would be close to CSV
>    format)
>
> 6) overall format: array of rows, or simply print each row separately
>    ("JSON Lines" format, https://jsonlines.org/)
>
> I think 1, 2 and perhaps 6 make sense to have configurable. Two or
> three \pset options (or one option with a list of flags) don't sound
> too bad complexity-wise.
>
> Or maybe just default to "omit NULL columns" and "inline JSON" (and
> render null as NULL).

If we're going to just have one option, I agree with making that the
default, and I'd default to an array of row objects. If we're going to
have something configurable, I'd at least consider making (4)
configurable.

It's tempting to just have one option, like \pset jsonformat
nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array
simply because adding a ton of new options just for this isn't very
appealing. But looking at how long that is, it's probably not a great
idea. So I guess separate options is probably better?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: psql JSON output format

From
Pavel Stehule
Date:


pá 17. 5. 2024 v 16:04 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
On Fri, May 17, 2024 at 9:42 AM Christoph Berg <myon@debian.org> wrote:
> Thanks for summarizing the thread.
>
> Things mentioned in the thread:
>
> 1) rendering of SQL NULLs - include or omit the column
>
> 2) rendering of JSON values - both "quoted string" and "inline as
>    JSON" make sense
>
> 3) not quoting numeric values and booleans
>
> 4) no special treatment of other datatypes like arrays or compound
>    values, just quote them
>
> 5) row format: JSON object or array (array would be close to CSV
>    format)
>
> 6) overall format: array of rows, or simply print each row separately
>    ("JSON Lines" format, https://jsonlines.org/)
>
> I think 1, 2 and perhaps 6 make sense to have configurable. Two or
> three \pset options (or one option with a list of flags) don't sound
> too bad complexity-wise.
>
> Or maybe just default to "omit NULL columns" and "inline JSON" (and
> render null as NULL).

If we're going to just have one option, I agree with making that the
default, and I'd default to an array of row objects. If we're going to
have something configurable, I'd at least consider making (4)
configurable.

It's tempting to just have one option, like \pset jsonformat
nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array
simply because adding a ton of new options just for this isn't very
appealing. But looking at how long that is, it's probably not a great
idea. So I guess separate options is probably better?

+1 for separate options

lot of these proposed options can be used for XML too

Regards

Pavel


--
Robert Haas
EDB: http://www.enterprisedb.com