Thread: psql JSON output format
I noticed psql was lacking JSON formatting of query results which I need for a follow-up patch. It also seems useful generally, so here's a patch: postgres=# \pset format json Output format is json. postgres=# select * from (values ('one', 2, 'three'), ('four', 5, 'six')) as sub(a, b, c); [ { "a": "one", "b": "2", "c": "three" }, { "a": "four", "b": "5", "c": "six" } ] postgres=# \x Expanded display is on. postgres=# select * from (values ('one', 2, 'three'), ('four', 5, 'six')) as sub(a, b, c); [{ "a": "one", "b": "2", "c": "three" },{ "a": "four", "b": "5", "c": "six" }] postgres=# Both normal and expanded output format are optimized for readability while still saving screen space. Both formats output the same JSON structure, an array of objects. Other variants like array-of-arrays or line-separated objects ("jsonline") might be possible, but I didn't want to overengineer it. On the command line, the format is selected by `psql --json` and `psql -J`. (I'm not attached to the short option, but -J was free and it's in line with `psql -H` to select HTML.) Christoph
Attachment
Re: To PostgreSQL Hackers > On the command line, the format is selected by `psql --json` and `psql -J`. Among other uses, it enables easy post-processing of psql output using `jq`: $ psql -lJ | jq [ { "Name": "myon", "Owner": "myon", "Encoding": "UTF8", "Locale Provider": "libc", "Collate": "de_DE.utf8", "Ctype": "de_DE.utf8", "ICU Locale": null, "ICU Rules": null, "Access privileges": null }, ... ] Christoph
On Mon, 18 Dec 2023 at 15:56, Christoph Berg <myon@debian.org> wrote: > I noticed psql was lacking JSON formatting of query results which I > need for a follow-up patch. This seems useful to me too, but my usecases would also be solved (and possibly better solved) by adding JSON support to COPY as proposed here: https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com I'm wondering if your follow-up patch would be better served by that too or not.
Re: Jelte Fennema-Nio > This seems useful to me too, but my usecases would also be solved (and > possibly better solved) by adding JSON support to COPY as proposed > here: https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com Thanks for the pointer, I had not scrolled back enough to see that thread. I'm happy to see that this patch is also settling on "array of objects". > I'm wondering if your follow-up patch would be better served by that too or not. I'd need it to work on query results. Which could of course be wrapped into "copy (select whatever) to stdout (format json)", but doing it in psql without mangling the query is cleaner. And (see the other mail), the psql format selection works nicely with existing queries like `psql -l`. And "copy format json" wouldn't support \x expanded mode. We'd want both patches even if they do the same thing on two different levels, I'd say. Christoph
On Mon, 18 Dec 2023 at 16:38, Christoph Berg <myon@debian.org> wrote: > We'd want both patches even if they do the same thing on two different > levels, I'd say. Makes sense. One thing I was still wondering is if it wouldn't be easier to wrap all queries in "copy (select whatever) to stdout (format json)" automatically when the -J flag is passed psql. Because it would be nice not to have to implement this very similar logic in two places. But I guess that approach does not work for commands that don't work inside COPY, i.e. DML and DDL. I'm assuming your current patch works fine with DML/DDL. If that's indeed the case then I agree it makes sense to have this patch. And another big benefit is that it wouldn't require a new Postgres server function for the json functionality of psql.
On Mon, 18 Dec 2023 at 16:34, Jelte Fennema-Nio <postgres@jeltef.nl> wrote: > > On Mon, 18 Dec 2023 at 16:38, Christoph Berg <myon@debian.org> wrote: > > We'd want both patches even if they do the same thing on two different > > levels, I'd say. > > Makes sense. > I can see the appeal in this feature. However, as it stands, this isn't compatible with copy format json, and I think it would need to duplicate quite a lot of the JSON output code in client-side code to make it compatible. Consider, for example: CREATE TABLE foo(col json); INSERT INTO foo VALUES ('"str_value"'); copy foo to stdout with (format json) produces this: {"col":"str_value"} which is as expected. However, psql -Jc "select * from foo" produces [ { "col": "\"str_value\"" } ] The problem is, various datatypes such as boolean, number types, json, and jsonb must not be quoted and escaped, since that would change them to strings or double-encode them in the result. And then there are domain types built on top of those types, and arrays, etc. See, for example, the logic in json_categorize_type(). I think that trying to duplicate that client-side is doomed to failure. Regards, Dean
On Mon, 2024-01-08 at 18:43 +0000, Dean Rasheed wrote: > I can see the appeal in this feature. However, as it stands, this > isn't compatible with copy format json, and I think it would need to > duplicate quite a lot of the JSON output code in client-side code to > make it compatible. > > Consider, for example: > > CREATE TABLE foo(col json); > INSERT INTO foo VALUES ('"str_value"'); > > copy foo to stdout with (format json) produces this: > > {"col":"str_value"} > > which is as expected. However, psql -Jc "select * from foo" produces > > [ > { "col": "\"str_value\"" } > ] > > The problem is, various datatypes such as boolean, number types, json, > and jsonb must not be quoted and escaped, since that would change them > to strings or double-encode them in the result. I agree that such data types should not be double quoted. > And then there are > domain types built on top of those types, and arrays, etc. See, for > example, the logic in json_categorize_type(). I think that trying to > duplicate that client-side is doomed to failure. Perhaps. But maybe "printTableContent" could be extended to contain a boolean array "quote_for_json" that is set in "printTableAddHeader" based on the underlying data type, similar to how "aligns" is set now. Detecting array types might be a challenge. Domains might not be a problem, since "PQftype()" seems to return the base data type for domain values. Yours, Laurenz Albe
Re: Dean Rasheed > I can see the appeal in this feature. However, as it stands, this > isn't compatible with copy format json, and I think it would need to > duplicate quite a lot of the JSON output code in client-side code to > make it compatible. I can see we probably wouldn't want two different output formats named json, but the general idea of "allow psql to format results as json of strings" makes a lot of sense, so we should try to make it work. Does it even have to be compatible? If the code required is the same, it could be moved to libpgcommon. > The problem is, various datatypes such as boolean, number types, json, > and jsonb must not be quoted and escaped, since that would change them > to strings or double-encode them in the result. And then there are > domain types built on top of those types, and arrays, etc. See, for > example, the logic in json_categorize_type(). I think that trying to > duplicate that client-side is doomed to failure. Can we try to make it work first, before we declare the perfect the enemy of the good? I'll note that the current code uses PG's string representation of strings which is meant to be round-trip safe when fed back into the server. So quoted numeric values aren't a problem at all. (And that part is fixable.) The real problem here is that COPY json violates that by pulling json values up one syntax level. "Normal" cases will be fixable by just looking for json(b) and printing that unquoted. And composite types with jsonb members... are these really only half-quoted?! Re: Laurenz Albe > > The problem is, various datatypes such as boolean, number types, json, > > and jsonb must not be quoted and escaped, since that would change them > > to strings or double-encode them in the result. > > I agree that such data types should not be double quoted. I left that out so far because it didn't make a practical difference, but that's fixable. > > And then there are > > domain types built on top of those types, and arrays, etc. See, for > > example, the logic in json_categorize_type(). I think that trying to > > duplicate that client-side is doomed to failure. > > Perhaps. But maybe "printTableContent" could be extended to contain > a boolean array "quote_for_json" that is set in "printTableAddHeader" > based on the underlying data type, similar to how "aligns" is set now. > Detecting array types might be a challenge. > > Domains might not be a problem, since "PQftype()" seems to return the > base data type for domain values. Thanks, I'll give that a try. Christoph
On Tue, 9 Jan 2024 at 09:43, Christoph Berg <myon@debian.org> wrote: > > I can see we probably wouldn't want two different output formats named > json, but the general idea of "allow psql to format results as json of > strings" makes a lot of sense, so we should try to make it work. Does > it even have to be compatible? > I would say that they should be compatible, in the sense that an external tool parsing the outputs should regard them as equal, but maybe there are different expectations for the two features. > I'll note that the current code uses PG's string representation of > strings which is meant to be round-trip safe when fed back into the > server. So quoted numeric values aren't a problem at all. (And that > part is fixable.) > I'm not sure that being round-trip safe is a necessary goal here, but again, it's about the expectations for the feature. I was imagining that the goal was to produce something that an external tool would parse, rather than something Postgres would read back in. So not quoting numeric values seems desirable to produce output that better reflects the semantic content of the data (though it doesn't affect it being round-trip safe). > The real problem here is that COPY json violates that by pulling json > values up one syntax level. "Normal" cases will be fixable by just > looking for json(b) and printing that unquoted. And composite types > with jsonb members... are these really only half-quoted?! > What to do with composites is an interesting point in question. COPY format json will turn a composite into a JSON object whose keys are the field names. That's useful if you want to use an external tool to parse the result and get at the individual fields, but it's not round-trip safe. OTOH, this patch outputs the Postgres string representation of the object, which might be round-trip safe, but is not very convenient for any other tool to read. Regards, Dean
Re: Dean Rasheed > > I'll note that the current code uses PG's string representation of > > strings which is meant to be round-trip safe when fed back into the > > server. So quoted numeric values aren't a problem at all. (And that > > part is fixable.) > > I'm not sure that being round-trip safe is a necessary goal here, but > again, it's about the expectations for the feature. I was imagining > that the goal was to produce something that an external tool would > parse, rather than something Postgres would read back in. So not > quoting numeric values seems desirable to produce output that better > reflects the semantic content of the data (though it doesn't affect it > being round-trip safe). Getting it print numeric/boolean without quotes was actually easy, as well as json(b). Implemented as the attached v2 patch. But: not quoting json means that NULL and 'null'::json will both be rendered as 'null'. That strikes me as a pretty undesirable conflict. Does the COPY patch also do that? > OTOH, this patch outputs the Postgres string representation of the > object, which might be round-trip safe, but is not very convenient > for any other tool to read. For my use case, I need something that can be fed back into PG. Reassembling all the json parts back into proper values would be a pretty hard problem. Perhaps there should be two output formats, one that's roundtrip-safe, and one that represents json structures and composite values nicely. Adding format-specific options could also be used to switch the output between "array of json objects" and "one json object per line". Christoph
Attachment
[cc'ing Joe] On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote: > > Getting it print numeric/boolean without quotes was actually easy, as > well as json(b). Implemented as the attached v2 patch. > > But: not quoting json means that NULL and 'null'::json will both be > rendered as 'null'. That strikes me as a pretty undesirable conflict. > Does the COPY patch also do that? > Yes. Perhaps what needs to happen is for a NULL column to be omitted entirely from the output. I think the COPY TO json patch would have to do that if COPY FROM json were to be added later, to make it round-trip safe. > > OTOH, this patch outputs the Postgres string representation of the > > object, which might be round-trip safe, but is not very convenient > > for any other tool to read. > > For my use case, I need something that can be fed back into PG. > Reassembling all the json parts back into proper values would be a > pretty hard problem. > What is your use case? It seems like what you want is quite different from the COPY patch. Regards, Dean
On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote: > On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote: > > > > Getting it print numeric/boolean without quotes was actually easy, as > > well as json(b). Implemented as the attached v2 patch. > > > > But: not quoting json means that NULL and 'null'::json will both be > > rendered as 'null'. That strikes me as a pretty undesirable conflict. > > Does the COPY patch also do that? > > Yes. Perhaps what needs to happen is for a NULL column to be omitted > entirely from the output. I think the COPY TO json patch would have to > do that if COPY FROM json were to be added later, to make it > round-trip safe. I think the behavior is fine as it is. I'd expect both NULL and JSON "null" to be rendered as "null". I think the main use case for a feature like this is people who need the result in JSON for further processing somewhere else. "Round-trip safety" is not so important. If you want to move data from PostgreSQL to PostgreSQL, you use the plain or the binary format. The CSV format by default renders NULL and empty strings identical, and I don't think anybody objects to that. Yours, Laurenz Albe
On 2024-01-16 Tu 11:07, Laurenz Albe wrote:
On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:Getting it print numeric/boolean without quotes was actually easy, as well as json(b). Implemented as the attached v2 patch. But: not quoting json means that NULL and 'null'::json will both be rendered as 'null'. That strikes me as a pretty undesirable conflict. Does the COPY patch also do that?Yes. Perhaps what needs to happen is for a NULL column to be omitted entirely from the output. I think the COPY TO json patch would have to do that if COPY FROM json were to be added later, to make it round-trip safe.I think the behavior is fine as it is. I'd expect both NULL and JSON "null" to be rendered as "null". I think the main use case for a feature like this is people who need the result in JSON for further processing somewhere else. "Round-trip safety" is not so important. If you want to move data from PostgreSQL to PostgreSQL, you use the plain or the binary format. The CSV format by default renders NULL and empty strings identical, and I don't think anybody objects to that.
This is absolutely not true. The docs say about CSV format:
A NULL is output as the NULL parameter string and is not quoted, while a non-NULL value matching the NULL parameter string is quoted. For example, with the default settings, a NULL is written as an unquoted empty string, while an empty string data value is written with double quotes ("").
CSV format with default settings is and has been from the beginning designed to be round trippable.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Jan 16, 2024 at 11:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > "Round-trip safety" is not so important. If you want to move data from > PostgreSQL to PostgreSQL, you use the plain or the binary format. > The CSV format by default renders NULL and empty strings identical, and > I don't think anybody objects to that. As Andrew says, the part about the CSV format is not correct, but I also don't think I agree with the larger point, either. I believe that round-trip safety is a really desirable property. Is it absolutely necessary in every case? Maybe not. But, it shouldn't be lacking without a good reason, either, at least IMHO. If you postulate that people are moving data from A to B, it is reasonable to think that eventually someone is going to want to move some data from B back to A. If that turns out to be hard, they'll be sad. We shouldn't make people sad without a good reason. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, 2024-01-16 at 11:49 -0500, Andrew Dunstan wrote: > On 2024-01-16 Tu 11:07, Laurenz Albe wrote: > > On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote: > > > On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote: > > > > Getting it print numeric/boolean without quotes was actually easy, as > > > > well as json(b). Implemented as the attached v2 patch. > > > > > > > > But: not quoting json means that NULL and 'null'::json will both be > > > > rendered as 'null'. That strikes me as a pretty undesirable conflict. > > > > Does the COPY patch also do that? > > > > > > Yes. Perhaps what needs to happen is for a NULL column to be omitted > > > entirely from the output. I think the COPY TO json patch would have to > > > do that if COPY FROM json were to be added later, to make it > > > round-trip safe. > > > > I think the behavior is fine as it is. I'd expect both NULL and JSON "null" > > to be rendered as "null". I think the main use case for a feature like this > > is people who need the result in JSON for further processing somewhere else. > > > > "Round-trip safety" is not so important. If you want to move data from > > PostgreSQL to PostgreSQL, you use the plain or the binary format. > > The CSV format by default renders NULL and empty strings identical, and > > I don't think anybody objects to that. > > This is absolutely not true. > > CSV format with default settings is and has been from the beginning designed > to be round trippable. Sorry for being unclear. I wasn't talking about COPY, but about the psql output format: CREATE TABLE xy (a integer, b text); INSERT INTO xy VALUES (1, 'one'), (2, NULL), (3, ''); \pset format csv Output format is csv. TABLE xy; a,b 1,one 2, 3, Yours, Laurenz Albe
On Tue, 2024-01-16 at 14:12 -0500, Robert Haas wrote: > On Tue, Jan 16, 2024 at 11:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > "Round-trip safety" is not so important. If you want to move data from > > PostgreSQL to PostgreSQL, you use the plain or the binary format. > > The CSV format by default renders NULL and empty strings identical, and > > I don't think anybody objects to that. > > As Andrew says, the part about the CSV format is not correct, but I > also don't think I agree with the larger point, either. I believe that > round-trip safety is a really desirable property. Is it absolutely > necessary in every case? Maybe not. But, it shouldn't be lacking > without a good reason, either, at least IMHO. If you postulate that > people are moving data from A to B, it is reasonable to think that > eventually someone is going to want to move some data from B back to > A. If that turns out to be hard, they'll be sad. We shouldn't make > people sad without a good reason. As mentioned in my other mail, I was talking about the psql output format "csv" rather than about COPY. I agree that it is desirable to lose as little information as possible. But if we want to format query output as JSON, we have a couple of requirements that cannot all be satisfied: 1. lose no information ("round-trip safe") 2. don't double quote numbers, booleans and other JSON values 3. don't skip any table column in the output Christoph's original patch didn't satisfy #2, and his current version doesn't satisfy #1. Do you think that skipping NULL columns would be the best solution? We don't do that in the to_json() function, which also renders SQL NULL as JSON null. I think the argument for round-trip safety of psql output is tenuous. There is no way for psql to ingest JSON as input format, and the patch to add JSON as COPY format only supports COPY TO. And unless you can name the exact way that the data written by psql will be loaded into PostgreSQL again, all that remains is an (understandable) unease about losing the distiction between SQL NULL and JSON null. We have jsonb_populate_record() to convert JSON back to a table row, but that function will convert both missing columns and a JSON null to SQL NULL: CREATE TABLE xy (id integer, j jsonb); \pset null '∅' SELECT * FROM jsonb_populate_record(NULL::xy, '{"id":1,"j":null}'); id │ j ════╪═══ 1 │ ∅ (1 row) SELECT * FROM jsonb_populate_record(NULL::xy, '{"id":1}'); id │ j ════╪═══ 1 │ ∅ (1 row) Indeed, there doesn't seem to be a way to generate JSON null with that function. So I wouldn't worry about round-trip safety too much, and my preference is how the current patch does it. I am not dead set against a solution that omits NULL columns in the output, though. Yours, Laurenz Albe
On 2024-01-17 We 03:52, Laurenz Albe wrote: > On Tue, 2024-01-16 at 11:49 -0500, Andrew Dunstan wrote: >> On 2024-01-16 Tu 11:07, Laurenz Albe wrote: >>> On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote: >>>> On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote: >>>>> Getting it print numeric/boolean without quotes was actually easy, as >>>>> well as json(b). Implemented as the attached v2 patch. >>>>> >>>>> But: not quoting json means that NULL and 'null'::json will both be >>>>> rendered as 'null'. That strikes me as a pretty undesirable conflict. >>>>> Does the COPY patch also do that? >>>> Yes. Perhaps what needs to happen is for a NULL column to be omitted >>>> entirely from the output. I think the COPY TO json patch would have to >>>> do that if COPY FROM json were to be added later, to make it >>>> round-trip safe. >>> I think the behavior is fine as it is. I'd expect both NULL and JSON "null" >>> to be rendered as "null". I think the main use case for a feature like this >>> is people who need the result in JSON for further processing somewhere else. >>> >>> "Round-trip safety" is not so important. If you want to move data from >>> PostgreSQL to PostgreSQL, you use the plain or the binary format. >>> The CSV format by default renders NULL and empty strings identical, and >>> I don't think anybody objects to that. >> This is absolutely not true. >> >> CSV format with default settings is and has been from the beginning designed >> to be round trippable. > Sorry for being unclear. I wasn't talking about COPY, but about the psql > output format: > > CREATE TABLE xy (a integer, b text); > > INSERT INTO xy VALUES (1, 'one'), (2, NULL), (3, ''); > > \pset format csv > Output format is csv. > > TABLE xy; > a,b > 1,one > 2, > 3, > I think the reason nobody's complained about it is quite possibly that very few people have used it. That's certainly the case with me - if I'd noticed it I would have complained. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Wed, Jan 17, 2024 at 4:30 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > As mentioned in my other mail, I was talking about the psql output > format "csv" rather than about COPY. Oh. Well, I think it's sad that the psql format csv has that property. Why doesn't it adopt COPY's handling? > I agree that it is desirable to lose as little information as possible. > But if we want to format query output as JSON, we have a couple of > requirements that cannot all be satisfied: > > 1. lose no information ("round-trip safe") > > 2. don't double quote numbers, booleans and other JSON values > > 3. don't skip any table column in the output > > Christoph's original patch didn't satisfy #2, and his current version > doesn't satisfy #1. Do you think that skipping NULL columns would be > the best solution? We don't do that in the to_json() function, which > also renders SQL NULL as JSON null. Let me start by clarifying that I'm OK with sacrificing round-trippability here as long as we do it thoughtfully. "Round-trippability is important but X is more important and we cannot have both for Y reasons" seems like a potentially fine argument to me; I'm only objecting to an argument of the form "round-trippability doesn't even matter." My previous comment was a bit of a drive-by remark on that specifically rather than a strong opinion about what exactly we ought to do here. I guess the specifically issue here is around a json(b) column that is null at the SQL level vs one that contains a JSON null. How do we distinguish those cases? I think entirely omitting null columns could be a way forward, but I don't know if that would cause other problems for users. I'm not quite sure that addresses all the issues, though. For instance, consider that 1.00::numeric and 1.0::numeric are equal but distinguishable. If those get rendered into the JSON unquoted as 1.00 and 1.0, respectively, is that going to round-trip properly? What about float8 values where extra_float_digits=3 is needed to properly round trip? If we take PostgreSQL's array data types and turn them into JSON arrays, what happens with non-default bounds? I know how we're going to turn '{1,2}'::int[] into a JSON array, or at least I assume I do, but what in the world are we going to do about '[-3:-2]={1,2}'? As much as I think round-trippability is good, getting it to 100% here is probably a good bit of work. And maybe that work isn't worth doing or involves too much collateral damage. But I do think it has positive value. If we produce output that could be ingested back into PG later with the right tool, that leaves the door open for someone to build the tool later even if we don't have it today. If we produce output that loses information, no tool built later can make up for the loss. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, 2024-01-17 at 14:52 -0500, Robert Haas wrote: > Let me start by clarifying that I'm OK with sacrificing > round-trippability here as long as we do it thoughtfully. Got you. > I'm not quite sure that addresses all the issues, though. For > instance, consider that 1.00::numeric and 1.0::numeric are equal but > distinguishable. If those get rendered into the JSON unquoted as 1.00 > and 1.0, respectively, is that going to round-trip properly? What > about float8 values where extra_float_digits=3 is needed to properly > round trip? If we take PostgreSQL's array data types and turn them > into JSON arrays, what happens with non-default bounds? I know how > we're going to turn '{1,2}'::int[] into a JSON array, or at least I > assume I do, but what in the world are we going to do about > '[-3:-2]={1,2}'? > > As much as I think round-trippability is good, getting it to 100% here > is probably a good bit of work. I would go as far as saying that the attempt to preserve all that is futile, if you are bound to JSON as format. > But I do think it has positive > value. If we produce output that could be ingested back into PG later > with the right tool, that leaves the door open for someone to build > the tool later even if we don't have it today. If we produce output > that loses information, no tool built later can make up for the loss. I am all for losing as little information as possible, but I think that this discussion is going off on a tangent. After all, we are not talking about a data export tool here, we are talking about psql. I don't see anybody complain that float8 values lose precision in the default aligned format, or that empty strings and NULL values look the same in aligned format. Why do the goalposts move for the JSON output format? I don't think psql output should be considered a form of backup. I'd say that we should strive to preserve whatever information we easily can, and we shouldn't worry about the rest. Can we get consensus that SQL NULL columns should be omitted from the output, and the rest left as it currently is? Yours, Laurenz Albe
Re: Laurenz Albe > > But I do think it has positive > > value. If we produce output that could be ingested back into PG later > > with the right tool, that leaves the door open for someone to build > > the tool later even if we don't have it today. If we produce output > > that loses information, no tool built later can make up for the loss. > I am all for losing as little information as possible, but I think > that this discussion is going off on a tangent. After all, we are not > talking about a data export tool here, we are talking about psql. I've just posted the other patch where I need the JSON format: https://www.postgresql.org/message-id/flat/Za6EfXeewwLRS_fs%40msg.df7cb.de There, I need to be able to read back the query output into psql, while at the same time being human-readable so the user can sanely edit the data in an editor. The default "aligned" format is only human-readable, while CSV is mostly only machine-readable. JSON is the best option between the two, I think. What I did now in v3 of this patch is to print boolean and numeric values (ints, floats, numeric) without quotes, while adding the quotes back to json. This solves the NULL vs 'null'::json problem. > I don't see anybody complain that float8 values lose precision in > the default aligned format, or that empty strings and NULL values > look the same in aligned format. Why do the goalposts move for the > JSON output format? I don't think psql output should be considered > a form of backup. Fwiw, not quoting numbers in JSON won't have any of these problems if the JSON reader just passes the strings read through. (Which PG's JSON parser does.) > Can we get consensus that SQL NULL columns should be omitted from the > output, and the rest left as it currently is? I think that would be an interesting option for a JSON export format. The psql JSON format is more for human inspection, where omitting the columns might create confusion. (We could make it a pset parameter of the format, but I think the default should be to show NULL columns.) Christoph
Attachment
On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote: > What I did now in v3 of this patch is to print boolean and numeric > values (ints, floats, numeric) without quotes, while adding the quotes > back to json. This solves the NULL vs 'null'::json problem. The patch is working as advertised. I am kind of unhappy about this change. It seems awkward and undesirable so have JSON values decorated with weird quoting in JSON output. I understand the motivation, but I bet it's not what will make users happy. If you need to disambiguate between SQL NULL and JSON null, my preferred solution would be to omit SQL NULL columns from the output altogether. Yours, Laurenz Albe
Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe <laurenz.albe@cybertec.at>: > I understand the motivation, but I bet it's not what will make users > happy. > > If you need to disambiguate between SQL NULL and JSON null, my > preferred solution would be to omit SQL NULL columns from the output > altogether. I fully support Laurenz's proposal and argumentation. The main use case for such a JSON output feature is further processing somewhere else. --Stefan Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe <laurenz.albe@cybertec.at>: > > On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote: > > What I did now in v3 of this patch is to print boolean and numeric > > values (ints, floats, numeric) without quotes, while adding the quotes > > back to json. This solves the NULL vs 'null'::json problem. > > The patch is working as advertised. > > I am kind of unhappy about this change. It seems awkward and undesirable > so have JSON values decorated with weird quoting in JSON output. > I understand the motivation, but I bet it's not what will make users > happy. > > If you need to disambiguate between SQL NULL and JSON null, my > preferred solution would be to omit SQL NULL columns from the output > altogether. > > Yours, > Laurenz Albe > >
On Tue, Jan 23, 2024 at 7:35 AM Stefan Keller <sfkeller@gmail.com> wrote:
Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
<laurenz.albe@cybertec.at>:
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.
I fully support Laurenz's proposal and argumentation. The main use
case for such a JSON output feature is further processing somewhere
else.
--Stefan
Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
<laurenz.albe@cybertec.at>:
>
> On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote:
> > What I did now in v3 of this patch is to print boolean and numeric
> > values (ints, floats, numeric) without quotes, while adding the quotes
> > back to json. This solves the NULL vs 'null'::json problem.
>
> The patch is working as advertised.
>
> I am kind of unhappy about this change. It seems awkward and undesirable
> so have JSON values decorated with weird quoting in JSON output.
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.
>
I agree on distinguishing SQL via omission but I do think, almost regardless, that the output should include a metadata section that lists all of the actual columns in the result, the column position, and since we have the info available, the data type name and possibly OID. Then any column name present in the metadata but that isn't a key name for a given object is known to have an SQL NULL as the value of that column in that row.
David J.
Re: Laurenz Albe > I am kind of unhappy about this change. It seems awkward and undesirable > so have JSON values decorated with weird quoting in JSON output. > I understand the motivation, but I bet it's not what will make users > happy. Well, why stop at JSON, and not represent any array type as a JSON array? Compound types could be transformed into JSON objects. Custom data types could add hooks for their own custom JSON representation. I'd just stop the flood right before it starts... The real reason I'd not want to go that route is because I need the format to be round-trip safe for the "\gedit" patch, see the other thread. We would have to transform JSON sub-parts back into PG's text format. But there were already complaints that other patch is complex. At the moment it's just strcmp-ing the text value before and after, which is straighforward. > If you need to disambiguate between SQL NULL and JSON null, my > preferred solution would be to omit SQL NULL columns from the output > altogether. That works, but only by convention only. Re: David G. Johnston > I agree on distinguishing SQL via omission but I do think, almost > regardless, that the output should include a metadata section that lists > all of the actual columns in the result, the column position, and since we > have the info available, the data type name and possibly OID. Then any > column name present in the metadata but that isn't a key name for a given > object is known to have an SQL NULL as the value of that column in that row. There are no comments in JSON. Adding the info in-band would break the simple use case of using the data as-is for further processing. Christoph
On Tue, 2024-01-23 at 08:01 -0700, David G. Johnston wrote: > I do think that the output should include a metadata section that lists all of > the actual columns in the result, the column position, and since we have the > info available, the data type name and possibly OID. Then any column name > present in the metadata but that isn't a key name for a given object is known > to have an SQL NULL as the value of that column in that row. Sounds attractive, but I'm a bit worried that that additional information might make life harder for some consumers. My crystal ball is as cloudy as anybody's when it comes to guessing the most likely use cases for the feature, but I'd rather keep it simple and add features like that later, if there is a demand. If you import the data into an existing structure, you don't need the metadata. Yours, Laurenz Albe
Re: Laurenz Albe > If you import the data into an existing structure, you don't need the metadata. Also, since you were running the query yourself, you should know what columns you were expecting. Christoph
On Tue, 2024-01-23 at 16:36 +0100, Christoph Berg wrote: > Re: Laurenz Albe > > I am kind of unhappy about this change. It seems awkward and undesirable > > so have JSON values decorated with weird quoting in JSON output. > > I understand the motivation, but I bet it's not what will make users > > happy. > > Well, why stop at JSON, and not represent any array type as a JSON > array? Compound types could be transformed into JSON objects. Custom > data types could add hooks for their own custom JSON representation. > > I'd just stop the flood right before it starts... I'd stop the flood right after json/jsonb. Arrays as database columns are probably too rare to be a real issue. > The real reason I'd not want to go that route is because I need the > format to be round-trip safe for the "\gedit" patch, see the other > thread. We would have to transform JSON sub-parts back into PG's text > format. But there were already complaints that other patch is complex. > At the moment it's just strcmp-ing the text value before and after, > which is straighforward. I don't particularly care about \gedit, any I think \gedit shouldn't be the raison d'être for this patch. I cannot imagine that anybody who wants to move data from PostgreSQL to PostgreSQL will use \gedit, load the output from psql into the editor and save. After all, there is COPY and pg_dump. I'm pretty certain that people are more likely to use psql's JSON output format to move data somewhere else. Yours, Laurenz Albe
Re: Laurenz Albe > > I'd just stop the flood right before it starts... > > I'd stop the flood right after json/jsonb. Nod. I do see a point here, given it's "json in json", and not "something else in json". Will try to make it work with \gedit. > Arrays as database columns are probably too rare to be a real issue. Ack. > I'm pretty certain that people are more likely to use psql's JSON > output format to move data somewhere else. Well, there's also the other patch to add JSON support to COPY, that would be even more suitable. Christoph
On Tue, Jan 23, 2024 at 11:35 AM Christoph Berg <myon@debian.org> wrote: > Ack. The last version of this patch was posted on January 22nd and got a bunch of replies, so I'm marking https://commitfest.postgresql.org/48/4707/ as Returned with Feedback for now. Please feel free to update the status of the patch when the situation changes. IMHO, the big problem here is that different people want different corner-case behaviors and it's not clear what to do about that. I don't think there's a single vote for "don't do this at all". So if there is a desire to take this work forward, the goal probably ought to be to try to either (a) figure out one behavior that everyone can live with or (b) figure out a short list of options that can be used to customize the behavior to a degree that lets everyone get something reasonably close to what they want. For instance, "what to do if you find a SQL null" and "whether to include json values as strings or json objects" seem like they could potentially be customizable. That's probably not a silver bullet because (1) that's more work and (2) there might be more behaviors than we want to code, or maintain the code for, and (3) if it gets too complicated that can itself become a source of objections. But it's an idea. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Robert Haas > IMHO, the big problem here is that different people want different > corner-case behaviors and it's not clear what to do about that. I > don't think there's a single vote for "don't do this at all". So if > there is a desire to take this work forward, the goal probably ought > to be to try to either (a) figure out one behavior that everyone can > live with or (b) figure out a short list of options that can be used > to customize the behavior to a degree that lets everyone get something > reasonably close to what they want. For instance, "what to do if you > find a SQL null" and "whether to include json values as strings or > json objects" seem like they could potentially be customizable. That's > probably not a silver bullet because (1) that's more work and (2) > there might be more behaviors than we want to code, or maintain the > code for, and (3) if it gets too complicated that can itself become a > source of objections. But it's an idea. Thanks for summarizing the thread. Things mentioned in the thread: 1) rendering of SQL NULLs - include or omit the column 2) rendering of JSON values - both "quoted string" and "inline as JSON" make sense 3) not quoting numeric values and booleans 4) no special treatment of other datatypes like arrays or compound values, just quote them 5) row format: JSON object or array (array would be close to CSV format) 6) overall format: array of rows, or simply print each row separately ("JSON Lines" format, https://jsonlines.org/) I think 1, 2 and perhaps 6 make sense to have configurable. Two or three \pset options (or one option with a list of flags) don't sound too bad complexity-wise. Or maybe just default to "omit NULL columns" and "inline JSON" (and render null as NULL). Thoughts? Christoph
On Fri, May 17, 2024 at 9:42 AM Christoph Berg <myon@debian.org> wrote: > Thanks for summarizing the thread. > > Things mentioned in the thread: > > 1) rendering of SQL NULLs - include or omit the column > > 2) rendering of JSON values - both "quoted string" and "inline as > JSON" make sense > > 3) not quoting numeric values and booleans > > 4) no special treatment of other datatypes like arrays or compound > values, just quote them > > 5) row format: JSON object or array (array would be close to CSV > format) > > 6) overall format: array of rows, or simply print each row separately > ("JSON Lines" format, https://jsonlines.org/) > > I think 1, 2 and perhaps 6 make sense to have configurable. Two or > three \pset options (or one option with a list of flags) don't sound > too bad complexity-wise. > > Or maybe just default to "omit NULL columns" and "inline JSON" (and > render null as NULL). If we're going to just have one option, I agree with making that the default, and I'd default to an array of row objects. If we're going to have something configurable, I'd at least consider making (4) configurable. It's tempting to just have one option, like \pset jsonformat nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array simply because adding a ton of new options just for this isn't very appealing. But looking at how long that is, it's probably not a great idea. So I guess separate options is probably better? -- Robert Haas EDB: http://www.enterprisedb.com
pá 17. 5. 2024 v 16:04 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
On Fri, May 17, 2024 at 9:42 AM Christoph Berg <myon@debian.org> wrote:
> Thanks for summarizing the thread.
>
> Things mentioned in the thread:
>
> 1) rendering of SQL NULLs - include or omit the column
>
> 2) rendering of JSON values - both "quoted string" and "inline as
> JSON" make sense
>
> 3) not quoting numeric values and booleans
>
> 4) no special treatment of other datatypes like arrays or compound
> values, just quote them
>
> 5) row format: JSON object or array (array would be close to CSV
> format)
>
> 6) overall format: array of rows, or simply print each row separately
> ("JSON Lines" format, https://jsonlines.org/)
>
> I think 1, 2 and perhaps 6 make sense to have configurable. Two or
> three \pset options (or one option with a list of flags) don't sound
> too bad complexity-wise.
>
> Or maybe just default to "omit NULL columns" and "inline JSON" (and
> render null as NULL).
If we're going to just have one option, I agree with making that the
default, and I'd default to an array of row objects. If we're going to
have something configurable, I'd at least consider making (4)
configurable.
It's tempting to just have one option, like \pset jsonformat
nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array
simply because adding a ton of new options just for this isn't very
appealing. But looking at how long that is, it's probably not a great
idea. So I guess separate options is probably better?
+1 for separate options
lot of these proposed options can be used for XML too
Regards
Pavel
--
Robert Haas
EDB: http://www.enterprisedb.com