Thread: Patch: Improve Boolean Predicate JSON Path Docs
Hackers, Following up from a suggestion from Tom Lane[1] to improve the documentation of boolean predicate JSON path expressions,please find enclosed a draft patch to do so. It does three things: 1. Converts all of the example path queries to use jsonb_path_query() and show the results, to make it clearer what the behaviorsare. 2. Replaces the list of deviations from the standards with a new subsection, with each deviation in its own sub-subsection.The regex section is unchanged, but I’ve greatly expanded the boolean expression JSON path section with examplescomparing standard filter expressions and nonstandard boolean predicates. I’ve also added an exhortation not useboolean expressions with @? or standard path expressions with @@. 3. While converting the modes section to use jsonb_path_query() and show the results, I also added an example of strict modereturning an error. Follow-ups I’d like to make: 1. Expand the modes section to show how the types of results can vary depending on the mode, thanks to the flattening. Examples: david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); jsonb_path_query ------------------ 3 4 5 (3 rows) david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)'); jsonb_path_query ------------------ [1, 2, 3, 4, 5] 2. Improve the descriptions and examples for @?/jsonb_path_exists() and @@/jsonb_path_match(). Best, David [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us
Attachment
On Oct 14, 2023, at 16:40, David E. Wheeler <david@justatheory.com> wrote: > Following up from a suggestion from Tom Lane[1] to improve the documentation of boolean predicate JSON path expressions,please find enclosed a draft patch to do so. And now I see I can’t spell “Deviations”. Will fix along with any other requested revisions. GitHub diff here if you’re intothat sort of thing: https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs Best, David
On 2023-10-14 22:40 +0200, David E. Wheeler write: > Following up from a suggestion from Tom Lane[1] to improve the > documentation of boolean predicate JSON path expressions, please find > enclosed a draft patch to do so. Thanks for putting this together. See my review at the end. > It does three things: > > 1. Converts all of the example path queries to use jsonb_path_query() > and show the results, to make it clearer what the behaviors are. Nice. This really does help to make some sense of it. I checked all queries and they do work out except for two queries where the path expression string is not properly quoted (but the intended output is still correct). > 2. Replaces the list of deviations from the standards with a new > subsection, with each deviation in its own sub-subsection. The regex > section is unchanged, but I’ve greatly expanded the boolean expression > JSON path section with examples comparing standard filter expressions > and nonstandard boolean predicates. I’ve also added an exhortation not > use boolean expressions with @? or standard path expressions with @@. LGTM. > 3. While converting the modes section to use jsonb_path_query() and > show the results, I also added an example of strict mode returning an > error. > > Follow-ups I’d like to make: > > 1. Expand the modes section to show how the types of results can vary > depending on the mode, thanks to the flattening. Examples: > > david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); > jsonb_path_query > ------------------ > 3 > 4 > 5 > (3 rows) > > david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)'); > jsonb_path_query > ------------------ > [1, 2, 3, 4, 5] > > 2. Improve the descriptions and examples for @?/jsonb_path_exists() > and @@/jsonb_path_match(). +1 > [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us My review: > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml > index affd1254bb..295f8ca5c9 100644 > --- a/doc/src/sgml/func.sgml > +++ b/doc/src/sgml/func.sgml > @@ -17205,7 +17205,7 @@ array w/o UK? | t > For example, suppose you have some JSON data from a GPS tracker that you > would like to parse, such as: > <programlisting> > -{ > + \set json '{ Perhaps make it explicit that the reader must run this in psql in order to use \set and :'json' in the ensuing samples? Some of the existing examples already use psql output but they do not rely on any psql features. > "track": { > "segments": [ > { > @@ -17220,7 +17220,7 @@ array w/o UK? | t > } > ] > } > -} > +}' > </programlisting> > </para> > > @@ -17229,7 +17229,10 @@ array w/o UK? | t > <literal>.<replaceable>key</replaceable></literal> accessor > operator to descend through surrounding JSON objects: > <programlisting> > -$.track.segments > +select jsonb_path_query(:'json'::jsonb, '$.track.segments'); > + jsonb_path_query > +------------------------------------------------------------------------------------------------------------------------------------------------------------------- > + [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635],"start time": "2018-10-14 10:39:21"}] > </programlisting> This should use <screen>, <userinput>, and <computeroutput> if it shows a psql session, e.g.: <screen> <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput> <computeroutput> jsonb_path_query ------------------------------------------------------------------------------------------------------------------------------------------------------------------- [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635],"start time": "2018-10-14 10:39:21"}] </computeroutput> </screen> Also the cast to jsonb is not necessary and only adds clutter IMO. > </para> > > @@ -17239,7 +17242,11 @@ $.track.segments > the following path will return the location coordinates for all > the available track segments: > <programlisting> > -$.track.segments[*].location > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].location'); > + jsonb_path_query > +------------------- > + [47.763, 13.4034] > + [47.706, 13.2635] > </programlisting> > </para> > > @@ -17248,7 +17255,10 @@ $.track.segments[*].location > specify the corresponding subscript in the <literal>[]</literal> > accessor operator. Recall that JSON array indexes are 0-relative: > <programlisting> > -$.track.segments[0].location > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[0].location'); > + jsonb_path_query > +------------------- > + [47.763, 13.4034] > </programlisting> > </para> > > @@ -17259,7 +17269,10 @@ $.track.segments[0].location > Each method name must be preceded by a dot. For example, > you can get the size of an array: > <programlisting> > -$.track.segments.size() > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.size()'); > + jsonb_path_query > +------------------ > + 2 > </programlisting> > More examples of using <type>jsonpath</type> operators > and methods within path expressions appear below in > @@ -17302,7 +17315,10 @@ $.track.segments.size() > For example, suppose you would like to retrieve all heart rate values higher > than 130. You can achieve this using the following expression: > <programlisting> > -$.track.segments[*].HR ? (@ > 130) > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR ? (@ > 130)'); > + jsonb_path_query > +------------------ > + 135 > </programlisting> > </para> > > @@ -17312,7 +17328,10 @@ $.track.segments[*].HR ? (@ > 130) > filter expression is applied to the previous step, and the path used > in the condition is different: > <programlisting> > -$.track.segments[*] ? (@.HR > 130)."start time" > + select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.HR > 130)."start time"'); > + jsonb_path_query > +----------------------- > + "2018-10-14 10:39:21" > </programlisting> > </para> > > @@ -17321,7 +17340,10 @@ $.track.segments[*] ? (@.HR > 130)."start time" > example, the following expression selects start times of all segments that > contain locations with relevant coordinates and high heart rate values: > <programlisting> > -$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"'); > + jsonb_path_query > +----------------------- > + "2018-10-14 10:39:21" > </programlisting> > </para> > > @@ -17330,46 +17352,81 @@ $.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" > The following example first filters all segments by location, and then > returns high heart rate values for these segments, if available: > <programlisting> > -$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130) > +select jsonb_path_query(:'json'::jsonb, $.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)'); The opening quote is missing from the jsonpath literal. > + jsonb_path_query > +------------------ > + 135 > </programlisting> > </para> > > <para> > You can also nest filter expressions within each other: > <programlisting> > -$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size() > +select jsonb_path_query(:'json'::jsonb, $.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'); Missing opening quote here as well. > + jsonb_path_query > +------------------ > + 2 > </programlisting> > This expression returns the size of the track if it contains any > segments with high heart rate values, or an empty sequence otherwise. > </para> > > - <para> > - <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path > - language has the following deviations from the SQL/JSON standard: > - </para> > + <sect3 id="devations-from-the-standard"> > + <title>Devaiations from the SQL Standard</title> Typo in "deviations" (section ID and title). > + <para> > + <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path > + language has the following deviations from the SQL/JSON standard: The sentence should and in a period when this para is no longer followed by an item list. > + </para> > > - <itemizedlist> > - <listitem> > + <sect4 id="boolean-predicate-path-expressions"> > + <title>Boolean Predicate Path Expressions</title> > <para> > - A path expression can be a Boolean predicate, although the SQL/JSON > - standard allows predicates only in filters. This is necessary for > - implementation of the <literal>@@</literal> operator. For example, > - the following <type>jsonpath</type> expression is valid in > - <productname>PostgreSQL</productname>: > + As an extension to the SQL standard, a <productname>PostgreSQL</productname> > + path expression can be a Boolean predicate, whereas the SQL standard allows > + predicates only in filters. Where SQL standard path expressions return the > + relevant contents of the queried JSON value, predicate path expressions > + return the three-value three-valued result of the predicate: Redundant "three-value" before "three-valued result". > + <literal>true</literal>, <literal>false</literal>, or > + <literal>unknown</literal>. Compare this filter <type>jsonpath</type> > + exression: > <programlisting> > -$.track.segments[*].HR < 70 > +select jsonb_path_query(:'json'::jsonb, '$.track.segments ?(@[*].HR > 130)'); > + jsonb_path_query > +--------------------------------------------------------------------------------- > + {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"} > </programlisting> > - </para> > - </listitem> > + To a predicate expression, which returns <literal>true</literal> > +<programlisting> > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR > 130'); > + jsonb_path_query > +------------------ > + true > +</programlisting> > + </para> > > - <listitem> > - <para> > - There are minor differences in the interpretation of regular > - expression patterns used in <literal>like_regex</literal> filters, as > - described in <xref linkend="jsonpath-regular-expressions"/>. > - </para> > - </listitem> > - </itemizedlist> > + <para> > + Predicate-only path expressions are necessary for implementation of the > + <literal>@@</literal> operator (and the > + <function>jsonb_path_match</function> function), and should not be used > + with the <literal>@?</literal> operator (or > + <function>jsonb_path_exists</function> function). > + </para> > + > + <para> > + Conversely, non-predicate <type>jsonpath</type> expressions should not be > + used with the <literal>@@</literal> operator (or the > + <function>jsonb_path_match</function> function). > + </para> > + </sect4> Both paras should be wrapped in a single <note> so that they stand out from the rest of the text. Maybe even <warning>, but <note> is already used on this page for things that I'd consider warnings. > + <sect4 id="jsonpath-regular-expression-deviation"> > + <title>Regular Expression Interpretation</title> > + <para> > + There are minor differences in the interpretation of regular > + expression patterns used in <literal>like_regex</literal> filters, as > + described in <xref linkend="jsonpath-regular-expressions"/>. > + </para> > + </sect4> <sect3 id="devations-from-the-standard"> should be closed here, otherwise the docs won't build. This can be checked with `make -C doc/src/sgml check`. > > <sect3 id="strict-and-lax-modes"> > <title>Strict and Lax Modes</title> > @@ -17431,18 +17488,30 @@ $.track.segments[*].HR < 70 > abstract from the fact that it stores an array of segments > when using the lax mode: > <programlisting> > -lax $.track.segments.location > + select jsonb_path_query(:'json'::jsonb, 'lax $.track.segments.location'); > + jsonb_path_query `git diff --check` shows a couple of lines with trailing whitespace (mostly psql output). > +------------------- > + [47.763, 13.4034] > + [47.706, 13.2635] > </programlisting> > </para> > > <para> > - In the strict mode, the specified path must exactly match the structure of > + In strict mode, the specified path must exactly match the structure of > the queried JSON document to return an SQL/JSON item, so using this > - path expression will cause an error. To get the same result as in > - the lax mode, you have to explicitly unwrap the > + path expression will cause an error: > +<programlisting> > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.location'); > +ERROR: jsonpath member accessor can only be applied to an object > +</programlisting> > + To get the same result as in the lax mode, you have to explicitly unwrap the > <literal>segments</literal> array: > <programlisting> > -strict $.track.segments[*].location > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[*].location'); > + jsonb_path_query > +------------------- > + [47.763, 13.4034] > + [47.706, 13.2635] > </programlisting> > </para> > > @@ -17451,7 +17520,13 @@ strict $.track.segments[*].location > when using the lax mode. For instance, the following query selects every > <literal>HR</literal> value twice: > <programlisting> > -lax $.**.HR > +select jsonb_path_query(:'json'::jsonb, 'lax $.**.HR'); > + jsonb_path_query > +------------------ > + 73 > + 135 > + 73 > + 135 > </programlisting> > This happens because the <literal>.**</literal> accessor selects both > the <literal>segments</literal> array and each of its elements, while > @@ -17460,7 +17535,11 @@ lax $.**.HR > the <literal>.**</literal> accessor only in the strict mode. The > following query selects each <literal>HR</literal> value just once: > <programlisting> > -strict $.**.HR > +select jsonb_path_query(:'json'::jsonb, 'strict $.**.HR'); > + jsonb_path_query > +------------------ > + 73 > + 135 > </programlisting> > </para> > -- Erik
On Oct 14, 2023, at 19:51, Erik Wienhold <ewie@ewie.name> wrote: > Thanks for putting this together. See my review at the end. Appreciate the speedy review! > Nice. This really does help to make some sense of it. I checked all > queries and they do work out except for two queries where the path > expression string is not properly quoted (but the intended output is > still correct). 🤦🏻♂️ >> Follow-ups I’d like to make: >> >> 1. Expand the modes section to show how the types of results can vary >> depending on the mode, thanks to the flattening. Examples: >> >> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); >> jsonb_path_query >> ------------------ >> 3 >> 4 >> 5 >> (3 rows) >> >> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)'); >> jsonb_path_query >> ------------------ >> [1, 2, 3, 4, 5] >> >> 2. Improve the descriptions and examples for @?/jsonb_path_exists() >> and @@/jsonb_path_match(). > > +1 I planned to submit these changes in a separate patch, based on Tom Lane’s suggestion[1]. Would it be preferred to add themto this patch? > Perhaps make it explicit that the reader must run this in psql in order > to use \set and :'json' in the ensuing samples? Some of the existing > examples already use psql output but they do not rely on any psql > features. Good call, done. > This should use <screen>, <userinput>, and <computeroutput> if it shows > a psql session, e.g.: > > <screen> > <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput> > <computeroutput> > jsonb_path_query > ------------------------------------------------------------------------------------------------------------------------------------------------------------------- > [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635],"start time": "2018-10-14 10:39:21"}] > </computeroutput> > </screen> I pokwds around, and it appears the computeroutput bit is used for function output. So I followed the precedent in queries.sgml[2]and omitted the computeroutput tags but added prompt, e.g., <screen> <prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.**.HR');</userinput> jsonb_path_query ------------------ 73 135 </screen> > Also the cast to jsonb is not necessary and only adds clutter IMO. Right, removed them all in function calls. >> + <para> >> + Predicate-only path expressions are necessary for implementation of the >> + <literal>@@</literal> operator (and the >> + <function>jsonb_path_match</function> function), and should not be used >> + with the <literal>@?</literal> operator (or >> + <function>jsonb_path_exists</function> function). >> + </para> >> + >> + <para> >> + Conversely, non-predicate <type>jsonpath</type> expressions should not be >> + used with the <literal>@@</literal> operator (or the >> + <function>jsonb_path_match</function> function). >> + </para> >> + </sect4> > > Both paras should be wrapped in a single <note> so that they stand out > from the rest of the text. Maybe even <warning>, but <note> is already > used on this page for things that I'd consider warnings. Agreed. Would be good if we could teach these functions and operators to reject path expressions they don’t support. >> + <sect4 id="jsonpath-regular-expression-deviation"> >> + <title>Regular Expression Interpretation</title> >> + <para> >> + There are minor differences in the interpretation of regular >> + expression patterns used in <literal>like_regex</literal> filters, as >> + described in <xref linkend="jsonpath-regular-expressions"/>. >> + </para> >> + </sect4> > > <sect3 id="devations-from-the-standard"> should be closed here, > otherwise the docs won't build. This can be checked with > `make -C doc/src/sgml check`. Thanks. That produces a bunch of warnings for postgres.sgml and legal.sgml (and a failure to load the docbook DTD), but func.sgmlis clean now. > `git diff --check` shows a couple of lines with trailing whitespace > (mostly psql output). I must’ve cleaned those after I sent the patch, good now. Updated patch attached, this time created by `git format-patch-v2`. Best, David [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us [2] https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-JOIN
Attachment
On 2023-10-16 01:04 +0200, David E. Wheeler write: > On Oct 14, 2023, at 19:51, Erik Wienhold <ewie@ewie.name> wrote: > > > Thanks for putting this together. See my review at the end. > > Appreciate the speedy review! You're welcome. > >> Follow-ups I’d like to make: > >> > >> 1. Expand the modes section to show how the types of results can vary > >> depending on the mode, thanks to the flattening. Examples: > >> > >> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); > >> jsonb_path_query > >> ------------------ > >> 3 > >> 4 > >> 5 > >> (3 rows) > >> > >> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)'); > >> jsonb_path_query > >> ------------------ > >> [1, 2, 3, 4, 5] > >> > >> 2. Improve the descriptions and examples for @?/jsonb_path_exists() > >> and @@/jsonb_path_match(). > > > > +1 > > I planned to submit these changes in a separate patch, based on Tom > Lane’s suggestion[1]. Would it be preferred to add them to this patch? Your call but I'm not against including it in this patch because it already touches the modes section. > I pokwds around, and it appears the computeroutput bit is used for > function output. So I followed the precedent in queries.sgml[2] and > omitted the computeroutput tags but added prompt, e.g., > <screen> > <prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.**.HR');</userinput> > jsonb_path_query > ------------------ > 73 > 135 > </screen> Okay, Not sure what the preferred style is but I saw <userinput> and <computeroutput> used together in doc/src/sgml/ref/createuser.sgml. But it's not applied consistently in the rest of the docs. > >> + <para> > >> + Predicate-only path expressions are necessary for implementation of the > >> + <literal>@@</literal> operator (and the > >> + <function>jsonb_path_match</function> function), and should not be used > >> + with the <literal>@?</literal> operator (or > >> + <function>jsonb_path_exists</function> function). > >> + </para> > >> + > >> + <para> > >> + Conversely, non-predicate <type>jsonpath</type> expressions should not be > >> + used with the <literal>@@</literal> operator (or the > >> + <function>jsonb_path_match</function> function). > >> + </para> > >> + </sect4> > > > > Both paras should be wrapped in a single <note> so that they stand out > > from the rest of the text. Maybe even <warning>, but <note> is already > > used on this page for things that I'd consider warnings. > > Agreed. Would be good if we could teach these functions and operators > to reject path expressions they don’t support. Right, you mentioned that idea in [1] (separate types). Not sure what the best strategy here is but it's likely to break existing queries. Maybe deprecating unsupported path expressions in the next major release and changing that to an error in the major release after that. > > This can be checked with `make -C doc/src/sgml check`. > > Thanks. That produces a bunch of warnings for postgres.sgml and > legal.sgml (and a failure to load the docbook DTD), but func.sgml is > clean now. Hmm... I get no warnings on 1f89b73c4e. Did you install all tools as described in [2]? The DTD needs to be installed as well. [1] https://www.postgresql.org/message-id/BAF11F2D-5EDD-4DBB-87FA-4F35845029AE%40justatheory.com [2] https://www.postgresql.org/docs/current/docguide-toolsets.html -- Erik
On Oct 15, 2023, at 23:03, Erik Wienhold <ewie@ewie.name> wrote: > Your call but I'm not against including it in this patch because it > already touches the modes section. Okay, added, let’s just put all our cards on the table. :-) >> Agreed. Would be good if we could teach these functions and operators >> to reject path expressions they don’t support. > > Right, you mentioned that idea in [1] (separate types). Not sure what > the best strategy here is but it's likely to break existing queries. > Maybe deprecating unsupported path expressions in the next major release > and changing that to an error in the major release after that. Well if the functions have a JsonPathItem struct, they can check its type attribute and reject those with a root type that’sa predicate in @? and reject it if it’s not a predicate in @@. Example of checking type here: https://github.com/postgres/postgres/blob/54b208f90963cb8b48b9794a5392b2fae4b40a98/src/backend/utils/adt/jsonpath_exec.c#L622 >>> This can be checked with `make -C doc/src/sgml check`. >> >> Thanks. That produces a bunch of warnings for postgres.sgml and >> legal.sgml (and a failure to load the docbook DTD), but func.sgml is >> clean now. > > Hmm... I get no warnings on 1f89b73c4e. Did you install all tools as > described in [2]? The DTD needs to be installed as well. Thanks, got it down to one: postgres.sgml:112: element sect4: validity error : Element sect4 content does not follow the DTD, expecting (sect4info? ,(title , subtitle? , titleabbrev?) , (toc | lot | index | glossary | bibliography)* , (((calloutlist | glosslist | bibliolist| itemizedlist | orderedlist | segmentedlist | simplelist | variablelist | caution | important | note | tip | warning| literallayout | programlisting | programlistingco | screen | screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis| classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | methodsynopsis | formalpara | para| simpara | address | blockquote | graphic | graphicco | mediaobject | mediaobjectco | informalequation | informalexample| informalfigure | informaltable | equation | example | figure | table | msgset | procedure | sidebar | qandaset| task | anchor | bridgehead | remark | highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+, (refentry* | sect5* | simplesect*)) | refentry+ | sect5+ | simplesect+) , (toc | lot | index | glossary | bibliography)*),got (para para ) &func; David
Attachment
On 2023-10-16 21:59 +0200, David E. Wheeler write: > On Oct 15, 2023, at 23:03, Erik Wienhold <ewie@ewie.name> wrote: > > > Your call but I'm not against including it in this patch because it > > already touches the modes section. > > Okay, added, let’s just put all our cards on the table. :-) I'll have a look but the attached v3 is not a patch but some applefile. > Thanks, got it down to one: > > postgres.sgml:112: element sect4: validity error : Element sect4 content does not follow the DTD, expecting (sect4info?, (title , subtitle? , titleabbrev?) , (toc | lot | index | glossary | bibliography)* , (((calloutlist | glosslist| bibliolist | itemizedlist | orderedlist | segmentedlist | simplelist | variablelist | caution | important | note| tip | warning | literallayout | programlisting | programlistingco | screen | screenco | screenshot | synopsis | cmdsynopsis| funcsynopsis | classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | methodsynopsis | formalpara| para | simpara | address | blockquote | graphic | graphicco | mediaobject | mediaobjectco | informalequation| informalexample | informalfigure | informaltable | equation | example | figure | table | msgset | procedure| sidebar | qandaset | task | anchor | bridgehead | remark | highlights | abstract | authorblurb | epigraph | indexterm| beginpage)+ , (refentry* | sect5* | simplesect*)) | refentry+ | sect5+ | simplesect+) , (toc | lot | index | glossary| bibliography)*), got (para para ) > &func; One of the added <sect4> is invalid by the looks of it. Maybe <title> is missing because it says "got (para para )" at the end. -- Erik
On Oct 16, 2023, at 18:07, Erik Wienhold <ewie@ewie.name> wrote: >> Okay, added, let’s just put all our cards on the table. :-) > > I'll have a look but the attached v3 is not a patch but some applefile. Weird, should be no different from previous attachments. I believe Apple Mail always uses application/octet-stream for attachmentsit doesn’t recognize, which includes .patch and .diff files, sadly. > One of the added <sect4> is invalid by the looks of it. Maybe <title> > is missing because it says "got (para para )" at the end. Oh, I thought it would report issues from the files they were found in. You’re right, I forgot a title. Fixed in v4. David
Attachment
On Tue, Oct 17, 2023 at 10:56 AM David E. Wheeler <david@justatheory.com> wrote: > > > Oh, I thought it would report issues from the files they were found in. You’re right, I forgot a title. Fixed in v4. > > David > + Returns the result of a JSON path + <link linkend="boolean-predicate-path-expressions">predicate + check</link> for the specified JSON value. If the result is not Boolean, + then <literal>NULL</literal> is returned. Do not use with non-predicate + JSON path expressions. "Do not use with non-predicate", double negative is not easy to comprehend. Maybe we can simplify it. 16933: value. Use only SQL-standard JSON path expressions, not not there are two "not". 15842: SQL-standard JSON path expressions, not not there are two "not".
On Oct 19, 2023, at 01:22, jian he <jian.universality@gmail.com> wrote: > "Do not use with non-predicate", double negative is not easy to > comprehend. Maybe we can simplify it. > > 16933: value. Use only SQL-standard JSON path expressions, not not > there are two "not". > > 15842: SQL-standard JSON path expressions, not not > there are two "not”. Thank you, jian. Updated patch attached and also on GitHub. https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs Best, David
Attachment
On 2023-10-19 15:39 +0200, David E. Wheeler wrote: > On Oct 19, 2023, at 01:22, jian he <jian.universality@gmail.com> wrote: > > Updated patch attached and also on GitHub. > > https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs Just wanted to take a look at v5. But it's an applefile again :P -- Erik
On Oct 19, 2023, at 10:49 PM, Erik Wienhold <ewie@mailbox.org> wrote: > Just wanted to take a look at v5. But it's an applefile again :P I don’t get it. It was the other times too! Are you able to save it with a .patch suffix? D
On 2023-10-20 05:20 +0200, David E. Wheeler wrote: > On Oct 19, 2023, at 10:49 PM, Erik Wienhold <ewie@mailbox.org> wrote: > > > Just wanted to take a look at v5. But it's an applefile again :P > > I don’t get it. It was the other times too! Are you able to save it > with a .patch suffix? Saving it is not the problem, but the actual file contents: $ xxd v5-0001-Improve-boolean-predicate-JSON-Path-docs.patch 00000000: 0005 1600 0002 0000 0000 0000 0000 0000 ................ 00000010: 0000 0000 0000 0000 0002 0000 0009 0000 ................ 00000020: 0032 0000 000a 0000 0003 0000 003c 0000 .2...........<.. 00000030: 0036 0000 0000 0000 0000 0000 7635 2d30 .6..........v5-0 00000040: 3030 312d 496d 7072 6f76 652d 626f 6f6c 001-Improve-bool 00000050: 6561 6e2d 7072 6564 6963 6174 652d 4a53 ean-predicate-JS 00000060: 4f4e 2d50 6174 682d 646f 6373 2e70 6174 ON-Path-docs.pat 00000070: 6368 ch I don't even know what that represents, probably not some fancy file compression. -- Erik
On Oct 19, 2023, at 23:49, Erik Wienhold <ewie@ewie.name> wrote:
I don't even know what that represents, probably not some fancy filecompression.
Oh, weird. Trying from a webmail client instead.
Best,
David
Attachment
On 2023-10-20 15:49 +0200, David Wheeler wrote: > On Oct 19, 2023, at 23:49, Erik Wienhold <ewie@ewie.name> wrote: > > > I don't even know what that represents, probably not some fancy file > > compression. That's an AppleSingle file according to [1][2]. It only contains the resource fork and file name but no data fork. > Oh, weird. Trying from a webmail client instead. Thanks. > + Does JSON path return any item for the specified JSON value? Use only > + SQL-standard JSON path expressions, not > + <link linkend="boolean-predicate-path-expressions">predicate check > + expressions.</link> Any reason for calling it "predicate check expressions" (e.g. the link text) and sometimes "predicate path expressions" (e.g. the linked section title)? I think it should be named consistently to avoid confusion and also to simplify searching. > + Returns the result of a JSON path > + <link linkend="boolean-predicate-path-expressions">predicate > + check</link> for the specified JSON value. If the result is not Boolean, > + then <literal>NULL</literal> is returned. Use only with > + <link linkend="boolean-predicate-path-expressions">predicate check > + expressions.</link> Linking the same section twice in the same paragraph seems excessive. > +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput> > +select jsonb_path_query(:'json', '$.track.segments'); Please remove the second SELECT. > +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[0].location');</userinput> > + jsonb_path_query > +------------------- > + [47.763, 13.4034] Strict mode is unnecessary to get that result and I'd omit it because the different modes are not introduced yet at this point. > +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments.size()');</userinput> > + jsonb_path_query > +------------------ > + 2 Strict mode is unnecessary here as well. > + using the lax mode. To avoid surprising results, we recommend using > + the <literal>.**</literal> accessor only in the strict mode. The Please change to "in strict mode" (without "the"). [1] https://www.rfc-editor.org/rfc/rfc1740.txt [2] https://web.archive.org/web/20180311140826/http://kaiser-edv.de/documents/AppleSingle_AppleDouble.pdf -- Erik
On Oct 22, 2023, at 20:36, Erik Wienhold <ewie@ewie.name> wrote: > That's an AppleSingle file according to [1][2]. It only contains the > resource fork and file name but no data fork. Ah, I had “Send large attachments with Mail Drop” enabled. To me 20K is not big but whatever. Let’s see if turning it offfixes the issue. > Any reason for calling it "predicate check expressions" (e.g. the link > text) and sometimes "predicate path expressions" (e.g. the linked > section title)? I think it should be named consistently to avoid > confusion and also to simplify searching. I think "predicate path expressions” is more descriptive, but "predicate check expressions” is what was in the docs before,so let’s stick with that. > Linking the same section twice in the same paragraph seems excessive. Fair. Will link the second one. >> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput> >> +select jsonb_path_query(:'json', '$.track.segments'); > > Please remove the second SELECT. Done. >> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[0].location');</userinput> >> + jsonb_path_query >> +------------------- >> + [47.763, 13.4034] > > Strict mode is unnecessary to get that result and I'd omit it because > the different modes are not introduced yet at this point. Yep, pasto. > Strict mode is unnecessary here as well. Fixed. >> + using the lax mode. To avoid surprising results, we recommend using >> + the <literal>.**</literal> accessor only in the strict mode. The > > Please change to "in strict mode" (without "the"). Hrm, I prefer it without the article, too, but it is consistently used that way elsewhere, like here: https://github.com/postgres/postgres/blob/5b36e8f/doc/src/sgml/func.sgml#L17401 I’d be happy to change them all, but was keeping it consistent for now. Updated patch attached, thank you! David
Attachment
On 2023-10-24 00:58 +0200, David E. Wheeler wrote: > On Oct 22, 2023, at 20:36, Erik Wienhold <ewie@ewie.name> wrote: > > > That's an AppleSingle file according to [1][2]. It only contains the > > resource fork and file name but no data fork. > > Ah, I had “Send large attachments with Mail Drop” enabled. To me 20K > is not big but whatever. Let’s see if turning it off fixes the issue. I suspected it had something to do with iCloud. Glad you solved it! > > Please change to "in strict mode" (without "the"). > > Hrm, I prefer it without the article, too, but it is consistently used > that way elsewhere, like here: > > https://github.com/postgres/postgres/blob/5b36e8f/doc/src/sgml/func.sgml#L17401 > > I’d be happy to change them all, but was keeping it consistent for now. Right. I haven't really noticed that the article case is more common. I thought that you may have missed that one because I saw this change that removes the article: > - In the strict mode, the specified path must exactly match the structure of > + In strict mode, the specified path must exactly match the structure of > Updated patch attached, thank you! LGTM. Would you create a commitfest entry? I'll set the status to RfC. -- Erik
On Oct 23, 2023, at 20:20, Erik Wienhold <ewie@ewie.name> wrote: > I thought that you may have missed that one because I saw this change > that removes the article: > >> - In the strict mode, the specified path must exactly match the structure of >> + In strict mode, the specified path must exactly match the structure of Oh, didn’t realize. Fixed. > LGTM. Would you create a commitfest entry? I'll set the status to RfC. Done. https://commitfest.postgresql.org/45/4624/ Best, David
Attachment
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: tested, passed I took a look for this commit, it looks correct to me
"David E. Wheeler" <david@justatheory.com> writes: > [ v7-0001-Improve-boolean-predicate-JSON-Path-docs.patch ] I started to review this, and got bogged down at @@ -17203,9 +17214,12 @@ array w/o UK? | t <para> For example, suppose you have some JSON data from a GPS tracker that you - would like to parse, such as: + would like to parse, such as this JSON, set up as a + <link linkend="app-psql-meta-command-set"><application>psql</application> + <command>\set</command> variable</link> for use as <literal>:'json'</literal> + in the examples below: <programlisting> -{ + \set json '{ "track": { "segments": [ { I find the textual change rather unwieldy, but the bigger problem is that this example doesn't actually work. If you try to copy-and-paste this into psql, you get "unterminated quoted string", because psql metacommands can't span line boundaries. Perhaps we could leave the existing display alone, and then add To follow the examples below, paste this into psql: <programlisting> \set json '{ "track": { "segments": [ { "location": [ 47.763, 13.4034 ], "start time": "2018-10-14 10:05:14", "HR": 73}, { "location": [ 47.706, 13.2635 ], "start time": "2018-10-14 10:39:21", "HR": 135 } ] }}' </programlisting> This will allow <literal>:'json'</literal> to be expanded into the above JSON value, plus suitable quoting. However, I'm not sure that's a great solution, because it's going to line-wrap on most displays, making copy-and-paste a bit iffy. I experimented with SELECT ' ... multiline json value ... ' AS json \gexec but that didn't seem to work either. Anybody have a better idea? regards, tom lane
I wrote: > I experimented with > SELECT ' > ... multiline json value ... > ' AS json > \gexec > but that didn't seem to work either. Anybody have a better idea? Oh, never mind, \gset is what I was reaching for. We can make it work with that. regards, tom lane
On 2024-01-19 22:15 +0100, Tom Lane wrote: > "David E. Wheeler" <david@justatheory.com> writes: > > [ v7-0001-Improve-boolean-predicate-JSON-Path-docs.patch ] > > + \set json '{ > "track": { > "segments": [ > { > > I find the textual change rather unwieldy, but the bigger problem is > that this example doesn't actually work. If you try to copy-and-paste > this into psql, you get "unterminated quoted string", because psql > metacommands can't span line boundaries. Interesting... copy-pasting the entire \set command works for me with psql 16.1 in gnome-terminal and tmux. Typing it out manually gives me the "unterminated quoted string" error. Maybe has to do with my stty settings. > I experimented with > > SELECT ' > ... multiline json value ... > ' AS json > \gexec > > but that didn't seem to work either. Anybody have a better idea? Fine with me (the \gset variant). -- Erik
On Jan 19, 2024, at 21:46, Erik Wienhold <ewie@ewie.name> wrote: > Interesting... copy-pasting the entire \set command works for me with > psql 16.1 in gnome-terminal and tmux. Typing it out manually gives me > the "unterminated quoted string" error. Maybe has to do with my stty > settings. Yes, same on macOS Terminal.app and 16.1 compiled with readline. I didn’t realize that \set didn’t support newlines, becauseit works fine when you paste something with newlines. Curious. >> I experimented with >> >> SELECT ' >> ... multiline json value ... >> ' AS json >> \gexec >> >> but that didn't seem to work either. Anybody have a better idea? > > Fine with me (the \gset variant). Much cleaner TBH. david=# select '{ "track": { "segments": [ { "location": [ 47.763, 13.4034 ], "start time": "2018-10-14 10:05:14", "HR": 73 }, { "location": [ 47.706, 13.2635 ], "start time": "2018-10-14 10:39:21", "HR": 135 } ] } }'::jsonb as json; json -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location":[47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]}} (1 row) david=# \gset david=# select :'json'::jsonb; jsonb -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {"track": {"segments": [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location":[47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]}} (1 row) So great! While you’re in there, Tom, would it make sense to fold in something like [this patch][1] I posted last month to clarifywhich JSONPath comparison operators can take advantage of a index? --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; </programlisting> For these operators, a GIN index extracts clauses of the form <literal><replaceable>accessors_chain</replaceable> - = <replaceable>constant</replaceable></literal> out of + == <replaceable>constant</replaceable></literal> out of the <type>jsonpath</type> pattern, and does the index search based on the keys and values mentioned in these clauses. The accessors chain may include <literal>.<replaceable>key</replaceable></literal>, @@ -522,6 +522,9 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; The <literal>jsonb_ops</literal> operator class also supports <literal>.*</literal> and <literal>.**</literal> accessors, but the <literal>jsonb_path_ops</literal> operator class does not. + Only the <literal>==</literal> and <literal>!=</literal> <link + linkend="functions-sqljson-path-operators">SQL/JSON Path Operators</link> + can use the index. </para> <para> Best, David [1]: https://www.postgresql.org/message-id/0ECE6B9C-CDDE-4B65-BE5A-49D7372046AF@justatheory.com
"David E. Wheeler" <david@justatheory.com> writes: > While you’re in there, Tom, would it make sense to fold in something like [this patch][1] I posted last month to clarifywhich JSONPath comparison operators can take advantage of a index? > --- a/doc/src/sgml/json.sgml > +++ b/doc/src/sgml/json.sgml > @@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; > </programlisting> > For these operators, a GIN index extracts clauses of the form > <literal><replaceable>accessors_chain</replaceable> > - = <replaceable>constant</replaceable></literal> out of > + == <replaceable>constant</replaceable></literal> out of > the <type>jsonpath</type> pattern, and does the index search based on > the keys and values mentioned in these clauses. The accessors chain > may include <literal>.<replaceable>key</replaceable></literal>, Right, clearly a typo. > @@ -522,6 +522,9 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; > The <literal>jsonb_ops</literal> operator class also > supports <literal>.*</literal> and <literal>.**</literal> accessors, > but the <literal>jsonb_path_ops</literal> operator class does not. > + Only the <literal>==</literal> and <literal>!=</literal> <link > + linkend="functions-sqljson-path-operators">SQL/JSON Path Operators</link> > + can use the index. > </para> You sure about that? It would surprise me if we could effectively use a not-equal condition with an index. If it is only == that works, then the preceding statement seems sufficient. regards, tom lane
So, overall reaction to this patch: I like the approach of defining "predicate check expressions" as being a different thing from standard jsonpath expressions. However, I'm not so thrilled with just saying "don't use" one type or the other with different jsonpath functions. According to my tests, some of these functions seem to give sensible results anyway with the path type you say not to use, while some give less-sensible results, and others give errors. We ought to try to document that, and maybe even clean up the less sane behaviors. (That is, I don't feel that a docs-only patch is necessarily the thing to do here.) As an example, @? seems to behave sanely with a standard jsonpath: regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ < 5)' ; ?column? ---------- t (1 row) regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 5)' ; ?column? ---------- f (1 row) It will take a predicate, but seems to always return true: regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] < 5' ; ?column? ---------- t (1 row) regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 5' ; ?column? ---------- t (1 row) Surely we're not helping anybody by leaving that behavior in place. Making it do something useful, throwing an error, or returning NULL all seem superior to this. I observe that @@ returns NULL for the path type it doesn't like, so maybe that's what to do here. (Unsurprisingly, jsonb_path_exists acts similarly.) BTW, jsonb_path_query_array and jsonb_path_query_first seem to take both types of path, like jsonb_path_query, so ISTM they need docs changes too. regards, tom lane
On Jan 20, 2024, at 12:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Surely we're not helping anybody by leaving that behavior in place. > Making it do something useful, throwing an error, or returning NULL > all seem superior to this. I observe that @@ returns NULL for the > path type it doesn't like, so maybe that's what to do here. I agree it would be far better for the behavior to be consistent, but frankly would like to see them raise an error. Ideallythe hit would suggest the proper alternative operator or function to use, and maybe link to the docs that describethe difference between SQL-standard JSONPath and "predicate check expressions”, and how they have separate operatorsand functions. I think of them as practically different data types (and wish they were, TBH). It makes sense that passing a JSON containmentexpression[1] would raise an error; so should passing the wrong flavor of JSONPath. > BTW, jsonb_path_query_array and jsonb_path_query_first seem to > take both types of path, like jsonb_path_query, so ISTM they need > docs changes too. Happy to update the patch, either to add those docs or, if we change the behavior to return a NULL or raise an error, thenwith that information, instead. Best, David [1]: https://www.postgresql.org/docs/current/datatype-json.html#JSON-CONTAINMENT
On Jan 20, 2024, at 11:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: > You sure about that? It would surprise me if we could effectively use > a not-equal condition with an index. If it is only == that works, > then the preceding statement seems sufficient. I’m not! I just assumed it in the same way creating an SQL = operator automatically respects NOT syntax (or so I recall).In fiddling a bit, I can’t get it to use an index: CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL); \copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"'; create index on movies using gin (movie); analyze movies; david=# explain analyze select id from movies where movie @? '$ ?(@.genre[*] != "Teen")'; QUERY PLAN ----------------------------------------------------------------------------------------------------- Seq Scan on movies (cost=0.00..3741.41 rows=4 width=4) (actual time=19.222..19.223 rows=0 loops=1) Filter: (movie @? '$?(@."genre"[*] != "Teen")'::jsonpath) Rows Removed by Filter: 36273 Planning Time: 1.242 ms Execution Time: 19.247 ms (5 rows) But that might be because the planner knows that the query is going to fetch most records, anyway. If I set most recordsto a single value: david=# update movies set movie = jsonb_set(movie, '{year}', '2020'::jsonb) where id < 3600; UPDATE 3599 david=# analyze movies; ANALYZE david=# explain analyze select id from movies where movie @? '$ ?(@.year != 2020)'; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Seq Scan on movies (cost=0.00..3884.41 rows=32609 width=4) (actual time=0.065..43.730 rows=32399 loops=1) Filter: (movie @? '$?(@."year" != 2020)'::jsonpath) Rows Removed by Filter: 3874 Planning Time: 1.759 ms Execution Time: 45.368 ms (5 rows) Looks like it still doesn’t use the index with !=. Pity. Best, David
On Jan 20, 2024, at 12:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It will take a predicate, but seems to always return true: > > regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] < 5' ; > ?column? > ---------- > t > (1 row) > > regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 5' ; > ?column? > ---------- > t > (1 row) Just for the sake of clarity, this return value is “correct,” because @? and other functions and operators that expect SQLstandard statements evaluate the SET returned by the JSONPath statement, but predicate check expressions don’t returna set, but a always a single scalar value (true, false, or null). From the POV of the code expecting SQL standard JSONPathresults, that’s a set of one. @? sees that the set is not empty so returns true. Best, David
"David E. Wheeler" <david@justatheory.com> writes: > On Jan 20, 2024, at 12:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Surely we're not helping anybody by leaving that behavior in place. >> Making it do something useful, throwing an error, or returning NULL >> all seem superior to this. I observe that @@ returns NULL for the >> path type it doesn't like, so maybe that's what to do here. > I agree it would be far better for the behavior to be consistent, but frankly would like to see them raise an error. Ideallythe hit would suggest the proper alternative operator or function to use, and maybe link to the docs that describethe difference between SQL-standard JSONPath and "predicate check expressions”, and how they have separate operatorsand functions. That ship's probably sailed. However, I spent some time poking into the odd behavior I showed for @?, and it seems to me that it's an oversight in appendBoolResult. That just automatically returns jperOk in the !found short-circuit path for any boolean result, which is not the behavior you'd get if the boolean value were actually returned (cf. jsonb_path_match_internal). I experimented with making it do what seems like the right thing, and found that there is only one regression test case that changes behavior: select jsonb '2' @? '$ == "2"'; ?column? ---------- - t + f (1 row) Now, JSON does not think that numeric 2 equals string "2", so ISTM the expected output here is flat wrong. It's certainly inconsistent with @@: regression=# select jsonb '2' @@ '$ == "2"'; ?column? ---------- (1 row) So I think we should consider a patch like the attached (probably with some more test cases added). I don't really understand this code however, so maybe I missed something. regards, tom lane diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c index ac16f5c85d..63c46cfab5 100644 --- a/src/backend/utils/adt/jsonpath_exec.c +++ b/src/backend/utils/adt/jsonpath_exec.c @@ -2065,7 +2065,14 @@ appendBoolResult(JsonPathExecContext *cxt, JsonPathItem *jsp, JsonbValue jbv; if (!jspGetNext(jsp, &next) && !found) - return jperOk; /* found singleton boolean value */ + { + /* + * We have a predicate check expression, i.e. a path ending in a bare + * boolean operator, and we don't need to return the exact value(s) + * found. Just report success or failure of the boolean. + */ + return (res == jpbTrue) ? jperOk : jperNotFound; + } if (res == jpbUnknown) { diff --git a/src/test/regress/expected/jsonb_jsonpath.out b/src/test/regress/expected/jsonb_jsonpath.out index 6659bc9091..59e1b71051 100644 --- a/src/test/regress/expected/jsonb_jsonpath.out +++ b/src/test/regress/expected/jsonb_jsonpath.out @@ -1157,7 +1157,7 @@ select jsonb_path_query('2', '$ == "2"'); select jsonb '2' @? '$ == "2"'; ?column? ---------- - t + f (1 row) select jsonb '2' @@ '$ > 1';
"David E. Wheeler" <david@justatheory.com> writes: > On Jan 20, 2024, at 12:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It will take a predicate, but seems to always return true: >> >> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] < 5' ; >> ?column? >> ---------- >> t >> (1 row) >> >> regression=# select '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] > 5' ; >> ?column? >> ---------- >> t >> (1 row) > Just for the sake of clarity, this return value is “correct,” because @? and other functions and operators that expectSQL standard statements evaluate the SET returned by the JSONPath statement, but predicate check expressions don’treturn a set, but a always a single scalar value (true, false, or null). From the POV of the code expecting SQL standardJSONPath results, that’s a set of one. @? sees that the set is not empty so returns true. I don't entirely buy this argument --- if that is the interpretation, of what use are predicate check expressions? It seems to me that we have to consider them as being a shorthand notation for filter expressions, or else they simply do not make sense as jsonpath. regards, tom lane
On Jan 21, 2024, at 14:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't entirely buy this argument --- if that is the interpretation, > of what use are predicate check expressions? It seems to me that we > have to consider them as being a shorthand notation for filter > expressions, or else they simply do not make sense as jsonpath. I believe it becomes pretty apparent when using jsonb_path_query(). The filter expression returns a set (using the previous\gset example): david=# select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ > 10)'); jsonb_path_query ------------------ 73 135 (2 rows) The predicate check returns a boolean: david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 10'); jsonb_path_query ------------------ true (1 row) This is the only way the different behaviors make sense to me. @? expects a set, not a boolean, sees there is an item inthe set, so returns true: david=# select jsonb_path_query(:'json', '$.track.segments[*].HR > 1000'); jsonb_path_query ------------------ false (1 row) david=# select :'json'::jsonb @? '$.track.segments[*].HR > 1000'; ?column? ---------- t (1 row) Best, David
On Jan 21, 2024, at 14:52, David E. Wheeler <david@justatheory.com> wrote: > This is the only way the different behaviors make sense to me. @? expects a set, not a boolean, sees there is an item inthe set, so returns true: I make this interpretation based on this bit of the docs: <para> <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path language has the following deviations from the SQL/JSON standard. </para> <sect4 id="boolean-predicate-check-expressions"> <title>Boolean Predicate Check Expressions</title> <para> As an extension to the SQL standard, a <productname>PostgreSQL</productname> path expression can be a Boolean predicate, whereas the SQL standard allows predicates only in filters. Where SQL standard path expressions return the relevant contents of the queried JSON value, predicate check expressions return the three-valued result of the predicate: <literal>true</literal>, <literal>false</literal>, or <literal>unknown</literal>. Compare this filter <type>jsonpath</type> expression: <screen> <prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments ?(@[*].HR > 130)');</userinput> jsonb_path_query --------------------------------------------------------------------------------- {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"} </screen> To a predicate expression, which returns <literal>true</literal> <screen> <prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].HR > 130');</userinput> jsonb_path_query ------------------ true </screen> </para> Best, David
On Jan 21, 2024, at 14:58, David E. Wheeler <david@justatheory.com> wrote: > I make this interpretation based on this bit of the docs: Sorry, that’s from my branch. Here it is in master: <listitem> <para> A path expression can be a Boolean predicate, although the SQL/JSON standard allows predicates only in filters. This is necessary for implementation of the <literal>@@</literal> operator. For example, the following <type>jsonpath</type> expression is valid in <productname>PostgreSQL</productname>: <programlisting> $.track.segments[*].HR < 70 </programlisting> </para> </listitem> In any event, something to do with @@, perhaps to have some compatibility with `jsonb @> jsonb`? I don’t know why @@ wasimportant to have. David
"David E. Wheeler" <david@justatheory.com> writes: > In any event, something to do with @@, perhaps to have some compatibility with `jsonb @> jsonb`? I don’t know why @@ wasimportant to have. Yeah, that's certainly under-explained. But it seems like I'm not getting traction for the idea of changing the behavior, so let's go back to just documenting it. I spent some time going over your text and also cleaning up nearby shaky English, and ended with v8 attached. I'd be content to commit this if it looks good to you. regards, tom lane diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5030a1045f..99616d2298 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15889,6 +15889,9 @@ table2-mapping </para> <para> Does JSON path return any item for the specified JSON value? + (This is useful only with SQL-standard JSON path expressions, not + <link linkend="functions-sqljson-check-expressions">predicate check + expressions</link>, since those always return a value.) </para> <para> <literal>'{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)'</literal> @@ -15903,9 +15906,12 @@ table2-mapping </para> <para> Returns the result of a JSON path predicate check for the - specified JSON value. Only the first item of the result is taken into - account. If the result is not Boolean, then <literal>NULL</literal> - is returned. + specified JSON value. + (This is useful only + with <link linkend="functions-sqljson-check-expressions">predicate + check expressions</link>, not SQL-standard JSON path expressions, + since it will return <literal>NULL</literal> if the path result is + not a single boolean value.) </para> <para> <literal>'{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 2'</literal> @@ -17029,6 +17035,9 @@ ERROR: value too long for type character(2) <para> Checks whether the JSON path returns any item for the specified JSON value. + (This is useful only with SQL-standard JSON path expressions, not + <link linkend="functions-sqljson-check-expressions">predicate check + expressions</link>, since those always return a value.) If the <parameter>vars</parameter> argument is specified, it must be a JSON object, and its fields provide named values to be substituted into the <type>jsonpath</type> expression. @@ -17052,8 +17061,12 @@ ERROR: value too long for type character(2) </para> <para> Returns the result of a JSON path predicate check for the specified - JSON value. Only the first item of the result is taken into account. - If the result is not Boolean, then <literal>NULL</literal> is returned. + JSON value. + (This is useful only + with <link linkend="functions-sqljson-check-expressions">predicate + check expressions</link>, not SQL-standard JSON path expressions, + since it will either fail or return <literal>NULL</literal> if the + path result is not a single boolean value.) The optional <parameter>vars</parameter> and <parameter>silent</parameter> arguments act the same as for <function>jsonb_path_exists</function>. @@ -17075,6 +17088,12 @@ ERROR: value too long for type character(2) <para> Returns all JSON items returned by the JSON path for the specified JSON value. + For SQL-standard JSON path expressions it returns the JSON + values selected from <parameter>target</parameter>. + For <link linkend="functions-sqljson-check-expressions">predicate + check expressions</link> it returns the result of the predicate + check: <literal>true</literal>, <literal>false</literal>, + or <literal>null</literal>. The optional <parameter>vars</parameter> and <parameter>silent</parameter> arguments act the same as for <function>jsonb_path_exists</function>. @@ -17103,9 +17122,8 @@ ERROR: value too long for type character(2) <para> Returns all JSON items returned by the JSON path for the specified JSON value, as a JSON array. - The optional <parameter>vars</parameter> - and <parameter>silent</parameter> arguments act the same as - for <function>jsonb_path_exists</function>. + The parameters are the same as + for <function>jsonb_path_query</function>. </para> <para> <literal>jsonb_path_query_array('{"a":[1,2,3,4,5]}', '$.a[*] ? (@ >= $min && @ <= $max)', '{"min":2,"max":4}')</literal> @@ -17123,11 +17141,10 @@ ERROR: value too long for type character(2) </para> <para> Returns the first JSON item returned by the JSON path for the - specified JSON value. Returns <literal>NULL</literal> if there are no + specified JSON value, or <literal>NULL</literal> if there are no results. - The optional <parameter>vars</parameter> - and <parameter>silent</parameter> arguments act the same as - for <function>jsonb_path_exists</function>. + The parameters are the same as + for <function>jsonb_path_query</function>. </para> <para> <literal>jsonb_path_query_first('{"a":[1,2,3,4,5]}', '$.a[*] ? (@ >= $min && @ <= $max)', '{"min":2,"max":4}')</literal> @@ -17266,9 +17283,9 @@ ERROR: value too long for type character(2) </indexterm> <para> - SQL/JSON path expressions specify the items to be retrieved - from the JSON data, similar to XPath expressions used - for SQL access to XML. In <productname>PostgreSQL</productname>, + SQL/JSON path expressions specify item(s) to be retrieved + from a JSON value, similarly to XPath expressions used + for access to XML content. In <productname>PostgreSQL</productname>, path expressions are implemented as the <type>jsonpath</type> data type and can use any elements described in <xref linkend="datatype-jsonpath"/>. @@ -17279,6 +17296,8 @@ ERROR: value too long for type character(2) pass the provided path expression to the <firstterm>path engine</firstterm> for evaluation. If the expression matches the queried JSON data, the corresponding JSON item, or set of items, is returned. + If there is no match, the result will be <literal>NULL</literal>, + <literal>false</literal>, or an error, depending on the function. Path expressions are written in the SQL/JSON path language and can include arithmetic expressions and functions. </para> @@ -17296,18 +17315,20 @@ ERROR: value too long for type character(2) <para> To refer to the JSON value being queried (the <firstterm>context item</firstterm>), use the <literal>$</literal> variable - in the path expression. It can be followed by one or more + in the path expression. The first element of a path must always + be <literal>$</literal>. It can be followed by one or more <link linkend="type-jsonpath-accessors">accessor operators</link>, which go down the JSON structure level by level to retrieve sub-items - of the context item. Each operator that follows deals with the - result of the previous evaluation step. + of the context item. Each accessor operator acts on the + result(s) of the previous evaluation step, producing zero, one, or more + output items from each input item. </para> <para> For example, suppose you have some JSON data from a GPS tracker that you would like to parse, such as: <programlisting> -{ +SELECT '{ "track": { "segments": [ { @@ -17322,68 +17343,100 @@ ERROR: value too long for type character(2) } ] } -} +}' AS json \gset </programlisting> + (The above example can be copied-and-pasted + into <application>psql</application> to set things up for the following + examples. Then <application>psql</application> will + expand <literal>:'json'</literal> into a suitably-quoted string + constant containing the JSON value.) </para> <para> To retrieve the available track segments, you need to use the <literal>.<replaceable>key</replaceable></literal> accessor - operator to descend through surrounding JSON objects: -<programlisting> -$.track.segments -</programlisting> + operator to descend through surrounding JSON objects, for example: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput> + jsonb_path_query +-----------------------------------------------------------&zwsp;-----------------------------------------------------------&zwsp;--------------------------------------------- + [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635],"start time": "2018-10-14 10:39:21"}] +</screen> </para> <para> To retrieve the contents of an array, you typically use the - <literal>[*]</literal> operator. For example, - the following path will return the location coordinates for all + <literal>[*]</literal> operator. + The following example will return the location coordinates for all the available track segments: -<programlisting> -$.track.segments[*].location -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].location');</userinput> + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] +</screen> + Here we started with the whole JSON input value (<literal>$</literal>), + then the <literal>.track</literal> accessor selected the JSON object + associated with the <literal>"track"</literal> object key, then + the <literal>.segments</literal> accessor selected the JSON array + associated with the <literal>"segments"</literal> key within that + object, then the <literal>[*]</literal> accessor selected each element + of that array (producing a series of items), then + the <literal>.location</literal> accessor selected the JSON array + associated with the <literal>"location"</literal> key within each of + those objects. In this example, each of those objects had + a <literal>"location"</literal> key; but if any of them did not, + the <literal>.location</literal> accessor would have simply produced no + output for that input item. </para> <para> To return the coordinates of the first segment only, you can specify the corresponding subscript in the <literal>[]</literal> accessor operator. Recall that JSON array indexes are 0-relative: -<programlisting> -$.track.segments[0].location -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[0].location');</userinput> + jsonb_path_query +------------------- + [47.763, 13.4034] +</screen> </para> <para> The result of each path evaluation step can be processed - by one or more <type>jsonpath</type> operators and methods + by one or more of the <type>jsonpath</type> operators and methods listed in <xref linkend="functions-sqljson-path-operators"/>. Each method name must be preceded by a dot. For example, you can get the size of an array: -<programlisting> -$.track.segments.size() -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments.size()');</userinput> + jsonb_path_query +------------------ + 2 +</screen> More examples of using <type>jsonpath</type> operators and methods within path expressions appear below in <xref linkend="functions-sqljson-path-operators"/>. </para> <para> - When defining a path, you can also use one or more + A path can also contain <firstterm>filter expressions</firstterm> that work similarly to the <literal>WHERE</literal> clause in SQL. A filter expression begins with a question mark and provides a condition in parentheses: -<programlisting> +<synopsis> ? (<replaceable>condition</replaceable>) -</programlisting> +</synopsis> </para> <para> Filter expressions must be written just after the path evaluation step to which they should apply. The result of that step is filtered to include only those items that satisfy the provided condition. SQL/JSON defines - three-valued logic, so the condition can be <literal>true</literal>, <literal>false</literal>, + three-valued logic, so the condition can + produce <literal>true</literal>, <literal>false</literal>, or <literal>unknown</literal>. The <literal>unknown</literal> value plays the same role as SQL <literal>NULL</literal> and can be tested for with the <literal>is unknown</literal> predicate. Further path @@ -17395,90 +17448,133 @@ $.track.segments.size() The functions and operators that can be used in filter expressions are listed in <xref linkend="functions-sqljson-filter-ex-table"/>. Within a filter expression, the <literal>@</literal> variable denotes the value - being filtered (i.e., one result of the preceding path step). You can + being considered (i.e., one result of the preceding path step). You can write accessor operators after <literal>@</literal> to retrieve component items. </para> <para> For example, suppose you would like to retrieve all heart rate values higher - than 130. You can achieve this using the following expression: -<programlisting> -$.track.segments[*].HR ? (@ > 130) -</programlisting> + than 130. You can achieve this as follows: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ > 130)');</userinput> + jsonb_path_query +------------------ + 135 +</screen> </para> <para> To get the start times of segments with such values, you have to - filter out irrelevant segments before returning the start times, so the + filter out irrelevant segments before selecting the start times, so the filter expression is applied to the previous step, and the path used in the condition is different: -<programlisting> -$.track.segments[*] ? (@.HR > 130)."start time" -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.HR > 130)."start time"');</userinput> + jsonb_path_query +----------------------- + "2018-10-14 10:39:21" +</screen> </para> <para> - You can use several filter expressions in sequence, if required. For - example, the following expression selects start times of all segments that + You can use several filter expressions in sequence, if required. + The following example selects start times of all segments that contain locations with relevant coordinates and high heart rate values: -<programlisting> -$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR> 130)."start time"');</userinput> + jsonb_path_query +----------------------- + "2018-10-14 10:39:21" +</screen> </para> <para> Using filter expressions at different nesting levels is also allowed. The following example first filters all segments by location, and then returns high heart rate values for these segments, if available: -<programlisting> -$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130) -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] < 13.4).HR ?(@ > 130)');</userinput> + jsonb_path_query +------------------ + 135 +</screen> </para> <para> - You can also nest filter expressions within each other: -<programlisting> -$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size() -</programlisting> - This expression returns the size of the track if it contains any - segments with high heart rate values, or an empty sequence otherwise. + You can also nest filter expressions within each other. + This example returns the size of the track if it contains any + segments with high heart rate values, or an empty sequence otherwise: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()');</userinput> + jsonb_path_query +------------------ + 2 +</screen> </para> - <para> - <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path - language has the following deviations from the SQL/JSON standard: - </para> + <sect3 id="functions-sqljson-deviations"> + <title>Deviations from the SQL Standard</title> + <para> + <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path + language has the following deviations from the SQL/JSON standard. + </para> - <itemizedlist> - <listitem> + <sect4 id="functions-sqljson-check-expressions"> + <title>Boolean Predicate Check Expressions</title> <para> - A path expression can be a Boolean predicate, although the SQL/JSON - standard allows predicates only in filters. This is necessary for - implementation of the <literal>@@</literal> operator. For example, - the following <type>jsonpath</type> expression is valid in - <productname>PostgreSQL</productname>: -<programlisting> -$.track.segments[*].HR < 70 -</programlisting> - </para> - </listitem> + As an extension to the SQL standard, + a <productname>PostgreSQL</productname> path expression can be a + Boolean predicate, whereas the SQL standard allows predicates only within + filters. While SQL-standard path expressions return the relevant + element(s) of the queried JSON value, predicate check expressions + return the single three-valued result of the + predicate: <literal>true</literal>, + <literal>false</literal>, or <literal>unknown</literal>. + For example, we could write this SQL-standard filter expression: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments ?(@[*].HR > 130)');</userinput> + jsonb_path_query +-----------------------------------------------------------&zwsp;---------------------- + {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"} +</screen> + The similar predicate check expression simply + returns <literal>true</literal>, indicating that a match exists: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].HR > 130');</userinput> + jsonb_path_query +------------------ + true +</screen> + </para> - <listitem> - <para> - There are minor differences in the interpretation of regular - expression patterns used in <literal>like_regex</literal> filters, as - described in <xref linkend="jsonpath-regular-expressions"/>. - </para> - </listitem> - </itemizedlist> + <note> + <para> + Predicate check expressions are required in the + <literal>@@</literal> operator (and the + <function>jsonb_path_match</function> function), and should not be used + with the <literal>@?</literal> operator (or the + <function>jsonb_path_exists</function> function). + </para> + </note> + </sect4> - <sect3 id="strict-and-lax-modes"> + <sect4 id="functions-sqljson-regular-expression-deviation"> + <title>Regular Expression Interpretation</title> + <para> + There are minor differences in the interpretation of regular + expression patterns used in <literal>like_regex</literal> filters, as + described in <xref linkend="jsonpath-regular-expressions"/>. + </para> + </sect4> + </sect3> + + <sect3 id="functions-sqljson-strict-and-lax-modes"> <title>Strict and Lax Modes</title> <para> When you query JSON data, the path expression may not match the actual JSON data structure. An attempt to access a non-existent - member of an object or element of an array results in a + member of an object or element of an array is defined as a structural error. SQL/JSON path expressions have two modes of handling structural errors: </para> @@ -17488,8 +17584,8 @@ $.track.segments[*].HR < 70 <para> lax (default) — the path engine implicitly adapts the queried data to the specified path. - Any remaining structural errors are suppressed and converted - to empty SQL/JSON sequences. + Any structural errors that cannot be fixed as described below + are suppressed, producing no match. </para> </listitem> <listitem> @@ -17500,15 +17596,15 @@ $.track.segments[*].HR < 70 </itemizedlist> <para> - The lax mode facilitates matching of a JSON document structure and path - expression if the JSON data does not conform to the expected schema. + The lax mode facilitates matching of a JSON document and path + expression when the JSON data does not conform to the expected schema. If an operand does not match the requirements of a particular operation, - it can be automatically wrapped as an SQL/JSON array or unwrapped by + it can be automatically wrapped as an SQL/JSON array, or unwrapped by converting its elements into an SQL/JSON sequence before performing - this operation. Besides, comparison operators automatically unwrap their + the operation. Also, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box. An array of size 1 is considered equal to its sole element. - Automatic unwrapping is not performed only when: + Automatic unwrapping is not performed when: <itemizedlist> <listitem> <para> @@ -17532,40 +17628,95 @@ $.track.segments[*].HR < 70 For example, when querying the GPS data listed above, you can abstract from the fact that it stores an array of segments when using the lax mode: -<programlisting> -lax $.track.segments.location -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'lax $.track.segments.location');</userinput> + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] +</screen> </para> <para> In the strict mode, the specified path must exactly match the structure of - the queried JSON document to return an SQL/JSON item, so using this - path expression will cause an error. To get the same result as in - the lax mode, you have to explicitly unwrap the + the queried JSON document, so using this path + expression will cause an error: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments.location');</userinput> +ERROR: jsonpath member accessor can only be applied to an object +</screen> + To get the same result as in the lax mode, you have to explicitly unwrap the <literal>segments</literal> array: -<programlisting> -strict $.track.segments[*].location -</programlisting> +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[*].location');</userinput> + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] +</screen> </para> <para> - The <literal>.**</literal> accessor can lead to surprising results - when using the lax mode. For instance, the following query selects every - <literal>HR</literal> value twice: -<programlisting> -lax $.**.HR -</programlisting> + The unwrapping behavior of lax mode can lead to surprising results. For + instance, the following query using the <literal>.**</literal> accessor + selects every <literal>HR</literal> value twice: + <screen> + <prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'lax $.**.HR');</userinput> + jsonb_path_query + ------------------ + 73 + 135 + 73 + 135 + </screen> This happens because the <literal>.**</literal> accessor selects both the <literal>segments</literal> array and each of its elements, while the <literal>.HR</literal> accessor automatically unwraps arrays when using the lax mode. To avoid surprising results, we recommend using the <literal>.**</literal> accessor only in the strict mode. The following query selects each <literal>HR</literal> value just once: -<programlisting> -strict $.**.HR -</programlisting> + <screen> + <prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.**.HR');</userinput> + jsonb_path_query + ------------------ + 73 + 135 + </screen> </para> + <para> + The unwrapping of arrays can also lead to unexpected results. Consider this + example, which selects all the <literal>location</literal> arrays: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'lax $.track.segments[*].location');</userinput> + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] +(2 rows) +</screen> + As expected it returns the full arrays. But applying a filter expression + causes the arrays to be unwrapped to evaluate each item, returning only the + items that match the expression: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'lax $.track.segments[*].location ?(@[*] > 15)');</userinput> + jsonb_path_query +------------------ + 47.763 + 47.706 +(2 rows) +</screen> + This despite the fact that the full arrays are selected by the path + expression. Use strict mode to restore selecting the arrays: +<screen> +<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[*].location ?(@[*] > 15)');</userinput> + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] +(2 rows) +</screen> + </para> </sect3> <sect3 id="functions-sqljson-path-operators"> @@ -18141,7 +18292,7 @@ strict $.**.HR Tests whether a path expression matches at least one SQL/JSON item. Returns <literal>unknown</literal> if the path expression would result in an error; the second example uses this to avoid a no-such-key error - in strict mode. + in the strict mode. </para> <para> <literal>jsonb_path_query('{"x": [1, 2], "y": [2, 4]}', 'strict $.* ? (exists (@ ? (@[*] > 2)))')</literal> diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index b6c2ddbf55..1dbb9606e9 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -513,7 +513,7 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; </programlisting> For these operators, a GIN index extracts clauses of the form <literal><replaceable>accessors_chain</replaceable> - = <replaceable>constant</replaceable></literal> out of + == <replaceable>constant</replaceable></literal> out of the <type>jsonpath</type> pattern, and does the index search based on the keys and values mentioned in these clauses. The accessors chain may include <literal>.<replaceable>key</replaceable></literal>,
On Jan 24, 2024, at 16:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "David E. Wheeler" <david@justatheory.com> writes: > >> In any event, something to do with @@, perhaps to have some compatibility with `jsonb @> jsonb`? I don’t know why @@ wasimportant to have. > > Yeah, that's certainly under-explained. But it seems like I'm not > getting traction for the idea of changing the behavior, so let's > go back to just documenting it. Curious about those discussions. On the one hand I find the distinction between the two behaviors to be odd, and to produceunexpected results when they’re not used in the proper context. It’s reminds me of the Perl idea of context, where functions behave differently in scalar and list context, and if you expectlist behavior on scalar context you’re gonna get a surprise. This is a bit of a challenge for those new to the language,as they’re not necessarily aware of the context. > I spent some time going over your > text and also cleaning up nearby shaky English, and ended with v8 > attached. I'd be content to commit this if it looks good to you. This looks very nice, thank you. A couple of comments. > + <para> > + Predicate check expressions are required in the > + <literal>@@</literal> operator (and the > + <function>jsonb_path_match</function> function), and should not be used > + with the <literal>@?</literal> operator (or the > + <function>jsonb_path_exists</function> function). > + </para> > + </note> > + </sect4> I had this bit here: <para> Conversely, non-predicate <type>jsonpath</type> expressions should not be used with the <literal>@@</literal> operator (or the <function>jsonb_path_match</function> function). </para> I think it’s important to let people know what the difference is in the behavior of the two forms, in every spot it’s likelyto come up. SQL-standard JSON Path expressions should never be used in contexts (functions, operators) only designedto work with predicate check expressions, and the docs should say so IMO. > <para> > - The lax mode facilitates matching of a JSON document structure and path > - expression if the JSON data does not conform to the expected schema. > + The lax mode facilitates matching of a JSON document and path > + expression when the JSON data does not conform to the expected schema. What do you think of also dropping the article from all the references to “the strict mode” or “the lax mode”, to make them“strict mode” and “lax mode”, respectively? Thanks for the review! Best, David
"David E. Wheeler" <david@justatheory.com> writes: > On Jan 24, 2024, at 16:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> + <para> >> + Predicate check expressions are required in the >> + <literal>@@</literal> operator (and the >> + <function>jsonb_path_match</function> function), and should not be used >> + with the <literal>@?</literal> operator (or the >> + <function>jsonb_path_exists</function> function). >> + </para> >> + </note> >> + </sect4> > I had this bit here: > <para> > Conversely, non-predicate <type>jsonpath</type> expressions should not be > used with the <literal>@@</literal> operator (or the > <function>jsonb_path_match</function> function). > </para> I changed the preceding para to say "... check expressions are required in ...", which I thought was sufficient to cover that. Also, the tabular description of the operator tells you not to do it. > What do you think of also dropping the article from all the references to “the strict mode” or “the lax mode”, to makethem “strict mode” and “lax mode”, respectively? Certainly most of 'em don't need it. I'll make it so. regards, tom lane
On Jan 25, 2024, at 11:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I changed the preceding para to say "... check expressions are > required in ...", which I thought was sufficient to cover that. > Also, the tabular description of the operator tells you not to do it. Yeah, that’s good. I was perhaps leaning into being over-explicit after it took me a while to even figure out that therewas a difference, let alone where matters. >> What do you think of also dropping the article from all the references to “the strict mode” or “the lax mode”, to makethem “strict mode” and “lax mode”, respectively? > > Certainly most of 'em don't need it. I'll make it so. Nice, thanks! David