Re: Patch: Improve Boolean Predicate JSON Path Docs - Mailing list pgsql-hackers
From | Erik Wienhold |
---|---|
Subject | Re: Patch: Improve Boolean Predicate JSON Path Docs |
Date | |
Msg-id | qpk36h5raavf66qpkthnynb3hhg3v7kj5xiybj3pe6gvraxqms@6fmeqdglg62q Whole thread Raw |
In response to | Patch: Improve Boolean Predicate JSON Path Docs ("David E. Wheeler" <david@justatheory.com>) |
Responses |
Re: Patch: Improve Boolean Predicate JSON Path Docs
|
List | pgsql-hackers |
On 2023-10-14 22:40 +0200, David E. Wheeler write: > Following up from a suggestion from Tom Lane[1] to improve the > documentation of boolean predicate JSON path expressions, please find > enclosed a draft patch to do so. Thanks for putting this together. See my review at the end. > It does three things: > > 1. Converts all of the example path queries to use jsonb_path_query() > and show the results, to make it clearer what the behaviors are. Nice. This really does help to make some sense of it. I checked all queries and they do work out except for two queries where the path expression string is not properly quoted (but the intended output is still correct). > 2. Replaces the list of deviations from the standards with a new > subsection, with each deviation in its own sub-subsection. The regex > section is unchanged, but I’ve greatly expanded the boolean expression > JSON path section with examples comparing standard filter expressions > and nonstandard boolean predicates. I’ve also added an exhortation not > use boolean expressions with @? or standard path expressions with @@. LGTM. > 3. While converting the modes section to use jsonb_path_query() and > show the results, I also added an example of strict mode returning an > error. > > Follow-ups I’d like to make: > > 1. Expand the modes section to show how the types of results can vary > depending on the mode, thanks to the flattening. Examples: > > david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); > jsonb_path_query > ------------------ > 3 > 4 > 5 > (3 rows) > > david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)'); > jsonb_path_query > ------------------ > [1, 2, 3, 4, 5] > > 2. Improve the descriptions and examples for @?/jsonb_path_exists() > and @@/jsonb_path_match(). +1 > [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us My review: > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml > index affd1254bb..295f8ca5c9 100644 > --- a/doc/src/sgml/func.sgml > +++ b/doc/src/sgml/func.sgml > @@ -17205,7 +17205,7 @@ array w/o UK? | t > For example, suppose you have some JSON data from a GPS tracker that you > would like to parse, such as: > <programlisting> > -{ > + \set json '{ Perhaps make it explicit that the reader must run this in psql in order to use \set and :'json' in the ensuing samples? Some of the existing examples already use psql output but they do not rely on any psql features. > "track": { > "segments": [ > { > @@ -17220,7 +17220,7 @@ array w/o UK? | t > } > ] > } > -} > +}' > </programlisting> > </para> > > @@ -17229,7 +17229,10 @@ array w/o UK? | t > <literal>.<replaceable>key</replaceable></literal> accessor > operator to descend through surrounding JSON objects: > <programlisting> > -$.track.segments > +select jsonb_path_query(:'json'::jsonb, '$.track.segments'); > + jsonb_path_query > +------------------------------------------------------------------------------------------------------------------------------------------------------------------- > + [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635],"start time": "2018-10-14 10:39:21"}] > </programlisting> This should use <screen>, <userinput>, and <computeroutput> if it shows a psql session, e.g.: <screen> <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput> <computeroutput> jsonb_path_query ------------------------------------------------------------------------------------------------------------------------------------------------------------------- [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635],"start time": "2018-10-14 10:39:21"}] </computeroutput> </screen> Also the cast to jsonb is not necessary and only adds clutter IMO. > </para> > > @@ -17239,7 +17242,11 @@ $.track.segments > the following path will return the location coordinates for all > the available track segments: > <programlisting> > -$.track.segments[*].location > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].location'); > + jsonb_path_query > +------------------- > + [47.763, 13.4034] > + [47.706, 13.2635] > </programlisting> > </para> > > @@ -17248,7 +17255,10 @@ $.track.segments[*].location > specify the corresponding subscript in the <literal>[]</literal> > accessor operator. Recall that JSON array indexes are 0-relative: > <programlisting> > -$.track.segments[0].location > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[0].location'); > + jsonb_path_query > +------------------- > + [47.763, 13.4034] > </programlisting> > </para> > > @@ -17259,7 +17269,10 @@ $.track.segments[0].location > Each method name must be preceded by a dot. For example, > you can get the size of an array: > <programlisting> > -$.track.segments.size() > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.size()'); > + jsonb_path_query > +------------------ > + 2 > </programlisting> > More examples of using <type>jsonpath</type> operators > and methods within path expressions appear below in > @@ -17302,7 +17315,10 @@ $.track.segments.size() > For example, suppose you would like to retrieve all heart rate values higher > than 130. You can achieve this using the following expression: > <programlisting> > -$.track.segments[*].HR ? (@ > 130) > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR ? (@ > 130)'); > + jsonb_path_query > +------------------ > + 135 > </programlisting> > </para> > > @@ -17312,7 +17328,10 @@ $.track.segments[*].HR ? (@ > 130) > filter expression is applied to the previous step, and the path used > in the condition is different: > <programlisting> > -$.track.segments[*] ? (@.HR > 130)."start time" > + select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.HR > 130)."start time"'); > + jsonb_path_query > +----------------------- > + "2018-10-14 10:39:21" > </programlisting> > </para> > > @@ -17321,7 +17340,10 @@ $.track.segments[*] ? (@.HR > 130)."start time" > example, the following expression selects start times of all segments that > contain locations with relevant coordinates and high heart rate values: > <programlisting> > -$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"'); > + jsonb_path_query > +----------------------- > + "2018-10-14 10:39:21" > </programlisting> > </para> > > @@ -17330,46 +17352,81 @@ $.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" > The following example first filters all segments by location, and then > returns high heart rate values for these segments, if available: > <programlisting> > -$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130) > +select jsonb_path_query(:'json'::jsonb, $.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)'); The opening quote is missing from the jsonpath literal. > + jsonb_path_query > +------------------ > + 135 > </programlisting> > </para> > > <para> > You can also nest filter expressions within each other: > <programlisting> > -$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size() > +select jsonb_path_query(:'json'::jsonb, $.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'); Missing opening quote here as well. > + jsonb_path_query > +------------------ > + 2 > </programlisting> > This expression returns the size of the track if it contains any > segments with high heart rate values, or an empty sequence otherwise. > </para> > > - <para> > - <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path > - language has the following deviations from the SQL/JSON standard: > - </para> > + <sect3 id="devations-from-the-standard"> > + <title>Devaiations from the SQL Standard</title> Typo in "deviations" (section ID and title). > + <para> > + <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path > + language has the following deviations from the SQL/JSON standard: The sentence should and in a period when this para is no longer followed by an item list. > + </para> > > - <itemizedlist> > - <listitem> > + <sect4 id="boolean-predicate-path-expressions"> > + <title>Boolean Predicate Path Expressions</title> > <para> > - A path expression can be a Boolean predicate, although the SQL/JSON > - standard allows predicates only in filters. This is necessary for > - implementation of the <literal>@@</literal> operator. For example, > - the following <type>jsonpath</type> expression is valid in > - <productname>PostgreSQL</productname>: > + As an extension to the SQL standard, a <productname>PostgreSQL</productname> > + path expression can be a Boolean predicate, whereas the SQL standard allows > + predicates only in filters. Where SQL standard path expressions return the > + relevant contents of the queried JSON value, predicate path expressions > + return the three-value three-valued result of the predicate: Redundant "three-value" before "three-valued result". > + <literal>true</literal>, <literal>false</literal>, or > + <literal>unknown</literal>. Compare this filter <type>jsonpath</type> > + exression: > <programlisting> > -$.track.segments[*].HR < 70 > +select jsonb_path_query(:'json'::jsonb, '$.track.segments ?(@[*].HR > 130)'); > + jsonb_path_query > +--------------------------------------------------------------------------------- > + {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"} > </programlisting> > - </para> > - </listitem> > + To a predicate expression, which returns <literal>true</literal> > +<programlisting> > +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR > 130'); > + jsonb_path_query > +------------------ > + true > +</programlisting> > + </para> > > - <listitem> > - <para> > - There are minor differences in the interpretation of regular > - expression patterns used in <literal>like_regex</literal> filters, as > - described in <xref linkend="jsonpath-regular-expressions"/>. > - </para> > - </listitem> > - </itemizedlist> > + <para> > + Predicate-only path expressions are necessary for implementation of the > + <literal>@@</literal> operator (and the > + <function>jsonb_path_match</function> function), and should not be used > + with the <literal>@?</literal> operator (or > + <function>jsonb_path_exists</function> function). > + </para> > + > + <para> > + Conversely, non-predicate <type>jsonpath</type> expressions should not be > + used with the <literal>@@</literal> operator (or the > + <function>jsonb_path_match</function> function). > + </para> > + </sect4> Both paras should be wrapped in a single <note> so that they stand out from the rest of the text. Maybe even <warning>, but <note> is already used on this page for things that I'd consider warnings. > + <sect4 id="jsonpath-regular-expression-deviation"> > + <title>Regular Expression Interpretation</title> > + <para> > + There are minor differences in the interpretation of regular > + expression patterns used in <literal>like_regex</literal> filters, as > + described in <xref linkend="jsonpath-regular-expressions"/>. > + </para> > + </sect4> <sect3 id="devations-from-the-standard"> should be closed here, otherwise the docs won't build. This can be checked with `make -C doc/src/sgml check`. > > <sect3 id="strict-and-lax-modes"> > <title>Strict and Lax Modes</title> > @@ -17431,18 +17488,30 @@ $.track.segments[*].HR < 70 > abstract from the fact that it stores an array of segments > when using the lax mode: > <programlisting> > -lax $.track.segments.location > + select jsonb_path_query(:'json'::jsonb, 'lax $.track.segments.location'); > + jsonb_path_query `git diff --check` shows a couple of lines with trailing whitespace (mostly psql output). > +------------------- > + [47.763, 13.4034] > + [47.706, 13.2635] > </programlisting> > </para> > > <para> > - In the strict mode, the specified path must exactly match the structure of > + In strict mode, the specified path must exactly match the structure of > the queried JSON document to return an SQL/JSON item, so using this > - path expression will cause an error. To get the same result as in > - the lax mode, you have to explicitly unwrap the > + path expression will cause an error: > +<programlisting> > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.location'); > +ERROR: jsonpath member accessor can only be applied to an object > +</programlisting> > + To get the same result as in the lax mode, you have to explicitly unwrap the > <literal>segments</literal> array: > <programlisting> > -strict $.track.segments[*].location > +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[*].location'); > + jsonb_path_query > +------------------- > + [47.763, 13.4034] > + [47.706, 13.2635] > </programlisting> > </para> > > @@ -17451,7 +17520,13 @@ strict $.track.segments[*].location > when using the lax mode. For instance, the following query selects every > <literal>HR</literal> value twice: > <programlisting> > -lax $.**.HR > +select jsonb_path_query(:'json'::jsonb, 'lax $.**.HR'); > + jsonb_path_query > +------------------ > + 73 > + 135 > + 73 > + 135 > </programlisting> > This happens because the <literal>.**</literal> accessor selects both > the <literal>segments</literal> array and each of its elements, while > @@ -17460,7 +17535,11 @@ lax $.**.HR > the <literal>.**</literal> accessor only in the strict mode. The > following query selects each <literal>HR</literal> value just once: > <programlisting> > -strict $.**.HR > +select jsonb_path_query(:'json'::jsonb, 'strict $.**.HR'); > + jsonb_path_query > +------------------ > + 73 > + 135 > </programlisting> > </para> > -- Erik
pgsql-hackers by date: