Re: Patch: Improve Boolean Predicate JSON Path Docs - Mailing list pgsql-hackers

From Erik Wienhold
Subject Re: Patch: Improve Boolean Predicate JSON Path Docs
Date
Msg-id qpk36h5raavf66qpkthnynb3hhg3v7kj5xiybj3pe6gvraxqms@6fmeqdglg62q
Whole thread Raw
In response to Patch: Improve Boolean Predicate JSON Path Docs  ("David E. Wheeler" <david@justatheory.com>)
Responses Re: Patch: Improve Boolean Predicate JSON Path Docs
List pgsql-hackers
On 2023-10-14 22:40 +0200, David E. Wheeler write:
> Following up from a suggestion from Tom Lane[1] to improve the
> documentation of boolean predicate JSON path expressions, please find
> enclosed a draft patch to do so.

Thanks for putting this together.  See my review at the end.

> It does three things:
> 
> 1. Converts all of the example path queries to use jsonb_path_query()
> and show the results, to make it clearer what the behaviors are.

Nice.  This really does help to make some sense of it.  I checked all
queries and they do work out except for two queries where the path
expression string is not properly quoted (but the intended output is
still correct).

> 2. Replaces the list of deviations from the standards with a new
> subsection, with each deviation in its own sub-subsection. The regex
> section is unchanged, but I’ve greatly expanded the boolean expression
> JSON path section with examples comparing standard filter expressions
> and nonstandard boolean predicates. I’ve also added an exhortation not
> use boolean expressions with @? or standard path expressions with @@.

LGTM.

> 3. While converting the modes section to use jsonb_path_query() and
> show the results, I also added an example of strict mode returning an
> error.
> 
> Follow-ups I’d like to make:
> 
> 1. Expand the modes section to show how the types of results can vary
> depending on the mode, thanks to the flattening. Examples:
> 
> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
> jsonb_path_query 
> ------------------
> 3
> 4
> 5
> (3 rows)
> 
> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}',  'strict $.a ?(@[*] > 2)');
> jsonb_path_query  
> ------------------
> [1, 2, 3, 4, 5]
> 
> 2. Improve the descriptions and examples for @?/jsonb_path_exists()
> and @@/jsonb_path_match().

+1

> [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us

My review:

> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index affd1254bb..295f8ca5c9 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -17205,7 +17205,7 @@ array w/o UK? | t
>     For example, suppose you have some JSON data from a GPS tracker that you
>     would like to parse, such as:
>  <programlisting>
> -{
> + \set json '{

Perhaps make it explicit that the reader must run this in psql in order
to use \set and :'json' in the ensuing samples?  Some of the existing
examples already use psql output but they do not rely on any psql
features.

>    "track": {
>      "segments": [
>        {
> @@ -17220,7 +17220,7 @@ array w/o UK? | t
>        }
>      ]
>    }
> -}
> +}'
>  </programlisting>
>    </para>
>  
> @@ -17229,7 +17229,10 @@ array w/o UK? | t
>     <literal>.<replaceable>key</replaceable></literal> accessor
>     operator to descend through surrounding JSON objects:
>  <programlisting>
> -$.track.segments
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments');
> +                                                                         jsonb_path_query
>
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
> + [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706,
13.2635],"start time": "2018-10-14 10:39:21"}]
 
>  </programlisting>

This should use <screen>, <userinput>, and <computeroutput> if it shows
a psql session, e.g.:

    <screen>
    <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput>
    <computeroutput>
                                                                             jsonb_path_query

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
     [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706,
13.2635],"start time": "2018-10-14 10:39:21"}]
 
    </computeroutput>
    </screen>

Also the cast to jsonb is not necessary and only adds clutter IMO.

>    </para>
>  
> @@ -17239,7 +17242,11 @@ $.track.segments
>     the following path will return the location coordinates for all
>     the available track segments:
>  <programlisting>
> -$.track.segments[*].location
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].location');
> + jsonb_path_query
> +-------------------
> + [47.763, 13.4034]
> + [47.706, 13.2635]
>  </programlisting>
>    </para>
>  
> @@ -17248,7 +17255,10 @@ $.track.segments[*].location
>     specify the corresponding subscript in the <literal>[]</literal>
>     accessor operator. Recall that JSON array indexes are 0-relative:
>  <programlisting>
> -$.track.segments[0].location
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[0].location');
> + jsonb_path_query
> +-------------------
> + [47.763, 13.4034]
>  </programlisting>
>    </para>
>  
> @@ -17259,7 +17269,10 @@ $.track.segments[0].location
>     Each method name must be preceded by a dot. For example,
>     you can get the size of an array:
>  <programlisting>
> -$.track.segments.size()
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.size()');
> + jsonb_path_query
> +------------------
> + 2
>  </programlisting>
>     More examples of using <type>jsonpath</type> operators
>     and methods within path expressions appear below in
> @@ -17302,7 +17315,10 @@ $.track.segments.size()
>     For example, suppose you would like to retrieve all heart rate values higher
>     than 130. You can achieve this using the following expression:
>  <programlisting>
> -$.track.segments[*].HR ? (@ > 130)
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR ? (@ > 130)');
> + jsonb_path_query
> +------------------
> + 135
>  </programlisting>
>    </para>
>  
> @@ -17312,7 +17328,10 @@ $.track.segments[*].HR ? (@ > 130)
>     filter expression is applied to the previous step, and the path used
>     in the condition is different:
>  <programlisting>
> -$.track.segments[*] ? (@.HR > 130)."start time"
> + select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.HR > 130)."start time"');
> +   jsonb_path_query
> +-----------------------
> + "2018-10-14 10:39:21"
>  </programlisting>
>    </para>
>  
> @@ -17321,7 +17340,10 @@ $.track.segments[*] ? (@.HR > 130)."start time"
>     example, the following expression selects start times of all segments that
>     contain locations with relevant coordinates and high heart rate values:
>  <programlisting>
> -$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start
time"');
> +   jsonb_path_query
> +-----------------------
> + "2018-10-14 10:39:21"
>  </programlisting>
>    </para>
>  
> @@ -17330,46 +17352,81 @@ $.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"
>     The following example first filters all segments by location, and then
>     returns high heart rate values for these segments, if available:
>  <programlisting>
> -$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)
> +select jsonb_path_query(:'json'::jsonb, $.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)');

The opening quote is missing from the jsonpath literal.

> + jsonb_path_query
> +------------------
> + 135
>  </programlisting>
>    </para>
>  
>    <para>
>     You can also nest filter expressions within each other:
>  <programlisting>
> -$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()
> +select jsonb_path_query(:'json'::jsonb, $.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()');

Missing opening quote here as well.

> + jsonb_path_query
> +------------------
> + 2
>  </programlisting>
>     This expression returns the size of the track if it contains any
>     segments with high heart rate values, or an empty sequence otherwise.
>    </para>
>  
> -  <para>
> -   <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
> -   language has the following deviations from the SQL/JSON standard:
> -  </para>
> +  <sect3 id="devations-from-the-standard">
> +  <title>Devaiations from the SQL Standard</title>

Typo in "deviations" (section ID and title).

> +   <para>
> +    <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
> +    language has the following deviations from the SQL/JSON standard:

The sentence should and in a period when this para is no longer followed
by an item list.

> +   </para>
>  
> -  <itemizedlist>
> -   <listitem>
> +   <sect4 id="boolean-predicate-path-expressions">
> +   <title>Boolean Predicate Path Expressions</title>
>      <para>
> -     A path expression can be a Boolean predicate, although the SQL/JSON
> -     standard allows predicates only in filters.  This is necessary for
> -     implementation of the <literal>@@</literal> operator. For example,
> -     the following <type>jsonpath</type> expression is valid in
> -     <productname>PostgreSQL</productname>:
> +     As an extension to the SQL standard, a <productname>PostgreSQL</productname>
> +     path expression can be a Boolean predicate, whereas the SQL standard allows
> +     predicates only in filters. Where SQL standard path expressions return the
> +     relevant contents of the queried JSON value, predicate path expressions
> +     return the three-value three-valued result of the predicate:

Redundant "three-value" before "three-valued result".

> +     <literal>true</literal>, <literal>false</literal>, or
> +     <literal>unknown</literal>. Compare this filter <type>jsonpath</type>
> +     exression:
>  <programlisting>
> -$.track.segments[*].HR < 70
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments ?(@[*].HR > 130)');
> +                                jsonb_path_query
> +---------------------------------------------------------------------------------
> + {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}
>  </programlisting>
> -    </para>
> -   </listitem>
> +     To a predicate expression, which returns <literal>true</literal>
> +<programlisting>
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR > 130');
> + jsonb_path_query
> +------------------
> + true
> +</programlisting>
> +     </para>
>  
> -   <listitem>
> -    <para>
> -     There are minor differences in the interpretation of regular
> -     expression patterns used in <literal>like_regex</literal> filters, as
> -     described in <xref linkend="jsonpath-regular-expressions"/>.
> -    </para>
> -   </listitem>
> -  </itemizedlist>
> +     <para>
> +      Predicate-only path expressions are necessary for implementation of the
> +      <literal>@@</literal> operator (and the
> +      <function>jsonb_path_match</function> function), and should not be used
> +      with the <literal>@?</literal> operator (or
> +      <function>jsonb_path_exists</function> function).
> +     </para>
> +
> +     <para>
> +      Conversely, non-predicate <type>jsonpath</type> expressions should not be
> +      used with the <literal>@@</literal> operator (or the
> +      <function>jsonb_path_match</function> function).
> +     </para>
> +    </sect4>

Both paras should be wrapped in a single <note> so that they stand out
from the rest of the text.  Maybe even <warning>, but <note> is already
used on this page for things that I'd consider warnings.

> +    <sect4 id="jsonpath-regular-expression-deviation">
> +    <title>Regular Expression Interpretation</title>
> +     <para>
> +      There are minor differences in the interpretation of regular
> +      expression patterns used in <literal>like_regex</literal> filters, as
> +      described in <xref linkend="jsonpath-regular-expressions"/>.
> +     </para>
> +    </sect4>

<sect3 id="devations-from-the-standard"> should be closed here,
otherwise the docs won't build.  This can be checked with
`make -C doc/src/sgml check`.

>  
>     <sect3 id="strict-and-lax-modes">
>     <title>Strict and Lax Modes</title>
> @@ -17431,18 +17488,30 @@ $.track.segments[*].HR < 70
>      abstract from the fact that it stores an array of segments
>      when using the lax mode:
>  <programlisting>
> -lax $.track.segments.location
> + select jsonb_path_query(:'json'::jsonb, 'lax $.track.segments.location');
> + jsonb_path_query  

`git diff --check` shows a couple of lines with trailing whitespace
(mostly psql output).

> +-------------------
> + [47.763, 13.4034]
> + [47.706, 13.2635]
>  </programlisting>
>     </para>
>  
>     <para>
> -    In the strict mode, the specified path must exactly match the structure of
> +    In strict mode, the specified path must exactly match the structure of
>      the queried JSON document to return an SQL/JSON item, so using this
> -    path expression will cause an error. To get the same result as in
> -    the lax mode, you have to explicitly unwrap the
> +    path expression will cause an error:
> +<programlisting>
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.location');
> +ERROR:  jsonpath member accessor can only be applied to an object
> +</programlisting>    
> +    To get the same result as in the lax mode, you have to explicitly unwrap the
>      <literal>segments</literal> array:
>  <programlisting>
> -strict $.track.segments[*].location
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[*].location');
> + jsonb_path_query  
> +-------------------
> + [47.763, 13.4034]
> + [47.706, 13.2635]
>  </programlisting>
>     </para>
>  
> @@ -17451,7 +17520,13 @@ strict $.track.segments[*].location
>      when using the lax mode. For instance, the following query selects every
>      <literal>HR</literal> value twice:
>  <programlisting>
> -lax $.**.HR
> +select jsonb_path_query(:'json'::jsonb, 'lax $.**.HR');
> + jsonb_path_query 
> +------------------
> + 73
> + 135
> + 73
> + 135
>  </programlisting>
>      This happens because the <literal>.**</literal> accessor selects both
>      the <literal>segments</literal> array and each of its elements, while
> @@ -17460,7 +17535,11 @@ lax $.**.HR
>      the <literal>.**</literal> accessor only in the strict mode. The
>      following query selects each <literal>HR</literal> value just once:
>  <programlisting>
> -strict $.**.HR
> +select jsonb_path_query(:'json'::jsonb, 'strict $.**.HR');
> + jsonb_path_query 
> +------------------
> + 73
> + 135
>  </programlisting>
>     </para>
>  

-- 
Erik



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: POC, WIP: OR-clause support for indexes
Next
From: Alexander Korotkov
Date:
Subject: Re: Asymmetric partition-wise JOIN