Thread: jsonpath

jsonpath

From

Andrew Dunstan

Date:

07 January 2018, 22:31:57

Next up in the proposed SQL/JSON feature set I will review the three
jsonpath patches. These are attached, extracted from the patch set
Nikita posted on Jan 2nd.


Note that they depend on the TZH/TZM patch (see separate email thread).
I'd like to be able to commit that patch pretty soon.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: jsonpath

From

Pavel Stehule

Date:

07 January 2018, 23:00:26

2018-01-07 20:31 GMT+01:00 Andrew Dunstan <andrew.dunstan@2ndquadrant.com>:

Next up in the proposed SQL/JSON feature set I will review the three
jsonpath patches. These are attached, extracted from the patch set
Nikita posted on Jan 2nd.

Note that they depend on the TZH/TZM patch (see separate email thread).
I'd like to be able to commit that patch pretty soon.

I did few tests - and it is working very well.

regards

Pavel

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Andrew Dunstan

Date:

07 January 2018, 23:14:41


On 01/07/2018 03:00 PM, Pavel Stehule wrote:
>
>
> 2018-01-07 20:31 GMT+01:00 Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@2ndquadrant.com>>:
>
>
>     Next up in the proposed SQL/JSON feature set I will review the three
>     jsonpath patches. These are attached, extracted from the patch set
>     Nikita posted on Jan 2nd.
>
>
>     Note that they depend on the TZH/TZM patch (see separate email
>     thread).
>     I'd like to be able to commit that patch pretty soon.
>
>
> I did few tests - and it is working very well.
>
>


You mean on the revised TZH/TZM patch that I posted?

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Pavel Stehule

Date:

07 January 2018, 23:16:05

2018-01-07 21:14 GMT+01:00 Andrew Dunstan <andrew.dunstan@2ndquadrant.com>:

On 01/07/2018 03:00 PM, Pavel Stehule wrote:
>
>
> 2018-01-07 20:31 GMT+01:00 Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@2ndquadrant.com>>:
>
>
> Next up in the proposed SQL/JSON feature set I will review the three
> jsonpath patches. These are attached, extracted from the patch set
> Nikita posted on Jan 2nd.
>
>
> Note that they depend on the TZH/TZM patch (see separate email
> thread).
> I'd like to be able to commit that patch pretty soon.
>
>
> I did few tests - and it is working very well.
>
>

You mean on the revised TZH/TZM patch that I posted?

no - I tested jsonpath implementation

Pavel

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Andrew Dunstan

Date:

07 January 2018, 23:20:13


On 01/07/2018 03:16 PM, Pavel Stehule wrote:
>
>
> 2018-01-07 21:14 GMT+01:00 Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@2ndquadrant.com>>:
>
>
>
>     On 01/07/2018 03:00 PM, Pavel Stehule wrote:
>     >
>     >
>     > 2018-01-07 20:31 GMT+01:00 Andrew Dunstan
>     > <andrew.dunstan@2ndquadrant.com
>     <mailto:andrew.dunstan@2ndquadrant.com>
>     <mailto:andrew.dunstan@2ndquadrant.com
>     <mailto:andrew.dunstan@2ndquadrant.com>>>:
>     >
>     >
>     >     Next up in the proposed SQL/JSON feature set I will review
>     the three
>     >     jsonpath patches. These are attached, extracted from the
>     patch set
>     >     Nikita posted on Jan 2nd.
>     >
>     >
>     >     Note that they depend on the TZH/TZM patch (see separate email
>     >     thread).
>     >     I'd like to be able to commit that patch pretty soon.
>     >
>     >
>     > I did few tests - and it is working very well.
>     >
>     >
>
>
>     You mean on the revised TZH/TZM patch that I posted?
>
>
> no -  I tested jsonpath implementation
>
>


OK. You can help by also reviewing and testing the small patch at
<https://www.postgresql.org/message-id/f16a6408-45fe-b299-02b0-41e50a1c3023%402ndQuadrant.com>

thanks

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Pavel Stehule

Date:

07 January 2018, 23:23:45

2018-01-07 21:20 GMT+01:00 Andrew Dunstan <andrew.dunstan@2ndquadrant.com>:

On 01/07/2018 03:16 PM, Pavel Stehule wrote:
>
>
> 2018-01-07 21:14 GMT+01:00 Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@2ndquadrant.com>>:
>
>
>
> On 01/07/2018 03:00 PM, Pavel Stehule wrote:
> >
> >
> > 2018-01-07 20:31 GMT+01:00 Andrew Dunstan
> > <andrew.dunstan@2ndquadrant.com
> <mailto:andrew.dunstan@2ndquadrant.com>
> <mailto:andrew.dunstan@2ndquadrant.com
> <mailto:andrew.dunstan@2ndquadrant.com>>>:
> >
> >
> > Next up in the proposed SQL/JSON feature set I will review
> the three
> > jsonpath patches. These are attached, extracted from the
> patch set
> > Nikita posted on Jan 2nd.
> >
> >
> > Note that they depend on the TZH/TZM patch (see separate email
> > thread).
> > I'd like to be able to commit that patch pretty soon.
> >
> >
> > I did few tests - and it is working very well.
> >
> >
>
>
> You mean on the revised TZH/TZM patch that I posted?
>
>
> no - I tested jsonpath implementation
>
>

OK. You can help by also reviewing and testing the small patch at
<https://www.postgresql.org/message-id/f16a6408-45fe-b299-02b0-41e50a1c3023%402ndQuadrant.com>

I'll look there

Regards

Pavel

thanks

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Nikita Glukhov

Date:

11 January 2018, 01:42:51

Attached new 8th version of jsonpath related patches. Complete 
documentation is still missing.

The first 4 small patches are necessary datetime handling in jsonpath:
1. simple refactoring, extracted function that will be used later in 
jsonpath
2. throw an error when the input or format string contains trailing elements
3. avoid unnecessary cstring to text conversions
4. add function for automatic datetime type recognition by the presence 
of formatting components

Should they be posted in a separate thread?

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attached 9th version of jsonpath patches rebased onto the latest master.

Jsonpath grammar for parenthesized expressions and predicates was fixed.

Documentation drafts for jsonpath written by Oleg Bartunov:
https://github.com/obartunov/sqljsondoc

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Oleg Bartunov

Date:

14 February 2018, 09:00:24

On Wed, Feb 14, 2018 at 2:04 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> Attached 9th version of jsonpath patches rebased onto the latest master.
>
> Jsonpath grammar for parenthesized expressions and predicates was fixed.
>
> Documentation drafts for jsonpath written by Oleg Bartunov:
> https://github.com/obartunov/sqljsondoc

Direct link is https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
Please consider it as WIP, I will update it in my spare time.

>
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

Re: jsonpath

From

Nikita Glukhov

Date:

26 February 2018, 18:34:04

Attached 10th version of the jsonpath patches.

1. Fixed error handling in arithmetic operators.

    Now run-time errors in arithmetic operators are catched (added
    PG_TRY/PG_CATCH around operator's functions calls) and converted into
    Unknown values in predicates as it is required by the standard:

    =# SELECT jsonb '[1,0,2]' @* '$[*] ? (1 / @ > 0)';
     ?column?
    ----------
     1
     2
    (2 rows)

2. Fixed grammar for parenthesized expressions.

3. Refactored GIN support for jsonpath operators.

4. Added one more operator json[b] @# jsonpath returning singleton json[b] with
    automatic conditional wrapping of sequences with more than one element into
    arrays:

    =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 2)';
     ?column?
    -----------
     [3, 4, 5]
    (1 row)

    =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 4)';
     ?column?
    ----------
     5
    (1 row)

    =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 5)';
     ?column?
    ----------
     (null)
    (1 row)

    Existing set-returning operator json[b] @* jsonpath is also very userful but
    can't be used in functional indices like new operator @#.

    Note that conditional wrapping of @# differs from the wrapping in
    JSON_QUERY(... WITH [ARRAY] WRAPPER), where only singleton objects and
    arrays are not wrapped.  Unconditional wrapping can be emulated with our
    array construction feature (see below).

5. Changed time zone behavior in .datetime() item method.

    In the previous version of the patch timestamptz SQL/JSON items were
    serialized into JSON string items using session time zone.  This behavior
    did not allow jsonpath operators to be marked as immutable, and therefore
    they could not be used in functional indices.  Also, when the time zone was
    not specified in the input string, but TZM or TZH format fields were present
    in the format string, session time zone was used as a default for
    timestamptz items.

    To make jsonpath operators immutable we decided to save input time zone for
    timestamptz items and disallow automatic time zone assignment.  Also
    additional parameter was added to .datetime() for default time zone
    specification:

    =# SET timezone = '+03';
    SET

    =# SELECT jsonb '"10-03-2017 12:34:56"' @*
                    '$.datetime("DD-MM-YYYY HH24:MI:SS TZH")';
    ERROR:  Invalid argument for SQL/JSON datetime function

    =# SELECT jsonb '"10-03-2017 12:34:56"' @*
                    '$.datetime("DD-MM-YYYY HH24:MI:SS TZH", "+05")';
              ?column?
    -----------------------------
     "2017-03-10T12:34:56+05:00"
    (1 row)

    =# SELECT jsonb '"10-03-2017 12:34:56 +05"' @*
                    '$.datetime("DD-MM-YYYY HH24:MI:SS TZH")';
              ?column?
    -----------------------------
     "2017-03-10T12:34:56+05:00"
    (1 row)

    Please note that our .datetime() behavior is not standard now: by the
    standard, input and format strings must match exactly, i.e. they both should
    not contain trailing unmatched elements, so automatic time zone assignment
    is impossible.  But it too restrictive for PostgreSQL users, so we decided
    to preserve usual PostgreSQL behavior here:

    =# SELECT jsonb '"10-03-2017"' @* '$.datetime("DD-MM-YYYY HH24:MI:SS")';
           ?column?
    -----------------------
     "2017-03-10T00:00:00"
    (1 row)


    Also PostgreSQL is able to automatically recognize format of the input
    string for the specified datetime type, but we can only bring this behavior
    into jsonpath by introducing separate item methods .date(), .time(),
    .timetz(), .timestamp() and .timestamptz().   Also we can use here our
    unfinished feature that gives us ability to work with PostresSQL types in
    jsonpath using cast operator :: (see sqljson_ext branch in our github repo):

    =# SELECT jsonb '"10/03/2017 12:34"' @* '$::timestamptz';
              ?column?
    -----------------------------
     "2017-03-10T12:34:00+03:00"
    (1 row)



A brief description of the extra jsonpath syntax features contained in the
patch #7:

   * Sequence construction by joining path expressions with comma:

     =# SELECT jsonb '[1, 2, 3]' @* '$[*], 4, 5';
      ?column?
     ----------
      1
      2
      3
      4
      5
     (5 rows)

   * Array construction by placing sequence into brackets (equivalent to
     JSON_QUERY(... WITH UNCONDITIONAL WRAPPER)):

     =# SELECT jsonb '[1, 2, 3]' @* '[$[*], 4, 5]';
         ?column?
     -----------------
      [1, 2, 3, 4, 5]
     (1 row)

   * Object construction by placing sequences of key-value pairs into braces:

     =# SELECT jsonb '{"a" : [1, 2, 3]}' @* '{a: [$.a[*], 4, 5], "b c": "dddd"}';
                    ?column?
     ---------------------------------------
      {"a": [1, 2, 3, 4, 5], "b c": "dddd"}
     (1 row)

   * Object subscripting with string-valued expressions:

     =# SELECT jsonb '{"a" : "aaa", "b": "a", "c": "ccc"}' @* '$[$.b, "c"]';
      ?column?
     ----------
      "aaa"
      "ccc"
     (2 rows)

   * Support of UNIX epoch time in .datetime() item method:

     =# SELECT jsonb '1519649957.37' @* '$.datetime()';
                 ?column?
     --------------------------------
      "2018-02-26T12:59:17.37+00:00"
     (1 row)

--
Nikita Glukhov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

On Fri, Mar 2, 2018 at 12:40 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:

On 28.02.2018 06:55, Robert Haas wrote:

On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov
<n.gluhov@postgrespro.ru> wrote:
Attached 10th version of the jsonpath patches.

1. Fixed error handling in arithmetic operators.

Now run-time errors in arithmetic operators are catched (added
PG_TRY/PG_CATCH around operator's functions calls) and converted into
Unknown values in predicates as it is required by the standard:
I think we really need to rename PG_TRY and PG_CATCH or rethink this
whole interface so that people stop thinking they can use it to
prevent errors from being thrown.

I understand that it is unsafe to call arbitrary function inside PG_TRY without
rethrowing of caught errors in PG_CATCH, but in jsonpath only the following
numeric and datetime functions with known behavior are called inside PG_TRY
and only errors of category ERRCODE_DATA_EXCEPTION are caught:

numeric_add()
numeric_mul()
numeric_div()
numeric_mod()
numeric_float8()
float8in()
float8_numeric()
to_datetime()

That seems like a quite limited list of functions. What about reworking them

providing a way of calling them without risk of exception? For example, we can

have numeric_add_internal() function which fill given data structure with

error information instead of throwing the error. numeric_add() would be a

wrapper over numeric_add_internal(), which throws an error if corresponding

data structure is filled. In jsonpath we can call numeric_add_internal() and

interpret errors in another way. That seems to be better than use of PG_TRY

and PG_CATCH.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Oleg Bartunov

Date:

06 March 2018, 01:56:56

Thanks, Tomas !

Will publish a new version really soon !

Regards,
Oleg

On Tue, Mar 6, 2018 at 12:29 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hi,
>
> The patch no longer applies - it got broken by fd1a421fe66 which changed
> columns in pg_proc. A rebase is needed.
>
> Fixing it is pretty simle, so I've done that locally and tried to run
> 'make check' under valgrind. And I got a bunch of reports about
> uninitialised values. Full report attached, but in general there seem to
> be two types of failures:
>
>   Conditional jump or move depends on uninitialised value(s)
>      at 0x57BB47A: vfprintf (in /usr/lib64/libc-2.24.so)
>      by 0x57E5478: vsnprintf (in /usr/lib64/libc-2.24.so)
>      by 0xA926D3: pvsnprintf (psprintf.c:121)
>      by 0x723E03: appendStringInfoVA (stringinfo.c:130)
>      by 0x723D58: appendStringInfo (stringinfo.c:87)
>      by 0x76BEFF: _outCoerceViaIO (outfuncs.c:1413)
>      by 0x776F99: outNode (outfuncs.c:3978)
>      by 0x76D7E7: _outJsonCoercion (outfuncs.c:1779)
>      by 0x777CB9: outNode (outfuncs.c:4398)
>      by 0x76D507: _outJsonExpr (outfuncs.c:1752)
>      by 0x777CA1: outNode (outfuncs.c:4395)
>      by 0x767000: _outList (outfuncs.c:187)
>      by 0x776874: outNode (outfuncs.c:3753)
>      by 0x76A4D2: _outTableFunc (outfuncs.c:1068)
>      by 0x776D89: outNode (outfuncs.c:3912)
>      by 0x7744FD: _outRangeTblEntry (outfuncs.c:3209)
>      by 0x777959: outNode (outfuncs.c:4290)
>      by 0x767000: _outList (outfuncs.c:187)
>      by 0x776874: outNode (outfuncs.c:3753)
>      by 0x773713: _outQuery (outfuncs.c:3049)
>    Uninitialised value was created by a stack allocation
>      at 0x5B0C19: base_yyparse (gram.c:26287)
>
> This happens when _outCoerceViaIO tries to output 'location' field
> (that's line 1413), so I guess it's not set/copied somewhere.
>
> The second failure looks like this:
>
>   Conditional jump or move depends on uninitialised value(s)
>      at 0x49E58B: ginFillScanEntry (ginscan.c:72)
>      by 0x49EB56: ginFillScanKey (ginscan.c:221)
>      by 0x49EF72: ginNewScanKey (ginscan.c:369)
>      by 0x4A3875: gingetbitmap (ginget.c:1807)
>      by 0x4F620B: index_getbitmap (indexam.c:727)
>      by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
>      by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
>      by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
>      by 0x6DC26D: ExecScanFetch (execScan.c:95)
>      by 0x6DC308: ExecScan (execScan.c:162)
>      by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
>      by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>      by 0x6E5961: ExecProcNode (executor.h:239)
>      by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
>      by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
>      by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
>      by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>      by 0x6D1361: ExecProcNode (executor.h:239)
>      by 0x6D3BB7: ExecutePlan (execMain.c:1721)
>      by 0x6D1917: standard_ExecutorRun (execMain.c:361)
>    Uninitialised value was created by a heap allocation
>      at 0xA64FDC: palloc (mcxt.c:858)
>      by 0x938636: gin_extract_jsonpath_query (jsonb_gin.c:630)
>      by 0x938AB6: gin_extract_jsonb_query (jsonb_gin.c:746)
>      by 0xA340C0: FunctionCall7Coll (fmgr.c:1201)
>      by 0x49EE7F: ginNewScanKey (ginscan.c:313)
>      by 0x4A3875: gingetbitmap (ginget.c:1807)
>      by 0x4F620B: index_getbitmap (indexam.c:727)
>      by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104)
>      by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506)
>      by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118)
>      by 0x6DC26D: ExecScanFetch (execScan.c:95)
>      by 0x6DC308: ExecScan (execScan.c:162)
>      by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730)
>      by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>      by 0x6E5961: ExecProcNode (executor.h:239)
>      by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406)
>      by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736)
>      by 0x6E7C84: ExecAgg (nodeAgg.c:1551)
>      by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446)
>      by 0x6D1361: ExecProcNode (executor.h:239)
>
> So the extra_data allocated in gin_extract_jsonpath_query() get to
> ginFillScanEntry() uninitialised.
>
>
> Both seem like a valid issues, I think.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Andrew Dunstan

Date:

20 March 2018, 08:06:39

On Fri, Mar 2, 2018 at 8:27 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Fri, Mar 2, 2018 at 12:40 AM, Nikita Glukhov <n.gluhov@postgrespro.ru>
> wrote:
>>
>> On 28.02.2018 06:55, Robert Haas wrote:
>>
>>> On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov
>>> <n.gluhov@postgrespro.ru> wrote:
>>>>
>>>> Attached 10th version of the jsonpath patches.
>>>>
>>>> 1. Fixed error handling in arithmetic operators.
>>>>
>>>>     Now run-time errors in arithmetic operators are catched (added
>>>>     PG_TRY/PG_CATCH around operator's functions calls) and converted
>>>> into
>>>>     Unknown values in predicates as it is required by the standard:
>>>
>>> I think we really need to rename PG_TRY and PG_CATCH or rethink this
>>> whole interface so that people stop thinking they can use it to
>>> prevent errors from being thrown.
>>
>>
>> I understand that it is unsafe to call arbitrary function inside PG_TRY
>> without
>> rethrowing of caught errors in PG_CATCH, but in jsonpath only the
>> following
>> numeric and datetime functions with known behavior are called inside
>> PG_TRY
>> and only errors of category ERRCODE_DATA_EXCEPTION are caught:
>>
>> numeric_add()
>> numeric_mul()
>> numeric_div()
>> numeric_mod()
>> numeric_float8()
>> float8in()
>> float8_numeric()
>> to_datetime()
>
>
> That seems like a quite limited list of functions.  What about reworking
> them
> providing a way of calling them without risk of exception?  For example, we
> can
> have numeric_add_internal() function which fill given data structure with
> error information instead of throwing the error.  numeric_add() would be a
> wrapper over numeric_add_internal(), which throws an error if corresponding
> data structure is filled.  In jsonpath we can call numeric_add_internal()
> and
> interpret errors in another way.  That seems to be better than use of PG_TRY
> and PG_CATCH.
>


I haven't seen a response to this email. Do we need one before
proceeding any further with jsonpath?

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Andrew Dunstan

Date:

20 March 2018, 08:09:00

On Tue, Mar 20, 2018 at 3:36 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On Fri, Mar 2, 2018 at 8:27 AM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> On Fri, Mar 2, 2018 at 12:40 AM, Nikita Glukhov <n.gluhov@postgrespro.ru>
>> wrote:
>>>
>>> On 28.02.2018 06:55, Robert Haas wrote:
>>>
>>>> On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov
>>>> <n.gluhov@postgrespro.ru> wrote:
>>>>>
>>>>> Attached 10th version of the jsonpath patches.
>>>>>
>>>>> 1. Fixed error handling in arithmetic operators.
>>>>>
>>>>>     Now run-time errors in arithmetic operators are catched (added
>>>>>     PG_TRY/PG_CATCH around operator's functions calls) and converted
>>>>> into
>>>>>     Unknown values in predicates as it is required by the standard:
>>>>
>>>> I think we really need to rename PG_TRY and PG_CATCH or rethink this
>>>> whole interface so that people stop thinking they can use it to
>>>> prevent errors from being thrown.
>>>
>>>
>>> I understand that it is unsafe to call arbitrary function inside PG_TRY
>>> without
>>> rethrowing of caught errors in PG_CATCH, but in jsonpath only the
>>> following
>>> numeric and datetime functions with known behavior are called inside
>>> PG_TRY
>>> and only errors of category ERRCODE_DATA_EXCEPTION are caught:
>>>
>>> numeric_add()
>>> numeric_mul()
>>> numeric_div()
>>> numeric_mod()
>>> numeric_float8()
>>> float8in()
>>> float8_numeric()
>>> to_datetime()
>>
>>
>> That seems like a quite limited list of functions.  What about reworking
>> them
>> providing a way of calling them without risk of exception?  For example, we
>> can
>> have numeric_add_internal() function which fill given data structure with
>> error information instead of throwing the error.  numeric_add() would be a
>> wrapper over numeric_add_internal(), which throws an error if corresponding
>> data structure is filled.  In jsonpath we can call numeric_add_internal()
>> and
>> interpret errors in another way.  That seems to be better than use of PG_TRY
>> and PG_CATCH.
>>
>
>
> I haven't seen a response to this email. Do we need one before
> proceeding any further with jsonpath?
>


Apologies. I see the reply now.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Tom Lane

Date:

20 March 2018, 08:36:18

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> That seems like a quite limited list of functions.  What about
>> reworking them providing a way of calling them without risk of
>> exception?

> I haven't seen a response to this email. Do we need one before
> proceeding any further with jsonpath?

I've not been following this thread in detail, but IMO any code anywhere
that thinks that no error can be thrown out of non-straight-line code is
broken beyond redemption.  What, for example, happens if we get ENOMEM
within one of the elog.c functions?

I did look through 0007-jsonpath-arithmetic-error-handling-v12.patch,
and I can't believe that's seriously proposed for commit.  It's making
some pretty fragile changes in error handling, and so far as I can
find there is not even one line of commentary as to what the new
design rules are supposed to be.  Even if it's completely bug-free
today (which I would bet against), how could we keep it so?

            regards, tom lane

Re: jsonpath

From

David Steele

Date:

10 April 2018, 16:32:02

On 3/27/18 11:13 AM, Nikita Glukhov wrote:
>
> Attached 14th version of the patches:

It appears this entry should be marked Needs Review, so I have done that
and moved it to the next CF.

Regards,
-- 
-David
david@pgmasters.net

Re: jsonpath

From

Thomas Munro

Date:

28 June 2018, 05:39:46

On Thu, Jun 28, 2018 at 11:38 AM, Nikita Glukhov
<n.gluhov@postgrespro.ru> wrote:
> Attached 15th version of the patches.

Hi Nikita,

I wonder why the Windows build scripts are not finding and processing
your new .y and .l files:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.3672

Perhaps Mkvcbuild.pm needs to be told about them?  For example, it has this:

    $postgres->AddFiles('src/backend/parser', 'scan.l', 'gram.y');

PS If you need a way to test on Windows but don't have a local Windows
build system, this may be useful:

https://wiki.postgresql.org/wiki/Continuous_Integration

-- 
Thomas Munro
http://www.enterprisedb.com

Re: jsonpath

From

Stas Kelvich

Date:

04 September 2018, 17:15:39

> On 23 Aug 2018, at 00:29, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>
> Attached 17th version of the patches rebased onto the current master.
>
> Nothing significant has changed.

Hey.

I’ve looked through presented implementation of jsonpath and have some remarks:

1. Current patch set adds functions named jsonpath_* that implementing subset of functionality
of corresponding json_* functions which doesn’t require changes to postgres SQL grammar.
If we are going to commit json_* functions it will be quite strange for end user: two sets of functions
with mostly identical behaviour, except that one of them allows customise error handling and pass variable via
special language constructions. I suggest only leave operators in this patch, but remove functions.

2. Handling of Unknown value differs from one that described in tech report: filter '(@.k == 1)’
doesn’t evaluates to Unknown. Consider following example:

select '42'::json @* '$ ? ( (@.k == 1) is unknown )';

As per my understanding of standard this should show 42. Seems that it was evaluated to False instead,
because boolean operators (! || &&) on filter expression with structural error behave like it is False.

3. .keyvalue() returns object with field named ‘key’ instead of ’name’ as per tech report. ‘key’ field seems to
be more consistent with function name, but i’m not sure it is worths of mismatch with standard. Also ‘id’ field
is omitted, making it harder to use something like GROUP BY afterwards.

4. Looks like jsonpath executor lacks some CHECK_FOR_INTERRUPTS() during hot paths. Backend with
following query is unresponsive to signals:

select count(*) from (
    select '[0,1,2,3,4,5,6,7,8,9]'::json @* longpath::jsonpath from (
        select '$[' || string_agg(subscripts, ',') ||']' as longpath from (
            select 'last,1' as subscripts from generate_series(1,1000000)
        ) subscripts_q
    ) jpath_q
) count_q;

5. Files generated by lex/bison should be listed in .gitignore files in corresponding directories.

6. My compiler complains about unused functions: JsonValueListConcat, JsonValueListClear.

7. Presented patch files structure is somewhat complicated with patches to patches. I've melded them
down to following patches:

0001: three first patches with preliminary datetime infrastructure
0002: Jsonpath engine and operators that is your previous 4+6+7
0003: Jsonpath extensions is your previous 8+9
0004: GIN support is your 5th path

Also this patches were formed with 'git format-patch', so one can apply all of them with 'git apply'
restoring each one as commit.

8. Also sending jsonpath_check.sql with queries which I used to check compliance with the standard.
That can be added to regression test, if you want.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: jsonpath

From

Nikita Glukhov

Date:

08 September 2018, 02:21:27

Attached 18th version of the patches rebased onto the current master.


On 04.09.2018 17:15, Stas Kelvich wrote:

On 23 Aug 2018, at 00:29, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:

Attached 17th version of the patches rebased onto the current master.

Nothing significant has changed.

Hey.

I’ve looked through presented implementation of jsonpath and have some remarks:

Thank you for your review.

1. Current patch set adds functions named jsonpath_* that implementing subset of functionality
of corresponding json_* functions which doesn’t require changes to postgres SQL grammar.
If we are going to commit json_* functions it will be quite strange for end user: two sets of functions
with mostly identical behaviour, except that one of them allows customise error handling and pass variable via
special language constructions. I suggest only leave operators in this patch, but remove functions.

We can't simply remove these functions because they are the implementation 
of operators.

2. Handling of Unknown value differs from one that described in tech report: filter '(@.k == 1)’
doesn’t evaluates to Unknown. Consider following example:

select '42'::json @* '$ ? ( (@.k == 1) is unknown )';

As per my understanding of standard this should show 42. Seems that it was evaluated to False instead,
because boolean operators (! || &&) on filter expression with structural error behave like it is False.

Yes, this example should return 42, but only in the strict mode.  In the lax 
mode, which is default, structural errors are ignored, so '@.k' evaluates to
an empty sequence, and '@.k == 1' then returns False.

3. .keyvalue() returns object with field named ‘key’ instead of ’name’ 
as per tech report. ‘key’ field seems to be more consistent with function 
name, but i’m not sure it is worths of mismatch with standard. Also ‘id’
field is omitted, making it harder to use something like GROUP BY afterwards.

SQL/JSON standard uses "key" as a field name (9.39, page 716):vi) For all h, 1 <= h <= q, let OBJh be an SQL/JSON object with three members:  1) The first member has key "key" and bound value Ki .  2) The second member has key "value”"and bound value BVi.  3) The third member has key "id" and bound value IDj.

But "id" field was really missing in our implementation.  I have added it
using byte offset of the object in jsonb as its identifier.

4. Looks like jsonpath executor lacks some CHECK_FOR_INTERRUPTS() during hot paths. Backend with
following query is unresponsive to signals:

select count(*) from (   select '[0,1,2,3,4,5,6,7,8,9]'::json @* longpath::jsonpath from (       select '$[' || string_agg(subscripts, ',') ||']' as longpath from (           select 'last,1' as subscripts from generate_series(1,1000000)       ) subscripts_q   ) jpath_q
) count_q;

Fixed: added CHECK_FOR_INTERRUPTS() to jsonpath parser and executor.

5. Files generated by lex/bison should be listed in .gitignore files in corresponding directories.

Fixed.

6. My compiler complains about unused functions: JsonValueListConcat, JsonValueListClear.

Fixed: these functions used only in the next SQL/JSON patches were removed.

7. Presented patch files structure is somewhat complicated with patches to patches. I've melded them
down to following patches:

0001: three first patches with preliminary datetime infrastructure
0002: Jsonpath engine and operators that is your previous 4+6+7
0003: Jsonpath extensions is your previous 8+9
0004: GIN support is your 5th path

Also this patches were formed with 'git format-patch', so one can apply all of them with 'git apply'
restoring each one as commit.

New patch set is formed with 'git format-patch', but I can still supply patches
in the previous split form.

8. Also sending jsonpath_check.sql with queries which I used to check compliance with the standard.
That can be added to regression test, if you want.

I think these additional tests can be added, but I have not yet done it in this
version of the patches.


-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Oleg Bartunov

Date:

31 October 2018, 07:29:08

On Mon, Oct 29, 2018 at 2:20 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> Hi,
>
> On 10/02/2018 04:33 AM, Michael Paquier wrote:
> > On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote:
> >> Attached 18th version of the patches rebased onto the current master.
> >
> > Nikita, this version fails to apply, as 0004 has conflicts with some
> > regression tests.  Could you rebase?  I am moving the patch to CF
> > 2018-11, waiting for your input.
> > --
> > Michael
> >
>
> As Michael mentioned, the patch does not apply anymore, so it would be
> good to provide a rebased version for CF 2018-11. I've managed to do
> that, as the issues are due to minor bitrot, so that I can do some quick
> review of the current version.
>
> I haven't done much testing, pretty much just compiling, running the
> usual regression tests and reading through the patches. I plan to do
> more testing once the rebased patch is submitted.
>
> Now, a couple of comments based on eye-balling the diffs.
>
>
> 1) There are no docs, which makes it pretty much non-committable for
> now. I know there is [1] and it was a good intro for the review, but we
> need something as a part of our sgml docs.

SQL/JSON documentation discussed in separate topic
https://www.postgresql.org/message-id/flat/732208d3-56c3-25a4-8f08-3be1d54ad51b%40postgrespro.ru


>
>
> 2) 0001 says this:
>
>     *typmod = -1; /* TODO implement FF1, ..., FF9 */
>
> but I have no idea what FF1 or FF9 are. I guess it needs to be
> implemented, or explained better.
>
>
> 3) The makefile rule for scan.o does this:
>
> +# Latest flex causes warnings in this file.
> +ifeq ($(GCC),yes)
> +scan.o: CFLAGS += -Wno-error
> +endif
>
> That seems a bit ugly, and we should probably try to make it work with
> the latest flex, instead of hiding the warnings. I don't think we have
> any such ad-hoc rules in other Makefiles. If we really need it, can't we
> make it part of configure, and/or restrict it depending on flex version?
>
>
> 4) There probably should be .gitignore rule for jsonpath_gram.h, just
> like for other generated header files.
>
>
> 5) jbvType says jbvDatetime is "virtual type" but does not explain what
> it is. IMHO that deserves a comment or something.
>
>
> 6) I see the JsonPath definition says this about header:
>
>     /* just version, other bits are reservedfor future use */
>
> but the very next thing it does is defining two pieces stored in the
> header - version AND "lax" mode flag. Which makes the comment invalid
> (also, note the missing space after "reserved").
>
> FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the
> codebase works with "strict" flags passed around, and it's easy to
> forget to negate the flag somewhere (at least that's my experience).
>
>
> 7) I see src/include/utils/jsonpath_json.h adds support for plain json
> by undefining various jsonb macros and redirecting them to the json
> variants. I find that rather suspicious - imagine you're investigating
> something in code using those jsonb macros, and wondering why it ends up
> calling the json stuff. I'd be pretty confused ...
>
> Also, some of the redefinitions are not really needed for example
> JsonbWrapItemInArray and JsonbWrapItemsInArray are not used anywhere
> (and neither are the json variants).
>
>
> 8) I see 0002 defines IsAJsonbScalar like this:
>
>   #define IsAJsonbScalar(jsonbval)  (((jsonbval)->type >= jbvNull && \
>                                      (jsonbval)->type <= jbvBool) || \
>                                      (jsonbval)->type == jbvDatetime)
>
> while 0004 does this
>
>   #define jspIsScalar(type) ((type) >= jpiNull && (type) <= jpiBool)
>
> I know those are for different data types (jsonb vs. jsonpath), but I
> suppose jspIsScalar should include the datetime too.
>
> FWIW JsonPathItemType would deserve better documentation what the
> various items mean (a comment for each line would be enough). I've been
> wondering if "jpiDouble" means scalar double value until I realized
> there's a ".double()" function for paths.
>
>
> 9) It's generally a good idea to make the individual pieces committable
> separately, but that means e.g. the regression tests have to pass after
> each patch. At the moment that does not seem to be the case for 0002,
> see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
> not sure if that's related.
>
> regards
>
>
> [1] https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Nikita Glukhov

Date:

06 November 2018, 14:31:46

On 29.10.2018 2:20, Tomas Vondra wrote:

On 10/02/2018 04:33 AM, Michael Paquier wrote:

On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote:

Attached 18th version of the patches rebased onto the current master.

Nikita, this version fails to apply, as 0004 has conflicts with some
regression tests.  Could you rebase?  I am moving the patch to CF
2018-11, waiting for your input.
--
Michael

As Michael mentioned, the patch does not apply anymore, so it would be
good to provide a rebased version for CF 2018-11. I've managed to do
that, as the issues are due to minor bitrot, so that I can do some quick
review of the current version.

Sorry that I failed to provide rebased version earlier. 

Attached 19th version of the patches rebased onto the current master.

I haven't done much testing, pretty much just compiling, running the
usual regression tests and reading through the patches. I plan to do
more testing once the rebased patch is submitted.

Now, a couple of comments based on eye-balling the diffs.

Thank you for your review.

1) There are no docs, which makes it pretty much non-committable for
now. I know there is [1] and it was a good intro for the review, but we
need something as a part of our sgml docs.

I think that jsonpath part of documentation can be extracted from [2] and
added to the patch set.

2) 0001 says this:
   *typmod = -1; /* TODO implement FF1, ..., FF9 */

but I have no idea what FF1 or FF9 are. I guess it needs to be
implemented, or explained better.

FF1-FF9 are standard datetime template fields used for specifying of fractional
seconds.  FF3/FF6 are analogues of our MS/US.  I decided simply to implement 
this feature (see patch 0001, I also can supply it in the separate patch).

But FF7-FF9 are questionable since the maximal supported precision is only 6.
They are optional by the standard:
 95) Specifications for Feature F555, “Enhanced seconds precision”:   d) Subclause 9.44, “Datetime templates”:     i) Without Feature F555, “Enhanced seconds precision”,        a <datetime template fraction> shall not be FF7, FF8 or FF9.

So I decided to allow FF7-FF9 only on the output in to_char().

3) The makefile rule for scan.o does this:

+# Latest flex causes warnings in this file.
+ifeq ($(GCC),yes)
+scan.o: CFLAGS += -Wno-error
+endif

That seems a bit ugly, and we should probably try to make it work with
the latest flex, instead of hiding the warnings. I don't think we have
any such ad-hoc rules in other Makefiles. If we really need it, can't we
make it part of configure, and/or restrict it depending on flex version?

These lines seem to belong to the earliest versions of our jsonpath
implementation. There is no scan.o file now at all, there is only 
jsonpath_scan.o, so I simply removed these lines.

4) There probably should be .gitignore rule for jsonpath_gram.h, just
like for other generated header files.

I see 3 rules in /src/backend/utils/adt/.gitignore:

/jsonpath_gram.h
/jsonpath_gram.c
/jsonpath_scan.c

5) jbvType says jbvDatetime is "virtual type" but does not explain what
it is. IMHO that deserves a comment or something.

"Virtual type" means here that it is used only for in-memory processing and 
converted into JSON string when outputted to jsonb.  Corresponding comment 
was added.

6) I see the JsonPath definition says this about header:
   /* just version, other bits are reservedfor future use */

but the very next thing it does is defining two pieces stored in the
header - version AND "lax" mode flag. Which makes the comment invalid
(also, note the missing space after "reserved").

Fixed.

FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the
codebase works with "strict" flags passed around, and it's easy to
forget to negate the flag somewhere (at least that's my experience).

Jsonpath lax/strict mode flag is used only in executeJsonPath() where it is 
saved  in "laxMode" field.  New "strict" flag passed to datetime functions 
is unrelated to jsonpath.

7) I see src/include/utils/jsonpath_json.h adds support for plain json
by undefining various jsonb macros and redirecting them to the json
variants. I find that rather suspicious - imagine you're investigating
something in code using those jsonb macros, and wondering why it ends up
calling the json stuff. I'd be pretty confused ...

I agree, this is rather simple but doubtful solution.  That's why json support
was in a separate patch until the 18th version of the patches.

But if we do not want to compile jsonpath.c twice with different definitions,
then we need some kind of run-time wrapping over json strings and jsonb
containers, which seems a bit harder to implement.

To simplify debugging I can also suggest to explicitly preprocess jsonpath.c
into jsonpath_json.c using redefinitions from jsonpath_json.h before its
compilation.

Also, some of the redefinitions are not really needed for example
JsonbWrapItemInArray and JsonbWrapItemsInArray are not used anywhere
(and neither are the json variants).

These definitions will be used in the next patches, so I removed them 
for now from this patch.

8) I see 0002 defines IsAJsonbScalar like this:
 #define IsAJsonbScalar(jsonbval)  (((jsonbval)->type >= jbvNull && \                                    (jsonbval)->type <= jbvBool) || \                                    (jsonbval)->type == jbvDatetime)

while 0004 does this
 #define jspIsScalar(type) ((type) >= jpiNull && (type) <= jpiBool)

I know those are for different data types (jsonb vs. jsonpath), but I
suppose jspIsScalar should include the datetime too.

jpiDatetime does not mean the same thing as jbvDatetime.
jpiDatetime is a representation of ".datetime()" item method.
jbvDatetime is a representation of datetime SQL/JSON items.

Also datetime SQL/JSON items can be represented in JSON path only by strings
with ".datetime()" method applied, they do not have their own literal:
'"2018-01-10".datetime()'

FWIW JsonPathItemType would deserve better documentation what the
various items mean (a comment for each line would be enough). I've been
wondering if "jpiDouble" means scalar double value until I realized
there's a ".double()" function for paths.

I added per-item comments.

9) It's generally a good idea to make the individual pieces committable
separately, but that means e.g. the regression tests have to pass after
each patch. At the moment that does not seem to be the case for 0002,
see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
not sure if that's related.

This should definitely be a bug in json support, but I can't reproduce 
it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY.  Could you provide 
a stack trace at least?


[1] https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
[2] https://github.com/postgrespro/sqljson/tree/sqljson_doc

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Tomas Vondra

Date:

06 November 2018, 15:48:36

On 11/6/18 3:31 PM, Nikita Glukhov wrote:
> On 29.10.2018 2:20, Tomas Vondra wrote:>
 >
 > ...>
> Thank you for your review.
> 
>> 1) There are no docs, which makes it pretty much non-committable for
>> now. I know there is [1] and it was a good intro for the review, but we
>> need something as a part of our sgml docs.
> 
> I think that jsonpath part of documentation can be extracted from [2] and
> added to the patch set.
> 

Yes, please let's do that. The patch could not get into RFC without 
docs, so let's deal with it now.

>> 2) 0001 says this:
>>
>>      *typmod = -1; /* TODO implement FF1, ..., FF9 */
>>
>> but I have no idea what FF1 or FF9 are. I guess it needs to be
>> implemented, or explained better.
> 
> FF1-FF9 are standard datetime template fields used for specifying of fractional
> seconds.  FF3/FF6 are analogues of our MS/US.  I decided simply to implement
> this feature (see patch 0001, I also can supply it in the separate patch).
> 

Understood. Thanks for the explanation.

> But FF7-FF9 are questionable since the maximal supported precision is only 6.
> They are optional by the standard:
> 
>    95) Specifications for Feature F555, “Enhanced seconds precision”:
>      d) Subclause 9.44, “Datetime templates”:
>        i) Without Feature F555, “Enhanced seconds precision”,
>           a <datetime template fraction> shall not be FF7, FF8 or FF9.
> 
> So I decided to allow FF7-FF9 only on the output in to_char().
> 

Hmmmm, isn't that against the standard then? I mean, if our precision is 
only 6, that probably means we don't have the F555 feature, which means 
FF7-9 should not be available.

It also seems like a bit surprising behavior that we only allow those 
fields in to_char() and not in other places.

>> 4) There probably should be .gitignore rule for jsonpath_gram.h, just
>> like for other generated header files.
> 
> I see 3 rules in /src/backend/utils/adt/.gitignore:
> 
> /jsonpath_gram.h
> /jsonpath_gram.c
> /jsonpath_scan.c
> 

But there's a symlink in src/include/utils/jsonpath_gram.h and that's 
not in .gitignore.

>> FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the
>> codebase works with "strict" flags passed around, and it's easy to
>> forget to negate the flag somewhere (at least that's my experience).
> 
> Jsonpath lax/strict mode flag is used only in executeJsonPath() where it is
> saved  in "laxMode" field.  New "strict" flag passed to datetime functions
> is unrelated to jsonpath.
> 

OK.

>> 7) I see src/include/utils/jsonpath_json.h adds support for plain json
>> by undefining various jsonb macros and redirecting them to the json
>> variants. I find that rather suspicious - imagine you're investigating
>> something in code using those jsonb macros, and wondering why it ends up
>> calling the json stuff. I'd be pretty confused ...
> 
> I agree, this is rather simple but doubtful solution.  That's why json support
> was in a separate patch until the 18th version of the patches.
> 
> But if we do not want to compile jsonpath.c twice with different definitions,
> then we need some kind of run-time wrapping over json strings and jsonb
> containers, which seems a bit harder to implement.
> 
> To simplify debugging I can also suggest to explicitly preprocess jsonpath.c
> into jsonpath_json.c using redefinitions from jsonpath_json.h before its
> compilation.
> 

Not sure what's the right solution. But I agree the current approach 
probably is not it.

>> 
>> 9) It's generally a good idea to make the individual pieces committable
>> separately, but that means e.g. the regression tests have to pass after
>> each patch. At the moment that does not seem to be the case for 0002,
>> see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
>> not sure if that's related.
> 
> This should definitely be a bug in json support, but I can't reproduce
> it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY.  Could you provide
> a stack trace at least?
> 
I'll try.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Tomas Vondra

Date:

08 November 2018, 01:52:09

On 11/6/18 4:48 PM, Tomas Vondra wrote:
> On 11/6/18 3:31 PM, Nikita Glukhov wrote:
>> On 29.10.2018 2:20, Tomas Vondra wrote:>
>>
>> ...
>>>
>>> 9) It's generally a good idea to make the individual pieces committable
>>> separately, but that means e.g. the regression tests have to pass after
>>> each patch. At the moment that does not seem to be the case for 0002,
>>> see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
>>> not sure if that's related.
>>
>> This should definitely be a bug in json support, but I can't reproduce
>> it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY.  Could you provide
>> a stack trace at least?
>>
> I'll try.
> 

Not sure why you can't reproduce the failures, it's perfectly
reproducible for me. For the record, I'm doing this:

./configure --prefix=/home/user/pg-jsonpath --enable-debug
--enable-cassert CFLAGS="-O0 -DRANDOMIZE_ALLOCATED_MEMORY" && make -s
clean && make -s -j4 && make check

After sticking Assert(false) to JsonEncodeJsonbValue (to the default
case), I get a failure like this:

  select json '{}' @* 'lax $[0]';
! WARNING:  unknown jsonb value type: 20938064
! server closed the connection unexpectedly
!     This probably means the server terminated abnormally
!     before or while processing the request.
! connection to server was lost

The backtrace is attached. My guess is JsonValueListGetList in
jsonb_jsonpath_query only does shallow copy instead of copying the
pieces into funcctx->multi_call_memory_ctx, so it gets corrupted on
subsequent calls.

I also attach valgrind report, but I suppose the reported issues are a
consequence of the same bug.

regard

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attached 20th version the jsonpath patches.


On 08.11.2018 4:52, Tomas Vondra wrote:

On 11/6/18 4:48 PM, Tomas Vondra wrote:

On 11/6/18 3:31 PM, Nikita Glukhov wrote:

On 29.10.2018 2:20, Tomas Vondra wrote:>

9) It's generally a good idea to make the individual pieces committable
separately, but that means e.g. the regression tests have to pass after
each patch. At the moment that does not seem to be the case for 0002,
see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
not sure if that's related.

This should definitely be a bug in json support, but I can't reproduce
it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY.  Could you provide
a stack trace at least?

I'll try.

Not sure why you can't reproduce the failures, it's perfectly
reproducible for me. For the record, I'm doing this:

./configure --prefix=/home/user/pg-jsonpath --enable-debug
--enable-cassert CFLAGS="-O0 -DRANDOMIZE_ALLOCATED_MEMORY" && make -s
clean && make -s -j4 && make check

After sticking Assert(false) to JsonEncodeJsonbValue (to the default
case), I get a failure like this:
 select json '{}' @* 'lax $[0]';
! WARNING:  unknown jsonb value type: 20938064
! server closed the connection unexpectedly
! 	This probably means the server terminated abnormally
! 	before or while processing the request.
! connection to server was lost

The backtrace is attached. My guess is JsonValueListGetList in
jsonb_jsonpath_query only does shallow copy instead of copying the
pieces into funcctx->multi_call_memory_ctx, so it gets corrupted on
subsequent calls.

I also attach valgrind report, but I suppose the reported issues are a
consequence of the same bug.

Than you for your help, I finally have managed to reproduce this bug.

The problem was really related to copying values but it was not located in the
jsonb_jsonpath_query() where the whole value list should already be allocated in 
funcctx->multi_call_memory_ctx (executeJsonPath() is called in that context and
also input json/jsonb is copied into it).  Stack-allocated root JsonbValue was
not copied to the heap due to wrong copy condition in recursiveExecuteNoUnwrap()
(jpiIndexArray case):

@@ -1754,1 +1754,1 @@ recursiveExecuteNoUnwrap(JsonPathExecContext *cxt, ...
-   res = recursiveExecuteNext(cxt, jsp, &elem, v, found, !binary);
+   res = recursiveExecuteNext(cxt, jsp, &elem, v, found, singleton || !binary);

I have refactored a bit this place extracting explicit 'copy' flag.

Also I found a similar problem in makePassingVars() where Numerics were not
copied into 'multi_call_memory_ctx' from the input json object. I decided to use
datumCopy() inside makePassingVars() instead of copying the whole input json in
jsonb_jsonpath_query() using PG_GETARG_JSONB_P_COPY().  But I think that
processing of the input json with variables should ideally be bone lazily,
just don't know if this refactoring is worth it because 3rd argument of
json[b]_jsonpath_xxx() functions is used only for testing of variables passing,
SQL/JSON functions use the different mechanism.

On 06.11.2018 18:48, Tomas Vondra wrote:

But FF7-FF9 are questionable since the maximal supported precision is only 6.
They are optional by the standard:

   95) Specifications for Feature F555, “Enhanced seconds precision”:
     d) Subclause 9.44, “Datetime templates”:
       i) Without Feature F555, “Enhanced seconds precision”,
          a <datetime template fraction> shall not be FF7, FF8 or FF9.

So I decided to allow FF7-FF9 only on the output in to_char().

Hmmmm, isn't that against the standard then? I mean, if our precision is 
only 6, that probably means we don't have the F555 feature, which means 
FF7-9 should not be available.

It also seems like a bit surprising behavior that we only allow those 
fields in to_char() and not in other places.

Ok, now error is thrown if FF7-FF9 are found in the format string.

On 06.11.2018 18:48, Tomas Vondra wrote:

4) There probably should be .gitignore rule for jsonpath_gram.h, just
like for other generated header files.

I see 3 rules in /src/backend/utils/adt/.gitignore:

/jsonpath_gram.h
/jsonpath_gram.c
/jsonpath_scan.c

But there's a symlink in src/include/utils/jsonpath_gram.h and that's 
not in .gitignore.

Added jsonpath_gram.h to /src/include/utils/.gitignore.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attached 21-st version of the patches rebased onto the current master.

Added new patch 0004 with documentation written by Liudmila Mantrova, and
optional patch 0003 that was separated from 0002.

On 24.11.2018 23:03, Tomas Vondra wrote:

Hi,

I've done another round of reviews on v20, assuming the patch is almost
ready to commit, but unfortunately I ran into a bunch of issues that
need to be resolved. None of this is a huge issue, but it's also above
the threshold of what could be tweaked by a committer IMHO.

Thank your for your review.

(Which brings the question who plans to commit this. The patch does not
have a committer in the CF app, but I see both Teodor and Alexander are
listed as it's authors, so I'd expect it to be one of those. Or I might
do that, of course.)

0001
----

1) to_timestamp() does this:
 do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false,                 &tm, &fsec, &fprec, NULL);

Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the
function did before (well, it called text_to_cstring, but that does
VARDATA_ANY). The same thing applies to to_date(), BTW.

I also find it a bit inconvenient that we expand the fmt like this in
all do_to_timestamp() calls, although it's just to_datetime() that needs
to do it this way. I realize we can't change to_datetime() because it's
external API, but maybe we should make it construct a varlena and pass
it to do_to_timestamp().

Of course, there must be VARDATA_ANY() instead of of VARDATA().  But I decided
to construct varlena and pass it to do_to_timestamp() and to to_datetime().

2) We define both DCH_FF# and DCH_ff#, but we never ever use the
lower-case version. Heck, it's not mentioned even in DCH_keywords, which
does this:
 ... {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */ ... {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */ ...

Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and
"day".

Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI.

"Day", "day" are not mapped to DCH_DAY because they determine letter case in the
output, but "ff1" and "FF#" output contains only digits.

Also, the comment for "ff1" is wrong, it should be "f" I guess.

Fixed.

And of course, there are regression tests for FF# variants, but not the
lower-case ones.

Tests were added

3) Of course, these new formatting patterns should be added to SGML
docs, for example to "Template Patterns for Date/Time Formatting" in
func.sgml (and maybe to other places, I haven't looked for them very
thoroughly).

Fixed.

4) The last block in DCH_from_char() does this
   while (*s == ' ')       s++;

but I suppose it should use isspace() just like the code immediately
before it.

Fixed.

5) I might be missing something, but why is there the "D" at the end of
the return flags from DCH_from_char?
   /* Return flags for DCH_from_char() */   #define DCH_DATED    0x01   #define DCH_TIMED    0x02   #define DCH_ZONED    0x04

These terms "dated", "timed", and "zoned" are taken from the standard:

9.39.10) If <JSON datetime template> JDT is specified, ... : a) If JDT contains <datetime template year>, ... then JDT is dated. b) If JDT contains <datetime template 12-hour>, ... then JDT is timed. c) If JDT contains <datetime template time zone hour>, ... then JDT is zoned.

0002
----

1) There are some unnecessary changes to to_datetime signature (date_txt
renamed to vs. datetxt), which is mostly minor but unnecessary churn.

2) There are some extra changes to to_datetime (extra parameters, etc.).
I wonder why those are not included in 0001, as part of the supporting
datetime infrastructure.

Extra changes to to_datetime() were moved into patch 0001.

3) I'm not sure whether the _safe functions are a good or bad idea, but
at the very least the comments should be updated to explain what it does
(as the API has changed, obviously).

 
This functions were introduced for elimination of PG_TRY/PG_CATCH in jsonpath
code.  But this error handling approach has a lot of issues (see below), so I
decided separate again them into optional patch 0004.

4) the json.c changes are under-documented, e.g. there are no comments
for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and
whether it needs to be specified all the time, half of the functions at
the end don't have comments (some of them are really simple, but then
other simple functions do have comments).

I don't know what the right balance here is (I'm certainly not asking
for bogus comments just to have comments) and I agree that the existing
code is not exactly abundant in comments. But perhaps having at least
some would be nice.

Some new comments were added to json.c.

The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There
are pretty much no comments whatsoever (at least at the function level,
explaining what the functions do). It would be good to have a file-level
comment too, explaining how jsonpath works, probably.

I hope to add detailed comments for jsonpath in the next version of the patches.

5) I see uniqueifyJsonbObject now does this:
if (!object->val.object.uniquified)	return;

That seems somewhat strange, considering the function says it'll
uniqueify objects, but then exits when noticing the passed object is not
uniquified?

'uniquified' field was rename to 'uniquify'.

6) Why do we make make_result inline? (numeric.c) If this needs to be
marked with "inline" then perhaps all the _internal functions should be
marked like that too? I have my doubts about the need for this.


7) The other places add _safe to functions that don't throw errors
directly and instead update edata. Why are set_var_from_str, div_var,
mod_var excluded from this convention?

I added "_safe" postfix only to functions having the both variants: safe (error
info is placed into *edata) and unsafe (errors are always thrown).

One-line make_result() is called from many unsafe places, so I decided to leave
it as it was.  It can also be defined as a simple macro.

New "_internal" functions are called from jsonpath_exec.c, so they can't be
"static inline", but "extern inline" seems to be non-portable.

8) I wonder if the changes in numeric can have negative impact on
performance. Has anyone done any performance tests of this part?

I've tried to measure this impact by evaluation of expression with dozens of '+'
or even by adding a loop into numeric_add(), but failed to get any measurable
difference.

0003
----

1) jsonb_gin.c should probably add comments briefly explaining what
JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and
JsonPathExtractionContext are for.

A lot of comments were added. Also major refactoring of jsonb_gin.c was done.

2) I find it a bit suspicious that there are no asserts in json_gin.c
(well, there are 3 in the existing code, but nothing in the new code,
and the patch essentially doubles the amount of code here).

I have managed to find only 4 similar places suitable for asserts.

No comments for 0004 at this point.

regards

11 January 2019, 22:21:13

Attached 23rd version of the patches.


On 05.01.2019 3:11, Alexander Korotkov wrote:
> On Tue, Dec 4, 2018 at 2:23 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>> 2) We define both DCH_FF# and DCH_ff#, but we never ever use the
>> lower-case version. Heck, it's not mentioned even in DCH_keywords, which
>> does this:
>>
>>    ...
>>    {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */
>>    ...
>>    {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */
>>    ...
>>
>> Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and
>> "day".
>>
>> Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI.
>>
>> "Day", "day" are not mapped to DCH_DAY because they determine letter case in the
>> output, but "ff1" and "FF#" output contains only digits.
> Right, DCH_poz is also offset in DCH_keywords array.  So, if array has
> an entry for "ff1" then enum should have a DCH_ff1 member in the same
> position.
>
> I got some other questions regarding this patchset.
>
> 1) Why do we parse FF7-FF9 if we're not supporting them anyway?
> Without defining FF7-FF9 we can also don't throw errors for them
> everywhere.  That would save us some code lines.
FF7-FF9 were removed.
> 2) + DCH_to_char_fsec("%01d", in->fsec / INT64CONST(100000));
> Why do we use INT64CONST() here and in the similar places assuming
> that fsec is only uint32?
Fixed.

> 3) wrapItem() is unused in
> 0002-Jsonpath-engine-and-operators-v21.patch, but used in
> 0006-Jsonpath-syntax-extensions-v21.patch.  Please, move it to
> 0006-Jsonpath-syntax-extensions-v21.patch?
wraptItem() was moved into patch #6.

> 4) I also got these couple of warning during compilation.
>
> jsonpath_exec.c:1485:1: warning: unused function
> 'recursiveExecuteNested' [-Wunused-function]
> recursiveExecuteNested(JsonPathExecContext *cxt, JsonPathItem *jsp,
> ^
> 1 warning generated.
> jsonpath_scan.l:444:6: warning: implicit declaration of function
> 'jsonpath_yyparse' is invalid in C99 [-Wimplicit-function-declaration]
>          if (jsonpath_yyparse((void*)&parseresult) != 0)
>              ^
> 1 warning generated.
>
> Perhaps recursiveExecuteNested() is unsed in this patchset.  It's
> probably used by some subsequent SQL/JSON-related patchset.  So,
> please, move it there.

recursiveExecuteNested() was removed from this patch set.

Prototype for jsonpath_yyparse() should be in the Bison-generated file
src/include/adt/jsonpath_gram.h.

> 5) I think each usage of PG_TRY()/PG_CATCH() deserves comment
> describing why it's safe to use without subtransaction.  In fact we
> should just state that no function called inside performs data
> modification.
Comments were added.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On Sun, Jan 20, 2019 at 2:45 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> 3) How do we calculate the "id" property returned by keyvalue()
> function?  It's not documented.  Even presence of "id" columns isn't
> documented.  Standard stands that it's implementation-depended
> indetifier of object holding key-value pair.  The way of its
> calculation is also not clear from the code.  Why do we need constant
> of 10000000000?
>
>                 id = jb->type != jbvBinary ? 0 :
>                     (int64)((char *) jb->val.binary.data -
>                             (char *) cxt->baseObject.jbc);
>                 id += (int64) cxt->baseObject.id * INT64CONST(10000000000);

I've revising patchset bringing it to commitable shape.  The
intermediate result is attached.

0001-Preliminary-datetime-infrastructure-v24.patch

* Commit message & minor cleanup

0002-Jsonpath-engine-and-docs-v24.patch

 * Support of json is removed.  Current implementation is tricky and
cumbersome.  We need to design a suitable abstraction layer for that.
It should be done by separate patch applying not only to jsonpath, but
also to other jsonb functions lacking of json analogues.
* jsonpath_exists(jsonb, jsonpath[, jsonb]), jsonpath_predicate(jsonb,
jsonpath[, jsonb]), jsonpath_query(jsonb, jsonpath[, jsonb]),
jsonpath_wrapped(jsonb, jsonpath[, jsonb]) are documented.
* @* and @# operators are removed, @~ operator is renamed to @@.
* Commit message & minor cleanup.

I'll continue revising this patchset.  Nikita, could you please write
tests for 3-argument versions of functions?  Also, please answer the
question regarding "id" property.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

20 January 2019, 22:40:28

On Sun, Jan 20, 2019 at 6:30 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I'll continue revising this patchset.  Nikita, could you please write
> tests for 3-argument versions of functions?  Also, please answer the
> question regarding "id" property.

I've some more notes regarding function set provided in jsonpath patch:
1) Function names don't follow the convention we have.  All our
functions dealing with jsonb have "jsonb_" prefix.  Exclusions have
"jsonb" in other part of name, for example, "to_jsonb(anyelement)".
We could name functions at SQL level in the same way they are named in
C.  So, they would be jsonb_jsonpath_exists() etc.  But it's probably
too long.  What about jsonb_path_exists() etc?
2) jsonpath_query_wrapped() name seems counter intuitive for me.  What
about jsonpath_query_array()?
3) jsonpath_query_wrapped() behavior looks tricky to me.  It returns
NULL for no results.  When result item is one, it is returned "as is".
When there are two or more result items, they are wrapped into array.
This representation of result seems extremely inconvenient for me.  It
becomes hard to solve even trivial tasks with that: for instance,
iterate all items found.  Without extra assumptions on what is going
to be returned it's also impossible for caller to distinguish single
array item found from multiple items found wrapped into array.  And
that seems very bad.  I know this behavior is inspired by SQL/JSON
standard.  But since these functions are anyway our extension, we can
implement them as we like.  So, I propose to make this function always
return array of items regardless on count of those items (even for 1
or 0 items).
4) If we change behavior of jsonpath_query_wrapped() as above, we can
also implement jsonpath_query_one() function, which would return first
result item or NULL for no items.
Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Nikita Glukhov

Date:

21 January 2019, 13:56:46

On 20.01.2019 2:45, Alexander Korotkov wrote:

3) How do we calculate the "id" property returned by keyvalue()
function?  It's not documented.  Even presence of "id" columns isn't
documented.  Standard stands that it's implementation-depended
indetifier of object holding key-value pair.  The way of its
calculation is also not clear from the code.  Why do we need constant
of 10000000000?
               id = jb->type != jbvBinary ? 0 :                   (int64)((char *) jb->val.binary.data -                           (char *) cxt->baseObject.jbc);               id += (int64) cxt->baseObject.id * INT64CONST(10000000000);

I decided to construct object id from the two parts: base object id and its
binary offset in its base object's jsonb:

object_id = 10000000000 * base_object_id + object_offset_in_base_object


10000000000 (10^10) -- is a first round decimal number greater than 2^32
(maximal offset in jsonb).  Decimal multiplier is used here to improve the
readability of identifiers.


Base object is usually a root object of the path: context item '$' or path
variable '$var', literals can't produce objects for now.  But if the path
contains generated objects (.keyvalue() itself, for example), then they become
base object for the subsequent .keyvalue().  See example:

'$.a.b.keyvalue().value.keyvalue()' :- base for the first .keyvalue() is '$'- base for the second .keyvalue() is '$.a.b.keyvalue()'


Id of '$' is 0.
Id of '$var' is its ordinal (positive) number in the list of variables.
Ids for generated objects are assigned using global counter
'JsonPathExecContext.generatedObjectId' starting from 'number_of_vars + 1'.


Corresponding comments will be added in the upcoming version of the patches.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Oleg Bartunov

Date:

21 January 2019, 15:04:48

On Mon, Jan 21, 2019 at 1:40 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>
> On Sun, Jan 20, 2019 at 6:30 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > I'll continue revising this patchset.  Nikita, could you please write
> > tests for 3-argument versions of functions?  Also, please answer the
> > question regarding "id" property.
>
> I've some more notes regarding function set provided in jsonpath patch:
> 1) Function names don't follow the convention we have.  All our
> functions dealing with jsonb have "jsonb_" prefix.  Exclusions have
> "jsonb" in other part of name, for example, "to_jsonb(anyelement)".
> We could name functions at SQL level in the same way they are named in
> C.  So, they would be jsonb_jsonpath_exists() etc.  But it's probably
> too long.  What about jsonb_path_exists() etc?

jsonpath_exists is similar to xpath_exists.
Actually, we could use jsonb_exists(jsonb, jsonpath, json), but then
we should specify the type of the second argument.

> 2) jsonpath_query_wrapped() name seems counter intuitive for me.  What
> about jsonpath_query_array()?

The question is should we try to provide a functional interface for
all options of
JSON_QUERY clause ?  The same is for  other SQL/JSON clauses.
Currently,  patch contains very limited subset of JSON_QUERY
functionality,  mostly for jsonpath testing.

> 3) jsonpath_query_wrapped() behavior looks tricky to me.  It returns
> NULL for no results.  When result item is one, it is returned "as is".
> When there are two or more result items, they are wrapped into array.
> This representation of result seems extremely inconvenient for me.  It
> becomes hard to solve even trivial tasks with that: for instance,
> iterate all items found.  Without extra assumptions on what is going
> to be returned it's also impossible for caller to distinguish single
> array item found from multiple items found wrapped into array.  And
> that seems very bad.  I know this behavior is inspired by SQL/JSON
> standard.  But since these functions are anyway our extension, we can
> implement them as we like.  So, I propose to make this function always
> return array of items regardless on count of those items (even for 1
> or 0 items).

Fair enough, but if we agree, that we provide an exact functionality of
SQL clauses, then better to follow the standard to avoid problems.


> 4) If we change behavior of jsonpath_query_wrapped() as above, we can
> also implement jsonpath_query_one() function, which would return first
> result item or NULL for no items.
> Any thoughts?

I think, that we should postpone this functional interface, which could be
added later.

>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company



--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

21 January 2019, 17:39:26

On Mon, Jan 21, 2019 at 6:05 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote:
> On Mon, Jan 21, 2019 at 1:40 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> >
> > On Sun, Jan 20, 2019 at 6:30 AM Alexander Korotkov
> > <a.korotkov@postgrespro.ru> wrote:
> > > I'll continue revising this patchset.  Nikita, could you please write
> > > tests for 3-argument versions of functions?  Also, please answer the
> > > question regarding "id" property.
> >
> > I've some more notes regarding function set provided in jsonpath patch:
> > 1) Function names don't follow the convention we have.  All our
> > functions dealing with jsonb have "jsonb_" prefix.  Exclusions have
> > "jsonb" in other part of name, for example, "to_jsonb(anyelement)".
> > We could name functions at SQL level in the same way they are named in
> > C.  So, they would be jsonb_jsonpath_exists() etc.  But it's probably
> > too long.  What about jsonb_path_exists() etc?
>
> jsonpath_exists is similar to xpath_exists.

That's true.  The question is whether it's more important to follow
json[b] naming convention or xml/xpath naming convention?  I guess
json[b] naming convention is more important in our case.

> Actually, we could use jsonb_exists(jsonb, jsonpath, json), but then
> we should specify the type of the second argument.

Yes, but AFAICS the key point of json[b]_ prefixes is to evade
function overloading.  So, I'm -1 for use jsonb_ prefix and have
function overload because of that.

> > 2) jsonpath_query_wrapped() name seems counter intuitive for me.  What
> > about jsonpath_query_array()?
>
> The question is should we try to provide a functional interface for
> all options of
> JSON_QUERY clause ?  The same is for  other SQL/JSON clauses.
> Currently,  patch contains very limited subset of JSON_QUERY
> functionality,  mostly for jsonpath testing.

Actually, my point is following.  We have jsonpath patch close to the
committable shape.  And we have patch for SQL/JSON clauses including
JSON_QUERY, which is huge, complex and didn't receive any serious
review yet.  So, we'll did our best on that patch during this release
cycle, but I can't guarantee it will get in PostgreSQL 12.  Thus, my
idea is to make jsonpath patch self contained by providing brief and
convenient procedural interface.  This procedural interface is anyway
not a part of standard.  It *might* be inspired by standard clauses,
but might be not.  I think we should try to make this procedural
interface as good and convenient by itself.  It's our extension, and
it wouldn't make us more or less standard conforming.

> > 3) jsonpath_query_wrapped() behavior looks tricky to me.  It returns
> > NULL for no results.  When result item is one, it is returned "as is".
> > When there are two or more result items, they are wrapped into array.
> > This representation of result seems extremely inconvenient for me.  It
> > becomes hard to solve even trivial tasks with that: for instance,
> > iterate all items found.  Without extra assumptions on what is going
> > to be returned it's also impossible for caller to distinguish single
> > array item found from multiple items found wrapped into array.  And
> > that seems very bad.  I know this behavior is inspired by SQL/JSON
> > standard.  But since these functions are anyway our extension, we can
> > implement them as we like.  So, I propose to make this function always
> > return array of items regardless on count of those items (even for 1
> > or 0 items).
>
> Fair enough, but if we agree, that we provide an exact functionality of
> SQL clauses, then better to follow the standard to avoid problems.

No, I see this as our extension.  And I don't see problems in being
different from standard clauses, because it's different anyway.  For
me, in this case it's better to evade problems of users.  And current
behavior of this function seems like just single big pain :)

> > 4) If we change behavior of jsonpath_query_wrapped() as above, we can
> > also implement jsonpath_query_one() function, which would return first
> > result item or NULL for no items.
> > Any thoughts?
>
> I think, that we should postpone this functional interface, which could be
> added later.

The reason we typically postpone things is that they are hard to bring
to committable shape.  jsonpath_query_one() doesn't cost us any real
development.  So, I don't see point in postponing that if consider
that as good part of procedural jsonpath interface.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

22 January 2019, 05:21:20

The next revision is attached.

Nikita made code and documentation improvements, renamed functions
from "jsonpath_" prefix to "jsonb_path_" prefix.  He also renamed
jsonpath_predicate() to jsonb_path_match() (that looks better for me
too).

I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and
changed that behavior to always wrap result into array.  Also, I've
introduced new function jsonb_query_first(), which returns first item
from result.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Oleg Bartunov

Date:

22 January 2019, 09:12:41

On Tue, Jan 22, 2019 at 8:21 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>
> The next revision is attached.
>
> Nikita made code and documentation improvements, renamed functions
> from "jsonpath_" prefix to "jsonb_path_" prefix.  He also renamed
> jsonpath_predicate() to jsonb_path_match() (that looks better for me
> too).
>
> I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and
> changed that behavior to always wrap result into array.

agree with new names

so it mimic the behaviour of JSON_QUERY with UNCONDITIONAL WRAPPER option

 Also, I've
> introduced new function jsonb_query_first(), which returns first item
> from result.
>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Nikita Glukhov

Date:

23 January 2019, 00:41:21

Attached 26th version of the patches:
* Documentation:   - Fixed some mistakes   - Removed mention of syntax extensions not present in this patch   - Documented '.datetime(format, default_tz)'* Removed accidental syntax extension allowing to write '@.key' without '@'

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

23 January 2019, 05:01:41

On Wed, Jan 23, 2019 at 3:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> Attached 26th version of the patches:
>
>  * Documentation:
>     - Fixed some mistakes
>     - Removed mention of syntax extensions not present in this patch
>     - Documented '.datetime(format, default_tz)'
>  * Removed accidental syntax extension allowing to write '@.key' without '@'

Thank you!

I've made few more changes to patchset:
 * Change back interface of JsonbExtractScalar().  I decide it would
be better to keep it.
 * Run pgindent
 * Improve documentation of .keyvalue() function.

Finally, I'm going to commit this if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

23 January 2019, 05:02:26

On Tue, Jan 22, 2019 at 12:13 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote:
>
> On Tue, Jan 22, 2019 at 8:21 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> >
> > The next revision is attached.
> >
> > Nikita made code and documentation improvements, renamed functions
> > from "jsonpath_" prefix to "jsonb_path_" prefix.  He also renamed
> > jsonpath_predicate() to jsonb_path_match() (that looks better for me
> > too).
> >
> > I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and
> > changed that behavior to always wrap result into array.
>
> agree with new names
>
> so it mimic the behaviour of JSON_QUERY with UNCONDITIONAL WRAPPER option

Good, thank you for feedback!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Nikita Glukhov

Date:

23 January 2019, 12:25:09

Attached 28th version of the patches.

On 23.01.2019 8:01, Alexander Korotkov wrote:

On Wed, Jan 23, 2019 at 3:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:

Attached 26th version of the patches:
* Documentation:   - Fixed some mistakes   - Removed mention of syntax extensions not present in this patch   - Documented '.datetime(format, default_tz)'* Removed accidental syntax extension allowing to write '@.key' without '@'

Thank you!

I've made few more changes to patchset:* Change back interface of JsonbExtractScalar().  I decide it would
be better to keep it.* Run pgindent

I have fixed indentation by adding new typedefs to 
src/tools/pgindent/typedefs.list.

Also I have renamed LikeRegexContext => JsonLikeRegexContext.

 * Improve documentation of .keyvalue() function.

Finally, I'm going to commit this if no objections.

Thank you!

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

23 January 2019, 22:16:22

On Wed, Jan 23, 2019 at 3:24 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> Attached 28th version of the patches.
>
> On 23.01.2019 8:01, Alexander Korotkov wrote:
>
> On Wed, Jan 23, 2019 at 3:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>
> Attached 26th version of the patches:
>
>  * Documentation:
>     - Fixed some mistakes
>     - Removed mention of syntax extensions not present in this patch
>     - Documented '.datetime(format, default_tz)'
>  * Removed accidental syntax extension allowing to write '@.key' without '@'
>
> Thank you!
>
> I've made few more changes to patchset:
>  * Change back interface of JsonbExtractScalar().  I decide it would
> be better to keep it.
>  * Run pgindent
>
> I have fixed indentation by adding new typedefs to
> src/tools/pgindent/typedefs.list.

Good catch, thank you!

> Also I have renamed LikeRegexContext => JsonLikeRegexContext.
>  * Improve documentation of .keyvalue() function.

Ok!

I've also fix bug with possible reference to already freed value of
jsonpath in jsonb_path_execute(), pointed by you.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

26 January 2019, 01:27:11

On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Finally, I'm going to commit this if no objections.

BTW, I decided to postpone commit for few days.  Nikita and me are
still working on better error messages.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

27 January 2019, 10:50:15

On Sat, Jan 26, 2019 at 4:27 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>
> On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Finally, I'm going to commit this if no objections.
>
> BTW, I decided to postpone commit for few days.  Nikita and me are
> still working on better error messages.

Updated patchset is attached.  This patchset includes:

* Improved error handling by Nikita, revised by me,
* Code beautification.

So, I'm going to commit this again.  This time seriously :)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

28 January 2019, 00:22:27

On Sun, Jan 27, 2019 at 1:50 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Sat, Jan 26, 2019 at 4:27 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> >
> > On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov
> > <a.korotkov@postgrespro.ru> wrote:
> > > Finally, I'm going to commit this if no objections.
> >
> > BTW, I decided to postpone commit for few days.  Nikita and me are
> > still working on better error messages.
>
> Updated patchset is attached.  This patchset includes:
>
> * Improved error handling by Nikita, revised by me,
> * Code beautification.
>
> So, I'm going to commit this again.  This time seriously :)

I'm really sorry for confusing people, but I've one more revision.
This is my first time attempting to commit such a large patch.

Major changes are following:
 * We find it ridiculous to save ErrorData for possible latter throw.
Now, we either throw an error immediately or return jperError.  That
also allows to get rid of unwanted changes in elog.c/elog.h.
 * I decided to change behavior of jsonb_path_match() to throw as less
errors as possible.  The reason is that it's used to implement
potentially (patch is pending) indexable operator.  Index scan is not
always capable to throw as many errors and sequential scan.  So, it's
better to not introduce extra possible index scan and sequential scan
results divergence.

So, this is version I'm going to commit unless Nikita has corrections
or anybody else objects.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Tomas Vondra

Date:

28 January 2019, 13:50:14


On 1/28/19 1:22 AM, Alexander Korotkov wrote:
> On Sun, Jan 27, 2019 at 1:50 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> On Sat, Jan 26, 2019 at 4:27 AM Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>>>
>>> On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov
>>> <a.korotkov@postgrespro.ru> wrote:
>>>> Finally, I'm going to commit this if no objections.
>>>
>>> BTW, I decided to postpone commit for few days.  Nikita and me are
>>> still working on better error messages.
>>
>> Updated patchset is attached.  This patchset includes:
>>
>> * Improved error handling by Nikita, revised by me,
>> * Code beautification.
>>
>> So, I'm going to commit this again.  This time seriously :)
> 
> I'm really sorry for confusing people, but I've one more revision.
> This is my first time attempting to commit such a large patch.
> 

I'm not sure what you're apologizing for, it simply shows how careful
you are when polishing such a large patch.

> Major changes are following:
>  * We find it ridiculous to save ErrorData for possible latter throw.
> Now, we either throw an error immediately or return jperError.  That
> also allows to get rid of unwanted changes in elog.c/elog.h.

OK.

>  * I decided to change behavior of jsonb_path_match() to throw as less
> errors as possible.  The reason is that it's used to implement
> potentially (patch is pending) indexable operator.  Index scan is not
> always capable to throw as many errors and sequential scan.  So, it's
> better to not introduce extra possible index scan and sequential scan
> results divergence.
> 

Hmmm, can you elaborate a bit more? Which errors were thrown before and
are not thrown with the current patch version?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Nikita Glukhov

Date:

28 January 2019, 23:18:23

On 28.01.2019 16:50, Tomas Vondra wrote:

On 1/28/19 1:22 AM, Alexander Korotkov wrote:

 * I decided to change behavior of jsonb_path_match() to throw as less
errors as possible.  The reason is that it's used to implement
potentially (patch is pending) indexable operator.  Index scan is not
always capable to throw as many errors and sequential scan.  So, it's
better to not introduce extra possible index scan and sequential scan
results divergence.

Hmmm, can you elaborate a bit more? Which errors were thrown before and
are not thrown with the current patch version?

In the previous version of the patch jsonb_path_match() threw error when
jsonpath did not return a singleton value, but in the last version in such cases
NULL is returned.  This problem arises because we cannot guarantee at compile
time that jsonpath expression used in jsonb_path_match() is a predicate.
Predicates by standard can return only True, False, and Unknown (errors occurred
during execution of their operands are transformed into Unknown values), so 
predicates cannot throw errors, and there are no problems with errors. 

GIN does not attempt to search non-predicate expressions, so there may be no 
problem even we throw "not a singleton" error.


Here I want to remind that ability to use predicates in the root of jsonpath
expression is an our extension to standard that was created specially for the
operator @@.  By standard predicates are allowed only in filters.  Without this
extension we are still able to rewrite @@ using @?:
jsonb @@ 'predicate'        is equivalent to
jsonb @? '$ ? (predicate)'
but such @? expression is a bit slower to execute and a bit verbose to write.

If we introduced special type 'jsonpath_predicate', then we could solve the
problem by checking the type of jsonpath expression at compile-time.


Another problem with error handling is that jsonb_path_query*() functions
always throw SQL/JSON errors and there is no easy and effective way to emulate
NULL ON ERROR behavior, which is used by default in SQL/JSON functions.  So I
think it's worth trying to add some kind of flag 'throwErrors' to
jsonb_path_query*() functions.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

29 January 2019, 01:00:19

On Tue, Jan 29, 2019 at 2:17 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> In the previous version of the patch jsonb_path_match() threw error when
> jsonpath did not return a singleton value, but in the last version in such cases
> NULL is returned.  This problem arises because we cannot guarantee at compile
> time that jsonpath expression used in jsonb_path_match() is a predicate.
> Predicates by standard can return only True, False, and Unknown (errors occurred
> during execution of their operands are transformed into Unknown values), so
> predicates cannot throw errors, and there are no problems with errors.

Attached patchset provides description of errors suppressed.  It also
clarifies how jsonb_path_match() works.

> GIN does not attempt to search non-predicate expressions, so there may be no
> problem even we throw "not a singleton" error.

Yes, I don't insist on that.  If majority of us wants to bring "not a
singleton" error back, I don't object to it.

> Here I want to remind that ability to use predicates in the root of jsonpath
> expression is an our extension to standard that was created specially for the
> operator @@.  By standard predicates are allowed only in filters.  Without this
> extension we are still able to rewrite @@ using @?:
> jsonb @@ 'predicate'        is equivalent to
> jsonb @? '$ ? (predicate)'
> but such @? expression is a bit slower to execute and a bit verbose to write.
>
> If we introduced special type 'jsonpath_predicate', then we could solve the
> problem by checking the type of jsonpath expression at compile-time.

For me it seems that separate datatype for this kind of problem is overkill.

> Another problem with error handling is that jsonb_path_query*() functions
> always throw SQL/JSON errors and there is no easy and effective way to emulate
> NULL ON ERROR behavior, which is used by default in SQL/JSON functions.  So I
> think it's worth trying to add some kind of flag 'throwErrors' to
> jsonb_path_query*() functions.

Good idea, but let's commit basic jsonpath implementation first.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On Tue, Jan 29, 2019 at 04:51:46AM +0300, Alexander Korotkov wrote:
> Oh, sorry.  I've missed we have ERRCODE_TO_CATEGORY() here.

Note: patch set moved to next CF, still waiting on author.
--
Michael

Attachment

signature.asc

Re: jsonpath

From

Alexander Korotkov

Date:

24 February 2019, 09:03:19

Hi!

Attached is revised version of jsonpath.  Assuming that jsonpath have
problem places, I decided to propose partial implementation.
Following functionality was cut from jsonpath:
1) Support of datetime datatypes.  Besides error suppression, this
functionality have troubles with timezones.  According to standard we
should support timezones in jsonpath expressions.  But that would
prevent jsonpath functions from being immutable, that in turn limits
their usage in expression indexes.
2) Suppression of numeric errors.  I will post it as a separate patch.
Pushing this even this partial implementation of jsonpath to PG 12 is
still very useful.  Also it will simplify a lot pushing other parts of
SQL/JSON to future releases.

Besides this changes, there is some refactoring and code beautification.

On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > +/*
> > + * Make datetime type from 'date_txt' which is formated at argument 'fmt'.
> > + * Actual datatype (returned in 'typid', 'typmod') is determined by
> > + * presence of date/time/zone components in the format string.
> > + */
> > +Datum
> > +to_datetime(text *date_txt, text *fmt, char *tzname, bool strict, Oid *typid,
> > +                     int32 *typmod, int *tz)
>
> Given other to_<type> functions being SQL callable, I'm not a fan of the
> naming here.

I've removed that for now.

> > +{
> > +     struct pg_tm tm;
> > +     fsec_t          fsec;
> > +     int                     fprec = 0;
> > +     int                     flags;
> > +
> > +     do_to_timestamp(date_txt, fmt, strict, &tm, &fsec, &fprec, &flags);
> > +
> > +     *typmod = fprec ? fprec : -1;   /* fractional part precision */
> > +     *tz = 0;
> > +
> > +     if (flags & DCH_DATED)
> > +     {
> > +             if (flags & DCH_TIMED)
> > +             {
> > +                     if (flags & DCH_ZONED)
> > +                     {
> > +                             TimestampTz result;
> > +
> > +                             if (tm.tm_zone)
> > +                                     tzname = (char *) tm.tm_zone;
> > +
> > +                             if (tzname)
> > +                             {
> > +                                     int                     dterr = DecodeTimezone(tzname, tz);
> > +
> > +                                     if (dterr)
> > +                                             DateTimeParseError(dterr, tzname, "timestamptz");
>
>
> Do we really need 6/7 indentation levels in new code?

Also, removed that.

> > +jsonpath_scan.c: FLEXFLAGS = -CF -p -p
>
> Why -CF, and why is -p repeated?

BTW, for our SQL grammar we have

> scan.c: FLEXFLAGS = -CF -p -p

Is it kind of default?

> > -                             JsonEncodeDateTime(buf, val, DATEOID);
> > +                             JsonEncodeDateTime(buf, val, DATEOID, NULL);
>
> ISTM API changes like this ought to be done in a separate patch, to ease
> review.

Also, removed that for now.

> >  }
> >
> > +
> >  /*
> >   * Compare two jbvString JsonbValue values, a and b.
> >   *
>
> Random WS change.

Reverted

> No binary input support? I'd suggest adding that, but keeping the
> representation the same. Otherwise just adds unnecesary complexity for
> driver authors wishing to use the binar protocol.

Binary support is added.  I decided to make it in jsonb manner.
Format version 1 is text format, but it's possible we would have other
versions in future.

> > +/********************Support functions for JsonPath**************************/
> > +
> > +/*
> > + * Support macroses to read stored values
> > + */
>
> s/macroses/macros/

Fixed.

> > @@ -0,0 +1,2776 @@
> > +/*-------------------------------------------------------------------------
> > + *
> > + * jsonpath_exec.c
> > + *    Routines for SQL/JSON path execution.
> > + *
> > + * Copyright (c) 2019, PostgreSQL Global Development Group
> > + *
> > + * IDENTIFICATION
> > + *   src/backend/utils/adt/jsonpath_exec.c
> > + *
> > + *-------------------------------------------------------------------------
> > + */
>
> Somewhere there needs to be higher level documentation explaining how
> this stuff is supposed to work on a code level.

High level comments are added to jsonpath.c (about jsonpath binary
representation) jsonpath_exec.c (about jsonpath execution).

> > +
> > +/* strict/lax flags is decomposed into four [un]wrap/error flags */
> > +#define jspStrictAbsenseOfErrors(cxt)        (!(cxt)->laxMode)
> > +#define jspAutoUnwrap(cxt)                           ((cxt)->laxMode)
> > +#define jspAutoWrap(cxt)                             ((cxt)->laxMode)
> > +#define jspIgnoreStructuralErrors(cxt)       ((cxt)->ignoreStructuralErrors)
> > +#define jspThrowErrors(cxt)                          ((cxt)->throwErrors)
>
> What's the point?

These are convenience macros, which improves code readability.  For
instance, instead of direct checking for laxMode, we check for
particular aspects of its behavior.  For code reader, it becomes
easier to understand why do we behave one way or another.

> > +#define ThrowJsonPathError(code, detail) \
> > +     ereport(ERROR, \
> > +                      (errcode(ERRCODE_ ## code), \
> > +                       errmsg(ERRMSG_ ## code), \
> > +                       errdetail detail))
> > +
> > +#define ThrowJsonPathErrorHint(code, detail, hint) \
> > +     ereport(ERROR, \
> > +                      (errcode(ERRCODE_ ## code), \
> > +                       errmsg(ERRMSG_ ## code), \
> > +                       errdetail detail, \
> > +                       errhint hint))
>
> These seem ill-advised, just making it harder to understand the code,
> and grepping for it, without actually meaningfully simplifying anything.

Removed.  Instead, I've introduced RETURN_ERROR() macro, which either
returns jperError or issues ereport given in the argument.

> > +/*
> > + * Find value of jsonpath variable in a list of passing params
> > + */
>
> What is that comment supposed to mean?

Comment is rephrased.

> > +/*
> > + * Get the type name of a SQL/JSON item.
> > + */
> > +static const char *
> > +JsonbTypeName(JsonbValue *jb)
> > +{
>
> Wasn't there pretty similar infrastructure elsewhere?

Yes, but it wasn't exposed.  Moved to jsonb.c

 > > +/*
> > + * Cross-type comparison of two datetime SQL/JSON items.  If items are
> > + * uncomparable, 'error' flag is set.
> > + */
>
> Sounds like it'd not raise an error, but it does in a bunch of scenarios.

Removed as well.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

0001-Jsonpath-engine-and-docs-v33.patch

Re: jsonpath

From

Tomas Vondra

Date:

24 February 2019, 11:44:46

Hi,

On 2/24/19 10:03 AM, Alexander Korotkov wrote:
> Hi!
> 
> Attached is revised version of jsonpath.  Assuming that jsonpath have
> problem places, I decided to propose partial implementation.
> Following functionality was cut from jsonpath:
> 1) Support of datetime datatypes.  Besides error suppression, this
> functionality have troubles with timezones.  According to standard we
> should support timezones in jsonpath expressions.  But that would
> prevent jsonpath functions from being immutable, that in turn limits
> their usage in expression indexes.

Assuming we get the patch committed without the datetime stuff now, what
does that mean for the future? Does that mean we'll be unable to extend
it to support datetime in the future, or what? If we define the jsonpath
functions as immutable now, people may create indexes - which means we
won't be able to downgrade it to stable later.

So, what's the plan here? The only thing I can think of is having two
sets of functions - an immutable one, prohibiting datetime expressions,
and stable that can't be used for indexes etc.

> 2) Suppression of numeric errors.  I will post it as a separate patch.
> Pushing this even this partial implementation of jsonpath to PG 12 is
> still very useful.  Also it will simplify a lot pushing other parts of
> SQL/JSON to future releases.
> 

+1 to push at least partial (but still useful) subset, instead of just
bumping the patch to PG13

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Alexander Korotkov

Date:

24 February 2019, 12:34:22

On Sun, Feb 24, 2019 at 2:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 2/24/19 10:03 AM, Alexander Korotkov wrote:
> > Attached is revised version of jsonpath.  Assuming that jsonpath have
> > problem places, I decided to propose partial implementation.
> > Following functionality was cut from jsonpath:
> > 1) Support of datetime datatypes.  Besides error suppression, this
> > functionality have troubles with timezones.  According to standard we
> > should support timezones in jsonpath expressions.  But that would
> > prevent jsonpath functions from being immutable, that in turn limits
> > their usage in expression indexes.
>
> Assuming we get the patch committed without the datetime stuff now, what
> does that mean for the future? Does that mean we'll be unable to extend
> it to support datetime in the future, or what? If we define the jsonpath
> functions as immutable now, people may create indexes - which means we
> won't be able to downgrade it to stable later.
>
> So, what's the plan here? The only thing I can think of is having two
> sets of functions - an immutable one, prohibiting datetime expressions,
> and stable that can't be used for indexes etc.

Reasonable question.  As I understand, not datetime support itself
making jsonpath functions not immutable, but implicit cast happening
during comparison timestamp vs. timestamptz (and time vs. timetz).
So, in future immutable functions can have limited support of
datetime, where comparison of non-tz vs. tz types is restricted.  And
stable versions of functions (for instance, with '_tz' prefix) with
full datetime support.

> > 2) Suppression of numeric errors.  I will post it as a separate patch.
> > Pushing this even this partial implementation of jsonpath to PG 12 is
> > still very useful.  Also it will simplify a lot pushing other parts of
> > SQL/JSON to future releases.
> >
>
> +1 to push at least partial (but still useful) subset, instead of just
> bumping the patch to PG13

Thank you for support!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Nikita Glukhov

Date:

01 March 2019, 00:36:49

Attached 34th version of the patches.

1. Partial jsonpath support:  - Fixed copying of jsonb with vars jsonb_path_query() into SRF context  - Fixed error message for jsonpath vars  - Fixed file-level comment in jsonpath.c

2. Suppression of numeric errors:  Now error handling is done without PG_TRY/PG_CATCH using a bunch of internal  numeric functions with 'bool *error' flag.

3. Datetime support:  Problems with timzeones still exist.  Comparison of tz types with  non-tz types is simply disallowed.  Default integer timezone offset (not  abbreviation) can be specified with the second .datetime() argument.  Error handling also is done using internal functions.

4. Json type support:  Json support was completely refactored since v23: double compilation with  function redefinitions was replaced with passing 'isJsonb' flag to low-level  json/jsonb access functions.  Also major refactoring with introduction of struct JsonItem was made.  JsonItem is used in executor instead of raw JsonbValue.  This helps to avoid  extending of JsonbValue for datetime types and also other numeric types   (integers and floats) required by standard.

5. GIN support:  Nothing was changed since v23.


Patch 1 is what we are going to commit in PG12.
Patches 2 and 3 add code that was removed in the previous v33 version.

On 24.02.2019 15:34, Alexander Korotkov wrote:

On Sun, Feb 24, 2019 at 2:44 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 2/24/19 10:03 AM, Alexander Korotkov wrote:

Attached is revised version of jsonpath.  Assuming that jsonpath have
problem places, I decided to propose partial implementation.
Following functionality was cut from jsonpath:
1) Support of datetime datatypes.  Besides error suppression, this
functionality have troubles with timezones.  According to standard we
should support timezones in jsonpath expressions.  But that would
prevent jsonpath functions from being immutable, that in turn limits
their usage in expression indexes.

Assuming we get the patch committed without the datetime stuff now, what
does that mean for the future? Does that mean we'll be unable to extend
it to support datetime in the future, or what? If we define the jsonpath
functions as immutable now, people may create indexes - which means we
won't be able to downgrade it to stable later.

So, what's the plan here? The only thing I can think of is having two
sets of functions - an immutable one, prohibiting datetime expressions,
and stable that can't be used for indexes etc.

Reasonable question.  As I understand, not datetime support itself
making jsonpath functions not immutable, but implicit cast happening
during comparison timestamp vs. timestamptz (and time vs. timetz).
So, in future immutable functions can have limited support of
datetime, where comparison of non-tz vs. tz types is restricted.  And
stable versions of functions (for instance, with '_tz' prefix) with
full datetime support.

I can also offset to explicitly pass timezone info into jsonpath function using
the special user dataype encapsulating struct pg_tz.

But simple integer timezone offset can be passed now using jsonpath variables
(standard says only about integer timezone offsets; also it requires presence
of timezone offset it in the input string if the format string contain timezone 
components):

=# SELECT jsonb_path_query(    '"28-02-2019 12:34"',    '$.datetime("DD-MM-YYYY HH24:MI TZH", $tz)',    jsonb_build_object('tz', EXTRACT(TIMEZONE FROM now()))  );     jsonb_path_query       
-----------------------------"2019-02-28T12:34:00+03:00"
(1 row)

2) Suppression of numeric errors.  I will post it as a separate patch.
Pushing this even this partial implementation of jsonpath to PG 12 is
still very useful.  Also it will simplify a lot pushing other parts of
SQL/JSON to future releases.

+1 to push at least partial (but still useful) subset, instead of just
bumping the patch to PG13

See patch #2.

Thank you for support!

02 March 2019, 05:15:33

Hi!

On Fri, Mar 1, 2019 at 3:36 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>
> Attached 34th version of the patches.
>
> 1. Partial jsonpath support:
>    - Fixed copying of jsonb with vars jsonb_path_query() into SRF context
>    - Fixed error message for jsonpath vars
>    - Fixed file-level comment in jsonpath.c
>
> 2. Suppression of numeric errors:
>    Now error handling is done without PG_TRY/PG_CATCH using a bunch of internal
>    numeric functions with 'bool *error' flag.

Revised patches 1 and 2 are attached.  Changes are following

 * Small refactoring, comments adjustment and function renaming.  In
particular, I've removed "recursive" prefix from function names,
because it actually not that informative assuming header comment
explains that the whole jsonpath executor is recursive.  Also, I made
"Unwrap" suffix more clear.  Not it's distinguished what is unwrapped
target (UnwrapTarget) or result (UnwrapResult).  Also, now it's clear
that function doesn't always unwraps but has an option to do so
(OptUnwrap).
 * Some more cases are covered by regression tests.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Pavel Stehule

Date:

03 March 2019, 17:23:02

so 2. 3. 2019 v 6:15 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal:

Hi!

On Fri, Mar 1, 2019 at 3:36 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>
> Attached 34th version of the patches.
>
> 1. Partial jsonpath support:
> - Fixed copying of jsonb with vars jsonb_path_query() into SRF context
> - Fixed error message for jsonpath vars
> - Fixed file-level comment in jsonpath.c
>
> 2. Suppression of numeric errors:
> Now error handling is done without PG_TRY/PG_CATCH using a bunch of internal
> numeric functions with 'bool *error' flag.

Revised patches 1 and 2 are attached. Changes are following

* Small refactoring, comments adjustment and function renaming. In
particular, I've removed "recursive" prefix from function names,
because it actually not that informative assuming header comment
explains that the whole jsonpath executor is recursive. Also, I made
"Unwrap" suffix more clear. Not it's distinguished what is unwrapped
target (UnwrapTarget) or result (UnwrapResult). Also, now it's clear
that function doesn't always unwraps but has an option to do so
(OptUnwrap).
* Some more cases are covered by regression tests.

These patches are large, but I think so the granularity and modularity of these patches are correct.

Now, it looks very well.

Pavel

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Tomas Vondra

Date:

03 March 2019, 18:08:01

Hi,

Here are some initial comments from a review of the 0001 part. I plan to
do more testing on a large data set and additional round of review over
the next week. FWIW I've passed this through valgrind and the usual
battery of regression tests, and there were no issues.

I haven't looked at 0002 yet, but it seems fairly small (especially
compared to 0001).

func.sgml
=========

1) I see the changes removed <indexterm zone="functions-json"> for some
reason. Is that intentional?

2) <command>WHERE</command>

    We generally tag <literal>WHERE</literal>, not <command>.

3) Filter expressions are applied from left to right

    Perhaps s/applied/evaluated/ in reference to expressions?

4) The result of the filter expression may be true, false, or unknown.

    It's not entirely clear to me what "unknown" means here. NULL, or
    something else? There's a section in the SQL/JSON standard
    explaining this (page 83), but perhaps we should explain it a bit
    here too?

    The standard says "In the SQL/JSON data model, there are no SQL
    nulls, so Unknown is not part of the SQL/JSON data model." so I'm a
    bit confused what "unknown" references to. Maybe some example?

    Also, what happens when the result is unknown?


5) There's an example showing how to apply filter at a certain level,
using the @ variable, but what if we want to apply multiple filters at
different levels? Would it make sense to add such example?

6) ... extensions of the SQL/JSON standard

    I'm not sure what "extension" is here. Is that an extension defined
    in the SQL standard, or an additional PostgreSQL functionality not
    described in the standard? (I assume the latter, just checking.)


7) There are references to "SQL/JSON sequences" without any explanation
what it means. Maybe I'm missing something obvious, though.

8) Implicit unwrapping in the lax mode is not performed in the following
cases:

    I suggest to reword it like this:

In the lax mode, implicit unwrapping is not performed when:

9) We're not adding the datetime() method for now, due to the issues
with timezones. I wonder if we should add a note why it's missing to the
docs ...

10) I'm a bit puzzled though, because the standard says this in the
description of type() function on page 77

    If I is a datetime, then “date”, “time without time zone”, “time
    with time zone”, “timestamp without time zone”, or “timestamp with
    time zone”, as appropriate.

But I see our type() function does not return anything like that (which
I presume is independent of timezone stuff). But I see jsonb.h has no
concept of datetime values, and the standard actually says this in the
datetime() section

    JSON has no datetime types. Datetime values are most likely stored
    in character strings.

Considering this, the type() section makes little sense, no?


jsonb_util.c
============

I see we're now handling NaN values in convertJsonbScalar(). Isn't it
actually a bug that we don't do this already? Or is it not needed for
some reason?

jsonpath.c
==========

I suppose this should say "jsonpath version number" instead?

    elog(ERROR, "unsupported jsonb version number %d", version);


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Andrew Dunstan

Date:

04 March 2019, 01:37:50

On 3/3/19 1:08 PM, Tomas Vondra wrote:
>
>
> jsonb_util.c
> ============
>
> I see we're now handling NaN values in convertJsonbScalar(). Isn't it
> actually a bug that we don't do this already? Or is it not needed for
> some reason?
>

JSON standard numerics don't support NaN, Infinity etc., so I assume
this can only happen in a jsonpath expression being converted to a jsonb
value. If so, the new section  should contain a comment to that effect,
otherwise it will be quite confusing.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Tomas Vondra

Date:

04 March 2019, 23:27:46

A bunch of additional comments, after looking at the patch a bit today.
All are mostly minor, and sometime perhaps a matter of preference.


1) There's a mismatch between the comment and actual function name for
jsonb_path_match_opr and jsonb_path_exists_opr(). The comments say
"_novars" instead.

2) In a couple of switches the "default" case does a return with a
value, following elog(ERROR). So it's practically unreachable, AFAICS
it's fine without it, and we don't do this elsewhere. And I don't get
any compiler warnings if I remove it either.

Examples:

JsonbTypeName

    default:
        elog(ERROR, "unrecognized jsonb value type: %d", jbv->type);
        return "unknown";

jspOperationName

    default:
        elog(ERROR, "unrecognized jsonpath item type: %d", type);
        return NULL;

compareItems

    default:
        elog(ERROR, "unrecognized jsonpath operation: %d", op);
        return jpbUnknown;

3) jsonpath_send is using makeStringInfo() for a value that is not
returned - IMHO it should use regular stack-allocated variable and use
initStringInfo() instead

4) the version number should be defined/used as a constant, not as a
magic constant somewhere in the code

5) Why does jsonPathToCstring do this?

    appendBinaryStringInfo(out, "strict ", 7);

Why not to use regular appendStringInfoString()? What am I missing?

6) comment typo: "Aling StringInfo"

7) alignStringInfoInt() should explain why we need this and why INTALIGN
is the right alignment.

8) I'm a bit puzzled by what flattenJsonPathParseItem does with 'next'

    I don't quite understand what it's doing with 'next' value?

    /*
     * Actual value will be recorded later, after next and children
     * processing.
     */
    appendBinaryStringInfo(buf,
                           (char *) &next, /* fake value */
                           sizeof(next));

Perhaps a comment explaining it (why we need a fake value at all?) would
be a good idea here.

9) I see printJsonPathItem is only calling check_stack_depth while
flattenJsonPathParseItem also calls CHECK_INTERRUPTS. Why the
difference, considering they seem about equally expensive?

10) executeNumericItemMethod is missing a comment (unlike the other
executeXXX functions)

11) Wording of some of the error messages in the execute methods seems a
bit odd. For example executeNumericItemMethod may complain that it

    ... is applied to not a numeric value

but perhaps a more natural wording would be

    ... is applied to a non-numeric value

And similarly for the other execute methods. But I'm not a native
speaker, so perhaps the original wording is just fine.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Robert Haas

Date:

05 March 2019, 16:39:27

On Mon, Mar 4, 2019 at 6:27 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> 11) Wording of some of the error messages in the execute methods seems a
> bit odd. For example executeNumericItemMethod may complain that it
>
>         ... is applied to not a numeric value
>
> but perhaps a more natural wording would be
>
>         ... is applied to a non-numeric value
>
> And similarly for the other execute methods. But I'm not a native
> speaker, so perhaps the original wording is just fine.

As a native speaker I can confirm that the first wording is definitely
not OK.  The second one is tolerable, but I wonder if there is
something better, like "can only be applied to a numeric value" or
maybe there's a way to rephrase it so that we complain about the
non-numeric value itself rather than the application, e.g. ERROR:
json_frobnitz can only frob numeric values or ERROR: argument to
json_frobnitz must be numeric.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: jsonpath

From

Nikita Glukhov

Date:

05 March 2019, 21:40:25

Attached 36th version of the patches.

On 03.03.2019 21:08, Tomas Vondra wrote:

Hi,

Here are some initial comments from a review of the 0001 part. I plan to
do more testing on a large data set and additional round of review over
the next week. FWIW I've passed this through valgrind and the usual
battery of regression tests, and there were no issues.

Thanks again for your review.

I haven't looked at 0002 yet, but it seems fairly small (especially
compared to 0001).

func.sgml
=========

1) I see the changes removed <indexterm zone="functions-json"> for some
reason. Is that intentional?

Fixed.

2) <command>WHERE</command>
   We generally tag <literal>WHERE</literal>, not <command>.

Fixed.

3) Filter expressions are applied from left to right
   Perhaps s/applied/evaluated/ in reference to expressions?

Fixed.

4) The result of the filter expression may be true, false, or unknown.
   It's not entirely clear to me what "unknown" means here. NULL, or   something else? There's a section in the SQL/JSON standard   explaining this (page 83), but perhaps we should explain it a bit   here too?
   The standard says "In the SQL/JSON data model, there are no SQL   nulls, so Unknown is not part of the SQL/JSON data model." so I'm a   bit confused what "unknown" references to. Maybe some example?
   Also, what happens when the result is unknown?

"unknown" refers here to ordinary three-valued logical Unknown, which is
represented in SQL by NULL.

JSON path expressions return sequences of SQL/JSON items, which are defined by
SQL/JSON data model.  But JSON path predicates (logical expressions), which are
used in filters, return three-valued logical values: False, True, or Unknown.

Filters accept only items for which the filter predicate returned True. False
and Unknown results are skipped.

Unknown can be checked with IS UNKNOWN predicate:

SELECT jsonb_path_query_array('[1, "1", true, null]',                             '$[*] ? ((@ < 2) is unknown)');jsonb_path_query_array 
------------------------["1", true]
(1 row)

Comparison of JSON nulls to non-nulls returns always False, not Unknown (see
SQL/JSON data model). Comparison of non-comparable items return Unknown.

5) There's an example showing how to apply filter at a certain level,
using the @ variable, but what if we want to apply multiple filters at
different levels? Would it make sense to add such example?

Examples were added.

Filters can be nested, but by the standard it is impossible to reference item
of the outer filter, because @ always references current item of the innermost
filter.  We have additional patch with a simple jsonpath syntax extension for 
this case -- @N:@0, like @, references innermost item@1 references item one level upper@2 references item two levels upper

For example, selecting all objects from array
'[{"vals": [1,2,3], "val": 2}, {"vals": [4,5], "val": 6}]'

having in their .vals[] element greater than their .val field:
'$[*] ? (@.vals[*] ? (@ > @1.val))'

It is impossible to do this by the standard in the single jsonpath expression.


Also there is idea to use lambda expressions with ECMAScript 6 syntax in
filters and user method (see below):

'$[*] ? (obj => obj.vals[*] ? (val => val > obj.val))'

I already have a patch implementing lambda expressions, which were necessary
for implementaion of built-in/user-written functions and methods like
.map(), reduce(), max().  I can post it again if it is interesting.

6) ... extensions of the SQL/JSON standard
   I'm not sure what "extension" is here. Is that an extension defined   in the SQL standard, or an additional PostgreSQL functionality not   described in the standard? (I assume the latter, just checking.)

Yes, this is additional functionality.

 "Writing the path as an expression is also valid"   I have moved this into SQL/JSON patch, because it is related only path   specification in SQL/JSON functions.

7) There are references to "SQL/JSON sequences" without any explanation
what it means. Maybe I'm missing something obvious, though.

SQL/JSON sequence is a sequence of SQL/JSON items.
The corresponding definition was added.

8) Implicit unwrapping in the lax mode is not performed in the following
cases:
   I suggest to reword it like this:

In the lax mode, implicit unwrapping is not performed when:

Fixed. I also decided to merge this paragraph with paragraph describing 
auto-unwrapping.

9) We're not adding the datetime() method for now, due to the issues
with timezones. I wonder if we should add a note why it's missing to the
docs ...

The corresponding note was added.

10) I'm a bit puzzled though, because the standard says this in the
description of type() function on page 77
   If I is a datetime, then “date”, “time without time zone”, “time   with time zone”, “timestamp without time zone”, or “timestamp with   time zone”, as appropriate.

But I see our type() function does not return anything like that (which
I presume is independent of timezone stuff). But I see jsonb.h has no
concept of datetime values, and the standard actually says this in the
datetime() section
   JSON has no datetime types. Datetime values are most likely stored   in character strings.

Considering this, the type() section makes little sense, no?

According to the SQL/JSON data model, SQL/JSON items can be null, string, 
boolean, numeric, and datetime.

Datetime items exists only at execution time, they are serialized into JSON
strings when the resulting SQL/JSON item is converted into jsonb. After removal
of .datetime() method, support of datetime SQL/JSON items was removed from
jsonpath executor too.

Numeric items can be of any numeric type.  In PostgreSQL we have the following
numeric datatypes: integers, floats and numeric.  But our jsonpath executor
supports now only numeric-typed items, because this is only type we can get
directly from jsonb.  Support for other numeric datatypes (float8 is most
necessary, because it is produced by .double() item method) can be added by
extending JsonbValue or by introducing struct JsonItem in executor, having
JsonbValue as a part (see patch #4 in v34).

jsonb_util.c
============

I see we're now handling NaN values in convertJsonbScalar(). Isn't it
actually a bug that we don't do this already? Or is it not needed for
some reason?

Numeric JsonbValues created outside of jsonpath executor cannot be NaN, because
this case in explicitly handled in JsonbValue-producing functions.  For example,
datum_to_jsonb() which is used in to_jsonb(), jsonb_build_array() and others
converts Inf and NaN into JSON strings.  But jsonb_plperl and jsonb_plpython
transforms do not allow NaNs (I think it is needed only for consistency of
"PL => jsonb => PL" roundtrip).

In our jsonpath executor we can produce NaNs and we should not touch them before
conversion to the resulting jsonb.  Moreover, in SQL/JSON functions numeric
SQL/JSON item can be directly converted into numeric RETURNING type.  So, I
decided to add a check for NaN to the low-level convertJsonbScalar() instead of
checking items before every call to JsonbValueToJsonb() or pushJsonbValue().
But after introduction of struct JsonItem (see patch #4) introduction there will
appropriate place for this check -- JsonItemToJsonbValue().

==========

I suppose this should say "jsonpath version number" instead?
   elog(ERROR, "unsupported jsonb version number %d", version);

Fixed.

On 05.03.2019 2:27, Tomas Vondra wrote:

A bunch of additional comments, after looking at the patch a bit today.
All are mostly minor, and sometime perhaps a matter of preference.


1) There's a mismatch between the comment and actual function name for
jsonb_path_match_opr and jsonb_path_exists_opr(). The comments say
"_novars" instead.

Fixed.

2) In a couple of switches the "default" case does a return with a
value, following elog(ERROR). So it's practically unreachable, AFAICS
it's fine without it, and we don't do this elsewhere. And I don't get
any compiler warnings if I remove it either.

Examples:

JsonbTypeName
   default:       elog(ERROR, "unrecognized jsonb value type: %d", jbv->type);       return "unknown";

jspOperationName
   default:       elog(ERROR, "unrecognized jsonpath item type: %d", type);       return NULL;

compareItems
   default:       elog(ERROR, "unrecognized jsonpath operation: %d", op);       return jpbUnknown;

It seems to be a standard practice in jsonb code, so we followed it.

3) jsonpath_send is using makeStringInfo() for a value that is not
returned - IMHO it should use regular stack-allocated variable and use
initStringInfo() instead

Fixed.

4) the version number should be defined/used as a constant, not as a
magic constant somewhere in the code

Fixed.

5) Why does jsonPathToCstring do this?
   appendBinaryStringInfo(out, "strict ", 7);

Why not to use regular appendStringInfoString()? What am I missing?

appendStringInfoString() is a bit slower than appendBinaryStringInfo() 
because to strlen() call inside it.

6) comment typo: "Aling StringInfo"

Fixed.

7) alignStringInfoInt() should explain why we need this and why INTALIGN
is the right alignment.

Comment was added.

8) I'm a bit puzzled by what flattenJsonPathParseItem does with 'next'
   I don't quite understand what it's doing with 'next' value?
   /*    * Actual value will be recorded later, after next and children    * processing.    */   appendBinaryStringInfo(buf,                          (char *) &next, /* fake value */                          sizeof(next));

Perhaps a comment explaining it (why we need a fake value at all?) would
be a good idea here.

No fake value is needed here, zero placeholder is enough. I have refactored
this using new function reserveSpaceForItemPointer().

9) I see printJsonPathItem is only calling check_stack_depth while
flattenJsonPathParseItem also calls CHECK_INTERRUPTS. Why the
difference, considering they seem about equally expensive?

CHECK_INTERRUPT() was added to printJsonPathItem() too.

10) executeNumericItemMethod is missing a comment (unlike the other
executeXXX functions)

Comment was added.

11) Wording of some of the error messages in the execute methods seems a
bit odd. For example executeNumericItemMethod may complain that it
... is applied to not a numeric value

but perhaps a more natural wording would be
... is applied to a non-numeric value

And similarly for the other execute methods. But I'm not a native
speaker, so perhaps the original wording is just fine.

Error messages were changed on the advice of Robert Haas to:
"... can only be applied to a numeric value"

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

10 March 2019, 10:51:45

Hi!

On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> Attached 36th version of the patches.

Thank yo for the revision!

In the attached revision following changes are made:

> "unknown" refers here to ordinary three-valued logical Unknown, which is
> represented in SQL by NULL.
>
> JSON path expressions return sequences of SQL/JSON items, which are defined by
> SQL/JSON data model.  But JSON path predicates (logical expressions), which are
> used in filters, return three-valued logical values: False, True, or Unknown.

 * I've added short explanation of this to the documentation.
 * Removed no longer present data structures from typedefs.list of the
first patch.
 * Moved GIN support patch to number 3.  Seems to be well-isolated and
not very complex patch.  I propose to consider this to 12 too.  I
added high-level comment there, commit message and made some code
beautification.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

14 March 2019, 09:07:13

On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> > Attached 36th version of the patches.
>
> Thank yo for the revision!
>
> In the attached revision following changes are made:
>
> > "unknown" refers here to ordinary three-valued logical Unknown, which is
> > represented in SQL by NULL.
> >
> > JSON path expressions return sequences of SQL/JSON items, which are defined by
> > SQL/JSON data model.  But JSON path predicates (logical expressions), which are
> > used in filters, return three-valued logical values: False, True, or Unknown.
>
>  * I've added short explanation of this to the documentation.
>  * Removed no longer present data structures from typedefs.list of the
> first patch.
>  * Moved GIN support patch to number 3.  Seems to be well-isolated and
> not very complex patch.  I propose to consider this to 12 too.  I
> added high-level comment there, commit message and made some code
> beautification.

I think patches 1 and 2 are in committable shape (I reached Tomas
off-list, he doesn't have more notes regarding them).  While patch 3
requires more review.

I'm going to push 1 and 2 if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

16 March 2019, 09:35:52

On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> > > Attached 36th version of the patches.
> >
> > Thank yo for the revision!
> >
> > In the attached revision following changes are made:
> >
> > > "unknown" refers here to ordinary three-valued logical Unknown, which is
> > > represented in SQL by NULL.
> > >
> > > JSON path expressions return sequences of SQL/JSON items, which are defined by
> > > SQL/JSON data model.  But JSON path predicates (logical expressions), which are
> > > used in filters, return three-valued logical values: False, True, or Unknown.
> >
> >  * I've added short explanation of this to the documentation.
> >  * Removed no longer present data structures from typedefs.list of the
> > first patch.
> >  * Moved GIN support patch to number 3.  Seems to be well-isolated and
> > not very complex patch.  I propose to consider this to 12 too.  I
> > added high-level comment there, commit message and made some code
> > beautification.
>
> I think patches 1 and 2 are in committable shape (I reached Tomas
> off-list, he doesn't have more notes regarding them).  While patch 3
> requires more review.
>
> I'm going to push 1 and 2 if no objections.

So, pushed.  Many thanks to reviewers and authors!

Remaining part I'm proposing for 12 is attached.  I appreciate review of it.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

0001-Jsonpath-GIN-support-v38.path

Re: jsonpath

From

Jeff Janes

Date:

16 March 2019, 17:51:44

On Sat, Mar 16, 2019 at 5:36 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

So, pushed. Many thanks to reviewers and authors!

I think these files have to be cleaned up by "make maintainer-clean"

./src/backend/utils/adt/jsonpath_gram.c

./src/backend/utils/adt/jsonpath_scan.c

Cheers,

Jeff

Re: jsonpath

From

Alexander Korotkov

Date:

16 March 2019, 17:55:16

сб, 16 мар. 2019 г., 20:52 Jeff Janes <jeff.janes@gmail.com>:

On Sat, Mar 16, 2019 at 5:36 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

So, pushed. Many thanks to reviewers and authors!

I think these files have to be cleaned up by "make maintainer-clean"

./src/backend/utils/adt/jsonpath_gram.c
./src/backend/utils/adt/jsonpath_scan.c

Good catch, thanks! Will fix.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Tom Lane

Date:

16 March 2019, 17:59:04

Jeff Janes <jeff.janes@gmail.com> writes:
> I think these files have to be cleaned up by "make maintainer-clean"

> ./src/backend/utils/adt/jsonpath_gram.c
> ./src/backend/utils/adt/jsonpath_scan.c

Good point.  I also see jsonpath_gram.h left behind after maintainer-clean:

$ git status --ignored
On branch master
Your branch is up-to-date with 'origin/master'.
Ignored files:
  (use "git add -f <file>..." to include in what will be committed)

        src/backend/utils/adt/jsonpath_gram.c
        src/backend/utils/adt/jsonpath_scan.c
        src/include/utils/jsonpath_gram.h

Looks like whoever modified src/backend/Makefile's distprep target
didn't bother to read the comment.

            regards, tom lane

Re: jsonpath

From

Tom Lane

Date:

16 March 2019, 18:11:58

I wrote:
> Good point.  I also see jsonpath_gram.h left behind after maintainer-clean:

Oh, and of potentially more significance: after maintainer-clean and
re-configure, make fails with

In file included from jsonpath_gram.y:24:
../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory

I first thought this was a problem with insufficient dependencies
allowing parallel make to do things in the wrong order, but the problem
repeats even without any parallelism.  It looks like the dependencies
have been constructed in such a way that if the symlink at
src/include/utils/jsonpath_gram.h exists but the underlying file
does not, nothing will make the underlying file.  This is pretty broken;
aside from this outright failure, it also suggests that nothing will
update that file if it exists but is out of date relative to its
sources.

Please make sure that the make rules associated with these files look
exactly like the previously-debugged rules for existing bison/flex
output files.  There are generally good reasons for every last bit
of weirdness in those.

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

16 March 2019, 18:16:21

сб, 16 мар. 2019 г., 21:12 Tom Lane <tgl@sss.pgh.pa.us>:

I wrote:
> Good point. I also see jsonpath_gram.h left behind after maintainer-clean:

Oh, and of potentially more significance: after maintainer-clean and
re-configure, make fails with

In file included from jsonpath_gram.y:24:
../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory

I first thought this was a problem with insufficient dependencies
allowing parallel make to do things in the wrong order, but the problem
repeats even without any parallelism. It looks like the dependencies
have been constructed in such a way that if the symlink at
src/include/utils/jsonpath_gram.h exists but the underlying file
does not, nothing will make the underlying file. This is pretty broken;
aside from this outright failure, it also suggests that nothing will
update that file if it exists but is out of date relative to its
sources.

Please make sure that the make rules associated with these files look
exactly like the previously-debugged rules for existing bison/flex
output files. There are generally good reasons for every last bit
of weirdness in those.

Uh, I didn't check that carefully enough. Thank you for the explanation. Will fix.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Pavel Stehule

Date:

16 March 2019, 18:38:22

so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal:

On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> > > Attached 36th version of the patches.
> >
> > Thank yo for the revision!
> >
> > In the attached revision following changes are made:
> >
> > > "unknown" refers here to ordinary three-valued logical Unknown, which is
> > > represented in SQL by NULL.
> > >
> > > JSON path expressions return sequences of SQL/JSON items, which are defined by
> > > SQL/JSON data model. But JSON path predicates (logical expressions), which are
> > > used in filters, return three-valued logical values: False, True, or Unknown.
> >
> > * I've added short explanation of this to the documentation.
> > * Removed no longer present data structures from typedefs.list of the
> > first patch.
> > * Moved GIN support patch to number 3. Seems to be well-isolated and
> > not very complex patch. I propose to consider this to 12 too. I
> > added high-level comment there, commit message and made some code
> > beautification.
>
> I think patches 1 and 2 are in committable shape (I reached Tomas
> off-list, he doesn't have more notes regarding them). While patch 3
> requires more review.
>
> I'm going to push 1 and 2 if no objections.

So, pushed. Many thanks to reviewers and authors!

Remaining part I'm proposing for 12 is attached. I appreciate review of it.

I tested this patch and I didn't find any issue - just I tested basic functionality and regress tests.

looks well

Pavel

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

17 March 2019, 07:13:08

On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal:
>>
>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>> > On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
>> > <a.korotkov@postgrespro.ru> wrote:
>> > > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>> > > > Attached 36th version of the patches.
>> > >
>> > > Thank yo for the revision!
>> > >
>> > > In the attached revision following changes are made:
>> > >
>> > > > "unknown" refers here to ordinary three-valued logical Unknown, which is
>> > > > represented in SQL by NULL.
>> > > >
>> > > > JSON path expressions return sequences of SQL/JSON items, which are defined by
>> > > > SQL/JSON data model.  But JSON path predicates (logical expressions), which are
>> > > > used in filters, return three-valued logical values: False, True, or Unknown.
>> > >
>> > >  * I've added short explanation of this to the documentation.
>> > >  * Removed no longer present data structures from typedefs.list of the
>> > > first patch.
>> > >  * Moved GIN support patch to number 3.  Seems to be well-isolated and
>> > > not very complex patch.  I propose to consider this to 12 too.  I
>> > > added high-level comment there, commit message and made some code
>> > > beautification.
>> >
>> > I think patches 1 and 2 are in committable shape (I reached Tomas
>> > off-list, he doesn't have more notes regarding them).  While patch 3
>> > requires more review.
>> >
>> > I'm going to push 1 and 2 if no objections.
>>
>> So, pushed.  Many thanks to reviewers and authors!
>>
>> Remaining part I'm proposing for 12 is attached.  I appreciate review of it.
>
>
> I tested this patch and I didn't  find any issue - just I tested basic functionality and regress tests.
>
> looks well

Thank you for your feedback!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

17 March 2019, 08:03:44

On Sat, Mar 16, 2019 at 9:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Good point.  I also see jsonpath_gram.h left behind after maintainer-clean:
>
> Oh, and of potentially more significance: after maintainer-clean and
> re-configure, make fails with
>
> In file included from jsonpath_gram.y:24:
> ../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory
>
> I first thought this was a problem with insufficient dependencies
> allowing parallel make to do things in the wrong order, but the problem
> repeats even without any parallelism.  It looks like the dependencies
> have been constructed in such a way that if the symlink at
> src/include/utils/jsonpath_gram.h exists but the underlying file
> does not, nothing will make the underlying file.  This is pretty broken;
> aside from this outright failure, it also suggests that nothing will
> update that file if it exists but is out of date relative to its
> sources.
>
> Please make sure that the make rules associated with these files look
> exactly like the previously-debugged rules for existing bison/flex
> output files.  There are generally good reasons for every last bit
> of weirdness in those.

I've pushed a fix.  I hope I didn't forget anything.

BTW, it appears that windows build scripts also needs some fixup.  I'm
not very familiar with that.  I would be glad if somebody review the
attached patch.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

jsonpath_windows_build_fixes.patch

Re: jsonpath

From

Andrew Dunstan

Date:

17 March 2019, 14:00:44

On 3/17/19 4:03 AM, Alexander Korotkov wrote:
> On Sat, Mar 16, 2019 at 9:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wrote:
>>> Good point.  I also see jsonpath_gram.h left behind after maintainer-clean:
>> Oh, and of potentially more significance: after maintainer-clean and
>> re-configure, make fails with
>>
>> In file included from jsonpath_gram.y:24:
>> ../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory
>>
>> I first thought this was a problem with insufficient dependencies
>> allowing parallel make to do things in the wrong order, but the problem
>> repeats even without any parallelism.  It looks like the dependencies
>> have been constructed in such a way that if the symlink at
>> src/include/utils/jsonpath_gram.h exists but the underlying file
>> does not, nothing will make the underlying file.  This is pretty broken;
>> aside from this outright failure, it also suggests that nothing will
>> update that file if it exists but is out of date relative to its
>> sources.
>>
>> Please make sure that the make rules associated with these files look
>> exactly like the previously-debugged rules for existing bison/flex
>> output files.  There are generally good reasons for every last bit
>> of weirdness in those.
> I've pushed a fix.  I hope I didn't forget anything.
>
> BTW, it appears that windows build scripts also needs some fixup.  I'm
> not very familiar with that.  I would be glad if somebody review the
> attached patch.




Why are we installing the jsonpath_gram.h file? It's not going to be
used by anything else, is it? TBH, I'm not sure I see why we're
installing the scanner.h file either.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Tom Lane

Date:

17 March 2019, 15:03:25

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> Why are we installing the jsonpath_gram.h file? It's not going to be
> used by anything else, is it? TBH, I'm not sure I see why we're
> installing the scanner.h file either.

As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only
for communication between jsonpath_gram.y and jsonpath_scan.l.  Maybe
we'd be better off handling that need by compiling the .l file as part
of the .y file, as we used to do with the core lexer and still do with
several others (cf comments for commit 72b1e3a21); then we wouldn't
even have to generate these files much less install them.

A quick look at jsonpath_scan.c shows that it's pretty innocent of
the tricks we've learned over the years for flex/bison portability;
in particular I see that it's #including <stdio.h> before postgres.h,
which is a no-no.  So that whole area needs more review anyway.

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

17 March 2019, 16:31:38

On Sun, Mar 17, 2019 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> > Why are we installing the jsonpath_gram.h file? It's not going to be
> > used by anything else, is it? TBH, I'm not sure I see why we're
> > installing the scanner.h file either.
>
> As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only
> for communication between jsonpath_gram.y and jsonpath_scan.l.  Maybe
> we'd be better off handling that need by compiling the .l file as part
> of the .y file, as we used to do with the core lexer and still do with
> several others (cf comments for commit 72b1e3a21); then we wouldn't
> even have to generate these files much less install them.
>
> A quick look at jsonpath_scan.c shows that it's pretty innocent of
> the tricks we've learned over the years for flex/bison portability;
> in particular I see that it's #including <stdio.h> before postgres.h,
> which is a no-no.  So that whole area needs more review anyway.

Yeah, I didn't review this part of patch carefully enough (and it
seems that other reviewers too).  I'm going to write a patch revising
this part in next couple of days.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

"Jonathan S. Katz"

Date:

17 March 2019, 16:46:01

On 3/17/19 3:13 AM, Alexander Korotkov wrote:
> On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal:
>>>
>>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
>>> <a.korotkov@postgrespro.ru> wrote:
>>>> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
>>>> <a.korotkov@postgrespro.ru> wrote:
>>>>> On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>>>>>> Attached 36th version of the patches.
>>>>>
>>>>> Thank yo for the revision!
>>>>>
>>>>> In the attached revision following changes are made:
>>>>>
>>>>>> "unknown" refers here to ordinary three-valued logical Unknown, which is
>>>>>> represented in SQL by NULL.
>>>>>>
>>>>>> JSON path expressions return sequences of SQL/JSON items, which are defined by
>>>>>> SQL/JSON data model.  But JSON path predicates (logical expressions), which are
>>>>>> used in filters, return three-valued logical values: False, True, or Unknown.
>>>>>
>>>>>  * I've added short explanation of this to the documentation.
>>>>>  * Removed no longer present data structures from typedefs.list of the
>>>>> first patch.
>>>>>  * Moved GIN support patch to number 3.  Seems to be well-isolated and
>>>>> not very complex patch.  I propose to consider this to 12 too.  I
>>>>> added high-level comment there, commit message and made some code
>>>>> beautification.
>>>>
>>>> I think patches 1 and 2 are in committable shape (I reached Tomas
>>>> off-list, he doesn't have more notes regarding them).  While patch 3
>>>> requires more review.
>>>>
>>>> I'm going to push 1 and 2 if no objections.
>>>
>>> So, pushed.  Many thanks to reviewers and authors!
>>>
>>> Remaining part I'm proposing for 12 is attached.  I appreciate review of it.
>>
>>
>> I tested this patch and I didn't  find any issue - just I tested basic functionality and regress tests.
>>
>> looks well
>
> Thank you for your feedback!

Like Pavel, I did some basic testing of the patch (on current HEAD
4178d8b91c) trying out various JSON path expressions, and yes, it all
worked. I had a brief scare while testing on 4178d8b91c where initdb was
failing on the bootstrapping step, but after doing a thorough wipe of
build files and my output directory, it seems to be initializing okay.

I also did some testing of the GIN patch upthread, as the quickness of
retrieval of the data using JSON path is of course important as well.
Using a schema roughly like this:

CREATE TABLE news_feed (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    data jsonb NOT NULL
);
CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data);

I loaded in a data set of roughly 420,000 rows. Each row had all the
same keys but differing values (e.g. "length" and "content" as keys)

I tested a few different JSON path scenarios. Some of the index scans
performed way better than the equivalent sequential scans, for instance:

SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ == 200)';

SELECT *
FROM news_feed
WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")';

Using the index outperformed the sequential scan (and parallel seq scan)
by ~10-100x based on my config + laptop hardware!

However, when I did something a little more complex, like the below:

SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ < 150)';

SELECT count(*)
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "^Start")';

SELECT id, jsonb_path_query(data, '$.content')
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';

I would find that the index scan performed as well as the sequential
scan. Additionally, on my laptop, the parallel sequential scan would
beat the index scan by ~2.5x in some cases.

Reading up on what the GIN patch does, this all makes sense: it's
optimized for equality, I understand there are challenges to be able to
handle inequality, regex exps, etc. And the cases where it really does
work well, it's _incredibly_ fast.

My suggestion would be adding some additional guidance in the user
documentation around how GIN works with the @@ and @? operators so they
can understand where GIN will work very well with JSON path + their data
and not be surprised when other types of JSON path queries are
performing on par with a sequential scan (or worse than a parallel seq
scan).

Thanks,

Jonathan

Attachment

signature.asc

Re: jsonpath

From

Alexander Korotkov

Date:

17 March 2019, 16:55:20

Hi!

On Sun, Mar 17, 2019 at 7:46 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> Like Pavel, I did some basic testing of the patch (on current HEAD
> 4178d8b91c) trying out various JSON path expressions, and yes, it all
> worked. I had a brief scare while testing on 4178d8b91c where initdb was
> failing on the bootstrapping step, but after doing a thorough wipe of
> build files and my output directory, it seems to be initializing okay.
>
> I also did some testing of the GIN patch upthread, as the quickness of
> retrieval of the data using JSON path is of course important as well.

Thank you very much for testing!

> Using a schema roughly like this:
>
> CREATE TABLE news_feed (
>     id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
>     data jsonb NOT NULL
> );
> CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data);
>
> I loaded in a data set of roughly 420,000 rows. Each row had all the
> same keys but differing values (e.g. "length" and "content" as keys)
>
> I tested a few different JSON path scenarios. Some of the index scans
> performed way better than the equivalent sequential scans, for instance:
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.length ? (@ == 200)';
>
> SELECT *
> FROM news_feed
> WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")';
>
> Using the index outperformed the sequential scan (and parallel seq scan)
> by ~10-100x based on my config + laptop hardware!

Great!

> However, when I did something a little more complex, like the below:
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.length ? (@ < 150)';
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.content ? (@ like_regex "^Start")';
>
> SELECT id, jsonb_path_query(data, '$.content')
> FROM news_feed
> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
>
> I would find that the index scan performed as well as the sequential
> scan. Additionally, on my laptop, the parallel sequential scan would
> beat the index scan by ~2.5x in some cases.

Yeah, this cases are not supported.  Did optimizer automatically
select sequential scan in this case (if not touching enable_*
variables)?  It should, because optimizer understands that GIN scan
will be bad if extract_query method failed to extract anything.

> Reading up on what the GIN patch does, this all makes sense: it's
> optimized for equality, I understand there are challenges to be able to
> handle inequality, regex exps, etc. And the cases where it really does
> work well, it's _incredibly_ fast.

Yes, for more complex cases, we need different opclasses.  For
instance, we can consider porting jsquery opclasses to PG 13.  And it
become even more important to get parametrized opclasses, because we
don't necessary want to index all the json fields in this same way.
That's another challenge for future releases.  But what we have now is
just support for some of jsonpathes for existing opclasses.

> My suggestion would be adding some additional guidance in the user
> documentation around how GIN works with the @@ and @? operators so they
> can understand where GIN will work very well with JSON path + their data
> and not be surprised when other types of JSON path queries are
> performing on par with a sequential scan (or worse than a parallel seq
> scan).

Good point.  Will do.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

"Jonathan S. Katz"

Date:

17 March 2019, 17:00:29

On 3/17/19 12:55 PM, Alexander Korotkov wrote:
>
>> However, when I did something a little more complex, like the below:
>>
>> SELECT count(*)
>> FROM news_feed
>> WHERE data @? '$.length ? (@ < 150)';
>>
>> SELECT count(*)
>> FROM news_feed
>> WHERE data @? '$.content ? (@ like_regex "^Start")';
>>
>> SELECT id, jsonb_path_query(data, '$.content')
>> FROM news_feed
>> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
>>
>> I would find that the index scan performed as well as the sequential
>> scan. Additionally, on my laptop, the parallel sequential scan would
>> beat the index scan by ~2.5x in some cases.
>
> Yeah, this cases are not supported.  Did optimizer automatically
> select sequential scan in this case (if not touching enable_*
> variables)?  It should, because optimizer understands that GIN scan
> will be bad if extract_query method failed to extract anything.

It did not - it was doing a bitmap heap scan. I have default costs
setup. Example output from EXPLAIN ANALYZE with the index available:

 Aggregate  (cost=1539.78..1539.79 rows=1 width=8) (actual
time=270.419..270.419 rows=1 loops=1)
   ->  Bitmap Heap Scan on news_feed  (cost=23.24..1538.73 rows=418
width=0) (actual time=84.040..270.407 rows=5 loops=1)
         Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
         Rows Removed by Index Recheck: 418360
         Heap Blocks: exact=28690
         ->  Bitmap Index Scan on news_feed_data_gin_idx
(cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788
rows=418365 loops=1)
               Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
 Planning Time: 0.168 ms
 Execution Time: 271.105 ms

And for arguments sake, after I dropped the index (and
max_parallel_workers = 8):

 Finalize Aggregate  (cost=30998.07..30998.08 rows=1 width=8) (actual
time=91.062..91.062 rows=1 loops=1)
   ->  Gather  (cost=30997.65..30998.06 rows=4 width=8) (actual
time=90.892..97.739 rows=5 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Partial Aggregate  (cost=29997.65..29997.66 rows=1 width=8)
(actual time=76.977..76.977 rows=1 loops=5)
               ->  Parallel Seq Scan on news_feed  (cost=0.00..29997.39
rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5)
                     Filter: (data @? '$."length"?(@ < 150)'::jsonpath)
                     Rows Removed by Filter: 83672
 Planning Time: 0.127 ms
 Execution Time: 97.801 ms


>> Reading up on what the GIN patch does, this all makes sense: it's
>> optimized for equality, I understand there are challenges to be able to
>> handle inequality, regex exps, etc. And the cases where it really does
>> work well, it's _incredibly_ fast.
>
> Yes, for more complex cases, we need different opclasses.  For
> instance, we can consider porting jsquery opclasses to PG 13.  And it
> become even more important to get parametrized opclasses, because we
> don't necessary want to index all the json fields in this same way.
> That's another challenge for future releases.  But what we have now is
> just support for some of jsonpathes for existing opclasses.

Yeah, that makes sense, and seems to be my recollection from the several
years of presentations I've seen on the topic ;)

>> My suggestion would be adding some additional guidance in the user
>> documentation around how GIN works with the @@ and @? operators so they
>> can understand where GIN will work very well with JSON path + their data
>> and not be surprised when other types of JSON path queries are
>> performing on par with a sequential scan (or worse than a parallel seq
>> scan).
>
> Good point.  Will do.

Thanks!

Jonathan

Attachment

signature.asc

Re: jsonpath

From

Alexander Korotkov

Date:

17 March 2019, 17:02:41

On Sun, Mar 17, 2019 at 8:00 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> On 3/17/19 12:55 PM, Alexander Korotkov wrote:
> >
> >> However, when I did something a little more complex, like the below:
> >>
> >> SELECT count(*)
> >> FROM news_feed
> >> WHERE data @? '$.length ? (@ < 150)';
> >>
> >> SELECT count(*)
> >> FROM news_feed
> >> WHERE data @? '$.content ? (@ like_regex "^Start")';
> >>
> >> SELECT id, jsonb_path_query(data, '$.content')
> >> FROM news_feed
> >> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
> >>
> >> I would find that the index scan performed as well as the sequential
> >> scan. Additionally, on my laptop, the parallel sequential scan would
> >> beat the index scan by ~2.5x in some cases.
> >
> > Yeah, this cases are not supported.  Did optimizer automatically
> > select sequential scan in this case (if not touching enable_*
> > variables)?  It should, because optimizer understands that GIN scan
> > will be bad if extract_query method failed to extract anything.
>
> It did not - it was doing a bitmap heap scan. I have default costs
> setup. Example output from EXPLAIN ANALYZE with the index available:
>
>  Aggregate  (cost=1539.78..1539.79 rows=1 width=8) (actual
> time=270.419..270.419 rows=1 loops=1)
>    ->  Bitmap Heap Scan on news_feed  (cost=23.24..1538.73 rows=418
> width=0) (actual time=84.040..270.407 rows=5 loops=1)
>          Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
>          Rows Removed by Index Recheck: 418360
>          Heap Blocks: exact=28690
>          ->  Bitmap Index Scan on news_feed_data_gin_idx
> (cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788
> rows=418365 loops=1)
>                Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
>  Planning Time: 0.168 ms
>  Execution Time: 271.105 ms
>
> And for arguments sake, after I dropped the index (and
> max_parallel_workers = 8):
>
>  Finalize Aggregate  (cost=30998.07..30998.08 rows=1 width=8) (actual
> time=91.062..91.062 rows=1 loops=1)
>    ->  Gather  (cost=30997.65..30998.06 rows=4 width=8) (actual
> time=90.892..97.739 rows=5 loops=1)
>          Workers Planned: 4
>          Workers Launched: 4
>          ->  Partial Aggregate  (cost=29997.65..29997.66 rows=1 width=8)
> (actual time=76.977..76.977 rows=1 loops=5)
>                ->  Parallel Seq Scan on news_feed  (cost=0.00..29997.39
> rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5)
>                      Filter: (data @? '$."length"?(@ < 150)'::jsonpath)
>                      Rows Removed by Filter: 83672
>  Planning Time: 0.127 ms
>  Execution Time: 97.801 ms

Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

"Jonathan S. Katz"

Date:

17 March 2019, 17:06:04

On 3/17/19 1:02 PM, Alexander Korotkov wrote:
>
> Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?

I just used "USING gin(col)" so jsonb_ops.

Thanks,

Jonathan

Attachment

signature.asc

Re: jsonpath

From

Alexander Korotkov

Date:

17 March 2019, 17:14:24

On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> On 3/17/19 1:02 PM, Alexander Korotkov wrote:
> >
> > Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?
>
> I just used "USING gin(col)" so jsonb_ops.

I see.  So, jsonb_ops extracts from this query only existence of
.length key.  And I can bet it exists in all (or almost all) the
documents.  Thus, optimizer thinks index might be useful, while it's
useless.  There is not much can be done while we don't have statistics
for jsonb (and access to it from GIN extract_query).  So, for now we
can just refuse from extracting only keys from jsonpath in jsonb_ops.
But I think it would be better to just document this issue.  In future
we should improve that with statistics.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

"Jonathan S. Katz"

Date:

17 March 2019, 18:29:44

On 3/17/19 1:14 PM, Alexander Korotkov wrote:
> On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>> On 3/17/19 1:02 PM, Alexander Korotkov wrote:
>>>
>>> Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?
>>
>> I just used "USING gin(col)" so jsonb_ops.
>
> I see.  So, jsonb_ops extracts from this query only existence of
> .length key.  And I can bet it exists in all (or almost all) the
> documents.  Thus, optimizer thinks index might be useful, while it's
> useless.  There is not much can be done while we don't have statistics
> for jsonb (and access to it from GIN extract_query).  So, for now we
> can just refuse from extracting only keys from jsonpath in jsonb_ops.
> But I think it would be better to just document this issue.  In future
> we should improve that with statistics.

That seems to make sense, especially given how I've typically stored
JSON documents in PostgreSQL. It sounds like this particular problem
would be solved appropriately with JSONB statistics.

Thanks,

Jonathan

Attachment

signature.asc

Re: jsonpath

From

Nikita Glukhov

Date:

17 March 2019, 22:57:36

On 17.03.2019 21:29, Jonathan S. Katz wrote:

On 3/17/19 1:14 PM, Alexander Korotkov wrote:

On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:

On 3/17/19 1:02 PM, Alexander Korotkov wrote:

Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?

I just used "USING gin(col)" so jsonb_ops.

I see.  So, jsonb_ops extracts from this query only existence of
.length key.  And I can bet it exists in all (or almost all) the
documents.  Thus, optimizer thinks index might be useful, while it's
useless.  There is not much can be done while we don't have statistics
for jsonb (and access to it from GIN extract_query).  So, for now we
can just refuse from extracting only keys from jsonpath in jsonb_ops.
But I think it would be better to just document this issue.  In future
we should improve that with statistics.

That seems to make sense, especially given how I've typically stored
JSON documents in PostgreSQL. It sounds like this particular problem
would be solved appropriately with JSONB statistics.

GIN jsonb_ops extracts from query
 data @? '$.length ? (@ < 150)'

the same GIN entries as from queries
 data @? '$.length' data ? 'length'


If you don't want to extract entries from unsupported expressions, you can try 
to use another jsonpath operator @@. Queries will also look like a bit simpler:
 data @@ '$.length < 150' data @@ '$.content like_regex "^Start"' data @@ '$.content like_regex "risk" flag "i"'

All this queries emit no GIN entries. But note that
 data @@ '$ ? (@.content == "foo").length < 150'

emits the same entries as
 data @@ '$.content == "foo"'



We already have a POC implementation of jsonb statistics that was written
2 years ago.  I rebased it onto the current master yesterday.  If it is
interesting, you can find it on my GitHub [1].  But note, that there is 
no support for  jsonpath operators yet, only boolean EXISTS ?, ?|, ?&, and
CONTAINS @> operators are supported.  Also there is no docs, and it works
slowly  (a more effective storage method for statistics of individual JSON
paths is needed).

Also there is ability to calculate derived statistics of expressions like
 js -> 'x' -> 0 -> 'y' js #> '{x,0,y}'

using jsonb statistics for columns "js". So the selectivity of expressions
 js -> 'x' -> 0 -> 'y' = '123' js #> '{x,0,y}' >= '123'

also can be estimated (but these expressions can't be used by index on "js").


This topic deserves a separate discussion.  I hope, we will start the 
corresponding thread for PG13 after we find a more effective way of jsonb 
statistics storing.


[1] https://github.com/glukhovn/postgres/tree/jsonb_stats

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Tom Lane

Date:

18 March 2019, 19:08:19

Just another minor bitch about this patch: jsonpath_scan.l has introduced
a typedef called "keyword".  This is causing pgindent to produce seriously
ugly results in libpq, and probably in other places where that is used as
a field or variable name.  Please rename that typedef to something less
generic.

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

18 March 2019, 19:49:46

On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Just another minor bitch about this patch: jsonpath_scan.l has introduced
> a typedef called "keyword".  This is causing pgindent to produce seriously
> ugly results in libpq, and probably in other places where that is used as
> a field or variable name.  Please rename that typedef to something less
> generic.

Ooops... I propose to rename it to KeyWord, which is already
typedef'ed in formatting.c.  See the attached patch.  Is it OK?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

jsonpath_rename_typedef.patch

Re: jsonpath

From

Tom Lane

Date:

18 March 2019, 20:23:03

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Just another minor bitch about this patch: jsonpath_scan.l has introduced
>> a typedef called "keyword".  This is causing pgindent to produce seriously
>> ugly results in libpq, and probably in other places where that is used as
>> a field or variable name.  Please rename that typedef to something less
>> generic.

> Ooops... I propose to rename it to KeyWord, which is already
> typedef'ed in formatting.c.  See the attached patch.  Is it OK?

I had in mind JsonPathKeyword or something like that.  If you re-use
formatting.c's typedef name, pgindent won't care, but it's possible
you'd be in for unhappiness when trying to look at these structs in
gdb for instance.

            regards, tom lane

Re: jsonpath

From

Pavel Stehule

Date:

18 March 2019, 20:24:33

po 18. 3. 2019 v 21:23 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Just another minor bitch about this patch: jsonpath_scan.l has introduced
>> a typedef called "keyword". This is causing pgindent to produce seriously
>> ugly results in libpq, and probably in other places where that is used as
>> a field or variable name. Please rename that typedef to something less
>> generic.

> Ooops... I propose to rename it to KeyWord, which is already
> typedef'ed in formatting.c. See the attached patch. Is it OK?

I had in mind JsonPathKeyword or something like that. If you re-use
formatting.c's typedef name, pgindent won't care, but it's possible
you'd be in for unhappiness when trying to look at these structs in
gdb for instance.

JsonPathKeyword is better verbose

Pavel

regards, tom lane

Re: jsonpath

From

John Naylor

Date:

19 March 2019, 09:23:31

On Sun, Feb 24, 2019 at 5:03 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote:
> > Why -CF, and why is -p repeated?
>
> BTW, for our SQL grammar we have
>
> > scan.c: FLEXFLAGS = -CF -p -p
>
> Is it kind of default?

I just saw this in the committed patch. This is not default, it's
chosen for maximum performance at the expense of binary/memory size.
That's fine, but with a little effort you can also make the scanner
non-backtracking for additional performance, as in the attached.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

jsonpath-scan-nobackup.patch

Re: jsonpath

From

Alexander Korotkov

Date:

19 March 2019, 10:41:42

On Mon, Mar 18, 2019 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Just another minor bitch about this patch: jsonpath_scan.l has introduced
> >> a typedef called "keyword".  This is causing pgindent to produce seriously
> >> ugly results in libpq, and probably in other places where that is used as
> >> a field or variable name.  Please rename that typedef to something less
> >> generic.
>
> > Ooops... I propose to rename it to KeyWord, which is already
> > typedef'ed in formatting.c.  See the attached patch.  Is it OK?
>
> I had in mind JsonPathKeyword or something like that.  If you re-use
> formatting.c's typedef name, pgindent won't care, but it's possible
> you'd be in for unhappiness when trying to look at these structs in
> gdb for instance.

Good point, thanks!  Pushed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

19 March 2019, 10:43:12

On Tue, Mar 19, 2019 at 12:23 PM John Naylor
<john.naylor@2ndquadrant.com> wrote:
> On Sun, Feb 24, 2019 at 5:03 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote:
> > > Why -CF, and why is -p repeated?
> >
> > BTW, for our SQL grammar we have
> >
> > > scan.c: FLEXFLAGS = -CF -p -p
> >
> > Is it kind of default?
>
> I just saw this in the committed patch. This is not default, it's
> chosen for maximum performance at the expense of binary/memory size.
> That's fine, but with a little effort you can also make the scanner
> non-backtracking for additional performance, as in the attached.

We're working on patchset improving flex scanner.  Already have
similar change among the others.  But thanks a lot for your attention!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

19 March 2019, 17:10:31

On Sun, Mar 17, 2019 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> > Why are we installing the jsonpath_gram.h file? It's not going to be
> > used by anything else, is it? TBH, I'm not sure I see why we're
> > installing the scanner.h file either.
>
> As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only
> for communication between jsonpath_gram.y and jsonpath_scan.l.  Maybe
> we'd be better off handling that need by compiling the .l file as part
> of the .y file, as we used to do with the core lexer and still do with
> several others (cf comments for commit 72b1e3a21); then we wouldn't
> even have to generate these files much less install them.
>
> A quick look at jsonpath_scan.c shows that it's pretty innocent of
> the tricks we've learned over the years for flex/bison portability;
> in particular I see that it's #including <stdio.h> before postgres.h,
> which is a no-no.  So that whole area needs more review anyway.

Attached patch is getting rid of jsonpath_gram.h.  Would like to see
results of http://commitfest.cputube.org/ before committing, because
I'm not available to test this of windows machine.  There would be
further patches rearranging jsonpath_gram.y and jsonpath_scan.l.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

0001-get-rid-of-jsonpath_gram_h.patch

Re: jsonpath

From

Alexander Korotkov

Date:

21 March 2019, 13:58:46

On Tue, Mar 19, 2019 at 8:10 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Sun, Mar 17, 2019 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> > > Why are we installing the jsonpath_gram.h file? It's not going to be
> > > used by anything else, is it? TBH, I'm not sure I see why we're
> > > installing the scanner.h file either.
> >
> > As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only
> > for communication between jsonpath_gram.y and jsonpath_scan.l.  Maybe
> > we'd be better off handling that need by compiling the .l file as part
> > of the .y file, as we used to do with the core lexer and still do with
> > several others (cf comments for commit 72b1e3a21); then we wouldn't
> > even have to generate these files much less install them.
> >
> > A quick look at jsonpath_scan.c shows that it's pretty innocent of
> > the tricks we've learned over the years for flex/bison portability;
> > in particular I see that it's #including <stdio.h> before postgres.h,
> > which is a no-no.  So that whole area needs more review anyway.
>
> Attached patch is getting rid of jsonpath_gram.h.  Would like to see
> results of http://commitfest.cputube.org/ before committing, because
> I'm not available to test this of windows machine.  There would be
> further patches rearranging jsonpath_gram.y and jsonpath_scan.l.

Attaches patches improving jsonpath parser.  First one introduces
cosmetic changes, while second gets rid of backtracking.  I'm also
planning to add high-level comment for both grammar and lexer.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Nikita Glukhov

Date:

22 March 2019, 00:15:07

Hi.

Attached patch enables throwing of errors in jsonb_path_match() in its 
non-silent mode when the jsonpath expression failed to return a singleton
boolean.  Previously, NULL was always returned, and it seemed to be 
inconsistent with the behavior of other functions, in which structural and 
other errors were not suppressed in non-silent mode.


We also think that jsonb_path_match() needs to be renamed, because its 
current name is confusing to many users.  "Match" name is more suitable for
jsonpath-based pattern matching like that (maybe we'll implement it later):
 jsonb_path_match(   '{ "a": 1, "b": 2,    "c": "str" }',   '{ "a": 1, "b": @ > 1, * : @.type == "string" }' )

Below are some possible name variants:
 jsonb_path_predicate() (original name) jsonb_path_pred() jsonb_path_test() jsonb_path_check() jsonb_path_bool()

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

0001-Throw-error-in-jsonb_path_match-when-silent-is-false.patch

Re: jsonpath

From

John Naylor

Date:

22 March 2019, 02:38:16

On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Attaches patches improving jsonpath parser.  First one introduces
> cosmetic changes, while second gets rid of backtracking.  I'm also
> planning to add high-level comment for both grammar and lexer.

The cosmetic changes look good to me. I just noticed a couple things
about the comments.

0001:

+/* Check if current scanstring value constitutes a keyword */

'is a keyword' is better. 'Constitutes' implies parts of a whole.

+ * Resize scanstring for appending of given length.  Reinitilize if required.

s/Reinitilize/Reinitialize/

The first sentence is not entirely clear to me.


0002:

These two rules are not strictly necessary:

<xnq,xq,xvq,xsq>{unicode}+\\ {
/* throw back the \\, and treat as unicode */
yyless(yyleng - 1);
parseUnicode(yytext, yyleng);
}

<xnq,xq,xvq,xsq>{hex_char}+\\ {
/* throw back the \\, and treat as hex */
yyless(yyleng - 1);
parseHexChars(yytext, yyleng);
}

...and only seem to be there because of how these are written:

<xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); }
<xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); }
<xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode
sequence is invalid"); }
<xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character
sequence is invalid"); }

I don't understand the reasoning here -- is it a micro-optimization?
The following is simpler, allow the rules I mentioned to be removed,
and make check still passes. I would prefer it unless there is a
performance penalty, in which case a comment to describe the
additional complexity would be helpful.

<xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); }
<xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); }
<xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); }
<xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is
invalid"); }

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Oleg Bartunov

Date:

22 March 2019, 03:46:35

On Fri, 22 Mar 2019, 03:14 Nikita Glukhov, <n.gluhov@postgrespro.ru> wrote:

Hi.

Attached patch enables throwing of errors in jsonb_path_match() in its 
non-silent mode when the jsonpath expression failed to return a singleton
boolean.  Previously, NULL was always returned, and it seemed to be 
inconsistent with the behavior of other functions, in which structural and 
other errors were not suppressed in non-silent mode.


We also think that jsonb_path_match() needs to be renamed, because its 
current name is confusing to many users.  "Match" name is more suitable for
jsonpath-based pattern matching like that (maybe we'll implement it later):
 jsonb_path_match(   '{ "a": 1, "b": 2,    "c": "str" }',   '{ "a": 1, "b": @ > 1, * : @.type == "string" }' )

Would be very useful for constraints.

Below are some possible name variants:
 jsonb_path_predicate() (original name)

Too long

  jsonb_path_pred() jsonb_path_test() jsonb_path_check()

Check is good to me

 jsonb_path_bool()

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Nikita Glukhov

Date:

22 March 2019, 11:48:30

On 21.03.2019 16:58, Alexander Korotkov wrote:

On Tue, Mar 19, 2019 at 8:10 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

Attaches patches improving jsonpath parser.  First one introduces
cosmetic changes, while second gets rid of backtracking.  I'm also
planning to add high-level comment for both grammar and lexer.

Parsing of integers now is wrong: neither JSON specification, nor our json[b]
allow leading zeros in integers and floats starting with a dot.

=# SELECT json '.1';
ERROR:  invalid input syntax for type json
LINE 1: SELECT jsonb '.1';                    ^
DETAIL:  Token "." is invalid.
CONTEXT:  JSON data, line 1: ....

=# SELECT json '00';
ERROR:  invalid input syntax for type json
LINE 1: SELECT jsonb '00';                    ^
DETAIL:  Token "00" is invalid.
CONTEXT:  JSON data, line 1: 00

=# SELECT json '00.1';
ERROR:  invalid input syntax for type json
LINE 1: SELECT jsonb '00.1';                    ^
DETAIL:  Token "00" is invalid.
CONTEXT:  JSON data, line 1: 00...


In JavaScript integers with leading zero are treated as octal numbers,
but leading dot is a allowed:

v8 > 0123
83
v8 > 000.1
Uncaught SyntaxError: Unexpected number
v8> .1
0.1

Attached patch 0003 fixes this issue.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

0003-Fix-parsing-of-numbers-in-jsonpath.patch

Re: jsonpath

From

Alexander Korotkov

Date:

22 March 2019, 12:14:10

On Fri, Mar 22, 2019 at 5:38 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
> On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Attaches patches improving jsonpath parser.  First one introduces
> > cosmetic changes, while second gets rid of backtracking.  I'm also
> > planning to add high-level comment for both grammar and lexer.
>
> The cosmetic changes look good to me. I just noticed a couple things
> about the comments.
>
> 0001:
>
> +/* Check if current scanstring value constitutes a keyword */
>
> 'is a keyword' is better. 'Constitutes' implies parts of a whole.
>
> + * Resize scanstring for appending of given length.  Reinitilize if required.
>
> s/Reinitilize/Reinitialize/
>
> The first sentence is not entirely clear to me.

Thank you, fixed.

> 0002:
>
> These two rules are not strictly necessary:
>
> <xnq,xq,xvq,xsq>{unicode}+\\ {
> /* throw back the \\, and treat as unicode */
> yyless(yyleng - 1);
> parseUnicode(yytext, yyleng);
> }
>
> <xnq,xq,xvq,xsq>{hex_char}+\\ {
> /* throw back the \\, and treat as hex */
> yyless(yyleng - 1);
> parseHexChars(yytext, yyleng);
> }
>
> ...and only seem to be there because of how these are written:
>
> <xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode
> sequence is invalid"); }
> <xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character
> sequence is invalid"); }
>
> I don't understand the reasoning here -- is it a micro-optimization?
> The following is simpler, allow the rules I mentioned to be removed,
> and make check still passes. I would prefer it unless there is a
> performance penalty, in which case a comment to describe the
> additional complexity would be helpful.
>
> <xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); }
> <xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is
> invalid"); }

These rules are needed for unicode.  Sequential escaped unicode
characters might be connected by hi surrogate value.  See
jsonpath_encoding regression test in attached patch.

Regarding hex, I made it so for the sake of uniformity.  But I changed
my mind and decided that simpler flex rules are more important.  So,
now they are considered one-by-one.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 3/27/19 9:48 AM, Tom Lane wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>> Still no reproduction.
> Annoying, but it's probably not worth expending more effort on
> right now.  I wonder whether that buildfarm animal can be upgraded
> to capture core dump stack traces --- if so, then if it happens
> again we'd have more info.
>
>             



I was able to get this stack trace.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

jsonpath.trace

Re: jsonpath

From

Tom Lane

Date:

28 March 2019, 05:01:35

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> I was able to get this stack trace.
>
> (gdb) bt
> #0  0x00007ffb9ce6a458 in ntdll!RtlRaiseStatus ()
>    from C:\WINDOWS\SYSTEM32\ntdll.dll
> #1  0x00007ffb9ce7760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll
> #2  0x00007ffb9ac52e1a in msvcrt!_setjmpex ()
>    from C:\WINDOWS\System32\msvcrt.dll
> #3  0x000000000087431a in pg_re_throw ()
>     at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:1720
> #4  0x0000000000874106 in errfinish (dummy=<optimized out>)
>     at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:464
> #5  0x00000000007cc938 in jsonpath_yyerror (result=result@entry=0x0,
>     message=message@entry=0xab0868 <__func__.110231+1592> "unrecognized flag of LIKE_REGEX predicate")
>     at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_scan.l:305
> #6  0x00000000007cec9d in makeItemLikeRegex (pattern=<optimized out>,
>     pattern=<optimized out>, flags=<optimized out>, expr=0x73c7a80)
>     at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_gram.y:512

Hmm.  Reaching the yyerror call is expected given this input, but
seemingly the siglongjmp stack has been trashed?  The best I can
think of is a wild store that either occurs only on this platform
or happens to be harmless elsewhere ... but neither idea is terribly
convincing.

BTW, the expected behavior according to the regression test is

regression=# select '$ ? (@ like_regex "pattern" flag "a"'::jsonpath;
ERROR:  bad jsonpath representation
LINE 1: select '$ ? (@ like_regex "pattern" flag "a"'::jsonpath;
               ^
DETAIL:  unrecognized flag of LIKE_REGEX predicate at or near """

which leaves quite a lot to be desired already.  The "bad whatever"
error wording is a flat out violation of our message style guide,
while the "at or near """" bit is pretty darn unhelpful.

The latter problem occurs because the last flex production was

<xq>\"                          {
                                    yylval->str = scanstring;
                                    BEGIN INITIAL;
                                    return STRING_P;
                                }

so that flex thinks the last token was just the quote mark ending the
string.  This could be improved on by adopting something similar to the
SET_YYLLOC() convention used in scan.l to remember the start of what the
user would think the token is.  Probably it's not worth the work right
now, but details like this are important from a fit-and-finish
perspective, so I'd like to see it improved sometime.

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

28 March 2019, 09:38:30

On Thu, Mar 28, 2019 at 5:55 AM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On 3/27/19 9:48 AM, Tom Lane wrote:
> > Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> >> Still no reproduction.
> > Annoying, but it's probably not worth expending more effort on
> > right now.  I wonder whether that buildfarm animal can be upgraded
> > to capture core dump stack traces --- if so, then if it happens
> > again we'd have more info.
>
> I was able to get this stack trace.

Thank you very much!  It appears to be hard to investigate this even
with backtrace.  You told you can almost reliably reproduce this.
Could you please, find exact commit caused this error using "git
bisect"?  I would very appreciate this.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Andrew Dunstan

Date:

28 March 2019, 11:45:16

On 3/28/19 1:01 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> I was able to get this stack trace.
>>
>> (gdb) bt
>> #0  0x00007ffb9ce6a458 in ntdll!RtlRaiseStatus ()
>>    from C:\WINDOWS\SYSTEM32\ntdll.dll
>> #1  0x00007ffb9ce7760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll
>> #2  0x00007ffb9ac52e1a in msvcrt!_setjmpex ()
>>    from C:\WINDOWS\System32\msvcrt.dll
>> #3  0x000000000087431a in pg_re_throw ()
>>     at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:1720
>> #4  0x0000000000874106 in errfinish (dummy=<optimized out>)
>>     at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:464
>> #5  0x00000000007cc938 in jsonpath_yyerror (result=result@entry=0x0, 
>>     message=message@entry=0xab0868 <__func__.110231+1592> "unrecognized flag of LIKE_REGEX predicate")
>>     at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_scan.l:305
>> #6  0x00000000007cec9d in makeItemLikeRegex (pattern=<optimized out>, 
>>     pattern=<optimized out>, flags=<optimized out>, expr=0x73c7a80)
>>     at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_gram.y:512
> Hmm.  Reaching the yyerror call is expected given this input, but
> seemingly the siglongjmp stack has been trashed?  The best I can
> think of is a wild store that either occurs only on this platform
> or happens to be harmless elsewhere ... but neither idea is terribly
> convincing.



Further data point: if I just call the offending statement alone,
there's no crash. The crash only occurs when I call the whole script.
I'll see if I can triangulate a bit to get a minimal test case.


cheers


andrew



-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Andrew Dunstan

Date:

28 March 2019, 12:24:57

On 3/28/19 5:38 AM, Alexander Korotkov wrote:
> On Thu, Mar 28, 2019 at 5:55 AM Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
>> On 3/27/19 9:48 AM, Tom Lane wrote:
>>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>>>> Still no reproduction.
>>> Annoying, but it's probably not worth expending more effort on
>>> right now.  I wonder whether that buildfarm animal can be upgraded
>>> to capture core dump stack traces --- if so, then if it happens
>>> again we'd have more info.
>> I was able to get this stack trace.
> Thank you very much!  It appears to be hard to investigate this even
> with backtrace.  You told you can almost reliably reproduce this.
> Could you please, find exact commit caused this error using "git
> bisect"?  I would very appreciate this.
>


I'll try. It's time consuming given how long builds take.


Here's an interesting data point. If I run the whole jsonpath.sql script
it crashes every time. If I run just the offending statement it crashes
exactly every other time. It looks like in that case something gets
clobbered and then cleared.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Alexander Korotkov

Date:

28 March 2019, 12:49:23

On Thu, Mar 28, 2019 at 3:25 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On 3/28/19 5:38 AM, Alexander Korotkov wrote:
> > On Thu, Mar 28, 2019 at 5:55 AM Andrew Dunstan
> > <andrew.dunstan@2ndquadrant.com> wrote:
> >> On 3/27/19 9:48 AM, Tom Lane wrote:
> >>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> >>>> Still no reproduction.
> >>> Annoying, but it's probably not worth expending more effort on
> >>> right now.  I wonder whether that buildfarm animal can be upgraded
> >>> to capture core dump stack traces --- if so, then if it happens
> >>> again we'd have more info.
> >> I was able to get this stack trace.
> > Thank you very much!  It appears to be hard to investigate this even
> > with backtrace.  You told you can almost reliably reproduce this.
> > Could you please, find exact commit caused this error using "git
> > bisect"?  I would very appreciate this.
>
> I'll try. It's time consuming given how long builds take.

Thank you very much for your efforts!

> Here's an interesting data point. If I run the whole jsonpath.sql script
> it crashes every time. If I run just the offending statement it crashes
> exactly every other time. It looks like in that case something gets
> clobbered and then cleared.

Could you clarify this a bit?  What is exactly every other time?  Do
you mean it doesn't crash first time, but crashes second time?  Do you
run each offending statement in the separate session?  Or do you run
multiple offending statements in the same session?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Andrew Dunstan

Date:

28 March 2019, 13:10:51

On 3/28/19 8:49 AM, Alexander Korotkov wrote:
>
>> Here's an interesting data point. If I run the whole jsonpath.sql script
>> it crashes every time. If I run just the offending statement it crashes
>> exactly every other time. It looks like in that case something gets
>> clobbered and then cleared.
> Could you clarify this a bit?  What is exactly every other time?  Do
> you mean it doesn't crash first time, but crashes second time?  Do you
> run each offending statement in the separate session?  Or do you run
> multiple offending statements in the same session?
>

I mean repeated invocations of


    psql -c "select '$ ? (@ like_regex \"pattern\" flag \"a\")'::jsonpath"


These get crash, no crash, crash, no crash ...



cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Tom Lane

Date:

28 March 2019, 13:31:14

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> I mean repeated invocations of
>     psql -c "select '$ ? (@ like_regex \"pattern\" flag \"a\")'::jsonpath"
> These get crash, no crash, crash, no crash ...

That is just wacko ... but it does seem to support my thought of
a wild store somewhere.  The mechanism for this case might be
that memory layout is different depending on whether we had to
rebuild the relcache init file at session start or not.  Similarly,
the fact that the full test script reliably crashes might be
dependent on previous commands having left things in a certain
state.  Unfortunately that gets us little closer to a fix.

Has anybody gotten through a valgrind run on this code yet?

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

28 March 2019, 13:33:44

On Thu, Mar 28, 2019 at 4:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> > I mean repeated invocations of
> >     psql -c "select '$ ? (@ like_regex \"pattern\" flag \"a\")'::jsonpath"
> > These get crash, no crash, crash, no crash ...
>
> That is just wacko ... but it does seem to support my thought of
> a wild store somewhere.  The mechanism for this case might be
> that memory layout is different depending on whether we had to
> rebuild the relcache init file at session start or not.  Similarly,
> the fact that the full test script reliably crashes might be
> dependent on previous commands having left things in a certain
> state.  Unfortunately that gets us little closer to a fix.
>
> Has anybody gotten through a valgrind run on this code yet?

I've [1].  Find single backtrace, but it doesn't seem to be related to
our issue.

1. https://www.postgresql.org/message-id/CAPpHfdsgyPKbaqOsJ4tyC97Ybpm69eGLHzBGLKYXsfJi%3Dc-fjA%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Andres Freund

Date:

28 March 2019, 13:34:15


On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> I mean repeated invocations of
>>     psql -c "select '$ ? (@ like_regex \"pattern\" flag
>\"a\")'::jsonpath"
>> These get crash, no crash, crash, no crash ...
>
>That is just wacko ... but it does seem to support my thought of
>a wild store somewhere.  The mechanism for this case might be
>that memory layout is different depending on whether we had to
>rebuild the relcache init file at session start or not.  Similarly,
>the fact that the full test script reliably crashes might be
>dependent on previous commands having left things in a certain
>state.  Unfortunately that gets us little closer to a fix.
>
>Has anybody gotten through a valgrind run on this code yet?

Skink has successfully passed since - but that's x86...
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: jsonpath

From

Tom Lane

Date:

28 March 2019, 13:50:27

Andres Freund <andres@anarazel.de> writes:
> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Has anybody gotten through a valgrind run on this code yet?

> Skink has successfully passed since - but that's x86...

Yeah, there is a depressingly high chance that this is somehow specific
to the bison version, flex version, and/or compiler in use on jacana.

            regards, tom lane

Re: jsonpath

From

Andrew Dunstan

Date:

28 March 2019, 16:43:10

On 3/28/19 9:50 AM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Has anybody gotten through a valgrind run on this code yet?
>> Skink has successfully passed since - but that's x86...
> Yeah, there is a depressingly high chance that this is somehow specific
> to the bison version, flex version, and/or compiler in use on jacana.
>
>             



lousyjack has also passed it (x64).


git bisect on jacana blames commit 550b9d26f.


cheers


andrew


-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Alexander Korotkov

Date:

29 March 2019, 13:15:40

On Thu, Mar 28, 2019 at 7:43 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On 3/28/19 9:50 AM, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> >> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Has anybody gotten through a valgrind run on this code yet?
> >> Skink has successfully passed since - but that's x86...
> > Yeah, there is a depressingly high chance that this is somehow specific
> > to the bison version, flex version, and/or compiler in use on jacana.
>
> lousyjack has also passed it (x64).
>
> git bisect on jacana blames commit 550b9d26f.

Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l
compile at once.  I've re-read this commit and didn't find anything
suspicious.
I've asked Andrew for access to jacana in order to investigate this myself.



------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Alexander Korotkov

Date:

30 March 2019, 15:25:12

On Fri, Mar 29, 2019 at 4:15 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Thu, Mar 28, 2019 at 7:43 PM Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
> > On 3/28/19 9:50 AM, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > >> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >>> Has anybody gotten through a valgrind run on this code yet?
> > >> Skink has successfully passed since - but that's x86...
> > > Yeah, there is a depressingly high chance that this is somehow specific
> > > to the bison version, flex version, and/or compiler in use on jacana.
> >
> > lousyjack has also passed it (x64).
> >
> > git bisect on jacana blames commit 550b9d26f.
>
> Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l
> compile at once.  I've re-read this commit and didn't find anything
> suspicious.
> I've asked Andrew for access to jacana in order to investigate this myself.

I'm going to push there 3 attached patches for jsonpath.

1st one is revised patch implementing GIN index support for jsonpath.
Based on feedback from Jonathan Katz, I decided to restrict jsonb_ops
from querying only case.  Now, jsonb_ops also works on if
"accessors_chain = const" statement is found.  Keys are frequently not
selective.  Given we have no statistics of them, purely key GIN
queries are likely confuse optimizer making it select inefficient
plan.  Actually, jsonb_ops may be used for pure key query when ?
operator is used.  But in this case, user explicitly searches for key.
With jsonpath, purely key GIN searches can easily happen unintended.
So, restrict that.

2nd and 3rd patches are from Nikita Glukhov upthread.  2nd restrict
some cases in parsing numerics.  3rd make jsonb_path_match() function
throw error when result is not single boolean and silent mode is off.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On Fri, Mar 29, 2019 at 4:15 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>
> On Thu, Mar 28, 2019 at 7:43 PM Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
> > On 3/28/19 9:50 AM, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > >> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >>> Has anybody gotten through a valgrind run on this code yet?
> > >> Skink has successfully passed since - but that's x86...
> > > Yeah, there is a depressingly high chance that this is somehow specific
> > > to the bison version, flex version, and/or compiler in use on jacana.
> >
> > lousyjack has also passed it (x64).
> >
> > git bisect on jacana blames commit 550b9d26f.
>
> Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l
> compile at once.  I've re-read this commit and didn't find anything
> suspicious.
> I've asked Andrew for access to jacana in order to investigate this myself.

Thanks to Andrew I got access to jacana and made some investigation.

At first, I found that existence of separate jsonpath_gram.h doesn't
influence the situation.  If have jsonpath_gram.h generated, test
still fails if compile jsonpath_gram.c and jsonpath_scan.c together.
But if build them separately, error is gone.

Then I did following trick: build jsonpath_gram.c and jsonpath_scan.c
separately, but copy contents of  jsonpath_gram.c to the top of
jsonpath_scan.c.  I also renamed yyparse to yyparse2 in the copy of
jsonpath_gram.c in order to make jsonpath_scan.c use another copy of
this function defined in the separate file.  Then test fails again.
After that, I found if I remove contents of yyparse2 function, then
test passes OK.  See versions of jsonpath_scan.c attached.

Thus, contents of unused function makes test fail or pass.  So far, it
looks like a compiler bug.  Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attached is a patch to fix some minor issues:

-misspelling of an error message
-Commit 550b9d26f80f failed to update the Makefile comment to reflect
how the build changed, and also removed the clean target, which we now
have use for since we later got rid of backtracking in the scanner.

Also, while I have the thought in my head, for v13 we should consider
replacing the keyword binary search with the perfect hash technique
added in c64d0cd5ce2 -- it might give a small performance boost to the
scanner.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

json-path-minor-fixes.patch

Re: jsonpath

From

Tom Lane

Date:

17 April 2019, 17:43:25

John Naylor <john.naylor@2ndquadrant.com> writes:
> Attached is a patch to fix some minor issues:
> -misspelling of an error message

Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
needs a sweep to bring its error messages into line with our style
guidelines, but no harm in starting with the obvious bugs.

> -Commit 550b9d26f80f failed to update the Makefile comment to reflect
> how the build changed, and also removed the clean target, which we now
> have use for since we later got rid of backtracking in the scanner.

Right.  I'm not really sure why we're bothering with anti-backtracking
here, or with using speed-rather-than-code-space lexer optimization
options.  It's hard for me to credit that any practically-useful jsonpath
pattern would be long enough for lexer speed to matter, and even harder to
credit that the speed of the flex code itself would be an important factor
in the overall processing cost of a long jsonpath.  Still, as long as we
have the code it needs to be right.

> Also, while I have the thought in my head, for v13 we should consider
> replacing the keyword binary search with the perfect hash technique
> added in c64d0cd5ce2 -- it might give a small performance boost to the
> scanner.

I doubt it's worth the trouble, per above.

Patch LGTM, pushed.

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

17 April 2019, 20:14:44

On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <john.naylor@2ndquadrant.com> writes:
> > Attached is a patch to fix some minor issues:
> > -misspelling of an error message
>
> Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
> needs a sweep to bring its error messages into line with our style
> guidelines, but no harm in starting with the obvious bugs.

I'll go trough the jsonpath error messages and post a patch for fixing them.

> > -Commit 550b9d26f80f failed to update the Makefile comment to reflect
> > how the build changed, and also removed the clean target, which we now
> > have use for since we later got rid of backtracking in the scanner.
>
> Right.  I'm not really sure why we're bothering with anti-backtracking
> here, or with using speed-rather-than-code-space lexer optimization
> options.  It's hard for me to credit that any practically-useful jsonpath
> pattern would be long enough for lexer speed to matter, and even harder to
> credit that the speed of the flex code itself would be an important factor
> in the overall processing cost of a long jsonpath.  Still, as long as we
> have the code it needs to be right.

Actually I found that non of in-core lexers are backtracking.  So, I
understood no backtracking as kind of standard and didn't want to
break that :)

Nevertheless, I could imagine use-case involving parsing a lot of
jsonpath'es.  For example we may construct jsonpath based on table
data and check that for just few jsonb's.  For sure, that wouldn't be
a common use-case, but still.

> > Also, while I have the thought in my head, for v13 we should consider
> > replacing the keyword binary search with the perfect hash technique
> > added in c64d0cd5ce2 -- it might give a small performance boost to the
> > scanner.
>
> I doubt it's worth the trouble, per above.
>
> Patch LGTM, pushed.

Thank you for pushing this!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

John Naylor

Date:

18 April 2019, 01:08:36

On Thu, Apr 18, 2019 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> John Naylor <john.naylor@2ndquadrant.com> writes:
> > Attached is a patch to fix some minor issues:
> > -misspelling of an error message
>
> Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
> needs a sweep to bring its error messages into line with our style
> guidelines, but no harm in starting with the obvious bugs.
>
> > -Commit 550b9d26f80f failed to update the Makefile comment to reflect
> > how the build changed, and also removed the clean target, which we now
> > have use for since we later got rid of backtracking in the scanner.
>
> Right.  I'm not really sure why we're bothering with anti-backtracking
> here, or with using speed-rather-than-code-space lexer optimization
> options.  It's hard for me to credit that any practically-useful jsonpath
> pattern would be long enough for lexer speed to matter, and even harder to
> credit that the speed of the flex code itself would be an important factor
> in the overall processing cost of a long jsonpath.  Still, as long as we
> have the code it needs to be right.

I was wondering about that. I measured the current size of
yy_transition to be 36492 on my machine. With the flag -Cfe, which
gives the smallest representation without backtracking, yy_nxt is 6336
(there is no yy_transition). I'd say that's a large enough difference
that we'd want the smaller representation if it makes little
difference in performance.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Alexander Korotkov

Date:

20 April 2019, 17:54:45

On Wed, Apr 17, 2019 at 11:14 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > John Naylor <john.naylor@2ndquadrant.com> writes:
> > > Attached is a patch to fix some minor issues:
> > > -misspelling of an error message
> >
> > Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
> > needs a sweep to bring its error messages into line with our style
> > guidelines, but no harm in starting with the obvious bugs.
>
> I'll go trough the jsonpath error messages and post a patch for fixing them.
>
> > > -Commit 550b9d26f80f failed to update the Makefile comment to reflect
> > > how the build changed, and also removed the clean target, which we now
> > > have use for since we later got rid of backtracking in the scanner.
> >
> > Right.  I'm not really sure why we're bothering with anti-backtracking
> > here, or with using speed-rather-than-code-space lexer optimization
> > options.  It's hard for me to credit that any practically-useful jsonpath
> > pattern would be long enough for lexer speed to matter, and even harder to
> > credit that the speed of the flex code itself would be an important factor
> > in the overall processing cost of a long jsonpath.  Still, as long as we
> > have the code it needs to be right.
>
> Actually I found that non of in-core lexers are backtracking.  So, I
> understood no backtracking as kind of standard and didn't want to
> break that :)
>
> Nevertheless, I could imagine use-case involving parsing a lot of
> jsonpath'es.  For example we may construct jsonpath based on table
> data and check that for just few jsonb's.  For sure, that wouldn't be
> a common use-case, but still.
>
> > > Also, while I have the thought in my head, for v13 we should consider
> > > replacing the keyword binary search with the perfect hash technique
> > > added in c64d0cd5ce2 -- it might give a small performance boost to the
> > > scanner.
> >
> > I doubt it's worth the trouble, per above.
> >
> > Patch LGTM, pushed.
>
> Thank you for pushing this!

I went trough the jsonpath errors and made some corrections.  See the
attached patch.

One thing makes me uneasy.  jsonpath_yyerror() substitutes its
"message" argument to the errdetail().  Out style guide requires
errdetails to be complete sentences, while bison passes
non-capitalized error messages.  Should we bother capitalize first
character of "message"?  I wonder how "portable" this solution would
be for various languages.  cubescan.l reports error in the similar way
to jsonpath and doesn't bother about making errdetail a complete
sentence.  Or should we move bison error to errmsg()?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

jsonpath-errors-improve-1.patch

Re: jsonpath

From

Alexander Korotkov

Date:

20 April 2019, 18:01:17

On Thu, Apr 18, 2019 at 4:09 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
> On Thu, Apr 18, 2019 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > John Naylor <john.naylor@2ndquadrant.com> writes:
> > > Attached is a patch to fix some minor issues:
> > > -misspelling of an error message
> >
> > Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
> > needs a sweep to bring its error messages into line with our style
> > guidelines, but no harm in starting with the obvious bugs.
> >
> > > -Commit 550b9d26f80f failed to update the Makefile comment to reflect
> > > how the build changed, and also removed the clean target, which we now
> > > have use for since we later got rid of backtracking in the scanner.
> >
> > Right.  I'm not really sure why we're bothering with anti-backtracking
> > here, or with using speed-rather-than-code-space lexer optimization
> > options.  It's hard for me to credit that any practically-useful jsonpath
> > pattern would be long enough for lexer speed to matter, and even harder to
> > credit that the speed of the flex code itself would be an important factor
> > in the overall processing cost of a long jsonpath.  Still, as long as we
> > have the code it needs to be right.
>
> I was wondering about that. I measured the current size of
> yy_transition to be 36492 on my machine. With the flag -Cfe, which
> gives the smallest representation without backtracking, yy_nxt is 6336
> (there is no yy_transition). I'd say that's a large enough difference
> that we'd want the smaller representation if it makes little
> difference in performance.

Did I understand correctly that you've tried the same version of
jsonpath_scan.l with different flex flags?  Did you also notice if
changes 1d88a75c made to jsonpath_scan.l have singnificant influence?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Tom Lane

Date:

21 April 2019, 22:39:47

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>> On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
>>> needs a sweep to bring its error messages into line with our style
>>> guidelines, but no harm in starting with the obvious bugs.

> I went trough the jsonpath errors and made some corrections.  See the
> attached patch.

Please don't do this sort of change:

-            elog(ERROR, "unrecognized jsonpath item type: %d", item->type);
+            ereport(ERROR,
+                    (errcode(ERRCODE_INTERNAL_ERROR),
+                     errmsg("unrecognized jsonpath item type: %d", item->type)));

elog() is the appropriate thing for shouldn't-happen internal errors like
these.  The only thing you've changed here, aside from making the source
code longer, is to expose the error message for translation ... which is
really just wasting translators' time.  Only messages we actually think
users might need to deal with should be exposed for translation.

@@ -623,7 +624,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
                     ereport(ERROR,
                             (errcode(ERRCODE_JSON_MEMBER_NOT_FOUND), \
                              errmsg(ERRMSG_JSON_MEMBER_NOT_FOUND),
-                             errdetail("JSON object does not contain key %s",
+                             errdetail("JSON object does not contain key %s.",
                                        keybuf.data)));
                 }
             }

OK as far as it went, but you should also put double quotes around the %s.
(I also noticed some messages that are using single-quotes around
interpolated strings, which is not the project standard either.)

Other specific things I wanted to see fixed:

* jsonpath_scan.l has some messages like "bad ..." which is not project
style; use "invalid" or "unrecognized".  (There's probably no good
reason not to use the same string "invalid input syntax for type jsonpath"
that is used elsewhere.)

* This in jsonpath_gram.y is quite unhelpful:

                yyerror(NULL, "unrecognized flag of LIKE_REGEX predicate");

since it doesn't tell you what flag character it doesn't like
(and the error positioning info isn't accurate enough to let the
user figure that out).  It really needs to be something more like
"unrecognized flag character \"%c\" in LIKE_REGEX predicate".
That probably means you can't use yyerror for this, but I don't
think yyerror was providing any useful functionality anyway :-(

More generally, I'm not very much on board with this coding technique:

/* Standard error message for SQL/JSON errors */
#define ERRMSG_JSON_ARRAY_NOT_FOUND            "SQL/JSON array not found"

...

                RETURN_ERROR(ereport(ERROR,
                                     (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND),
                                      errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND),
                                      errdetail("Jsonpath wildcard array accessor "

In the first place, I'm not certain that this will result in the error
message being translatable --- do the gettext tools know how to expand
macros?

In the second place, the actual strings are just restatements of their
ERRMSG macro names, which IMO is not conformant to our message style,
but it's too hard to see that from source code like this.  Also this
style is pretty unworkable/unfriendly if the message needs to contain
any %-markers, so I suspect that having a coding style like this may be
discouraging you from providing values in places where it'd be helpful to
do so.  What I actually see happening as a consequence of this approach is
that you're pushing the useful information off to an errdetail, which is
not really helpful and it's not per project style either.  The idea is to
make the primary message as helpful as possible without being long, not
to make it a simple restatement of the SQLSTATE that nobody can understand
without also looking at the errdetail.

In the third place, this makes it hard for people to grep for occurrences
of an error string in our source code.

And in the fourth place, we don't do this elsewhere; it does not help
anybody for jsonpath to invent its own coding conventions that are unlike
the rest of Postgres.

So I think you should drop the ERRMSG_xxx macros, write out these error
messages where they are used, and rethink your use of errmsg vs. errdetail.

Along the same line of not making it unnecessarily hard for people to grep
for error texts, it's best not to split texts across lines like this:

        RETURN_ERROR(ereport(ERROR,
                             (errcode(ERRCODE_INVALID_JSON_SUBSCRIPT),
                              errmsg(ERRMSG_INVALID_JSON_SUBSCRIPT),
                              errdetail("Jsonpath array subscript is not a "
                                        "singleton numeric value."))));

Somebody grepping for "not a singleton" would not get a hit on that, which
could be quite misleading if they do get hits elsewhere.  I think for the
most part people have decided that it's better to have overly long source
lines than to break up error message literals.  It's especially pointless
to break up source lines when the result still doesn't fit in 80 columns.

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

21 April 2019, 23:50:14

Hi!

Thank you for your review!

On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
>                 RETURN_ERROR(ereport(ERROR,
>                                      (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND),
>                                       errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND),
>                                       errdetail("Jsonpath wildcard array accessor "
>
> In the first place, I'm not certain that this will result in the error
> message being translatable --- do the gettext tools know how to expand
> macros?
>
> In the second place, the actual strings are just restatements of their
> ERRMSG macro names, which IMO is not conformant to our message style,
> but it's too hard to see that from source code like this.  Also this
> style is pretty unworkable/unfriendly if the message needs to contain
> any %-markers, so I suspect that having a coding style like this may be
> discouraging you from providing values in places where it'd be helpful to
> do so.  What I actually see happening as a consequence of this approach is
> that you're pushing the useful information off to an errdetail, which is
> not really helpful and it's not per project style either.  The idea is to
> make the primary message as helpful as possible without being long, not
> to make it a simple restatement of the SQLSTATE that nobody can understand
> without also looking at the errdetail.
>
> In the third place, this makes it hard for people to grep for occurrences
> of an error string in our source code.
>
> And in the fourth place, we don't do this elsewhere; it does not help
> anybody for jsonpath to invent its own coding conventions that are unlike
> the rest of Postgres.

Just to clarify things.  Do you propose to get rid of RETURN_ERROR()
macro by expanding it at every occurrence?  Or do you have other ideas
in the mind?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

John Naylor

Date:

22 April 2019, 00:07:01

On Sun, Apr 21, 2019 at 2:01 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>
> On Thu, Apr 18, 2019 at 4:09 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
> > I was wondering about that. I measured the current size of
> > yy_transition to be 36492 on my machine. With the flag -Cfe, which
> > gives the smallest representation without backtracking, yy_nxt is 6336
> > (there is no yy_transition). I'd say that's a large enough difference
> > that we'd want the smaller representation if it makes little
> > difference in performance.
>
> Did I understand correctly that you've tried the same version of
> jsonpath_scan.l with different flex flags?

Correct.

> Did you also notice if
> changes 1d88a75c made to jsonpath_scan.l have singnificant influence?

Trying the same measurements above with backtracking put back in,
jsonpath_yylex was actually larger by a few hundred bytes, and there
was almost no difference in the transition/nxt tables.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Tom Lane

Date:

22 April 2019, 00:27:20

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> Just to clarify things.  Do you propose to get rid of RETURN_ERROR()
> macro by expanding it at every occurrence?  Or do you have other ideas
> in the mind?

I wasn't really complaining about RETURN_ERROR() --- it was the
macros rather than literal strings for the errmsg() texts that
was bothering me.

Mind you, I'm not really sure about RETURN_ERROR --- it looks
a little weird to have something that appears to be doing something
with the value of ereport(), which hasn't got a value.  But I don't
have a better idea at the moment.  I doubt that writing out the
underlying ereport-or-return business at each spot would be any
more readable.

(Maybe spelling it RETURN_OR_ERROR, or vice versa, would help?
Not sure.)

            regards, tom lane

Re: jsonpath

From

Alvaro Herrera

Date:

22 April 2019, 00:31:21

On 2019-Apr-22, Alexander Korotkov wrote:

> Hi!
> 
> Thank you for your review!
> 
> On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> >                 RETURN_ERROR(ereport(ERROR,
> >                                      (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND),
> >                                       errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND),
> >                                       errdetail("Jsonpath wildcard array accessor "
> >
> > In the first place, I'm not certain that this will result in the error
> > message being translatable --- do the gettext tools know how to expand
> > macros?
> >
> > In the second place, the actual strings are just restatements of their
> > ERRMSG macro names, which IMO is not conformant to our message style,
> > but it's too hard to see that from source code like this.  Also this
> > style is pretty unworkable/unfriendly if the message needs to contain
> > any %-markers, so I suspect that having a coding style like this may be
> > discouraging you from providing values in places where it'd be helpful to
> > do so.  What I actually see happening as a consequence of this approach is
> > that you're pushing the useful information off to an errdetail, which is
> > not really helpful and it's not per project style either.  The idea is to
> > make the primary message as helpful as possible without being long, not
> > to make it a simple restatement of the SQLSTATE that nobody can understand
> > without also looking at the errdetail.
> >
> > In the third place, this makes it hard for people to grep for occurrences
> > of an error string in our source code.
> >
> > And in the fourth place, we don't do this elsewhere; it does not help
> > anybody for jsonpath to invent its own coding conventions that are unlike
> > the rest of Postgres.
> 
> Just to clarify things.  Do you propose to get rid of RETURN_ERROR()
> macro by expanding it at every occurrence?  Or do you have other ideas
> in the mind?

I think he's not talking about the RETURN_ERROR macro, but about the
ERRMSG_JSON_ARRAY_NOT_FOUND macro.  The PG convention is to repeat the
message literal in every place instead of defining a macro with the
literal.  But at the same time, using the same errmsg() and only vary
the errdetail() is unhelpful, so we want most detail in the errmsg
instead; I think it'd be something like this:

ereport(ERROR,
    (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND),
     errmsg("%s wildcard array accessor not found", "jsonpath")));

note I put the type name "jsonpath" in a separate literal, so that only
the interesting part is seen by translators.

I don't think Tom said anything about the RETURN_ERROR macro.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: jsonpath

From

Alexander Korotkov

Date:

24 April 2019, 18:03:27

On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> >> On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
> >>> needs a sweep to bring its error messages into line with our style
> >>> guidelines, but no harm in starting with the obvious bugs.
>
> > I went trough the jsonpath errors and made some corrections.  See the
> > attached patch.
>
> Please don't do this sort of change:
>
> -            elog(ERROR, "unrecognized jsonpath item type: %d", item->type);
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_INTERNAL_ERROR),
> +                     errmsg("unrecognized jsonpath item type: %d", item->type)));
>
> elog() is the appropriate thing for shouldn't-happen internal errors like
> these.  The only thing you've changed here, aside from making the source
> code longer, is to expose the error message for translation ... which is
> really just wasting translators' time.  Only messages we actually think
> users might need to deal with should be exposed for translation.

Makes sense.  Removed from the patch.

> @@ -623,7 +624,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
>                      ereport(ERROR,
>                              (errcode(ERRCODE_JSON_MEMBER_NOT_FOUND), \
>                               errmsg(ERRMSG_JSON_MEMBER_NOT_FOUND),
> -                             errdetail("JSON object does not contain key %s",
> +                             errdetail("JSON object does not contain key %s.",
>                                         keybuf.data)));
>                  }
>              }
>
> OK as far as it went, but you should also put double quotes around the %s.

This code actually passed key trough escape_json(), which adds double
quotes itself.  However, we don't do such transformation in other
places.  So, patch removes call of ecsape_json() while putting double
quotes to the error message.

> (I also noticed some messages that are using single-quotes around
> interpolated strings, which is not the project standard either.)

Single-quotes are replaced with double-quotes.

> Other specific things I wanted to see fixed:
>
> * jsonpath_scan.l has some messages like "bad ..." which is not project
> style; use "invalid" or "unrecognized".  (There's probably no good
> reason not to use the same string "invalid input syntax for type jsonpath"
> that is used elsewhere.)

Fixed.

> * This in jsonpath_gram.y is quite unhelpful:
>
>                 yyerror(NULL, "unrecognized flag of LIKE_REGEX predicate");
>
> since it doesn't tell you what flag character it doesn't like
> (and the error positioning info isn't accurate enough to let the
> user figure that out).  It really needs to be something more like
> "unrecognized flag character \"%c\" in LIKE_REGEX predicate".
> That probably means you can't use yyerror for this, but I don't
> think yyerror was providing any useful functionality anyway :-(

Fixed.

> More generally, I'm not very much on board with this coding technique:
>
> /* Standard error message for SQL/JSON errors */
> #define ERRMSG_JSON_ARRAY_NOT_FOUND            "SQL/JSON array not found"
>
> ...
>
>                 RETURN_ERROR(ereport(ERROR,
>                                      (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND),
>                                       errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND),
>                                       errdetail("Jsonpath wildcard array accessor "
>
> In the first place, I'm not certain that this will result in the error
> message being translatable --- do the gettext tools know how to expand
> macros?
>
> In the second place, the actual strings are just restatements of their
> ERRMSG macro names, which IMO is not conformant to our message style,
> but it's too hard to see that from source code like this.  Also this
> style is pretty unworkable/unfriendly if the message needs to contain
> any %-markers, so I suspect that having a coding style like this may be
> discouraging you from providing values in places where it'd be helpful to
> do so.  What I actually see happening as a consequence of this approach is
> that you're pushing the useful information off to an errdetail, which is
> not really helpful and it's not per project style either.  The idea is to
> make the primary message as helpful as possible without being long, not
> to make it a simple restatement of the SQLSTATE that nobody can understand
> without also looking at the errdetail.
>
> In the third place, this makes it hard for people to grep for occurrences
> of an error string in our source code.
>
> And in the fourth place, we don't do this elsewhere; it does not help
> anybody for jsonpath to invent its own coding conventions that are unlike
> the rest of Postgres.
>
> So I think you should drop the ERRMSG_xxx macros, write out these error
> messages where they are used, and rethink your use of errmsg vs. errdetail.

OK, ERRMSG_* macros are removed.

> Along the same line of not making it unnecessarily hard for people to grep
> for error texts, it's best not to split texts across lines like this:
>
>         RETURN_ERROR(ereport(ERROR,
>                              (errcode(ERRCODE_INVALID_JSON_SUBSCRIPT),
>                               errmsg(ERRMSG_INVALID_JSON_SUBSCRIPT),
>                               errdetail("Jsonpath array subscript is not a "
>                                         "singleton numeric value."))));
>
> Somebody grepping for "not a singleton" would not get a hit on that, which
> could be quite misleading if they do get hits elsewhere.  I think for the
> most part people have decided that it's better to have overly long source
> lines than to break up error message literals.  It's especially pointless
> to break up source lines when the result still doesn't fit in 80 columns.

OK, now no line breaks in error messages.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

jsonpath-errors-improve-2.patch

Re: jsonpath

From

Alexander Korotkov

Date:

25 April 2019, 18:03:54

On Wed, Apr 24, 2019 at 9:03 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > >> On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >>> Yeah, I'd noticed that one too :-(.  I think the whole jsonpath patch
> > >>> needs a sweep to bring its error messages into line with our style
> > >>> guidelines, but no harm in starting with the obvious bugs.
> >
> > > I went trough the jsonpath errors and made some corrections.  See the
> > > attached patch.
> >
> > Please don't do this sort of change:
> >
> > -            elog(ERROR, "unrecognized jsonpath item type: %d", item->type);
> > +            ereport(ERROR,
> > +                    (errcode(ERRCODE_INTERNAL_ERROR),
> > +                     errmsg("unrecognized jsonpath item type: %d", item->type)));
> >
> > elog() is the appropriate thing for shouldn't-happen internal errors like
> > these.  The only thing you've changed here, aside from making the source
> > code longer, is to expose the error message for translation ... which is
> > really just wasting translators' time.  Only messages we actually think
> > users might need to deal with should be exposed for translation.
>
> Makes sense.  Removed from the patch.
>
> > @@ -623,7 +624,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
> >                      ereport(ERROR,
> >                              (errcode(ERRCODE_JSON_MEMBER_NOT_FOUND), \
> >                               errmsg(ERRMSG_JSON_MEMBER_NOT_FOUND),
> > -                             errdetail("JSON object does not contain key %s",
> > +                             errdetail("JSON object does not contain key %s.",
> >                                         keybuf.data)));
> >                  }
> >              }
> >
> > OK as far as it went, but you should also put double quotes around the %s.
>
> This code actually passed key trough escape_json(), which adds double
> quotes itself.  However, we don't do such transformation in other
> places.  So, patch removes call of ecsape_json() while putting double
> quotes to the error message.
>
> > (I also noticed some messages that are using single-quotes around
> > interpolated strings, which is not the project standard either.)
>
> Single-quotes are replaced with double-quotes.
>
> > Other specific things I wanted to see fixed:
> >
> > * jsonpath_scan.l has some messages like "bad ..." which is not project
> > style; use "invalid" or "unrecognized".  (There's probably no good
> > reason not to use the same string "invalid input syntax for type jsonpath"
> > that is used elsewhere.)
>
> Fixed.
>
> > * This in jsonpath_gram.y is quite unhelpful:
> >
> >                 yyerror(NULL, "unrecognized flag of LIKE_REGEX predicate");
> >
> > since it doesn't tell you what flag character it doesn't like
> > (and the error positioning info isn't accurate enough to let the
> > user figure that out).  It really needs to be something more like
> > "unrecognized flag character \"%c\" in LIKE_REGEX predicate".
> > That probably means you can't use yyerror for this, but I don't
> > think yyerror was providing any useful functionality anyway :-(
>
> Fixed.
>
> > More generally, I'm not very much on board with this coding technique:
> >
> > /* Standard error message for SQL/JSON errors */
> > #define ERRMSG_JSON_ARRAY_NOT_FOUND            "SQL/JSON array not found"
> >
> > ...
> >
> >                 RETURN_ERROR(ereport(ERROR,
> >                                      (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND),
> >                                       errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND),
> >                                       errdetail("Jsonpath wildcard array accessor "
> >
> > In the first place, I'm not certain that this will result in the error
> > message being translatable --- do the gettext tools know how to expand
> > macros?
> >
> > In the second place, the actual strings are just restatements of their
> > ERRMSG macro names, which IMO is not conformant to our message style,
> > but it's too hard to see that from source code like this.  Also this
> > style is pretty unworkable/unfriendly if the message needs to contain
> > any %-markers, so I suspect that having a coding style like this may be
> > discouraging you from providing values in places where it'd be helpful to
> > do so.  What I actually see happening as a consequence of this approach is
> > that you're pushing the useful information off to an errdetail, which is
> > not really helpful and it's not per project style either.  The idea is to
> > make the primary message as helpful as possible without being long, not
> > to make it a simple restatement of the SQLSTATE that nobody can understand
> > without also looking at the errdetail.
> >
> > In the third place, this makes it hard for people to grep for occurrences
> > of an error string in our source code.
> >
> > And in the fourth place, we don't do this elsewhere; it does not help
> > anybody for jsonpath to invent its own coding conventions that are unlike
> > the rest of Postgres.
> >
> > So I think you should drop the ERRMSG_xxx macros, write out these error
> > messages where they are used, and rethink your use of errmsg vs. errdetail.
>
> OK, ERRMSG_* macros are removed.
>
> > Along the same line of not making it unnecessarily hard for people to grep
> > for error texts, it's best not to split texts across lines like this:
> >
> >         RETURN_ERROR(ereport(ERROR,
> >                              (errcode(ERRCODE_INVALID_JSON_SUBSCRIPT),
> >                               errmsg(ERRMSG_INVALID_JSON_SUBSCRIPT),
> >                               errdetail("Jsonpath array subscript is not a "
> >                                         "singleton numeric value."))));
> >
> > Somebody grepping for "not a singleton" would not get a hit on that, which
> > could be quite misleading if they do get hits elsewhere.  I think for the
> > most part people have decided that it's better to have overly long source
> > lines than to break up error message literals.  It's especially pointless
> > to break up source lines when the result still doesn't fit in 80 columns.
>
> OK, now no line breaks in error messages.

I'm going to commit these adjustments if no objections.

My question regarding jsonpath_yyerror() vs. bison errors is still
relevant.  Should we bother making bison-based errdetail() a complete
sentence starting from uppercase character?  If not, should we make
other yyerror() calls look the same?  Or should we rather move bison
error from errdetail() to errmsg()?

I would probably leave bison-based messages "as is", but make messages
in other invocations of yyerror() starting from lowercase characters
for the uniformity.  But I'd like to hear other opinions.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonpath

From

Tom Lane

Date:

25 April 2019, 19:28:52

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> I'm going to commit these adjustments if no objections.

Sorry for not getting to this sooner.  Looking quickly at the v2 patch,
it seems like you didn't entirely take to heart the idea of preferring
a useful primary error message over a boilerplate primary with errdetail.
In particular, in places like

-                 errmsg(ERRMSG_SINGLETON_JSON_ITEM_REQUIRED),
-                 errdetail("expression should return a singleton boolean")));
+                 errmsg("singleton SQL/JSON item required"),
+                 errdetail("Singleton boolean result is expected.")));

why bother with errdetail at all?  It's not conveying any useful increment
of information.  In this example I think

                 errmsg("expression should return a singleton boolean")

is sufficient and well-phrased.  Likewise, a bit further down,

+                             errmsg("SQL/JSON member not found"),
+                             errdetail("JSON object does not contain key \"%s\".",

there is nothing being said here that wouldn't fit perfectly well into
one errmsg.

> My question regarding jsonpath_yyerror() vs. bison errors is still
> relevant.  Should we bother making bison-based errdetail() a complete
> sentence starting from uppercase character?  If not, should we make
> other yyerror() calls look the same?  Or should we rather move bison
> error from errdetail() to errmsg()?

The latter I think.  The core lexer just presents the yyerror message
as primary:

scanner_yyerror(const char *message, core_yyscan_t yyscanner)
{
        ...
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
        /* translator: %s is typically the translation of "syntax error" */
                 errmsg("%s at end of input", _(message)),
                 lexer_errposition()));

and in a quick look at what jsonpath is doing, I'm not really seeing
the point of it being different.  You could do something like

        /* translator: %s is typically the translation of "syntax error" */
                 errmsg("%s in jsonpath input", _(message))

to preserve the information that this is about jsonpath, but beyond
that I don't see that splitting off an errdetail is helping much.

Or, perhaps, provide an errdetail giving the full jsonpath input string?
That might or might not be redundant with other context information,
so I'm not sure how useful it'd be.

Anyway, my main criticism is still that this is way too eager to use
an errdetail message when it could perfectly well fit all the info
into the primary error.

            regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

28 April 2019, 18:28:46

On Thu, Apr 25, 2019 at 10:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > I'm going to commit these adjustments if no objections.
>
> Sorry for not getting to this sooner.  Looking quickly at the v2 patch,
> it seems like you didn't entirely take to heart the idea of preferring
> a useful primary error message over a boilerplate primary with errdetail.
> In particular, in places like
>
> -                 errmsg(ERRMSG_SINGLETON_JSON_ITEM_REQUIRED),
> -                 errdetail("expression should return a singleton boolean")));
> +                 errmsg("singleton SQL/JSON item required"),
> +                 errdetail("Singleton boolean result is expected.")));
>
> why bother with errdetail at all?  It's not conveying any useful increment
> of information.  In this example I think
>
>                  errmsg("expression should return a singleton boolean")
>
> is sufficient and well-phrased.  Likewise, a bit further down,
>
> +                             errmsg("SQL/JSON member not found"),
> +                             errdetail("JSON object does not contain key \"%s\".",
>
> there is nothing being said here that wouldn't fit perfectly well into
> one errmsg.

Makes sense.  Attached revision of patch gets rid of errdetail() where
it seems to be appropriate.

> > My question regarding jsonpath_yyerror() vs. bison errors is still
> > relevant.  Should we bother making bison-based errdetail() a complete
> > sentence starting from uppercase character?  If not, should we make
> > other yyerror() calls look the same?  Or should we rather move bison
> > error from errdetail() to errmsg()?
>
> The latter I think.  The core lexer just presents the yyerror message
> as primary:
>
> scanner_yyerror(const char *message, core_yyscan_t yyscanner)
> {
>         ...
>         ereport(ERROR,
>                 (errcode(ERRCODE_SYNTAX_ERROR),
>         /* translator: %s is typically the translation of "syntax error" */
>                  errmsg("%s at end of input", _(message)),
>                  lexer_errposition()));
>
> and in a quick look at what jsonpath is doing, I'm not really seeing
> the point of it being different.  You could do something like
>
>         /* translator: %s is typically the translation of "syntax error" */
>                  errmsg("%s in jsonpath input", _(message))
>
> to preserve the information that this is about jsonpath, but beyond
> that I don't see that splitting off an errdetail is helping much.

I've moved error message into errmsg().

> Or, perhaps, provide an errdetail giving the full jsonpath input string?
> That might or might not be redundant with other context information,
> so I'm not sure how useful it'd be.

I'm also not sure about this.  Didn't do this for now.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

jsonpath-errors-improve-3.patch

Re: jsonpath

From

Tom Lane

Date:

29 April 2019, 15:10:45

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> [ jsonpath-errors-improve-3.patch ]

This is getting better, but IMO it's still a bit too willing to use
a boilerplate primary error message plus errdetail. I do not think
that is project style nor something to be encouraged.

In particular, you've got a whole lot of cases like this:

+ errmsg("JSON object is expected"),
+ errdetail("Jsonpath member accessor can only be applied to an object."))));

I think you should just drop the errmsg and use the errdetail as primary
(with no-initcap and no-trailing-period, of course). I don't see more
than two or three cases in this whole patch where I'd use an errdetail
at all; in almost all of them, the proposed errdetail looks perfectly
suitable to be a primary message.

One other generic gripe is that a lot of these messages use the term
"singleton", which seems a bit too jargon-y to me. As far as I can
see in a quick look at the backend .po files, we have not up to now
used that term in *any* user-facing error message. Nor does it appear
anywhere in our user-facing documentation, except for one place that
was itself inserted by the jsonpath patch:

Arrays of size 1 are interchangeable with a singleton.

I don't think that's either obvious to a non-mathematician, or
even technically correct; maybe it'd be better as

An array of size 1 is considered equal to its sole element.

Likewise, I think it'd be better to avoid "singleton" in the error
messages. In some places you could perhaps use "single" instead.
In some you just don't need it at all, eg in
Jsonpath array subscript is not a singleton numeric value.
you could just drop the word "singleton" and it'd be perfectly
correct, since a numeric is necessarily a single value.

Also, we do often use the term "scalar" to mean a non-composite
value; maybe that would work for this context, in places where
you do really need that meaning.

Sorry to be making you work so hard on this, but I think good
error messages are an important part of having a quality feature.
I do see a lot of improvements already compared to where we started.

regards, tom lane

Re: jsonpath

From

Alexander Korotkov

Date:

29 April 2019, 22:20:06

On Mon, Apr 29, 2019 at 6:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > [ jsonpath-errors-improve-3.patch ]
>
> This is getting better, but IMO it's still a bit too willing to use
> a boilerplate primary error message plus errdetail.  I do not think
> that is project style nor something to be encouraged.
>
> In particular, you've got a whole lot of cases like this:
>
> +                                      errmsg("JSON object is expected"),
> +                                      errdetail("Jsonpath member accessor can only be applied to an object."))));
>
> I think you should just drop the errmsg and use the errdetail as primary
> (with no-initcap and no-trailing-period, of course).  I don't see more
> than two or three cases in this whole patch where I'd use an errdetail
> at all; in almost all of them, the proposed errdetail looks perfectly
> suitable to be a primary message.

Ok, now it seems that I understood.  errdetail is removed from vast
majority of cases.

> One other generic gripe is that a lot of these messages use the term
> "singleton", which seems a bit too jargon-y to me.  As far as I can
> see in a quick look at the backend .po files, we have not up to now
> used that term in *any* user-facing error message.  Nor does it appear
> anywhere in our user-facing documentation, except for one place that
> was itself inserted by the jsonpath patch:
>
>     Arrays of size 1 are interchangeable with a singleton.
>
> I don't think that's either obvious to a non-mathematician, or
> even technically correct; maybe it'd be better as
>
>     An array of size 1 is considered equal to its sole element.
>
> Likewise, I think it'd be better to avoid "singleton" in the error
> messages.  In some places you could perhaps use "single" instead.
> In some you just don't need it at all, eg in
>     Jsonpath array subscript is not a singleton numeric value.
> you could just drop the word "singleton" and it'd be perfectly
> correct, since a numeric is necessarily a single value.
>
> Also, we do often use the term "scalar" to mean a non-composite
> value; maybe that would work for this context, in places where
> you do really need that meaning.

Makes sense for me.  "Singleton" word comes from the standard.  But
assuming we almost don't use it in the documentation (and especially
don't define it), it's better to get rid of this word altogether.
Removed from error messages.  Separate patch adjusting docs as you
proposed is also attached.

> Sorry to be making you work so hard on this, but I think good
> error messages are an important part of having a quality feature.
> I do see a lot of improvements already compared to where we started.

It's nothing to be sorry about.  I need to learn this in order to make
my further commits better.  Thank you for your explanations.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

07 May 2019, 04:14:40

On Tue, Apr 30, 2019 at 1:20 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Mon, Apr 29, 2019 at 6:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > > [ jsonpath-errors-improve-3.patch ]
> >
> > This is getting better, but IMO it's still a bit too willing to use
> > a boilerplate primary error message plus errdetail.  I do not think
> > that is project style nor something to be encouraged.
> >
> > In particular, you've got a whole lot of cases like this:
> >
> > +                                      errmsg("JSON object is expected"),
> > +                                      errdetail("Jsonpath member accessor can only be applied to an object."))));
> >
> > I think you should just drop the errmsg and use the errdetail as primary
> > (with no-initcap and no-trailing-period, of course).  I don't see more
> > than two or three cases in this whole patch where I'd use an errdetail
> > at all; in almost all of them, the proposed errdetail looks perfectly
> > suitable to be a primary message.
>
> Ok, now it seems that I understood.  errdetail is removed from vast
> majority of cases.
>
> > One other generic gripe is that a lot of these messages use the term
> > "singleton", which seems a bit too jargon-y to me.  As far as I can
> > see in a quick look at the backend .po files, we have not up to now
> > used that term in *any* user-facing error message.  Nor does it appear
> > anywhere in our user-facing documentation, except for one place that
> > was itself inserted by the jsonpath patch:
> >
> >     Arrays of size 1 are interchangeable with a singleton.
> >
> > I don't think that's either obvious to a non-mathematician, or
> > even technically correct; maybe it'd be better as
> >
> >     An array of size 1 is considered equal to its sole element.
> >
> > Likewise, I think it'd be better to avoid "singleton" in the error
> > messages.  In some places you could perhaps use "single" instead.
> > In some you just don't need it at all, eg in
> >     Jsonpath array subscript is not a singleton numeric value.
> > you could just drop the word "singleton" and it'd be perfectly
> > correct, since a numeric is necessarily a single value.
> >
> > Also, we do often use the term "scalar" to mean a non-composite
> > value; maybe that would work for this context, in places where
> > you do really need that meaning.
>
> Makes sense for me.  "Singleton" word comes from the standard.  But
> assuming we almost don't use it in the documentation (and especially
> don't define it), it's better to get rid of this word altogether.
> Removed from error messages.  Separate patch adjusting docs as you
> proposed is also attached.
>
> > Sorry to be making you work so hard on this, but I think good
> > error messages are an important part of having a quality feature.
> > I do see a lot of improvements already compared to where we started.
>
> It's nothing to be sorry about.  I need to learn this in order to make
> my further commits better.  Thank you for your explanations.

Attached patchset contains revised commit messages.  I'm going to
commit this on no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Couple patches improving jsonpath docs are attached.  The first one
documents nesting level filter for .** accessor.  The second adds to
documentation of jsonpath array subsciption usage of expressions and
multiple slice ranges.  I'm going to push both patches if no
objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: jsonpath

From

Alexander Korotkov

Date:

17 May 2019, 15:35:01

On Fri, May 17, 2019 at 6:50 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Couple patches improving jsonpath docs are attached.  The first one
> documents nesting level filter for .** accessor.  The second adds to
> documentation of jsonpath array subsciption usage of expressions and
> multiple slice ranges.  I'm going to push both patches if no
> objections.

Looking more on documentation I found that I'm not exactly happy on
how jsonpath description is organized.  Part of description including
accessors is given in json datatypes section [1], while part of
description including functions and operators is given in json
functions and operators section.  I think we should give the whole
jsonpath description in the single place.  So, what about moving it to
datatype section leaving functions sections with just SQL-level
functions?

1. https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
2. https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-PATH

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company