Re: jsonpath - Mailing list pgsql-hackers

From Nikita Glukhov
Subject Re: jsonpath
Date
Msg-id 220821fb-52c0-c277-0959-322c8eef76b2@postgrespro.ru
Whole thread Raw
In response to Re: jsonpath  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: jsonpath
List pgsql-hackers

Attached 36th version of the patches.


On 03.03.2019 21:08, Tomas Vondra wrote:

Hi,

Here are some initial comments from a review of the 0001 part. I plan to
do more testing on a large data set and additional round of review over
the next week. FWIW I've passed this through valgrind and the usual
battery of regression tests, and there were no issues.
Thanks again for your review.
I haven't looked at 0002 yet, but it seems fairly small (especially
compared to 0001).

func.sgml
=========

1) I see the changes removed <indexterm zone="functions-json"> for some
reason. Is that intentional?
Fixed.
2) <command>WHERE</command>
   We generally tag <literal>WHERE</literal>, not <command>.
Fixed.
3) Filter expressions are applied from left to right
   Perhaps s/applied/evaluated/ in reference to expressions?
Fixed.
4) The result of the filter expression may be true, false, or unknown.
   It's not entirely clear to me what "unknown" means here. NULL, or   something else? There's a section in the SQL/JSON standard   explaining this (page 83), but perhaps we should explain it a bit   here too?
   The standard says "In the SQL/JSON data model, there are no SQL   nulls, so Unknown is not part of the SQL/JSON data model." so I'm a   bit confused what "unknown" references to. Maybe some example?
   Also, what happens when the result is unknown?
"unknown" refers here to ordinary three-valued logical Unknown, which is
represented in SQL by NULL.

JSON path expressions return sequences of SQL/JSON items, which are defined by
SQL/JSON data model.  But JSON path predicates (logical expressions), which are
used in filters, return three-valued logical values: False, True, or Unknown.

Filters accept only items for which the filter predicate returned True. False
and Unknown results are skipped.

Unknown can be checked with IS UNKNOWN predicate:

SELECT jsonb_path_query_array('[1, "1", true, null]',                             '$[*] ? ((@ < 2) is unknown)');jsonb_path_query_array 
------------------------["1", true]
(1 row)

Comparison of JSON nulls to non-nulls returns always False, not Unknown (see
SQL/JSON data model). Comparison of non-comparable items return Unknown.

5) There's an example showing how to apply filter at a certain level,
using the @ variable, but what if we want to apply multiple filters at
different levels? Would it make sense to add such example?
Examples were added.

Filters can be nested, but by the standard it is impossible to reference item
of the outer filter, because @ always references current item of the innermost
filter.  We have additional patch with a simple jsonpath syntax extension for 
this case -- @N:@0, like @, references innermost item@1 references item one level upper@2 references item two levels upper

For example, selecting all objects from array
'[{"vals": [1,2,3], "val": 2}, {"vals": [4,5], "val": 6}]'

having in their .vals[] element greater than their .val field:
'$[*] ? (@.vals[*] ? (@ > @1.val))'

It is impossible to do this by the standard in the single jsonpath expression.


Also there is idea to use lambda expressions with ECMAScript 6 syntax in
filters and user method (see below):

'$[*] ? (obj => obj.vals[*] ? (val => val > obj.val))'

I already have a patch implementing lambda expressions, which were necessary
for implementaion of built-in/user-written functions and methods like
.map(), reduce(), max().  I can post it again if it is interesting.

6) ... extensions of the SQL/JSON standard
   I'm not sure what "extension" is here. Is that an extension defined   in the SQL standard, or an additional PostgreSQL functionality not   described in the standard? (I assume the latter, just checking.)
Yes, this is additional functionality.

 "Writing the path as an expression is also valid"   I have moved this into SQL/JSON patch, because it is related only path   specification in SQL/JSON functions.
7) There are references to "SQL/JSON sequences" without any explanation
what it means. Maybe I'm missing something obvious, though.
SQL/JSON sequence is a sequence of SQL/JSON items.
The corresponding definition was added.
8) Implicit unwrapping in the lax mode is not performed in the following
cases:
   I suggest to reword it like this:

In the lax mode, implicit unwrapping is not performed when:
Fixed. I also decided to merge this paragraph with paragraph describing 
auto-unwrapping.
9) We're not adding the datetime() method for now, due to the issues
with timezones. I wonder if we should add a note why it's missing to the
docs ...
The corresponding note was added.
10) I'm a bit puzzled though, because the standard says this in the
description of type() function on page 77
   If I is a datetime, then “date”, “time without time zone”, “time   with time zone”, “timestamp without time zone”, or “timestamp with   time zone”, as appropriate.

But I see our type() function does not return anything like that (which
I presume is independent of timezone stuff). But I see jsonb.h has no
concept of datetime values, and the standard actually says this in the
datetime() section
   JSON has no datetime types. Datetime values are most likely stored   in character strings.

Considering this, the type() section makes little sense, no?
According to the SQL/JSON data model, SQL/JSON items can be null, string, 
boolean, numeric, and datetime.

Datetime items exists only at execution time, they are serialized into JSON
strings when the resulting SQL/JSON item is converted into jsonb. After removal
of .datetime() method, support of datetime SQL/JSON items was removed from
jsonpath executor too.

Numeric items can be of any numeric type.  In PostgreSQL we have the following
numeric datatypes: integers, floats and numeric.  But our jsonpath executor
supports now only numeric-typed items, because this is only type we can get
directly from jsonb.  Support for other numeric datatypes (float8 is most
necessary, because it is produced by .double() item method) can be added by
extending JsonbValue or by introducing struct JsonItem in executor, having
JsonbValue as a part (see patch #4 in v34).

jsonb_util.c
============

I see we're now handling NaN values in convertJsonbScalar(). Isn't it
actually a bug that we don't do this already? Or is it not needed for
some reason?
Numeric JsonbValues created outside of jsonpath executor cannot be NaN, because
this case in explicitly handled in JsonbValue-producing functions.  For example,
datum_to_jsonb() which is used in to_jsonb(), jsonb_build_array() and others
converts Inf and NaN into JSON strings.  But jsonb_plperl and jsonb_plpython
transforms do not allow NaNs (I think it is needed only for consistency of
"PL => jsonb => PL" roundtrip).

In our jsonpath executor we can produce NaNs and we should not touch them before
conversion to the resulting jsonb.  Moreover, in SQL/JSON functions numeric
SQL/JSON item can be directly converted into numeric RETURNING type.  So, I
decided to add a check for NaN to the low-level convertJsonbScalar() instead of
checking items before every call to JsonbValueToJsonb() or pushJsonbValue().
But after introduction of struct JsonItem (see patch #4) introduction there will
appropriate place for this check -- JsonItemToJsonbValue().

==========

I suppose this should say "jsonpath version number" instead?
   elog(ERROR, "unsupported jsonb version number %d", version);
Fixed.


On 05.03.2019 2:27, Tomas Vondra wrote:
A bunch of additional comments, after looking at the patch a bit today.
All are mostly minor, and sometime perhaps a matter of preference.


1) There's a mismatch between the comment and actual function name for
jsonb_path_match_opr and jsonb_path_exists_opr(). The comments say
"_novars" instead.
Fixed.
2) In a couple of switches the "default" case does a return with a
value, following elog(ERROR). So it's practically unreachable, AFAICS
it's fine without it, and we don't do this elsewhere. And I don't get
any compiler warnings if I remove it either.

Examples:

JsonbTypeName
   default:       elog(ERROR, "unrecognized jsonb value type: %d", jbv->type);       return "unknown";

jspOperationName
   default:       elog(ERROR, "unrecognized jsonpath item type: %d", type);       return NULL;

compareItems
   default:       elog(ERROR, "unrecognized jsonpath operation: %d", op);       return jpbUnknown;
It seems to be a standard practice in jsonb code, so we followed it.
3) jsonpath_send is using makeStringInfo() for a value that is not
returned - IMHO it should use regular stack-allocated variable and use
initStringInfo() instead
Fixed.
4) the version number should be defined/used as a constant, not as a
magic constant somewhere in the code
Fixed.
5) Why does jsonPathToCstring do this?
   appendBinaryStringInfo(out, "strict ", 7);

Why not to use regular appendStringInfoString()? What am I missing?
appendStringInfoString() is a bit slower than appendBinaryStringInfo() 
because to strlen() call inside it.
6) comment typo: "Aling StringInfo"
Fixed.
7) alignStringInfoInt() should explain why we need this and why INTALIGN
is the right alignment.
Comment was added.
8) I'm a bit puzzled by what flattenJsonPathParseItem does with 'next'
   I don't quite understand what it's doing with 'next' value?
   /*    * Actual value will be recorded later, after next and children    * processing.    */   appendBinaryStringInfo(buf,                          (char *) &next, /* fake value */                          sizeof(next));

Perhaps a comment explaining it (why we need a fake value at all?) would
be a good idea here.
No fake value is needed here, zero placeholder is enough. I have refactored
this using new function reserveSpaceForItemPointer().
9) I see printJsonPathItem is only calling check_stack_depth while
flattenJsonPathParseItem also calls CHECK_INTERRUPTS. Why the
difference, considering they seem about equally expensive?
CHECK_INTERRUPT() was added to printJsonPathItem() too.
10) executeNumericItemMethod is missing a comment (unlike the other
executeXXX functions)
Comment was added.
11) Wording of some of the error messages in the execute methods seems a
bit odd. For example executeNumericItemMethod may complain that it
... is applied to not a numeric value

but perhaps a more natural wording would be
... is applied to a non-numeric value

And similarly for the other execute methods. But I'm not a native
speaker, so perhaps the original wording is just fine.
Error messages were changed on the advice of Robert Haas to:
"... can only be applied to a numeric value"
-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Filip Rembiałkowski
Date:
Subject: fix for BUG #3720: wrong results at using ltree
Next
From: Fabien COELHO
Date:
Subject: Re: Patch to document base64 encoding