Thread: jsonpath
Next up in the proposed SQL/JSON feature set I will review the three jsonpath patches. These are attached, extracted from the patch set Nikita posted on Jan 2nd. Note that they depend on the TZH/TZM patch (see separate email thread). I'd like to be able to commit that patch pretty soon. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
2018-01-07 20:31 GMT+01:00 Andrew Dunstan <andrew.dunstan@2ndquadrant.com>:
Next up in the proposed SQL/JSON feature set I will review the three
jsonpath patches. These are attached, extracted from the patch set
Nikita posted on Jan 2nd.
Note that they depend on the TZH/TZM patch (see separate email thread).
I'd like to be able to commit that patch pretty soon.
I did few tests - and it is working very well.
regards
Pavel
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/07/2018 03:00 PM, Pavel Stehule wrote: > > > 2018-01-07 20:31 GMT+01:00 Andrew Dunstan > <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@2ndquadrant.com>>: > > > Next up in the proposed SQL/JSON feature set I will review the three > jsonpath patches. These are attached, extracted from the patch set > Nikita posted on Jan 2nd. > > > Note that they depend on the TZH/TZM patch (see separate email > thread). > I'd like to be able to commit that patch pretty soon. > > > I did few tests - and it is working very well. > > You mean on the revised TZH/TZM patch that I posted? cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2018-01-07 21:14 GMT+01:00 Andrew Dunstan <andrew.dunstan@2ndquadrant.com>:
On 01/07/2018 03:00 PM, Pavel Stehule wrote:
>
>
> 2018-01-07 20:31 GMT+01:00 Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@ 2ndquadrant.com>>:
>
>
> Next up in the proposed SQL/JSON feature set I will review the three
> jsonpath patches. These are attached, extracted from the patch set
> Nikita posted on Jan 2nd.
>
>
> Note that they depend on the TZH/TZM patch (see separate email
> thread).
> I'd like to be able to commit that patch pretty soon.
>
>
> I did few tests - and it is working very well.
>
>
You mean on the revised TZH/TZM patch that I posted?
no - I tested jsonpath implementation
Pavel
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/07/2018 03:16 PM, Pavel Stehule wrote: > > > 2018-01-07 21:14 GMT+01:00 Andrew Dunstan > <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@2ndquadrant.com>>: > > > > On 01/07/2018 03:00 PM, Pavel Stehule wrote: > > > > > > 2018-01-07 20:31 GMT+01:00 Andrew Dunstan > > <andrew.dunstan@2ndquadrant.com > <mailto:andrew.dunstan@2ndquadrant.com> > <mailto:andrew.dunstan@2ndquadrant.com > <mailto:andrew.dunstan@2ndquadrant.com>>>: > > > > > > Next up in the proposed SQL/JSON feature set I will review > the three > > jsonpath patches. These are attached, extracted from the > patch set > > Nikita posted on Jan 2nd. > > > > > > Note that they depend on the TZH/TZM patch (see separate email > > thread). > > I'd like to be able to commit that patch pretty soon. > > > > > > I did few tests - and it is working very well. > > > > > > > You mean on the revised TZH/TZM patch that I posted? > > > no - I tested jsonpath implementation > > OK. You can help by also reviewing and testing the small patch at <https://www.postgresql.org/message-id/f16a6408-45fe-b299-02b0-41e50a1c3023%402ndQuadrant.com> thanks cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2018-01-07 21:20 GMT+01:00 Andrew Dunstan <andrew.dunstan@2ndquadrant.com>:
On 01/07/2018 03:16 PM, Pavel Stehule wrote:
>
>
> 2018-01-07 21:14 GMT+01:00 Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com <mailto:andrew.dunstan@ 2ndquadrant.com>>:
>
>
>
> On 01/07/2018 03:00 PM, Pavel Stehule wrote:
> >
> >
> > 2018-01-07 20:31 GMT+01:00 Andrew Dunstan
> > <andrew.dunstan@2ndquadrant.com
> <mailto:andrew.dunstan@2ndquadrant.com>
> <mailto:andrew.dunstan@2ndquadrant.com
> <mailto:andrew.dunstan@2ndquadrant.com>>>:
> >
> >
> > Next up in the proposed SQL/JSON feature set I will review
> the three
> > jsonpath patches. These are attached, extracted from the
> patch set
> > Nikita posted on Jan 2nd.
> >
> >
> > Note that they depend on the TZH/TZM patch (see separate email
> > thread).
> > I'd like to be able to commit that patch pretty soon.
> >
> >
> > I did few tests - and it is working very well.
> >
> >
>
>
> You mean on the revised TZH/TZM patch that I posted?
>
>
> no - I tested jsonpath implementation
>
>
OK. You can help by also reviewing and testing the small patch at
<https://www.postgresql.org/message-id/f16a6408-45fe-b299- 02b0-41e50a1c3023% 402ndQuadrant.com>
I'll look there
Regards
Pavel
thanks
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attached new 8th version of jsonpath related patches. Complete documentation is still missing. The first 4 small patches are necessary datetime handling in jsonpath: 1. simple refactoring, extracted function that will be used later in jsonpath 2. throw an error when the input or format string contains trailing elements 3. avoid unnecessary cstring to text conversions 4. add function for automatic datetime type recognition by the presence of formatting components Should they be posted in a separate thread? -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 01/10/2018 05:42 PM, Nikita Glukhov wrote: > Attached new 8th version of jsonpath related patches. Complete > documentation is still missing. > > The first 4 small patches are necessary datetime handling in jsonpath: > 1. simple refactoring, extracted function that will be used later in > jsonpath > 2. throw an error when the input or format string contains trailing > elements > 3. avoid unnecessary cstring to text conversions > 4. add function for automatic datetime type recognition by the > presence of formatting components > > Should they be posted in a separate thread? > The first of these refactors the json/jsonb timestamp formatting into a single function, removing a lot of code duplication. The involves exposing time2tm() and timetz2tm(). I don't think that's a tragedy, so unless there is any objection I propose to commit it shortly. The next three expose a bit more of the date/time API. I'm still reviewing those. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/15/2018 07:24 PM, Andrew Dunstan wrote: > > On 01/10/2018 05:42 PM, Nikita Glukhov wrote: >> Attached new 8th version of jsonpath related patches. Complete >> documentation is still missing. >> >> The first 4 small patches are necessary datetime handling in jsonpath: >> 1. simple refactoring, extracted function that will be used later in >> jsonpath >> 2. throw an error when the input or format string contains trailing >> elements >> 3. avoid unnecessary cstring to text conversions >> 4. add function for automatic datetime type recognition by the >> presence of formatting components >> >> Should they be posted in a separate thread? >> > > The first of these refactors the json/jsonb timestamp formatting into a > single function, removing a lot of code duplication. The involves > exposing time2tm() and timetz2tm(). I don't think that's a tragedy, so > unless there is any objection I propose to commit it shortly. > > The next three expose a bit more of the date/time API. I'm still > reviewing those. > I have committed the first of these patches. I have reviewed the next three, and I think they are generally good. There is no real point in committing them ahead of the jsonpath patch since there would be no point in having them at all but for that patch. Note that these do export the following hitherto internal bits of the datetime functionality: tm2time tm2timetz AdjustTimeForTypmod AdjustTimestampForTypmod Moving on to review the main jsonpath patch. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/17/2018 04:01 PM, Andrew Dunstan wrote: > > On 01/15/2018 07:24 PM, Andrew Dunstan wrote: >> On 01/10/2018 05:42 PM, Nikita Glukhov wrote: >>> Attached new 8th version of jsonpath related patches. Complete >>> documentation is still missing. >>> >>> The first 4 small patches are necessary datetime handling in jsonpath: >>> 1. simple refactoring, extracted function that will be used later in >>> jsonpath >>> 2. throw an error when the input or format string contains trailing >>> elements >>> 3. avoid unnecessary cstring to text conversions >>> 4. add function for automatic datetime type recognition by the >>> presence of formatting components >>> >>> Should they be posted in a separate thread? >>> >> The first of these refactors the json/jsonb timestamp formatting into a >> single function, removing a lot of code duplication. The involves >> exposing time2tm() and timetz2tm(). I don't think that's a tragedy, so >> unless there is any objection I propose to commit it shortly. >> >> The next three expose a bit more of the date/time API. I'm still >> reviewing those. >> > > I have committed the first of these patches. > > I have reviewed the next three, and I think they are generally good. > There is no real point in committing them ahead of the jsonpath patch > since there would be no point in having them at all but for that patch. > > Note that these do export the following hitherto internal bits of the > datetime functionality: > > tm2time > tm2timetz > AdjustTimeForTypmod > AdjustTimestampForTypmod > > Moving on to review the main jsonpath patch. > OK, I understand a good deal more than I did about how the jsopnpath code works, but the commenting is abysmal. I got quite nervous about adding a new datetime variant to JsonbValue. However, my understanding from the code is that this will only ever be used in an intermediate jsonpath processing result, and it won't be used in a stored or build jsonb object. Is that right? If it is we need to say so, and moreover we need to warn coders in the header file about the restricted use of this variant. I'm not sure we can enforce it with an Assert, but If we can we should. I'm not 100% sure that doing it this way, i.e. by extending and resuing jsonbValue, is desirable, I'd like to know some of the thinking behind the design. The encoding of a jsonpath value into a binary string is quite nifty, but it needs to be documented. Once I understood it better some of my fears about parser overhead were alleviated. The use of a function pointer inside JsonPathVariable seems unnecessary and baroque. It would be much more straightforward to set an isnull flag in the struct and process the value in an "if" statement accordingly, avoiding the function pointer altogether. I am going to be travelling for a few days, then back online for a day or two, then gone for a week. I hope we can make progress on these features, but this needs lots more eyeballs, reviewing code as well as testing, and lots more responsiveness. The whole suite is enormous. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23.01.2018 01:41, Andrew Dunstan wrote: > On 01/17/2018 04:01 PM, Andrew Dunstan wrote: >> On 01/15/2018 07:24 PM, Andrew Dunstan wrote: >>> On 01/10/2018 05:42 PM, Nikita Glukhov wrote: >>>> Attached new 8th version of jsonpath related patches. Complete >>>> documentation is still missing. >>>> >>>> The first 4 small patches are necessary datetime handling in jsonpath: >>>> 1. simple refactoring, extracted function that will be used later in >>>> jsonpath >>>> 2. throw an error when the input or format string contains trailing >>>> elements >>>> 3. avoid unnecessary cstring to text conversions >>>> 4. add function for automatic datetime type recognition by the >>>> presence of formatting components >>>> >>>> Should they be posted in a separate thread? >>>> >>> The first of these refactors the json/jsonb timestamp formatting into a >>> single function, removing a lot of code duplication. The involves >>> exposing time2tm() and timetz2tm(). I don't think that's a tragedy, so >>> unless there is any objection I propose to commit it shortly. >>> >>> The next three expose a bit more of the date/time API. I'm still >>> reviewing those. >> I have committed the first of these patches. >> >> I have reviewed the next three, and I think they are generally good. >> There is no real point in committing them ahead of the jsonpath patch >> since there would be no point in having them at all but for that patch. >> >> Note that these do export the following hitherto internal bits of the >> datetime functionality: >> >> tm2time >> tm2timetz >> AdjustTimeForTypmod >> AdjustTimestampForTypmod >> >> Moving on to review the main jsonpath patch. > > OK, I understand a good deal more than I did about how the jsopnpath > code works, but the commenting is abysmal. Thank you for reviewing. > I got quite nervous about adding a new datetime variant to JsonbValue. > However, my understanding from the code is that this will only ever be > used in an intermediate jsonpath processing result, and it won't be used > in a stored or build jsonb object. Is that right? Yes, you are right. Datetime JsonbValues are used only for for in-memory representation of SQL/JSON datetime items, they converted into ordinary JSON strings in ISO format when json/jsonb encoded into a datum. > If it is we need to say so, and moreover we need to warn coders in the > header file about the restricted use of this variant. I'm not sure we > can enforce it with an Assert, but If we can we should. I'm not 100% > sure that doing it this way, i.e. by extending and resuing jsonbValue, > is desirable, I'd like to know some of the thinking behind the design. Datetime support was added to our jsonpath implementation when there was already a lot of code using plain JsonbValue. So, the simplest are most effective solution I found was JsonbValue extension. We could also introduce extended struct SqlJsonItem, but it seems that there will be a lot of unnecessary conversions between SqlJsonItem and JsonbValue. > The encoding of a jsonpath value into a binary string is quite nifty, > but it needs to be documented. Once I understood it better some of my > fears about parser overhead were alleviated. > The use of a function pointer inside JsonPathVariable seems unnecessary > and baroque. It would be much more straightforward to set an isnull flag > in the struct and process the value in an "if" statement accordingly, > avoiding the function pointer altogether. Callback in JsonPathVariable is used for on-demand evaluation of SQL/JSON PASSING parameters (see EvalJsonPathVar() from patch 0010-sqljson-v08.patch). For jsonpath itself it is really unnecessary. > I am going to be travelling for a few days, then back online for a day > or two, then gone for a week. I hope we can make progress on these > features, but this needs lots more eyeballs, reviewing code as well as > testing, and lots more responsiveness. The whole suite is enormous. > > cheers > > andrew -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attached 9th version of jsonpath patches rebased onto the latest master. Jsonpath grammar for parenthesized expressions and predicates was fixed. Documentation drafts for jsonpath written by Oleg Bartunov: https://github.com/obartunov/sqljsondoc -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Feb 14, 2018 at 2:04 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached 9th version of jsonpath patches rebased onto the latest master. > > Jsonpath grammar for parenthesized expressions and predicates was fixed. > > Documentation drafts for jsonpath written by Oleg Bartunov: > https://github.com/obartunov/sqljsondoc Direct link is https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md Please consider it as WIP, I will update it in my spare time. > > > -- > Nikita Glukhov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company
Attached 10th version of the jsonpath patches. 1. Fixed error handling in arithmetic operators. Now run-time errors in arithmetic operators are catched (added PG_TRY/PG_CATCH around operator's functions calls) and converted into Unknown values in predicates as it is required by the standard: =# SELECT jsonb '[1,0,2]' @* '$[*] ? (1 / @ > 0)'; ?column? ---------- 1 2 (2 rows) 2. Fixed grammar for parenthesized expressions. 3. Refactored GIN support for jsonpath operators. 4. Added one more operator json[b] @# jsonpath returning singleton json[b] with automatic conditional wrapping of sequences with more than one element into arrays: =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 2)'; ?column? ----------- [3, 4, 5] (1 row) =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 4)'; ?column? ---------- 5 (1 row) =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 5)'; ?column? ---------- (null) (1 row) Existing set-returning operator json[b] @* jsonpath is also very userful but can't be used in functional indices like new operator @#. Note that conditional wrapping of @# differs from the wrapping in JSON_QUERY(... WITH [ARRAY] WRAPPER), where only singleton objects and arrays are not wrapped. Unconditional wrapping can be emulated with our array construction feature (see below). 5. Changed time zone behavior in .datetime() item method. In the previous version of the patch timestamptz SQL/JSON items were serialized into JSON string items using session time zone. This behavior did not allow jsonpath operators to be marked as immutable, and therefore they could not be used in functional indices. Also, when the time zone was not specified in the input string, but TZM or TZH format fields were present in the format string, session time zone was used as a default for timestamptz items. To make jsonpath operators immutable we decided to save input time zone for timestamptz items and disallow automatic time zone assignment. Also additional parameter was added to .datetime() for default time zone specification: =# SET timezone = '+03'; SET =# SELECT jsonb '"10-03-2017 12:34:56"' @* '$.datetime("DD-MM-YYYY HH24:MI:SS TZH")'; ERROR: Invalid argument for SQL/JSON datetime function =# SELECT jsonb '"10-03-2017 12:34:56"' @* '$.datetime("DD-MM-YYYY HH24:MI:SS TZH", "+05")'; ?column? ----------------------------- "2017-03-10T12:34:56+05:00" (1 row) =# SELECT jsonb '"10-03-2017 12:34:56 +05"' @* '$.datetime("DD-MM-YYYY HH24:MI:SS TZH")'; ?column? ----------------------------- "2017-03-10T12:34:56+05:00" (1 row) Please note that our .datetime() behavior is not standard now: by the standard, input and format strings must match exactly, i.e. they both should not contain trailing unmatched elements, so automatic time zone assignment is impossible. But it too restrictive for PostgreSQL users, so we decided to preserve usual PostgreSQL behavior here: =# SELECT jsonb '"10-03-2017"' @* '$.datetime("DD-MM-YYYY HH24:MI:SS")'; ?column? ----------------------- "2017-03-10T00:00:00" (1 row) Also PostgreSQL is able to automatically recognize format of the input string for the specified datetime type, but we can only bring this behavior into jsonpath by introducing separate item methods .date(), .time(), .timetz(), .timestamp() and .timestamptz(). Also we can use here our unfinished feature that gives us ability to work with PostresSQL types in jsonpath using cast operator :: (see sqljson_ext branch in our github repo): =# SELECT jsonb '"10/03/2017 12:34"' @* '$::timestamptz'; ?column? ----------------------------- "2017-03-10T12:34:00+03:00" (1 row) A brief description of the extra jsonpath syntax features contained in the patch #7: * Sequence construction by joining path expressions with comma: =# SELECT jsonb '[1, 2, 3]' @* '$[*], 4, 5'; ?column? ---------- 1 2 3 4 5 (5 rows) * Array construction by placing sequence into brackets (equivalent to JSON_QUERY(... WITH UNCONDITIONAL WRAPPER)): =# SELECT jsonb '[1, 2, 3]' @* '[$[*], 4, 5]'; ?column? ----------------- [1, 2, 3, 4, 5] (1 row) * Object construction by placing sequences of key-value pairs into braces: =# SELECT jsonb '{"a" : [1, 2, 3]}' @* '{a: [$.a[*], 4, 5], "b c": "dddd"}'; ?column? --------------------------------------- {"a": [1, 2, 3, 4, 5], "b c": "dddd"} (1 row) * Object subscripting with string-valued expressions: =# SELECT jsonb '{"a" : "aaa", "b": "a", "c": "ccc"}' @* '$[$.b, "c"]'; ?column? ---------- "aaa" "ccc" (2 rows) * Support of UNIX epoch time in .datetime() item method: =# SELECT jsonb '1519649957.37' @* '$.datetime()'; ?column? -------------------------------- "2018-02-26T12:59:17.37+00:00" (1 row) -- Nikita Glukhov Postgres Professional:http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, Feb 26, 2018 at 6:34 PM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached 10th version of the jsonpath patches. > > 1. Fixed error handling in arithmetic operators. > > Now run-time errors in arithmetic operators are catched (added > PG_TRY/PG_CATCH around operator's functions calls) and converted into > Unknown values in predicates as it is required by the standard: > > =# SELECT jsonb '[1,0,2]' @* '$[*] ? (1 / @ > 0)'; > ?column? > ---------- > 1 > 2 > (2 rows) > > 2. Fixed grammar for parenthesized expressions. > > 3. Refactored GIN support for jsonpath operators. > > 4. Added one more operator json[b] @# jsonpath returning singleton json[b] > with > automatic conditional wrapping of sequences with more than one element > into > arrays: > > =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 2)'; > ?column? > ----------- > [3, 4, 5] > (1 row) > > =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 4)'; > ?column? > ---------- > 5 > (1 row) > > =# SELECT jsonb '[1,2,3,4,5]' @# '$[*] ? (@ > 5)'; > ?column? > ---------- > (null) > (1 row) > > Existing set-returning operator json[b] @* jsonpath is also very userful > but > can't be used in functional indices like new operator @#. > > Note that conditional wrapping of @# differs from the wrapping in > JSON_QUERY(... WITH [ARRAY] WRAPPER), where only singleton objects and > arrays are not wrapped. Unconditional wrapping can be emulated with our > array construction feature (see below). > > 5. Changed time zone behavior in .datetime() item method. > > In the previous version of the patch timestamptz SQL/JSON items were > serialized into JSON string items using session time zone. This behavior > did not allow jsonpath operators to be marked as immutable, and therefore > they could not be used in functional indices. Also, when the time zone > was > not specified in the input string, but TZM or TZH format fields were > present > in the format string, session time zone was used as a default for > timestamptz items. > > To make jsonpath operators immutable we decided to save input time zone > for > timestamptz items and disallow automatic time zone assignment. Also > additional parameter was added to .datetime() for default time zone > specification: > > =# SET timezone = '+03'; > SET > > =# SELECT jsonb '"10-03-2017 12:34:56"' @* > '$.datetime("DD-MM-YYYY HH24:MI:SS TZH")'; > ERROR: Invalid argument for SQL/JSON datetime function > > =# SELECT jsonb '"10-03-2017 12:34:56"' @* > '$.datetime("DD-MM-YYYY HH24:MI:SS TZH", "+05")'; > ?column? > ----------------------------- > "2017-03-10T12:34:56+05:00" > (1 row) > > =# SELECT jsonb '"10-03-2017 12:34:56 +05"' @* > '$.datetime("DD-MM-YYYY HH24:MI:SS TZH")'; > ?column? > ----------------------------- > "2017-03-10T12:34:56+05:00" > (1 row) > > Please note that our .datetime() behavior is not standard now: by the > standard, input and format strings must match exactly, i.e. they both > should > not contain trailing unmatched elements, so automatic time zone > assignment > is impossible. But it too restrictive for PostgreSQL users, so we > decided > to preserve usual PostgreSQL behavior here: > > =# SELECT jsonb '"10-03-2017"' @* '$.datetime("DD-MM-YYYY HH24:MI:SS")'; > ?column? > ----------------------- > "2017-03-10T00:00:00" > (1 row) I think someday we should consider adding support for sql standard conforming datetime. Since it breaks postgres behaviour we will need 'standard_conforming_datetime' guc. > > > Also PostgreSQL is able to automatically recognize format of the input > string for the specified datetime type, but we can only bring this > behavior > into jsonpath by introducing separate item methods .date(), .time(), > .timetz(), .timestamp() and .timestamptz(). Also we can use here our > unfinished feature that gives us ability to work with PostresSQL types in > jsonpath using cast operator :: (see sqljson_ext branch in our github > repo): > > =# SELECT jsonb '"10/03/2017 12:34"' @* '$::timestamptz'; > ?column? > ----------------------------- > "2017-03-10T12:34:00+03:00" > (1 row) Another note. We decided to preserve TZ in JSON_QUERY function and follow standard Postgres behaviour in JSON_VALUE, since JSON_QUERY returns JSON object and JSON_VALUE returns SQL value. SELECT JSON_QUERY(jsonb '"2018-02-21 17:01:23 +05"', '$.datetime("YYYY-MM-DD HH24:MI:SS TZH")'); json_query ----------------------------- "2018-02-21T17:01:23+05:00" (1 row) show timezone; TimeZone ---------- W-SU (1 row) SELECT JSON_VALUE(jsonb '"2018-02-21 17:01:23 +05"', '$.datetime("YYYY-MM-DD HH24:MI:SS TZH")'); json_value ------------------------ 2018-02-21 15:01:23+03 (1 row) > > > > A brief description of the extra jsonpath syntax features contained in the > patch #7: > > * Sequence construction by joining path expressions with comma: > > =# SELECT jsonb '[1, 2, 3]' @* '$[*], 4, 5'; > ?column? > ---------- > 1 > 2 > 3 > 4 > 5 > (5 rows) > > * Array construction by placing sequence into brackets (equivalent to > JSON_QUERY(... WITH UNCONDITIONAL WRAPPER)): > > =# SELECT jsonb '[1, 2, 3]' @* '[$[*], 4, 5]'; > ?column? > ----------------- > [1, 2, 3, 4, 5] > (1 row) > > * Object construction by placing sequences of key-value pairs into braces: > > =# SELECT jsonb '{"a" : [1, 2, 3]}' @* '{a: [$.a[*], 4, 5], "b c": > "dddd"}'; > ?column? > --------------------------------------- > {"a": [1, 2, 3, 4, 5], "b c": "dddd"} > (1 row) > > * Object subscripting with string-valued expressions: > > =# SELECT jsonb '{"a" : "aaa", "b": "a", "c": "ccc"}' @* '$[$.b, "c"]'; > ?column? > ---------- > "aaa" > "ccc" > (2 rows) > > * Support of UNIX epoch time in .datetime() item method: > > =# SELECT jsonb '1519649957.37' @* '$.datetime()'; > ?column? > -------------------------------- > "2018-02-26T12:59:17.37+00:00" > (1 row) > Documentation in user-friendly format (it will be convered to xml, of course) is available https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md We are permanently working on it. > > -- > Nikita Glukhov > Postgres Professional:http://www.postgrespro.com > The Russian Postgres Company >
On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached 10th version of the jsonpath patches. > > 1. Fixed error handling in arithmetic operators. > > Now run-time errors in arithmetic operators are catched (added > PG_TRY/PG_CATCH around operator's functions calls) and converted into > Unknown values in predicates as it is required by the standard: I think we really need to rename PG_TRY and PG_CATCH or rethink this whole interface so that people stop thinking they can use it to prevent errors from being thrown. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 28.02.2018 06:55, Robert Haas wrote: > On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov > <n.gluhov@postgrespro.ru> wrote: >> Attached 10th version of the jsonpath patches. >> >> 1. Fixed error handling in arithmetic operators. >> >> Now run-time errors in arithmetic operators are catched (added >> PG_TRY/PG_CATCH around operator's functions calls) and converted into >> Unknown values in predicates as it is required by the standard: > I think we really need to rename PG_TRY and PG_CATCH or rethink this > whole interface so that people stop thinking they can use it to > prevent errors from being thrown. I understand that it is unsafe to call arbitrary function inside PG_TRY without rethrowing of caught errors in PG_CATCH, but in jsonpath only the following numeric and datetime functions with known behavior are called inside PG_TRY and only errors of category ERRCODE_DATA_EXCEPTION are caught: numeric_add() numeric_mul() numeric_div() numeric_mod() numeric_float8() float8in() float8_numeric() to_datetime() SQL/JSON standard requires us to handle errors and then perform the specified ON ERROR behavior. In the next SQL/JSON patch I had to use subtransactions for catching errors in JSON_VALUE, JSON_QUERY and JSON_EXISTS where an arbitrary user-defined typecast function can be called. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Mar 2, 2018 at 12:40 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
On 28.02.2018 06:55, Robert Haas wrote:On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov
<n.gluhov@postgrespro.ru> wrote:Attached 10th version of the jsonpath patches.I think we really need to rename PG_TRY and PG_CATCH or rethink this
1. Fixed error handling in arithmetic operators.
Now run-time errors in arithmetic operators are catched (added
PG_TRY/PG_CATCH around operator's functions calls) and converted into
Unknown values in predicates as it is required by the standard:
whole interface so that people stop thinking they can use it to
prevent errors from being thrown.
I understand that it is unsafe to call arbitrary function inside PG_TRY without
rethrowing of caught errors in PG_CATCH, but in jsonpath only the following
numeric and datetime functions with known behavior are called inside PG_TRY
and only errors of category ERRCODE_DATA_EXCEPTION are caught:
numeric_add()
numeric_mul()
numeric_div()
numeric_mod()
numeric_float8()
float8in()
float8_numeric()
to_datetime()
That seems like a quite limited list of functions. What about reworking them
providing a way of calling them without risk of exception? For example, we can
have numeric_add_internal() function which fill given data structure with
error information instead of throwing the error. numeric_add() would be a
wrapper over numeric_add_internal(), which throws an error if corresponding
data structure is filled. In jsonpath we can call numeric_add_internal() and
interpret errors in another way. That seems to be better than use of PG_TRY
and PG_CATCH.
------
Alexander Korotkov
Postgres Professional: http://www. postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.
The Russian Postgres Company
Hi, The patch no longer applies - it got broken by fd1a421fe66 which changed columns in pg_proc. A rebase is needed. Fixing it is pretty simle, so I've done that locally and tried to run 'make check' under valgrind. And I got a bunch of reports about uninitialised values. Full report attached, but in general there seem to be two types of failures: Conditional jump or move depends on uninitialised value(s) at 0x57BB47A: vfprintf (in /usr/lib64/libc-2.24.so) by 0x57E5478: vsnprintf (in /usr/lib64/libc-2.24.so) by 0xA926D3: pvsnprintf (psprintf.c:121) by 0x723E03: appendStringInfoVA (stringinfo.c:130) by 0x723D58: appendStringInfo (stringinfo.c:87) by 0x76BEFF: _outCoerceViaIO (outfuncs.c:1413) by 0x776F99: outNode (outfuncs.c:3978) by 0x76D7E7: _outJsonCoercion (outfuncs.c:1779) by 0x777CB9: outNode (outfuncs.c:4398) by 0x76D507: _outJsonExpr (outfuncs.c:1752) by 0x777CA1: outNode (outfuncs.c:4395) by 0x767000: _outList (outfuncs.c:187) by 0x776874: outNode (outfuncs.c:3753) by 0x76A4D2: _outTableFunc (outfuncs.c:1068) by 0x776D89: outNode (outfuncs.c:3912) by 0x7744FD: _outRangeTblEntry (outfuncs.c:3209) by 0x777959: outNode (outfuncs.c:4290) by 0x767000: _outList (outfuncs.c:187) by 0x776874: outNode (outfuncs.c:3753) by 0x773713: _outQuery (outfuncs.c:3049) Uninitialised value was created by a stack allocation at 0x5B0C19: base_yyparse (gram.c:26287) This happens when _outCoerceViaIO tries to output 'location' field (that's line 1413), so I guess it's not set/copied somewhere. The second failure looks like this: Conditional jump or move depends on uninitialised value(s) at 0x49E58B: ginFillScanEntry (ginscan.c:72) by 0x49EB56: ginFillScanKey (ginscan.c:221) by 0x49EF72: ginNewScanKey (ginscan.c:369) by 0x4A3875: gingetbitmap (ginget.c:1807) by 0x4F620B: index_getbitmap (indexam.c:727) by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104) by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506) by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118) by 0x6DC26D: ExecScanFetch (execScan.c:95) by 0x6DC308: ExecScan (execScan.c:162) by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730) by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) by 0x6E5961: ExecProcNode (executor.h:239) by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406) by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736) by 0x6E7C84: ExecAgg (nodeAgg.c:1551) by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) by 0x6D1361: ExecProcNode (executor.h:239) by 0x6D3BB7: ExecutePlan (execMain.c:1721) by 0x6D1917: standard_ExecutorRun (execMain.c:361) Uninitialised value was created by a heap allocation at 0xA64FDC: palloc (mcxt.c:858) by 0x938636: gin_extract_jsonpath_query (jsonb_gin.c:630) by 0x938AB6: gin_extract_jsonb_query (jsonb_gin.c:746) by 0xA340C0: FunctionCall7Coll (fmgr.c:1201) by 0x49EE7F: ginNewScanKey (ginscan.c:313) by 0x4A3875: gingetbitmap (ginget.c:1807) by 0x4F620B: index_getbitmap (indexam.c:727) by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104) by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506) by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118) by 0x6DC26D: ExecScanFetch (execScan.c:95) by 0x6DC308: ExecScan (execScan.c:162) by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730) by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) by 0x6E5961: ExecProcNode (executor.h:239) by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406) by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736) by 0x6E7C84: ExecAgg (nodeAgg.c:1551) by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) by 0x6D1361: ExecProcNode (executor.h:239) So the extra_data allocated in gin_extract_jsonpath_query() get to ginFillScanEntry() uninitialised. Both seem like a valid issues, I think. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Thanks, Tomas ! Will publish a new version really soon ! Regards, Oleg On Tue, Mar 6, 2018 at 12:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Hi, > > The patch no longer applies - it got broken by fd1a421fe66 which changed > columns in pg_proc. A rebase is needed. > > Fixing it is pretty simle, so I've done that locally and tried to run > 'make check' under valgrind. And I got a bunch of reports about > uninitialised values. Full report attached, but in general there seem to > be two types of failures: > > Conditional jump or move depends on uninitialised value(s) > at 0x57BB47A: vfprintf (in /usr/lib64/libc-2.24.so) > by 0x57E5478: vsnprintf (in /usr/lib64/libc-2.24.so) > by 0xA926D3: pvsnprintf (psprintf.c:121) > by 0x723E03: appendStringInfoVA (stringinfo.c:130) > by 0x723D58: appendStringInfo (stringinfo.c:87) > by 0x76BEFF: _outCoerceViaIO (outfuncs.c:1413) > by 0x776F99: outNode (outfuncs.c:3978) > by 0x76D7E7: _outJsonCoercion (outfuncs.c:1779) > by 0x777CB9: outNode (outfuncs.c:4398) > by 0x76D507: _outJsonExpr (outfuncs.c:1752) > by 0x777CA1: outNode (outfuncs.c:4395) > by 0x767000: _outList (outfuncs.c:187) > by 0x776874: outNode (outfuncs.c:3753) > by 0x76A4D2: _outTableFunc (outfuncs.c:1068) > by 0x776D89: outNode (outfuncs.c:3912) > by 0x7744FD: _outRangeTblEntry (outfuncs.c:3209) > by 0x777959: outNode (outfuncs.c:4290) > by 0x767000: _outList (outfuncs.c:187) > by 0x776874: outNode (outfuncs.c:3753) > by 0x773713: _outQuery (outfuncs.c:3049) > Uninitialised value was created by a stack allocation > at 0x5B0C19: base_yyparse (gram.c:26287) > > This happens when _outCoerceViaIO tries to output 'location' field > (that's line 1413), so I guess it's not set/copied somewhere. > > The second failure looks like this: > > Conditional jump or move depends on uninitialised value(s) > at 0x49E58B: ginFillScanEntry (ginscan.c:72) > by 0x49EB56: ginFillScanKey (ginscan.c:221) > by 0x49EF72: ginNewScanKey (ginscan.c:369) > by 0x4A3875: gingetbitmap (ginget.c:1807) > by 0x4F620B: index_getbitmap (indexam.c:727) > by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104) > by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506) > by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118) > by 0x6DC26D: ExecScanFetch (execScan.c:95) > by 0x6DC308: ExecScan (execScan.c:162) > by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6E5961: ExecProcNode (executor.h:239) > by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406) > by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736) > by 0x6E7C84: ExecAgg (nodeAgg.c:1551) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6D1361: ExecProcNode (executor.h:239) > by 0x6D3BB7: ExecutePlan (execMain.c:1721) > by 0x6D1917: standard_ExecutorRun (execMain.c:361) > Uninitialised value was created by a heap allocation > at 0xA64FDC: palloc (mcxt.c:858) > by 0x938636: gin_extract_jsonpath_query (jsonb_gin.c:630) > by 0x938AB6: gin_extract_jsonb_query (jsonb_gin.c:746) > by 0xA340C0: FunctionCall7Coll (fmgr.c:1201) > by 0x49EE7F: ginNewScanKey (ginscan.c:313) > by 0x4A3875: gingetbitmap (ginget.c:1807) > by 0x4F620B: index_getbitmap (indexam.c:727) > by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104) > by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506) > by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118) > by 0x6DC26D: ExecScanFetch (execScan.c:95) > by 0x6DC308: ExecScan (execScan.c:162) > by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6E5961: ExecProcNode (executor.h:239) > by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406) > by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736) > by 0x6E7C84: ExecAgg (nodeAgg.c:1551) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6D1361: ExecProcNode (executor.h:239) > > So the extra_data allocated in gin_extract_jsonpath_query() get to > ginFillScanEntry() uninitialised. > > > Both seem like a valid issues, I think. > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06.03.2018 00:29, Tomas Vondra wrote: > Hi, > > The patch no longer applies - it got broken by fd1a421fe66 which changed > columns in pg_proc. A rebase is needed. > > Fixing it is pretty simle, so I've done that locally and tried to run > 'make check' under valgrind. And I got a bunch of reports about > uninitialised values. Thank you very much for your discovery. It is my fault that I never ran PostgreSQL under valgrind. Rebased and fixed patches are attached. > Full report attached, but in general there seem to > be two types of failures: > > Conditional jump or move depends on uninitialised value(s) > at 0x57BB47A: vfprintf (in /usr/lib64/libc-2.24.so) > by 0x57E5478: vsnprintf (in /usr/lib64/libc-2.24.so) > by 0xA926D3: pvsnprintf (psprintf.c:121) > by 0x723E03: appendStringInfoVA (stringinfo.c:130) > by 0x723D58: appendStringInfo (stringinfo.c:87) > by 0x76BEFF: _outCoerceViaIO (outfuncs.c:1413) > by 0x776F99: outNode (outfuncs.c:3978) > by 0x76D7E7: _outJsonCoercion (outfuncs.c:1779) > by 0x777CB9: outNode (outfuncs.c:4398) > by 0x76D507: _outJsonExpr (outfuncs.c:1752) > by 0x777CA1: outNode (outfuncs.c:4395) > by 0x767000: _outList (outfuncs.c:187) > by 0x776874: outNode (outfuncs.c:3753) > by 0x76A4D2: _outTableFunc (outfuncs.c:1068) > by 0x776D89: outNode (outfuncs.c:3912) > by 0x7744FD: _outRangeTblEntry (outfuncs.c:3209) > by 0x777959: outNode (outfuncs.c:4290) > by 0x767000: _outList (outfuncs.c:187) > by 0x776874: outNode (outfuncs.c:3753) > by 0x773713: _outQuery (outfuncs.c:3049) > Uninitialised value was created by a stack allocation > at 0x5B0C19: base_yyparse (gram.c:26287) > > This happens when _outCoerceViaIO tries to output 'location' field > (that's line 1413), so I guess it's not set/copied somewhere. Yes, JSON FORMAT location was not set in gram.y. > The second failure looks like this: > > Conditional jump or move depends on uninitialised value(s) > at 0x49E58B: ginFillScanEntry (ginscan.c:72) > by 0x49EB56: ginFillScanKey (ginscan.c:221) > by 0x49EF72: ginNewScanKey (ginscan.c:369) > by 0x4A3875: gingetbitmap (ginget.c:1807) > by 0x4F620B: index_getbitmap (indexam.c:727) > by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104) > by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506) > by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118) > by 0x6DC26D: ExecScanFetch (execScan.c:95) > by 0x6DC308: ExecScan (execScan.c:162) > by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6E5961: ExecProcNode (executor.h:239) > by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406) > by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736) > by 0x6E7C84: ExecAgg (nodeAgg.c:1551) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6D1361: ExecProcNode (executor.h:239) > by 0x6D3BB7: ExecutePlan (execMain.c:1721) > by 0x6D1917: standard_ExecutorRun (execMain.c:361) > Uninitialised value was created by a heap allocation > at 0xA64FDC: palloc (mcxt.c:858) > by 0x938636: gin_extract_jsonpath_query (jsonb_gin.c:630) > by 0x938AB6: gin_extract_jsonb_query (jsonb_gin.c:746) > by 0xA340C0: FunctionCall7Coll (fmgr.c:1201) > by 0x49EE7F: ginNewScanKey (ginscan.c:313) > by 0x4A3875: gingetbitmap (ginget.c:1807) > by 0x4F620B: index_getbitmap (indexam.c:727) > by 0x6EE342: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:104) > by 0x6DA8F8: MultiExecProcNode (execProcnode.c:506) > by 0x6EC53D: BitmapHeapNext (nodeBitmapHeapscan.c:118) > by 0x6DC26D: ExecScanFetch (execScan.c:95) > by 0x6DC308: ExecScan (execScan.c:162) > by 0x6ED7E5: ExecBitmapHeapScan (nodeBitmapHeapscan.c:730) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6E5961: ExecProcNode (executor.h:239) > by 0x6E5E25: fetch_input_tuple (nodeAgg.c:406) > by 0x6E8091: agg_retrieve_direct (nodeAgg.c:1736) > by 0x6E7C84: ExecAgg (nodeAgg.c:1551) > by 0x6DA80A: ExecProcNodeFirst (execProcnode.c:446) > by 0x6D1361: ExecProcNode (executor.h:239) > > So the extra_data allocated in gin_extract_jsonpath_query() get to > ginFillScanEntry() uninitialised. Yes, only the first element of extra_data[] was initialized and used later in gin_consistent_jsonb(), but ginFillScanEntry() wants all of them to be initialized. > Both seem like a valid issues, I think. > > regards -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 02.03.2018 00:57, Alexander Korotkov wrote:
On Fri, Mar 2, 2018 at 12:40 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:On 28.02.2018 06:55, Robert Haas wrote:On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov
<n.gluhov@postgrespro.ru> wrote:Attached 10th version of the jsonpath patches.I think we really need to rename PG_TRY and PG_CATCH or rethink this
1. Fixed error handling in arithmetic operators.
Now run-time errors in arithmetic operators are catched (added
PG_TRY/PG_CATCH around operator's functions calls) and converted into
Unknown values in predicates as it is required by the standard:
whole interface so that people stop thinking they can use it to
prevent errors from being thrown.
I understand that it is unsafe to call arbitrary function inside PG_TRY without
rethrowing of caught errors in PG_CATCH, but in jsonpath only the following
numeric and datetime functions with known behavior are called inside PG_TRY
and only errors of category ERRCODE_DATA_EXCEPTION are caught:
numeric_add()
numeric_mul()
numeric_div()
numeric_mod()
numeric_float8()
float8in()
float8_numeric()
to_datetime()That seems like a quite limited list of functions. What about reworking themproviding a way of calling them without risk of exception? For example, we canhave numeric_add_internal() function which fill given data structure witherror information instead of throwing the error. numeric_add() would be awrapper over numeric_add_internal(), which throws an error if correspondingdata structure is filled. In jsonpath we can call numeric_add_internal() andinterpret errors in another way. That seems to be better than use of PG_TRYand PG_CATCH.
Attached 12th version of jsonpath patches. I added the 7th patch where the following functions were extracted for safe error handling in jsonpath: numeric_add_internal() numeric_sub_internal() numeric_mul_internal() numeric_div_internal() numeric_mod_internal() float8_numeric_internal() numeric_float8_internal() float8in_internal() Errors are passed to caller with new ereport_safe() macro when ErrorData **edata is not NULL: +#define ereport_safe(edata, elevel, rest) \ + do { \ + if (edata) { \ + errstart(elevel, __FILE__, __LINE__, PG_FUNCNAME_MACRO, TEXTDOMAIN); \ + (rest); \ + *(edata) = CopyErrorData(); \ + FlushErrorState(); \ + } else { \ + ereport(elevel, rest); \ + } \ + } while (0) But to_datetime() is still called in jsonpath inside PG_TRY/PG_CATCH block because it needs too deep error propagation. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Attached 13th version of the jsonpath patches. Syntax of .** accessor (our extension to standard) was changed to become more similar to the syntax of the standard array accessor: .**{2, 5} => .**{2 to 5} .**{3,} => .**{3 to last} -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Fri, Mar 2, 2018 at 8:27 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Fri, Mar 2, 2018 at 12:40 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> > wrote: >> >> On 28.02.2018 06:55, Robert Haas wrote: >> >>> On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov >>> <n.gluhov@postgrespro.ru> wrote: >>>> >>>> Attached 10th version of the jsonpath patches. >>>> >>>> 1. Fixed error handling in arithmetic operators. >>>> >>>> Now run-time errors in arithmetic operators are catched (added >>>> PG_TRY/PG_CATCH around operator's functions calls) and converted >>>> into >>>> Unknown values in predicates as it is required by the standard: >>> >>> I think we really need to rename PG_TRY and PG_CATCH or rethink this >>> whole interface so that people stop thinking they can use it to >>> prevent errors from being thrown. >> >> >> I understand that it is unsafe to call arbitrary function inside PG_TRY >> without >> rethrowing of caught errors in PG_CATCH, but in jsonpath only the >> following >> numeric and datetime functions with known behavior are called inside >> PG_TRY >> and only errors of category ERRCODE_DATA_EXCEPTION are caught: >> >> numeric_add() >> numeric_mul() >> numeric_div() >> numeric_mod() >> numeric_float8() >> float8in() >> float8_numeric() >> to_datetime() > > > That seems like a quite limited list of functions. What about reworking > them > providing a way of calling them without risk of exception? For example, we > can > have numeric_add_internal() function which fill given data structure with > error information instead of throwing the error. numeric_add() would be a > wrapper over numeric_add_internal(), which throws an error if corresponding > data structure is filled. In jsonpath we can call numeric_add_internal() > and > interpret errors in another way. That seems to be better than use of PG_TRY > and PG_CATCH. > I haven't seen a response to this email. Do we need one before proceeding any further with jsonpath? cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 20, 2018 at 3:36 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On Fri, Mar 2, 2018 at 8:27 AM, Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: >> On Fri, Mar 2, 2018 at 12:40 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> >> wrote: >>> >>> On 28.02.2018 06:55, Robert Haas wrote: >>> >>>> On Mon, Feb 26, 2018 at 10:34 AM, Nikita Glukhov >>>> <n.gluhov@postgrespro.ru> wrote: >>>>> >>>>> Attached 10th version of the jsonpath patches. >>>>> >>>>> 1. Fixed error handling in arithmetic operators. >>>>> >>>>> Now run-time errors in arithmetic operators are catched (added >>>>> PG_TRY/PG_CATCH around operator's functions calls) and converted >>>>> into >>>>> Unknown values in predicates as it is required by the standard: >>>> >>>> I think we really need to rename PG_TRY and PG_CATCH or rethink this >>>> whole interface so that people stop thinking they can use it to >>>> prevent errors from being thrown. >>> >>> >>> I understand that it is unsafe to call arbitrary function inside PG_TRY >>> without >>> rethrowing of caught errors in PG_CATCH, but in jsonpath only the >>> following >>> numeric and datetime functions with known behavior are called inside >>> PG_TRY >>> and only errors of category ERRCODE_DATA_EXCEPTION are caught: >>> >>> numeric_add() >>> numeric_mul() >>> numeric_div() >>> numeric_mod() >>> numeric_float8() >>> float8in() >>> float8_numeric() >>> to_datetime() >> >> >> That seems like a quite limited list of functions. What about reworking >> them >> providing a way of calling them without risk of exception? For example, we >> can >> have numeric_add_internal() function which fill given data structure with >> error information instead of throwing the error. numeric_add() would be a >> wrapper over numeric_add_internal(), which throws an error if corresponding >> data structure is filled. In jsonpath we can call numeric_add_internal() >> and >> interpret errors in another way. That seems to be better than use of PG_TRY >> and PG_CATCH. >> > > > I haven't seen a response to this email. Do we need one before > proceeding any further with jsonpath? > Apologies. I see the reply now. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> That seems like a quite limited list of functions. What about >> reworking them providing a way of calling them without risk of >> exception? > I haven't seen a response to this email. Do we need one before > proceeding any further with jsonpath? I've not been following this thread in detail, but IMO any code anywhere that thinks that no error can be thrown out of non-straight-line code is broken beyond redemption. What, for example, happens if we get ENOMEM within one of the elog.c functions? I did look through 0007-jsonpath-arithmetic-error-handling-v12.patch, and I can't believe that's seriously proposed for commit. It's making some pretty fragile changes in error handling, and so far as I can find there is not even one line of commentary as to what the new design rules are supposed to be. Even if it's completely bug-free today (which I would bet against), how could we keep it so? regards, tom lane
Attached 14th version of the patches: * refactored predicates, introduced 3-valued JsonPathBool type instead of JsonPathExecResult * refactored JsonPathExecResult: now it is typedefed to ErrorData * * fixed recursive wildcard accessor (.**) in strict mode: structural errors after .** are ignored now On 20.03.2018 08:36, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >>> That seems like a quite limited list of functions. What about >>> reworking them providing a way of calling them without risk of >>> exception? >> I haven't seen a response to this email. Do we need one before >> proceeding any further with jsonpath? > I've not been following this thread in detail, but IMO any code anywhere > that thinks that no error can be thrown out of non-straight-line code is > broken beyond redemption. What, for example, happens if we get ENOMEM > within one of the elog.c functions? Out of memory are intentionally is not caught here, ereport_safe() is primarily intended only for handling of errors of ERRCODE_DATA_EXCEPTION category (see explanation below). > I did look through 0007-jsonpath-arithmetic-error-handling-v12.patch, > and I can't believe that's seriously proposed for commit. It's making > some pretty fragile changes in error handling, and so far as I can > find there is not even one line of commentary as to what the new > design rules are supposed to be. Even if it's completely bug-free > today (which I would bet against), how could we keep it so? > > regards, tom lane I basically agree, this experimental patch is optional (see below). I think I need to clarify the error handling in SQL/JSON one more time here. SQL/JSON standard requires us to handle errors being thrown from the JSON path engine and from SQL/JSON item => SQL type coercions, and then perform the specified ON ERROR behavior. Standard does not limit handled error categories, but I think we are interesting here only in errors of category "data exception" (ERRCODE_DATA_EXCEPTION) in which all possible SQL/JSON, arithmetic, datetime errors fall. It might be interesting that such error handling has appeared in standard only with SQL/JSON introduction. Example of arithmetic error handling in JSON_VALUE: =# SELECT JSON_VALUE('0', '1 / $' RETURNING int); -- NULL ON ERROR by default json_value ------------ (1 row) =# SELECT JSON_VALUE('0', '1 / $' RETURNING int ERROR ON ERROR); ERROR: division by zero =# SELECT JSON_VALUE('0', '1 / $' RETURNING int DEFAULT -1 ON ERROR); json_value ------------ -1 (1 row) Error handling in SQL/JSON functions is implemented with PG_TRY/PG_CATCH block with subtransaction (see ExecEvalJsonExpr() in SQL/JSON patch). We had to use subtransactions here because an arbitrary user-defined typecast function can be called. One could think that could only support ERROR ON ERROR behavior and have no problems with PG_TRY/PG_CATCH, but errors should be handled inside jsonpath predicates (boolean expression used in filters) regardless of ON ERROR behavior. Any error in predicate operands is converted directly to special Unknown value (analogue of SQL NULL). In the following examples arithmetic errors and errors in conversion of string into double are handled by comparison predicate itself, not by JSON_QUERY: =# SELECT JSON_QUERY('[1.23, "4.56", 0, "foo"]', '$[*] ? (1 / @.double() > 0)' WITH WRAPPER); json_query ---------------- [1.23, "4.56"] (1 row) =# SELECT JSON_QUERY('[1.23, "4.56", 0, "foo"]', '$[*] ? ((1 / @.double() > 0) is unknown)' WITH WRAPPER); json_query ------------ [0, "foo"] (1 row) Implementation of handling of arithmetic and datetime errors in patch #4 uses PG_TRY/PG_CATCH without surrounding substransactions, because only the limited set of built-in functions with known behavior is called inside this block: numeric_add() numeric_mul() numeric_div() numeric_mod() numeric_float8() float8in() float8_numeric() to_datetime() The 7th optional patch is attempt to completely get rid of PG_TRY/PG_CATCH in jsonpath by the idea of Alexander Korotkov: additional parameter "ErrorData **edata" is passed to internal variants of those functions for saving error info into it instead of throwing it. But I agree that this solution is far from ideal and the assertion that some functions should not throw exceptions of some category can easily be broken. Moreover, I have done it only for numeric functions, because to_datetime() function needs too deep propagation of ErrorData passing. If your see other variants of implementation of error handling in jsonpath, please tell us about them. Now we can offer only two variants described above, but we understand that some future work is necessary to support such standard features in PostgreSQL without having problems with PG_TRY/PG_CATCH. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 3/27/18 11:13 AM, Nikita Glukhov wrote: > > Attached 14th version of the patches: It appears this entry should be marked Needs Review, so I have done that and moved it to the next CF. Regards, -- -David david@pgmasters.net
Attached 15th version of the patches. There are changes only in jsonpath lexer: * added single-quoted strings * added escape sequences \v, \', \xXX, \u{...} It's all required by ECMAScript standard (ECMA-262), from which SQL/JSON standard adopts lexical rules. Also, the documentation has been moved into a separate patch (see https://www.postgresql.org/message-id/732208d3-56c3-25a4-8f08-3be1d54ad51b%40postgrespro.ru). -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Thu, Jun 28, 2018 at 11:38 AM, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached 15th version of the patches. Hi Nikita, I wonder why the Windows build scripts are not finding and processing your new .y and .l files: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.3672 Perhaps Mkvcbuild.pm needs to be told about them? For example, it has this: $postgres->AddFiles('src/backend/parser', 'scan.l', 'gram.y'); PS If you need a way to test on Windows but don't have a local Windows build system, this may be useful: https://wiki.postgresql.org/wiki/Continuous_Integration -- Thomas Munro http://www.enterprisedb.com
On 28.06.2018 05:39, Thomas Munro wrote: > On Thu, Jun 28, 2018 at 11:38 AM, Nikita Glukhov > <n.gluhov@postgrespro.ru> wrote: >> Attached 15th version of the patches. > Hi Nikita, > > I wonder why the Windows build scripts are not finding and processing > your new .y and .l files: > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.3672 > > Perhaps Mkvcbuild.pm needs to be told about them? For example, it has this: > > $postgres->AddFiles('src/backend/parser', 'scan.l', 'gram.y'); Yes, generation of jsonpath_gram.* was missing in Mkvcbuild.pm. I have fixed it in the new 16th version of patches, and now it seems to work on Windows (https://ci.appveyor.com/project/NikitaGlukhov/postgres/build/1.0.4) > PS If you need a way to test on Windows but don't have a local Windows > build system, this may be useful: > > https://wiki.postgresql.org/wiki/Continuous_Integration Thank, this is really helpful. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Attached 17th version of the patches rebased onto the current master. Nothing significant has changed. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
> On 23 Aug 2018, at 00:29, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > Attached 17th version of the patches rebased onto the current master. > > Nothing significant has changed. Hey. I’ve looked through presented implementation of jsonpath and have some remarks: 1. Current patch set adds functions named jsonpath_* that implementing subset of functionality of corresponding json_* functions which doesn’t require changes to postgres SQL grammar. If we are going to commit json_* functions it will be quite strange for end user: two sets of functions with mostly identical behaviour, except that one of them allows customise error handling and pass variable via special language constructions. I suggest only leave operators in this patch, but remove functions. 2. Handling of Unknown value differs from one that described in tech report: filter '(@.k == 1)’ doesn’t evaluates to Unknown. Consider following example: select '42'::json @* '$ ? ( (@.k == 1) is unknown )'; As per my understanding of standard this should show 42. Seems that it was evaluated to False instead, because boolean operators (! || &&) on filter expression with structural error behave like it is False. 3. .keyvalue() returns object with field named ‘key’ instead of ’name’ as per tech report. ‘key’ field seems to be more consistent with function name, but i’m not sure it is worths of mismatch with standard. Also ‘id’ field is omitted, making it harder to use something like GROUP BY afterwards. 4. Looks like jsonpath executor lacks some CHECK_FOR_INTERRUPTS() during hot paths. Backend with following query is unresponsive to signals: select count(*) from ( select '[0,1,2,3,4,5,6,7,8,9]'::json @* longpath::jsonpath from ( select '$[' || string_agg(subscripts, ',') ||']' as longpath from ( select 'last,1' as subscripts from generate_series(1,1000000) ) subscripts_q ) jpath_q ) count_q; 5. Files generated by lex/bison should be listed in .gitignore files in corresponding directories. 6. My compiler complains about unused functions: JsonValueListConcat, JsonValueListClear. 7. Presented patch files structure is somewhat complicated with patches to patches. I've melded them down to following patches: 0001: three first patches with preliminary datetime infrastructure 0002: Jsonpath engine and operators that is your previous 4+6+7 0003: Jsonpath extensions is your previous 8+9 0004: GIN support is your 5th path Also this patches were formed with 'git format-patch', so one can apply all of them with 'git apply' restoring each one as commit. 8. Also sending jsonpath_check.sql with queries which I used to check compliance with the standard. That can be added to regression test, if you want. -- Stas Kelvich Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
Attached 18th version of the patches rebased onto the current master. On 04.09.2018 17:15, Stas Kelvich wrote:
On 23 Aug 2018, at 00:29, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: Attached 17th version of the patches rebased onto the current master. Nothing significant has changed.Hey. I’ve looked through presented implementation of jsonpath and have some remarks:
Thank you for your review.
1. Current patch set adds functions named jsonpath_* that implementing subset of functionality of corresponding json_* functions which doesn’t require changes to postgres SQL grammar. If we are going to commit json_* functions it will be quite strange for end user: two sets of functions with mostly identical behaviour, except that one of them allows customise error handling and pass variable via special language constructions. I suggest only leave operators in this patch, but remove functions.
We can't simply remove these functions because they are the implementation of operators.
2. Handling of Unknown value differs from one that described in tech report: filter '(@.k == 1)’ doesn’t evaluates to Unknown. Consider following example: select '42'::json @* '$ ? ( (@.k == 1) is unknown )'; As per my understanding of standard this should show 42. Seems that it was evaluated to False instead, because boolean operators (! || &&) on filter expression with structural error behave like it is False.
Yes, this example should return 42, but only in the strict mode. In the lax mode, which is default, structural errors are ignored, so '@.k' evaluates to an empty sequence, and '@.k == 1' then returns False.
3. .keyvalue() returns object with field named ‘key’ instead of ’name’ as per tech report. ‘key’ field seems to be more consistent with function name, but i’m not sure it is worths of mismatch with standard. Also ‘id’ field is omitted, making it harder to use something like GROUP BY afterwards.
SQL/JSON standard uses "key" as a field name (9.39, page 716):vi) For all h, 1 <= h <= q, let OBJh be an SQL/JSON object with three members: 1) The first member has key "key" and bound value Ki . 2) The second member has key "value”"and bound value BVi. 3) The third member has key "id" and bound value IDj.
But "id" field was really missing in our implementation. I have added it using byte offset of the object in jsonb as its identifier.
4. Looks like jsonpath executor lacks some CHECK_FOR_INTERRUPTS() during hot paths. Backend with following query is unresponsive to signals: select count(*) from ( select '[0,1,2,3,4,5,6,7,8,9]'::json @* longpath::jsonpath from ( select '$[' || string_agg(subscripts, ',') ||']' as longpath from ( select 'last,1' as subscripts from generate_series(1,1000000) ) subscripts_q ) jpath_q ) count_q;
Fixed: added CHECK_FOR_INTERRUPTS() to jsonpath parser and executor.
5. Files generated by lex/bison should be listed in .gitignore files in corresponding directories.
Fixed.
6. My compiler complains about unused functions: JsonValueListConcat, JsonValueListClear.
Fixed: these functions used only in the next SQL/JSON patches were removed.
7. Presented patch files structure is somewhat complicated with patches to patches. I've melded them down to following patches: 0001: three first patches with preliminary datetime infrastructure 0002: Jsonpath engine and operators that is your previous 4+6+7 0003: Jsonpath extensions is your previous 8+9 0004: GIN support is your 5th path Also this patches were formed with 'git format-patch', so one can apply all of them with 'git apply' restoring each one as commit.
New patch set is formed with 'git format-patch', but I can still supply patches in the previous split form.
8. Also sending jsonpath_check.sql with queries which I used to check compliance with the standard. That can be added to regression test, if you want.
I think these additional tests can be added, but I have not yet done it in this version of the patches. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote: > Attached 18th version of the patches rebased onto the current master. Nikita, this version fails to apply, as 0004 has conflicts with some regression tests. Could you rebase? I am moving the patch to CF 2018-11, waiting for your input. -- Michael
Attachment
Hi, On 10/02/2018 04:33 AM, Michael Paquier wrote: > On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote: >> Attached 18th version of the patches rebased onto the current master. > > Nikita, this version fails to apply, as 0004 has conflicts with some > regression tests. Could you rebase? I am moving the patch to CF > 2018-11, waiting for your input. > -- > Michael > As Michael mentioned, the patch does not apply anymore, so it would be good to provide a rebased version for CF 2018-11. I've managed to do that, as the issues are due to minor bitrot, so that I can do some quick review of the current version. I haven't done much testing, pretty much just compiling, running the usual regression tests and reading through the patches. I plan to do more testing once the rebased patch is submitted. Now, a couple of comments based on eye-balling the diffs. 1) There are no docs, which makes it pretty much non-committable for now. I know there is [1] and it was a good intro for the review, but we need something as a part of our sgml docs. 2) 0001 says this: *typmod = -1; /* TODO implement FF1, ..., FF9 */ but I have no idea what FF1 or FF9 are. I guess it needs to be implemented, or explained better. 3) The makefile rule for scan.o does this: +# Latest flex causes warnings in this file. +ifeq ($(GCC),yes) +scan.o: CFLAGS += -Wno-error +endif That seems a bit ugly, and we should probably try to make it work with the latest flex, instead of hiding the warnings. I don't think we have any such ad-hoc rules in other Makefiles. If we really need it, can't we make it part of configure, and/or restrict it depending on flex version? 4) There probably should be .gitignore rule for jsonpath_gram.h, just like for other generated header files. 5) jbvType says jbvDatetime is "virtual type" but does not explain what it is. IMHO that deserves a comment or something. 6) I see the JsonPath definition says this about header: /* just version, other bits are reservedfor future use */ but the very next thing it does is defining two pieces stored in the header - version AND "lax" mode flag. Which makes the comment invalid (also, note the missing space after "reserved"). FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the codebase works with "strict" flags passed around, and it's easy to forget to negate the flag somewhere (at least that's my experience). 7) I see src/include/utils/jsonpath_json.h adds support for plain json by undefining various jsonb macros and redirecting them to the json variants. I find that rather suspicious - imagine you're investigating something in code using those jsonb macros, and wondering why it ends up calling the json stuff. I'd be pretty confused ... Also, some of the redefinitions are not really needed for example JsonbWrapItemInArray and JsonbWrapItemsInArray are not used anywhere (and neither are the json variants). 8) I see 0002 defines IsAJsonbScalar like this: #define IsAJsonbScalar(jsonbval) (((jsonbval)->type >= jbvNull && \ (jsonbval)->type <= jbvBool) || \ (jsonbval)->type == jbvDatetime) while 0004 does this #define jspIsScalar(type) ((type) >= jpiNull && (type) <= jpiBool) I know those are for different data types (jsonb vs. jsonpath), but I suppose jspIsScalar should include the datetime too. FWIW JsonPathItemType would deserve better documentation what the various items mean (a comment for each line would be enough). I've been wondering if "jpiDouble" means scalar double value until I realized there's a ".double()" function for paths. 9) It's generally a good idea to make the individual pieces committable separately, but that means e.g. the regression tests have to pass after each patch. At the moment that does not seem to be the case for 0002, see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, not sure if that's related. regards [1] https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, Oct 29, 2018 at 2:20 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > Hi, > > On 10/02/2018 04:33 AM, Michael Paquier wrote: > > On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote: > >> Attached 18th version of the patches rebased onto the current master. > > > > Nikita, this version fails to apply, as 0004 has conflicts with some > > regression tests. Could you rebase? I am moving the patch to CF > > 2018-11, waiting for your input. > > -- > > Michael > > > > As Michael mentioned, the patch does not apply anymore, so it would be > good to provide a rebased version for CF 2018-11. I've managed to do > that, as the issues are due to minor bitrot, so that I can do some quick > review of the current version. > > I haven't done much testing, pretty much just compiling, running the > usual regression tests and reading through the patches. I plan to do > more testing once the rebased patch is submitted. > > Now, a couple of comments based on eye-balling the diffs. > > > 1) There are no docs, which makes it pretty much non-committable for > now. I know there is [1] and it was a good intro for the review, but we > need something as a part of our sgml docs. SQL/JSON documentation discussed in separate topic https://www.postgresql.org/message-id/flat/732208d3-56c3-25a4-8f08-3be1d54ad51b%40postgrespro.ru > > > 2) 0001 says this: > > *typmod = -1; /* TODO implement FF1, ..., FF9 */ > > but I have no idea what FF1 or FF9 are. I guess it needs to be > implemented, or explained better. > > > 3) The makefile rule for scan.o does this: > > +# Latest flex causes warnings in this file. > +ifeq ($(GCC),yes) > +scan.o: CFLAGS += -Wno-error > +endif > > That seems a bit ugly, and we should probably try to make it work with > the latest flex, instead of hiding the warnings. I don't think we have > any such ad-hoc rules in other Makefiles. If we really need it, can't we > make it part of configure, and/or restrict it depending on flex version? > > > 4) There probably should be .gitignore rule for jsonpath_gram.h, just > like for other generated header files. > > > 5) jbvType says jbvDatetime is "virtual type" but does not explain what > it is. IMHO that deserves a comment or something. > > > 6) I see the JsonPath definition says this about header: > > /* just version, other bits are reservedfor future use */ > > but the very next thing it does is defining two pieces stored in the > header - version AND "lax" mode flag. Which makes the comment invalid > (also, note the missing space after "reserved"). > > FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the > codebase works with "strict" flags passed around, and it's easy to > forget to negate the flag somewhere (at least that's my experience). > > > 7) I see src/include/utils/jsonpath_json.h adds support for plain json > by undefining various jsonb macros and redirecting them to the json > variants. I find that rather suspicious - imagine you're investigating > something in code using those jsonb macros, and wondering why it ends up > calling the json stuff. I'd be pretty confused ... > > Also, some of the redefinitions are not really needed for example > JsonbWrapItemInArray and JsonbWrapItemsInArray are not used anywhere > (and neither are the json variants). > > > 8) I see 0002 defines IsAJsonbScalar like this: > > #define IsAJsonbScalar(jsonbval) (((jsonbval)->type >= jbvNull && \ > (jsonbval)->type <= jbvBool) || \ > (jsonbval)->type == jbvDatetime) > > while 0004 does this > > #define jspIsScalar(type) ((type) >= jpiNull && (type) <= jpiBool) > > I know those are for different data types (jsonb vs. jsonpath), but I > suppose jspIsScalar should include the datetime too. > > FWIW JsonPathItemType would deserve better documentation what the > various items mean (a comment for each line would be enough). I've been > wondering if "jpiDouble" means scalar double value until I realized > there's a ".double()" function for paths. > > > 9) It's generally a good idea to make the individual pieces committable > separately, but that means e.g. the regression tests have to pass after > each patch. At the moment that does not seem to be the case for 0002, > see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, > not sure if that's related. > > regards > > > [1] https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 29.10.2018 2:20, Tomas Vondra wrote:
On 10/02/2018 04:33 AM, Michael Paquier wrote:On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote:Attached 18th version of the patches rebased onto the current master.Nikita, this version fails to apply, as 0004 has conflicts with some regression tests. Could you rebase? I am moving the patch to CF 2018-11, waiting for your input. -- MichaelAs Michael mentioned, the patch does not apply anymore, so it would be good to provide a rebased version for CF 2018-11. I've managed to do that, as the issues are due to minor bitrot, so that I can do some quick review of the current version.
Sorry that I failed to provide rebased version earlier. Attached 19th version of the patches rebased onto the current master.
I haven't done much testing, pretty much just compiling, running the usual regression tests and reading through the patches. I plan to do more testing once the rebased patch is submitted. Now, a couple of comments based on eye-balling the diffs.
Thank you for your review.
1) There are no docs, which makes it pretty much non-committable for now. I know there is [1] and it was a good intro for the review, but we need something as a part of our sgml docs.
I think that jsonpath part of documentation can be extracted from [2] and added to the patch set.
2) 0001 says this: *typmod = -1; /* TODO implement FF1, ..., FF9 */ but I have no idea what FF1 or FF9 are. I guess it needs to be implemented, or explained better.
FF1-FF9 are standard datetime template fields used for specifying of fractional seconds. FF3/FF6 are analogues of our MS/US. I decided simply to implement this feature (see patch 0001, I also can supply it in the separate patch). But FF7-FF9 are questionable since the maximal supported precision is only 6. They are optional by the standard: 95) Specifications for Feature F555, “Enhanced seconds precision”: d) Subclause 9.44, “Datetime templates”: i) Without Feature F555, “Enhanced seconds precision”, a <datetime template fraction> shall not be FF7, FF8 or FF9. So I decided to allow FF7-FF9 only on the output in to_char().
3) The makefile rule for scan.o does this: +# Latest flex causes warnings in this file. +ifeq ($(GCC),yes) +scan.o: CFLAGS += -Wno-error +endif That seems a bit ugly, and we should probably try to make it work with the latest flex, instead of hiding the warnings. I don't think we have any such ad-hoc rules in other Makefiles. If we really need it, can't we make it part of configure, and/or restrict it depending on flex version?
These lines seem to belong to the earliest versions of our jsonpath implementation. There is no scan.o file now at all, there is only jsonpath_scan.o, so I simply removed these lines.
4) There probably should be .gitignore rule for jsonpath_gram.h, just like for other generated header files.
I see 3 rules in /src/backend/utils/adt/.gitignore:
/jsonpath_gram.h /jsonpath_gram.c /jsonpath_scan.c
5) jbvType says jbvDatetime is "virtual type" but does not explain what it is. IMHO that deserves a comment or something.
"Virtual type" means here that it is used only for in-memory processing and converted into JSON string when outputted to jsonb. Corresponding comment was added.
6) I see the JsonPath definition says this about header: /* just version, other bits are reservedfor future use */ but the very next thing it does is defining two pieces stored in the header - version AND "lax" mode flag. Which makes the comment invalid (also, note the missing space after "reserved").
Fixed.
FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the codebase works with "strict" flags passed around, and it's easy to forget to negate the flag somewhere (at least that's my experience).
Jsonpath lax/strict mode flag is used only in executeJsonPath() where it is saved in "laxMode" field. New "strict" flag passed to datetime functions is unrelated to jsonpath.
7) I see src/include/utils/jsonpath_json.h adds support for plain json by undefining various jsonb macros and redirecting them to the json variants. I find that rather suspicious - imagine you're investigating something in code using those jsonb macros, and wondering why it ends up calling the json stuff. I'd be pretty confused ...
I agree, this is rather simple but doubtful solution. That's why json support was in a separate patch until the 18th version of the patches. But if we do not want to compile jsonpath.c twice with different definitions, then we need some kind of run-time wrapping over json strings and jsonb containers, which seems a bit harder to implement. To simplify debugging I can also suggest to explicitly preprocess jsonpath.c into jsonpath_json.c using redefinitions from jsonpath_json.h before its compilation.
Also, some of the redefinitions are not really needed for example JsonbWrapItemInArray and JsonbWrapItemsInArray are not used anywhere (and neither are the json variants).
These definitions will be used in the next patches, so I removed them for now from this patch.
8) I see 0002 defines IsAJsonbScalar like this: #define IsAJsonbScalar(jsonbval) (((jsonbval)->type >= jbvNull && \ (jsonbval)->type <= jbvBool) || \ (jsonbval)->type == jbvDatetime) while 0004 does this #define jspIsScalar(type) ((type) >= jpiNull && (type) <= jpiBool) I know those are for different data types (jsonb vs. jsonpath), but I suppose jspIsScalar should include the datetime too.
jpiDatetime does not mean the same thing as jbvDatetime. jpiDatetime is a representation of ".datetime()" item method. jbvDatetime is a representation of datetime SQL/JSON items. Also datetime SQL/JSON items can be represented in JSON path only by strings with ".datetime()" method applied, they do not have their own literal: '"2018-01-10".datetime()'
FWIW JsonPathItemType would deserve better documentation what the various items mean (a comment for each line would be enough). I've been wondering if "jpiDouble" means scalar double value until I realized there's a ".double()" function for paths.
I added per-item comments.
9) It's generally a good idea to make the individual pieces committable separately, but that means e.g. the regression tests have to pass after each patch. At the moment that does not seem to be the case for 0002, see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, not sure if that's related.
This should definitely be a bug in json support, but I can't reproduce it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY. Could you provide a stack trace at least? [1] https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md [2] https://github.com/postgrespro/sqljson/tree/sqljson_doc -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 11/6/18 3:31 PM, Nikita Glukhov wrote: > On 29.10.2018 2:20, Tomas Vondra wrote:> > > ...> > Thank you for your review. > >> 1) There are no docs, which makes it pretty much non-committable for >> now. I know there is [1] and it was a good intro for the review, but we >> need something as a part of our sgml docs. > > I think that jsonpath part of documentation can be extracted from [2] and > added to the patch set. > Yes, please let's do that. The patch could not get into RFC without docs, so let's deal with it now. >> 2) 0001 says this: >> >> *typmod = -1; /* TODO implement FF1, ..., FF9 */ >> >> but I have no idea what FF1 or FF9 are. I guess it needs to be >> implemented, or explained better. > > FF1-FF9 are standard datetime template fields used for specifying of fractional > seconds. FF3/FF6 are analogues of our MS/US. I decided simply to implement > this feature (see patch 0001, I also can supply it in the separate patch). > Understood. Thanks for the explanation. > But FF7-FF9 are questionable since the maximal supported precision is only 6. > They are optional by the standard: > > 95) Specifications for Feature F555, “Enhanced seconds precision”: > d) Subclause 9.44, “Datetime templates”: > i) Without Feature F555, “Enhanced seconds precision”, > a <datetime template fraction> shall not be FF7, FF8 or FF9. > > So I decided to allow FF7-FF9 only on the output in to_char(). > Hmmmm, isn't that against the standard then? I mean, if our precision is only 6, that probably means we don't have the F555 feature, which means FF7-9 should not be available. It also seems like a bit surprising behavior that we only allow those fields in to_char() and not in other places. >> 4) There probably should be .gitignore rule for jsonpath_gram.h, just >> like for other generated header files. > > I see 3 rules in /src/backend/utils/adt/.gitignore: > > /jsonpath_gram.h > /jsonpath_gram.c > /jsonpath_scan.c > But there's a symlink in src/include/utils/jsonpath_gram.h and that's not in .gitignore. >> FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the >> codebase works with "strict" flags passed around, and it's easy to >> forget to negate the flag somewhere (at least that's my experience). > > Jsonpath lax/strict mode flag is used only in executeJsonPath() where it is > saved in "laxMode" field. New "strict" flag passed to datetime functions > is unrelated to jsonpath. > OK. >> 7) I see src/include/utils/jsonpath_json.h adds support for plain json >> by undefining various jsonb macros and redirecting them to the json >> variants. I find that rather suspicious - imagine you're investigating >> something in code using those jsonb macros, and wondering why it ends up >> calling the json stuff. I'd be pretty confused ... > > I agree, this is rather simple but doubtful solution. That's why json support > was in a separate patch until the 18th version of the patches. > > But if we do not want to compile jsonpath.c twice with different definitions, > then we need some kind of run-time wrapping over json strings and jsonb > containers, which seems a bit harder to implement. > > To simplify debugging I can also suggest to explicitly preprocess jsonpath.c > into jsonpath_json.c using redefinitions from jsonpath_json.h before its > compilation. > Not sure what's the right solution. But I agree the current approach probably is not it. >> >> 9) It's generally a good idea to make the individual pieces committable >> separately, but that means e.g. the regression tests have to pass after >> each patch. At the moment that does not seem to be the case for 0002, >> see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, >> not sure if that's related. > > This should definitely be a bug in json support, but I can't reproduce > it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY. Could you provide > a stack trace at least? > I'll try. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/6/18 4:48 PM, Tomas Vondra wrote: > On 11/6/18 3:31 PM, Nikita Glukhov wrote: >> On 29.10.2018 2:20, Tomas Vondra wrote:> >> >> ... >>> >>> 9) It's generally a good idea to make the individual pieces committable >>> separately, but that means e.g. the regression tests have to pass after >>> each patch. At the moment that does not seem to be the case for 0002, >>> see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, >>> not sure if that's related. >> >> This should definitely be a bug in json support, but I can't reproduce >> it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY. Could you provide >> a stack trace at least? >> > I'll try. > Not sure why you can't reproduce the failures, it's perfectly reproducible for me. For the record, I'm doing this: ./configure --prefix=/home/user/pg-jsonpath --enable-debug --enable-cassert CFLAGS="-O0 -DRANDOMIZE_ALLOCATED_MEMORY" && make -s clean && make -s -j4 && make check After sticking Assert(false) to JsonEncodeJsonbValue (to the default case), I get a failure like this: select json '{}' @* 'lax $[0]'; ! WARNING: unknown jsonb value type: 20938064 ! server closed the connection unexpectedly ! This probably means the server terminated abnormally ! before or while processing the request. ! connection to server was lost The backtrace is attached. My guess is JsonValueListGetList in jsonb_jsonpath_query only does shallow copy instead of copying the pieces into funcctx->multi_call_memory_ctx, so it gets corrupted on subsequent calls. I also attach valgrind report, but I suppose the reported issues are a consequence of the same bug. regard -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
BTW is the v19 really just a rebase of the preceding version? I'm asking because v18 was adding two types into pg_type.dat, namely jsonpath (6050) and _jsonpath (6051), while v19 only adds jsonpath (6050). I've noticed because v18 was missing a comma between the two entries, which I'd say is a bug - interestingly enough, both types were created correctly, so it seems to be resilient to such typos. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > BTW is the v19 really just a rebase of the preceding version? > I'm asking because v18 was adding two types into pg_type.dat, namely > jsonpath (6050) and _jsonpath (6051), while v19 only adds jsonpath > (6050). I haven't looked at this patch, but manual generation of array type entries is obsolete since 3dc820c43e427371b66d217f2bd5481fc9ef2e2d. There should still be a mention of the OID to use for the array type though ... regards, tom lane
On 11/8/18 6:12 AM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> BTW is the v19 really just a rebase of the preceding version? >> I'm asking because v18 was adding two types into pg_type.dat, namely >> jsonpath (6050) and _jsonpath (6051), while v19 only adds jsonpath >> (6050). > > I haven't looked at this patch, but manual generation of array type entries > is obsolete since 3dc820c43e427371b66d217f2bd5481fc9ef2e2d. There should > still be a mention of the OID to use for the array type though ... > Ah, I missed this improvement somehow. (Yes, the array type OID is still mentioned as part of the type definition.) regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attached 20th version the jsonpath patches. On 08.11.2018 4:52, Tomas Vondra wrote:
On 11/6/18 4:48 PM, Tomas Vondra wrote:On 11/6/18 3:31 PM, Nikita Glukhov wrote:On 29.10.2018 2:20, Tomas Vondra wrote:>9) It's generally a good idea to make the individual pieces committable separately, but that means e.g. the regression tests have to pass after each patch. At the moment that does not seem to be the case for 0002, see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, not sure if that's related.This should definitely be a bug in json support, but I can't reproduce it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY. Could you provide a stack trace at least?I'll try.Not sure why you can't reproduce the failures, it's perfectly reproducible for me. For the record, I'm doing this: ./configure --prefix=/home/user/pg-jsonpath --enable-debug --enable-cassert CFLAGS="-O0 -DRANDOMIZE_ALLOCATED_MEMORY" && make -s clean && make -s -j4 && make check After sticking Assert(false) to JsonEncodeJsonbValue (to the default case), I get a failure like this: select json '{}' @* 'lax $[0]'; ! WARNING: unknown jsonb value type: 20938064 ! server closed the connection unexpectedly ! This probably means the server terminated abnormally ! before or while processing the request. ! connection to server was lost The backtrace is attached. My guess is JsonValueListGetList in jsonb_jsonpath_query only does shallow copy instead of copying the pieces into funcctx->multi_call_memory_ctx, so it gets corrupted on subsequent calls. I also attach valgrind report, but I suppose the reported issues are a consequence of the same bug.
Than you for your help, I finally have managed to reproduce this bug. The problem was really related to copying values but it was not located in the jsonb_jsonpath_query() where the whole value list should already be allocated in funcctx->multi_call_memory_ctx (executeJsonPath() is called in that context and also input json/jsonb is copied into it). Stack-allocated root JsonbValue was not copied to the heap due to wrong copy condition in recursiveExecuteNoUnwrap() (jpiIndexArray case): @@ -1754,1 +1754,1 @@ recursiveExecuteNoUnwrap(JsonPathExecContext *cxt, ... - res = recursiveExecuteNext(cxt, jsp, &elem, v, found, !binary); + res = recursiveExecuteNext(cxt, jsp, &elem, v, found, singleton || !binary); I have refactored a bit this place extracting explicit 'copy' flag. Also I found a similar problem in makePassingVars() where Numerics were not copied into 'multi_call_memory_ctx' from the input json object. I decided to use datumCopy() inside makePassingVars() instead of copying the whole input json in jsonb_jsonpath_query() using PG_GETARG_JSONB_P_COPY(). But I think that processing of the input json with variables should ideally be bone lazily, just don't know if this refactoring is worth it because 3rd argument of json[b]_jsonpath_xxx() functions is used only for testing of variables passing, SQL/JSON functions use the different mechanism.
On 06.11.2018 18:48, Tomas Vondra wrote:
But FF7-FF9 are questionable since the maximal supported precision is only 6. They are optional by the standard:95) Specifications for Feature F555, “Enhanced seconds precision”: d) Subclause 9.44, “Datetime templates”: i) Without Feature F555, “Enhanced seconds precision”, a <datetime template fraction> shall not be FF7, FF8 or FF9.So I decided to allow FF7-FF9 only on the output in to_char().Hmmmm, isn't that against the standard then? I mean, if our precision is only 6, that probably means we don't have the F555 feature, which means FF7-9 should not be available.It also seems like a bit surprising behavior that we only allow those fields in to_char() and not in other places.
Ok, now error is thrown if FF7-FF9 are found in the format string.
On 06.11.2018 18:48, Tomas Vondra wrote:
4) There probably should be .gitignore rule for jsonpath_gram.h, just like for other generated header files.I see 3 rules in /src/backend/utils/adt/.gitignore: /jsonpath_gram.h /jsonpath_gram.c /jsonpath_scan.cBut there's a symlink in src/include/utils/jsonpath_gram.h and that's not in .gitignore.
Added jsonpath_gram.h to /src/include/utils/.gitignore.
-- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi, I've done another round of reviews on v20, assuming the patch is almost ready to commit, but unfortunately I ran into a bunch of issues that need to be resolved. None of this is a huge issue, but it's also above the threshold of what could be tweaked by a committer IMHO. (Which brings the question who plans to commit this. The patch does not have a committer in the CF app, but I see both Teodor and Alexander are listed as it's authors, so I'd expect it to be one of those. Or I might do that, of course.) 0001 ---- 1) to_timestamp() does this: do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false, &tm, &fsec, &fprec, NULL); Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the function did before (well, it called text_to_cstring, but that does VARDATA_ANY). The same thing applies to to_date(), BTW. I also find it a bit inconvenient that we expand the fmt like this in all do_to_timestamp() calls, although it's just to_datetime() that needs to do it this way. I realize we can't change to_datetime() because it's external API, but maybe we should make it construct a varlena and pass it to do_to_timestamp(). 2) We define both DCH_FF# and DCH_ff#, but we never ever use the lower-case version. Heck, it's not mentioned even in DCH_keywords, which does this: ... {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and "day". Also, the comment for "ff1" is wrong, it should be "f" I guess. And of course, there are regression tests for FF# variants, but not the lower-case ones. 3) Of course, these new formatting patterns should be added to SGML docs, for example to "Template Patterns for Date/Time Formatting" in func.sgml (and maybe to other places, I haven't looked for them very thoroughly). 4) The last block in DCH_from_char() does this while (*s == ' ') s++; but I suppose it should use isspace() just like the code immediately before it. 5) I might be missing something, but why is there the "D" at the end of the return flags from DCH_from_char? /* Return flags for DCH_from_char() */ #define DCH_DATED 0x01 #define DCH_TIMED 0x02 #define DCH_ZONED 0x04 0002 ---- 1) There are some unnecessary changes to to_datetime signature (date_txt renamed to vs. datetxt), which is mostly minor but unnecessary churn. 2) There are some extra changes to to_datetime (extra parameters, etc.). I wonder why those are not included in 0001, as part of the supporting datetime infrastructure. 3) I'm not sure whether the _safe functions are a good or bad idea, but at the very least the comments should be updated to explain what it does (as the API has changed, obviously). 4) the json.c changes are under-documented, e.g. there are no comments for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and whether it needs to be specified all the time, half of the functions at the end don't have comments (some of them are really simple, but then other simple functions do have comments). I don't know what the right balance here is (I'm certainly not asking for bogus comments just to have comments) and I agree that the existing code is not exactly abundant in comments. But perhaps having at least some would be nice. The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There are pretty much no comments whatsoever (at least at the function level, explaining what the functions do). It would be good to have a file-level comment too, explaining how jsonpath works, probably. 5) I see uniqueifyJsonbObject now does this: if (!object->val.object.uniquified) return; That seems somewhat strange, considering the function says it'll uniqueify objects, but then exits when noticing the passed object is not uniquified? 6) Why do we make make_result inline? (numeric.c) If this needs to be marked with "inline" then perhaps all the _internal functions should be marked like that too? I have my doubts about the need for this. 7) The other places add _safe to functions that don't throw errors directly and instead update edata. Why are set_var_from_str, div_var, mod_var excluded from this convention? 8) I wonder if the changes in numeric can have negative impact on performance. Has anyone done any performance tests of this part? 0003 ---- 1) json_gin.c should probably add comments briefly explaining what JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and JsonPathExtractionContext are for. 2) I find it a bit suspicious that there are no asserts in json_gin.c (well, there are 3 in the existing code, but nothing in the new code, and the patch essentially doubles the amount of code here). No comments for 0004 at this point. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi! On Sat, Nov 24, 2018 at 11:03 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > I've done another round of reviews on v20, assuming the patch is almost > ready to commit, but unfortunately I ran into a bunch of issues that > need to be resolved. None of this is a huge issue, but it's also above > the threshold of what could be tweaked by a committer IMHO. > > (Which brings the question who plans to commit this. The patch does not > have a committer in the CF app, but I see both Teodor and Alexander are > listed as it's authors, so I'd expect it to be one of those. Or I might > do that, of course.) Thanks a lot for your efforts on this patch! I was planning to work on this patch during this commitfest (and probably commit). But I didn't manage to do this due to family circumstances (I got my baby born and hardly could allocate time to work in November). But I'm still planning to commit this patch. So, I'm going to set myself as committer in commitfest application. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
> On Wed, Nov 28, 2018 at 9:26 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > On Sat, Nov 24, 2018 at 11:03 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > I've done another round of reviews on v20, assuming the patch is almost > > ready to commit, but unfortunately I ran into a bunch of issues that > > need to be resolved. None of this is a huge issue, but it's also above > > the threshold of what could be tweaked by a committer IMHO. > > > > (Which brings the question who plans to commit this. The patch does not > > have a committer in the CF app, but I see both Teodor and Alexander are > > listed as it's authors, so I'd expect it to be one of those. Or I might > > do that, of course.) > > Thanks a lot for your efforts on this patch! I was planning to work > on this patch during this commitfest (and probably commit). But I > didn't manage to do this due to family circumstances (I got my baby > born and hardly could allocate time to work in November). But I'm > still planning to commit this patch. So, I'm going to set myself as > committer in commitfest application. Great news, thank you! Just for the information, cfbot doesn't like the patch on windows: src/backend/utils/adt/jsonpath_exec.c(1539): error C2146: syntax error : missing ')' before identifier 'id' [C:\projects\postgresql\postgres.vcxproj] src/backend/utils/adt/jsonpath_exec.c(1539): error C2081: 'int32_t' : name in formal parameter list illegal [C:\projects\postgresql\postgres.vcxproj] src/backend/utils/adt/jsonpath_exec.c(1539): error C2061: syntax error : identifier 'id' [C:\projects\postgresql\postgres.vcxproj] src/backend/utils/adt/jsonpath_exec.c(1539): error C2059: syntax error : ';' [C:\projects\postgresql\postgres.vcxproj] src/backend/utils/adt/jsonpath_exec.c(1539): error C2059: syntax error : ')' [C:\projects\postgresql\postgres.vcxproj] src/backend/utils/adt/jsonpath_exec.c(1540): error C2449: found '{' at file scope (missing function header?) [C:\projects\postgresql\postgres.vcxproj] src/backend/utils/adt/jsonpath_exec.c(1548): error C2059: syntax error : '}' [C:\projects\postgresql\postgres.vcxproj] c:\projects\postgresql\src\backend\utils\adt\jsonpath_exec.c(1539): error C2146: syntax error : missing ')' before identifier 'id' (src/backend/utils/adt/jsonpath_json.c) [C:\projects\postgresql\postgres.vcxproj] c:\projects\postgresql\src\backend\utils\adt\jsonpath_exec.c(1539): error C2081: 'int32_t' : name in formal parameter list illegal (src/backend/utils/adt/jsonpath_json.c) [C:\projects\postgresql\postgres.vcxproj] c:\projects\postgresql\src\backend\utils\adt\jsonpath_exec.c(1539): error C2061: syntax error : identifier 'id' (src/backend/utils/adt/jsonpath_json.c) [C:\projects\postgresql\postgres.vcxproj] c:\projects\postgresql\src\backend\utils\adt\jsonpath_exec.c(1539): error C2059: syntax error : ';' (src/backend/utils/adt/jsonpath_json.c) [C:\projects\postgresql\postgres.vcxproj] c:\projects\postgresql\src\backend\utils\adt\jsonpath_exec.c(1539): error C2059: syntax error : ')' (src/backend/utils/adt/jsonpath_json.c) [C:\projects\postgresql\postgres.vcxproj] c:\projects\postgresql\src\backend\utils\adt\jsonpath_exec.c(1540): error C2449: found '{' at file scope (missing function header?) (src/backend/utils/adt/jsonpath_json.c) [C:\projects\postgresql\postgres.vcxproj] c:\projects\postgresql\src\backend\utils\adt\jsonpath_exec.c(1548): error C2059: syntax error : '}' (src/backend/utils/adt/jsonpath_json.c) [C:\projects\postgresql\postgres.vcxproj] 10 Warning(s) 14 Error(s)
Attached 21-st version of the patches rebased onto the current master. Added new patch 0004 with documentation written by Liudmila Mantrova, and optional patch 0003 that was separated from 0002.
On 24.11.2018 23:03, Tomas Vondra wrote:
Thank your for your review.Hi, I've done another round of reviews on v20, assuming the patch is almost ready to commit, but unfortunately I ran into a bunch of issues that need to be resolved. None of this is a huge issue, but it's also above the threshold of what could be tweaked by a committer IMHO.
(Which brings the question who plans to commit this. The patch does not have a committer in the CF app, but I see both Teodor and Alexander are listed as it's authors, so I'd expect it to be one of those. Or I might do that, of course.)
0001 ---- 1) to_timestamp() does this: do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false, &tm, &fsec, &fprec, NULL); Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the function did before (well, it called text_to_cstring, but that does VARDATA_ANY). The same thing applies to to_date(), BTW. I also find it a bit inconvenient that we expand the fmt like this in all do_to_timestamp() calls, although it's just to_datetime() that needs to do it this way. I realize we can't change to_datetime() because it's external API, but maybe we should make it construct a varlena and pass it to do_to_timestamp().
Of course, there must be VARDATA_ANY() instead of of VARDATA(). But I decided to construct varlena and pass it to do_to_timestamp() and to to_datetime().
2) We define both DCH_FF# and DCH_ff#, but we never ever use the lower-case version. Heck, it's not mentioned even in DCH_keywords, which does this: ... {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ ... Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and "day".
Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. "Day", "day" are not mapped to DCH_DAY because they determine letter case in the output, but "ff1" and "FF#" output contains only digits.
Fixed.Also, the comment for "ff1" is wrong, it should be "f" I guess.
And of course, there are regression tests for FF# variants, but not the lower-case ones.
Tests were added
3) Of course, these new formatting patterns should be added to SGML docs, for example to "Template Patterns for Date/Time Formatting" in func.sgml (and maybe to other places, I haven't looked for them very thoroughly).
Fixed.
4) The last block in DCH_from_char() does this while (*s == ' ') s++; but I suppose it should use isspace() just like the code immediately before it.
Fixed.
5) I might be missing something, but why is there the "D" at the end of the return flags from DCH_from_char? /* Return flags for DCH_from_char() */ #define DCH_DATED 0x01 #define DCH_TIMED 0x02 #define DCH_ZONED 0x04
These terms "dated", "timed", and "zoned" are taken from the standard: 9.39.10) If <JSON datetime template> JDT is specified, ... : a) If JDT contains <datetime template year>, ... then JDT is dated. b) If JDT contains <datetime template 12-hour>, ... then JDT is timed. c) If JDT contains <datetime template time zone hour>, ... then JDT is zoned.
Extra changes to to_datetime() were moved into patch 0001.0002 ---- 1) There are some unnecessary changes to to_datetime signature (date_txt renamed to vs. datetxt), which is mostly minor but unnecessary churn. 2) There are some extra changes to to_datetime (extra parameters, etc.). I wonder why those are not included in 0001, as part of the supporting datetime infrastructure.
3) I'm not sure whether the _safe functions are a good or bad idea, but at the very least the comments should be updated to explain what it does (as the API has changed, obviously).
This functions were introduced for elimination of PG_TRY/PG_CATCH in jsonpath code. But this error handling approach has a lot of issues (see below), so I decided separate again them into optional patch 0004.
Some new comments were added to json.c.4) the json.c changes are under-documented, e.g. there are no comments for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and whether it needs to be specified all the time, half of the functions at the end don't have comments (some of them are really simple, but then other simple functions do have comments). I don't know what the right balance here is (I'm certainly not asking for bogus comments just to have comments) and I agree that the existing code is not exactly abundant in comments. But perhaps having at least some would be nice.
The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There are pretty much no comments whatsoever (at least at the function level, explaining what the functions do). It would be good to have a file-level comment too, explaining how jsonpath works, probably.
I hope to add detailed comments for jsonpath in the next version of the patches.
5) I see uniqueifyJsonbObject now does this: if (!object->val.object.uniquified) return; That seems somewhat strange, considering the function says it'll uniqueify objects, but then exits when noticing the passed object is not uniquified?
'uniquified' field was rename to 'uniquify'.
6) Why do we make make_result inline? (numeric.c) If this needs to be marked with "inline" then perhaps all the _internal functions should be marked like that too? I have my doubts about the need for this. 7) The other places add _safe to functions that don't throw errors directly and instead update edata. Why are set_var_from_str, div_var, mod_var excluded from this convention?
I added "_safe" postfix only to functions having the both variants: safe (error info is placed into *edata) and unsafe (errors are always thrown). One-line make_result() is called from many unsafe places, so I decided to leave it as it was. It can also be defined as a simple macro. New "_internal" functions are called from jsonpath_exec.c, so they can't be "static inline", but "extern inline" seems to be non-portable.
8) I wonder if the changes in numeric can have negative impact on performance. Has anyone done any performance tests of this part?
I've tried to measure this impact by evaluation of expression with dozens of '+' or even by adding a loop into numeric_add(), but failed to get any measurable difference.
A lot of comments were added. Also major refactoring of jsonb_gin.c was done.0003 ---- 1) jsonb_gin.c should probably add comments briefly explaining what JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and JsonPathExtractionContext are for.
I have managed to find only 4 similar places suitable for asserts.2) I find it a bit suspicious that there are no asserts in json_gin.c (well, there are 3 in the existing code, but nothing in the new code, and the patch essentially doubles the amount of code here).
No comments for 0004 at this point. regards
Attachment
On Tue, Dec 4, 2018 at 2:23 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > 2) We define both DCH_FF# and DCH_ff#, but we never ever use the > lower-case version. Heck, it's not mentioned even in DCH_keywords, which > does this: > > ... > {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ > ... > {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ > ... > > Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and > "day". > > Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. > > "Day", "day" are not mapped to DCH_DAY because they determine letter case in the > output, but "ff1" and "FF#" output contains only digits. Right, DCH_poz is also offset in DCH_keywords array. So, if array has an entry for "ff1" then enum should have a DCH_ff1 member in the same position. I got some other questions regarding this patchset. 1) Why do we parse FF7-FF9 if we're not supporting them anyway? Without defining FF7-FF9 we can also don't throw errors for them everywhere. That would save us some code lines. 2) + DCH_to_char_fsec("%01d", in->fsec / INT64CONST(100000)); Why do we use INT64CONST() here and in the similar places assuming that fsec is only uint32? 3) wrapItem() is unused in 0002-Jsonpath-engine-and-operators-v21.patch, but used in 0006-Jsonpath-syntax-extensions-v21.patch. Please, move it to 0006-Jsonpath-syntax-extensions-v21.patch? 4) I also got these couple of warning during compilation. jsonpath_exec.c:1485:1: warning: unused function 'recursiveExecuteNested' [-Wunused-function] recursiveExecuteNested(JsonPathExecContext *cxt, JsonPathItem *jsp, ^ 1 warning generated. jsonpath_scan.l:444:6: warning: implicit declaration of function 'jsonpath_yyparse' is invalid in C99 [-Wimplicit-function-declaration] if (jsonpath_yyparse((void*)&parseresult) != 0) ^ 1 warning generated. Perhaps recursiveExecuteNested() is unsed in this patchset. It's probably used by some subsequent SQL/JSON-related patchset. So, please, move it there. 5) I think each usage of PG_TRY()/PG_CATCH() deserves comment describing why it's safe to use without subtransaction. In fact we should just state that no function called inside performs data modification. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 1/5/19 1:11 AM, Alexander Korotkov wrote: > On Tue, Dec 4, 2018 at 2:23 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: >> 2) We define both DCH_FF# and DCH_ff#, but we never ever use the >> lower-case version. Heck, it's not mentioned even in DCH_keywords, which >> does this: >> >> ... >> {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ >> ... >> {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ >> ... >> >> Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and >> "day". >> >> Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. >> >> "Day", "day" are not mapped to DCH_DAY because they determine letter case in the >> output, but "ff1" and "FF#" output contains only digits. > > Right, DCH_poz is also offset in DCH_keywords array. So, if array has > an entry for "ff1" then enum should have a DCH_ff1 member in the same > position. > I guess my question is why we define DCH_ff# at all, when it's not mapped in DCH_keywords? ISTM we could simply leave them out of all the arrays, no? Of course, this is not specific to this patch, as it applies to pre-existing items (like DCH_mi). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 5, 2019 at 5:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 1/5/19 1:11 AM, Alexander Korotkov wrote: > > On Tue, Dec 4, 2018 at 2:23 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > >> 2) We define both DCH_FF# and DCH_ff#, but we never ever use the > >> lower-case version. Heck, it's not mentioned even in DCH_keywords, which > >> does this: > >> > >> ... > >> {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ > >> ... > >> {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ > >> ... > >> > >> Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and > >> "day". > >> > >> Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. > >> > >> "Day", "day" are not mapped to DCH_DAY because they determine letter case in the > >> output, but "ff1" and "FF#" output contains only digits. > > > > Right, DCH_poz is also offset in DCH_keywords array. So, if array has > > an entry for "ff1" then enum should have a DCH_ff1 member in the same > > position. > > > > I guess my question is why we define DCH_ff# at all, when it's not > mapped in DCH_keywords? ISTM we could simply leave them out of all the > arrays, no? I think we still need separate array entries for "FF1" and "ff1". At least, as long as we don't allow "fF1" and "Ff1". In order to get rid of DCH_ff#, I think we should decouple DCH_keywords from array indexes. > Of course, this is not specific to this patch, as it applies > to pre-existing items (like DCH_mi). Right, that should be a subject of separate patch. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attached 23rd version of the patches. On 05.01.2019 3:11, Alexander Korotkov wrote: > On Tue, Dec 4, 2018 at 2:23 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: >> 2) We define both DCH_FF# and DCH_ff#, but we never ever use the >> lower-case version. Heck, it's not mentioned even in DCH_keywords, which >> does this: >> >> ... >> {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ >> ... >> {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */ >> ... >> >> Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and >> "day". >> >> Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI. >> >> "Day", "day" are not mapped to DCH_DAY because they determine letter case in the >> output, but "ff1" and "FF#" output contains only digits. > Right, DCH_poz is also offset in DCH_keywords array. So, if array has > an entry for "ff1" then enum should have a DCH_ff1 member in the same > position. > > I got some other questions regarding this patchset. > > 1) Why do we parse FF7-FF9 if we're not supporting them anyway? > Without defining FF7-FF9 we can also don't throw errors for them > everywhere. That would save us some code lines. FF7-FF9 were removed. > 2) + DCH_to_char_fsec("%01d", in->fsec / INT64CONST(100000)); > Why do we use INT64CONST() here and in the similar places assuming > that fsec is only uint32? Fixed. > 3) wrapItem() is unused in > 0002-Jsonpath-engine-and-operators-v21.patch, but used in > 0006-Jsonpath-syntax-extensions-v21.patch. Please, move it to > 0006-Jsonpath-syntax-extensions-v21.patch? wraptItem() was moved into patch #6. > 4) I also got these couple of warning during compilation. > > jsonpath_exec.c:1485:1: warning: unused function > 'recursiveExecuteNested' [-Wunused-function] > recursiveExecuteNested(JsonPathExecContext *cxt, JsonPathItem *jsp, > ^ > 1 warning generated. > jsonpath_scan.l:444:6: warning: implicit declaration of function > 'jsonpath_yyparse' is invalid in C99 [-Wimplicit-function-declaration] > if (jsonpath_yyparse((void*)&parseresult) != 0) > ^ > 1 warning generated. > > Perhaps recursiveExecuteNested() is unsed in this patchset. It's > probably used by some subsequent SQL/JSON-related patchset. So, > please, move it there. recursiveExecuteNested() was removed from this patch set. Prototype for jsonpath_yyparse() should be in the Bison-generated file src/include/adt/jsonpath_gram.h. > 5) I think each usage of PG_TRY()/PG_CATCH() deserves comment > describing why it's safe to use without subtransaction. In fact we > should just state that no function called inside performs data > modification. Comments were added. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Nikita, thank you! I noticed another thing. 3-arguments version of functions jsonpath_exists(), jsonpath_predicate(), jsonpath_query(), jsonpath_query_wrapped() are: 1) Not documented 2) Not used in operator definitions 3) Not used in subsequent patches So, it looks like we can just remove then. But I think we're very likely can commit jsonpath patch to PostgreSQL 12, but I'm not sure about other patches. So, having those functions exposed to user can be extremely useful until we have separate nodes for SQL/JSON. So, I vote to document these functions. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 1/18/19 6:58 PM, Alexander Korotkov wrote: > Nikita, thank you! > > I noticed another thing. 3-arguments version of functions > jsonpath_exists(), jsonpath_predicate(), jsonpath_query(), > jsonpath_query_wrapped() are: > > 1) Not documented > 2) Not used in operator definitions > 3) Not used in subsequent patches > > So, it looks like we can just remove then. But I think we're very > likely can commit jsonpath patch to PostgreSQL 12, but I'm not sure > about other patches. So, having those functions exposed to user can > be extremely useful until we have separate nodes for SQL/JSON. So, I > vote to document these functions. > That seems a bit strange. If those functions are meant to be used by other patches (which ones?) then why should not we make them part of those future patches? But it seems more like those functions are actually meant to be used by users in the first place, in cases when we need to provide a third parameter (which operators can't do). In that case yes - we should have them documented properly, but also tested. Which is not the case currently, because the regression tests only use the operators. Two more comments: 1) I'm wondering why we even need separate functions for the different numbers of arguments at the C level, as both simply call to the same function anyway with a PG_NARGS() condition in it. Can't we ditch those wrappers and reference that target function directly? 2) I once again ran into the jsonb->json business, which essentially means that when you do this: select json '{"a": { "b" : 10}}' @? '$.a.b'; it ends up calling jsonb_jsonpath_exists(), which then does this: Jsonb *jb = PG_GETARG_JSONB_P(0); I and am not sure why/how it seems to work, but I find it confusing because the stack still looks like this: #0 jsonb_jsonpath_exists (fcinfo=0x162f800) at jsonpath_exec.c:2857 #1 0x000000000096d721 in json_jsonpath_exists2 (fcinfo=0x162f800) at jsonpath_exec.c:2882 #2 0x00000000006c790a in ExecInterpExpr (state=0x162f300, econtext=0x162ee18, isnull=0x7ffcea4c3857) at execExprInterp.c:648 ... and of course, the fcinfo->flinfo still points to the json-specific function, which say the first parameter type is 'json'. (gdb) print *fcinfo->flinfo $23 = {fn_addr = 0x96d709 <json_jsonpath_exists2>, fn_oid = 6043, fn_nargs = 2, fn_strict = true, fn_retset = false, fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x162e990, fn_expr = 0x15db378} test=# select proname, prosrc, proargtypes from pg_proc where oid = 6043; proname | prosrc | proargtypes -----------------+-----------------------+------------- jsonpath_exists | json_jsonpath_exists2 | 114 6050 (1 row) test=# select typname from pg_type where oid = 114; typname --------- json (1 row) regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 19, 2019 at 1:32 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 1/18/19 6:58 PM, Alexander Korotkov wrote: > > So, it looks like we can just remove then. But I think we're very > > likely can commit jsonpath patch to PostgreSQL 12, but I'm not sure > > about other patches. So, having those functions exposed to user can > > be extremely useful until we have separate nodes for SQL/JSON. So, I > > vote to document these functions. > > That seems a bit strange. If those functions are meant to be used by > other patches (which ones?) then why should not we make them part of > those future patches? No, these functions aren't used by other patches (it was my original wrong idea). Other patches provides SQL expressions making these functions not that necessary. But I think we should keep these functions in jsonpath patch. > But it seems more like those functions are actually meant to be used by > users in the first place, in cases when we need to provide a third > parameter (which operators can't do). In that case yes - we should have > them documented properly, but also tested. Which is not the case > currently, because the regression tests only use the operators. +1 Thank you for noticing. Tests should be provided as well. > Two more comments: > > 1) I'm wondering why we even need separate functions for the different > numbers of arguments at the C level, as both simply call to the same > function anyway with a PG_NARGS() condition in it. Can't we ditch those > wrappers and reference that target function directly? That was surprising for me too. Technically, it's OK to do this. And we do this for extensions. But for in-core functions we have following sanity check. -- Considering only built-in procs (prolang = 12), look for multiple uses -- of the same internal function (ie, matching prosrc fields). It's OK to -- have several entries with different pronames for the same internal function, -- but conflicts in the number of arguments and other critical items should -- be complained of. (We don't check data types here; see next query.) -- Note: ignore aggregate functions here, since they all point to the same -- dummy built-in function. SELECT p1.oid, p1.proname, p2.oid, p2.proname FROM pg_proc AS p1, pg_proc AS p2 WHERE p1.oid < p2.oid AND p1.prosrc = p2.prosrc AND p1.prolang = 12 AND p2.prolang = 12 AND (p1.prokind != 'a' OR p2.prokind != 'a') AND (p1.prolang != p2.prolang OR p1.prokind != p2.prokind OR p1.prosecdef != p2.prosecdef OR p1.proleakproof != p2.proleakproof OR p1.proisstrict != p2.proisstrict OR p1.proretset != p2.proretset OR p1.provolatile != p2.provolatile OR p1.pronargs != p2.pronargs); And we already have some code written especially to make this check happy. /* This is separate to keep the opr_sanity regression test from complaining */ Datum regexp_split_to_table_no_flags(PG_FUNCTION_ARGS) { return regexp_split_to_table(fcinfo); } Personally I'm not fan of this approach, and I would rather relax this sanity check. But that doesn't seem to be a subject of this patch. For jsonpath, it's OK to just keep this tradition. > 2) I once again ran into the jsonb->json business, which essentially > means that when you do this: > > select json '{"a": { "b" : 10}}' @? '$.a.b'; > > it ends up calling jsonb_jsonpath_exists(), which then does this: > > Jsonb *jb = PG_GETARG_JSONB_P(0); > > I and am not sure why/how it seems to work, but I find it confusing > because the stack still looks like this: > > #0 jsonb_jsonpath_exists (fcinfo=0x162f800) at jsonpath_exec.c:2857 > #1 0x000000000096d721 in json_jsonpath_exists2 (fcinfo=0x162f800) at > jsonpath_exec.c:2882 > #2 0x00000000006c790a in ExecInterpExpr (state=0x162f300, > econtext=0x162ee18, isnull=0x7ffcea4c3857) at execExprInterp.c:648 > ... > > and of course, the fcinfo->flinfo still points to the json-specific > function, which say the first parameter type is 'json'. > > (gdb) print *fcinfo->flinfo > $23 = {fn_addr = 0x96d709 <json_jsonpath_exists2>, fn_oid = 6043, > fn_nargs = 2, fn_strict = true, fn_retset = false, fn_stats = 2 '\002', > fn_extra = 0x0, fn_mcxt = 0x162e990, fn_expr = 0x15db378} > > > test=# select proname, prosrc, proargtypes from pg_proc where oid = 6043; > proname | prosrc | proargtypes > -----------------+-----------------------+------------- > jsonpath_exists | json_jsonpath_exists2 | 114 6050 > (1 row) > > test=# select typname from pg_type where oid = 114; > typname > --------- > json > (1 row) It's tricky. There is jsonpath_json.c, which includes jsonpath_exec.c with set of macro definitions making that code work with json type. I'm going to refactor that. But general approach to include same functions more than once seems OK for me. Some more notes: 1) It seems that @* and @# are not going to be supported by any indexes. I think we should remove these operators and let users use functions instead. 2) I propose to rename @~ operator to @@. We already use @@ as "satisfies" in multiple places, and I thinks this case fits too. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sat, Jan 19, 2019 at 2:54 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > 1) It seems that @* and @# are not going to be supported by any > indexes. I think we should remove these operators and let users use > functions instead. > 2) I propose to rename @~ operator to @@. We already use @@ as > "satisfies" in multiple places, and I thinks this case fits too. 3) How do we calculate the "id" property returned by keyvalue() function? It's not documented. Even presence of "id" columns isn't documented. Standard stands that it's implementation-depended indetifier of object holding key-value pair. The way of its calculation is also not clear from the code. Why do we need constant of 10000000000? id = jb->type != jbvBinary ? 0 : (int64)((char *) jb->val.binary.data - (char *) cxt->baseObject.jbc); id += (int64) cxt->baseObject.id * INT64CONST(10000000000); ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sun, Jan 20, 2019 at 2:45 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > 3) How do we calculate the "id" property returned by keyvalue() > function? It's not documented. Even presence of "id" columns isn't > documented. Standard stands that it's implementation-depended > indetifier of object holding key-value pair. The way of its > calculation is also not clear from the code. Why do we need constant > of 10000000000? > > id = jb->type != jbvBinary ? 0 : > (int64)((char *) jb->val.binary.data - > (char *) cxt->baseObject.jbc); > id += (int64) cxt->baseObject.id * INT64CONST(10000000000); I've revising patchset bringing it to commitable shape. The intermediate result is attached. 0001-Preliminary-datetime-infrastructure-v24.patch * Commit message & minor cleanup 0002-Jsonpath-engine-and-docs-v24.patch * Support of json is removed. Current implementation is tricky and cumbersome. We need to design a suitable abstraction layer for that. It should be done by separate patch applying not only to jsonpath, but also to other jsonb functions lacking of json analogues. * jsonpath_exists(jsonb, jsonpath[, jsonb]), jsonpath_predicate(jsonb, jsonpath[, jsonb]), jsonpath_query(jsonb, jsonpath[, jsonb]), jsonpath_wrapped(jsonb, jsonpath[, jsonb]) are documented. * @* and @# operators are removed, @~ operator is renamed to @@. * Commit message & minor cleanup. I'll continue revising this patchset. Nikita, could you please write tests for 3-argument versions of functions? Also, please answer the question regarding "id" property. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sun, Jan 20, 2019 at 6:30 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > I'll continue revising this patchset. Nikita, could you please write > tests for 3-argument versions of functions? Also, please answer the > question regarding "id" property. I've some more notes regarding function set provided in jsonpath patch: 1) Function names don't follow the convention we have. All our functions dealing with jsonb have "jsonb_" prefix. Exclusions have "jsonb" in other part of name, for example, "to_jsonb(anyelement)". We could name functions at SQL level in the same way they are named in C. So, they would be jsonb_jsonpath_exists() etc. But it's probably too long. What about jsonb_path_exists() etc? 2) jsonpath_query_wrapped() name seems counter intuitive for me. What about jsonpath_query_array()? 3) jsonpath_query_wrapped() behavior looks tricky to me. It returns NULL for no results. When result item is one, it is returned "as is". When there are two or more result items, they are wrapped into array. This representation of result seems extremely inconvenient for me. It becomes hard to solve even trivial tasks with that: for instance, iterate all items found. Without extra assumptions on what is going to be returned it's also impossible for caller to distinguish single array item found from multiple items found wrapped into array. And that seems very bad. I know this behavior is inspired by SQL/JSON standard. But since these functions are anyway our extension, we can implement them as we like. So, I propose to make this function always return array of items regardless on count of those items (even for 1 or 0 items). 4) If we change behavior of jsonpath_query_wrapped() as above, we can also implement jsonpath_query_one() function, which would return first result item or NULL for no items. Any thoughts? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 20.01.2019 2:45, Alexander Korotkov wrote:
3) How do we calculate the "id" property returned by keyvalue() function? It's not documented. Even presence of "id" columns isn't documented. Standard stands that it's implementation-depended indetifier of object holding key-value pair. The way of its calculation is also not clear from the code. Why do we need constant of 10000000000? id = jb->type != jbvBinary ? 0 : (int64)((char *) jb->val.binary.data - (char *) cxt->baseObject.jbc); id += (int64) cxt->baseObject.id * INT64CONST(10000000000);
I decided to construct object id from the two parts: base object id and its binary offset in its base object's jsonb: object_id = 10000000000 * base_object_id + object_offset_in_base_object 10000000000 (10^10) -- is a first round decimal number greater than 2^32 (maximal offset in jsonb). Decimal multiplier is used here to improve the readability of identifiers. Base object is usually a root object of the path: context item '$' or path variable '$var', literals can't produce objects for now. But if the path contains generated objects (.keyvalue() itself, for example), then they become base object for the subsequent .keyvalue(). See example: '$.a.b.keyvalue().value.keyvalue()' :- base for the first .keyvalue() is '$'- base for the second .keyvalue() is '$.a.b.keyvalue()' Id of '$' is 0. Id of '$var' is its ordinal (positive) number in the list of variables. Ids for generated objects are assigned using global counter 'JsonPathExecContext.generatedObjectId' starting from 'number_of_vars + 1'. Corresponding comments will be added in the upcoming version of the patches. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Jan 21, 2019 at 1:40 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > On Sun, Jan 20, 2019 at 6:30 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > I'll continue revising this patchset. Nikita, could you please write > > tests for 3-argument versions of functions? Also, please answer the > > question regarding "id" property. > > I've some more notes regarding function set provided in jsonpath patch: > 1) Function names don't follow the convention we have. All our > functions dealing with jsonb have "jsonb_" prefix. Exclusions have > "jsonb" in other part of name, for example, "to_jsonb(anyelement)". > We could name functions at SQL level in the same way they are named in > C. So, they would be jsonb_jsonpath_exists() etc. But it's probably > too long. What about jsonb_path_exists() etc? jsonpath_exists is similar to xpath_exists. Actually, we could use jsonb_exists(jsonb, jsonpath, json), but then we should specify the type of the second argument. > 2) jsonpath_query_wrapped() name seems counter intuitive for me. What > about jsonpath_query_array()? The question is should we try to provide a functional interface for all options of JSON_QUERY clause ? The same is for other SQL/JSON clauses. Currently, patch contains very limited subset of JSON_QUERY functionality, mostly for jsonpath testing. > 3) jsonpath_query_wrapped() behavior looks tricky to me. It returns > NULL for no results. When result item is one, it is returned "as is". > When there are two or more result items, they are wrapped into array. > This representation of result seems extremely inconvenient for me. It > becomes hard to solve even trivial tasks with that: for instance, > iterate all items found. Without extra assumptions on what is going > to be returned it's also impossible for caller to distinguish single > array item found from multiple items found wrapped into array. And > that seems very bad. I know this behavior is inspired by SQL/JSON > standard. But since these functions are anyway our extension, we can > implement them as we like. So, I propose to make this function always > return array of items regardless on count of those items (even for 1 > or 0 items). Fair enough, but if we agree, that we provide an exact functionality of SQL clauses, then better to follow the standard to avoid problems. > 4) If we change behavior of jsonpath_query_wrapped() as above, we can > also implement jsonpath_query_one() function, which would return first > result item or NULL for no items. > Any thoughts? I think, that we should postpone this functional interface, which could be added later. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Jan 21, 2019 at 6:05 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote: > On Mon, Jan 21, 2019 at 1:40 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > > > On Sun, Jan 20, 2019 at 6:30 AM Alexander Korotkov > > <a.korotkov@postgrespro.ru> wrote: > > > I'll continue revising this patchset. Nikita, could you please write > > > tests for 3-argument versions of functions? Also, please answer the > > > question regarding "id" property. > > > > I've some more notes regarding function set provided in jsonpath patch: > > 1) Function names don't follow the convention we have. All our > > functions dealing with jsonb have "jsonb_" prefix. Exclusions have > > "jsonb" in other part of name, for example, "to_jsonb(anyelement)". > > We could name functions at SQL level in the same way they are named in > > C. So, they would be jsonb_jsonpath_exists() etc. But it's probably > > too long. What about jsonb_path_exists() etc? > > jsonpath_exists is similar to xpath_exists. That's true. The question is whether it's more important to follow json[b] naming convention or xml/xpath naming convention? I guess json[b] naming convention is more important in our case. > Actually, we could use jsonb_exists(jsonb, jsonpath, json), but then > we should specify the type of the second argument. Yes, but AFAICS the key point of json[b]_ prefixes is to evade function overloading. So, I'm -1 for use jsonb_ prefix and have function overload because of that. > > 2) jsonpath_query_wrapped() name seems counter intuitive for me. What > > about jsonpath_query_array()? > > The question is should we try to provide a functional interface for > all options of > JSON_QUERY clause ? The same is for other SQL/JSON clauses. > Currently, patch contains very limited subset of JSON_QUERY > functionality, mostly for jsonpath testing. Actually, my point is following. We have jsonpath patch close to the committable shape. And we have patch for SQL/JSON clauses including JSON_QUERY, which is huge, complex and didn't receive any serious review yet. So, we'll did our best on that patch during this release cycle, but I can't guarantee it will get in PostgreSQL 12. Thus, my idea is to make jsonpath patch self contained by providing brief and convenient procedural interface. This procedural interface is anyway not a part of standard. It *might* be inspired by standard clauses, but might be not. I think we should try to make this procedural interface as good and convenient by itself. It's our extension, and it wouldn't make us more or less standard conforming. > > 3) jsonpath_query_wrapped() behavior looks tricky to me. It returns > > NULL for no results. When result item is one, it is returned "as is". > > When there are two or more result items, they are wrapped into array. > > This representation of result seems extremely inconvenient for me. It > > becomes hard to solve even trivial tasks with that: for instance, > > iterate all items found. Without extra assumptions on what is going > > to be returned it's also impossible for caller to distinguish single > > array item found from multiple items found wrapped into array. And > > that seems very bad. I know this behavior is inspired by SQL/JSON > > standard. But since these functions are anyway our extension, we can > > implement them as we like. So, I propose to make this function always > > return array of items regardless on count of those items (even for 1 > > or 0 items). > > Fair enough, but if we agree, that we provide an exact functionality of > SQL clauses, then better to follow the standard to avoid problems. No, I see this as our extension. And I don't see problems in being different from standard clauses, because it's different anyway. For me, in this case it's better to evade problems of users. And current behavior of this function seems like just single big pain :) > > 4) If we change behavior of jsonpath_query_wrapped() as above, we can > > also implement jsonpath_query_one() function, which would return first > > result item or NULL for no items. > > Any thoughts? > > I think, that we should postpone this functional interface, which could be > added later. The reason we typically postpone things is that they are hard to bring to committable shape. jsonpath_query_one() doesn't cost us any real development. So, I don't see point in postponing that if consider that as good part of procedural jsonpath interface. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
The next revision is attached. Nikita made code and documentation improvements, renamed functions from "jsonpath_" prefix to "jsonb_path_" prefix. He also renamed jsonpath_predicate() to jsonb_path_match() (that looks better for me too). I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and changed that behavior to always wrap result into array. Also, I've introduced new function jsonb_query_first(), which returns first item from result. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Jan 22, 2019 at 8:21 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > The next revision is attached. > > Nikita made code and documentation improvements, renamed functions > from "jsonpath_" prefix to "jsonb_path_" prefix. He also renamed > jsonpath_predicate() to jsonb_path_match() (that looks better for me > too). > > I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and > changed that behavior to always wrap result into array. agree with new names so it mimic the behaviour of JSON_QUERY with UNCONDITIONAL WRAPPER option Also, I've > introduced new function jsonb_query_first(), which returns first item > from result. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attached 26th version of the patches: * Documentation: - Fixed some mistakes - Removed mention of syntax extensions not present in this patch - Documented '.datetime(format, default_tz)'* Removed accidental syntax extension allowing to write '@.key' without '@' -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Jan 23, 2019 at 3:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached 26th version of the patches: > > * Documentation: > - Fixed some mistakes > - Removed mention of syntax extensions not present in this patch > - Documented '.datetime(format, default_tz)' > * Removed accidental syntax extension allowing to write '@.key' without '@' Thank you! I've made few more changes to patchset: * Change back interface of JsonbExtractScalar(). I decide it would be better to keep it. * Run pgindent * Improve documentation of .keyvalue() function. Finally, I'm going to commit this if no objections. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Jan 22, 2019 at 12:13 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote: > > On Tue, Jan 22, 2019 at 8:21 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > > > The next revision is attached. > > > > Nikita made code and documentation improvements, renamed functions > > from "jsonpath_" prefix to "jsonb_path_" prefix. He also renamed > > jsonpath_predicate() to jsonb_path_match() (that looks better for me > > too). > > > > I've further renamed jsonb_query_wrapped() to jsonb_query_array(), and > > changed that behavior to always wrap result into array. > > agree with new names > > so it mimic the behaviour of JSON_QUERY with UNCONDITIONAL WRAPPER option Good, thank you for feedback! ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attached 28th version of the patches.
On 23.01.2019 8:01, Alexander Korotkov wrote:
On Wed, Jan 23, 2019 at 3:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:Attached 26th version of the patches: * Documentation: - Fixed some mistakes - Removed mention of syntax extensions not present in this patch - Documented '.datetime(format, default_tz)'* Removed accidental syntax extension allowing to write '@.key' without '@'Thank you! I've made few more changes to patchset:* Change back interface of JsonbExtractScalar(). I decide it would be better to keep it.* Run pgindent
I have fixed indentation by adding new typedefs to src/tools/pgindent/typedefs.list. Also I have renamed LikeRegexContext => JsonLikeRegexContext.
* Improve documentation of .keyvalue() function. Finally, I'm going to commit this if no objections.
Thank you! -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Jan 23, 2019 at 3:24 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached 28th version of the patches. > > On 23.01.2019 8:01, Alexander Korotkov wrote: > > On Wed, Jan 23, 2019 at 3:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > Attached 26th version of the patches: > > * Documentation: > - Fixed some mistakes > - Removed mention of syntax extensions not present in this patch > - Documented '.datetime(format, default_tz)' > * Removed accidental syntax extension allowing to write '@.key' without '@' > > Thank you! > > I've made few more changes to patchset: > * Change back interface of JsonbExtractScalar(). I decide it would > be better to keep it. > * Run pgindent > > I have fixed indentation by adding new typedefs to > src/tools/pgindent/typedefs.list. Good catch, thank you! > Also I have renamed LikeRegexContext => JsonLikeRegexContext. > * Improve documentation of .keyvalue() function. Ok! I've also fix bug with possible reference to already freed value of jsonpath in jsonb_path_execute(), pointed by you. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Finally, I'm going to commit this if no objections. BTW, I decided to postpone commit for few days. Nikita and me are still working on better error messages. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sat, Jan 26, 2019 at 4:27 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Finally, I'm going to commit this if no objections. > > BTW, I decided to postpone commit for few days. Nikita and me are > still working on better error messages. Updated patchset is attached. This patchset includes: * Improved error handling by Nikita, revised by me, * Code beautification. So, I'm going to commit this again. This time seriously :) ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sun, Jan 27, 2019 at 1:50 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Sat, Jan 26, 2019 at 4:27 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > > > On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov > > <a.korotkov@postgrespro.ru> wrote: > > > Finally, I'm going to commit this if no objections. > > > > BTW, I decided to postpone commit for few days. Nikita and me are > > still working on better error messages. > > Updated patchset is attached. This patchset includes: > > * Improved error handling by Nikita, revised by me, > * Code beautification. > > So, I'm going to commit this again. This time seriously :) I'm really sorry for confusing people, but I've one more revision. This is my first time attempting to commit such a large patch. Major changes are following: * We find it ridiculous to save ErrorData for possible latter throw. Now, we either throw an error immediately or return jperError. That also allows to get rid of unwanted changes in elog.c/elog.h. * I decided to change behavior of jsonb_path_match() to throw as less errors as possible. The reason is that it's used to implement potentially (patch is pending) indexable operator. Index scan is not always capable to throw as many errors and sequential scan. So, it's better to not introduce extra possible index scan and sequential scan results divergence. So, this is version I'm going to commit unless Nikita has corrections or anybody else objects. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 1/28/19 1:22 AM, Alexander Korotkov wrote: > On Sun, Jan 27, 2019 at 1:50 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: >> On Sat, Jan 26, 2019 at 4:27 AM Alexander Korotkov >> <a.korotkov@postgrespro.ru> wrote: >>> >>> On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov >>> <a.korotkov@postgrespro.ru> wrote: >>>> Finally, I'm going to commit this if no objections. >>> >>> BTW, I decided to postpone commit for few days. Nikita and me are >>> still working on better error messages. >> >> Updated patchset is attached. This patchset includes: >> >> * Improved error handling by Nikita, revised by me, >> * Code beautification. >> >> So, I'm going to commit this again. This time seriously :) > > I'm really sorry for confusing people, but I've one more revision. > This is my first time attempting to commit such a large patch. > I'm not sure what you're apologizing for, it simply shows how careful you are when polishing such a large patch. > Major changes are following: > * We find it ridiculous to save ErrorData for possible latter throw. > Now, we either throw an error immediately or return jperError. That > also allows to get rid of unwanted changes in elog.c/elog.h. OK. > * I decided to change behavior of jsonb_path_match() to throw as less > errors as possible. The reason is that it's used to implement > potentially (patch is pending) indexable operator. Index scan is not > always capable to throw as many errors and sequential scan. So, it's > better to not introduce extra possible index scan and sequential scan > results divergence. > Hmmm, can you elaborate a bit more? Which errors were thrown before and are not thrown with the current patch version? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 28.01.2019 16:50, Tomas Vondra wrote:
On 1/28/19 1:22 AM, Alexander Korotkov wrote:* I decided to change behavior of jsonb_path_match() to throw as less errors as possible. The reason is that it's used to implement potentially (patch is pending) indexable operator. Index scan is not always capable to throw as many errors and sequential scan. So, it's better to not introduce extra possible index scan and sequential scan results divergence.Hmmm, can you elaborate a bit more? Which errors were thrown before and are not thrown with the current patch version?
In the previous version of the patch jsonb_path_match() threw error when jsonpath did not return a singleton value, but in the last version in such cases NULL is returned. This problem arises because we cannot guarantee at compile time that jsonpath expression used in jsonb_path_match() is a predicate. Predicates by standard can return only True, False, and Unknown (errors occurred during execution of their operands are transformed into Unknown values), so predicates cannot throw errors, and there are no problems with errors. GIN does not attempt to search non-predicate expressions, so there may be no problem even we throw "not a singleton" error. Here I want to remind that ability to use predicates in the root of jsonpath expression is an our extension to standard that was created specially for the operator @@. By standard predicates are allowed only in filters. Without this extension we are still able to rewrite @@ using @?: jsonb @@ 'predicate' is equivalent to jsonb @? '$ ? (predicate)' but such @? expression is a bit slower to execute and a bit verbose to write. If we introduced special type 'jsonpath_predicate', then we could solve the problem by checking the type of jsonpath expression at compile-time. Another problem with error handling is that jsonb_path_query*() functions always throw SQL/JSON errors and there is no easy and effective way to emulate NULL ON ERROR behavior, which is used by default in SQL/JSON functions. So I think it's worth trying to add some kind of flag 'throwErrors' to jsonb_path_query*() functions. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Jan 29, 2019 at 2:17 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > In the previous version of the patch jsonb_path_match() threw error when > jsonpath did not return a singleton value, but in the last version in such cases > NULL is returned. This problem arises because we cannot guarantee at compile > time that jsonpath expression used in jsonb_path_match() is a predicate. > Predicates by standard can return only True, False, and Unknown (errors occurred > during execution of their operands are transformed into Unknown values), so > predicates cannot throw errors, and there are no problems with errors. Attached patchset provides description of errors suppressed. It also clarifies how jsonb_path_match() works. > GIN does not attempt to search non-predicate expressions, so there may be no > problem even we throw "not a singleton" error. Yes, I don't insist on that. If majority of us wants to bring "not a singleton" error back, I don't object to it. > Here I want to remind that ability to use predicates in the root of jsonpath > expression is an our extension to standard that was created specially for the > operator @@. By standard predicates are allowed only in filters. Without this > extension we are still able to rewrite @@ using @?: > jsonb @@ 'predicate' is equivalent to > jsonb @? '$ ? (predicate)' > but such @? expression is a bit slower to execute and a bit verbose to write. > > If we introduced special type 'jsonpath_predicate', then we could solve the > problem by checking the type of jsonpath expression at compile-time. For me it seems that separate datatype for this kind of problem is overkill. > Another problem with error handling is that jsonb_path_query*() functions > always throw SQL/JSON errors and there is no easy and effective way to emulate > NULL ON ERROR behavior, which is used by default in SQL/JSON functions. So I > think it's worth trying to add some kind of flag 'throwErrors' to > jsonb_path_query*() functions. Good idea, but let's commit basic jsonpath implementation first. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi, On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > + /* > + * It is safe to use here PG_TRY/PG_CATCH without subtransaction because > + * no function called inside performs data modification. > + */ > + PG_TRY(); > + { > + res = DirectFunctionCall2(func, ldatum, rdatum); > + } > + PG_CATCH(); > + { > + int errcode = geterrcode(); > + > + if (jspThrowErrors(cxt) || > + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION) > + PG_RE_THROW(); > + > + MemoryContextSwitchTo(mcxt); > + FlushErrorState(); > + > + return jperError; > + } > + PG_END_TRY(); FWIW, I still think this is a terrible idea and shouldn't be merged this way. The likelihood of introducing subtle bugs seems way too high - even if it's possibly not buggy today, who says that it's not going to be in the future? Greetings, Andres Freund
On Tue, Jan 29, 2019 at 4:03 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > > + /* > > + * It is safe to use here PG_TRY/PG_CATCH without subtransaction because > > + * no function called inside performs data modification. > > + */ > > + PG_TRY(); > > + { > > + res = DirectFunctionCall2(func, ldatum, rdatum); > > + } > > + PG_CATCH(); > > + { > > + int errcode = geterrcode(); > > + > > + if (jspThrowErrors(cxt) || > > + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION) > > + PG_RE_THROW(); > > + > > + MemoryContextSwitchTo(mcxt); > > + FlushErrorState(); > > + > > + return jperError; > > + } > > + PG_END_TRY(); > > FWIW, I still think this is a terrible idea and shouldn't be merged this > way. The likelihood of introducing subtle bugs seems way too high - even > if it's possibly not buggy today, who says that it's not going to be in > the future? I'm probably not yet understanding all the risks this code have. So far I see: 1) One of functions called here performs database modification, while it wasn't suppose to. So, it becomes not safe to skip subtransaction. 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason. So, it might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore. Could you complete this list? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Jan 29, 2019 at 4:03 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > > + /* > > + * It is safe to use here PG_TRY/PG_CATCH without subtransaction because > > + * no function called inside performs data modification. > > + */ > > + PG_TRY(); > > + { > > + res = DirectFunctionCall2(func, ldatum, rdatum); > > + } > > + PG_CATCH(); > > + { > > + int errcode = geterrcode(); > > + > > + if (jspThrowErrors(cxt) || > > + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION) > > + PG_RE_THROW(); > > + > > + MemoryContextSwitchTo(mcxt); > > + FlushErrorState(); > > + > > + return jperError; > > + } > > + PG_END_TRY(); BTW, this code also looks... useless. I can't see whole numeric.c throwing ERRCODE_DATA_EXCEPTION. Nikita, what do we mean here? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2019-01-29 04:17:33 +0300, Alexander Korotkov wrote: > On Tue, Jan 29, 2019 at 4:03 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > > > + /* > > > + * It is safe to use here PG_TRY/PG_CATCH without subtransaction because > > > + * no function called inside performs data modification. > > > + */ > > > + PG_TRY(); > > > + { > > > + res = DirectFunctionCall2(func, ldatum, rdatum); > > > + } > > > + PG_CATCH(); > > > + { > > > + int errcode = geterrcode(); > > > + > > > + if (jspThrowErrors(cxt) || > > > + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION) > > > + PG_RE_THROW(); > > > + > > > + MemoryContextSwitchTo(mcxt); > > > + FlushErrorState(); > > > + > > > + return jperError; > > > + } > > > + PG_END_TRY(); > > > > FWIW, I still think this is a terrible idea and shouldn't be merged this > > way. The likelihood of introducing subtle bugs seems way too high - even > > if it's possibly not buggy today, who says that it's not going to be in > > the future? > > I'm probably not yet understanding all the risks this code have. So far I see: I find these *more* than sufficient to not go to the PG_TRY/CATCH approach. > 1) One of functions called here performs database modification, while > it wasn't suppose to. So, it becomes not safe to skip subtransaction. It's not just data modifications. Even just modifying some memory structures that'd normally be invalidated by an xact abort's invalidation processing isn't safe. > 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason. So, it > might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore. It'd e.g. not surprise me very much if some OOM would end up translating to ERRCODE_DATA_EXCEPTION, because some library function returned an error due to ENOMEM. > Could you complete this list? 3) The expression changed the current expression context, GUCs or any other such global variable. Without a proper subtrans reset this state isn't reverted. 4) The function acquires an LWLOCK, buffer reference, anything resowner owned. Skipping subtrans reset, that's not released in that moment. That's going to lead to potential hard deadlocks. 99) sigsetjmp is actually pretty expensive. Greetings, Andres Freund
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > On Tue, Jan 29, 2019 at 4:03 AM Andres Freund <andres@anarazel.de> wrote: >> FWIW, I still think this is a terrible idea and shouldn't be merged this >> way. The likelihood of introducing subtle bugs seems way too high - even >> if it's possibly not buggy today, who says that it's not going to be in >> the future? > I'm probably not yet understanding all the risks this code have. So far I see: > 1) One of functions called here performs database modification, while > it wasn't suppose to. So, it becomes not safe to skip subtransaction. > 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason. So, it > might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore. > Could you complete this list? Sure: every errcode we have is unsafe to treat this way. The backend coding rule from day one has been that a thrown error requires (sub)transaction cleanup to be done to make sure that things are back in a good state. You can *not* just decide that it's okay to ignore that, especially not when invoking code outside the immediate area of what you're doing. As a counterexample, for any specific errcode you might claim is safe, a plpgsql function might do something that requires cleanup (which is not equal to "does data modification") and then throw that errcode. You might argue that this code will only ever be used to call certain functions in numeric.c and you've vetted every line of code that those functions can reach ... but even to say that is to expose how fragile the argument is. Every time anybody touched numeric.c, or any code reachable from there, we'd have to wonder whether we just introduced a failure hazard for jsonpath. That's not maintainable. It's even less maintainable if anybody decides they'd like to insert extension hooks into any of that code. I rather imagine that this could be broken today by an ErrorContextCallback function, for example. (That is, even today, "vetting every line" has to include vetting every errcontext subroutine in the system, along with everything it can call. Plus equivalent code in external PLs and so on.) We've rejected code like this every time it was submitted to date, and I don't see a reason to change that policy for jsonpath. regards, tom lane
On Tue, Jan 29, 2019 at 4:30 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Tue, Jan 29, 2019 at 4:03 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > > > + /* > > > + * It is safe to use here PG_TRY/PG_CATCH without subtransaction because > > > + * no function called inside performs data modification. > > > + */ > > > + PG_TRY(); > > > + { > > > + res = DirectFunctionCall2(func, ldatum, rdatum); > > > + } > > > + PG_CATCH(); > > > + { > > > + int errcode = geterrcode(); > > > + > > > + if (jspThrowErrors(cxt) || > > > + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION) > > > + PG_RE_THROW(); > > > + > > > + MemoryContextSwitchTo(mcxt); > > > + FlushErrorState(); > > > + > > > + return jperError; > > > + } > > > + PG_END_TRY(); > > BTW, this code also looks... useless. I can't see whole numeric.c > throwing ERRCODE_DATA_EXCEPTION. Nikita, what do we mean here? Oh, sorry. I've missed we have ERRCODE_TO_CATEGORY() here. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Jan 29, 2019 at 4:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > On Tue, Jan 29, 2019 at 4:03 AM Andres Freund <andres@anarazel.de> wrote: > >> FWIW, I still think this is a terrible idea and shouldn't be merged this > >> way. The likelihood of introducing subtle bugs seems way too high - even > >> if it's possibly not buggy today, who says that it's not going to be in > >> the future? > > > I'm probably not yet understanding all the risks this code have. So far I see: > > 1) One of functions called here performs database modification, while > > it wasn't suppose to. So, it becomes not safe to skip subtransaction. > > 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason. So, it > > might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore. > > Could you complete this list? > > Sure: every errcode we have is unsafe to treat this way. > > The backend coding rule from day one has been that a thrown error requires > (sub)transaction cleanup to be done to make sure that things are back in a > good state. You can *not* just decide that it's okay to ignore that, > especially not when invoking code outside the immediate area of what > you're doing. > > As a counterexample, for any specific errcode you might claim is safe, > a plpgsql function might do something that requires cleanup (which is > not equal to "does data modification") and then throw that errcode. > > You might argue that this code will only ever be used to call certain > functions in numeric.c and you've vetted every line of code that those > functions can reach ... but even to say that is to expose how fragile > the argument is. Every time anybody touched numeric.c, or any code > reachable from there, we'd have to wonder whether we just introduced > a failure hazard for jsonpath. That's not maintainable. It's even > less maintainable if anybody decides they'd like to insert extension > hooks into any of that code. I rather imagine that this could be > broken today by an ErrorContextCallback function, for example. > (That is, even today, "vetting every line" has to include vetting every > errcontext subroutine in the system, along with everything it can call. > Plus equivalent code in external PLs and so on.) > > We've rejected code like this every time it was submitted to date, > and I don't see a reason to change that policy for jsonpath. Tom, Andres thank you for the explanation. More than clear for now. Will search for workaround. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > + if (jspThrowErrors(cxt) || > + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION) > + PG_RE_THROW(); > BTW, this code also looks... useless. I can't see whole numeric.c > throwing ERRCODE_DATA_EXCEPTION. Nikita, what do we mean here? I think what this code is claiming is that any errcode in the 22xxx category is safe to trap. Presumably what it's expecting to see is ERRCODE_DIVISION_BY_ZERO (22012) or another error that numeric.c could throw ... but 22xxx covers a heck of a lot of ground. regards, tom lane
Hi, On 2019-01-28 17:31:15 -0800, Andres Freund wrote: > On 2019-01-29 04:17:33 +0300, Alexander Korotkov wrote: > > I'm probably not yet understanding all the risks this code have. So far I see: > > I find these *more* than sufficient to not go to the PG_TRY/CATCH > approach. > > > > 1) One of functions called here performs database modification, while > > it wasn't suppose to. So, it becomes not safe to skip subtransaction. > > It's not just data modifications. Even just modifying some memory > structures that'd normally be invalidated by an xact abort's > invalidation processing isn't safe. > > > > 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason. So, it > > might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore. > > It'd e.g. not surprise me very much if some OOM would end up translating > to ERRCODE_DATA_EXCEPTION, because some library function returned an > error due to ENOMEM. > > > > Could you complete this list? > > 3) The expression changed the current expression context, GUCs or any > other such global variable. Without a proper subtrans reset this > state isn't reverted. > 4) The function acquires an LWLOCK, buffer reference, anything resowner > owned. Skipping subtrans reset, that's not released in that > moment. That's going to lead to potential hard deadlocks. > 99) sigsetjmp is actually pretty expensive. And even if you, to address Tom's point about plpgsql, had a category that could only be thrown by core code, and addressed 3) and 4) by carefully and continuously auditing code, you'd still have the issue of 5) you'd likely leak memory at potentially prodiguous rate... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > And even if you, to address Tom's point about plpgsql, had a category > that could only be thrown by core code, Actually, that wasn't quite what my point was there. Even if the error that is thrown is perfectly safe and was thrown from a known spot in the core code, elog.c will call ErrorContextCallback functions on the way out, *before* longjmp'ing back to your code. So, if PL foo executes a SQL command that includes jsonpath operations, it's possible for PL foo's error context reporting code to break whatever assumptions you thought were safe to make. Admittedly, a wise PL designer would refrain from assuming too much about what they can get away with doing in an error context callback. But that very same caution should persuade us not to assume too much about what actually has happened during error processing. regards, tom lane
Hi, On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > +/* > + * Make datetime type from 'date_txt' which is formated at argument 'fmt'. > + * Actual datatype (returned in 'typid', 'typmod') is determined by > + * presence of date/time/zone components in the format string. > + */ > +Datum > +to_datetime(text *date_txt, text *fmt, char *tzname, bool strict, Oid *typid, > + int32 *typmod, int *tz) Given other to_<type> functions being SQL callable, I'm not a fan of the naming here. > +{ > + struct pg_tm tm; > + fsec_t fsec; > + int fprec = 0; > + int flags; > + > + do_to_timestamp(date_txt, fmt, strict, &tm, &fsec, &fprec, &flags); > + > + *typmod = fprec ? fprec : -1; /* fractional part precision */ > + *tz = 0; > + > + if (flags & DCH_DATED) > + { > + if (flags & DCH_TIMED) > + { > + if (flags & DCH_ZONED) > + { > + TimestampTz result; > + > + if (tm.tm_zone) > + tzname = (char *) tm.tm_zone; > + > + if (tzname) > + { > + int dterr = DecodeTimezone(tzname, tz); > + > + if (dterr) > + DateTimeParseError(dterr, tzname, "timestamptz"); Do we really need 6/7 indentation levels in new code? > +jsonpath_scan.c: FLEXFLAGS = -CF -p -p Why -CF, and why is -p repeated? > - JsonEncodeDateTime(buf, val, DATEOID); > + JsonEncodeDateTime(buf, val, DATEOID, NULL); ISTM API changes like this ought to be done in a separate patch, to ease review. > } > > + > /* > * Compare two jbvString JsonbValue values, a and b. > * Random WS change. > +/*****************************INPUT/OUTPUT*********************************/ Why do we need this much code to serialize data that we initially got in serialized manner? Couldn't we just keep the original around? > +Datum > +jsonpath_in(PG_FUNCTION_ARGS) > +{ No binary input support? I'd suggest adding that, but keeping the representation the same. Otherwise just adds unnecesary complexity for driver authors wishing to use the binar protocol. > +/********************Support functions for JsonPath**************************/ > + > +/* > + * Support macroses to read stored values > + */ s/macroses/macros/ > +++ b/src/backend/utils/adt/jsonpath_exec.c Gotta say, I'm unenthusiastic about yet another execution engine in some PG subsystem. > @@ -0,0 +1,2776 @@ > +/*------------------------------------------------------------------------- > + * > + * jsonpath_exec.c > + * Routines for SQL/JSON path execution. > + * > + * Copyright (c) 2019, PostgreSQL Global Development Group > + * > + * IDENTIFICATION > + * src/backend/utils/adt/jsonpath_exec.c > + * > + *------------------------------------------------------------------------- > + */ Somewhere there needs to be higher level documentation explaining how this stuff is supposed to work on a code level. > + > +/* strict/lax flags is decomposed into four [un]wrap/error flags */ > +#define jspStrictAbsenseOfErrors(cxt) (!(cxt)->laxMode) > +#define jspAutoUnwrap(cxt) ((cxt)->laxMode) > +#define jspAutoWrap(cxt) ((cxt)->laxMode) > +#define jspIgnoreStructuralErrors(cxt) ((cxt)->ignoreStructuralErrors) > +#define jspThrowErrors(cxt) ((cxt)->throwErrors) What's the point? > +#define ThrowJsonPathError(code, detail) \ > + ereport(ERROR, \ > + (errcode(ERRCODE_ ## code), \ > + errmsg(ERRMSG_ ## code), \ > + errdetail detail)) > + > +#define ThrowJsonPathErrorHint(code, detail, hint) \ > + ereport(ERROR, \ > + (errcode(ERRCODE_ ## code), \ > + errmsg(ERRMSG_ ## code), \ > + errdetail detail, \ > + errhint hint)) These seem ill-advised, just making it harder to understand the code, and grepping for it, without actually meaningfully simplifying anything. > +/* > + * Find value of jsonpath variable in a list of passing params > + */ What is that comment supposed to mean? > +/* > + * Get the type name of a SQL/JSON item. > + */ > +static const char * > +JsonbTypeName(JsonbValue *jb) > +{ Wasn't there pretty similar infrastructure elsewhere? > +/* > + * Cross-type comparison of two datetime SQL/JSON items. If items are > + * uncomparable, 'error' flag is set. > + */ Sounds like it'd not raise an error, but it does in a bunch of scenarios. > @@ -206,6 +206,22 @@ Section: Class 22 - Data Exception > 2200N E ERRCODE_INVALID_XML_CONTENT invalid_xml_content > 2200S E ERRCODE_INVALID_XML_COMMENT invalid_xml_comment > 2200T E ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION invalid_xml_processing_instruction > +22030 E ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE duplicate_json_object_key_value > +22031 E ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION invalid_argument_for_json_datetime_function > +22032 E ERRCODE_INVALID_JSON_TEXT invalid_json_text > +22033 E ERRCODE_INVALID_JSON_SUBSCRIPT invalid_json_subscript > +22034 E ERRCODE_MORE_THAN_ONE_JSON_ITEM more_than_one_json_item > +22035 E ERRCODE_NO_JSON_ITEM no_json_item > +22036 E ERRCODE_NON_NUMERIC_JSON_ITEM non_numeric_json_item > +22037 E ERRCODE_NON_UNIQUE_KEYS_IN_JSON_OBJECT non_unique_keys_in_json_object > +22038 E ERRCODE_SINGLETON_JSON_ITEM_REQUIRED singleton_json_item_required > +22039 E ERRCODE_JSON_ARRAY_NOT_FOUND json_array_not_found > +2203A E ERRCODE_JSON_MEMBER_NOT_FOUND json_member_not_found > +2203B E ERRCODE_JSON_NUMBER_NOT_FOUND json_number_not_found > +2203C E ERRCODE_JSON_OBJECT_NOT_FOUND object_not_found > +2203F E ERRCODE_JSON_SCALAR_REQUIRED json_scalar_required > +2203D E ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS too_many_json_array_elements > +2203E E ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS > too_many_json_object_members Are all of these invented by us? Greetings, Andres Freund
Hi! Thank you for your review! Let me answer some points of your review while others will be answered later. On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > > +/*****************************INPUT/OUTPUT*********************************/ > > Why do we need this much code to serialize data that we initially got in > serialized manner? Couldn't we just keep the original around? As I get, you concern related to fact that we have jsonpath in text form (serialized) and convert it to the binary form (also serialized). Yes, we should do so. Otherwise, we would have to parse jsonpath for each evaluation. Basically, for the same reason we have binary representation for jsonb. > > +++ b/src/backend/utils/adt/jsonpath_exec.c > > Gotta say, I'm unenthusiastic about yet another execution engine in some > PG subsystem. Better ideas? I can imagine we can evade introduction of jsonpath datatype and turn jsonpath into executor nodes. But I hardly can imagine you would be more enthusiastic about that :) > > @@ -206,6 +206,22 @@ Section: Class 22 - Data Exception > > 2200N E ERRCODE_INVALID_XML_CONTENT invalid_xml_content > > 2200S E ERRCODE_INVALID_XML_COMMENT invalid_xml_comment > > 2200T E ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION invalid_xml_processing_instruction > > +22030 E ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE duplicate_json_object_key_value > > +22031 E ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION invalid_argument_for_json_datetime_function > > +22032 E ERRCODE_INVALID_JSON_TEXT invalid_json_text > > +22033 E ERRCODE_INVALID_JSON_SUBSCRIPT invalid_json_subscript > > +22034 E ERRCODE_MORE_THAN_ONE_JSON_ITEM more_than_one_json_item > > +22035 E ERRCODE_NO_JSON_ITEM no_json_item > > +22036 E ERRCODE_NON_NUMERIC_JSON_ITEM non_numeric_json_item > > +22037 E ERRCODE_NON_UNIQUE_KEYS_IN_JSON_OBJECT non_unique_keys_in_json_object > > +22038 E ERRCODE_SINGLETON_JSON_ITEM_REQUIRED singleton_json_item_required > > +22039 E ERRCODE_JSON_ARRAY_NOT_FOUND json_array_not_found > > +2203A E ERRCODE_JSON_MEMBER_NOT_FOUND json_member_not_found > > +2203B E ERRCODE_JSON_NUMBER_NOT_FOUND json_number_not_found > > +2203C E ERRCODE_JSON_OBJECT_NOT_FOUND object_not_found > > +2203F E ERRCODE_JSON_SCALAR_REQUIRED json_scalar_required > > +2203D E ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS too_many_json_array_elements > > +2203E E ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS > > too_many_json_object_members > > Are all of these invented by us? None of them are invented by us. All of them are part of SQL 2016 standard. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, On 2019-01-30 07:34:00 +0300, Alexander Korotkov wrote: > Thank you for your review! Let me answer some points of your review > while others will be answered later. > > On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > > > +/*****************************INPUT/OUTPUT*********************************/ > > > > Why do we need this much code to serialize data that we initially got in > > serialized manner? Couldn't we just keep the original around? > > As I get, you concern related to fact that we have jsonpath in text > form (serialized) and convert it to the binary form (also serialized). > Yes, we should do so. Otherwise, we would have to parse jsonpath for > each evaluation. Basically, for the same reason we have binary > representation for jsonb. But why can't we keep the text around, instead of having all of that code for converting back? > > > +++ b/src/backend/utils/adt/jsonpath_exec.c > > > > Gotta say, I'm unenthusiastic about yet another execution engine in some > > PG subsystem. > > Better ideas? I can imagine we can evade introduction of jsonpath > datatype and turn jsonpath into executor nodes. But I hardly can > imagine you would be more enthusiastic about that :) Not executor nodes, I think it could be possible to put it into the expression evaluation codepaths, but it's probably too different to fit well (would get you JIT compilation of the whole thing tho). > > > @@ -206,6 +206,22 @@ Section: Class 22 - Data Exception > > > 2200N E ERRCODE_INVALID_XML_CONTENT invalid_xml_content > > > 2200S E ERRCODE_INVALID_XML_COMMENT invalid_xml_comment > > > 2200T E ERRCODE_INVALID_XML_PROCESSING_INSTRUCTION invalid_xml_processing_instruction > > > +22030 E ERRCODE_DUPLICATE_JSON_OBJECT_KEY_VALUE duplicate_json_object_key_value > > > +22031 E ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION invalid_argument_for_json_datetime_function > > > +22032 E ERRCODE_INVALID_JSON_TEXT invalid_json_text > > > +22033 E ERRCODE_INVALID_JSON_SUBSCRIPT invalid_json_subscript > > > +22034 E ERRCODE_MORE_THAN_ONE_JSON_ITEM more_than_one_json_item > > > +22035 E ERRCODE_NO_JSON_ITEM no_json_item > > > +22036 E ERRCODE_NON_NUMERIC_JSON_ITEM non_numeric_json_item > > > +22037 E ERRCODE_NON_UNIQUE_KEYS_IN_JSON_OBJECT non_unique_keys_in_json_object > > > +22038 E ERRCODE_SINGLETON_JSON_ITEM_REQUIRED singleton_json_item_required > > > +22039 E ERRCODE_JSON_ARRAY_NOT_FOUND json_array_not_found > > > +2203A E ERRCODE_JSON_MEMBER_NOT_FOUND json_member_not_found > > > +2203B E ERRCODE_JSON_NUMBER_NOT_FOUND json_number_not_found > > > +2203C E ERRCODE_JSON_OBJECT_NOT_FOUND object_not_found > > > +2203F E ERRCODE_JSON_SCALAR_REQUIRED json_scalar_required > > > +2203D E ERRCODE_TOO_MANY_JSON_ARRAY_ELEMENTS too_many_json_array_elements > > > +2203E E ERRCODE_TOO_MANY_JSON_OBJECT_MEMBERS > > > too_many_json_object_members > > > > Are all of these invented by us? > > None of them are invented by us. All of them are part of SQL 2016 standard. Cool. Greetings, Andres Freund
On Wed, Jan 30, 2019 at 9:36 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-01-30 07:34:00 +0300, Alexander Korotkov wrote:
> > Thank you for your review! Let me answer some points of your review
> > while others will be answered later.
> >
> > On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > > > +/*****************************INPUT/OUTPUT*********************************/
> > >
> > > Why do we need this much code to serialize data that we initially got in
> > > serialized manner? Couldn't we just keep the original around?
> >
> > As I get, you concern related to fact that we have jsonpath in text
> > form (serialized) and convert it to the binary form (also serialized).
> > Yes, we should do so. Otherwise, we would have to parse jsonpath for
> > each evaluation. Basically, for the same reason we have binary
> > representation for jsonb.
>
> But why can't we keep the text around, instead of having all of that
> code for converting back?
Yeah, that's possible. But now converting back to string is one of ways to test that jsonpath parsing is correct. If we remove conversion back to string, this possibility would be also removed. Also, we would loose way to normalize jsonpath, which is probably not that important. As well it's generally ugly redundancy. So, I can't say I like idea to save few hundreds lines of codes for this price.
> > > > +++ b/src/backend/utils/adt/jsonpath_exec.c
> > >
> > > Gotta say, I'm unenthusiastic about yet another execution engine in some
> > > PG subsystem.
> >
> > Better ideas? I can imagine we can evade introduction of jsonpath
> > datatype and turn jsonpath into executor nodes. But I hardly can
> > imagine you would be more enthusiastic about that :)
>
> Not executor nodes, I think it could be possible to put it into the
> expression evaluation codepaths, but it's probably too different to fit
> well (would get you JIT compilation of the whole thing tho).
Consider given example. We need to check some predicate for each JSON item, where predicate is given by expression and set of items is produced by another expression. In order to fit this into expression evaluation, we probably need some kind of lamda functions there. It seems unlikely for me that we want to implement that.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
> On 2019-01-30 07:34:00 +0300, Alexander Korotkov wrote:
> > Thank you for your review! Let me answer some points of your review
> > while others will be answered later.
> >
> > On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote:
> > > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > > > +/*****************************INPUT/OUTPUT*********************************/
> > >
> > > Why do we need this much code to serialize data that we initially got in
> > > serialized manner? Couldn't we just keep the original around?
> >
> > As I get, you concern related to fact that we have jsonpath in text
> > form (serialized) and convert it to the binary form (also serialized).
> > Yes, we should do so. Otherwise, we would have to parse jsonpath for
> > each evaluation. Basically, for the same reason we have binary
> > representation for jsonb.
>
> But why can't we keep the text around, instead of having all of that
> code for converting back?
Yeah, that's possible. But now converting back to string is one of ways to test that jsonpath parsing is correct. If we remove conversion back to string, this possibility would be also removed. Also, we would loose way to normalize jsonpath, which is probably not that important. As well it's generally ugly redundancy. So, I can't say I like idea to save few hundreds lines of codes for this price.
> > > > +++ b/src/backend/utils/adt/jsonpath_exec.c
> > >
> > > Gotta say, I'm unenthusiastic about yet another execution engine in some
> > > PG subsystem.
> >
> > Better ideas? I can imagine we can evade introduction of jsonpath
> > datatype and turn jsonpath into executor nodes. But I hardly can
> > imagine you would be more enthusiastic about that :)
>
> Not executor nodes, I think it could be possible to put it into the
> expression evaluation codepaths, but it's probably too different to fit
> well (would get you JIT compilation of the whole thing tho).
Consider given example. We need to check some predicate for each JSON item, where predicate is given by expression and set of items is produced by another expression. In order to fit this into expression evaluation, we probably need some kind of lamda functions there. It seems unlikely for me that we want to implement that.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Jan 29, 2019 at 04:51:46AM +0300, Alexander Korotkov wrote: > Oh, sorry. I've missed we have ERRCODE_TO_CATEGORY() here. Note: patch set moved to next CF, still waiting on author. -- Michael
Attachment
Hi! Attached is revised version of jsonpath. Assuming that jsonpath have problem places, I decided to propose partial implementation. Following functionality was cut from jsonpath: 1) Support of datetime datatypes. Besides error suppression, this functionality have troubles with timezones. According to standard we should support timezones in jsonpath expressions. But that would prevent jsonpath functions from being immutable, that in turn limits their usage in expression indexes. 2) Suppression of numeric errors. I will post it as a separate patch. Pushing this even this partial implementation of jsonpath to PG 12 is still very useful. Also it will simplify a lot pushing other parts of SQL/JSON to future releases. Besides this changes, there is some refactoring and code beautification. On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote: > > +/* > > + * Make datetime type from 'date_txt' which is formated at argument 'fmt'. > > + * Actual datatype (returned in 'typid', 'typmod') is determined by > > + * presence of date/time/zone components in the format string. > > + */ > > +Datum > > +to_datetime(text *date_txt, text *fmt, char *tzname, bool strict, Oid *typid, > > + int32 *typmod, int *tz) > > Given other to_<type> functions being SQL callable, I'm not a fan of the > naming here. I've removed that for now. > > +{ > > + struct pg_tm tm; > > + fsec_t fsec; > > + int fprec = 0; > > + int flags; > > + > > + do_to_timestamp(date_txt, fmt, strict, &tm, &fsec, &fprec, &flags); > > + > > + *typmod = fprec ? fprec : -1; /* fractional part precision */ > > + *tz = 0; > > + > > + if (flags & DCH_DATED) > > + { > > + if (flags & DCH_TIMED) > > + { > > + if (flags & DCH_ZONED) > > + { > > + TimestampTz result; > > + > > + if (tm.tm_zone) > > + tzname = (char *) tm.tm_zone; > > + > > + if (tzname) > > + { > > + int dterr = DecodeTimezone(tzname, tz); > > + > > + if (dterr) > > + DateTimeParseError(dterr, tzname, "timestamptz"); > > > Do we really need 6/7 indentation levels in new code? Also, removed that. > > +jsonpath_scan.c: FLEXFLAGS = -CF -p -p > > Why -CF, and why is -p repeated? BTW, for our SQL grammar we have > scan.c: FLEXFLAGS = -CF -p -p Is it kind of default? > > - JsonEncodeDateTime(buf, val, DATEOID); > > + JsonEncodeDateTime(buf, val, DATEOID, NULL); > > ISTM API changes like this ought to be done in a separate patch, to ease > review. Also, removed that for now. > > } > > > > + > > /* > > * Compare two jbvString JsonbValue values, a and b. > > * > > Random WS change. Reverted > No binary input support? I'd suggest adding that, but keeping the > representation the same. Otherwise just adds unnecesary complexity for > driver authors wishing to use the binar protocol. Binary support is added. I decided to make it in jsonb manner. Format version 1 is text format, but it's possible we would have other versions in future. > > +/********************Support functions for JsonPath**************************/ > > + > > +/* > > + * Support macroses to read stored values > > + */ > > s/macroses/macros/ Fixed. > > @@ -0,0 +1,2776 @@ > > +/*------------------------------------------------------------------------- > > + * > > + * jsonpath_exec.c > > + * Routines for SQL/JSON path execution. > > + * > > + * Copyright (c) 2019, PostgreSQL Global Development Group > > + * > > + * IDENTIFICATION > > + * src/backend/utils/adt/jsonpath_exec.c > > + * > > + *------------------------------------------------------------------------- > > + */ > > Somewhere there needs to be higher level documentation explaining how > this stuff is supposed to work on a code level. High level comments are added to jsonpath.c (about jsonpath binary representation) jsonpath_exec.c (about jsonpath execution). > > + > > +/* strict/lax flags is decomposed into four [un]wrap/error flags */ > > +#define jspStrictAbsenseOfErrors(cxt) (!(cxt)->laxMode) > > +#define jspAutoUnwrap(cxt) ((cxt)->laxMode) > > +#define jspAutoWrap(cxt) ((cxt)->laxMode) > > +#define jspIgnoreStructuralErrors(cxt) ((cxt)->ignoreStructuralErrors) > > +#define jspThrowErrors(cxt) ((cxt)->throwErrors) > > What's the point? These are convenience macros, which improves code readability. For instance, instead of direct checking for laxMode, we check for particular aspects of its behavior. For code reader, it becomes easier to understand why do we behave one way or another. > > +#define ThrowJsonPathError(code, detail) \ > > + ereport(ERROR, \ > > + (errcode(ERRCODE_ ## code), \ > > + errmsg(ERRMSG_ ## code), \ > > + errdetail detail)) > > + > > +#define ThrowJsonPathErrorHint(code, detail, hint) \ > > + ereport(ERROR, \ > > + (errcode(ERRCODE_ ## code), \ > > + errmsg(ERRMSG_ ## code), \ > > + errdetail detail, \ > > + errhint hint)) > > These seem ill-advised, just making it harder to understand the code, > and grepping for it, without actually meaningfully simplifying anything. Removed. Instead, I've introduced RETURN_ERROR() macro, which either returns jperError or issues ereport given in the argument. > > +/* > > + * Find value of jsonpath variable in a list of passing params > > + */ > > What is that comment supposed to mean? Comment is rephrased. > > +/* > > + * Get the type name of a SQL/JSON item. > > + */ > > +static const char * > > +JsonbTypeName(JsonbValue *jb) > > +{ > > Wasn't there pretty similar infrastructure elsewhere? Yes, but it wasn't exposed. Moved to jsonb.c > > +/* > > + * Cross-type comparison of two datetime SQL/JSON items. If items are > > + * uncomparable, 'error' flag is set. > > + */ > > Sounds like it'd not raise an error, but it does in a bunch of scenarios. Removed as well. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi, On 2/24/19 10:03 AM, Alexander Korotkov wrote: > Hi! > > Attached is revised version of jsonpath. Assuming that jsonpath have > problem places, I decided to propose partial implementation. > Following functionality was cut from jsonpath: > 1) Support of datetime datatypes. Besides error suppression, this > functionality have troubles with timezones. According to standard we > should support timezones in jsonpath expressions. But that would > prevent jsonpath functions from being immutable, that in turn limits > their usage in expression indexes. Assuming we get the patch committed without the datetime stuff now, what does that mean for the future? Does that mean we'll be unable to extend it to support datetime in the future, or what? If we define the jsonpath functions as immutable now, people may create indexes - which means we won't be able to downgrade it to stable later. So, what's the plan here? The only thing I can think of is having two sets of functions - an immutable one, prohibiting datetime expressions, and stable that can't be used for indexes etc. > 2) Suppression of numeric errors. I will post it as a separate patch. > Pushing this even this partial implementation of jsonpath to PG 12 is > still very useful. Also it will simplify a lot pushing other parts of > SQL/JSON to future releases. > +1 to push at least partial (but still useful) subset, instead of just bumping the patch to PG13 regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Feb 24, 2019 at 2:44 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 2/24/19 10:03 AM, Alexander Korotkov wrote: > > Attached is revised version of jsonpath. Assuming that jsonpath have > > problem places, I decided to propose partial implementation. > > Following functionality was cut from jsonpath: > > 1) Support of datetime datatypes. Besides error suppression, this > > functionality have troubles with timezones. According to standard we > > should support timezones in jsonpath expressions. But that would > > prevent jsonpath functions from being immutable, that in turn limits > > their usage in expression indexes. > > Assuming we get the patch committed without the datetime stuff now, what > does that mean for the future? Does that mean we'll be unable to extend > it to support datetime in the future, or what? If we define the jsonpath > functions as immutable now, people may create indexes - which means we > won't be able to downgrade it to stable later. > > So, what's the plan here? The only thing I can think of is having two > sets of functions - an immutable one, prohibiting datetime expressions, > and stable that can't be used for indexes etc. Reasonable question. As I understand, not datetime support itself making jsonpath functions not immutable, but implicit cast happening during comparison timestamp vs. timestamptz (and time vs. timetz). So, in future immutable functions can have limited support of datetime, where comparison of non-tz vs. tz types is restricted. And stable versions of functions (for instance, with '_tz' prefix) with full datetime support. > > 2) Suppression of numeric errors. I will post it as a separate patch. > > Pushing this even this partial implementation of jsonpath to PG 12 is > > still very useful. Also it will simplify a lot pushing other parts of > > SQL/JSON to future releases. > > > > +1 to push at least partial (but still useful) subset, instead of just > bumping the patch to PG13 Thank you for support! ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attached 34th version of the patches. 1. Partial jsonpath support: - Fixed copying of jsonb with vars jsonb_path_query() into SRF context - Fixed error message for jsonpath vars - Fixed file-level comment in jsonpath.c 2. Suppression of numeric errors: Now error handling is done without PG_TRY/PG_CATCH using a bunch of internal numeric functions with 'bool *error' flag. 3. Datetime support: Problems with timzeones still exist. Comparison of tz types with non-tz types is simply disallowed. Default integer timezone offset (not abbreviation) can be specified with the second .datetime() argument. Error handling also is done using internal functions. 4. Json type support: Json support was completely refactored since v23: double compilation with function redefinitions was replaced with passing 'isJsonb' flag to low-level json/jsonb access functions. Also major refactoring with introduction of struct JsonItem was made. JsonItem is used in executor instead of raw JsonbValue. This helps to avoid extending of JsonbValue for datetime types and also other numeric types (integers and floats) required by standard. 5. GIN support: Nothing was changed since v23. Patch 1 is what we are going to commit in PG12. Patches 2 and 3 add code that was removed in the previous v33 version.
On 24.02.2019 15:34, Alexander Korotkov wrote:
On Sun, Feb 24, 2019 at 2:44 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:On 2/24/19 10:03 AM, Alexander Korotkov wrote:Attached is revised version of jsonpath. Assuming that jsonpath have problem places, I decided to propose partial implementation. Following functionality was cut from jsonpath: 1) Support of datetime datatypes. Besides error suppression, this functionality have troubles with timezones. According to standard we should support timezones in jsonpath expressions. But that would prevent jsonpath functions from being immutable, that in turn limits their usage in expression indexes.Assuming we get the patch committed without the datetime stuff now, what does that mean for the future? Does that mean we'll be unable to extend it to support datetime in the future, or what? If we define the jsonpath functions as immutable now, people may create indexes - which means we won't be able to downgrade it to stable later. So, what's the plan here? The only thing I can think of is having two sets of functions - an immutable one, prohibiting datetime expressions, and stable that can't be used for indexes etc.Reasonable question. As I understand, not datetime support itself making jsonpath functions not immutable, but implicit cast happening during comparison timestamp vs. timestamptz (and time vs. timetz). So, in future immutable functions can have limited support of datetime, where comparison of non-tz vs. tz types is restricted. And stable versions of functions (for instance, with '_tz' prefix) with full datetime support.
I can also offset to explicitly pass timezone info into jsonpath function using the special user dataype encapsulating struct pg_tz. But simple integer timezone offset can be passed now using jsonpath variables (standard says only about integer timezone offsets; also it requires presence of timezone offset it in the input string if the format string contain timezone components): =# SELECT jsonb_path_query( '"28-02-2019 12:34"', '$.datetime("DD-MM-YYYY HH24:MI TZH", $tz)', jsonb_build_object('tz', EXTRACT(TIMEZONE FROM now())) ); jsonb_path_query -----------------------------"2019-02-28T12:34:00+03:00" (1 row)
See patch #2.2) Suppression of numeric errors. I will post it as a separate patch. Pushing this even this partial implementation of jsonpath to PG 12 is still very useful. Also it will simplify a lot pushing other parts of SQL/JSON to future releases.+1 to push at least partial (but still useful) subset, instead of just bumping the patch to PG13
Thank you for support!
Attachment
On 2019-03-01 03:36:49 +0300, Nikita Glukhov wrote: > Patch 1 is what we are going to commit in PG12. I think it's too early to make that determination. I think there's a good chance, but that this needs more independent review.
On Fri, Mar 1, 2019 at 3:36 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > I can also offset to explicitly pass timezone info into jsonpath function using > the special user dataype encapsulating struct pg_tz. More interesting question is what would be the source of timezone. If even you encapsulate timezone in a separate datatype, the expression will be still just stable assuming timezone is generated by stable subexpression. What we actually need is immutable timezone. Day once timezone is updated, you create new timezone version, while old version is immutable. Then if jsonpath has given particular *timezone version*, it might remain immutable. But that requires significant rework of our timezone infrastructure. > But simple integer timezone offset can be passed now using jsonpath variables > (standard says only about integer timezone offsets; also it requires presence > of timezone offset it in the input string if the format string contain timezone > components): > > =# SELECT jsonb_path_query( > '"28-02-2019 12:34"', > '$.datetime("DD-MM-YYYY HH24:MI TZH", $tz)', > jsonb_build_object('tz', EXTRACT(TIMEZONE FROM now())) > ); > > jsonb_path_query > ----------------------------- > "2019-02-28T12:34:00+03:00" > (1 row) Standard specifies fixed offset to be given for *particular datetime*. For instance, if json contains offset in separate attribute or whatever, then it's OK to use such two-arguments .datetime() method. But that seems quite narrow use case. Standard doesn't mean you get fixed offset extracted from "now()" and apply it to random datetimes in your json collection. That would work correctly for real timezones only when they are fixed offsets, but there are almost none of them! So, that's just plain wrong, we never should encourage users to do something like this. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi! On Fri, Mar 1, 2019 at 3:36 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > Attached 34th version of the patches. > > 1. Partial jsonpath support: > - Fixed copying of jsonb with vars jsonb_path_query() into SRF context > - Fixed error message for jsonpath vars > - Fixed file-level comment in jsonpath.c > > 2. Suppression of numeric errors: > Now error handling is done without PG_TRY/PG_CATCH using a bunch of internal > numeric functions with 'bool *error' flag. Revised patches 1 and 2 are attached. Changes are following * Small refactoring, comments adjustment and function renaming. In particular, I've removed "recursive" prefix from function names, because it actually not that informative assuming header comment explains that the whole jsonpath executor is recursive. Also, I made "Unwrap" suffix more clear. Not it's distinguished what is unwrapped target (UnwrapTarget) or result (UnwrapResult). Also, now it's clear that function doesn't always unwraps but has an option to do so (OptUnwrap). * Some more cases are covered by regression tests. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
so 2. 3. 2019 v 6:15 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal:
Hi!
On Fri, Mar 1, 2019 at 3:36 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>
> Attached 34th version of the patches.
>
> 1. Partial jsonpath support:
> - Fixed copying of jsonb with vars jsonb_path_query() into SRF context
> - Fixed error message for jsonpath vars
> - Fixed file-level comment in jsonpath.c
>
> 2. Suppression of numeric errors:
> Now error handling is done without PG_TRY/PG_CATCH using a bunch of internal
> numeric functions with 'bool *error' flag.
Revised patches 1 and 2 are attached. Changes are following
* Small refactoring, comments adjustment and function renaming. In
particular, I've removed "recursive" prefix from function names,
because it actually not that informative assuming header comment
explains that the whole jsonpath executor is recursive. Also, I made
"Unwrap" suffix more clear. Not it's distinguished what is unwrapped
target (UnwrapTarget) or result (UnwrapResult). Also, now it's clear
that function doesn't always unwraps but has an option to do so
(OptUnwrap).
* Some more cases are covered by regression tests.
These patches are large, but I think so the granularity and modularity of these patches are correct.
Now, it looks very well.
Pavel
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi, Here are some initial comments from a review of the 0001 part. I plan to do more testing on a large data set and additional round of review over the next week. FWIW I've passed this through valgrind and the usual battery of regression tests, and there were no issues. I haven't looked at 0002 yet, but it seems fairly small (especially compared to 0001). func.sgml ========= 1) I see the changes removed <indexterm zone="functions-json"> for some reason. Is that intentional? 2) <command>WHERE</command> We generally tag <literal>WHERE</literal>, not <command>. 3) Filter expressions are applied from left to right Perhaps s/applied/evaluated/ in reference to expressions? 4) The result of the filter expression may be true, false, or unknown. It's not entirely clear to me what "unknown" means here. NULL, or something else? There's a section in the SQL/JSON standard explaining this (page 83), but perhaps we should explain it a bit here too? The standard says "In the SQL/JSON data model, there are no SQL nulls, so Unknown is not part of the SQL/JSON data model." so I'm a bit confused what "unknown" references to. Maybe some example? Also, what happens when the result is unknown? 5) There's an example showing how to apply filter at a certain level, using the @ variable, but what if we want to apply multiple filters at different levels? Would it make sense to add such example? 6) ... extensions of the SQL/JSON standard I'm not sure what "extension" is here. Is that an extension defined in the SQL standard, or an additional PostgreSQL functionality not described in the standard? (I assume the latter, just checking.) 7) There are references to "SQL/JSON sequences" without any explanation what it means. Maybe I'm missing something obvious, though. 8) Implicit unwrapping in the lax mode is not performed in the following cases: I suggest to reword it like this: In the lax mode, implicit unwrapping is not performed when: 9) We're not adding the datetime() method for now, due to the issues with timezones. I wonder if we should add a note why it's missing to the docs ... 10) I'm a bit puzzled though, because the standard says this in the description of type() function on page 77 If I is a datetime, then “date”, “time without time zone”, “time with time zone”, “timestamp without time zone”, or “timestamp with time zone”, as appropriate. But I see our type() function does not return anything like that (which I presume is independent of timezone stuff). But I see jsonb.h has no concept of datetime values, and the standard actually says this in the datetime() section JSON has no datetime types. Datetime values are most likely stored in character strings. Considering this, the type() section makes little sense, no? jsonb_util.c ============ I see we're now handling NaN values in convertJsonbScalar(). Isn't it actually a bug that we don't do this already? Or is it not needed for some reason? jsonpath.c ========== I suppose this should say "jsonpath version number" instead? elog(ERROR, "unsupported jsonb version number %d", version); regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/3/19 1:08 PM, Tomas Vondra wrote: > > > jsonb_util.c > ============ > > I see we're now handling NaN values in convertJsonbScalar(). Isn't it > actually a bug that we don't do this already? Or is it not needed for > some reason? > JSON standard numerics don't support NaN, Infinity etc., so I assume this can only happen in a jsonpath expression being converted to a jsonb value. If so, the new section should contain a comment to that effect, otherwise it will be quite confusing. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
A bunch of additional comments, after looking at the patch a bit today. All are mostly minor, and sometime perhaps a matter of preference. 1) There's a mismatch between the comment and actual function name for jsonb_path_match_opr and jsonb_path_exists_opr(). The comments say "_novars" instead. 2) In a couple of switches the "default" case does a return with a value, following elog(ERROR). So it's practically unreachable, AFAICS it's fine without it, and we don't do this elsewhere. And I don't get any compiler warnings if I remove it either. Examples: JsonbTypeName default: elog(ERROR, "unrecognized jsonb value type: %d", jbv->type); return "unknown"; jspOperationName default: elog(ERROR, "unrecognized jsonpath item type: %d", type); return NULL; compareItems default: elog(ERROR, "unrecognized jsonpath operation: %d", op); return jpbUnknown; 3) jsonpath_send is using makeStringInfo() for a value that is not returned - IMHO it should use regular stack-allocated variable and use initStringInfo() instead 4) the version number should be defined/used as a constant, not as a magic constant somewhere in the code 5) Why does jsonPathToCstring do this? appendBinaryStringInfo(out, "strict ", 7); Why not to use regular appendStringInfoString()? What am I missing? 6) comment typo: "Aling StringInfo" 7) alignStringInfoInt() should explain why we need this and why INTALIGN is the right alignment. 8) I'm a bit puzzled by what flattenJsonPathParseItem does with 'next' I don't quite understand what it's doing with 'next' value? /* * Actual value will be recorded later, after next and children * processing. */ appendBinaryStringInfo(buf, (char *) &next, /* fake value */ sizeof(next)); Perhaps a comment explaining it (why we need a fake value at all?) would be a good idea here. 9) I see printJsonPathItem is only calling check_stack_depth while flattenJsonPathParseItem also calls CHECK_INTERRUPTS. Why the difference, considering they seem about equally expensive? 10) executeNumericItemMethod is missing a comment (unlike the other executeXXX functions) 11) Wording of some of the error messages in the execute methods seems a bit odd. For example executeNumericItemMethod may complain that it ... is applied to not a numeric value but perhaps a more natural wording would be ... is applied to a non-numeric value And similarly for the other execute methods. But I'm not a native speaker, so perhaps the original wording is just fine. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 4, 2019 at 6:27 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > 11) Wording of some of the error messages in the execute methods seems a > bit odd. For example executeNumericItemMethod may complain that it > > ... is applied to not a numeric value > > but perhaps a more natural wording would be > > ... is applied to a non-numeric value > > And similarly for the other execute methods. But I'm not a native > speaker, so perhaps the original wording is just fine. As a native speaker I can confirm that the first wording is definitely not OK. The second one is tolerable, but I wonder if there is something better, like "can only be applied to a numeric value" or maybe there's a way to rephrase it so that we complain about the non-numeric value itself rather than the application, e.g. ERROR: json_frobnitz can only frob numeric values or ERROR: argument to json_frobnitz must be numeric. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attached 36th version of the patches.
On 03.03.2019 21:08, Tomas Vondra wrote:
Hi, Here are some initial comments from a review of the 0001 part. I plan to do more testing on a large data set and additional round of review over the next week. FWIW I've passed this through valgrind and the usual battery of regression tests, and there were no issues.
Thanks again for your review.
I haven't looked at 0002 yet, but it seems fairly small (especially compared to 0001). func.sgml ========= 1) I see the changes removed <indexterm zone="functions-json"> for some reason. Is that intentional?
Fixed.
2) <command>WHERE</command> We generally tag <literal>WHERE</literal>, not <command>.
Fixed.
3) Filter expressions are applied from left to right Perhaps s/applied/evaluated/ in reference to expressions?
Fixed.
4) The result of the filter expression may be true, false, or unknown. It's not entirely clear to me what "unknown" means here. NULL, or something else? There's a section in the SQL/JSON standard explaining this (page 83), but perhaps we should explain it a bit here too? The standard says "In the SQL/JSON data model, there are no SQL nulls, so Unknown is not part of the SQL/JSON data model." so I'm a bit confused what "unknown" references to. Maybe some example? Also, what happens when the result is unknown?
"unknown" refers here to ordinary three-valued logical Unknown, which is represented in SQL by NULL. JSON path expressions return sequences of SQL/JSON items, which are defined by SQL/JSON data model. But JSON path predicates (logical expressions), which are used in filters, return three-valued logical values: False, True, or Unknown. Filters accept only items for which the filter predicate returned True. False and Unknown results are skipped. Unknown can be checked with IS UNKNOWN predicate: SELECT jsonb_path_query_array('[1, "1", true, null]', '$[*] ? ((@ < 2) is unknown)');jsonb_path_query_array ------------------------["1", true] (1 row) Comparison of JSON nulls to non-nulls returns always False, not Unknown (see SQL/JSON data model). Comparison of non-comparable items return Unknown.
5) There's an example showing how to apply filter at a certain level, using the @ variable, but what if we want to apply multiple filters at different levels? Would it make sense to add such example?
Examples were added. Filters can be nested, but by the standard it is impossible to reference item of the outer filter, because @ always references current item of the innermost filter. We have additional patch with a simple jsonpath syntax extension for this case -- @N:@0, like @, references innermost item@1 references item one level upper@2 references item two levels upper For example, selecting all objects from array '[{"vals": [1,2,3], "val": 2}, {"vals": [4,5], "val": 6}]' having in their .vals[] element greater than their .val field: '$[*] ? (@.vals[*] ? (@ > @1.val))' It is impossible to do this by the standard in the single jsonpath expression. Also there is idea to use lambda expressions with ECMAScript 6 syntax in filters and user method (see below): '$[*] ? (obj => obj.vals[*] ? (val => val > obj.val))' I already have a patch implementing lambda expressions, which were necessary for implementaion of built-in/user-written functions and methods like .map(), reduce(), max(). I can post it again if it is interesting.
6) ... extensions of the SQL/JSON standard I'm not sure what "extension" is here. Is that an extension defined in the SQL standard, or an additional PostgreSQL functionality not described in the standard? (I assume the latter, just checking.)
Yes, this is additional functionality. "Writing the path as an expression is also valid" I have moved this into SQL/JSON patch, because it is related only path specification in SQL/JSON functions.
7) There are references to "SQL/JSON sequences" without any explanation what it means. Maybe I'm missing something obvious, though.
SQL/JSON sequence is a sequence of SQL/JSON items. The corresponding definition was added.
8) Implicit unwrapping in the lax mode is not performed in the following cases: I suggest to reword it like this: In the lax mode, implicit unwrapping is not performed when:
Fixed. I also decided to merge this paragraph with paragraph describing auto-unwrapping.
9) We're not adding the datetime() method for now, due to the issues with timezones. I wonder if we should add a note why it's missing to the docs ...
The corresponding note was added.
10) I'm a bit puzzled though, because the standard says this in the description of type() function on page 77 If I is a datetime, then “date”, “time without time zone”, “time with time zone”, “timestamp without time zone”, or “timestamp with time zone”, as appropriate. But I see our type() function does not return anything like that (which I presume is independent of timezone stuff). But I see jsonb.h has no concept of datetime values, and the standard actually says this in the datetime() section JSON has no datetime types. Datetime values are most likely stored in character strings. Considering this, the type() section makes little sense, no?
According to the SQL/JSON data model, SQL/JSON items can be null, string, boolean, numeric, and datetime. Datetime items exists only at execution time, they are serialized into JSON strings when the resulting SQL/JSON item is converted into jsonb. After removal of .datetime() method, support of datetime SQL/JSON items was removed from jsonpath executor too. Numeric items can be of any numeric type. In PostgreSQL we have the following numeric datatypes: integers, floats and numeric. But our jsonpath executor supports now only numeric-typed items, because this is only type we can get directly from jsonb. Support for other numeric datatypes (float8 is most necessary, because it is produced by .double() item method) can be added by extending JsonbValue or by introducing struct JsonItem in executor, having JsonbValue as a part (see patch #4 in v34).
jsonb_util.c ============ I see we're now handling NaN values in convertJsonbScalar(). Isn't it actually a bug that we don't do this already? Or is it not needed for some reason?
Numeric JsonbValues created outside of jsonpath executor cannot be NaN, because this case in explicitly handled in JsonbValue-producing functions. For example, datum_to_jsonb() which is used in to_jsonb(), jsonb_build_array() and others converts Inf and NaN into JSON strings. But jsonb_plperl and jsonb_plpython transforms do not allow NaNs (I think it is needed only for consistency of "PL => jsonb => PL" roundtrip). In our jsonpath executor we can produce NaNs and we should not touch them before conversion to the resulting jsonb. Moreover, in SQL/JSON functions numeric SQL/JSON item can be directly converted into numeric RETURNING type. So, I decided to add a check for NaN to the low-level convertJsonbScalar() instead of checking items before every call to JsonbValueToJsonb() or pushJsonbValue(). But after introduction of struct JsonItem (see patch #4) introduction there will appropriate place for this check -- JsonItemToJsonbValue().
========== I suppose this should say "jsonpath version number" instead? elog(ERROR, "unsupported jsonb version number %d", version);
Fixed.
On 05.03.2019 2:27, Tomas Vondra wrote:
A bunch of additional comments, after looking at the patch a bit today. All are mostly minor, and sometime perhaps a matter of preference. 1) There's a mismatch between the comment and actual function name for jsonb_path_match_opr and jsonb_path_exists_opr(). The comments say "_novars" instead.
Fixed.
2) In a couple of switches the "default" case does a return with a value, following elog(ERROR). So it's practically unreachable, AFAICS it's fine without it, and we don't do this elsewhere. And I don't get any compiler warnings if I remove it either. Examples: JsonbTypeName default: elog(ERROR, "unrecognized jsonb value type: %d", jbv->type); return "unknown"; jspOperationName default: elog(ERROR, "unrecognized jsonpath item type: %d", type); return NULL; compareItems default: elog(ERROR, "unrecognized jsonpath operation: %d", op); return jpbUnknown;
It seems to be a standard practice in jsonb code, so we followed it.
3) jsonpath_send is using makeStringInfo() for a value that is not returned - IMHO it should use regular stack-allocated variable and use initStringInfo() instead
Fixed.
4) the version number should be defined/used as a constant, not as a magic constant somewhere in the code
Fixed.
5) Why does jsonPathToCstring do this? appendBinaryStringInfo(out, "strict ", 7); Why not to use regular appendStringInfoString()? What am I missing?
appendStringInfoString() is a bit slower than appendBinaryStringInfo() because to strlen() call inside it.
6) comment typo: "Aling StringInfo"
Fixed.
7) alignStringInfoInt() should explain why we need this and why INTALIGN is the right alignment.
Comment was added.
8) I'm a bit puzzled by what flattenJsonPathParseItem does with 'next' I don't quite understand what it's doing with 'next' value? /* * Actual value will be recorded later, after next and children * processing. */ appendBinaryStringInfo(buf, (char *) &next, /* fake value */ sizeof(next)); Perhaps a comment explaining it (why we need a fake value at all?) would be a good idea here.
No fake value is needed here, zero placeholder is enough. I have refactored this using new function reserveSpaceForItemPointer().
9) I see printJsonPathItem is only calling check_stack_depth while flattenJsonPathParseItem also calls CHECK_INTERRUPTS. Why the difference, considering they seem about equally expensive?
CHECK_INTERRUPT() was added to printJsonPathItem() too.
10) executeNumericItemMethod is missing a comment (unlike the other executeXXX functions)
Comment was added.
11) Wording of some of the error messages in the execute methods seems a bit odd. For example executeNumericItemMethod may complain that it ... is applied to not a numeric value but perhaps a more natural wording would be ... is applied to a non-numeric value And similarly for the other execute methods. But I'm not a native speaker, so perhaps the original wording is just fine.
Error messages were changed on the advice of Robert Haas to: "... can only be applied to a numeric value"
-- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi! On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > Attached 36th version of the patches. Thank yo for the revision! In the attached revision following changes are made: > "unknown" refers here to ordinary three-valued logical Unknown, which is > represented in SQL by NULL. > > JSON path expressions return sequences of SQL/JSON items, which are defined by > SQL/JSON data model. But JSON path predicates (logical expressions), which are > used in filters, return three-valued logical values: False, True, or Unknown. * I've added short explanation of this to the documentation. * Removed no longer present data structures from typedefs.list of the first patch. * Moved GIN support patch to number 3. Seems to be well-isolated and not very complex patch. I propose to consider this to 12 too. I added high-level comment there, commit message and made some code beautification. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > Attached 36th version of the patches. > > Thank yo for the revision! > > In the attached revision following changes are made: > > > "unknown" refers here to ordinary three-valued logical Unknown, which is > > represented in SQL by NULL. > > > > JSON path expressions return sequences of SQL/JSON items, which are defined by > > SQL/JSON data model. But JSON path predicates (logical expressions), which are > > used in filters, return three-valued logical values: False, True, or Unknown. > > * I've added short explanation of this to the documentation. > * Removed no longer present data structures from typedefs.list of the > first patch. > * Moved GIN support patch to number 3. Seems to be well-isolated and > not very complex patch. I propose to consider this to 12 too. I > added high-level comment there, commit message and made some code > beautification. I think patches 1 and 2 are in committable shape (I reached Tomas off-list, he doesn't have more notes regarding them). While patch 3 requires more review. I'm going to push 1 and 2 if no objections. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > > > Attached 36th version of the patches. > > > > Thank yo for the revision! > > > > In the attached revision following changes are made: > > > > > "unknown" refers here to ordinary three-valued logical Unknown, which is > > > represented in SQL by NULL. > > > > > > JSON path expressions return sequences of SQL/JSON items, which are defined by > > > SQL/JSON data model. But JSON path predicates (logical expressions), which are > > > used in filters, return three-valued logical values: False, True, or Unknown. > > > > * I've added short explanation of this to the documentation. > > * Removed no longer present data structures from typedefs.list of the > > first patch. > > * Moved GIN support patch to number 3. Seems to be well-isolated and > > not very complex patch. I propose to consider this to 12 too. I > > added high-level comment there, commit message and made some code > > beautification. > > I think patches 1 and 2 are in committable shape (I reached Tomas > off-list, he doesn't have more notes regarding them). While patch 3 > requires more review. > > I'm going to push 1 and 2 if no objections. So, pushed. Many thanks to reviewers and authors! Remaining part I'm proposing for 12 is attached. I appreciate review of it. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Sat, Mar 16, 2019 at 5:36 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
So, pushed. Many thanks to reviewers and authors!
I think these files have to be cleaned up by "make maintainer-clean"
./src/backend/utils/adt/jsonpath_gram.c
./src/backend/utils/adt/jsonpath_scan.c
Cheers,
Jeff
сб, 16 мар. 2019 г., 20:52 Jeff Janes <jeff.janes@gmail.com>:
On Sat, Mar 16, 2019 at 5:36 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
So, pushed. Many thanks to reviewers and authors!I think these files have to be cleaned up by "make maintainer-clean"./src/backend/utils/adt/jsonpath_gram.c./src/backend/utils/adt/jsonpath_scan.c
Good catch, thanks! Will fix.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Jeff Janes <jeff.janes@gmail.com> writes: > I think these files have to be cleaned up by "make maintainer-clean" > ./src/backend/utils/adt/jsonpath_gram.c > ./src/backend/utils/adt/jsonpath_scan.c Good point. I also see jsonpath_gram.h left behind after maintainer-clean: $ git status --ignored On branch master Your branch is up-to-date with 'origin/master'. Ignored files: (use "git add -f <file>..." to include in what will be committed) src/backend/utils/adt/jsonpath_gram.c src/backend/utils/adt/jsonpath_scan.c src/include/utils/jsonpath_gram.h Looks like whoever modified src/backend/Makefile's distprep target didn't bother to read the comment. regards, tom lane
I wrote: > Good point. I also see jsonpath_gram.h left behind after maintainer-clean: Oh, and of potentially more significance: after maintainer-clean and re-configure, make fails with In file included from jsonpath_gram.y:24: ../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory I first thought this was a problem with insufficient dependencies allowing parallel make to do things in the wrong order, but the problem repeats even without any parallelism. It looks like the dependencies have been constructed in such a way that if the symlink at src/include/utils/jsonpath_gram.h exists but the underlying file does not, nothing will make the underlying file. This is pretty broken; aside from this outright failure, it also suggests that nothing will update that file if it exists but is out of date relative to its sources. Please make sure that the make rules associated with these files look exactly like the previously-debugged rules for existing bison/flex output files. There are generally good reasons for every last bit of weirdness in those. regards, tom lane
сб, 16 мар. 2019 г., 21:12 Tom Lane <tgl@sss.pgh.pa.us>:
I wrote:
> Good point. I also see jsonpath_gram.h left behind after maintainer-clean:
Oh, and of potentially more significance: after maintainer-clean and
re-configure, make fails with
In file included from jsonpath_gram.y:24:
../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory
I first thought this was a problem with insufficient dependencies
allowing parallel make to do things in the wrong order, but the problem
repeats even without any parallelism. It looks like the dependencies
have been constructed in such a way that if the symlink at
src/include/utils/jsonpath_gram.h exists but the underlying file
does not, nothing will make the underlying file. This is pretty broken;
aside from this outright failure, it also suggests that nothing will
update that file if it exists but is out of date relative to its
sources.
Please make sure that the make rules associated with these files look
exactly like the previously-debugged rules for existing bison/flex
output files. There are generally good reasons for every last bit
of weirdness in those.
Uh, I didn't check that carefully enough. Thank you for the explanation. Will fix.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal:
On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> > > Attached 36th version of the patches.
> >
> > Thank yo for the revision!
> >
> > In the attached revision following changes are made:
> >
> > > "unknown" refers here to ordinary three-valued logical Unknown, which is
> > > represented in SQL by NULL.
> > >
> > > JSON path expressions return sequences of SQL/JSON items, which are defined by
> > > SQL/JSON data model. But JSON path predicates (logical expressions), which are
> > > used in filters, return three-valued logical values: False, True, or Unknown.
> >
> > * I've added short explanation of this to the documentation.
> > * Removed no longer present data structures from typedefs.list of the
> > first patch.
> > * Moved GIN support patch to number 3. Seems to be well-isolated and
> > not very complex patch. I propose to consider this to 12 too. I
> > added high-level comment there, commit message and made some code
> > beautification.
>
> I think patches 1 and 2 are in committable shape (I reached Tomas
> off-list, he doesn't have more notes regarding them). While patch 3
> requires more review.
>
> I'm going to push 1 and 2 if no objections.
So, pushed. Many thanks to reviewers and authors!
Remaining part I'm proposing for 12 is attached. I appreciate review of it.
I tested this patch and I didn't find any issue - just I tested basic functionality and regress tests.
looks well
Pavel
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal: >> >> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov >> <a.korotkov@postgrespro.ru> wrote: >> > On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov >> > <a.korotkov@postgrespro.ru> wrote: >> > > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: >> > > > Attached 36th version of the patches. >> > > >> > > Thank yo for the revision! >> > > >> > > In the attached revision following changes are made: >> > > >> > > > "unknown" refers here to ordinary three-valued logical Unknown, which is >> > > > represented in SQL by NULL. >> > > > >> > > > JSON path expressions return sequences of SQL/JSON items, which are defined by >> > > > SQL/JSON data model. But JSON path predicates (logical expressions), which are >> > > > used in filters, return three-valued logical values: False, True, or Unknown. >> > > >> > > * I've added short explanation of this to the documentation. >> > > * Removed no longer present data structures from typedefs.list of the >> > > first patch. >> > > * Moved GIN support patch to number 3. Seems to be well-isolated and >> > > not very complex patch. I propose to consider this to 12 too. I >> > > added high-level comment there, commit message and made some code >> > > beautification. >> > >> > I think patches 1 and 2 are in committable shape (I reached Tomas >> > off-list, he doesn't have more notes regarding them). While patch 3 >> > requires more review. >> > >> > I'm going to push 1 and 2 if no objections. >> >> So, pushed. Many thanks to reviewers and authors! >> >> Remaining part I'm proposing for 12 is attached. I appreciate review of it. > > > I tested this patch and I didn't find any issue - just I tested basic functionality and regress tests. > > looks well Thank you for your feedback! ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sat, Mar 16, 2019 at 9:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > Good point. I also see jsonpath_gram.h left behind after maintainer-clean: > > Oh, and of potentially more significance: after maintainer-clean and > re-configure, make fails with > > In file included from jsonpath_gram.y:24: > ../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory > > I first thought this was a problem with insufficient dependencies > allowing parallel make to do things in the wrong order, but the problem > repeats even without any parallelism. It looks like the dependencies > have been constructed in such a way that if the symlink at > src/include/utils/jsonpath_gram.h exists but the underlying file > does not, nothing will make the underlying file. This is pretty broken; > aside from this outright failure, it also suggests that nothing will > update that file if it exists but is out of date relative to its > sources. > > Please make sure that the make rules associated with these files look > exactly like the previously-debugged rules for existing bison/flex > output files. There are generally good reasons for every last bit > of weirdness in those. I've pushed a fix. I hope I didn't forget anything. BTW, it appears that windows build scripts also needs some fixup. I'm not very familiar with that. I would be glad if somebody review the attached patch. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 3/17/19 4:03 AM, Alexander Korotkov wrote: > On Sat, Mar 16, 2019 at 9:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wrote: >>> Good point. I also see jsonpath_gram.h left behind after maintainer-clean: >> Oh, and of potentially more significance: after maintainer-clean and >> re-configure, make fails with >> >> In file included from jsonpath_gram.y:24: >> ../../../../src/include/utils/jsonpath_scanner.h:25:33: error: utils/jsonpath_gram.h: No such file or directory >> >> I first thought this was a problem with insufficient dependencies >> allowing parallel make to do things in the wrong order, but the problem >> repeats even without any parallelism. It looks like the dependencies >> have been constructed in such a way that if the symlink at >> src/include/utils/jsonpath_gram.h exists but the underlying file >> does not, nothing will make the underlying file. This is pretty broken; >> aside from this outright failure, it also suggests that nothing will >> update that file if it exists but is out of date relative to its >> sources. >> >> Please make sure that the make rules associated with these files look >> exactly like the previously-debugged rules for existing bison/flex >> output files. There are generally good reasons for every last bit >> of weirdness in those. > I've pushed a fix. I hope I didn't forget anything. > > BTW, it appears that windows build scripts also needs some fixup. I'm > not very familiar with that. I would be glad if somebody review the > attached patch. Why are we installing the jsonpath_gram.h file? It's not going to be used by anything else, is it? TBH, I'm not sure I see why we're installing the scanner.h file either. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > Why are we installing the jsonpath_gram.h file? It's not going to be > used by anything else, is it? TBH, I'm not sure I see why we're > installing the scanner.h file either. As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only for communication between jsonpath_gram.y and jsonpath_scan.l. Maybe we'd be better off handling that need by compiling the .l file as part of the .y file, as we used to do with the core lexer and still do with several others (cf comments for commit 72b1e3a21); then we wouldn't even have to generate these files much less install them. A quick look at jsonpath_scan.c shows that it's pretty innocent of the tricks we've learned over the years for flex/bison portability; in particular I see that it's #including <stdio.h> before postgres.h, which is a no-no. So that whole area needs more review anyway. regards, tom lane
On Sun, Mar 17, 2019 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > > Why are we installing the jsonpath_gram.h file? It's not going to be > > used by anything else, is it? TBH, I'm not sure I see why we're > > installing the scanner.h file either. > > As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only > for communication between jsonpath_gram.y and jsonpath_scan.l. Maybe > we'd be better off handling that need by compiling the .l file as part > of the .y file, as we used to do with the core lexer and still do with > several others (cf comments for commit 72b1e3a21); then we wouldn't > even have to generate these files much less install them. > > A quick look at jsonpath_scan.c shows that it's pretty innocent of > the tricks we've learned over the years for flex/bison portability; > in particular I see that it's #including <stdio.h> before postgres.h, > which is a no-no. So that whole area needs more review anyway. Yeah, I didn't review this part of patch carefully enough (and it seems that other reviewers too). I'm going to write a patch revising this part in next couple of days. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 3/17/19 3:13 AM, Alexander Korotkov wrote: > On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: >> so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal: >>> >>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov >>> <a.korotkov@postgrespro.ru> wrote: >>>> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov >>>> <a.korotkov@postgrespro.ru> wrote: >>>>> On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: >>>>>> Attached 36th version of the patches. >>>>> >>>>> Thank yo for the revision! >>>>> >>>>> In the attached revision following changes are made: >>>>> >>>>>> "unknown" refers here to ordinary three-valued logical Unknown, which is >>>>>> represented in SQL by NULL. >>>>>> >>>>>> JSON path expressions return sequences of SQL/JSON items, which are defined by >>>>>> SQL/JSON data model. But JSON path predicates (logical expressions), which are >>>>>> used in filters, return three-valued logical values: False, True, or Unknown. >>>>> >>>>> * I've added short explanation of this to the documentation. >>>>> * Removed no longer present data structures from typedefs.list of the >>>>> first patch. >>>>> * Moved GIN support patch to number 3. Seems to be well-isolated and >>>>> not very complex patch. I propose to consider this to 12 too. I >>>>> added high-level comment there, commit message and made some code >>>>> beautification. >>>> >>>> I think patches 1 and 2 are in committable shape (I reached Tomas >>>> off-list, he doesn't have more notes regarding them). While patch 3 >>>> requires more review. >>>> >>>> I'm going to push 1 and 2 if no objections. >>> >>> So, pushed. Many thanks to reviewers and authors! >>> >>> Remaining part I'm proposing for 12 is attached. I appreciate review of it. >> >> >> I tested this patch and I didn't find any issue - just I tested basic functionality and regress tests. >> >> looks well > > Thank you for your feedback! Like Pavel, I did some basic testing of the patch (on current HEAD 4178d8b91c) trying out various JSON path expressions, and yes, it all worked. I had a brief scare while testing on 4178d8b91c where initdb was failing on the bootstrapping step, but after doing a thorough wipe of build files and my output directory, it seems to be initializing okay. I also did some testing of the GIN patch upthread, as the quickness of retrieval of the data using JSON path is of course important as well. Using a schema roughly like this: CREATE TABLE news_feed ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, data jsonb NOT NULL ); CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data); I loaded in a data set of roughly 420,000 rows. Each row had all the same keys but differing values (e.g. "length" and "content" as keys) I tested a few different JSON path scenarios. Some of the index scans performed way better than the equivalent sequential scans, for instance: SELECT count(*) FROM news_feed WHERE data @? '$.length ? (@ == 200)'; SELECT * FROM news_feed WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")'; Using the index outperformed the sequential scan (and parallel seq scan) by ~10-100x based on my config + laptop hardware! However, when I did something a little more complex, like the below: SELECT count(*) FROM news_feed WHERE data @? '$.length ? (@ < 150)'; SELECT count(*) FROM news_feed WHERE data @? '$.content ? (@ like_regex "^Start")'; SELECT id, jsonb_path_query(data, '$.content') FROM news_feed WHERE data @? '$.content ? (@ like_regex "risk" flag "i")'; I would find that the index scan performed as well as the sequential scan. Additionally, on my laptop, the parallel sequential scan would beat the index scan by ~2.5x in some cases. Reading up on what the GIN patch does, this all makes sense: it's optimized for equality, I understand there are challenges to be able to handle inequality, regex exps, etc. And the cases where it really does work well, it's _incredibly_ fast. My suggestion would be adding some additional guidance in the user documentation around how GIN works with the @@ and @? operators so they can understand where GIN will work very well with JSON path + their data and not be surprised when other types of JSON path queries are performing on par with a sequential scan (or worse than a parallel seq scan). Thanks, Jonathan
Attachment
Hi! On Sun, Mar 17, 2019 at 7:46 PM Jonathan S. Katz <jkatz@postgresql.org> wrote: > Like Pavel, I did some basic testing of the patch (on current HEAD > 4178d8b91c) trying out various JSON path expressions, and yes, it all > worked. I had a brief scare while testing on 4178d8b91c where initdb was > failing on the bootstrapping step, but after doing a thorough wipe of > build files and my output directory, it seems to be initializing okay. > > I also did some testing of the GIN patch upthread, as the quickness of > retrieval of the data using JSON path is of course important as well. Thank you very much for testing! > Using a schema roughly like this: > > CREATE TABLE news_feed ( > id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, > data jsonb NOT NULL > ); > CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data); > > I loaded in a data set of roughly 420,000 rows. Each row had all the > same keys but differing values (e.g. "length" and "content" as keys) > > I tested a few different JSON path scenarios. Some of the index scans > performed way better than the equivalent sequential scans, for instance: > > SELECT count(*) > FROM news_feed > WHERE data @? '$.length ? (@ == 200)'; > > SELECT * > FROM news_feed > WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")'; > > Using the index outperformed the sequential scan (and parallel seq scan) > by ~10-100x based on my config + laptop hardware! Great! > However, when I did something a little more complex, like the below: > > SELECT count(*) > FROM news_feed > WHERE data @? '$.length ? (@ < 150)'; > > SELECT count(*) > FROM news_feed > WHERE data @? '$.content ? (@ like_regex "^Start")'; > > SELECT id, jsonb_path_query(data, '$.content') > FROM news_feed > WHERE data @? '$.content ? (@ like_regex "risk" flag "i")'; > > I would find that the index scan performed as well as the sequential > scan. Additionally, on my laptop, the parallel sequential scan would > beat the index scan by ~2.5x in some cases. Yeah, this cases are not supported. Did optimizer automatically select sequential scan in this case (if not touching enable_* variables)? It should, because optimizer understands that GIN scan will be bad if extract_query method failed to extract anything. > Reading up on what the GIN patch does, this all makes sense: it's > optimized for equality, I understand there are challenges to be able to > handle inequality, regex exps, etc. And the cases where it really does > work well, it's _incredibly_ fast. Yes, for more complex cases, we need different opclasses. For instance, we can consider porting jsquery opclasses to PG 13. And it become even more important to get parametrized opclasses, because we don't necessary want to index all the json fields in this same way. That's another challenge for future releases. But what we have now is just support for some of jsonpathes for existing opclasses. > My suggestion would be adding some additional guidance in the user > documentation around how GIN works with the @@ and @? operators so they > can understand where GIN will work very well with JSON path + their data > and not be surprised when other types of JSON path queries are > performing on par with a sequential scan (or worse than a parallel seq > scan). Good point. Will do. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 3/17/19 12:55 PM, Alexander Korotkov wrote: > >> However, when I did something a little more complex, like the below: >> >> SELECT count(*) >> FROM news_feed >> WHERE data @? '$.length ? (@ < 150)'; >> >> SELECT count(*) >> FROM news_feed >> WHERE data @? '$.content ? (@ like_regex "^Start")'; >> >> SELECT id, jsonb_path_query(data, '$.content') >> FROM news_feed >> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")'; >> >> I would find that the index scan performed as well as the sequential >> scan. Additionally, on my laptop, the parallel sequential scan would >> beat the index scan by ~2.5x in some cases. > > Yeah, this cases are not supported. Did optimizer automatically > select sequential scan in this case (if not touching enable_* > variables)? It should, because optimizer understands that GIN scan > will be bad if extract_query method failed to extract anything. It did not - it was doing a bitmap heap scan. I have default costs setup. Example output from EXPLAIN ANALYZE with the index available: Aggregate (cost=1539.78..1539.79 rows=1 width=8) (actual time=270.419..270.419 rows=1 loops=1) -> Bitmap Heap Scan on news_feed (cost=23.24..1538.73 rows=418 width=0) (actual time=84.040..270.407 rows=5 loops=1) Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath) Rows Removed by Index Recheck: 418360 Heap Blocks: exact=28690 -> Bitmap Index Scan on news_feed_data_gin_idx (cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788 rows=418365 loops=1) Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath) Planning Time: 0.168 ms Execution Time: 271.105 ms And for arguments sake, after I dropped the index (and max_parallel_workers = 8): Finalize Aggregate (cost=30998.07..30998.08 rows=1 width=8) (actual time=91.062..91.062 rows=1 loops=1) -> Gather (cost=30997.65..30998.06 rows=4 width=8) (actual time=90.892..97.739 rows=5 loops=1) Workers Planned: 4 Workers Launched: 4 -> Partial Aggregate (cost=29997.65..29997.66 rows=1 width=8) (actual time=76.977..76.977 rows=1 loops=5) -> Parallel Seq Scan on news_feed (cost=0.00..29997.39 rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5) Filter: (data @? '$."length"?(@ < 150)'::jsonpath) Rows Removed by Filter: 83672 Planning Time: 0.127 ms Execution Time: 97.801 ms >> Reading up on what the GIN patch does, this all makes sense: it's >> optimized for equality, I understand there are challenges to be able to >> handle inequality, regex exps, etc. And the cases where it really does >> work well, it's _incredibly_ fast. > > Yes, for more complex cases, we need different opclasses. For > instance, we can consider porting jsquery opclasses to PG 13. And it > become even more important to get parametrized opclasses, because we > don't necessary want to index all the json fields in this same way. > That's another challenge for future releases. But what we have now is > just support for some of jsonpathes for existing opclasses. Yeah, that makes sense, and seems to be my recollection from the several years of presentations I've seen on the topic ;) >> My suggestion would be adding some additional guidance in the user >> documentation around how GIN works with the @@ and @? operators so they >> can understand where GIN will work very well with JSON path + their data >> and not be surprised when other types of JSON path queries are >> performing on par with a sequential scan (or worse than a parallel seq >> scan). > > Good point. Will do. Thanks! Jonathan
Attachment
On Sun, Mar 17, 2019 at 8:00 PM Jonathan S. Katz <jkatz@postgresql.org> wrote: > On 3/17/19 12:55 PM, Alexander Korotkov wrote: > > > >> However, when I did something a little more complex, like the below: > >> > >> SELECT count(*) > >> FROM news_feed > >> WHERE data @? '$.length ? (@ < 150)'; > >> > >> SELECT count(*) > >> FROM news_feed > >> WHERE data @? '$.content ? (@ like_regex "^Start")'; > >> > >> SELECT id, jsonb_path_query(data, '$.content') > >> FROM news_feed > >> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")'; > >> > >> I would find that the index scan performed as well as the sequential > >> scan. Additionally, on my laptop, the parallel sequential scan would > >> beat the index scan by ~2.5x in some cases. > > > > Yeah, this cases are not supported. Did optimizer automatically > > select sequential scan in this case (if not touching enable_* > > variables)? It should, because optimizer understands that GIN scan > > will be bad if extract_query method failed to extract anything. > > It did not - it was doing a bitmap heap scan. I have default costs > setup. Example output from EXPLAIN ANALYZE with the index available: > > Aggregate (cost=1539.78..1539.79 rows=1 width=8) (actual > time=270.419..270.419 rows=1 loops=1) > -> Bitmap Heap Scan on news_feed (cost=23.24..1538.73 rows=418 > width=0) (actual time=84.040..270.407 rows=5 loops=1) > Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath) > Rows Removed by Index Recheck: 418360 > Heap Blocks: exact=28690 > -> Bitmap Index Scan on news_feed_data_gin_idx > (cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788 > rows=418365 loops=1) > Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath) > Planning Time: 0.168 ms > Execution Time: 271.105 ms > > And for arguments sake, after I dropped the index (and > max_parallel_workers = 8): > > Finalize Aggregate (cost=30998.07..30998.08 rows=1 width=8) (actual > time=91.062..91.062 rows=1 loops=1) > -> Gather (cost=30997.65..30998.06 rows=4 width=8) (actual > time=90.892..97.739 rows=5 loops=1) > Workers Planned: 4 > Workers Launched: 4 > -> Partial Aggregate (cost=29997.65..29997.66 rows=1 width=8) > (actual time=76.977..76.977 rows=1 loops=5) > -> Parallel Seq Scan on news_feed (cost=0.00..29997.39 > rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5) > Filter: (data @? '$."length"?(@ < 150)'::jsonpath) > Rows Removed by Filter: 83672 > Planning Time: 0.127 ms > Execution Time: 97.801 ms Thank you for the explanation. Is it jsonb_ops or jsonb_path_ops? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 3/17/19 1:02 PM, Alexander Korotkov wrote: > > Thank you for the explanation. Is it jsonb_ops or jsonb_path_ops? I just used "USING gin(col)" so jsonb_ops. Thanks, Jonathan
Attachment
On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz@postgresql.org> wrote: > On 3/17/19 1:02 PM, Alexander Korotkov wrote: > > > > Thank you for the explanation. Is it jsonb_ops or jsonb_path_ops? > > I just used "USING gin(col)" so jsonb_ops. I see. So, jsonb_ops extracts from this query only existence of .length key. And I can bet it exists in all (or almost all) the documents. Thus, optimizer thinks index might be useful, while it's useless. There is not much can be done while we don't have statistics for jsonb (and access to it from GIN extract_query). So, for now we can just refuse from extracting only keys from jsonpath in jsonb_ops. But I think it would be better to just document this issue. In future we should improve that with statistics. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 3/17/19 1:14 PM, Alexander Korotkov wrote: > On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz@postgresql.org> wrote: >> On 3/17/19 1:02 PM, Alexander Korotkov wrote: >>> >>> Thank you for the explanation. Is it jsonb_ops or jsonb_path_ops? >> >> I just used "USING gin(col)" so jsonb_ops. > > I see. So, jsonb_ops extracts from this query only existence of > .length key. And I can bet it exists in all (or almost all) the > documents. Thus, optimizer thinks index might be useful, while it's > useless. There is not much can be done while we don't have statistics > for jsonb (and access to it from GIN extract_query). So, for now we > can just refuse from extracting only keys from jsonpath in jsonb_ops. > But I think it would be better to just document this issue. In future > we should improve that with statistics. That seems to make sense, especially given how I've typically stored JSON documents in PostgreSQL. It sounds like this particular problem would be solved appropriately with JSONB statistics. Thanks, Jonathan
Attachment
On 17.03.2019 21:29, Jonathan S. Katz wrote:
On 3/17/19 1:14 PM, Alexander Korotkov wrote:On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:On 3/17/19 1:02 PM, Alexander Korotkov wrote:Thank you for the explanation. Is it jsonb_ops or jsonb_path_ops?I just used "USING gin(col)" so jsonb_ops.I see. So, jsonb_ops extracts from this query only existence of .length key. And I can bet it exists in all (or almost all) the documents. Thus, optimizer thinks index might be useful, while it's useless. There is not much can be done while we don't have statistics for jsonb (and access to it from GIN extract_query). So, for now we can just refuse from extracting only keys from jsonpath in jsonb_ops. But I think it would be better to just document this issue. In future we should improve that with statistics.That seems to make sense, especially given how I've typically stored JSON documents in PostgreSQL. It sounds like this particular problem would be solved appropriately with JSONB statistics.
GIN jsonb_ops extracts from query data @? '$.length ? (@ < 150)' the same GIN entries as from queries data @? '$.length' data ? 'length' If you don't want to extract entries from unsupported expressions, you can try to use another jsonpath operator @@. Queries will also look like a bit simpler: data @@ '$.length < 150' data @@ '$.content like_regex "^Start"' data @@ '$.content like_regex "risk" flag "i"' All this queries emit no GIN entries. But note that data @@ '$ ? (@.content == "foo").length < 150' emits the same entries as data @@ '$.content == "foo"' We already have a POC implementation of jsonb statistics that was written 2 years ago. I rebased it onto the current master yesterday. If it is interesting, you can find it on my GitHub [1]. But note, that there is no support for jsonpath operators yet, only boolean EXISTS ?, ?|, ?&, and CONTAINS @> operators are supported. Also there is no docs, and it works slowly (a more effective storage method for statistics of individual JSON paths is needed). Also there is ability to calculate derived statistics of expressions like js -> 'x' -> 0 -> 'y' js #> '{x,0,y}' using jsonb statistics for columns "js". So the selectivity of expressions js -> 'x' -> 0 -> 'y' = '123' js #> '{x,0,y}' >= '123' also can be estimated (but these expressions can't be used by index on "js"). This topic deserves a separate discussion. I hope, we will start the corresponding thread for PG13 after we find a more effective way of jsonb statistics storing. [1] https://github.com/glukhovn/postgres/tree/jsonb_stats
-- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Just another minor bitch about this patch: jsonpath_scan.l has introduced a typedef called "keyword". This is causing pgindent to produce seriously ugly results in libpq, and probably in other places where that is used as a field or variable name. Please rename that typedef to something less generic. regards, tom lane
On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Just another minor bitch about this patch: jsonpath_scan.l has introduced > a typedef called "keyword". This is causing pgindent to produce seriously > ugly results in libpq, and probably in other places where that is used as > a field or variable name. Please rename that typedef to something less > generic. Ooops... I propose to rename it to KeyWord, which is already typedef'ed in formatting.c. See the attached patch. Is it OK? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Just another minor bitch about this patch: jsonpath_scan.l has introduced >> a typedef called "keyword". This is causing pgindent to produce seriously >> ugly results in libpq, and probably in other places where that is used as >> a field or variable name. Please rename that typedef to something less >> generic. > Ooops... I propose to rename it to KeyWord, which is already > typedef'ed in formatting.c. See the attached patch. Is it OK? I had in mind JsonPathKeyword or something like that. If you re-use formatting.c's typedef name, pgindent won't care, but it's possible you'd be in for unhappiness when trying to look at these structs in gdb for instance. regards, tom lane
po 18. 3. 2019 v 21:23 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Just another minor bitch about this patch: jsonpath_scan.l has introduced
>> a typedef called "keyword". This is causing pgindent to produce seriously
>> ugly results in libpq, and probably in other places where that is used as
>> a field or variable name. Please rename that typedef to something less
>> generic.
> Ooops... I propose to rename it to KeyWord, which is already
> typedef'ed in formatting.c. See the attached patch. Is it OK?
I had in mind JsonPathKeyword or something like that. If you re-use
formatting.c's typedef name, pgindent won't care, but it's possible
you'd be in for unhappiness when trying to look at these structs in
gdb for instance.
JsonPathKeyword is better verbose
Pavel
regards, tom lane
On Sun, Feb 24, 2019 at 5:03 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote: > > Why -CF, and why is -p repeated? > > BTW, for our SQL grammar we have > > > scan.c: FLEXFLAGS = -CF -p -p > > Is it kind of default? I just saw this in the committed patch. This is not default, it's chosen for maximum performance at the expense of binary/memory size. That's fine, but with a little effort you can also make the scanner non-backtracking for additional performance, as in the attached. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, Mar 18, 2019 at 11:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > On Mon, Mar 18, 2019 at 10:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Just another minor bitch about this patch: jsonpath_scan.l has introduced > >> a typedef called "keyword". This is causing pgindent to produce seriously > >> ugly results in libpq, and probably in other places where that is used as > >> a field or variable name. Please rename that typedef to something less > >> generic. > > > Ooops... I propose to rename it to KeyWord, which is already > > typedef'ed in formatting.c. See the attached patch. Is it OK? > > I had in mind JsonPathKeyword or something like that. If you re-use > formatting.c's typedef name, pgindent won't care, but it's possible > you'd be in for unhappiness when trying to look at these structs in > gdb for instance. Good point, thanks! Pushed. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Mar 19, 2019 at 12:23 PM John Naylor <john.naylor@2ndquadrant.com> wrote: > On Sun, Feb 24, 2019 at 5:03 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > On Wed, Jan 30, 2019 at 5:28 AM Andres Freund <andres@anarazel.de> wrote: > > > Why -CF, and why is -p repeated? > > > > BTW, for our SQL grammar we have > > > > > scan.c: FLEXFLAGS = -CF -p -p > > > > Is it kind of default? > > I just saw this in the committed patch. This is not default, it's > chosen for maximum performance at the expense of binary/memory size. > That's fine, but with a little effort you can also make the scanner > non-backtracking for additional performance, as in the attached. We're working on patchset improving flex scanner. Already have similar change among the others. But thanks a lot for your attention! ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sun, Mar 17, 2019 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > > Why are we installing the jsonpath_gram.h file? It's not going to be > > used by anything else, is it? TBH, I'm not sure I see why we're > > installing the scanner.h file either. > > As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only > for communication between jsonpath_gram.y and jsonpath_scan.l. Maybe > we'd be better off handling that need by compiling the .l file as part > of the .y file, as we used to do with the core lexer and still do with > several others (cf comments for commit 72b1e3a21); then we wouldn't > even have to generate these files much less install them. > > A quick look at jsonpath_scan.c shows that it's pretty innocent of > the tricks we've learned over the years for flex/bison portability; > in particular I see that it's #including <stdio.h> before postgres.h, > which is a no-no. So that whole area needs more review anyway. Attached patch is getting rid of jsonpath_gram.h. Would like to see results of http://commitfest.cputube.org/ before committing, because I'm not available to test this of windows machine. There would be further patches rearranging jsonpath_gram.y and jsonpath_scan.l. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Mar 19, 2019 at 8:10 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Sun, Mar 17, 2019 at 6:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > > > Why are we installing the jsonpath_gram.h file? It's not going to be > > > used by anything else, is it? TBH, I'm not sure I see why we're > > > installing the scanner.h file either. > > > > As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only > > for communication between jsonpath_gram.y and jsonpath_scan.l. Maybe > > we'd be better off handling that need by compiling the .l file as part > > of the .y file, as we used to do with the core lexer and still do with > > several others (cf comments for commit 72b1e3a21); then we wouldn't > > even have to generate these files much less install them. > > > > A quick look at jsonpath_scan.c shows that it's pretty innocent of > > the tricks we've learned over the years for flex/bison portability; > > in particular I see that it's #including <stdio.h> before postgres.h, > > which is a no-no. So that whole area needs more review anyway. > > Attached patch is getting rid of jsonpath_gram.h. Would like to see > results of http://commitfest.cputube.org/ before committing, because > I'm not available to test this of windows machine. There would be > further patches rearranging jsonpath_gram.y and jsonpath_scan.l. Attaches patches improving jsonpath parser. First one introduces cosmetic changes, while second gets rid of backtracking. I'm also planning to add high-level comment for both grammar and lexer. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi. Attached patch enables throwing of errors in jsonb_path_match() in its non-silent mode when the jsonpath expression failed to return a singleton boolean. Previously, NULL was always returned, and it seemed to be inconsistent with the behavior of other functions, in which structural and other errors were not suppressed in non-silent mode. We also think that jsonb_path_match() needs to be renamed, because its current name is confusing to many users. "Match" name is more suitable for jsonpath-based pattern matching like that (maybe we'll implement it later): jsonb_path_match( '{ "a": 1, "b": 2, "c": "str" }', '{ "a": 1, "b": @ > 1, * : @.type == "string" }' ) Below are some possible name variants: jsonb_path_predicate() (original name) jsonb_path_pred() jsonb_path_test() jsonb_path_check() jsonb_path_bool() -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Attaches patches improving jsonpath parser. First one introduces > cosmetic changes, while second gets rid of backtracking. I'm also > planning to add high-level comment for both grammar and lexer. The cosmetic changes look good to me. I just noticed a couple things about the comments. 0001: +/* Check if current scanstring value constitutes a keyword */ 'is a keyword' is better. 'Constitutes' implies parts of a whole. + * Resize scanstring for appending of given length. Reinitilize if required. s/Reinitilize/Reinitialize/ The first sentence is not entirely clear to me. 0002: These two rules are not strictly necessary: <xnq,xq,xvq,xsq>{unicode}+\\ { /* throw back the \\, and treat as unicode */ yyless(yyleng - 1); parseUnicode(yytext, yyleng); } <xnq,xq,xvq,xsq>{hex_char}+\\ { /* throw back the \\, and treat as hex */ yyless(yyleng - 1); parseHexChars(yytext, yyleng); } ...and only seem to be there because of how these are written: <xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); } <xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); } <xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); } <xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character sequence is invalid"); } I don't understand the reasoning here -- is it a micro-optimization? The following is simpler, allow the rules I mentioned to be removed, and make check still passes. I would prefer it unless there is a performance penalty, in which case a comment to describe the additional complexity would be helpful. <xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); } <xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); } <xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); } <xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is invalid"); } -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 22 Mar 2019, 03:14 Nikita Glukhov, <n.gluhov@postgrespro.ru> wrote:
Hi. Attached patch enables throwing of errors in jsonb_path_match() in its non-silent mode when the jsonpath expression failed to return a singleton boolean. Previously, NULL was always returned, and it seemed to be inconsistent with the behavior of other functions, in which structural and other errors were not suppressed in non-silent mode. We also think that jsonb_path_match() needs to be renamed, because its current name is confusing to many users. "Match" name is more suitable for jsonpath-based pattern matching like that (maybe we'll implement it later): jsonb_path_match( '{ "a": 1, "b": 2, "c": "str" }', '{ "a": 1, "b": @ > 1, * : @.type == "string" }' )
Would be very useful for constraints.
Below are some possible name variants: jsonb_path_predicate() (original name)
Too long
jsonb_path_pred() jsonb_path_test() jsonb_path_check()
Check is good to me
jsonb_path_bool() -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 21.03.2019 16:58, Alexander Korotkov wrote:
On Tue, Mar 19, 2019 at 8:10 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:Attaches patches improving jsonpath parser. First one introduces cosmetic changes, while second gets rid of backtracking. I'm also planning to add high-level comment for both grammar and lexer.
Parsing of integers now is wrong: neither JSON specification, nor our json[b] allow leading zeros in integers and floats starting with a dot. =# SELECT json '.1'; ERROR: invalid input syntax for type json LINE 1: SELECT jsonb '.1'; ^ DETAIL: Token "." is invalid. CONTEXT: JSON data, line 1: .... =# SELECT json '00'; ERROR: invalid input syntax for type json LINE 1: SELECT jsonb '00'; ^ DETAIL: Token "00" is invalid. CONTEXT: JSON data, line 1: 00 =# SELECT json '00.1'; ERROR: invalid input syntax for type json LINE 1: SELECT jsonb '00.1'; ^ DETAIL: Token "00" is invalid. CONTEXT: JSON data, line 1: 00... In JavaScript integers with leading zero are treated as octal numbers, but leading dot is a allowed: v8 > 0123 83 v8 > 000.1 Uncaught SyntaxError: Unexpected number v8> .1 0.1 Attached patch 0003 fixes this issue. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Fri, Mar 22, 2019 at 5:38 AM John Naylor <john.naylor@2ndquadrant.com> wrote: > On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Attaches patches improving jsonpath parser. First one introduces > > cosmetic changes, while second gets rid of backtracking. I'm also > > planning to add high-level comment for both grammar and lexer. > > The cosmetic changes look good to me. I just noticed a couple things > about the comments. > > 0001: > > +/* Check if current scanstring value constitutes a keyword */ > > 'is a keyword' is better. 'Constitutes' implies parts of a whole. > > + * Resize scanstring for appending of given length. Reinitilize if required. > > s/Reinitilize/Reinitialize/ > > The first sentence is not entirely clear to me. Thank you, fixed. > 0002: > > These two rules are not strictly necessary: > > <xnq,xq,xvq,xsq>{unicode}+\\ { > /* throw back the \\, and treat as unicode */ > yyless(yyleng - 1); > parseUnicode(yytext, yyleng); > } > > <xnq,xq,xvq,xsq>{hex_char}+\\ { > /* throw back the \\, and treat as hex */ > yyless(yyleng - 1); > parseHexChars(yytext, yyleng); > } > > ...and only seem to be there because of how these are written: > > <xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); } > <xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); } > <xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode > sequence is invalid"); } > <xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character > sequence is invalid"); } > > I don't understand the reasoning here -- is it a micro-optimization? > The following is simpler, allow the rules I mentioned to be removed, > and make check still passes. I would prefer it unless there is a > performance penalty, in which case a comment to describe the > additional complexity would be helpful. > > <xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); } > <xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); } > <xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); } > <xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is > invalid"); } These rules are needed for unicode. Sequential escaped unicode characters might be connected by hi surrogate value. See jsonpath_encoding regression test in attached patch. Regarding hex, I made it so for the sake of uniformity. But I changed my mind and decided that simpler flex rules are more important. So, now they are considered one-by-one. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2019-03-23%2013%3A01%3A28 2019-03-23 14:28:31.147 CET [18056:45] pg_regress/jsonpath LOG: statement: select '$.g ? (@.a == 1 || !(@.x >= 123 || @.a== 4) && @.b == 7)'::jsonpath; 2019-03-23 14:28:31.157 CET [18055:59] pg_regress/jsonb_jsonpath LOG: statement: select jsonb_path_query('1', 'lax $.a'); 2019-03-23 14:28:31.163 CET [9597:311] LOG: server process (PID 18056) was terminated by signal 11: Segmentation fault 2019-03-23 14:28:31.163 CET [9597:312] DETAIL: Failed process was running: select '$.g ? (@.a == 1 || !(@.x >= 123 || @.a== 4) && @.b == 7)'::jsonpath; 2019-03-23 14:28:31.164 CET [9597:313] LOG: terminating any other active server processes Something's not quite right... Note this appears to be 32bit sparc. - Andres
On Sun, Mar 24, 2019 at 7:45 PM Andres Freund <andres@anarazel.de> wrote: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2019-03-23%2013%3A01%3A28 > > 2019-03-23 14:28:31.147 CET [18056:45] pg_regress/jsonpath LOG: statement: select '$.g ? (@.a == 1 || !(@.x >= 123 ||@.a == 4) && @.b == 7)'::jsonpath; > 2019-03-23 14:28:31.157 CET [18055:59] pg_regress/jsonb_jsonpath LOG: statement: select jsonb_path_query('1', 'lax $.a'); > 2019-03-23 14:28:31.163 CET [9597:311] LOG: server process (PID 18056) was terminated by signal 11: Segmentation fault > 2019-03-23 14:28:31.163 CET [9597:312] DETAIL: Failed process was running: select '$.g ? (@.a == 1 || !(@.x >= 123 ||@.a == 4) && @.b == 7)'::jsonpath; > 2019-03-23 14:28:31.164 CET [9597:313] LOG: terminating any other > active server processes > > Something's not quite right... Note this appears to be 32bit sparc. Thank you for pointing. Will investigate. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sun, Mar 24, 2019 at 9:09 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Sun, Mar 24, 2019 at 7:45 PM Andres Freund <andres@anarazel.de> wrote: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2019-03-23%2013%3A01%3A28 > > > > 2019-03-23 14:28:31.147 CET [18056:45] pg_regress/jsonpath LOG: statement: select '$.g ? (@.a == 1 || !(@.x >= 123 ||@.a == 4) && @.b == 7)'::jsonpath; > > 2019-03-23 14:28:31.157 CET [18055:59] pg_regress/jsonb_jsonpath LOG: statement: select jsonb_path_query('1', 'lax $.a'); > > 2019-03-23 14:28:31.163 CET [9597:311] LOG: server process (PID 18056) was terminated by signal 11: Segmentation fault > > 2019-03-23 14:28:31.163 CET [9597:312] DETAIL: Failed process was running: select '$.g ? (@.a == 1 || !(@.x >= 123 ||@.a == 4) && @.b == 7)'::jsonpath; > > 2019-03-23 14:28:31.164 CET [9597:313] LOG: terminating any other > > active server processes > > > > Something's not quite right... Note this appears to be 32bit sparc. > > Thank you for pointing. Will investigate. Got access to that buildfarm animal thanks to Tom Turelinckx. Now running check-world in a loop on the same commit hash with same build options. Error wasn't triggered yet. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Mar 26, 2019 at 6:06 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Tue, Mar 26, 2019 at 5:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > > Got access to that buildfarm animal thanks to Tom Turelinckx. Now > > > running check-world in a loop on the same commit hash with same build > > > options. Error wasn't triggered yet. > > > > I notice that snapper is using force_parallel_mode = regress ... > > have you got that enabled in your manual test? > > Nope. Thank you for pointing! I've rerun my test loop with this... Still no reproduction. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > Still no reproduction. Annoying, but it's probably not worth expending more effort on right now. I wonder whether that buildfarm animal can be upgraded to capture core dump stack traces --- if so, then if it happens again we'd have more info. regards, tom lane
On Wed, Mar 27, 2019 at 4:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > Still no reproduction. > > Annoying, but it's probably not worth expending more effort on > right now. I wonder whether that buildfarm animal can be upgraded > to capture core dump stack traces --- if so, then if it happens > again we'd have more info. Hopefully, Andrew will manage to get a backtrace [1]. BTW, while searching for this bug, I've collected this backtrace using valgrind. ==00:00:00:14.596 10866== Conditional jump or move depends on uninitialised value(s) ==00:00:00:14.596 10866== at 0x579F8C4: ____strtod_l_internal (in /usr/lib64/libc-2.17.so) ==00:00:00:14.596 10866== by 0x771561: float8in_internal_opt_error (float.c:394) ==00:00:00:14.596 10866== by 0x7718B9: float8in_internal (float.c:515) ==00:00:00:14.596 10866== by 0x7718B9: float8in (float.c:336) ==00:00:00:14.596 10866== by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803) ==00:00:00:14.596 10866== by 0x7C9649: numeric_float8 (numeric.c:3417) ==00:00:00:14.596 10866== by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803) ==00:00:00:14.596 10866== by 0x7A1D8D: jsonb_float8 (jsonb.c:2058) ==00:00:00:14.596 10866== by 0x5F8F54: ExecInterpExpr (execExprInterp.c:649) ==00:00:00:14.596 10866== by 0x6A2E19: ExecEvalExprSwitchContext (executor.h:307) ==00:00:00:14.596 10866== by 0x6A2E19: evaluate_expr (clauses.c:4827) ==00:00:00:14.596 10866== by 0x6A45FF: evaluate_function (clauses.c:4369) ==00:00:00:14.596 10866== by 0x6A45FF: simplify_function (clauses.c:3999) ==00:00:00:14.596 10866== by 0x6A31C1: eval_const_expressions_mutator (clauses.c:2474) ==00:00:00:14.596 10866== by 0x644466: expression_tree_mutator (nodeFuncs.c:3072) ==00:00:00:14.596 10866== { <insert_a_suppression_name_here> Memcheck:Cond fun:____strtod_l_internal fun:float8in_internal_opt_error fun:float8in_internal fun:float8in fun:DirectFunctionCall1Coll fun:numeric_float8 fun:DirectFunctionCall1Coll fun:jsonb_float8 fun:ExecInterpExpr fun:ExecEvalExprSwitchContext fun:evaluate_expr fun:evaluate_function fun:simplify_function fun:eval_const_expressions_mutator fun:expression_tree_mutator } Not sure whether it's related to 16d489b0fe. Will investigate more. 1. https://www.postgresql.org/message-id/91ff584f-75d2-471f-4d9e-9ee8f09cdc1d%402ndQuadrant.com ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Wed, Mar 27, 2019 at 05:37:40PM +0300, Alexander Korotkov wrote: >On Wed, Mar 27, 2019 at 4:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >> > Still no reproduction. >> >> Annoying, but it's probably not worth expending more effort on >> right now. I wonder whether that buildfarm animal can be upgraded >> to capture core dump stack traces --- if so, then if it happens >> again we'd have more info. > >Hopefully, Andrew will manage to get a backtrace [1]. > >BTW, while searching for this bug, I've collected this backtrace using valgrind. > >==00:00:00:14.596 10866== Conditional jump or move depends on >uninitialised value(s) >==00:00:00:14.596 10866== at 0x579F8C4: ____strtod_l_internal (in >/usr/lib64/libc-2.17.so) >==00:00:00:14.596 10866== by 0x771561: float8in_internal_opt_error >(float.c:394) >==00:00:00:14.596 10866== by 0x7718B9: float8in_internal (float.c:515) >==00:00:00:14.596 10866== by 0x7718B9: float8in (float.c:336) >==00:00:00:14.596 10866== by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803) >==00:00:00:14.596 10866== by 0x7C9649: numeric_float8 (numeric.c:3417) >==00:00:00:14.596 10866== by 0x842D43: DirectFunctionCall1Coll (fmgr.c:803) >==00:00:00:14.596 10866== by 0x7A1D8D: jsonb_float8 (jsonb.c:2058) >==00:00:00:14.596 10866== by 0x5F8F54: ExecInterpExpr (execExprInterp.c:649) >==00:00:00:14.596 10866== by 0x6A2E19: ExecEvalExprSwitchContext >(executor.h:307) >==00:00:00:14.596 10866== by 0x6A2E19: evaluate_expr (clauses.c:4827) >==00:00:00:14.596 10866== by 0x6A45FF: evaluate_function (clauses.c:4369) >==00:00:00:14.596 10866== by 0x6A45FF: simplify_function (clauses.c:3999) >==00:00:00:14.596 10866== by 0x6A31C1: >eval_const_expressions_mutator (clauses.c:2474) >==00:00:00:14.596 10866== by 0x644466: expression_tree_mutator >(nodeFuncs.c:3072) >==00:00:00:14.596 10866== >{ > <insert_a_suppression_name_here> > Memcheck:Cond > fun:____strtod_l_internal > fun:float8in_internal_opt_error > fun:float8in_internal > fun:float8in > fun:DirectFunctionCall1Coll > fun:numeric_float8 > fun:DirectFunctionCall1Coll > fun:jsonb_float8 > fun:ExecInterpExpr > fun:ExecEvalExprSwitchContext > fun:evaluate_expr > fun:evaluate_function > fun:simplify_function > fun:eval_const_expressions_mutator > fun:expression_tree_mutator >} > >Not sure whether it's related to 16d489b0fe. Will investigate more. > This might be another case of false positive due to SSE (which I think is used by strtod in some cases). But I'd expect strncasecmp in the stack in that case, so maybe that's not it. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/27/19 9:48 AM, Tom Lane wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >> Still no reproduction. > Annoying, but it's probably not worth expending more effort on > right now. I wonder whether that buildfarm animal can be upgraded > to capture core dump stack traces --- if so, then if it happens > again we'd have more info. > > I was able to get this stack trace. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > I was able to get this stack trace. > > (gdb) bt > #0 0x00007ffb9ce6a458 in ntdll!RtlRaiseStatus () > from C:\WINDOWS\SYSTEM32\ntdll.dll > #1 0x00007ffb9ce7760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll > #2 0x00007ffb9ac52e1a in msvcrt!_setjmpex () > from C:\WINDOWS\System32\msvcrt.dll > #3 0x000000000087431a in pg_re_throw () > at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:1720 > #4 0x0000000000874106 in errfinish (dummy=<optimized out>) > at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:464 > #5 0x00000000007cc938 in jsonpath_yyerror (result=result@entry=0x0, > message=message@entry=0xab0868 <__func__.110231+1592> "unrecognized flag of LIKE_REGEX predicate") > at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_scan.l:305 > #6 0x00000000007cec9d in makeItemLikeRegex (pattern=<optimized out>, > pattern=<optimized out>, flags=<optimized out>, expr=0x73c7a80) > at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_gram.y:512 Hmm. Reaching the yyerror call is expected given this input, but seemingly the siglongjmp stack has been trashed? The best I can think of is a wild store that either occurs only on this platform or happens to be harmless elsewhere ... but neither idea is terribly convincing. BTW, the expected behavior according to the regression test is regression=# select '$ ? (@ like_regex "pattern" flag "a"'::jsonpath; ERROR: bad jsonpath representation LINE 1: select '$ ? (@ like_regex "pattern" flag "a"'::jsonpath; ^ DETAIL: unrecognized flag of LIKE_REGEX predicate at or near """ which leaves quite a lot to be desired already. The "bad whatever" error wording is a flat out violation of our message style guide, while the "at or near """" bit is pretty darn unhelpful. The latter problem occurs because the last flex production was <xq>\" { yylval->str = scanstring; BEGIN INITIAL; return STRING_P; } so that flex thinks the last token was just the quote mark ending the string. This could be improved on by adopting something similar to the SET_YYLLOC() convention used in scan.l to remember the start of what the user would think the token is. Probably it's not worth the work right now, but details like this are important from a fit-and-finish perspective, so I'd like to see it improved sometime. regards, tom lane
On Thu, Mar 28, 2019 at 5:55 AM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On 3/27/19 9:48 AM, Tom Lane wrote: > > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > >> Still no reproduction. > > Annoying, but it's probably not worth expending more effort on > > right now. I wonder whether that buildfarm animal can be upgraded > > to capture core dump stack traces --- if so, then if it happens > > again we'd have more info. > > I was able to get this stack trace. Thank you very much! It appears to be hard to investigate this even with backtrace. You told you can almost reliably reproduce this. Could you please, find exact commit caused this error using "git bisect"? I would very appreciate this. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 3/28/19 1:01 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> I was able to get this stack trace. >> >> (gdb) bt >> #0 0x00007ffb9ce6a458 in ntdll!RtlRaiseStatus () >> from C:\WINDOWS\SYSTEM32\ntdll.dll >> #1 0x00007ffb9ce7760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll >> #2 0x00007ffb9ac52e1a in msvcrt!_setjmpex () >> from C:\WINDOWS\System32\msvcrt.dll >> #3 0x000000000087431a in pg_re_throw () >> at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:1720 >> #4 0x0000000000874106 in errfinish (dummy=<optimized out>) >> at c:/MinGW/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:464 >> #5 0x00000000007cc938 in jsonpath_yyerror (result=result@entry=0x0, >> message=message@entry=0xab0868 <__func__.110231+1592> "unrecognized flag of LIKE_REGEX predicate") >> at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_scan.l:305 >> #6 0x00000000007cec9d in makeItemLikeRegex (pattern=<optimized out>, >> pattern=<optimized out>, flags=<optimized out>, expr=0x73c7a80) >> at /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonpath_gram.y:512 > Hmm. Reaching the yyerror call is expected given this input, but > seemingly the siglongjmp stack has been trashed? The best I can > think of is a wild store that either occurs only on this platform > or happens to be harmless elsewhere ... but neither idea is terribly > convincing. Further data point: if I just call the offending statement alone, there's no crash. The crash only occurs when I call the whole script. I'll see if I can triangulate a bit to get a minimal test case. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/28/19 5:38 AM, Alexander Korotkov wrote: > On Thu, Mar 28, 2019 at 5:55 AM Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: >> On 3/27/19 9:48 AM, Tom Lane wrote: >>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >>>> Still no reproduction. >>> Annoying, but it's probably not worth expending more effort on >>> right now. I wonder whether that buildfarm animal can be upgraded >>> to capture core dump stack traces --- if so, then if it happens >>> again we'd have more info. >> I was able to get this stack trace. > Thank you very much! It appears to be hard to investigate this even > with backtrace. You told you can almost reliably reproduce this. > Could you please, find exact commit caused this error using "git > bisect"? I would very appreciate this. > I'll try. It's time consuming given how long builds take. Here's an interesting data point. If I run the whole jsonpath.sql script it crashes every time. If I run just the offending statement it crashes exactly every other time. It looks like in that case something gets clobbered and then cleared. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Mar 28, 2019 at 3:25 PM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On 3/28/19 5:38 AM, Alexander Korotkov wrote: > > On Thu, Mar 28, 2019 at 5:55 AM Andrew Dunstan > > <andrew.dunstan@2ndquadrant.com> wrote: > >> On 3/27/19 9:48 AM, Tom Lane wrote: > >>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > >>>> Still no reproduction. > >>> Annoying, but it's probably not worth expending more effort on > >>> right now. I wonder whether that buildfarm animal can be upgraded > >>> to capture core dump stack traces --- if so, then if it happens > >>> again we'd have more info. > >> I was able to get this stack trace. > > Thank you very much! It appears to be hard to investigate this even > > with backtrace. You told you can almost reliably reproduce this. > > Could you please, find exact commit caused this error using "git > > bisect"? I would very appreciate this. > > I'll try. It's time consuming given how long builds take. Thank you very much for your efforts! > Here's an interesting data point. If I run the whole jsonpath.sql script > it crashes every time. If I run just the offending statement it crashes > exactly every other time. It looks like in that case something gets > clobbered and then cleared. Could you clarify this a bit? What is exactly every other time? Do you mean it doesn't crash first time, but crashes second time? Do you run each offending statement in the separate session? Or do you run multiple offending statements in the same session? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 3/28/19 8:49 AM, Alexander Korotkov wrote: > >> Here's an interesting data point. If I run the whole jsonpath.sql script >> it crashes every time. If I run just the offending statement it crashes >> exactly every other time. It looks like in that case something gets >> clobbered and then cleared. > Could you clarify this a bit? What is exactly every other time? Do > you mean it doesn't crash first time, but crashes second time? Do you > run each offending statement in the separate session? Or do you run > multiple offending statements in the same session? > I mean repeated invocations of psql -c "select '$ ? (@ like_regex \"pattern\" flag \"a\")'::jsonpath" These get crash, no crash, crash, no crash ... cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > I mean repeated invocations of > psql -c "select '$ ? (@ like_regex \"pattern\" flag \"a\")'::jsonpath" > These get crash, no crash, crash, no crash ... That is just wacko ... but it does seem to support my thought of a wild store somewhere. The mechanism for this case might be that memory layout is different depending on whether we had to rebuild the relcache init file at session start or not. Similarly, the fact that the full test script reliably crashes might be dependent on previous commands having left things in a certain state. Unfortunately that gets us little closer to a fix. Has anybody gotten through a valgrind run on this code yet? regards, tom lane
On Thu, Mar 28, 2019 at 4:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > > I mean repeated invocations of > > psql -c "select '$ ? (@ like_regex \"pattern\" flag \"a\")'::jsonpath" > > These get crash, no crash, crash, no crash ... > > That is just wacko ... but it does seem to support my thought of > a wild store somewhere. The mechanism for this case might be > that memory layout is different depending on whether we had to > rebuild the relcache init file at session start or not. Similarly, > the fact that the full test script reliably crashes might be > dependent on previous commands having left things in a certain > state. Unfortunately that gets us little closer to a fix. > > Has anybody gotten through a valgrind run on this code yet? I've [1]. Find single backtrace, but it doesn't seem to be related to our issue. 1. https://www.postgresql.org/message-id/CAPpHfdsgyPKbaqOsJ4tyC97Ybpm69eGLHzBGLKYXsfJi%3Dc-fjA%40mail.gmail.com ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> I mean repeated invocations of >> psql -c "select '$ ? (@ like_regex \"pattern\" flag >\"a\")'::jsonpath" >> These get crash, no crash, crash, no crash ... > >That is just wacko ... but it does seem to support my thought of >a wild store somewhere. The mechanism for this case might be >that memory layout is different depending on whether we had to >rebuild the relcache init file at session start or not. Similarly, >the fact that the full test script reliably crashes might be >dependent on previous commands having left things in a certain >state. Unfortunately that gets us little closer to a fix. > >Has anybody gotten through a valgrind run on this code yet? Skink has successfully passed since - but that's x86... -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Has anybody gotten through a valgrind run on this code yet? > Skink has successfully passed since - but that's x86... Yeah, there is a depressingly high chance that this is somehow specific to the bison version, flex version, and/or compiler in use on jacana. regards, tom lane
On 3/28/19 9:50 AM, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: >> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Has anybody gotten through a valgrind run on this code yet? >> Skink has successfully passed since - but that's x86... > Yeah, there is a depressingly high chance that this is somehow specific > to the bison version, flex version, and/or compiler in use on jacana. > > lousyjack has also passed it (x64). git bisect on jacana blames commit 550b9d26f. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Mar 28, 2019 at 7:43 PM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On 3/28/19 9:50 AM, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > >> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Has anybody gotten through a valgrind run on this code yet? > >> Skink has successfully passed since - but that's x86... > > Yeah, there is a depressingly high chance that this is somehow specific > > to the bison version, flex version, and/or compiler in use on jacana. > > lousyjack has also passed it (x64). > > git bisect on jacana blames commit 550b9d26f. Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l compile at once. I've re-read this commit and didn't find anything suspicious. I've asked Andrew for access to jacana in order to investigate this myself. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Mar 29, 2019 at 4:15 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Thu, Mar 28, 2019 at 7:43 PM Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: > > On 3/28/19 9:50 AM, Tom Lane wrote: > > > Andres Freund <andres@anarazel.de> writes: > > >> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >>> Has anybody gotten through a valgrind run on this code yet? > > >> Skink has successfully passed since - but that's x86... > > > Yeah, there is a depressingly high chance that this is somehow specific > > > to the bison version, flex version, and/or compiler in use on jacana. > > > > lousyjack has also passed it (x64). > > > > git bisect on jacana blames commit 550b9d26f. > > Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l > compile at once. I've re-read this commit and didn't find anything > suspicious. > I've asked Andrew for access to jacana in order to investigate this myself. I'm going to push there 3 attached patches for jsonpath. 1st one is revised patch implementing GIN index support for jsonpath. Based on feedback from Jonathan Katz, I decided to restrict jsonb_ops from querying only case. Now, jsonb_ops also works on if "accessors_chain = const" statement is found. Keys are frequently not selective. Given we have no statistics of them, purely key GIN queries are likely confuse optimizer making it select inefficient plan. Actually, jsonb_ops may be used for pure key query when ? operator is used. But in this case, user explicitly searches for key. With jsonpath, purely key GIN searches can easily happen unintended. So, restrict that. 2nd and 3rd patches are from Nikita Glukhov upthread. 2nd restrict some cases in parsing numerics. 3rd make jsonb_path_match() function throw error when result is not single boolean and silent mode is off. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Alexander Korotkov wrote: > Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l > compile at once. I've re-read this commit and didn't find anything > suspicious. > I've asked Andrew for access to jacana in order to investigate this myself. Stack trace from skate: [New LWP 6614] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1". Core was generated by `postgres: pgbf regression [local] SELECT '. Program terminated with signal 11, Segmentation fault. #0 strlen () at ../sysdeps/sparc/sparc64/strlen.S:34 34 ldx [%o0], %o5 #0 strlen () at ../sysdeps/sparc/sparc64/strlen.S:34 #1 0x0008a3e4 in printtup (slot=0x834888, self=0x864cc0) at printtup.c:435 #2 0x00259b60 in ExecutePlan (execute_once=<optimized out>, dest=0x864cc0, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x833eb0, estate=0x833d70) at execMain.c:1686 #3 standard_ExecutorRun (queryDesc=0x7dcdc0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:365 #4 0x00259da0 in ExecutorRun (queryDesc=0x808920, queryDesc@entry=0x7dcdc0, direction=direction@entry=ForwardScanDirection, count=8801472, execute_once=<optimized out>) at execMain.c:309 #5 0x003e6714 in PortalRunSelect (portal=portal@entry=0x808920, forward=forward@entry=true, count=0, count@entry=2147483647, dest=dest@entry=0x864cc0) at pquery.c:929 #6 0x003e7d3c in PortalRun (portal=portal@entry=0x808920, count=count@entry=2147483647, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x864cc0, altdest=altdest@entry=0x864cc0, completionTag=completionTag@entry=0xff86e830 "") at pquery.c:770 #7 0x003e32d0 in exec_simple_query ( query_string=0x7ba400 "select '$.g ? (@.a == 1 || @.a == 4 && @.b == 7)'::jsonpath;") at postgres.c:1215 #8 0x003e4854 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x7e2370, dbname=0x7e2148 "regression", username=<optimized out>) at postgres.c:4247 #9 0x0007ae6c in BackendRun (port=0x7ddd40) at postmaster.c:4399 #10 BackendStartup (port=0x7ddd40) at postmaster.c:4090 #11 ServerLoop () at postmaster.c:1703 #12 0x00353a68 in PostmasterMain (argc=argc@entry=6, argv=argv@entry=0x7b49a8) at postmaster.c:1376 #13 0x0007cc60 in main (argc=6, argv=0x7b49a8) at main.c:228 Does this help? Best regards, Tom Turelinckx
Alexander Korotkov wrote: > Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l > compile at once. I've re-read this commit and didn't find anything > suspicious. > I've asked Andrew for access to jacana in order to investigate this myself. Stack trace from skate: [New LWP 6614] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1". Core was generated by `postgres: pgbf regression [local] SELECT '. Program terminated with signal 11, Segmentation fault. #0 strlen () at ../sysdeps/sparc/sparc64/strlen.S:34 34 ldx [%o0], %o5 #0 strlen () at ../sysdeps/sparc/sparc64/strlen.S:34 #1 0x0008a3e4 in printtup (slot=0x834888, self=0x864cc0) at printtup.c:435 #2 0x00259b60 in ExecutePlan (execute_once=<optimized out>, dest=0x864cc0, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x833eb0, estate=0x833d70) at execMain.c:1686 #3 standard_ExecutorRun (queryDesc=0x7dcdc0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:365 #4 0x00259da0 in ExecutorRun (queryDesc=0x808920, queryDesc@entry=0x7dcdc0, direction=direction@entry=ForwardScanDirection, count=8801472, execute_once=<optimized out>) at execMain.c:309 #5 0x003e6714 in PortalRunSelect (portal=portal@entry=0x808920, forward=forward@entry=true, count=0, count@entry=2147483647, dest=dest@entry=0x864cc0) at pquery.c:929 #6 0x003e7d3c in PortalRun (portal=portal@entry=0x808920, count=count@entry=2147483647, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x864cc0, altdest=altdest@entry=0x864cc0, completionTag=completionTag@entry=0xff86e830 "") at pquery.c:770 #7 0x003e32d0 in exec_simple_query ( query_string=0x7ba400 "select '$.g ? (@.a == 1 || @.a == 4 && @.b == 7)'::jsonpath;") at postgres.c:1215 #8 0x003e4854 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x7e2370, dbname=0x7e2148 "regression", username=<optimized out>) at postgres.c:4247 #9 0x0007ae6c in BackendRun (port=0x7ddd40) at postmaster.c:4399 #10 BackendStartup (port=0x7ddd40) at postmaster.c:4090 #11 ServerLoop () at postmaster.c:1703 #12 0x00353a68 in PostmasterMain (argc=argc@entry=6, argv=argv@entry=0x7b49a8) at postmaster.c:1376 #13 0x0007cc60 in main (argc=6, argv=0x7b49a8) at main.c:228 Does this help? Best regards, Tom Turelinckx
"Tom Turelinckx" <tom@turelinckx.be> writes: > Stack trace from skate: Huh ... so that's nowhere near the jsonpath-syntax-error crash that we saw before. I assume this trace is from this run? https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skate&dt=2019-03-31%2006%3A24%3A35 That looks a whole lot like the previous failure on snapper: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2019-03-23%2013%3A01%3A28 right down to it having gotten through "make check" only to fail when the same regression tests are run again during the pg_upgrade test. I wonder if there's any real significance to that, or if it's just that the failure is not very repeatable. BTW, what is the difference between skate and snapper? They look to be running on the same machine. regards, tom lane
Tom Lane wrote: > I assume this trace is from this run? > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skate&dt=2019-03-31%2006%3A24%3A35 Yes. I did get a core file now, but it wasn't picked up by the buildfarm script, so I extracted the backtrace manually. > That looks a whole lot like the previous failure on snapper: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2019-03-23%2013%3A01%3A28 That's what I meant. > Huh ... so that's nowhere near the jsonpath-syntax-error crash that > we saw before. Sorry, I wasn't aware there were multiple crashes. > BTW, what is the difference between skate and snapper? They look to > be running on the same machine. They are. Skate runs with default buildfarm options. Snapper mimics the options used by the pgdg debian source packages. Both build the same source (first skate then snapper). This to avoid a repeat of [1]. Snapper also runs more tests (UpgradeXversion, CollateLinuxUTF8). Best regards, Tom Turelinckx 1. https://www.postgresql.org/message-id/20160413175827.dmlbtdf7c3mgmnex%40alap3.anarazel.de
Tom Lane wrote: > I assume this trace is from this run? > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skate&dt=2019-03-31%2006%3A24%3A35 Yes. I did get a core file now, but it wasn't picked up by the buildfarm script, so I extracted the backtrace manually. > That looks a whole lot like the previous failure on snapper: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2019-03-23%2013%3A01%3A28 That's what I meant. > Huh ... so that's nowhere near the jsonpath-syntax-error crash that > we saw before. Sorry, I wasn't aware there were multiple crashes. > BTW, what is the difference between skate and snapper? They look to > be running on the same machine. They are. Skate runs with default buildfarm options. Snapper mimics the options used by the pgdg debian source packages. Both build the same source (first skate then snapper). This to avoid a repeat of [1]. Snapper also runs more tests (UpgradeXversion, CollateLinuxUTF8). Best regards, Tom Turelinckx 1. https://www.postgresql.org/message-id/20160413175827.dmlbtdf7c3mgmnex%40alap3.anarazel.de
On 3/31/19 12:21 PM, Tom Turelinckx wrote: > Tom Lane wrote: > >> I assume this trace is from this run? >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skate&dt=2019-03-31%2006%3A24%3A35 > Yes. I did get a core file now, but it wasn't picked up by the buildfarm > script, so I extracted the backtrace manually. > > I have just committed a patch that should result in stack traces being picked up in pg_upgrade testing. See <https://github.com/PGBuildFarm/client-code/commit/51889e9dd86dd10f7b9444cb62eebb7f8baa989e> You should be able to drop the updated file in place, get it from <https://raw.githubusercontent.com/PGBuildFarm/client-code/51889e9dd86dd10f7b9444cb62eebb7f8baa989e/PGBuild/Modules/TestUpgrade.pm> There will be a buildfarm release out very soon that includes this. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-03-30 18:25:12 +0300, Alexander Korotkov wrote: > I'm going to push there 3 attached patches for jsonpath. I noticed that https://commitfest.postgresql.org/22/1472/ https://commitfest.postgresql.org/22/1473/ are still open and marked as needs-review. Is there a reason for that? - Andres
On Wed, Apr 3, 2019 at 11:03 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-03-30 18:25:12 +0300, Alexander Korotkov wrote: > > I'm going to push there 3 attached patches for jsonpath. > > I noticed that > https://commitfest.postgresql.org/22/1472/ > https://commitfest.postgresql.org/22/1473/ > are still open and marked as needs-review. Is there a reason for that? Nope. I've moved both to the next CF. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Mar 29, 2019 at 4:15 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > On Thu, Mar 28, 2019 at 7:43 PM Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: > > On 3/28/19 9:50 AM, Tom Lane wrote: > > > Andres Freund <andres@anarazel.de> writes: > > >> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >>> Has anybody gotten through a valgrind run on this code yet? > > >> Skink has successfully passed since - but that's x86... > > > Yeah, there is a depressingly high chance that this is somehow specific > > > to the bison version, flex version, and/or compiler in use on jacana. > > > > lousyjack has also passed it (x64). > > > > git bisect on jacana blames commit 550b9d26f. > > Hmm... 550b9d26f just makes jsonpath_gram.y and jsonpath_scan.l > compile at once. I've re-read this commit and didn't find anything > suspicious. > I've asked Andrew for access to jacana in order to investigate this myself. Thanks to Andrew I got access to jacana and made some investigation. At first, I found that existence of separate jsonpath_gram.h doesn't influence the situation. If have jsonpath_gram.h generated, test still fails if compile jsonpath_gram.c and jsonpath_scan.c together. But if build them separately, error is gone. Then I did following trick: build jsonpath_gram.c and jsonpath_scan.c separately, but copy contents of jsonpath_gram.c to the top of jsonpath_scan.c. I also renamed yyparse to yyparse2 in the copy of jsonpath_gram.c in order to make jsonpath_scan.c use another copy of this function defined in the separate file. Then test fails again. After that, I found if I remove contents of yyparse2 function, then test passes OK. See versions of jsonpath_scan.c attached. Thus, contents of unused function makes test fail or pass. So far, it looks like a compiler bug. Any thoughts? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > Thus, contents of unused function makes test fail or pass. So far, it > looks like a compiler bug. Any thoughts? Yeah :-(. The fact that we've not seen a similar failure on any other machines points in that direction, too. Maybe it's some other aspect of the machine's toolchain, like flex or bison, but that's not that much different from our standpoint. There's a lot of stuff I don't especially like about jsonpath_scan, for instance I think the "init" arguments to resizeString etc are a pretty error-prone and unmaintainable way to do things. But I don't see anything that looks like it'd be a portability hazard that would explain this. I still have a nagging feeling that there's a wild store somewhere in here, but I don't know how to find it based on the limited evidence we've got. regards, tom lane
On Sun, Apr 7, 2019 at 2:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > Thus, contents of unused function makes test fail or pass. So far, it > > looks like a compiler bug. Any thoughts? > > Yeah :-(. The fact that we've not seen a similar failure on any other > machines points in that direction, too. Maybe it's some other aspect > of the machine's toolchain, like flex or bison, but that's not that > much different from our standpoint. > > There's a lot of stuff I don't especially like about jsonpath_scan, > for instance I think the "init" arguments to resizeString etc are > a pretty error-prone and unmaintainable way to do things. But > I don't see anything that looks like it'd be a portability hazard > that would explain this. > > I still have a nagging feeling that there's a wild store somewhere > in here, but I don't know how to find it based on the limited > evidence we've got. Yeah, it might be not because compiler error. It might depend on memory layout. So existence of extra function changes memory layout and, in turn, causes an error. I will try to disassemble jsonpath_scan.o and see whether content of yyparse2 influences assembly of other functions. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sun, Apr 07, 2019 at 03:03:58AM +0300, Alexander Korotkov wrote: >On Sun, Apr 7, 2019 at 2:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >> > Thus, contents of unused function makes test fail or pass. So far, it >> > looks like a compiler bug. Any thoughts? >> >> Yeah :-(. The fact that we've not seen a similar failure on any other >> machines points in that direction, too. Maybe it's some other aspect >> of the machine's toolchain, like flex or bison, but that's not that >> much different from our standpoint. >> >> There's a lot of stuff I don't especially like about jsonpath_scan, >> for instance I think the "init" arguments to resizeString etc are >> a pretty error-prone and unmaintainable way to do things. But >> I don't see anything that looks like it'd be a portability hazard >> that would explain this. >> >> I still have a nagging feeling that there's a wild store somewhere >> in here, but I don't know how to find it based on the limited >> evidence we've got. > >Yeah, it might be not because compiler error. It might depend on >memory layout. So existence of extra function changes memory layout >and, in turn, causes an error. I will try to disassemble >jsonpath_scan.o and see whether content of yyparse2 influences >assembly of other functions. > Have you tried other compiler version / different optimization level? Or running it under valgrind. Not sure how difficult that is on Windows. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 9, 2019 at 12:16 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > Have you tried other compiler version / different optimization level? Or > running it under valgrind. Not sure how difficult that is on Windows. > > It's not possible AFAIK. I will look at different compilers when I am back local to the machine later today. It doesn't help that jacana is currently the only buildfarm machine building master on Mingw. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/28/19 12:43 PM, Andrew Dunstan wrote: > On 3/28/19 9:50 AM, Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >>> On March 28, 2019 9:31:14 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Has anybody gotten through a valgrind run on this code yet? >>> Skink has successfully passed since - but that's x86... >> Yeah, there is a depressingly high chance that this is somehow specific >> to the bison version, flex version, and/or compiler in use on jacana. >> >> > > > lousyjack has also passed it (x64). > > > git bisect on jacana blames commit 550b9d26f. OK, I have tried this with an earlier compiler (gcc 7.3.0 vs gcc 8.1.0) and the problem disappears. So I think we can put this down as a compiler bug. I will downgrade jacana to 7.3.0. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > OK, I have tried this with an earlier compiler (gcc 7.3.0 vs gcc 8.1.0) > and the problem disappears. So I think we can put this down as a > compiler bug. I will downgrade jacana to 7.3.0. OK. There is still the matter of those two failures on skate and snapper, which don't look the same as jacana's issue but nonetheless seem to be the jsonpath patch's fault. However, since neither animal has yet reproduced those failures, I don't know how to investigate further, or whether it's even worth considering that to be an open item. For the record, I wasted a fair amount of time last week trying to duplicate the skate/snapper failures by setting up a sparc qemu VM with more or less the same Debian version they're using. No luck. Recent versions of Debian Wheezy wouldn't boot; I did manage to get 7.6.0 to install and run, er crawl, but the failure didn't manifest in quite a few tries. regards, tom lane
On Tue, Apr 9, 2019 at 7:16 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sun, Apr 07, 2019 at 03:03:58AM +0300, Alexander Korotkov wrote: > >On Sun, Apr 7, 2019 at 2:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > >> > Thus, contents of unused function makes test fail or pass. So far, it > >> > looks like a compiler bug. Any thoughts? > >> > >> Yeah :-(. The fact that we've not seen a similar failure on any other > >> machines points in that direction, too. Maybe it's some other aspect > >> of the machine's toolchain, like flex or bison, but that's not that > >> much different from our standpoint. > >> > >> There's a lot of stuff I don't especially like about jsonpath_scan, > >> for instance I think the "init" arguments to resizeString etc are > >> a pretty error-prone and unmaintainable way to do things. But > >> I don't see anything that looks like it'd be a portability hazard > >> that would explain this. > >> > >> I still have a nagging feeling that there's a wild store somewhere > >> in here, but I don't know how to find it based on the limited > >> evidence we've got. > > > >Yeah, it might be not because compiler error. It might depend on > >memory layout. So existence of extra function changes memory layout > >and, in turn, causes an error. I will try to disassemble > >jsonpath_scan.o and see whether content of yyparse2 influences > >assembly of other functions. > > > > Have you tried other compiler version / different optimization level? Error goes away with -O0. Or I just didn't manage to reproduce that. In my observation error depends on memory layout or something. So, it might be that I just didn't manage to reproduce it with -O0 while it really still persists. I didn't try other compilers yet. > Or running it under valgrind. Not sure how difficult that is on Windows. Valgrind isn't installed there. I'm not sure how to do that, but I will probably try. The interesting thing is that on failure I got following backtrace. #0 0x00007ff94f86a458 in ntdll!RtlRaiseStatus () from C:\WINDOWS\SYSTEM32\ntdll.dll #1 0x00007ff94f87760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll #2 0x00007ff94dc42e1a in msvcrt!_setjmpex () from C:\WINDOWS\System32\msvcrt.dll #3 0x000000000086a37a in pg_re_throw () at elog.c:1720 #4 0x000000000086a166 in errfinish (dummy=<optimized out>) at elog.c:464 #5 0x00000000007c3d18 in jsonpath_yyerror (result=result@entry=0x0, message=message@entry=0xa87d38 <__func__.110220+1512> "unrecognized flag of LIKE_REGEX predicate") at jsonpath_scan.l:276 #6 0x00000000007c5f3d in makeItemLikeRegex (pattern=<optimized out>, pattern=<optimized out>, flags=<optimized out>, expr=0x7216760) at jsonpath_gram.y:500 #7 jsonpath_yyparse (result=<optimized out>, result@entry=0x495e818) at jsonpath_gram.y:178 So, error happens inside implementation of siglongjmp(). I've checked that contents of *PG_exception_stack didn't change since previous successfully thrown error. Probably this implementation of long jump saves some part of state outside of sigjmp_buf and that part is corrupt. Any ideas? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 4/14/19 10:43 PM, Alexander Korotkov wrote: > On Tue, Apr 9, 2019 at 7:16 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> On Sun, Apr 07, 2019 at 03:03:58AM +0300, Alexander Korotkov wrote: >>> On Sun, Apr 7, 2019 at 2:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >>>>> Thus, contents of unused function makes test fail or pass. So far, it >>>>> looks like a compiler bug. Any thoughts? >>>> Yeah :-(. The fact that we've not seen a similar failure on any other >>>> machines points in that direction, too. Maybe it's some other aspect >>>> of the machine's toolchain, like flex or bison, but that's not that >>>> much different from our standpoint. >>>> >>>> There's a lot of stuff I don't especially like about jsonpath_scan, >>>> for instance I think the "init" arguments to resizeString etc are >>>> a pretty error-prone and unmaintainable way to do things. But >>>> I don't see anything that looks like it'd be a portability hazard >>>> that would explain this. >>>> >>>> I still have a nagging feeling that there's a wild store somewhere >>>> in here, but I don't know how to find it based on the limited >>>> evidence we've got. >>> Yeah, it might be not because compiler error. It might depend on >>> memory layout. So existence of extra function changes memory layout >>> and, in turn, causes an error. I will try to disassemble >>> jsonpath_scan.o and see whether content of yyparse2 influences >>> assembly of other functions. >>> >> Have you tried other compiler version / different optimization level? > Error goes away with -O0. Or I just didn't manage to reproduce that. > In my observation error depends on memory layout or something. So, it > might be that I just didn't manage to reproduce it with -O0 while it > really still persists. I didn't try other compilers yet. > >> Or running it under valgrind. Not sure how difficult that is on Windows. > Valgrind isn't installed there. I'm not sure how to do that, but I > will probably try. > > The interesting thing is that on failure I got following backtrace. > > #0 0x00007ff94f86a458 in ntdll!RtlRaiseStatus () from > C:\WINDOWS\SYSTEM32\ntdll.dll > #1 0x00007ff94f87760e in ntdll!memset () from C:\WINDOWS\SYSTEM32\ntdll.dll > #2 0x00007ff94dc42e1a in msvcrt!_setjmpex () from > C:\WINDOWS\System32\msvcrt.dll > #3 0x000000000086a37a in pg_re_throw () at elog.c:1720 > #4 0x000000000086a166 in errfinish (dummy=<optimized out>) at elog.c:464 > #5 0x00000000007c3d18 in jsonpath_yyerror (result=result@entry=0x0, > message=message@entry=0xa87d38 <__func__.110220+1512> > "unrecognized flag of LIKE_REGEX predicate") at jsonpath_scan.l:276 > #6 0x00000000007c5f3d in makeItemLikeRegex (pattern=<optimized out>, > pattern=<optimized out>, flags=<optimized out>, expr=0x7216760) at > jsonpath_gram.y:500 > #7 jsonpath_yyparse (result=<optimized out>, result@entry=0x495e818) > at jsonpath_gram.y:178 > > So, error happens inside implementation of siglongjmp(). I've checked > that contents of *PG_exception_stack didn't change since previous > successfully thrown error. Probably this implementation of long jump > saves some part of state outside of sigjmp_buf and that part is > corrupt. Any ideas? > I have downgraded jacana to gcc 7.3.0, which has resolved the problem. I'm still a bit worried that we're clobbering the stack somehow though. I have no idea how to test that. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attached is a patch to fix some minor issues: -misspelling of an error message -Commit 550b9d26f80f failed to update the Makefile comment to reflect how the build changed, and also removed the clean target, which we now have use for since we later got rid of backtracking in the scanner. Also, while I have the thought in my head, for v13 we should consider replacing the keyword binary search with the perfect hash technique added in c64d0cd5ce2 -- it might give a small performance boost to the scanner. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
John Naylor <john.naylor@2ndquadrant.com> writes: > Attached is a patch to fix some minor issues: > -misspelling of an error message Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch needs a sweep to bring its error messages into line with our style guidelines, but no harm in starting with the obvious bugs. > -Commit 550b9d26f80f failed to update the Makefile comment to reflect > how the build changed, and also removed the clean target, which we now > have use for since we later got rid of backtracking in the scanner. Right. I'm not really sure why we're bothering with anti-backtracking here, or with using speed-rather-than-code-space lexer optimization options. It's hard for me to credit that any practically-useful jsonpath pattern would be long enough for lexer speed to matter, and even harder to credit that the speed of the flex code itself would be an important factor in the overall processing cost of a long jsonpath. Still, as long as we have the code it needs to be right. > Also, while I have the thought in my head, for v13 we should consider > replacing the keyword binary search with the perfect hash technique > added in c64d0cd5ce2 -- it might give a small performance boost to the > scanner. I doubt it's worth the trouble, per above. Patch LGTM, pushed. regards, tom lane
On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > John Naylor <john.naylor@2ndquadrant.com> writes: > > Attached is a patch to fix some minor issues: > > -misspelling of an error message > > Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch > needs a sweep to bring its error messages into line with our style > guidelines, but no harm in starting with the obvious bugs. I'll go trough the jsonpath error messages and post a patch for fixing them. > > -Commit 550b9d26f80f failed to update the Makefile comment to reflect > > how the build changed, and also removed the clean target, which we now > > have use for since we later got rid of backtracking in the scanner. > > Right. I'm not really sure why we're bothering with anti-backtracking > here, or with using speed-rather-than-code-space lexer optimization > options. It's hard for me to credit that any practically-useful jsonpath > pattern would be long enough for lexer speed to matter, and even harder to > credit that the speed of the flex code itself would be an important factor > in the overall processing cost of a long jsonpath. Still, as long as we > have the code it needs to be right. Actually I found that non of in-core lexers are backtracking. So, I understood no backtracking as kind of standard and didn't want to break that :) Nevertheless, I could imagine use-case involving parsing a lot of jsonpath'es. For example we may construct jsonpath based on table data and check that for just few jsonb's. For sure, that wouldn't be a common use-case, but still. > > Also, while I have the thought in my head, for v13 we should consider > > replacing the keyword binary search with the perfect hash technique > > added in c64d0cd5ce2 -- it might give a small performance boost to the > > scanner. > > I doubt it's worth the trouble, per above. > > Patch LGTM, pushed. Thank you for pushing this! ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Thu, Apr 18, 2019 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > John Naylor <john.naylor@2ndquadrant.com> writes: > > Attached is a patch to fix some minor issues: > > -misspelling of an error message > > Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch > needs a sweep to bring its error messages into line with our style > guidelines, but no harm in starting with the obvious bugs. > > > -Commit 550b9d26f80f failed to update the Makefile comment to reflect > > how the build changed, and also removed the clean target, which we now > > have use for since we later got rid of backtracking in the scanner. > > Right. I'm not really sure why we're bothering with anti-backtracking > here, or with using speed-rather-than-code-space lexer optimization > options. It's hard for me to credit that any practically-useful jsonpath > pattern would be long enough for lexer speed to matter, and even harder to > credit that the speed of the flex code itself would be an important factor > in the overall processing cost of a long jsonpath. Still, as long as we > have the code it needs to be right. I was wondering about that. I measured the current size of yy_transition to be 36492 on my machine. With the flag -Cfe, which gives the smallest representation without backtracking, yy_nxt is 6336 (there is no yy_transition). I'd say that's a large enough difference that we'd want the smaller representation if it makes little difference in performance. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 17, 2019 at 11:14 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > John Naylor <john.naylor@2ndquadrant.com> writes: > > > Attached is a patch to fix some minor issues: > > > -misspelling of an error message > > > > Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch > > needs a sweep to bring its error messages into line with our style > > guidelines, but no harm in starting with the obvious bugs. > > I'll go trough the jsonpath error messages and post a patch for fixing them. > > > > -Commit 550b9d26f80f failed to update the Makefile comment to reflect > > > how the build changed, and also removed the clean target, which we now > > > have use for since we later got rid of backtracking in the scanner. > > > > Right. I'm not really sure why we're bothering with anti-backtracking > > here, or with using speed-rather-than-code-space lexer optimization > > options. It's hard for me to credit that any practically-useful jsonpath > > pattern would be long enough for lexer speed to matter, and even harder to > > credit that the speed of the flex code itself would be an important factor > > in the overall processing cost of a long jsonpath. Still, as long as we > > have the code it needs to be right. > > Actually I found that non of in-core lexers are backtracking. So, I > understood no backtracking as kind of standard and didn't want to > break that :) > > Nevertheless, I could imagine use-case involving parsing a lot of > jsonpath'es. For example we may construct jsonpath based on table > data and check that for just few jsonb's. For sure, that wouldn't be > a common use-case, but still. > > > > Also, while I have the thought in my head, for v13 we should consider > > > replacing the keyword binary search with the perfect hash technique > > > added in c64d0cd5ce2 -- it might give a small performance boost to the > > > scanner. > > > > I doubt it's worth the trouble, per above. > > > > Patch LGTM, pushed. > > Thank you for pushing this! I went trough the jsonpath errors and made some corrections. See the attached patch. One thing makes me uneasy. jsonpath_yyerror() substitutes its "message" argument to the errdetail(). Out style guide requires errdetails to be complete sentences, while bison passes non-capitalized error messages. Should we bother capitalize first character of "message"? I wonder how "portable" this solution would be for various languages. cubescan.l reports error in the similar way to jsonpath and doesn't bother about making errdetail a complete sentence. Or should we move bison error to errmsg()? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Thu, Apr 18, 2019 at 4:09 AM John Naylor <john.naylor@2ndquadrant.com> wrote: > On Thu, Apr 18, 2019 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > John Naylor <john.naylor@2ndquadrant.com> writes: > > > Attached is a patch to fix some minor issues: > > > -misspelling of an error message > > > > Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch > > needs a sweep to bring its error messages into line with our style > > guidelines, but no harm in starting with the obvious bugs. > > > > > -Commit 550b9d26f80f failed to update the Makefile comment to reflect > > > how the build changed, and also removed the clean target, which we now > > > have use for since we later got rid of backtracking in the scanner. > > > > Right. I'm not really sure why we're bothering with anti-backtracking > > here, or with using speed-rather-than-code-space lexer optimization > > options. It's hard for me to credit that any practically-useful jsonpath > > pattern would be long enough for lexer speed to matter, and even harder to > > credit that the speed of the flex code itself would be an important factor > > in the overall processing cost of a long jsonpath. Still, as long as we > > have the code it needs to be right. > > I was wondering about that. I measured the current size of > yy_transition to be 36492 on my machine. With the flag -Cfe, which > gives the smallest representation without backtracking, yy_nxt is 6336 > (there is no yy_transition). I'd say that's a large enough difference > that we'd want the smaller representation if it makes little > difference in performance. Did I understand correctly that you've tried the same version of jsonpath_scan.l with different flex flags? Did you also notice if changes 1d88a75c made to jsonpath_scan.l have singnificant influence? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: >> On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch >>> needs a sweep to bring its error messages into line with our style >>> guidelines, but no harm in starting with the obvious bugs. > I went trough the jsonpath errors and made some corrections. See the > attached patch. Please don't do this sort of change: - elog(ERROR, "unrecognized jsonpath item type: %d", item->type); + ereport(ERROR, + (errcode(ERRCODE_INTERNAL_ERROR), + errmsg("unrecognized jsonpath item type: %d", item->type))); elog() is the appropriate thing for shouldn't-happen internal errors like these. The only thing you've changed here, aside from making the source code longer, is to expose the error message for translation ... which is really just wasting translators' time. Only messages we actually think users might need to deal with should be exposed for translation. @@ -623,7 +624,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, ereport(ERROR, (errcode(ERRCODE_JSON_MEMBER_NOT_FOUND), \ errmsg(ERRMSG_JSON_MEMBER_NOT_FOUND), - errdetail("JSON object does not contain key %s", + errdetail("JSON object does not contain key %s.", keybuf.data))); } } OK as far as it went, but you should also put double quotes around the %s. (I also noticed some messages that are using single-quotes around interpolated strings, which is not the project standard either.) Other specific things I wanted to see fixed: * jsonpath_scan.l has some messages like "bad ..." which is not project style; use "invalid" or "unrecognized". (There's probably no good reason not to use the same string "invalid input syntax for type jsonpath" that is used elsewhere.) * This in jsonpath_gram.y is quite unhelpful: yyerror(NULL, "unrecognized flag of LIKE_REGEX predicate"); since it doesn't tell you what flag character it doesn't like (and the error positioning info isn't accurate enough to let the user figure that out). It really needs to be something more like "unrecognized flag character \"%c\" in LIKE_REGEX predicate". That probably means you can't use yyerror for this, but I don't think yyerror was providing any useful functionality anyway :-( More generally, I'm not very much on board with this coding technique: /* Standard error message for SQL/JSON errors */ #define ERRMSG_JSON_ARRAY_NOT_FOUND "SQL/JSON array not found" ... RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND), errdetail("Jsonpath wildcard array accessor " In the first place, I'm not certain that this will result in the error message being translatable --- do the gettext tools know how to expand macros? In the second place, the actual strings are just restatements of their ERRMSG macro names, which IMO is not conformant to our message style, but it's too hard to see that from source code like this. Also this style is pretty unworkable/unfriendly if the message needs to contain any %-markers, so I suspect that having a coding style like this may be discouraging you from providing values in places where it'd be helpful to do so. What I actually see happening as a consequence of this approach is that you're pushing the useful information off to an errdetail, which is not really helpful and it's not per project style either. The idea is to make the primary message as helpful as possible without being long, not to make it a simple restatement of the SQLSTATE that nobody can understand without also looking at the errdetail. In the third place, this makes it hard for people to grep for occurrences of an error string in our source code. And in the fourth place, we don't do this elsewhere; it does not help anybody for jsonpath to invent its own coding conventions that are unlike the rest of Postgres. So I think you should drop the ERRMSG_xxx macros, write out these error messages where they are used, and rethink your use of errmsg vs. errdetail. Along the same line of not making it unnecessarily hard for people to grep for error texts, it's best not to split texts across lines like this: RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_INVALID_JSON_SUBSCRIPT), errmsg(ERRMSG_INVALID_JSON_SUBSCRIPT), errdetail("Jsonpath array subscript is not a " "singleton numeric value.")))); Somebody grepping for "not a singleton" would not get a hit on that, which could be quite misleading if they do get hits elsewhere. I think for the most part people have decided that it's better to have overly long source lines than to break up error message literals. It's especially pointless to break up source lines when the result still doesn't fit in 80 columns. regards, tom lane
Hi! Thank you for your review! On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > RETURN_ERROR(ereport(ERROR, > (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), > errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND), > errdetail("Jsonpath wildcard array accessor " > > In the first place, I'm not certain that this will result in the error > message being translatable --- do the gettext tools know how to expand > macros? > > In the second place, the actual strings are just restatements of their > ERRMSG macro names, which IMO is not conformant to our message style, > but it's too hard to see that from source code like this. Also this > style is pretty unworkable/unfriendly if the message needs to contain > any %-markers, so I suspect that having a coding style like this may be > discouraging you from providing values in places where it'd be helpful to > do so. What I actually see happening as a consequence of this approach is > that you're pushing the useful information off to an errdetail, which is > not really helpful and it's not per project style either. The idea is to > make the primary message as helpful as possible without being long, not > to make it a simple restatement of the SQLSTATE that nobody can understand > without also looking at the errdetail. > > In the third place, this makes it hard for people to grep for occurrences > of an error string in our source code. > > And in the fourth place, we don't do this elsewhere; it does not help > anybody for jsonpath to invent its own coding conventions that are unlike > the rest of Postgres. Just to clarify things. Do you propose to get rid of RETURN_ERROR() macro by expanding it at every occurrence? Or do you have other ideas in the mind? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Sun, Apr 21, 2019 at 2:01 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > On Thu, Apr 18, 2019 at 4:09 AM John Naylor <john.naylor@2ndquadrant.com> wrote: > > I was wondering about that. I measured the current size of > > yy_transition to be 36492 on my machine. With the flag -Cfe, which > > gives the smallest representation without backtracking, yy_nxt is 6336 > > (there is no yy_transition). I'd say that's a large enough difference > > that we'd want the smaller representation if it makes little > > difference in performance. > > Did I understand correctly that you've tried the same version of > jsonpath_scan.l with different flex flags? Correct. > Did you also notice if > changes 1d88a75c made to jsonpath_scan.l have singnificant influence? Trying the same measurements above with backtracking put back in, jsonpath_yylex was actually larger by a few hundred bytes, and there was almost no difference in the transition/nxt tables. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > Just to clarify things. Do you propose to get rid of RETURN_ERROR() > macro by expanding it at every occurrence? Or do you have other ideas > in the mind? I wasn't really complaining about RETURN_ERROR() --- it was the macros rather than literal strings for the errmsg() texts that was bothering me. Mind you, I'm not really sure about RETURN_ERROR --- it looks a little weird to have something that appears to be doing something with the value of ereport(), which hasn't got a value. But I don't have a better idea at the moment. I doubt that writing out the underlying ereport-or-return business at each spot would be any more readable. (Maybe spelling it RETURN_OR_ERROR, or vice versa, would help? Not sure.) regards, tom lane
On 2019-Apr-22, Alexander Korotkov wrote: > Hi! > > Thank you for your review! > > On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > RETURN_ERROR(ereport(ERROR, > > (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), > > errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND), > > errdetail("Jsonpath wildcard array accessor " > > > > In the first place, I'm not certain that this will result in the error > > message being translatable --- do the gettext tools know how to expand > > macros? > > > > In the second place, the actual strings are just restatements of their > > ERRMSG macro names, which IMO is not conformant to our message style, > > but it's too hard to see that from source code like this. Also this > > style is pretty unworkable/unfriendly if the message needs to contain > > any %-markers, so I suspect that having a coding style like this may be > > discouraging you from providing values in places where it'd be helpful to > > do so. What I actually see happening as a consequence of this approach is > > that you're pushing the useful information off to an errdetail, which is > > not really helpful and it's not per project style either. The idea is to > > make the primary message as helpful as possible without being long, not > > to make it a simple restatement of the SQLSTATE that nobody can understand > > without also looking at the errdetail. > > > > In the third place, this makes it hard for people to grep for occurrences > > of an error string in our source code. > > > > And in the fourth place, we don't do this elsewhere; it does not help > > anybody for jsonpath to invent its own coding conventions that are unlike > > the rest of Postgres. > > Just to clarify things. Do you propose to get rid of RETURN_ERROR() > macro by expanding it at every occurrence? Or do you have other ideas > in the mind? I think he's not talking about the RETURN_ERROR macro, but about the ERRMSG_JSON_ARRAY_NOT_FOUND macro. The PG convention is to repeat the message literal in every place instead of defining a macro with the literal. But at the same time, using the same errmsg() and only vary the errdetail() is unhelpful, so we want most detail in the errmsg instead; I think it'd be something like this: ereport(ERROR, (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), errmsg("%s wildcard array accessor not found", "jsonpath"))); note I put the type name "jsonpath" in a separate literal, so that only the interesting part is seen by translators. I don't think Tom said anything about the RETURN_ERROR macro. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > >> On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch > >>> needs a sweep to bring its error messages into line with our style > >>> guidelines, but no harm in starting with the obvious bugs. > > > I went trough the jsonpath errors and made some corrections. See the > > attached patch. > > Please don't do this sort of change: > > - elog(ERROR, "unrecognized jsonpath item type: %d", item->type); > + ereport(ERROR, > + (errcode(ERRCODE_INTERNAL_ERROR), > + errmsg("unrecognized jsonpath item type: %d", item->type))); > > elog() is the appropriate thing for shouldn't-happen internal errors like > these. The only thing you've changed here, aside from making the source > code longer, is to expose the error message for translation ... which is > really just wasting translators' time. Only messages we actually think > users might need to deal with should be exposed for translation. Makes sense. Removed from the patch. > @@ -623,7 +624,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, > ereport(ERROR, > (errcode(ERRCODE_JSON_MEMBER_NOT_FOUND), \ > errmsg(ERRMSG_JSON_MEMBER_NOT_FOUND), > - errdetail("JSON object does not contain key %s", > + errdetail("JSON object does not contain key %s.", > keybuf.data))); > } > } > > OK as far as it went, but you should also put double quotes around the %s. This code actually passed key trough escape_json(), which adds double quotes itself. However, we don't do such transformation in other places. So, patch removes call of ecsape_json() while putting double quotes to the error message. > (I also noticed some messages that are using single-quotes around > interpolated strings, which is not the project standard either.) Single-quotes are replaced with double-quotes. > Other specific things I wanted to see fixed: > > * jsonpath_scan.l has some messages like "bad ..." which is not project > style; use "invalid" or "unrecognized". (There's probably no good > reason not to use the same string "invalid input syntax for type jsonpath" > that is used elsewhere.) Fixed. > * This in jsonpath_gram.y is quite unhelpful: > > yyerror(NULL, "unrecognized flag of LIKE_REGEX predicate"); > > since it doesn't tell you what flag character it doesn't like > (and the error positioning info isn't accurate enough to let the > user figure that out). It really needs to be something more like > "unrecognized flag character \"%c\" in LIKE_REGEX predicate". > That probably means you can't use yyerror for this, but I don't > think yyerror was providing any useful functionality anyway :-( Fixed. > More generally, I'm not very much on board with this coding technique: > > /* Standard error message for SQL/JSON errors */ > #define ERRMSG_JSON_ARRAY_NOT_FOUND "SQL/JSON array not found" > > ... > > RETURN_ERROR(ereport(ERROR, > (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), > errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND), > errdetail("Jsonpath wildcard array accessor " > > In the first place, I'm not certain that this will result in the error > message being translatable --- do the gettext tools know how to expand > macros? > > In the second place, the actual strings are just restatements of their > ERRMSG macro names, which IMO is not conformant to our message style, > but it's too hard to see that from source code like this. Also this > style is pretty unworkable/unfriendly if the message needs to contain > any %-markers, so I suspect that having a coding style like this may be > discouraging you from providing values in places where it'd be helpful to > do so. What I actually see happening as a consequence of this approach is > that you're pushing the useful information off to an errdetail, which is > not really helpful and it's not per project style either. The idea is to > make the primary message as helpful as possible without being long, not > to make it a simple restatement of the SQLSTATE that nobody can understand > without also looking at the errdetail. > > In the third place, this makes it hard for people to grep for occurrences > of an error string in our source code. > > And in the fourth place, we don't do this elsewhere; it does not help > anybody for jsonpath to invent its own coding conventions that are unlike > the rest of Postgres. > > So I think you should drop the ERRMSG_xxx macros, write out these error > messages where they are used, and rethink your use of errmsg vs. errdetail. OK, ERRMSG_* macros are removed. > Along the same line of not making it unnecessarily hard for people to grep > for error texts, it's best not to split texts across lines like this: > > RETURN_ERROR(ereport(ERROR, > (errcode(ERRCODE_INVALID_JSON_SUBSCRIPT), > errmsg(ERRMSG_INVALID_JSON_SUBSCRIPT), > errdetail("Jsonpath array subscript is not a " > "singleton numeric value.")))); > > Somebody grepping for "not a singleton" would not get a hit on that, which > could be quite misleading if they do get hits elsewhere. I think for the > most part people have decided that it's better to have overly long source > lines than to break up error message literals. It's especially pointless > to break up source lines when the result still doesn't fit in 80 columns. OK, now no line breaks in error messages. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Wed, Apr 24, 2019 at 9:03 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Mon, Apr 22, 2019 at 1:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > >> On Wed, Apr 17, 2019 at 8:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >>> Yeah, I'd noticed that one too :-(. I think the whole jsonpath patch > > >>> needs a sweep to bring its error messages into line with our style > > >>> guidelines, but no harm in starting with the obvious bugs. > > > > > I went trough the jsonpath errors and made some corrections. See the > > > attached patch. > > > > Please don't do this sort of change: > > > > - elog(ERROR, "unrecognized jsonpath item type: %d", item->type); > > + ereport(ERROR, > > + (errcode(ERRCODE_INTERNAL_ERROR), > > + errmsg("unrecognized jsonpath item type: %d", item->type))); > > > > elog() is the appropriate thing for shouldn't-happen internal errors like > > these. The only thing you've changed here, aside from making the source > > code longer, is to expose the error message for translation ... which is > > really just wasting translators' time. Only messages we actually think > > users might need to deal with should be exposed for translation. > > Makes sense. Removed from the patch. > > > @@ -623,7 +624,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, > > ereport(ERROR, > > (errcode(ERRCODE_JSON_MEMBER_NOT_FOUND), \ > > errmsg(ERRMSG_JSON_MEMBER_NOT_FOUND), > > - errdetail("JSON object does not contain key %s", > > + errdetail("JSON object does not contain key %s.", > > keybuf.data))); > > } > > } > > > > OK as far as it went, but you should also put double quotes around the %s. > > This code actually passed key trough escape_json(), which adds double > quotes itself. However, we don't do such transformation in other > places. So, patch removes call of ecsape_json() while putting double > quotes to the error message. > > > (I also noticed some messages that are using single-quotes around > > interpolated strings, which is not the project standard either.) > > Single-quotes are replaced with double-quotes. > > > Other specific things I wanted to see fixed: > > > > * jsonpath_scan.l has some messages like "bad ..." which is not project > > style; use "invalid" or "unrecognized". (There's probably no good > > reason not to use the same string "invalid input syntax for type jsonpath" > > that is used elsewhere.) > > Fixed. > > > * This in jsonpath_gram.y is quite unhelpful: > > > > yyerror(NULL, "unrecognized flag of LIKE_REGEX predicate"); > > > > since it doesn't tell you what flag character it doesn't like > > (and the error positioning info isn't accurate enough to let the > > user figure that out). It really needs to be something more like > > "unrecognized flag character \"%c\" in LIKE_REGEX predicate". > > That probably means you can't use yyerror for this, but I don't > > think yyerror was providing any useful functionality anyway :-( > > Fixed. > > > More generally, I'm not very much on board with this coding technique: > > > > /* Standard error message for SQL/JSON errors */ > > #define ERRMSG_JSON_ARRAY_NOT_FOUND "SQL/JSON array not found" > > > > ... > > > > RETURN_ERROR(ereport(ERROR, > > (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), > > errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND), > > errdetail("Jsonpath wildcard array accessor " > > > > In the first place, I'm not certain that this will result in the error > > message being translatable --- do the gettext tools know how to expand > > macros? > > > > In the second place, the actual strings are just restatements of their > > ERRMSG macro names, which IMO is not conformant to our message style, > > but it's too hard to see that from source code like this. Also this > > style is pretty unworkable/unfriendly if the message needs to contain > > any %-markers, so I suspect that having a coding style like this may be > > discouraging you from providing values in places where it'd be helpful to > > do so. What I actually see happening as a consequence of this approach is > > that you're pushing the useful information off to an errdetail, which is > > not really helpful and it's not per project style either. The idea is to > > make the primary message as helpful as possible without being long, not > > to make it a simple restatement of the SQLSTATE that nobody can understand > > without also looking at the errdetail. > > > > In the third place, this makes it hard for people to grep for occurrences > > of an error string in our source code. > > > > And in the fourth place, we don't do this elsewhere; it does not help > > anybody for jsonpath to invent its own coding conventions that are unlike > > the rest of Postgres. > > > > So I think you should drop the ERRMSG_xxx macros, write out these error > > messages where they are used, and rethink your use of errmsg vs. errdetail. > > OK, ERRMSG_* macros are removed. > > > Along the same line of not making it unnecessarily hard for people to grep > > for error texts, it's best not to split texts across lines like this: > > > > RETURN_ERROR(ereport(ERROR, > > (errcode(ERRCODE_INVALID_JSON_SUBSCRIPT), > > errmsg(ERRMSG_INVALID_JSON_SUBSCRIPT), > > errdetail("Jsonpath array subscript is not a " > > "singleton numeric value.")))); > > > > Somebody grepping for "not a singleton" would not get a hit on that, which > > could be quite misleading if they do get hits elsewhere. I think for the > > most part people have decided that it's better to have overly long source > > lines than to break up error message literals. It's especially pointless > > to break up source lines when the result still doesn't fit in 80 columns. > > OK, now no line breaks in error messages. I'm going to commit these adjustments if no objections. My question regarding jsonpath_yyerror() vs. bison errors is still relevant. Should we bother making bison-based errdetail() a complete sentence starting from uppercase character? If not, should we make other yyerror() calls look the same? Or should we rather move bison error from errdetail() to errmsg()? I would probably leave bison-based messages "as is", but make messages in other invocations of yyerror() starting from lowercase characters for the uniformity. But I'd like to hear other opinions. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > I'm going to commit these adjustments if no objections. Sorry for not getting to this sooner. Looking quickly at the v2 patch, it seems like you didn't entirely take to heart the idea of preferring a useful primary error message over a boilerplate primary with errdetail. In particular, in places like - errmsg(ERRMSG_SINGLETON_JSON_ITEM_REQUIRED), - errdetail("expression should return a singleton boolean"))); + errmsg("singleton SQL/JSON item required"), + errdetail("Singleton boolean result is expected."))); why bother with errdetail at all? It's not conveying any useful increment of information. In this example I think errmsg("expression should return a singleton boolean") is sufficient and well-phrased. Likewise, a bit further down, + errmsg("SQL/JSON member not found"), + errdetail("JSON object does not contain key \"%s\".", there is nothing being said here that wouldn't fit perfectly well into one errmsg. > My question regarding jsonpath_yyerror() vs. bison errors is still > relevant. Should we bother making bison-based errdetail() a complete > sentence starting from uppercase character? If not, should we make > other yyerror() calls look the same? Or should we rather move bison > error from errdetail() to errmsg()? The latter I think. The core lexer just presents the yyerror message as primary: scanner_yyerror(const char *message, core_yyscan_t yyscanner) { ... ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), /* translator: %s is typically the translation of "syntax error" */ errmsg("%s at end of input", _(message)), lexer_errposition())); and in a quick look at what jsonpath is doing, I'm not really seeing the point of it being different. You could do something like /* translator: %s is typically the translation of "syntax error" */ errmsg("%s in jsonpath input", _(message)) to preserve the information that this is about jsonpath, but beyond that I don't see that splitting off an errdetail is helping much. Or, perhaps, provide an errdetail giving the full jsonpath input string? That might or might not be redundant with other context information, so I'm not sure how useful it'd be. Anyway, my main criticism is still that this is way too eager to use an errdetail message when it could perfectly well fit all the info into the primary error. regards, tom lane
On Thu, Apr 25, 2019 at 10:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > I'm going to commit these adjustments if no objections. > > Sorry for not getting to this sooner. Looking quickly at the v2 patch, > it seems like you didn't entirely take to heart the idea of preferring > a useful primary error message over a boilerplate primary with errdetail. > In particular, in places like > > - errmsg(ERRMSG_SINGLETON_JSON_ITEM_REQUIRED), > - errdetail("expression should return a singleton boolean"))); > + errmsg("singleton SQL/JSON item required"), > + errdetail("Singleton boolean result is expected."))); > > why bother with errdetail at all? It's not conveying any useful increment > of information. In this example I think > > errmsg("expression should return a singleton boolean") > > is sufficient and well-phrased. Likewise, a bit further down, > > + errmsg("SQL/JSON member not found"), > + errdetail("JSON object does not contain key \"%s\".", > > there is nothing being said here that wouldn't fit perfectly well into > one errmsg. Makes sense. Attached revision of patch gets rid of errdetail() where it seems to be appropriate. > > My question regarding jsonpath_yyerror() vs. bison errors is still > > relevant. Should we bother making bison-based errdetail() a complete > > sentence starting from uppercase character? If not, should we make > > other yyerror() calls look the same? Or should we rather move bison > > error from errdetail() to errmsg()? > > The latter I think. The core lexer just presents the yyerror message > as primary: > > scanner_yyerror(const char *message, core_yyscan_t yyscanner) > { > ... > ereport(ERROR, > (errcode(ERRCODE_SYNTAX_ERROR), > /* translator: %s is typically the translation of "syntax error" */ > errmsg("%s at end of input", _(message)), > lexer_errposition())); > > and in a quick look at what jsonpath is doing, I'm not really seeing > the point of it being different. You could do something like > > /* translator: %s is typically the translation of "syntax error" */ > errmsg("%s in jsonpath input", _(message)) > > to preserve the information that this is about jsonpath, but beyond > that I don't see that splitting off an errdetail is helping much. I've moved error message into errmsg(). > Or, perhaps, provide an errdetail giving the full jsonpath input string? > That might or might not be redundant with other context information, > so I'm not sure how useful it'd be. I'm also not sure about this. Didn't do this for now. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > [ jsonpath-errors-improve-3.patch ] This is getting better, but IMO it's still a bit too willing to use a boilerplate primary error message plus errdetail. I do not think that is project style nor something to be encouraged. In particular, you've got a whole lot of cases like this: + errmsg("JSON object is expected"), + errdetail("Jsonpath member accessor can only be applied to an object.")))); I think you should just drop the errmsg and use the errdetail as primary (with no-initcap and no-trailing-period, of course). I don't see more than two or three cases in this whole patch where I'd use an errdetail at all; in almost all of them, the proposed errdetail looks perfectly suitable to be a primary message. One other generic gripe is that a lot of these messages use the term "singleton", which seems a bit too jargon-y to me. As far as I can see in a quick look at the backend .po files, we have not up to now used that term in *any* user-facing error message. Nor does it appear anywhere in our user-facing documentation, except for one place that was itself inserted by the jsonpath patch: Arrays of size 1 are interchangeable with a singleton. I don't think that's either obvious to a non-mathematician, or even technically correct; maybe it'd be better as An array of size 1 is considered equal to its sole element. Likewise, I think it'd be better to avoid "singleton" in the error messages. In some places you could perhaps use "single" instead. In some you just don't need it at all, eg in Jsonpath array subscript is not a singleton numeric value. you could just drop the word "singleton" and it'd be perfectly correct, since a numeric is necessarily a single value. Also, we do often use the term "scalar" to mean a non-composite value; maybe that would work for this context, in places where you do really need that meaning. Sorry to be making you work so hard on this, but I think good error messages are an important part of having a quality feature. I do see a lot of improvements already compared to where we started. regards, tom lane
On Mon, Apr 29, 2019 at 6:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > [ jsonpath-errors-improve-3.patch ] > > This is getting better, but IMO it's still a bit too willing to use > a boilerplate primary error message plus errdetail. I do not think > that is project style nor something to be encouraged. > > In particular, you've got a whole lot of cases like this: > > + errmsg("JSON object is expected"), > + errdetail("Jsonpath member accessor can only be applied to an object.")))); > > I think you should just drop the errmsg and use the errdetail as primary > (with no-initcap and no-trailing-period, of course). I don't see more > than two or three cases in this whole patch where I'd use an errdetail > at all; in almost all of them, the proposed errdetail looks perfectly > suitable to be a primary message. Ok, now it seems that I understood. errdetail is removed from vast majority of cases. > One other generic gripe is that a lot of these messages use the term > "singleton", which seems a bit too jargon-y to me. As far as I can > see in a quick look at the backend .po files, we have not up to now > used that term in *any* user-facing error message. Nor does it appear > anywhere in our user-facing documentation, except for one place that > was itself inserted by the jsonpath patch: > > Arrays of size 1 are interchangeable with a singleton. > > I don't think that's either obvious to a non-mathematician, or > even technically correct; maybe it'd be better as > > An array of size 1 is considered equal to its sole element. > > Likewise, I think it'd be better to avoid "singleton" in the error > messages. In some places you could perhaps use "single" instead. > In some you just don't need it at all, eg in > Jsonpath array subscript is not a singleton numeric value. > you could just drop the word "singleton" and it'd be perfectly > correct, since a numeric is necessarily a single value. > > Also, we do often use the term "scalar" to mean a non-composite > value; maybe that would work for this context, in places where > you do really need that meaning. Makes sense for me. "Singleton" word comes from the standard. But assuming we almost don't use it in the documentation (and especially don't define it), it's better to get rid of this word altogether. Removed from error messages. Separate patch adjusting docs as you proposed is also attached. > Sorry to be making you work so hard on this, but I think good > error messages are an important part of having a quality feature. > I do see a lot of improvements already compared to where we started. It's nothing to be sorry about. I need to learn this in order to make my further commits better. Thank you for your explanations. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Apr 30, 2019 at 1:20 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Mon, Apr 29, 2019 at 6:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > > [ jsonpath-errors-improve-3.patch ] > > > > This is getting better, but IMO it's still a bit too willing to use > > a boilerplate primary error message plus errdetail. I do not think > > that is project style nor something to be encouraged. > > > > In particular, you've got a whole lot of cases like this: > > > > + errmsg("JSON object is expected"), > > + errdetail("Jsonpath member accessor can only be applied to an object.")))); > > > > I think you should just drop the errmsg and use the errdetail as primary > > (with no-initcap and no-trailing-period, of course). I don't see more > > than two or three cases in this whole patch where I'd use an errdetail > > at all; in almost all of them, the proposed errdetail looks perfectly > > suitable to be a primary message. > > Ok, now it seems that I understood. errdetail is removed from vast > majority of cases. > > > One other generic gripe is that a lot of these messages use the term > > "singleton", which seems a bit too jargon-y to me. As far as I can > > see in a quick look at the backend .po files, we have not up to now > > used that term in *any* user-facing error message. Nor does it appear > > anywhere in our user-facing documentation, except for one place that > > was itself inserted by the jsonpath patch: > > > > Arrays of size 1 are interchangeable with a singleton. > > > > I don't think that's either obvious to a non-mathematician, or > > even technically correct; maybe it'd be better as > > > > An array of size 1 is considered equal to its sole element. > > > > Likewise, I think it'd be better to avoid "singleton" in the error > > messages. In some places you could perhaps use "single" instead. > > In some you just don't need it at all, eg in > > Jsonpath array subscript is not a singleton numeric value. > > you could just drop the word "singleton" and it'd be perfectly > > correct, since a numeric is necessarily a single value. > > > > Also, we do often use the term "scalar" to mean a non-composite > > value; maybe that would work for this context, in places where > > you do really need that meaning. > > Makes sense for me. "Singleton" word comes from the standard. But > assuming we almost don't use it in the documentation (and especially > don't define it), it's better to get rid of this word altogether. > Removed from error messages. Separate patch adjusting docs as you > proposed is also attached. > > > Sorry to be making you work so hard on this, but I think good > > error messages are an important part of having a quality feature. > > I do see a lot of improvements already compared to where we started. > > It's nothing to be sorry about. I need to learn this in order to make > my further commits better. Thank you for your explanations. Attached patchset contains revised commit messages. I'm going to commit this on no objections. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > Attached patchset contains revised commit messages. I'm going to > commit this on no objections. Sorry for slow response --- I was tied up with release preparations. The -5 patches look pretty good. A couple of nits: @@ -774,9 +749,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, { RETURN_ERROR(ereport(ERROR, (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), - errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND), - errdetail("jsonpath array accessor can " - "only be applied to an array")))); + errdetail("jsonpath array accessor can only be applied to an array")))); } break; I think you forgot to s/errdetail/errmsg/ in this one. Likewise at: + errdetail("jsonpath wildcard member accessor can only be applied to an object")))); Also, I saw two places where you missed removing a trailing period from an errmsg: + errmsg("left operand of jsonpath operator %s is not a single numeric value.", + jspOperationName(jsp->type))))); + errmsg("right operand of jsonpath operator %s is not a single numeric value.", + jspOperationName(jsp->type))))); I'd suggest making a quick pass for other instances of the same mistakes, just in case. I'm good with the wording on everything now. regards, tom lane
On Tue, May 7, 2019 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > Attached patchset contains revised commit messages. I'm going to > > commit this on no objections. > > Sorry for slow response --- I was tied up with release preparations. > > The -5 patches look pretty good. A couple of nits: > > @@ -774,9 +749,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp, > { > RETURN_ERROR(ereport(ERROR, > (errcode(ERRCODE_JSON_ARRAY_NOT_FOUND), > - errmsg(ERRMSG_JSON_ARRAY_NOT_FOUND), > - errdetail("jsonpath array accessor can " > - "only be applied to an array")))); > + errdetail("jsonpath array accessor can only be applied to an array")))); > } > break; > > I think you forgot to s/errdetail/errmsg/ in this one. Likewise at: > > + errdetail("jsonpath wildcard member accessor can only be applied to an object")))); > > > Also, I saw two places where you missed removing a trailing period > from an errmsg: > > + errmsg("left operand of jsonpath operator %s is not a single numeric value.", > + jspOperationName(jsp->type))))); > > + errmsg("right operand of jsonpath operator %s is not a single numeric value.", > + jspOperationName(jsp->type))))); > > > I'd suggest making a quick pass for other instances of the same mistakes, > just in case. I'm good with the wording on everything now. Thank you! I've catched couple other cases with errdetail() instead of errmsg(). Pushed. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Couple patches improving jsonpath docs are attached. The first one documents nesting level filter for .** accessor. The second adds to documentation of jsonpath array subsciption usage of expressions and multiple slice ranges. I'm going to push both patches if no objections. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Fri, May 17, 2019 at 6:50 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Couple patches improving jsonpath docs are attached. The first one > documents nesting level filter for .** accessor. The second adds to > documentation of jsonpath array subsciption usage of expressions and > multiple slice ranges. I'm going to push both patches if no > objections. Looking more on documentation I found that I'm not exactly happy on how jsonpath description is organized. Part of description including accessors is given in json datatypes section [1], while part of description including functions and operators is given in json functions and operators section. I think we should give the whole jsonpath description in the single place. So, what about moving it to datatype section leaving functions sections with just SQL-level functions? 1. https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH 2. https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-PATH ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company