Thread: Tackling JsonPath support

Tackling JsonPath support

From
Christian Convey
Date:
On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...

> I wrote XMLTABLE function, and I am thinking about JSON_TABLE function. But
> there is one blocker - missing JsonPath support in our JSON implementation.
>
> So one idea - implement JsonPath support and related JSON query functions.
> This can help with better standard conformance.

Hi Pavel,

Are you still looking for someone to add the JsonPath support to the
JSON implementation?  And if so, how urgently are people waiting for
it?

I'd be happy to start working on a patch, but since I'm new to PG
development, I'm probably not the fastest person to get it done.

Kind regards,
Christian



Re: Tackling JsonPath support

From
Pavel Stehule
Date:
Hi

2016-09-15 18:05 GMT+02:00 Christian Convey <christian.convey@gmail.com>:
On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...

> I wrote XMLTABLE function, and I am thinking about JSON_TABLE function. But
> there is one blocker - missing JsonPath support in our JSON implementation.
>
> So one idea - implement JsonPath support and related JSON query functions.
> This can help with better standard conformance.

Hi Pavel,

Are you still looking for someone to add the JsonPath support to the
JSON implementation?  And if so, how urgently are people waiting for
it?

Yes - JsonPath support would be great. I hope the XMLTABLE patch will be committed this commitfest or the next, and with JsonPath I can start working on the JSON_TABLE function.

But JsonPath can be merged separately, without any dependency on JSON_TABLE. There are other JSON search functions too, and those functions need JsonPath support to be ANSI SQL compliant.
 

I'd be happy to start working on a patch, but since I'm new to PG
development, I'm probably not the fastest person to get it done.

That is not a problem. You can probably do this work without deep knowledge of PostgreSQL internals. Work on data types (and functions for data types) is well isolated from the PostgreSQL engine.

You can learn from the current JSON search code in postgresql/src/backend/utils/adt/json.c.

And it is a good start to becoming a PostgreSQL hacker - I started with the implementation of my own data type and related functions.
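For orientation, here is a small sketch of the kind of JSON lookup PostgreSQL already offers (the table and document are invented for illustration); a JsonPath evaluator would generalize these fixed-step lookups into a single path expression:

    -- Existing jsonb extraction; a JsonPath function would cover the same
    -- ground with one expression such as '$.store.book[0].title'.
    CREATE TABLE docs (j jsonb);
    INSERT INTO docs
    VALUES ('{"store": {"book": [{"title": "Sayings", "price": 8.95}]}}');

    SELECT j -> 'store' -> 'book' -> 0 ->> 'title'                    AS via_operators,
           j #>> '{store,book,0,title}'                               AS via_path_array,
           jsonb_extract_path_text(j, 'store', 'book', '0', 'title')  AS via_function
    FROM docs;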

Regards

Pavel


Kind regards,
Christian

Re: Tackling JsonPath support

From
Christian Convey
Date:
Hi Pavel,

Can I check a few assumptions about what you're suggesting for this task?

* Our ultimate goal is to give Postgres an implementation of the functions "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the SQL standards.

* The best representation of those standards is found here: [1].

* When [1] mentions a "JSON path expression" or "JSON path language", it's referring to the query language described here: [2].

* Even if other popular DBMS's deviate from [1], or other popular JSONPath implementations deviate from [2], we remain committed to a faithful implementation of [1]. 

* It's okay for my first commit to implement just two things: (a) a PG-internal implementation of JsonPath, and (b) a user-visible implementation of "JSON_QUERY" based on (a).  Later commits could add implementations of "JSON_VALUE", "JSON_EXISTS", etc. in terms of (a).
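For concreteness, the calls behind those assumptions might look roughly like this (a sketch only: the JSON_EXISTS line follows the example in the WG3 slides [1], while the JSON_VALUE / JSON_QUERY clauses are my reading of the draft standard and vendor docs, and may differ in the final text):

    SELECT *
      FROM t
     WHERE JSON_EXISTS (t.j, 'strict $.where' FALSE ON ERROR);

    SELECT JSON_VALUE (t.j, 'lax $.customer.name'
                       RETURNING text DEFAULT 'n/a' ON EMPTY)          AS name,
           JSON_QUERY (t.j, 'lax $.customer.orders[*]' WITH WRAPPER)   AS orders
      FROM t;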

Thanks,
Christian


Re: Tackling JsonPath support

From
Tom Lane
Date:
Christian Convey <christian.convey@gmail.com> writes:
> * Our ultimate goal is to give Postgres an implementation of the functions
> "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the
> SQL standards.
> * The best representation of those standards is found here: [1].
> [1]
> http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf

You're going to need to find a draft standard somewhere, as that
presentation is too thin on details to support writing an actual
implementation.  In particular, it's far from clear that this is
true at all:

> * When [1] mentions a "JSON path expression" or "JSON path language", it's
> referring to the query language described here: [2].
> [2] http://goessner.net/articles/JsonPath

The one slide they have on the path language mentions a lax/strict syntax
that I don't see either in the document you mention or in the Wikipedia
XPath article it links to.  This does not give me a warm feeling.  The SQL
committee is *fully* capable of inventing their own random path notation,
especially when there's no ISO-blessed precedent to bind them.

In general, the stuff I see in these WG3 slides strikes me as pretty
horribly designed.  The committee is evidently still stuck on the idea
that every feature they invent should have a bunch of new bespoke syntax
for function calls, which is a direction we really don't want to go in
because of the parser overhead and need for more fully-reserved keywords.
For instance:
        WHERE JSON_EXISTS (T.J, 'strict $.where' FALSE ON ERROR)
Really?  Who thought that was a better idea than a simple bool parameter?

I have no objection to providing some functions that implement XPath-like
tests for JSON, but I'm not sure that you ought to try to tie it to
whatever the SQL committee is going to do, especially when they've not
published a finished standard yet.  You may be chasing a moving target.

As for whether JSONPath is the right spec to follow, I'm not sure.
The article you mention is from 2007 and I don't see all that many
other references in a Google search.  I found this Wikipedia page:
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
which mentions half a dozen competitors, including "JSON Pointer"
which has at least gotten as far as being an RFC standard:
https://tools.ietf.org/html/rfc6901
I'm not up enough on the JSON ecosystem to know which of these has the
most traction, but I'm unconvinced that it's JSONPath.
        regards, tom lane



Re: Tackling JsonPath support

From
Pavel Stehule
Date:


2016-11-13 18:13 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Christian Convey <christian.convey@gmail.com> writes:
> * Our ultimate goal is to give Postgres an implementation of the functions
> "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the
> SQL standards.
> * The best representation of those standards is found here: [1].
> [1]
> http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf

You're going to need to find a draft standard somewhere, as that
presentation is too thin on details to support writing an actual
implementation.  In particular, it's far from clear that this is
true at all:

> * When [1] mentions a "JSON path expression" or "JSON path language", it's
> referring to the query language described here: [2].
> [2] http://goessner.net/articles/JsonPath

The one slide they have on the path language mentions a lax/strict syntax
that I don't see either in the document you mention or in the Wikipedia
XPath article it links to.  This does not give me a warm feeling.  The SQL
committee is *fully* capable of inventing their own random path notation,
especially when there's no ISO-blessed precedent to bind them.

In general, the stuff I see in these WG3 slides strikes me as pretty
horribly designed.  The committee is evidently still stuck on the idea
that every feature they invent should have a bunch of new bespoke syntax
for function calls, which is a direction we really don't want to go in
because of the parser overhead and need for more fully-reserved keywords.
For instance:
        WHERE JSON_EXISTS (T.J, 'strict $.where' FALSE ON ERROR)
Really?  Who thought that was a better idea than a simple bool parameter?

I have no objection to providing some functions that implement XPath-like
tests for JSON, but I'm not sure that you ought to try to tie it to
whatever the SQL committee is going to do, especially when they've not
published a finished standard yet.  You may be chasing a moving target.

As for whether JSONPath is the right spec to follow, I'm not sure.
The article you mention is from 2007 and I don't see all that many
other references in a Google search.  I found this Wikipedia page:
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
which mentions half a dozen competitors, including "JSON Pointer"
which has at least gotten as far as being an RFC standard:
https://tools.ietf.org/html/rfc6901
I'm not up enough on the JSON ecosystem to know which of these has the
most traction, but I'm unconvinced that it's JSONPath.

We can use other databases that already have this implementation as references.

I have to agree that the people on the SQL committee are not very consistent - and sometimes create rather COBOL-ish syntax - but it is the standard, and it is implemented by major vendors.

We don't need to implement the full API - not in the first step - the important point is not to close the door to possible ANSI conformance. In the first step we can take the best and most important parts of the standard. It can be similar to our SQL/XML implementation - we implemented maybe 75%, and only XPath instead of XQuery, but I don't feel any weakness there. I see the "JSON_TABLE" function as very useful, and a good place to start.
 
Regards

Pavel


                        regards, tom lane

Re: Tackling JsonPath support

From
Pavel Stehule
Date:
Hi

2016-11-13 15:14 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
Hi Pavel,

Can I check a few assumptions about what you're suggesting for this task?

* Our ultimate goal is to give Postgres an implementation of the functions "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the SQL standards.

* The best representation of those standards is found here: [1].

* When [1] mentions a "JSON path expression" or "JSON path language", it's referring to the query language described here: [2].

* Even if other popular DBMS's deviate from [1], or other popular JSONPath implementations deviate from [2], we remain committed to a faithful implementation of [1]. 

* It's okay for my first commit to implement just two things: (a) a PG-internal implementation of JsonPath, and (b) a user-visible implementation of "JSON_QUERY" based on (a).  Later commits could add implementations of "JSON_VALUE", "JSON_EXISTS", etc. in terms of (a).

My goal is the implementation of the JSON_TABLE function - this function can be used in place of any of the other mentioned functions (and it is really useful - transforming JSON to a table is a common task). SQL/JSON is pretty new and too big to implement in one step, and nobody in the PostgreSQL world knows it yet. Our SQL/XML support needed more than 10 years and still is not fully complete - and there we used the power and features of libxml2 (we have nothing similar for JSON). But we have almost everything that is needed daily from SQL/XML. For JSON_TABLE we need only the basic features of JSONPath - the predicates are not necessary in the first step.

http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/json/src/tpc/db2z_bif_jsontable.html

The vendors use the name "SQL/JSON path expressions" for this query language - so the important source is SQL/JSON (which can differ from the original JSONPath, your second source).
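To make the target concrete, a JSON_TABLE call in the style of the standard and the DB2/Oracle docs might look something like this (column names, document, and exact clause syntax are illustrative only, not a settled design):

    SELECT jt.title, jt.price
      FROM orders,
           JSON_TABLE (orders.doc, '$.store.book[*]'
                       COLUMNS (title text    PATH '$.title',
                                price numeric PATH '$.price')) AS jt;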

Regards

Pavel



Re: Tackling JsonPath support

From
Christian Convey
Date:
From looking at other databases' docs, it seems like the behavior of various JSON-related operators / functions are described partially in terms of a "json path expression":

* In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
* In MySQL: [2]
* In DB2: [3]
* In MS SQL Server: [4]
* (Whatever the Standards committee will end up producing.)

If I'm correctly understanding the situation, it sounds like we have two big unknowns:

(a) The exact syntax/semantics of JSON path searching, especially w.r.t. corner cases and error handling, and 

(b) The syntax/semantics of whatever SQL operators / functions are currently defined in terms of (a).  E.g., "JSON_TABLE".

If that's correct, then what do you guys think about us taking the following incremental approach?

Step 1: I'll dig into the implementations described above, to see what's similar and different between the JSON-path-expression syntax and semantics offered by each.  I then report my findings here, and we can hopefully reach a consensus about the syntax/semantics of PG's json-path-expression handling.

Step 2: I submit a patch for adding a new function to "contrib", which implements the JSON-path-expression semantics chosen in Step 1.  The function will be named such that people won't confuse it with any (eventual) SQL-standard equivalent.

Step 3: PG developers can, if they choose, start defining new JSON operator / functions, and/or port existing JSON-related functions, in terms of the function created in Step 2.

I see the following pros / cons to this approach:

Pro: It gives us a concrete start on this functionality, even though we're not sure what's happening with the SQL standard.

Pro: The risk of painting ourselves into a corner is relatively low, because we're putting the functionality in "contrib" and avoiding function names which conflict with likely upcoming standards.

Pro: It might permit us to give PG users access to JSONPath-like functionality sooner than if we wait until we're clear on the ideal long-term interface.

Con: "JSON path expression" is a recurring them in the *grammars* of user-facing operators in [1], [2], [3], and [4].  But it doesn't necessarily follow that the function implemented in Step 2 will provide useful infrastructure for PG's eventual implementations of "JSON_TABLE", etc.

- Christian

Re: Tackling JsonPath support

From
Pavel Stehule
Date:


2016-11-27 17:50 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
From looking at other databases' docs, it seems like the behavior of various JSON-related operators / functions are described partially in terms of a "json path expression":

* In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
* In MySQL: [2]
* In DB2: [3]
* In MS SQL Server: [4]
* (Whatever the Standards committee will end up producing.)

If I'm correctly understanding the situation, it sounds like we have two big unknowns:

(a) The exact syntax/semantics of JSON path searching, especially w.r.t. corner cases and error handling, and 

(b) The syntax/semantics of whatever SQL operators / functions are currently defined in terms of (a).  E.g., "JSON_TABLE".

If that's correct, then what do you guys think about us taking the following incremental approach?

Step 1: I'll dig into the implementations described above, to see what's similar and different between the JSON-path-expression syntax and semantics offered by each.  I then report my findings here, and we can hopefully reach a consensus about the syntax/semantics of PG's json-path-expression handling.

Step 2: I submit a patch for adding a new function to "contrib", which implements the JSON-path-expression semantics chosen in Step 1.  The function will be named such that people won't confuse it with any (eventual) SQL-standard equivalent.

Step 3: PG developers can, if they choose, start defining new JSON operator / functions, and/or port existing JSON-related functions, in terms of the function created in Step 2.

I see the following pros / cons to this approach:

Pro: It gives us a concrete start on this functionality, even though we're not sure what's happening with the SQL standard.

Pro: The risk of painting ourselves into a corner is relatively low, because we're putting the functionality in "contrib" and avoiding function names which conflict with likely upcoming standards.

Pro: It might permit us to give PG users access to JSONPath-like functionality sooner than if we wait until we're clear on the ideal long-term interface.

Incremental work is a great idea - I like this style. Instead of contrib, you can use a public repository on GitHub. At least for the first stage it is better to live outside core - you are not restricted by the PostgreSQL development process. When your code has stabilized, then you can go to the commitfest. I believe we need good JSON support. The XML support has helped PostgreSQL a lot, and JSON will be great too.
 

Con: "JSON path expression" is a recurring them in the *grammars* of user-facing operators in [1], [2], [3], and [4].  But it doesn't necessarily follow that the function implemented in Step 2 will provide useful infrastructure for PG's eventual implementations of "JSON_TABLE", etc.

We can implement only a subset - our XPath support based on libxml2 does that too. A good target is to support the usual examples found on the net.
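As an analogy, the existing xpath() function already covers a subset of XPath this way via libxml2; a hypothetical json_path call (sketched below, not an existing function) would mirror it for JSON:

    -- Works today, via libxml2:
    SELECT xpath('/store/book[1]/title/text()',
                 '<store><book><title>Sayings</title></book></store>'::xml);

    -- Hypothetical JSON counterpart being discussed in this thread:
    -- SELECT json_path('{"store": {"book": [{"title": "Sayings"}]}}'::jsonb,
    --                  '$.store.book[0].title');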

Regards

Pavel

Re: Tackling JsonPath support

From
David Fetter
Date:
On Sun, Nov 27, 2016 at 11:50:30AM -0500, Christian Convey wrote:
> From looking at other databases' docs, it seems like the behavior of
> various JSON-related operators / functions are described partially in terms
> of a "json path expression":
> 
> * In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
> * In MySQL: [2]
> * In DB2: [3]
> * In MS SQL Server: [4]
> * (Whatever the Standards committee will end up producing.)

There's another option we should also consider: jq
<https://stedolan.github.io/jq/>.  It's available under a
PostgreSQL-compatible license, and has had a LOT of work put into
correctness and performance.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Tackling JsonPath support

From
Pavel Stehule
Date:
<p dir="ltr"><p dir="ltr">Dne 28. 11. 2016 17:26 napsal uživatel "David Fetter" <<a
href="mailto:david@fetter.org">david@fetter.org</a>>:<br/> ><br /> > On Sun, Nov 27, 2016 at 11:50:30AM -0500,
ChristianConvey wrote:<br /> > > >From looking at other databases' docs, it seems like the behavior of<br />
>> various JSON-related operators / functions are described partially in terms<br /> > > of a "json path
expression":<br/> > ><br /> > > * In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column":
[1]<br/> > > * In MySQL: [2]<br /> > > * In DB2: [3]<br /> > > * In MS SQL Server: [4]<br /> >
>* (Whatever the Standards committee will end up producing.)<br /> ><br /> > There's another option we should
alsoconsider: jq<br /> > <<a href="https://stedolan.github.io/jq/">https://stedolan.github.io/jq/</a>>.  It's
availableunder a<br /> > PostgreSQL-compatible license, and has had a LOT of work put into<br /> > correctness
andperformance.<p dir="ltr">we can use it for inspiration. but the syntax of this tool is little bit too complex and
toooriginal against Json path ... jsonpath is relative simple implementation of xpath to json<p dir="ltr">we have one
proprietarysyntax already, two is maybe too much :-)<br /><p dir="ltr">><br /> > Best,<br /> > David.<br />
>--<br /> > David Fetter <david(at)fetter(dot)org> <a href="http://fetter.org/">http://fetter.org/</a><br
/>> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter<br /> > Skype: davidfetter      XMPP:
david(dot)fetter(at)gmail(dot)com<br/> ><br /> > Remember to vote!<br /> > Consider donating to Postgres: <a
href="http://www.postgresql.org/about/donate">http://www.postgresql.org/about/donate</a><br/> 

Re: Tackling JsonPath support

From
Christian Convey
Date:


On Mon, Nov 28, 2016 at 5:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
​...​
Con: "JSON path expression" is a recurring them in the *grammars* of user-facing operators in [1], [2], [3], and [4].  But it doesn't necessarily follow that the function implemented in Step 2 will provide useful infrastructure for PG's eventual implementations of "JSON_TABLE", etc.

We can implement only a subset - our XPath support based on libxml2 does that too. A good target is to support the usual examples found on the net.

​Hi Pavel,

Can you clarify what you meant?  I *think* you're saying:

* It's not important for me to match the syntax/semantics of the json-path implementations found in MySQL / Oracle / DB2 / ​MS SQL Server, and

* Instead, I should just use examples / explanations on the web as my guidance.

​Thanks,​
Christian


Re: Tackling JsonPath support

From
Christian Convey
Date:
On Mon, Nov 28, 2016 at 5:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...
Incremental work is a great idea - I like this style. Instead of contrib, you can use a public repository on GitHub. At least for the first stage it is better to live outside core - you are not restricted by the PostgreSQL development process. When your code has stabilized, then you can go to the commitfest. I believe we need good JSON support. The XML support has helped PostgreSQL a lot, and JSON will be great too.

​Hi Pavel,

Thanks for the suggestion.

I am planning to use one of my own public github repos as the location for my work.

I thought by adding my first implementation to "contrib", we could make this functionality available to end-users, even before there was a consensus about what PG's "official" JSON-related operators should have for syntax and semantics.

Does my reasoning make sense?

- C
 

Re: Tackling JsonPath support

From
Christian Convey
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Nov 28, 2016 at 9:40 AM, Pavel Stehule <span
dir="ltr"><<ahref="mailto:pavel.stehule@gmail.com" target="_blank">pavel.stehule@gmail.com</a>></span>
wrote:</div><divclass="gmail_quote"><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">​...​</div><blockquoteclass="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><p dir="ltr"><span class="">> ​Hi Pavel,<br /> ><br /> > Can
youclarify what you meant?  I *think* you're saying:<br /> ><br /> > * It's not important for me to match the
syntax/semanticsof the json-path implementations found in MySQL / Oracle / DB2 / ​MS SQL Server, and<br />
></span><pdir="ltr">oh no. the syntax is important. But for start we can have a subset. For json table function ..
jsonto relation mapping is important path expression. some other features like predicates<br /> are nice, but can be
implementedlater.<p dir="ltr">Im sorry. My English is bad.</blockquote><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">​HiPavel,</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">You'reEnglish is very good, actually.  I think the confusion arises from
mespeaking in vague terms.  I apologize for that.  Allow me to be more specific about what I'm proposing to
do.</div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif"><br /></div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif">I propose adding to "contrib" a function with the
followingcharacteristics:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br
/></div><divclass="gmail_default" style="font-family:arial,helvetica,sans-serif">* Its signature is "json_path( jsonb
from_json,string json_path_expression) --> jsonb".</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">*The function will hopefully be a useful building block for PG's
implementationof "official" JSON operators such as "JSON_TABLE".  Once the PG community agrees on what those operators'
syntax/semanticsshould be.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">​</div><div
class="gmail_default"style="font-family:arial,helvetica,sans-serif">* The function will hopefully be immediately useful
toPG users who want JSONPath -like operations on their "jsonb" objects.</div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif"><br/></div><div class="gmail_default"
style="font-family:arial,helvetica,sans-serif">-C</div></div></div></div> 

Re: Tackling JsonPath support

From
Christian Convey
Date:

On Mon, Nov 28, 2016 at 9:47 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> I thought by adding my first implementation to "contrib", we could make this functionality available to end-users, even before there was a consensus about what PG's "official" JSON-related operators should have for syntax and semantics.
>

These days the committers dislike the contrib dir. It is hard to push anything there :-(. You can try it, but it may be wasted time.


​Thanks for the warning.  I'm okay with  my patch adding the "json_path" function to the core PG code.​

I would still suggest that we hold off on having my first patch implement an official JSON-related operator such as "JSON_TABLE".  I would prefer to have my "json_path" function available to users even before we know how "JSON_TABLE", etc. should behave.

​Does that sound reasonable?​

Re: Tackling JsonPath support

From
Nico Williams
Date:
I wonder what it might take to integrate jq[1] (via libjq) with
PostgreSQL...  The internal representation of JSON data is bound to be
completely different, no doubt, but jq is a fantastic language, akin to
XPath and XSLT combined, but with nice syntax.

[1] https://stedolan.github.io/jq (note: jq is NOT JQuery)

Nico
-- 



Re: Tackling JsonPath support

From
Nico Williams
Date:
On Mon, Nov 28, 2016 at 05:56:41PM +0100, Pavel Stehule wrote:
> On 28. 11. 2016 at 17:26, "David Fetter" <david@fetter.org> wrote:
> > There's another option we should also consider: jq
> > <https://stedolan.github.io/jq/>.  It's available under a
> > PostgreSQL-compatible license, and has had a LOT of work put into
> > correctness and performance.
>
> we can use it for inspiration, but the syntax of this tool is a little
> too complex and too different from JsonPath ... JsonPath is a relatively
> simple adaptation of XPath to JSON
>
> we have one proprietary syntax already; two is maybe too much :-)

jq is hardly proprietary :)

JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.

libjq has a number of desirable features, mostly its immutable/COW data
structures.  In libjq data structures are only mutated when there's
only one reference to them, but libjq's jv API is built around
immutability, so jv values are always notionally immutable.  For
example, one writes:
  jv a = jv_array();
  a = jv_array_append(a, jv_true()); // `a' is notionally new, but since
                                     // it had only one reference, its
                                     // memory is reused

and similarly for objects.  One could instead write:
 jv a = jv_array_append(jv_array(), jv_true());
or
 jv a = JV_ARRAY(jv_true());

One of the nice things about libjv is that almost every function
consumes a reference of every jv value passed in, with very few
exceptions.  This simplifies memory management, or at least avoidance of
double-free and use-after-free (it can be harder to track down leaks
though, since tools like valgrind don't understand that jv_copy() call
sites can be like allocations).

Nico
--



Re: Tackling JsonPath support

From
Petr Jelinek
Date:

On 28/11/16 18:57, Christian Convey wrote:
> 
> On Mon, Nov 28, 2016 at 9:47 AM, Pavel Stehule <pavel.stehule@gmail.com
> <mailto:pavel.stehule@gmail.com>> wrote
> 
>     > I thought by adding my first implementation to "contrib", we could
>     > make this functionality available to end-users, even before there was
>     > a consensus about what PG's "official" JSON-related operators should
>     > have for syntax and semantics.
>     >
> 
>     these days the committers dislike the contrib dir. It is hard to push
>     anything there :-(. You can try it, but it may be wasted time.
> 
> 
> ​Thanks for the warning.  I'm okay with  my patch adding the "json_path"
> function to the core PG code.​
> 
> I would still suggest that we hold off on having my first patch
> implement an official JSON-related operator such as "JSON_TABLE".  I
> would prefer to have my "json_path" function available to users even
> before we know how "JSON_TABLE", etc. should behave.
> 
> ​Does that sound reasonable?​
> 

Hi,

Just make it an extension, not a contrib module; there is not much difference
between those except that contrib is included in the distribution.

Extensions that provide just functions are easy to integrate into core
(that's how some of the existing json functions were added in the past
as well).

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Tackling JsonPath support

From
Christian Convey
Date:
On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <nico@cryptonector.com> wrote:
...
JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.

​Hi Nico,

Could you please clarify what you mean by "not expressive enough"?

I ask because I've been struggling to identify clear requirements for the json-path functionality I'm trying to provide.  It sounds like perhaps you have something concrete in mind.

Since I myself have no need currently for this functionality, I'm left guessing about hypothetical users of it.​  My current mental model is:

(a) Backend web developers.  AFAICT, their community has mostly settled on the syntax/semantics proposed by Stefan Goessner.  It would probably be unkind for PG's implementation to deviate from that without a good reason.

(b) PG hackers who will eventually implement the ISO SQL standard operators.  In the standards-committee meeting notes I've seen, it seemed to me that they were planning to define some operators in terms of json-path expression.  So it would probably be good if whatever json-path function I implement turns out to comply with that standard, so that the PG-hackers can use it as a building block for their work.

(c) Pavel.  (I'm still somewhat unclear on what has him interested in this, and what his specific constraints are.)

- Christian

Re: Tackling JsonPath support

From
Nico Williams
Date:
On Mon, Nov 28, 2016 at 05:50:40PM -0800, Christian Convey wrote:
> On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <nico@cryptonector.com>
> wrote:
> ...
> > JSON Path is not expressive enough (last I looked) and can be mapped
> > onto jq if need be anyways.
> 
> Hi Nico,
> 
> Could you please clarify what you mean by "not expressive enough"?

jq is a functional language that has these and other features:
 - recursion
 - generators
 - lazy evaluation (of sorts)
 - path expressions
 - math functionality (libm, basically)
 - reduction
 - functions
 - and other things

(jq does not have higher-order functions in that functions cannot return
functions and functions are not values, though it does have closures.)

jq is and feels a lot like a SQL, but for JSON.

> I ask because I've been struggling to identify clear requirements for the
> json-path functionality I'm trying to provide.  It sounds like perhaps you
> have something concrete in mind.

SQL imposes structure on data.  Recursion makes SQL structure looser in
the sense that it may not be easy or possible to express certain
desirable schema constraints in SQL terms without resorting to triggers,
say.  Storing documents in XML, JSON, or other such recursion-friendly
formats (perhaps in semantically equivalent but query-optimized forms)
is also a way to avoid strict structure (thus one needs schema
validators for XML, for example).

Less rigid schema constraints do not and should not preclude powerful
query languages.

One could convert such documents to a SQL EAV schema, if one has an
RDBMS with an ANY type (e.g., something like SQLite3's duck typing), and
then use SQL to query them.  But that may be more difficult to use than
a SQL with support for XML/JSON/... and query sub-languages for those.

SQL is very powerful.  One might like to have similarly powerful,
format-specific query languages for documents stored in XML, JSON,
etcetera, in a SQL RDBMS.  jq is such a language, for JSON documents.
Ditto XPath/XSLT, for XML.  While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

> Since I myself have no need currently for this functionality, I'm left
> guessing about hypothetical users of it.  My current mental model is:

That's a bit like asking what is the use for SQL :^)  The point is that
SQL is a powerful query language, and so is jq.  Each is appropriate to
its own domain; both could be used together.

> (a) Backend web developers.  AFAICT, their community has mostly settled on
> the syntax/semantics proposed by Stefan Groessner.  It would probably be
> unkind for PG's implementation to deviate from that without a good reason.

I can't speak for the community.  I wouldn't take it personally that jq
be not chosen, nor any other proposal of mine.  If it's politically
easier, then do that.

> (b) PG hackers who will eventually implement the ISO SQL standard
> operators.  In the standards-committee meeting notes I've seen, it seemed
> to me that they were planning to define some operators in terms of
> json-path expression.  So it would probably be good if whatever json-path
> function I implement turns out to comply with that standard, so that the
> PG-hackers can use it as a building block for their work.

These could still be implemented (e.g., using jq itself).

> (c) Pavel.  (I'm still somewhat unclear on what has him interested in this,
> and what his specific constraints are.)

Hmm?

Nico
-- 



Re: Tackling JsonPath support

From
Christian Convey
Date:
On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com> wrote:
On Mon, Nov 28, 2016 at 05:50:40PM -0800, Christian Convey wrote:
> On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <nico@cryptonector.com>
> wrote:
> ...
> > JSON Path is not expressive enough (last I looked) and can be mapped
> > onto jq if need be anyways.
>
> Hi Nico,
>
> Could you please clarify what you mean by "not expressive enough"?

jq is a functional language that has these and other features:

 - recursion
 - generators
 - lazy evaluation (of sorts)
 - path expressions
 - math functionality (libm, basically)
 - reduction
 - functions
 - and other things

(jq does not have higher-order functions in that functions cannot return
functions and functions are not values, though it does have closures.)

jq is and feels a lot like a SQL, but for JSON.

> I ask because I've been struggling to identify clear requirements for the
> json-path functionality I'm trying to provide.  It sounds like perhaps you
> have something concrete in mind.

SQL imposes structure on data.  Recursion makes SQL structure looser in
the sense that it may not be easy or possible to express certain
desirable schema constraints in SQL terms without resorting to triggers,
say.  Storing documents in XML, JSON, or other such recursion-friendly
formats (perhaps in semantically equivalent but query-optimized forms)
is also a way to avoid strict structure (thus one needs schema
validators for XML, for example).

Less rigid schema constraints do not and should not preclude powerful
query languages.

One could convert such documents to a SQL EAV schema, if one has an
RDBMS with an ANY type (e.g., something like SQLite3's duck typing), and
then use SQL to query them.  But that may be more difficult to use than
a SQL with support for XML/JSON/... and query sub-languages for those.

SQL is very powerful.  One might like to have similarly powerful,
format-specific query languages for documents stored in XML, JSON,
etcetera, in a SQL RDBMS.  jq is such a language, for JSON documents.
Ditto XPath/XSLT, for XML.  While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

> Since I myself have no need currently for this functionality, I'm left
> guessing about hypothetical users of it.  My current mental model is:

That's a bit like asking what is the use for SQL :^)  The point is that
SQL is a powerful query language, and so is jq.  Each is appropriate to
its own domain; both could be used together.

Thanks for the explanation.  It sounds like your original point was NOT that json-path isn't sufficient for "${specific use X}".  

Instead, your point was that jq seems to have many advantages over json-path in general, and therefore PG should offer jq instead of, or in addition to, json-path.

Is that what you're saying?

​...


> (c) Pavel.  (I'm still somewhat unclear on what has him interested in this,
> and what his specific constraints are.)

Hmm?

​Context: The reason I'm trying to work on a json-path implementation is that Pavel Stehule suggested it as a good first PG-hacking project for me.  At the time, it sounded like he had a use for the feature.

- C

Re: Tackling JsonPath support

From
Nico Williams
Date:
On Mon, Nov 28, 2016 at 06:38:55PM -0800, Christian Convey wrote:
> On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com>
> wrote:
> >
> 
> Thanks for the explanation.  It sounds like your original point was NOT
> that json-path isn't sufficient for "${specific use X}".

The only uses of SQL w/ JSON I've seen so far in live action are to
implement EAV schemas on PostgreSQL.  Since PostgreSQL lacks an ANY
type... using the hstore or jsonb to store data that would otherwise
require an ANY type is the obvious thing to do.  Naturally this use
doesn't need deeply nested JSON data structures, so even JSONPath is
overkill for it!

However, there are use cases I can imagine:
- generating complex JSON from complex (e.g., recursive) SQL data where
  the desired JSON "schema" is not close to the SQL schema (see the SQL
  sketch after this list)

  I've used jq a *lot* to convert schemas.  I've also used XSLT for the
  same purpose.  I've also used SQL RDBMSes and jq together a fair bit,
  either having jq consume JSON documents to output INSERT and other
  statements, or having a SQL application output JSON that I then
  convert to an appropriate schema using jq.

  Naturally I can keep using these two tools separately.  There's not
  much to gain from integrating them for this particular sort of
  use-case.

- handling JSON documents with very loose schemata, perhaps arbitrary
  JSON documents, embedded in a SQL DB

  I've not needed to do this much, so I have no specific examples.
  But, of course, one reason I've not needed to do this is that today
  it kinda can't be done with enough expressivity.
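A minimal SQL-side sketch of that first use case (the books table and column names are invented for illustration): it builds a nested JSON document whose shape differs from the flat relational schema.

   SELECT jsonb_build_object(
              'store', jsonb_build_object(
                  'book', jsonb_agg(jsonb_build_object('title', b.title,
                                                       'price', b.price))))
          AS doc
     FROM books b;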

There are many use-cases for general-purpose programming languages, and
even for very widely-applicable domain-specific programming language.

It's especially difficult to name a specific use-case for a language
that doesn't exist -- in this case that would be SQL + (jq and/or
JSONPath).

> Instead, your point was that jq seems to have many advantages over
> json-path in general, and therefore PG should offer jq instead of, or in
> addition to, json-path.
> 
> Is that what you're saying?

Roughly, yes.  The distinct advantage is that jq is much more general
and expressive, not unlike SQL itself.

> > Hmm?
> 
> Context: The reason I'm trying to work on a json-path implementation is
> that Pavel Stehule suggested it as a good first PG-hacking project for me.
> At the time, it sounded like he had a use for the feature.

I see.  I understand that.  If you've already made a significant
investment, then I don't blame you for not wanting to risk it.  On the
other hand, if melding jsonb and jq happens to be easy, then you'll get
much more bang from it for your investment.  Naturally, you do what you
prefer, and if the reality on the ground is JSONPath, then so be it.  If
I had time and felt sufficiently strongly, I'd contribute jq
integration; as it is I don't, and beggars can't be choosers.

Nico
-- 



Re: Tackling JsonPath support

From
"David G. Johnston"
Date:
On Mon, Nov 28, 2016 at 7:38 PM, Christian Convey <christian.convey@gmail.com> wrote:
On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com> wrote:
While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

Instead, your point was that jq seems to have many advantages over json-path in general, and therefore PG should offer jq instead of, or in addition to, json-path.


IMO jq is considerably closer to XSLT than to XPath - which leads me to figure that, since XML has both, JSON can benefit from both jq and json-path.  I'm not inclined to dig too deep here, but I'd rather take jq in the form of "pl/jq" and have json-path (abstractly) as something that you can use like "pg_catalog.get_value(json, json-path)".

​David J.

Re: Tackling JsonPath support

From
Nico Williams
Date:
On Mon, Nov 28, 2016 at 08:00:46PM -0700, David G. Johnston wrote:
> IMO jq is considerably closer to XSLT than XPath - which leads me to figure
> that since xml has both that JSON can benefit from jq and json-path.  I'm
> not inclined to dig too deep here but I'd rather take jq in the form of
> "pl/jq" and have json-path (abstractly) as something that you can use like
> "pg_catalog.get_value(json, json-path)"

JSONPath looks a lot like a small subset of jq.  Here are some examples:
       JSONPath                    |                   jq
-------------------------------------------------------------------
   $.store.book[0].title            | .store.book[0].title
   $['store']['book'][0]['title']   | .["store"]["book"][0]["title"]
   $..author                        | ..|.author
   $.store.*                        | .store[]
   $.store..price                   | .store|..|.price?
   $..book[2]                       | [..|.book?][2]
   $..book[?(@.isbn)]               | ..|.book?|select(.isbn)
   $..book[?(@.price<10)]           | ..|.book?|select(.price<10)
   $..*                             | ..?

Of course, jq can do much more than this.  E.g.,
   # Output [<title>, <price>] of all books with an ISBN:
   ..|.book?|select(.isbn)|[.title,.price]

   # Output the average price of books with ISBNs appearing anywhere in
   # the input document:
   reduce (..|.book?|select(.isbn)|.price) as $price
     (
      # Initial reduction state:
      {price:0,num:0};
      # State update
      .price = (.price * .num + $price) / (.num + 1) | .num += 1) |
   # Extract average price
   .price

Of course one could just wrap that with a function:
   def avg(pathexp; cond; v):
     reduce (pathexp | select(cond) | v) as $v
       ({v: 0, c: 0};
        .v = (.v * .c + $v) / (.c + 1) | .c += 1) | .v;

   # Average price of books with ISBNs:
   avg(..|.book?; .isbn; .price)

   # Average price of all books:
   avg(..|.book?; true; .price)

There's much, much more.

Note that jq comes with a C implementation.  It should be easy to make
bindings to it from other programming language run-times.

Nico
-- 



Re: Tackling JsonPath support

From
Pavel Stehule
Date:


2016-11-29 2:50 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <nico@cryptonector.com> wrote:
...
JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.

​Hi Nico,

Could you please clarify what you mean by "not expressive enough"?

I ask because I've been struggling to identify clear requirements for the json-path functionality I'm trying to provide.  It sounds like perhaps you have something concrete in mind.

Since I myself have no need currently for this functionality, I'm left guessing about hypothetical users of it.​  My current mental model is:

(a) Backend web developers.  AFAICT, their community has mostly settled on the syntax/semantics proposed by Stefan Goessner.  It would probably be unkind for PG's implementation to deviate from that without a good reason.

(b) PG hackers who will eventually implement the ISO SQL standard operators.  In the standards-committee meeting notes I've seen, it seemed to me that they were planning to define some operators in terms of json-path expression.  So it would probably be good if whatever json-path function I implement turns out to comply with that standard, so that the PG-hackers can use it as a building block for their work.

(c) Pavel.  (I'm still somewhat unclear on what has him interested in this, and what his specific constraints are.)

My goal is simple: 1. to have good ANSI SQL support, 2. to have a good JSON-to-relation mapping function - the ANSI SQL JSON_TABLE does that.

We already support the XPath function - JSONPath is similar to XPath - which is better for the user, who then only has to learn one language.

Regards

Pavel

- Christian


Re: Tackling JsonPath support

From
Pavel Stehule
Date:


2016-11-29 4:00 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:
On Mon, Nov 28, 2016 at 7:38 PM, Christian Convey <christian.convey@gmail.com> wrote:
On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com> wrote:
While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

Instead, your point was that jq seems to have many advantages over json-path in general, and therefore PG should offer jq instead of, or in addition to, json-path.


IMO jq is considerably closer to XSLT than to XPath - which leads me to figure that, since XML has both, JSON can benefit from both jq and json-path.  I'm not inclined to dig too deep here, but I'd rather take jq in the form of "pl/jq" and have json-path (abstractly) as something that you can use like "pg_catalog.get_value(json, json-path)".

I am not against this idea. jq and similar environments can make sense in JSON NoSQL databases. Using it in a relational database for search functions is overkill.

Regards

Pavel

 

​David J.


Re: Tackling JsonPath support

From
Christian Convey
Date:
On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
We already support the XPath function - JSONPath is similar to XPath - which is better for the user, who then only has to learn one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for json-path expressions?

​- C

Re: Tackling JsonPath support

From
Pavel Stehule
Date:


2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
We already support the XPath function - JSONPath is similar to XPath - which is better for the user, who then only has to learn one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for json-path expressions?

surely not.

follow ANSI/SQL :)

Pavel
 

​- C

Re: Tackling JsonPath support

From
Christian Convey
Date:
On Mon, Nov 28, 2016 at 10:37 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
We already support the XPath function - JSONPath is similar to XPath - which is better for the user, who then only has to learn one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for json-path expressions?

surely not.

follow ANSI/SQL :)

I see.  Then I'm afraid I still don't understand what your main point was when you wrote:

We already support the XPath function - JSONPath is similar to XPath - which is better for the user, who then only has to learn one language.

- C

 
 

Re: Tackling JsonPath support

From
Petr Jelinek
Date:
On 29/11/16 07:37, Pavel Stehule wrote:
> 
> 
> 2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com
> <mailto:christian.convey@gmail.com>>:
> 
>     On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule
>     <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>wrote:
> 
>         We already support the XPath function - JSONPath is similar to XPath -
>         which is better for the user, who then only has to learn one language.
> 
> 
>     I'm not sure I understand.
> 
>     Are you suggesting that we use XPath, not JSONPath, as our language
>     for json-path expressions?
> 
> 
> surely not.
> 
> follow ANSI/SQL :)
> 

Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already, are basically a
subset of JSONPath (some proposals/implementations also include a
lax/strict prefix keyword on top of that). I think that should give us
some hint of what the base functionality should look like.
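To illustrate the lax/strict prefix, a hedged example in the JSON_VALUE style from those proposals (exact error behaviour may differ in the final standard): under 'lax', a missing member simply yields an empty result, handled by the ON EMPTY clause, while under 'strict' the same access raises an error, handled by ON ERROR.

    SELECT JSON_VALUE (t.j, 'lax $.a.b'    NULL ON EMPTY)  AS lax_result,
           JSON_VALUE (t.j, 'strict $.a.b' NULL ON ERROR)  AS strict_result
      FROM t;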

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Tackling JsonPath support

From
Nico Williams
Date:
On Tue, Nov 29, 2016 at 05:18:17PM +0100, Petr Jelinek wrote:
> Just to add to this, the SQL/JSON proposals I've seen so far, and what
> Oracle, MSSQL and Teradata chose to implement already, are basically a
> subset of JSONPath (some proposals/implementations also include a
> lax/strict prefix keyword on top of that). I think that should give us
> some hint of what the base functionality should look like.

Yes, that'd be base functionality.  You can go above and beyond.

I agree with Pavel that jq could be used as a user-defined function, but
proper integration would be better because it would avoid the need to
format and parse JSON around calls to jq, and also because PG could
compile jq programs when preparing SQL statements.  Besides, the libjq
jv API is *very* nice.

Nico
-- 



Re: Tackling JsonPath support

From
Petr Jelinek
Date:
On 29/11/16 17:28, Nico Williams wrote:
> On Tue, Nov 29, 2016 at 05:18:17PM +0100, Petr Jelinek wrote:
>> Just to add to this, the SQL/JSON proposals I've seen so far, and what
>> Oracle, MSSQL and Teradata chose to implement already, are basically a
>> subset of JSONPath (some proposals/implementations also include a
>> lax/strict prefix keyword on top of that). I think that should give us
>> some hint of what the base functionality should look like.
> 
> Yes, that'd be base functionality.  You can go above and beyond.
> 

But let's just do the base thing first before moving on to a much more
complicated endeavor, especially if this is supposed to be the first
patch for Christian. Also, one of the points of SQL is compatibility,
so that's what we should strive for first, especially given that the
syntax of jq is not compatible AFAICS.

> I agree with Pavel that jq could be used as a user-defined function, but
> proper integration would be better because it would avoid the need to
> format and parse JSON around calls to jq, and also because PG could
> compile jq programs when preparing SQL statements.  Besides, the libjq
> jv API is *very* nice.
> 

I think this would be good as an extension first, and then we can see
what to do with it next (i.e., I agree with Pavel).

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Tackling JsonPath support

From
Christian Convey
Date:
On Tue, Nov 29, 2016 at 8:18 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
...
Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already is basically
subset of jsonpath (some proposals/implementations also include
lax/strict prefix keyword on top of that). I think that should give us
some hint on what the base functionality should look like.

I agree.  My guess is that PG users would benefit most from:

(1) Conformance to whatever ISO standard regarding JSON operators eventually makes it out of the working group.
(2) Compatibility with other widely-used DBMS's.
(3) Compatibility with the JSONPath functionality used by web developers.  (Although I don't currently have a grasp on which frameworks / libraries this entails.)

I *think* that (1), (2), and (3) are in approximate agreement about the syntax and semantics of the path-expression language: the language proposed by Stefan Goessner, plus the strict vs. lax distinction.

I think I can satisfy (3) with a PG extension which provides a function that approximately implements JSONPath.  My short-term plans are to submit such a patch.

Hopefully that patch's function will be a helpful starting point for satisfying (1) and (2) as well.  But that can be decided later.

Nico Williams has argued for using "jq".  I don't think jq satisfies any of (1), (2), or (3), so I don't see a good case for incorporating it in my short-term plans.  There *may* be a case for using jq internally to my implementation; I'll try to look into that.

Re: Tackling JsonPath support

From
Robert Haas
Date:
On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
<christian.convey@gmail.com> wrote:
> I think I can satisfy (3) with a PG extension which provides a function that
> approximately implements JSONPath.  My short-term plans are to submit such a
> patch.

FWIW, I think that's a fine plan.  I don't really know whether
JSONPath is the right standard to pick for the task of extracting bits
of JSON from other bits of JSON, but I think there's some value in
picking something that is simple enough that we can implement it in our own
code and not have to rely on a third-party library.  Of course, if
somebody feels like adding a configure option for --with-jq and
appropriate interfaces to integrate with JQ, we could consider that,
too, but that imposes a packaging requirement that a home-grown
implementation doesn't.  I'd want to hear more than one vote for such
a course of action before embracing it.  If JQ is a Turing-complete
query language, integrating it might be quite difficult -- for
example, we'd need a way to make sure that it does periodic
CHECK_FOR_INTERRUPTS() calls, and that it doesn't leak resources or
crash if those calls decide longjmp() away due to an ERROR -- and
would we let people query database tables with it?  Would that be
efficient?  I think it's fine to have more limited objectives than
what a JQ implementation would apparently entail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tackling JsonPath support

From
Nico Williams
Date:
On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:
> On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
> <christian.convey@gmail.com> wrote:
> > I think I can satisfy (3) with a PG extension which provides a function that
> > approximately implements JSONPath.  My short-term plans are to submit such a
> > patch.
> 
> FWIW, I think that's a fine plan.  I don't really know whether
> JSONPath is the right standard to pick for the task of extracting bits

It's not even a standard.  Are there particular proposals that the ANSI
SQL working group is considering?

> of JSON from other bits of JSON, but I think there's some value in
> picking something simple enough that we can implement it in our own
> code and not have to rely on a third-party library.  Of course, if
> somebody feels like adding a configure option for --with-jq and

Sure.  My main concern is that I don't want to have to parse/format JSON
around every such call.  I'd rather parsed JSON remain in an internal
form for as long as possible.

Speaking of which, you could use libjq's jv API and not support the jq
language itself.
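
To illustrate what the jv API looks like, here is a small, self-contained sketch (error and invalidity handling omitted; the names are just examples) that builds and inspects a JSON value without ever round-tripping through text:

#include <stdio.h>
#include <jv.h>

int
main(void)
{
    /* Build {"name": "postgres", "ports": [5432]} without parsing any text. */
    jv ports = jv_array_append(jv_array(), jv_number(5432));
    jv doc = jv_object();
    jv name;
    jv dumped;

    doc = jv_object_set(doc, jv_string("name"), jv_string("postgres"));
    doc = jv_object_set(doc, jv_string("ports"), ports);

    /* Look up a key; jv_object_get() consumes its arguments, so copy doc. */
    name = jv_object_get(jv_copy(doc), jv_string("name"));
    printf("name = %s\n", jv_string_value(name));
    jv_free(name);

    /* Serialize only at the very end, when text is actually wanted. */
    dumped = jv_dump_string(doc, 0);        /* consumes doc */
    printf("%s\n", jv_string_value(dumped));
    jv_free(dumped);

    return 0;
}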

> appropriate interfaces to integrate with JQ, we could consider that,
> too, but that imposes a packaging requirement that a home-grown
> implementation doesn't.  I'd want to hear more than one vote for such

What we do in Heimdal, OpenAFS, and other open source projects,
sometimes, is to include a copy / git submodule / similar of some such external
dependencies.  Naturally it's not possible to do this for all external
dependencies, but it works well enough.  The jv API part of jq is small
and simple, and could be ripped out into a library that could be
included in PostgreSQL.

> a course of action before embracing it.  If JQ is a Turing-complete
> query language, integrating it might be quite difficult -- for

Even if it weren't!  (It is.)

Consider this expression using a builtin in jq:
 [range(4503599627370496)]

That is, an array of integers from 0 to 4503599627370495, inclusive.
That will "halt" given a very, very large computer and a lot of time.

(Because jq is Turing-complete, range() can be coded in jq itself, and
some variants of range() are.)

> example, we'd need a way to make sure that it does periodic
> CHECK_FOR_INTERRUPTS() calls, and that it doesn't leak resources or
> crash if those calls decide longjmp() away due to an ERROR -- and
> would we let people query database tables with it?  Would that be
> efficient?  I think it's fine to have more limited objectives than
> what a JQ implementation would apparently entail.

Agreed.  I think this means that we need one or both of the following: a
variant of the C jq_next() function that takes a timeout parameter, or a
jq_intr() function that can cause a running jq_next() to stop.

(Tolerating longjmp() is harder to do and I'd rather not.)
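
To show where the interrupt point sits today, here is a rough sketch (a simplified, hypothetical wrapper, not from any real patch) of a backend-side loop over jq_next().  CHECK_FOR_INTERRUPTS() can only run between results, which is exactly why a timeout or jq_intr() hook would be needed for a single long-running jq_next() call:

#include "postgres.h"
#include "miscadmin.h"          /* CHECK_FOR_INTERRUPTS() */

#include <jq.h>

static void
run_jq_program(const char *program, const char *json_text)
{
    jq_state   *jq = jq_init();
    jv          result;

    if (jq == NULL || !jq_compile(jq, program))
        ereport(ERROR,
                (errmsg("could not compile jq program")));
    /* NB: an ERROR above longjmps away and leaks jq -- the resource
     * problem Robert describes; real code needs cleanup handling. */

    jq_start(jq, jv_parse(json_text), 0);   /* jq_start consumes the input jv */

    /* One program may emit many results; pull them one at a time. */
    while (jv_is_valid(result = jq_next(jq)))
    {
        jv          dumped;

        CHECK_FOR_INTERRUPTS();             /* only reached between results */

        dumped = jv_dump_string(result, 0); /* consumes result */
        elog(NOTICE, "jq result: %s", jv_string_value(dumped));
        jv_free(dumped);
    }
    jv_free(result);

    jq_teardown(&jq);
}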

Other projects, like, say, nginx or similar where there is a per-client
or per-connection memory pool to limit memory footprint, might want
libjq to get an allocator hook, so that's another enhancement to
consider.  If that's something that PostgreSQL would need, please let me
know.

Nico
-- 



Re: Tackling JsonPath support

From
Christian Convey
Date:
On Fri, Dec 2, 2016 at 1:32 PM, Nico Williams <nico@cryptonector.com> wrote:
...
On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:
> On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
> <christian.convey@gmail.com> wrote:
> > I think I can satisfy (3) with a PG extension which provides a function that
> > approximately implements JSONPath.  My short-term plans are to submit such a
> > patch.
>
> FWIW, I think that's a fine plan.  I don't really know whether
> JSONPath is the right standard to pick for the task of extracting bits

It's not even a standard.  Are there particular proposals that the ANSI
SQL working group is considering?

Hi Nico, it seems to be something in the works with the standards committee.  We were discussing it earlier in the thread: [1]

Kind regards,
Christian


 

Re: Tackling JsonPath support

From
Robert Haas
Date:
On Fri, Dec 2, 2016 at 4:32 PM, Nico Williams <nico@cryptonector.com> wrote:
> On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:
>> On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
>> <christian.convey@gmail.com> wrote:
>> > I think I can satisfy (3) with a PG extension which provides a function that
>> > approximately implements JSONPath.  My short-term plans are to submit such a
>> > patch.
>>
>> FWIW, I think that's a fine plan.  I don't really know whether
>> JSONPath is the right standard to pick for the task of extracting bits
>
> It's not even a standard.  Are there particular proposals that the ANSI
> SQL working group is considering?

I don't know.

>> of JSON from other bits of JSON, but I think there's some value in
>> picking something simple enough that we can implement it in our own
>> code and not have to rely on a third-party library.  Of course, if
>> somebody feels like adding a configure option for --with-jq and
>
> Sure.  My main concern is that I don't want to have to parse/format JSON
> around every such call.  I'd rather parsed JSON remain in an internal
> form for as long as possible.

Sure, but that seems like an orthogonal consideration.

> Speaking of which, you could use libjq's jv API and not support the jq
> language itself.

You can submit a patch for that if you like, but I don't think that's
a good reason to block what Christian wants to do, because it's a much
bigger change.  Now if you whip that patch up in a short period of
time and everybody agrees that it gives us everything Christian wanted
to implement and other good stuff too, fine; we can reject Christian's
approach then.  Or if Christian's patch is committed, we can rip it
back out again if and when somebody does this (or any other thing we
all agree is better).  But I object strenuously to the idea that we
should reject the idea of drinking the bottle of beer we have in the
house because there's a liquor store down the road where we can buy an
entire keg.  The possibility of awesome (with enough work) is not a
reason to reject good (via a straightforward approach).

>> appropriate interfaces to integrate with JQ, we could consider that,
>> too, but that imposes a packaging requirement that a home-grown
>> implementation doesn't.  I'd want to hear more than one vote for such
>
> What we do in Heimdal, OpenAFS, and other open source projects,
> sometimes, is to include a copy / git submodule / similar of some such external
> dependencies.  Naturally it's not possible to do this for all external
> dependencies, but it works well enough.  The jv API part of jq is small
> and simple, and could be ripped out into a library that could be
> included in PostgreSQL.

We typically avoid copying things into our repository because then
we become responsible for pulling in any subsequent fixes.  The few
instances that we have right now (zic, snowball, Henry Spencer's
regexp stuff) have imposed a significant code maintenance burden.  The
git submodules approach might avoid that problem, but it would still
be the case that any compile-breaking bugs in the upstream repository
immediately become compile breaks for all PostgreSQL developers, and
that any critical defects that force emergency releases by the
upstream project now force emergency releases of PostgreSQL as well.
If we merely link against the external project, then we avoid that.
So if we're going to use JQ at all, I think that's how we should do
it.  And again, I'm not trying to prevent you or anybody else from
pursuing that.  All I'm saying is that Christian can do what he wants
to do, too.

> Other projects, like, say, nginx or similar where there is a per-client
> or per-connection memory pool to limit memory footprint, might want
> libjq to get an allocator hook, so that's another enhancement to
> consider.  If that's something that PostgreSQL would need, please let me
> know.

The overall need is that it needs to be possible for PostgreSQL to
throw an ERROR, and thus longjmp, without leaking resources.
Sometimes those errors happen asynchronously due to query cancel or
replication conflicts or similar, and those don't have to be processed
at once but the delay can't be unbounded or more than some modest
fraction of a second.  Integrating with PostgreSQL's memory-context
stuff might make that or other resource leak problems easier, or it
might not.  To really get a clear notion of what would be involved, I
think you'd probably need to produce a prototype patch and submit it
here for review to really find out what issues people see with it.
Library integrations are tricky but, since you wrote JQ and seem
potentially willing to modify it to work with PostgreSQL better, this
one might be a lot less painful than some.  I can't promise we'd
accept an integration even if you came up with a patch, but I see
looking back over this thread that there are several people cautiously
supporting the idea of using JQ in some form, which is promising.
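
To make the shape of that requirement concrete, here's a simplified sketch (the wrapper function is hypothetical) of the cleanup discipline involved: any ERROR raised in the guarded section longjmps to PG_CATCH(), where the foreign library's state has to be released before the error is re-thrown:

#include "postgres.h"

#include <jq.h>

static void
evaluate_with_cleanup(jq_state *jq, jv input)
{
    PG_TRY();
    {
        jq_start(jq, input, 0);

        /* ... pull results with jq_next(); anything here may ereport(ERROR) ... */
    }
    PG_CATCH();
    {
        /* Release the library's state before propagating the error. */
        jq_teardown(&jq);
        PG_RE_THROW();
    }
    PG_END_TRY();

    jq_teardown(&jq);
}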

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tackling JsonPath support

From
Nico Williams
Date:
On Mon, Dec 05, 2016 at 11:28:31AM -0500, Robert Haas wrote:
> The overall need is that it needs to be possible for PostgreSQL to
> throw an ERROR, and thus longjmp, without leaking resources.

As long as one can interpose jump buffers, that should be possible.

> Sometimes those errors happen asynchronously due to query cancel or
> replication conflicts or similar, and those don't have to be processed
> at once but the delay can't be unbounded or more than some modest
> fraction of a second.  Integrating with PostgreSQL's memory-context
> stuff might make that or other resource leak problems easier, or it
> might not.  To really get a clear notion of what would be involved, I
> think you'd probably need to produce a prototype patch and submit it
> here for review to really find out what issues people see with it.

Understood.  Thanks.

> Library integrations are tricky but, since you wrote JQ and seem

Just to be clear, Stephen Dolan wrote jq.  I've added to and maintained
jq, to be sure, but I would not want to take credit from Stephen.

> potentially willing to modify it to work with PostgreSQL better, this
> one might be a lot less painful than some.  I can't promise we'd
> accept an integration even if you came up with a patch, but I see
> looking back over this thread that there are several people cautiously
> supporting the idea of using JQ in some form, which is promising.

Fair enough.  I won't be able to work on an integration for a few more
months, so we'll see (and Stephen might well veto some such changes to
jq), and it is time for me to shut up about this for now.  Thanks for
the very useful comments, and sorry for the noise.

Nico
-- 



Re: Tackling JsonPath support

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Dec 2, 2016 at 4:32 PM, Nico Williams <nico@cryptonector.com> wrote:
>> What we do in Heimdal, OpenAFS, and other open source projects,
>> sometimes, is to include a copy / git submodule / similar of some such external
>> dependencies.  Naturally it's not possible to do this for all external
>> dependencies, but it works well enough.  The jv API part of jq is small
>> and simple, and could be ripped out into a library that could be
>> included in PostgreSQL.

> We typically avoid copying things into our repository because then
> we become responsible for pulling in any subsequent fixes.  The few
> instances that we have right now (zic, snowball, Henry Spencer's
> regexp stuff) have imposed a significant code maintenance burden.  The
> git submodules approach might avoid that problem, but it would still
> be the case that any compile-breaking bugs in the upstream repository
> immediately become compile breaks for all PostgreSQL developers, and
> that any critical defects that force emergency releases by the
> upstream project now force emergency releases of PostgreSQL as well.
> If we merely link against the external project, then we avoid that.

Another point here is that packagers such as Red Hat strenuously dislike
such source-code-level wrapping of other projects, because that means that
they have to rebuild multiple packages to fix any bugs found in the
wrapped code.  If I were still packaging Postgres for Red Hat, and such
a distribution landed in my inbox, the very first thing I'd be looking
to do is rip out the borrowed code and replace it with a runtime
shared-library dependency on the upstream project's official library.

Having said that ... we have a *really bad* track record of deciding which
outside projects we want to depend on, or maybe we've just outlived a lot
of them.  Aside from Robert's examples, there's uuid-ossp and libxml2,
which are external code but have caused us headaches anyway.  So I think
there's a lot to be said for avoiding dependencies on libraries that may
or may not still be getting actively maintained ten years from now.
        regards, tom lane



Re: Tackling JsonPath support

From
Robert Haas
Date:
On Mon, Dec 5, 2016 at 11:42 AM, Nico Williams <nico@cryptonector.com> wrote:
>> Library integrations are tricky but, since you wrote JQ and seem
>
> Just to be clear, Stephen Dolan wrote jq.  I've added to, and maintained
> jq, to be sure, but I would not want to take credit from Stephen.

Ah, OK, sorry for misunderstanding.

>> potentially willing to modify it to work with PostgreSQL better, this
>> one might be a lot less painful than some.  I can't promise we'd
>> accept an integration even if you came up with a patch, but I see
>> looking back over this thread that there are several people cautiously
>> supporting the idea of using JQ in some form, which is promising.
>
> Fair enough.  I won't be able to work on an integration for a few more
> months, so we'll see (and Stephen might well veto some such changes to
> jq), and it is time for me to shut up about this for now.  Thanks for
> the very useful comments, and sorry for the noise.

Not noise at all.  Thanks for your interest.  Bandwidth is a little
limited around here so it's hard to give everyone the feedback that
they deserve, but it's good that you're interested.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tackling JsonPath support

From
Nico Williams
Date:
On Mon, Dec 05, 2016 at 11:52:57AM -0500, Tom Lane wrote:
> Another point here is that packagers such as Red Hat strenuously dislike
> such source-code-level wrapping of other projects, because that means that
> they have to rebuild multiple packages to fix any bugs found in the
> wrapped code.  If I were still packaging Postgres for Red Hat, and such
> a distribution landed in my inbox, the very first thing I'd be looking
> to do is rip out the borrowed code and replace it with a runtime
> shared-library dependency on the upstream project's official library.

I'm quite aware :(  I used to work at Sun on Solaris.  We too hated
duplication.  OpenSSL was a particularly problematic case...

There is a real tension between the release trains of many distinct open
source projects and those of their consumers, and those of distros/OSes.

Some projects, such as SQLite3, explicitly recommend copying their
source or statically linking them into dependents; distros/vendors never
like this.

My best advice on this (PG might benefit from it), informed by years of
experience dealing with this, is that there's no perfect answer, but
that nonetheless library developers should always follow these best
practices so as to help those who end up having to deal with multiple
versions of those libraries:
- prefer dynamic linking (because dynamic linking semantics are
  superior to static linking semantics)

- make libraries self-initialize and self-finalize!  (pthread_once()
  and Win32's ExecuteOnce* are your friends, as are atexit()/on_exit(),
  pthread_key_create(), and DllMain() on Windows; a minimal sketch of
  this pattern follows after the list)

  (otherwise calling your library from another library gets tricky)

- make it so that as long as you change SONAMEs you can have multiple
  versions of the library loaded in one process, specifically:

   - don't use POSIX file locking (but the new non-POSIX OFD locks are
     OK) (or use them only for files that wouldn't be shared across
     multiple versions in one process)

     (e.g., SQLite3 uses POSIX file locking safely, but it's not likely
     that two consumers of different SQLite3 versions in one process
     would access the same DB files, so it kinda works)

   - be backwards- and forwards-compatible as to any config file
     formats and other state that will be shared by multiple versions

- generally: mind backwards compatibility, both source and binary, so
  as to make it easy to upgrade

   - this means applying good API design best practices that I won't go
     into here

- write thread-safe code, and preferably fork-safe code too
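
Here is the promised minimal sketch of the self-initialize/self-finalize pattern (names are illustrative only):

#include <pthread.h>
#include <stdlib.h>

static pthread_once_t lib_once = PTHREAD_ONCE_INIT;
static int            lib_ready;

static void
lib_finalize(void)
{
    /* tear down global state at process exit */
    lib_ready = 0;
}

static void
lib_initialize(void)
{
    /* the first caller through any public entry point sets this up, exactly once */
    lib_ready = 1;
    atexit(lib_finalize);       /* self-finalize, as recommended above */
}

int
lib_do_something(void)
{
    pthread_once(&lib_once, lib_initialize);
    return lib_ready ? 0 : -1;
}

The point is that no caller ever has to remember an init()/fini() pair, which is what makes the library safe to call from other libraries.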

For example, I've seen OpenSSL built with different SONAMEs to support
multiple versions of OpenSSL coexisting in a single program/process.
That actually works.

> Having said that ... we have a *really bad* track record of deciding which
> outside projects we want to depend on, or maybe we've just outlived a lot
> of them.  Aside from Robert's examples, there's uuid-ossp and libxml2,
> which are external code but have caused us headaches anyway.  So I think
> there's a lot to be said for avoiding dependencies on libraries that may
> or may not still be getting actively maintained ten years from now.

I'm not at all surprised.

One codebase I help develop and maintain, Heimdal, includes SQLite3 and
libeditline, and parts of Heimdal should really be separate projects
(e.g., its ASN.1 compiler and library, and several supporting libraries
like libroken (a portability layer)) because they could be useful to
others outside Heimdal.  Finding the right balance is not trivial.

Nico
-- 



Re: Tackling JsonPath support

From
Jim Nasby
Date:
On 11/28/16 12:38 PM, Nico Williams wrote:
> The internal representation of JSON data is bound to be
> completely different, no doubt

Actually, that could be a good thing. The internal storage of JSONB is 
optimized for compressibility, but that imposes a substantial overhead 
on calls that are searching for a particular key in a document. This 
gets *really* bad if you make nested expansion calls (i.e., json->'a'->'b').

ExpandedObject support means we're not stuck with the same 
representation in-memory as on-disk though. While we could create our 
own internal representation, it seems a bit silly to reinvent that wheel 
if we don't need to. Bonus points if it would also throw an error if you 
fed it duplicated object keys.
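
For reference, the hooks involved are small.  A hypothetical skeleton (struct contents, names, and stub bodies are all invented here, just to show the shape of the API) would look something like this:

#include "postgres.h"
#include "utils/expandeddatum.h"
#include "utils/memutils.h"

typedef struct ExpandedJsonHeader
{
    ExpandedObjectHeader hdr;   /* must be first */
    /* ... whatever parsed in-memory representation we settle on ... */
} ExpandedJsonHeader;

static Size
expanded_json_get_flat_size(ExpandedObjectHeader *eohptr)
{
    /* report how many bytes the flattened (on-disk) jsonb form will need */
    return 0;                   /* stub */
}

static void
expanded_json_flatten_into(ExpandedObjectHeader *eohptr,
                           void *result, Size allocated_size)
{
    /* serialize the in-memory representation back into a jsonb varlena */
}

static const ExpandedObjectMethods expanded_json_methods = {
    expanded_json_get_flat_size,
    expanded_json_flatten_into
};

static Datum
make_expanded_json(MemoryContext parent)
{
    MemoryContext cxt = AllocSetContextCreate(parent,
                                              "expanded json",
                                              ALLOCSET_DEFAULT_SIZES);
    ExpandedJsonHeader *ejh = (ExpandedJsonHeader *)
        MemoryContextAlloc(cxt, sizeof(ExpandedJsonHeader));

    EOH_init_header(&ejh->hdr, &expanded_json_methods, cxt);

    return EOHPGetRWDatum(&ejh->hdr);
}
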
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)