Thread: Tackling JsonPath support
On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...
> I wrote XMLTABLE function, and I am thinking about JSON_TABLE function. But
> there is one blocker - missing JsonPath support in our JSON implementation.
>
> So one idea - implement JsonPath support and related JSON query functions.
> This can help with better standard conformance.

Hi Pavel,

Are you still looking for someone to add the JsonPath support to the JSON implementation? And if so, how urgently are people waiting for it?

I'd be happy to start working on a patch, but since I'm new to PG development, I'm probably not the fastest person to get it done.

Kind regards,
Christian
Hi
2016-09-15 18:05 GMT+02:00 Christian Convey <christian.convey@gmail.com>:
On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...
> I wrote XMLTABLE function, and I am thinking about JSON_TABLE function. But
> there is one blocker - missing JsonPath support in our JSON implementation.
>
> So one idea - implement JsonPath support and related JSON query functions.
> This can help with better standard conformance.
Hi Pavel,
Are you still looking for someone to add the JsonPath support to the
JSON implementation? And if so, how urgently are people waiting for
it?
Yes - JsonPath support would be great. I hope the XMLTABLE patch will be committed this or next commitfest, and with JsonPath I can start work on a JSON_TABLE function.
But JsonPath can be merged separately, without a dependency on JSON_TABLE. There are more JSON searching functions, and these functions should support JsonPath to be ANSI SQL compliant.
I'd be happy to start working on a patch, but since I'm new to PG
development, I'm probably not the fastest person to get it done.
That is not a problem. You can probably do this work without deep knowledge of PostgreSQL internals. The work with data types (and functions for data types) is well isolated from the PostgreSQL engine.
You can learn from the current searching on JSON - postgresql/src/backend/utils/adt/json.c
And it is a good way to become a PostgreSQL hacker - I started with the implementation of my own data type and related functions.
Regards
Pavel
Kind regards,
Christian
Hi Pavel,
Can I check a few assumptions about what you're suggesting for this task?
* Our ultimate goal is to give Postgres an implementation of the functions "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the SQL standards.
* The best representation of those standards is found here: [1].
* When [1] mentions a "JSON path expression" or "JSON path language", it's referring to the query language described here: [2].
* Even if other popular DBMSs deviate from [1], or other popular JSONPath implementations deviate from [2], we remain committed to a faithful implementation of [1].
* It's okay for my first commit to implement just two things: (a) a PG-internal implementation of JsonPath, and (b) a user-visible implementation of "JSON_QUERY" based on (a). Later commits could add implementations of "JSON_VALUE", "JSON_EXISTS", etc. in terms of (a).
Thanks,
Christian
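[Editor's note: for readers unfamiliar with the Goessner path language referenced as [2] above, here is a minimal, hedged sketch of the dotted-child / array-index / wildcard subset of its semantics. It is an illustration only, not PostgreSQL code, and omits filters, slices, and recursive descent.]

```python
# Minimal sketch of a Goessner-style JSONPath subset: '$' root, '.child',
# '[n]' index, and '*' wildcard. Filters and '..' are deliberately out of scope.
import re

def json_path_query(doc, path):
    """Evaluate a tiny JSONPath subset against parsed JSON, returning all matches."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # Split "$.store.book[0].title" into steps: 'store', 'book', '0', 'title'
    steps = re.findall(r"\.(\*|\w+)|\[(\d+|\*)\]", path)
    matches = [doc]
    for dot_step, idx_step in steps:
        step = dot_step or idx_step
        next_matches = []
        for node in matches:
            if step == "*":
                if isinstance(node, dict):
                    next_matches.extend(node.values())
                elif isinstance(node, list):
                    next_matches.extend(node)
            elif isinstance(node, dict) and step in node:
                next_matches.append(node[step])
            elif isinstance(node, list) and step.isdigit() and int(step) < len(node):
                next_matches.append(node[int(step)])
        matches = next_matches
    return matches
```

Note that a JSONPath expression names a *set* of matches, which is why the sketch returns a list; this becomes relevant in the later discussion of JSON Pointer, which names exactly one location.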
On Fri, Sep 16, 2016 at 2:28 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ...
Christian Convey <christian.convey@gmail.com> writes:
> * Our ultimate goal is to give Postgres an implementation of the functions
> "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the
> SQL standards.
> * The best representation of those standards is found here: [1].
> [1] http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf

You're going to need to find a draft standard somewhere, as that presentation is too thin on details to support writing an actual implementation. In particular, it's far from clear that this is true at all:

> * When [1] mentions a "JSON path expression" or "JSON path language", it's
> referring to the query language described here: [2].
> [2] http://goessner.net/articles/JsonPath

The one slide they have on the path language mentions a lax/strict syntax that I don't see either in the document you mention or in the Wikipedia XPath article it links to. This does not give me a warm feeling. The SQL committee is *fully* capable of inventing their own random path notation, especially when there's no ISO-blessed precedent to bind them.

In general, the stuff I see in these WG3 slides strikes me as pretty horribly designed. The committee is evidently still stuck on the idea that every feature they invent should have a bunch of new bespoke syntax for function calls, which is a direction we really don't want to go in because of the parser overhead and need for more fully-reserved keywords. For instance:

    WHERE JSON_EXISTS (T.J, 'strict $.where' FALSE ON ERROR)

Really? Who thought that was a better idea than a simple bool parameter?

I have no objection to providing some functions that implement XPath-like tests for JSON, but I'm not sure that you ought to try to tie it to whatever the SQL committee is going to do, especially when they've not published a finished standard yet. You may be chasing a moving target.

As for whether JSONPath is the right spec to follow, I'm not sure. The article you mention is from 2007 and I don't see all that many other references in a Google search. I found this Wikipedia page:

https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

which mentions half a dozen competitors, including "JSON Pointer" which has at least gotten as far as being an RFC standard:

https://tools.ietf.org/html/rfc6901

I'm not up enough on the JSON ecosystem to know which of these has the most traction, but I'm unconvinced that it's JSONPath.

regards, tom lane
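[Editor's note: JSON Pointer (RFC 6901), mentioned above as the competitor that did reach RFC status, is small enough to sketch in full. This is a hedged illustration of the RFC's resolution rules, not PostgreSQL code.]

```python
# Sketch of JSON Pointer (RFC 6901) resolution. Unlike JSONPath, a pointer
# names exactly one location rather than a set of matches.
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 pointer like '/foo/0/bar' against parsed JSON."""
    if pointer == "":
        return doc          # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("non-empty pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Per RFC 6901: first decode '~1' to '/', then '~0' to '~' (order matters).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict):
            node = node[token]
        elif isinstance(node, list):
            node = node[int(token)]
        else:
            raise KeyError(pointer)
    return node
```

The single-result semantics is the key contrast with JSONPath: a pointer can address a member whose name contains `/` (via `~1`), but it cannot express wildcards, filters, or recursive descent.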
2016-11-13 18:13 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Christian Convey <christian.convey@gmail.com> writes:
> * Our ultimate goal is to give Postgres an implementation of the functions
> "JSON_EXISTS", "JSON_VALUE", and "JSON_QUERY" which fully comply with the
> SQL standards.
> * The best representation of those standards is found here: [1].
> [1]
> http://jtc1sc32.org/doc/N2501-2550/32N2528-WG3-Tutorial-Opening-Plenary.pdf
You're going to need to find a draft standard somewhere, as that
presentation is too thin on details to support writing an actual
implementation. In particular, it's far from clear that this is
true at all:
> * When [1] mentions a "JSON path expression" or "JSON path language", it's
> referring to the query language described here: [2].
> [2] http://goessner.net/articles/JsonPath
The one slide they have on the path language mentions a lax/strict syntax
that I don't see either in the document you mention or in the Wikipedia
XPath article it links to. This does not give me a warm feeling. The SQL
committee is *fully* capable of inventing their own random path notation,
especially when there's no ISO-blessed precedent to bind them.
In general, the stuff I see in these WG3 slides strikes me as pretty
horribly designed. The committee is evidently still stuck on the idea
that every feature they invent should have a bunch of new bespoke syntax
for function calls, which is a direction we really don't want to go in
because of the parser overhead and need for more fully-reserved keywords.
For instance:
WHERE JSON_EXISTS (T.J, 'strict $.where' FALSE ON ERROR)
Really? Who thought that was a better idea than a simple bool parameter?
I have no objection to providing some functions that implement XPath-like
tests for JSON, but I'm not sure that you ought to try to tie it to
whatever the SQL committee is going to do, especially when they've not
published a finished standard yet. You may be chasing a moving target.
As for whether JSONPath is the right spec to follow, I'm not sure.
The article you mention is from 2007 and I don't see all that many
other references in a Google search. I found this Wikipedia page:
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
which mentions half a dozen competitors, including "JSON Pointer"
which has at least gotten as far as being an RFC standard:
https://tools.ietf.org/html/rfc6901
I'm not up enough on the JSON ecosystem to know which of these has the
most traction, but I'm unconvinced that it's JSONPath.
We can use some other databases with this implementation as references.
I have to agree that the people on the SQL committee are not too consistent - and sometimes they create a too COBOL-ish syntax - but it is the standard, and it is implemented by major vendors.
We don't need to implement the full API - not in the first step - the important point is not to close the door to possible ANSI conformance. In the first step we can take the best and most important parts of the standard. It can be similar to our SQL/XML implementation - we implemented maybe 75%, and only XPath instead of XQuery, but I don't feel any weakness. I see a very useful "JSON_TABLE" function, which is good for a start.
Regards
Pavel
regards, tom lane
Hi
2016-11-13 15:14 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
> ...
My goal is an implementation of the JSON_TABLE function - this function can be used instead of any of the other mentioned functions (and it is really useful - transforming JSON to a table is a common task). SQL/JSON is pretty new and too big to implement in one step, and nobody in the PostgreSQL world knows it yet. Our SQL/XML needed more than 10 years and still is not fully complete - and there we used the power and features of libxml2 (we have nothing similar for JSON). But we have almost everything that is a daily need from SQL/XML. For JSON_TABLE we need only the basic features of JSONPath - the predicates are not necessary in the first step.
http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/json/src/tpc/db2z_bif_jsontable.html
The vendors use the name "SQL/JSON path expressions" for this query language - so the important source is SQL/JSON (this can be different from the original JSONPath, your second source).
Regards
Pavel
> ...
From looking at other databases' docs, it seems like the behavior of various JSON-related operators / functions are described partially in terms of a "json path expression":
* In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
* In MySQL: [2]
* In DB2: [3]
* In MS SQL Server: [4]
* (Whatever the Standards committee will end up producing.)
If I'm correctly understanding the situation, it sounds like we have two big unknowns:
(a) The exact syntax/semantics of JSON path searching, especially w.r.t. corner cases and error handling, and
(b) The syntax/semantics of whatever SQL operators / functions are currently defined in terms of (a). E.g., "JSON_TABLE".
If that's correct, then what do you guys think about us taking the following incremental approach?
Step 1: I'll dig into the implementations described above, to see what's similar and different between the JSON-path-expression syntax and semantics offered by each. I then report my findings here, and we can hopefully reach a consensus about the syntax/semantics of PG's json-path-expression handling.
Step 2: I submit a patch for adding a new function to "contrib", which implements the JSON-path-expression semantics chosen in Step 1. The function will be named such that people won't confuse it with any (eventual) SQL-standard equivalent.
Step 3: PG developers can, if they choose, start defining new JSON operator / functions, and/or port existing JSON-related functions, in terms of the function created in Step 2.
I see the following pros / cons to this approach:
Pro: It gives us a concrete start on this functionality, even though we're not sure what's happening with the SQL standard.
Pro: The risk of painting ourselves into a corner is relatively low, because we're putting the functionality in "contrib", and avoid function names which conflict with likely upcoming standards.
Pro: It might permit us to give PG users access to JSONPath-like functionality sooner than if we wait until we're clear on the ideal long-term interface.
Con: "JSON path expression" is a recurring theme in the *grammars* of user-facing operators in [1], [2], [3], and [4]. But it doesn't necessarily follow that the function implemented in Step 2 will provide useful infrastructure for PG's eventual implementations of "JSON_TABLE", etc.
- Christian
2016-11-27 17:50 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
> ...
Incremental work is a great idea - I like this style. Instead of contrib, you can use a public repository on GitHub. At least for the first stage it is better to live outside core - you are not restricted by the PostgreSQL development process. When your code has stabilized, then you can go to a commitfest. I believe we need good JSON support. The XML support helps PostgreSQL a lot; JSON will be great too.
> Con: "JSON path expression" is a recurring theme in the *grammars* of user-facing operators in [1], [2], [3], and [4]. But it doesn't necessarily follow that the function implemented in Step 2 will provide useful infrastructure for PG's eventual implementations of "JSON_TABLE", etc.
We can implement only a subset - our XPath based on libxml2 does that too. A good target is to support the usual examples on the net.
Regards
Pavel
On Sun, Nov 27, 2016 at 11:50:30AM -0500, Christian Convey wrote:
> From looking at other databases' docs, it seems like the behavior of
> various JSON-related operators / functions are described partially in terms
> of a "json path expression":
>
> * In Oracle, "JSON_TABLE", "JSON_exists_column", "JSON_value_column": [1]
> * In MySQL: [2]
> * In DB2: [3]
> * In MS SQL Server: [4]
> * (Whatever the Standards committee will end up producing.)

There's another option we should also consider: jq <https://stedolan.github.io/jq/>. It's available under a PostgreSQL-compatible license, and has had a LOT of work put into correctness and performance.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 28 Nov 2016 at 17:26, "David Fetter" <david@fetter.org> wrote:
>
> There's another option we should also consider: jq
> <https://stedolan.github.io/jq/>. It's available under a
> PostgreSQL-compatible license, and has had a LOT of work put into
> correctness and performance.

we can use it for inspiration, but the syntax of this tool is a little bit too complex and too original compared to JSON path ... jsonpath is a relatively simple adaptation of XPath to JSON

we have one proprietary syntax already, two is maybe too much :-)
On Mon, Nov 28, 2016 at 5:20 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...
> > Con: "JSON path expression" is a recurring theme in the *grammars* of user-facing operators in [1], [2], [3], and [4]. But it doesn't necessarily follow that the function implemented in Step 2 will provide useful infrastructure for PG's eventual implementations of "JSON_TABLE", etc.
>
> We can implement only a subset - our XPath based on libxml2 does that too. A good target is to support the usual examples on the net.
Hi Pavel,
Can you clarify what you meant? I *think* you're saying:
* It's not important for me to match the syntax/semantics of the json-path implementations found in MySQL / Oracle / DB2 / MS SQL Server, and
* Instead, I should just use examples / explanations on the web as my guidance.
Thanks,
Christian
...
> Incremental work is great idea - I like this style. Instead of contrib, you can use a public repository on github. Minimally for first stage it is better to live outside core - you are not restricted by the PostgreSQL development process. When your code has stabilized, then you can go to commitfest. I believe we need good JSON support. The XML support helps PostgreSQL a lot; JSON will be great too.
Hi Pavel,
Thanks for the suggestion.
I am planning to use one of my own public github repos as the location for my work.
I thought by adding my first implementation to "contrib", we could make this functionality available to end-users, even before there was a consensus about what PG's "official" JSON-related operators should have for syntax and semantics.
Does my reasoning make sense?
- C
On Mon, Nov 28, 2016 at 9:40 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
...
> > Hi Pavel,
> >
> > Can you clarify what you meant? I *think* you're saying:
> >
> > * It's not important for me to match the syntax/semantics of the json-path implementations found in MySQL / Oracle / DB2 / MS SQL Server, and
>
> oh no. the syntax is important. But for a start we can have a subset. For a JSON table function, the JSON-to-relation mapping via path expressions is the important part. Some other features like predicates are nice, but can be implemented later.
>
> I'm sorry. My English is bad.

Hi Pavel,

Your English is very good, actually. I think the confusion arises from me speaking in vague terms. I apologize for that. Allow me to be more specific about what I'm proposing to do.

I propose adding to "contrib" a function with the following characteristics:

* Its signature is "json_path(jsonb from_json, string json_path_expression) --> jsonb".

* The function will hopefully be a useful building block for PG's implementation of "official" JSON operators such as "JSON_TABLE", once the PG community agrees on what those operators' syntax/semantics should be.

* The function will hopefully be immediately useful to PG users who want JSONPath-like operations on their "jsonb" objects.

- C
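[Editor's note: to make the "building block" idea concrete, here is a hypothetical sketch of how a JSON_TABLE-style document-to-rows mapping could sit on top of a json_path-like primitive. The function names, the dotted-path syntax, and the column mapping are all illustrative assumptions, not the proposed PostgreSQL API.]

```python
# Hypothetical sketch: JSON_TABLE as "pick a row array by path, then project
# columns from each element". All names here are illustrative only.
def get_path(doc, dotted):
    """Follow a dotted key path like 'store.book' through nested dicts."""
    node = doc
    for key in dotted.split("."):
        node = node[key]
    return node

def json_table(doc, row_path, columns):
    """Emit one row per element of the array at row_path.

    columns maps output column names to keys looked up in each row element;
    a missing key yields None, loosely mirroring SQL NULL.
    """
    return [
        {name: item.get(key) for name, key in columns.items()}
        for item in get_path(doc, row_path)
    ]
```

The point of the sketch is the layering: once a path primitive exists, the table-producing operator is mostly projection logic on top of it, which is why the path support can be merged first.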
On Mon, Nov 28, 2016 at 9:47 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> > I thought by adding my first implementation to "contrib", we could make this functionality available to end-users, even before there was a consensus about what PG's "official" JSON-related operators should have for syntax and semantics.
>
> this time the committers dislike the contrib dir. It is hard to push anything there :-(. You can try it, but it can be lost time.
Thanks for the warning. I'm okay with my patch adding the "json_path" function to the core PG code.
I would still suggest that we hold off on having my first patch implement an official JSON-related operator such as "JSON_TABLE". I would prefer to have my "json_path" function available to users even before we know how "JSON_TABLE", etc. should behave.
Does that sound reasonable?
I wonder what it might take to integrate jq[1] (via libjq) with PostgreSQL... The internal representation of JSON data is bound to be completely different, no doubt, but jq is a fantastic language, akin to XPath and XSLT combined, but with nice syntax.

[1] https://stedolan.github.io/jq (note: jq is NOT JQuery)

Nico
On Mon, Nov 28, 2016 at 05:56:41PM +0100, Pavel Stehule wrote:
> On 28 Nov 2016 at 17:26, "David Fetter" <david@fetter.org> wrote:
> > There's another option we should also consider: jq
> > <https://stedolan.github.io/jq/>. It's available under a
> > PostgreSQL-compatible license, and has had a LOT of work put into
> > correctness and performance.
>
> we can use it for inspiration. but the syntax of this tool is little bit
> too complex and too original against Json path ... jsonpath is relative
> simple implementation of xpath to json
>
> we have one proprietary syntax already, two is maybe too much :-)

jq is hardly proprietary :)

JSON Path is not expressive enough (last I looked) and can be mapped onto jq if need be anyways.

libjq has a number of desirable features, mostly its immutable/COW data structures. In libjq data structures are only mutated when there's only one reference to them, but libjq's jv API is built around immutability, so jv values are always notionally immutable. For example, one writes:

    jv a = jv_array();
    a = jv_array_append(a, jv_true());  // `a' is notionally new, but since
                                        // it had only one reference, its
                                        // memory is reused

and similarly for objects. One could instead write:

    jv a = jv_array_append(jv_array(), jv_true());

or

    jv a = JV_ARRAY(jv_true());

One of the nice things about libjv is that almost every function consumes a reference of every jv value passed in, with very few exceptions. This simplifies memory management, or at least avoidance of double-free and use-after-free (it can be harder to track down leaks though, since tools like valgrind don't understand that jv_copy() call sites can be like allocations).

Nico
On 28/11/16 18:57, Christian Convey wrote:
> ...
> I would still suggest that we hold off on having my first patch
> implement an official JSON-related operator such as "JSON_TABLE". I
> would prefer to have my "json_path" function available to users even
> before we know how "JSON_TABLE", etc. should behave.
>
> Does that sound reasonable?

Hi, just make it an extension, not a contrib module; there is not much difference between those except that contrib is included in the distribution. Extensions that provide just functions are easy to integrate into core (that's how some of the existing json functions were added in the past as well).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
...
JSON Path is not expressive enough (last I looked) and can be mapped
onto jq if need be anyways.
Hi Nico,
Could you please clarify what you mean by "not expressive enough"?
I ask because I've been struggling to identify clear requirements for the json-path functionality I'm trying to provide. It sounds like perhaps you have something concrete in mind.
Since I myself have no need currently for this functionality, I'm left guessing about hypothetical users of it. My current mental model is:
(a) Backend web developers. AFAICT, their community has mostly settled on the syntax/semantics proposed by Stefan Groessner. It would probably be unkind for PG's implementation to deviate from that without a good reason.
(b) PG hackers who will eventually implement the ISO SQL standard operators. In the standards-committee meeting notes I've seen, it seemed to me that they were planning to define some operators in terms of json-path expression. So it would probably be good if whatever json-path function I implement turns out to comply with that standard, so that the PG-hackers can use it as a building block for their work.
(c) Pavel. (I'm still somewhat unclear on what has him interested in this, and what his specific constraints are.)
- Christian
On Mon, Nov 28, 2016 at 05:50:40PM -0800, Christian Convey wrote:
> On Mon, Nov 28, 2016 at 11:23 AM, Nico Williams <nico@cryptonector.com> wrote:
> ...
> > JSON Path is not expressive enough (last I looked) and can be mapped
> > onto jq if need be anyways.
>
> Hi Nico,
>
> Could you please clarify what you mean by "not expressive enough"?

jq is a functional language that has these and other features:

- recursion
- generators
- lazy evaluation (of sorts)
- path expressions
- math functionality (libm, basically)
- reduction
- functions
- and other things

(jq does not have higher-order functions in that functions cannot return functions and functions are not values, though it does have closures.)

jq is and feels a lot like a SQL, but for JSON.

> I ask because I've been struggling to identify clear requirements for the
> json-path functionality I'm trying to provide. It sounds like perhaps you
> have something concrete in mind.

SQL imposes structure on data. Recursion makes SQL structure looser in the sense that it may not be easy or possible to express certain desirable schema constraints in SQL terms without resorting to triggers, say. Storing documents in XML, JSON, or other such recursion-friendly formats (perhaps in semantically equivalent but query-optimized forms) is also a way to avoid strict structure (thus one needs schema validators for XML, for example). Less rigid schema constraints do not and should not preclude powerful query languages.

One could convert such documents to a SQL EAV schema, if one has an RDBMS with an ANY type (e.g., something like SQLite3's duck typing), and then use SQL to query them. But that may be more difficult to use than a SQL with support for XML/JSON/... and query sub-languages for those.

SQL is very powerful. One might like to have similarly powerful, format-specific query languages for documents stored in XML, JSON, etcetera, in a SQL RDBMS. jq is such a language, for JSON documents. Ditto XPath/XSLT, for XML. While XPath is expressive and compact, XSLT is rather verbose; jq is as expressive as XSLT, but with the compact verbosity of XPath.

> Since I myself have no need currently for this functionality, I'm left
> guessing about hypothetical users of it. My current mental model is:

That's a bit like asking what is the use for SQL :^) The point is that SQL is a powerful query language, and so is jq. Each is appropriate to its own domain; both could be used together.

> (a) Backend web developers. AFAICT, their community has mostly settled on
> the syntax/semantics proposed by Stefan Groessner. It would probably be
> unkind for PG's implementation to deviate from that without a good reason.

I can't speak for the community. I wouldn't take it personally that jq be not chosen, nor any other proposal of mine. If it's politically easier, then do that.

> (b) PG hackers who will eventually implement the ISO SQL standard
> operators. In the standards-committee meeting notes I've seen, it seemed
> to me that they were planning to define some operators in terms of
> json-path expression. So it would probably be good if whatever json-path
> function I implement turns out to comply with that standard, so that the
> PG-hackers can use it as a building block for their work.

These could still be implemented (e.g., using jq itself).

> (c) Pavel. (I'm still somewhat unclear on what has him interested in this,
> and what his specific constraints are.)

Hmm?

Nico
Thanks for the explanation. It sounds like your original point was NOT that json-path isn't sufficient for "${specific use X}".
Instead, your point was that jq seems to have many advantages over json-path in general, and therefore PG should offer jq instead of, or in addition to, json-path.
Is that what you're saying?
...
> (c) Pavel. (I'm still somewhat unclear on what has him interested in this,
> and what his specific constraints are.)
Hmm?
Context: The reason I'm trying to work on a json-path implementation is that Pavel Stehule suggested it as a good first PG-hacking project for me. At the time, it sounded like he had a use for the feature.
- C
On Mon, Nov 28, 2016 at 06:38:55PM -0800, Christian Convey wrote:
> On Mon, Nov 28, 2016 at 6:26 PM, Nico Williams <nico@cryptonector.com>
> wrote:
>
> Thanks for the explanation. It sounds like your original point was NOT
> that json-path isn't sufficient for "${specific use X}".

The only uses of SQL w/ JSON I've seen so far in live action are to
implement EAV schemas on PostgreSQL. Since PostgreSQL lacks an ANY
type... using hstore or jsonb to store data that would otherwise require
an ANY type is the obvious thing to do. Naturally this use doesn't need
deeply nested JSON data structures, so even JSONPath is overkill for it!

However, there are use cases I can imagine:

- generating complex JSON from complex (e.g., recursive) SQL data where
  the desired JSON "schema" is not close to the SQL schema

  I've used jq a *lot* to convert schemas. I've also used XSLT for the
  same purpose. I've also used SQL RDBMSes and jq together a fair bit,
  either having jq consume JSON documents to output INSERT and other
  statements, or having a SQL application output JSON that I then
  convert to an appropriate schema using jq. Naturally I can keep using
  these two tools separately. There's not much to gain from integrating
  them for this particular sort of use-case.

- handling JSON documents with very loose schemata, perhaps arbitrary
  JSON documents, embedded in a SQL DB

  I've not needed to do this much, so I have no specific examples. But,
  of course, one reason I've not needed to do this is that today it
  kinda can't be done with enough expressivity.

There are many use-cases for general-purpose programming languages, and
even for very widely-applicable domain-specific programming languages.
It's especially difficult to name a specific use-case for a language
that doesn't exist -- in this case that would be SQL + (jq and/or
JSONPath).

> Instead, your point was that jq seems to have many advantages over
> json-path in general, and therefore PG should offer jq instead of, or
> in addition to, json-path.
>
> Is that what you're saying?

Roughly, yes. The distinct advantage is that jq is much more general
and expressive, not unlike SQL itself.

> > Hmm?
>
> Context: The reason I'm trying to work on a json-path implementation is
> that Pavel Stehule suggested it as a good first PG-hacking project for me.
> At the time, it sounded like he had a use for the feature.

I see. I understand that. If you've already made a significant
investment, then I don't blame you for not wanting to risk it. On the
other hand, if melding jsonb and jq happens to be easy, then you'll get
much more bang from it for your investment.

Naturally, you do what you prefer, and if the reality on the ground is
JSONPath, then so be it. If I had time and felt sufficiently strongly,
I'd contribute jq integration; as it is I don't, and beggars can't be
choosers.

Nico
--
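[Editor's note] The jq-to-INSERT workflow Nico describes (consume JSON documents, emit INSERT statements) can be sketched without jq at all. The following Python version is purely illustrative — the table name, sorted column order, and naive quoting are assumptions, and it is not safe for untrusted input:

```python
import json

def json_to_inserts(rows, table):
    """Emit one INSERT statement per object in a JSON array.
    Illustrative only: naive quoting, numbers passed through as-is."""
    stmts = []
    for row in rows:
        cols = sorted(row)  # deterministic column order for the sketch
        vals = []
        for col in cols:
            v = row[col]
            if isinstance(v, str):
                vals.append("'" + v.replace("'", "''") + "'")
            elif v is None:
                vals.append("NULL")
            else:
                vals.append(json.dumps(v))  # numeric literal
        stmts.append("INSERT INTO %s (%s) VALUES (%s);"
                     % (table, ", ".join(cols), ", ".join(vals)))
    return stmts

rows = json.loads('[{"title": "Moby Dick", "price": 8.99}]')
for stmt in json_to_inserts(rows, "book"):
    print(stmt)
# INSERT INTO book (price, title) VALUES (8.99, 'Moby Dick');
```

In the workflow described above, jq would play the role of `json_to_inserts`, with the generated statements piped into psql.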
While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.
Instead, your point was that jq seems to have many advantages over json-path in general, and therefore PG should offer jq instead of, or in addition to, json-path.
IMO jq is considerably closer to XSLT than XPath - which leads me to figure that, since XML has both, JSON can benefit from both jq and json-path. I'm not inclined to dig too deep here, but I'd rather take jq in the form of "pl/jq" and have json-path (abstractly) as something that you can use like "pg_catalog.get_value(json, json-path)".
David J.
On Mon, Nov 28, 2016 at 08:00:46PM -0700, David G. Johnston wrote:
> IMO jq is considerably closer to XSLT than XPath - which leads me to figure
> that since xml has both that JSON can benefit from jq and json-path. I'm
> not inclined to dig too deep here but I'd rather take jq in the form of
> "pl/jq" and have json-path (abstractly) as something that you can use like
> "pg_catalog.get_value(json, json-path)"

JSONPath looks a lot like a small subset of jq. Here are some examples:

    JSONPath                       | jq
    -------------------------------------------------------------------
    $.store.book[0].title          | .store.book[0].title
    $['store']['book'][0]['title'] | .["store"]["book"][0]["title"]
    $..author                      | ..|.author
    $.store.*                      | .store[]
    $.store..price                 | .store|..|.price?
    $..book[2]                     | [..|.book?][2]
    $..book[?(@.isbn)]             | ..|.book?|select(.isbn)
    $..book[?(@.price<10)]         | ..|.book?|select(.price<10)
    $..*                           | ..?

Of course, jq can do much more than this. E.g.,

    # Output [<title>, <price>] of all books with an ISBN:
    ..|.book?|select(.isbn)|[.title,.price]

    # Output the average price of books with ISBNs appearing anywhere in
    # the input document:
    reduce (..|.book?|select(.isbn)|.price) as $price (
      # Initial reduction state:
      {price:0,num:0};
      # State update
      .price = (.price * .num + $price) / (.num + 1) |
      .num += 1
    ) |
    # Extract average price
    .price

Of course one could just wrap that with a function:

    def avg(pathexp; cond; v):
      reduce (pathexp | select(cond) | v) as $v (
        {v: 0, c: 0};
        .v = (.v * .c + $v) / (.c + 1) |
        .c += 1
      ) |
      .v;

    # Average price of books with ISBNs:
    avg(..|.book?; .isbn; .price)

    # Average price of all books:
    avg(..|.book?; true; .price)

There's much, much more.

Note that jq comes with a C implementation. It should be easy to make
bindings to it from other programming language run-times.

Nico
--
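[Editor's note] The JSONPath side of the correspondence above can be sanity-checked with a minimal evaluator for a few of the table's forms — child access, array indexing, and recursive descent. This Python sketch (the bookstore document, function names, and supported syntax are all invented for illustration, not any proposed PostgreSQL implementation):

```python
import re

def eval_path(doc, path):
    """Evaluate a tiny JSONPath subset: $, .name, [n], and recursive
    descent (..name). Returns the list of matching values."""
    tokens = re.findall(r"\.\.|\.|\[\d+\]|[A-Za-z_]\w*|\$", path)
    nodes, recursive = [doc], False
    for tok in tokens:
        if tok in ("$", "."):
            continue
        if tok == "..":
            recursive = True
            continue
        found = []
        for node in nodes:
            found.extend(descend(node, tok) if recursive else step(node, tok))
        nodes, recursive = found, False
    return nodes

def step(node, tok):
    """One child-access step: a member name or an [index]."""
    if tok.startswith("["):
        i = int(tok[1:-1])
        return [node[i]] if isinstance(node, list) and i < len(node) else []
    return [node[tok]] if isinstance(node, dict) and tok in node else []

def descend(node, tok):
    """Recursive descent: match tok at this node and in all descendants."""
    out = list(step(node, tok))
    kids = node.values() if isinstance(node, dict) else node if isinstance(node, list) else []
    for kid in kids:
        out.extend(descend(kid, tok))
    return out

# A small bookstore-style document (invented for illustration):
doc = {"store": {"book": [
    {"title": "Sayings of the Century", "price": 8.95, "author": "Nigel Rees"},
    {"title": "Moby Dick", "price": 8.99, "author": "Herman Melville",
     "isbn": "0-553-21311-3"},
]}}

print(eval_path(doc, "$.store.book[0].title"))  # ['Sayings of the Century']
print(eval_path(doc, "$..author"))  # ['Nigel Rees', 'Herman Melville']
```

Even this toy shows why the table rows line up: each JSONPath step maps to one traversal primitive, which is exactly what the jq column spells with `.name`, `[n]`, and `..`.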
2016-11-29 2:50 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
...JSON Path is not expressive enough (last I looked) and can be mapped onto jq if need be anyways.

Hi Nico,

Could you please clarify what you mean by "not expressive enough"?

I ask because I've been struggling to identify clear requirements for the json-path functionality I'm trying to provide. It sounds like perhaps you have something concrete in mind.

Since I myself have no need currently for this functionality, I'm left guessing about hypothetical users of it. My current mental model is:

(a) Backend web developers. AFAICT, their community has mostly settled on the syntax/semantics proposed by Stefan Goessner. It would probably be unkind for PG's implementation to deviate from that without a good reason.

(b) PG hackers who will eventually implement the ISO SQL standard operators. In the standards-committee meeting notes I've seen, it seemed to me that they were planning to define some operators in terms of json-path expressions. So it would probably be good if whatever json-path function I implement turns out to comply with that standard, so that the PG hackers can use it as a building block for their work.

(c) Pavel. (I'm still somewhat unclear on what has him interested in this, and what his specific constraints are.)
My target is simple: 1. to have good ANSI/SQL support, and 2. to have a good JSON-to-relation mapping function - the ANSI/SQL JSON_TABLE does that.
We now support an XPath function - JSONPath is similar to XPath - it is better for users, because they have to learn only one language.
Regards
Pavel
- Christian
2016-11-29 4:00 GMT+01:00 David G. Johnston <david.g.johnston@gmail.com>:
While XPath is expressive and compact, XSLT
is rather verbose; jq is as expressive as XSLT, but with the compact
verbosity of XPath.

Instead, your point was that jq seems to have many advantages over json-path in general, and therefore PG should offer jq instead of, or in addition to, json-path.

IMO jq is considerably closer to XSLT than XPath - which leads me to figure that, since XML has both, JSON can benefit from both jq and json-path. I'm not inclined to dig too deep here, but I'd rather take jq in the form of "pl/jq" and have json-path (abstractly) as something that you can use like "pg_catalog.get_value(json, json-path)".
I am not against this idea. jq and similar environments can make sense in JSON NoSQL databases. Using it for the searching functions of a relational database is overkill.
Regards
Pavel
David J.
We now support an XPath function - JSONPath is similar to XPath - it is better for users, because they have to learn only one language.
I'm not sure I understand.
Are you suggesting that we use XPath, not JSONPath, as our language for json-path expressions?
- C
2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com>:
We now support an XPath function - JSONPath is similar to XPath - it is better for users, because they have to learn only one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for json-path expressions?
surely not.
follow ANSI/SQL :)
Pavel
- C
2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com>:

We now support an XPath function - JSONPath is similar to XPath - it is better for users, because they have to learn only one language.

I'm not sure I understand.

Are you suggesting that we use XPath, not JSONPath, as our language for json-path expressions?

surely not.

follow ANSI/SQL :)
I see. Then I'm afraid I still don't understand what your main point was when you wrote:
We now support an XPath function - JSONPath is similar to XPath - it is better for users, because they have to learn only one language.
- C
On 29/11/16 07:37, Pavel Stehule wrote:
>
> 2016-11-29 7:34 GMT+01:00 Christian Convey <christian.convey@gmail.com
> <mailto:christian.convey@gmail.com>>:
>
>     On Mon, Nov 28, 2016 at 9:28 PM, Pavel Stehule
>     <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>> wrote:
>
>         We now support XPath function - JSONPath is similar to XPath -
>         it is better for user, because have to learn only one language.
>
>     I'm not sure I understand.
>
>     Are you suggesting that we use XPath, not JSONPath, as our language
>     for json-path expressions?
>
> surely not.
>
> follow ANSI/SQL :)

Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already, are basically a
subset of jsonpath (some proposals/implementations also include a
lax/strict prefix keyword on top of that). I think that should give us
some hint of what the base functionality should look like.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 29, 2016 at 05:18:17PM +0100, Petr Jelinek wrote:
> Just to add to this, the SQL/JSON proposals I've seen so far, and what
> Oracle, MSSQL and Teradata chose to implement already is basically
> subset of jsonpath (some proposals/implementations also include
> lax/strict prefix keyword on top of that). I think that should give us
> some hint on what the base functionality should look like.

Yes, that'd be base functionality. You can go above and beyond.

I agree with Pavel that jq could be used as a user-defined function, but
proper integration would be better because it would avoid the need to
format and parse JSON around calls to jq, and also because PG could
compile jq programs when preparing SQL statements. Besides, the libjq
jv API is *very* nice.

Nico
--
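[Editor's note] Nico's "format and parse JSON around calls" concern can be made concrete: a pipeline that round-trips through JSON text between every stage does redundant work compared with parsing once and passing the internal form through. A small Python illustration (the stage names and document are invented; libjq's `jv` values play the role of the parsed Python objects here):

```python
import json

def stage_filter(doc):
    """Keep only books that carry an ISBN (hypothetical stage)."""
    return [b for b in doc["book"] if "isbn" in b]

def stage_titles(books):
    """Project the titles (hypothetical stage)."""
    return [b["title"] for b in books]

raw = ('{"book": [{"title": "Moby Dick", "isbn": "0-553-21311-3"},'
       ' {"title": "Sayings of the Century"}]}')

# Round-tripping through text between every stage -- the shape that
# per-call format/parse integration would force:
step1 = json.dumps(stage_filter(json.loads(raw)))
titles_text = stage_titles(json.loads(step1))

# Keeping the parsed form end-to-end: parse once, pass objects through.
doc = json.loads(raw)
titles_internal = stage_titles(stage_filter(doc))

print(titles_text, titles_internal)  # same answer either way
assert titles_text == titles_internal == ["Moby Dick"]
```

The results are identical; the difference is only the wasted serialize/deserialize work in the first pipeline, which is exactly what a proper integration avoids.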
On 29/11/16 17:28, Nico Williams wrote:
> On Tue, Nov 29, 2016 at 05:18:17PM +0100, Petr Jelinek wrote:
>> Just to add to this, the SQL/JSON proposals I've seen so far, and what
>> Oracle, MSSQL and Teradata chose to implement already is basically
>> subset of jsonpath (some proposals/implementations also include
>> lax/strict prefix keyword on top of that). I think that should give us
>> some hint on what the base functionality should look like.
>
> Yes, that'd be base functionality. You can go above and beyond.

But let's just do the base thing first before going to a much more
complicated endeavor, especially if this is supposed to be the first
patch for Christian. Also, one of the points of SQL is compatibility,
so that's what we should strive for first, especially given that the
syntax of jq is not compatible AFAICS.

> I agree with Pavel that jq could be used as a user-defined function, but
> proper integration would be better because it would avoid the need to
> format and parse JSON around calls to jq, and also because PG could
> compile jq programs when preparing SQL statements. Besides, the libjq
> jv API is *very* nice.

I think this would be good as an extension first, and then we can see
what to do with it next (i.e., I agree with Pavel).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
...
Just to add to this, the SQL/JSON proposals I've seen so far, and what
Oracle, MSSQL and Teradata chose to implement already is basically
subset of jsonpath (some proposals/implementations also include
lax/strict prefix keyword on top of that). I think that should give us
some hint on what the base functionality should look like.
I agree. My guess is that PG users would benefit most from:
(1) Conformance to whatever ISO standard regarding JSON operators eventually makes it out of the working group.
(2) Compatibility with other widely-used DBMS's.
(3) Compatibility with the JSONPath functionality used by web developers. (Although I don't currently have a grasp on which frameworks / libraries this entails.)
I *think* that (1), (2), and (3) are in approximate agreement about the syntax and semantics of the path-expression language: the language proposed by Stefan Goessner, plus the strict vs. lax distinction.
I think I can satisfy (3) with a PG extension which provides a function that approximately implements JSONPath. My short-term plans are to submit such a patch.
Hopefully that patch's function will be a helpful starting point for satisfying (1) and (2) as well. But that can be decided later.
Nico Williams has argued for using "jq". I don't think jq satisfies any of (1), (2), or (3), so I don't see a good case for incorporating it in my short-term plans. There *may* be a case for using jq internally in my implementation; I'll try to look into that.
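[Editor's note] The strict vs. lax distinction mentioned above roughly means that structural errors — such as accessing a member that does not exist — yield an empty result in lax mode but an error in strict mode. A hypothetical Python sketch of that behavior (names and semantics simplified; this is not the standard's exact rule set):

```python
class PathError(Exception):
    """Raised for structural errors in strict mode (hypothetical)."""

def member(node, name, mode="lax"):
    """Illustrative SQL/JSON-style member access: lax mode swallows
    structural errors, strict mode reports them."""
    if isinstance(node, dict) and name in node:
        return [node[name]]
    if mode == "strict":
        raise PathError("no member %r in %r" % (name, node))
    return []  # lax: structural error -> empty result

book = {"title": "Moby Dick"}
print(member(book, "title"))  # ['Moby Dick']
print(member(book, "isbn"))  # [] -- lax mode: missing member, no error
try:
    member(book, "isbn", mode="strict")
except PathError as e:
    print("strict mode error:", e)
```

The same query can therefore be forgiving for loosely structured documents (lax) or act as a schema check (strict), which is presumably why the vendor implementations cited earlier bother with the prefix keyword.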
On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
<christian.convey@gmail.com> wrote:
> I think I can satisfy (3) with a PG extension which provides a function that
> approximately implements JSONPath. My short-term plans are to submit such a
> patch.

FWIW, I think that's a fine plan. I don't really know whether JSONPath
is the right standard to pick for the task of extracting bits of JSON
from other bits of JSON, but I think there's some value in picking
something simple enough that we can implement it in our own code and
not have to rely on a third-party library.

Of course, if somebody feels like adding a configure option for
--with-jq and appropriate interfaces to integrate with JQ, we could
consider that, too, but that imposes a packaging requirement that a
home-grown implementation doesn't. I'd want to hear more than one vote
for such a course of action before embracing it.

If JQ is a Turing-complete query language, integrating it might be
quite difficult -- for example, we'd need a way to make sure that it
does periodic CHECK_FOR_INTERRUPTS() calls, and that it doesn't leak
resources or crash if those calls decide to longjmp() away due to an
ERROR -- and would we let people query database tables with it? Would
that be efficient? I think it's fine to have more limited objectives
than what a JQ implementation would apparently entail.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:
> On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
> <christian.convey@gmail.com> wrote:
> > I think I can satisfy (3) with a PG extension which provides a function that
> > approximately implements JSONPath. My short-term plans are to submit such a
> > patch.
>
> FWIW, I think that's a fine plan. I don't really know whether
> JSONPath is the right standard to pick for the task of extracting bits

It's not even a standard. Are there particular proposals that the ANSI
SQL working group is considering?

> of JSON from other bits of JSON, but I think there's some value in
> picking something simple enough that we can implement it in our own
> code and not have to rely on a third-party library. Of course, if
> somebody feels like adding a configure option for --with-jq and

Sure. My main concern is that I don't want to have to parse/format
JSON around every such call. I'd rather parsed JSON remain in an
internal form for as long as possible.

Speaking of which, you could use libjq's jv API and not support the jq
language itself.

> appropriate interfaces to integrate with JQ, we could consider that,
> too, but that imposes a packaging requirement that a home-grown
> implementation doesn't. I'd want to hear more than one vote for such

What we do in Heimdal, OpenAFS, and other open source projects, some
times, is include a copy / git submodule / similar of some such external
dependencies. Naturally it's not possible to do this for all external
dependencies, but it works well enough. The jv API part of jq is small
and simple, and could be ripped out into a library that could be
included in PostgreSQL.

> a course of action before embracing it. If JQ is a Turing-complete
> query language, integrating it might be quite difficult -- for

Even if it weren't! (It is.) Consider this expression using a builtin
in jq:

    [range(4503599627370496)]

That is, an array of integers from 0 to 4503599627370495, inclusive.
That will "halt" given a very, very large computer and a lot of time.
(Because jq is Turing-complete, range() can be coded in jq itself, and
some variants of range() are.)

> example, we'd need a way to make sure that it does periodic
> CHECK_FOR_INTERRUPTS() calls, and that it doesn't leak resources or
> crash if those calls decide longjmp() away due to an ERROR -- and
> would we let people query database tables with it? Would that be
> efficient? I think it's fine to have more limited objectives than
> what a JQ implementation would apparently entail.

Agreed. I think this means that we need either or both of: a variant
of the C jq_next() function that takes a timeout parameter, or a
jq_intr() function that can cause a running jq_next() to stop.
(Tolerating longjmp() is harder to do and I'd rather not.)

Other projects, like, say, nginx or similar where there is a per-client
or per-connection memory pool to limit memory footprint, might want
libjq to get an allocator hook, so that's another enhancement to
consider. If that's something that PostgreSQL would need, please let
me know.

Nico
--
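[Editor's note] The cooperative-cancellation idea under discussion — a periodic check inside a long-running evaluation, whether spelled CHECK_FOR_INTERRUPTS() or a `jq_next()` timeout/`jq_intr()` — can be sketched generically. This Python illustration is an analogy only, not PostgreSQL or libjq code; the names and check interval are invented:

```python
import threading

class Interrupted(Exception):
    """Raised when the evaluation observes a cancellation request."""

def evaluate(items, cancel, check_every=1000):
    """Run a potentially unbounded evaluation, polling a cancel flag
    every `check_every` steps -- the same cooperative pattern as a
    periodic CHECK_FOR_INTERRUPTS() inside a query executor."""
    total = 0
    for i, x in enumerate(items):
        if i % check_every == 0 and cancel.is_set():
            raise Interrupted("evaluation cancelled at step %d" % i)
        total += x
    return total

cancel = threading.Event()
print(evaluate(range(10_000), cancel))  # 49995000 -- runs to completion

cancel.set()
try:
    evaluate(range(10**12), cancel)  # would take ~forever uncancelled
except Interrupted as e:
    print(e)  # the poll catches the flag almost immediately
```

Raising an exception at the poll point is the Python analogue of returning an error from `jq_next()` rather than `longjmp()`-ing out: the evaluator unwinds through its own frames and can release its resources normally.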
...
On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:
> On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
> <christian.convey@gmail.com> wrote:
> > I think I can satisfy (3) with a PG extension which provides a function that
> > approximately implements JSONPath. My short-term plans are to submit such a
> > patch.
>
> FWIW, I think that's a fine plan. I don't really know whether
> JSONPath is the right standard to pick for the task of extracting bits
It's not even a standard. Are there particular proposals that the ANSI
SQL working group is considering?
Hi Nico, it seems to be something in the works with the standards committee. We were discussing it earlier in the thread: [1]
Kind regards,
Christian
On Fri, Dec 2, 2016 at 4:32 PM, Nico Williams <nico@cryptonector.com> wrote:
> On Fri, Dec 02, 2016 at 08:53:33AM -0500, Robert Haas wrote:
>> On Tue, Nov 29, 2016 at 11:50 AM, Christian Convey
>> <christian.convey@gmail.com> wrote:
>> > I think I can satisfy (3) with a PG extension which provides a function that
>> > approximately implements JSONPath. My short-term plans are to submit such a
>> > patch.
>>
>> FWIW, I think that's a fine plan. I don't really know whether
>> JSONPath is the right standard to pick for the task of extracting bits
>
> It's not even a standard. Are there particular proposals that the ANSI
> SQL working group is considering?

I don't know.

>> of JSON from other bits of JSON, but I think there's some value in
>> picking something simple enough that we can implement it in our own
>> code and not have to rely on a third-party library. Of course, if
>> somebody feels like adding a configure option for --with-jq and
>
> Sure. My main concern is that I don't want to have to parse/format
> JSON around every such call. I'd rather parsed JSON remain in an
> internal form for as long as possible.

Sure, but that seems like an orthogonal consideration.

> Speaking of which, you could use libjq's jv API and not support the jq
> language itself.

You can submit a patch for that if you like, but I don't think that's a
good reason to block what Christian wants to do, because it's a much
bigger change. Now if you whip that patch up in a short period of time
and everybody agrees that it gives us everything Christian wanted to
implement and other good stuff too, fine; we can reject Christian's
approach then. Or if Christian's patch is committed, we can rip it back
out again if and when somebody does this (or any other thing we all
agree is better). But I object strenuously to the idea that we should
reject the idea of drinking the bottle of beer we have in the house
because there's a liquor store down the road where we can buy an entire
keg. The possibility of awesome (with enough work) is not a reason to
reject good (via a straightforward approach).

>> appropriate interfaces to integrate with JQ, we could consider that,
>> too, but that imposes a packaging requirement that a home-grown
>> implementation doesn't. I'd want to hear more than one vote for such
>
> What we do in Heimdal, OpenAFS, and other open source projects, some
> times, is include a copy / git submodule / similar of some such external
> dependencies. Naturally it's not possible to do this for all external
> dependencies, but it works well enough. The jv API part of jq is small
> and simple, and could be ripped out into a library that could be
> included in PostgreSQL.

We typically avoid copying things into our repository, because then we
become responsible for pulling in any subsequent fixes. The few
instances that we have right now (zic, snowball, Henry Spencer's regexp
stuff) have imposed a significant code maintenance burden. The git
submodules approach might avoid that problem, but it would still be the
case that any compile-breaking bugs in the upstream repository
immediately become compile breaks for all PostgreSQL developers, and
that any critical defects that force emergency releases by the upstream
project now force emergency releases of PostgreSQL as well. If we
merely link against the external project, then we avoid that. So if
we're going to use JQ at all, I think that's how we should do it. And
again, I'm not trying to prevent you or anybody else from pursuing
that. All I'm saying is that Christian can do what he wants to do, too.

> Other projects, like, say, nginx or similar where there is a per-client
> or per-connection memory pool to limit memory footprint, might want
> libjq to get an allocator hook, so that's another enhancement to
> consider. If that's something that PostgreSQL would need, please let
> me know.

The overall need is that it needs to be possible for PostgreSQL to throw
an ERROR, and thus longjmp, without leaking resources. Sometimes those
errors happen asynchronously due to query cancel or replication
conflicts or similar, and those don't have to be processed at once, but
the delay can't be unbounded or more than some modest fraction of a
second. Integrating with PostgreSQL's memory-context stuff might make
that or other resource-leak problems easier, or it might not. To really
get a clear notion of what would be involved, I think you'd probably
need to produce a prototype patch and submit it here for review to find
out what issues people see with it.

Library integrations are tricky but, since you wrote JQ and seem
potentially willing to modify it to work with PostgreSQL better, this
one might be a lot less painful than some. I can't promise we'd accept
an integration even if you came up with a patch, but I see looking back
over this thread that there are several people cautiously supporting
the idea of using JQ in some form, which is promising.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Dec 05, 2016 at 11:28:31AM -0500, Robert Haas wrote:
> The overall need is that it needs to be possible for PostgreSQL to
> throw an ERROR, and thus longjmp, without leaking resources.

As long as one can interpose jump buffers, that should be possible.

> Sometimes those errors happen asynchronously due to query cancel or
> replication conflicts or similar, and those don't have to be processed
> at once but the delay can't be unbounded or more than some modest
> fraction of a second. Integrating with PostgreSQL's memory-context
> stuff might make that or other resource leak problems easier, or it
> might not. To really get a clear notion of what would be involved, I
> think you'd probably need to produce a prototype patch and submit it
> here for review to really find out what issues people see with it.

Understood. Thanks.

> Library integrations are tricky but, since you wrote JQ and seem

Just to be clear, Stephen Dolan wrote jq. I've added to, and
maintained, jq, to be sure, but I would not want to take credit from
Stephen.

> potentially willing to modify it to work with PostgreSQL better, this
> one might be a lot less painful than some. I can't promise we'd
> accept an integration even if you came up with a patch, but I see
> looking back over this thread that there are several people cautiously
> supporting the idea of using JQ in some form, which is promising.

Fair enough. I won't be able to work on an integration for a few more
months, so we'll see (and Stephen might well veto some such changes to
jq), and it is time for me to shut up about this for now. Thanks for
the very useful comments, and sorry for the noise.

Nico
--
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Dec 2, 2016 at 4:32 PM, Nico Williams <nico@cryptonector.com> wrote:
>> What we do in Heimdal, OpenAFS, and other open source projects, some
>> times, is include a copy / git submodule / similar of some such external
>> dependencies. Naturally it's not possible to do this for all external
>> dependencies, but it works well enough. The jv API part of jq is small
>> and simple, and could be ripped out into a library that could be
>> included in PostgreSQL.

> We typically avoid copying things into our repository, because then
> we become responsible for pulling in any subsequent fixes. The few
> instances that we have right now (zic, snowball, Henry Spencer's
> regexp stuff) have imposed a significant code maintenance burden. The
> git submodules approach might avoid that problem, but it would still
> be the case that any compile-breaking bugs in the upstream repository
> immediately become compile breaks for all PostgreSQL developers, and
> that any critical defects that force emergency releases by the
> upstream project now force emergency releases of PostgreSQL as well.
> If we merely link against the external project, then we avoid that.

Another point here is that packagers such as Red Hat strenuously dislike
such source-code-level wrapping of other projects, because that means
that they have to rebuild multiple packages to fix any bugs found in the
wrapped code. If I were still packaging Postgres for Red Hat, and such
a distribution landed in my inbox, the very first thing I'd be looking
to do is rip out the borrowed code and replace it with a runtime
shared-library dependency on the upstream project's official library.

Having said that ... we have a *really bad* track record of deciding
which outside projects we want to depend on, or maybe we've just
outlived a lot of them. Aside from Robert's examples, there's uuid-ossp
and libxml2, which are external code but have caused us headaches
anyway. So I think there's a lot to be said for avoiding dependencies
on libraries that may or may not still be getting actively maintained
ten years from now.

regards, tom lane
On Mon, Dec 5, 2016 at 11:42 AM, Nico Williams <nico@cryptonector.com> wrote: >> Library integrations are tricky but, since you wrote JQ and seem > > Just to be clear, Stephen Dolan wrote jq. I've added to, and maintained > jq, to be sure, but I would not want to take credit from Stephen. Ah, OK, sorry for misunderstanding. >> potentially willing to modify it to work with PostgreSQL better, this >> one might be a lot less painful than some. I can't promise we'd >> accept an integration even if you came up with a patch, but I see >> looking back over this thread that there are several people cautiously >> supporting the idea of using JQ in some form, which is promising. > > Fair enough. I won't be able to work on an integration for a few more > months, so we'll see (and Stephen might well veto some such changes to > jq), and it is time for me to shut up about this for now. Thanks for > the very useful comments, and sorry for the noise. Not noise at all. Thanks for your interest. Bandwidth is a little limited around here so it's hard to give everyone the feedback that they deserve, but it's good that you're interested. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 05, 2016 at 11:52:57AM -0500, Tom Lane wrote: > Another point here is that packagers such as Red Hat strenuously dislike > such source-code-level wrapping of other projects, because that means that > they have to rebuild multiple packages to fix any bugs found in the > wrapped code. If I were still packaging Postgres for Red Hat, and such > a distribution landed in my inbox, the very first thing I'd be looking > to do is rip out the borrowed code and replace it with a runtime > shared-library dependency on the upstream project's official library.

I'm quite aware :( I used to work at Sun on Solaris. We too hated duplication. OpenSSL was a particularly problematic case... There is a real tension between the release trains of many distinct open source projects, those of their consumers, and those of distros/OSes. Some projects, such as SQLite3, explicitly recommend copying their source or statically linking it into dependents; distros/vendors never like this.

My best advice here (PG might benefit from it), informed by years of experience dealing with this tension, is that there is no perfect answer, but that library developers should nonetheless follow these best practices, so as to help those who end up having to deal with multiple versions of those libraries:

- prefer dynamic linking (because dynamic linking semantics are superior to static linking semantics)
- make libraries self-initialize and self-finalize! (pthread_once() and Win32's ExecuteOnce* are your friends, as are atexit()/on_exit(), pthread_key_create(), and DllMain() on Windows; otherwise calling your library from another library gets tricky)
- make it so that, as long as you change SONAMEs, multiple versions of the library can be loaded in one process; specifically:
  - don't use POSIX file locking (but the new non-POSIX OFD locks are OK), or use it only for files that wouldn't be shared across multiple versions in one process (e.g., SQLite3 uses POSIX file locking safely, but it's unlikely that two consumers of different SQLite3 versions in one process would access the same DB files, so it kinda works)
  - be backwards- and forwards-compatible as to any config file formats and other state that will be shared by multiple versions
- generally: mind backwards compatibility, both source and binary, so as to make upgrades easy (this means applying good API design best practices that I won't go into here)
- write thread-safe code, and preferably fork-safe code too

For example, I've seen OpenSSL built with different SONAMEs to support multiple versions of OpenSSL coexisting in a single program/process. That actually works.

> Having said that ... we have a *really bad* track record of deciding which > outside projects we want to depend on, or maybe we've just outlived a lot > of them. Aside from Robert's examples, there's uuid-ossp and libxml2, > which are external code but have caused us headaches anyway. So I think > there's a lot to be said for avoiding dependencies on libraries that may > or may not still be getting actively maintained ten years from now.

I'm not at all surprised. One codebase I help develop and maintain, Heimdal, includes SQLite3 and libeditline, and parts of Heimdal should really be separate projects (e.g., its ASN.1 compiler and library, and supporting libraries such as libroken, a portability layer) because they could be useful to others outside Heimdal.
Finding the right balance is not trivial. Nico --
On 11/28/16 12:38 PM, Nico Williams wrote: > The internal representation of JSON data is bound to be > completely different, no doubt Actually, that could be a good thing. The internal storage of JSONB is optimized for compressibility, but that imposes a substantial overhead on calls that search for a particular key in a document. This gets *really* bad if you make nested expansion calls (i.e., json->'a'->'b'). ExpandedObject support means we're not stuck with the same representation in memory as on disk, though. While we could create our own internal representation, it seems a bit silly to reinvent that wheel if we don't need to. Bonus points if it would also throw an error if you fed it duplicated object keys. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)