Thread: json accessors

json accessors

From
Andrew Dunstan
Date:
This is a proposal to create some basic functions to extract values from 
json. The simple functions I envision would be:
 * json_object_keys(json) => setof text
   returns the set of dequoted, unescaped keys of the object,
   errors if it's not an object
 * json_get(json, keytext) => json
   returns the json value corresponding to the key text in the json object,
   null if not found, error if it's not an object
 * json_get(json, indexint) => json
   returns the json value of the indexth element in the json array,
   null if the index is outside the array bounds, errors if it's not an
   array
 * json_get_as_text(json, keytext or indexint) => text
   same as json_get() except that it returns dequoted, unescaped text
   for a quoted leaf field
 


I also propose to map the json_get functions to the operator '->' and 
json_get_as_text to '->>', so that given x has this json value:
   {"a":[{"b":"c","d":"e"},{"f":true,"g":1}]}

the expression x->'a'->0->>'d' will yield 'e', x->'a'->1->'f' will yield 
'true' and x->'a'->0 will yield '{"b":"c","d":"e"}'. The operators would 
make using these a whole lot nicer :-)
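
For instance, spelled out as function calls rather than operators, the
x->'a'->0->>'d' example above would be (a sketch; these functions are
only proposed at this point):

   SELECT json_get_as_text(
            json_get(
              json_get('{"a":[{"b":"c","d":"e"},{"f":true,"g":1}]}'::json, 'a'),
              0),
            'd');
   -- yields 'e'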

Various people have suggested putting json_path or something similar 
into the core. I'm not sure we want to do that, partly because there are 
several competing entries in this field, and partly because I don't want 
to get into the business of evaluating json predicate tests, which I 
think any tolerably complete gadget would need to do.

Regarding implementation, the way I propose to do this is to modify the 
json parser a bit to turn it into a recursive descent parser, with hooks 
for various operations. NULL hooks would leave us with the validating 
parser we have now with no side effects. The hook functions themselves 
will be very small. This would also allow us to do other things very 
simply at a later stage, for example a json to xml transformation 
function would be very easy to construct using this infrastructure, and 
without disturbing any existing functionality.

cheers

andrew



Re: json accessors

From
Merlin Moncure
Date:
On Wed, Nov 28, 2012 at 11:04 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> This is a proposal to create some basic functions to extract values from
> json. The simple functions I envision would be:
>
>  * json_object_keys(json) => setof text
>    returns the set of dequoted, unescaped keys of the object,
>    errors if it's not an object
>  * json_get(json, keytext) => json
>    returns the json value corresponding to the key text in the json object,
>    null if not found, error if it's not an object
>  * json_get(json, indexint) => json
>    returns the json value of the indexth element in the json array,
>    null if the index is outside the array bounds, errors if it's not an
>    array
>  * json_get_as_text(json, keytext or indexint) => text
>    same as json_get() except that it returns dequoted, unescaped text
>    for a quoted leaf field

Comments (this is awesome btw):

*) ISTM your keytext operators are a reasonable replacement for a
hypothetical json_path.  That said  you're basically forcing json->sql
mapping through a highly iterative API, which I don't like. At the
very least, I think json_get should return setof json and return all
matching constructions.  I won't miss predicate tests: we can do all
that in SQL.

Non-trivial json productions in postgres require the creation of
special composite types that structure the data that we (I?) rig up in
SQL before routing to json.  What about having functions that work in
the opposite direction:

*) can you access both arrays and records with numeric positional
syntax (hopefully, yes?), for example:

x->0->0

*) json_object_keys(json) seems too special-case to me. how about:

json_each(json) which returns a set of key/value pairs and would work on
arrays or objects (for arrays the key could be invented from the
index).
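
For instance (hypothetical, along the lines suggested above):

select * from json_each('["x", "y"]'::json);
-- key | value
-- 0   | "x"
-- 1   | "y"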

*) json_get_as_text(json, keytext or indexint) => text

prefer json_to_text() naming. also json_to_hstore(), etc.

*) have you considered something like
anyelement from_json(anyelement, json)
or
select <json>::some_type;  (this may or may not be possible given our
casting mechanics; i don't know).

My reasoning here is that for non-trivial json productions we (I?)
typically use composite types to rigidly control the structure of the
output document.  For 'restful' type protocols I might want to use the
same trick: there would be a set of nested composite type/arrays (or
even, in trivial cases, a table) that would cleanly map to the
document.  The parsing here can and should be automatic; this would
give nice symmetry with your xxx_to_json functions.  Obviously
conversion here would be best effort but when it works, it would be
wonderful:

WITH json_data AS
( SELECT from_json(null::foo[], <input_doc>)
),
i1 as (INSERT INTO bar SELECT ... FROM json_data),
i2 as (INSERT INTO baz SELECT ... FROM json_data)

where "..." would be some combination of unnest() and composite type
access syntax.

Now, some documents in json won't translate cleanly to composite types
because json allows for heterogeneous arrays.  But if we're in control
of both sides of the protocol that shouldn't matter.

merlin



Re: json accessors

From
Andrew Dunstan
Date:
On 11/28/2012 02:08 PM, Merlin Moncure wrote:
> On Wed, Nov 28, 2012 at 11:04 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> This is a proposal to create some basic functions to extract values from
>> json. The simple functions I envision would be:
>>
>>   * json_object_keys(json) => setof text
>>     returns the set of dequoted, unescaped keys of the object,
>>     errors if it's not an object
>>   * json_get(json, keytext) => json
>>     returns the json value corresponding to the key text in the json object,
>>     null if not found, error if it's not an object
>>   * json_get(json, indexint) => json
>>     returns the json value of the indexth element in the json array,
>>     null if the index is outside the array bounds, errors if it's not an
>>     array
>>   * json_get_as_text(json, keytext or indexint) => text
>>     same as json_get() except that it returns dequoted, unescaped text
>>     for a quoted leaf field
> Comments (this is awesome btw):

Thanks for the input.

>
> *) ISTM your keytext operators are a reasonable replacement for a
> hypothetical json_path.  That said  you're basically forcing json->sql
> mapping through a highly iterative API, which I don't like. At the
> very least, I think json_get should return setof json and return all
> matching constructions.  I won't miss predicate tests: we can do all
> that in SQL.

Yes, it's iterative. And for deeply nested json it might be somewhat 
inefficient, although the parser is pretty fast AFAICT. But it's a start.

>
> Non-trivial json productions in postgres require the creation of
> special composite types that structure the data that we (I?) rig up in
> SQL before routing to json.  What about having functions that work in
> the opposite direction:
>
> *) can you access both arrays and records with numeric positional
> syntax (hopefully, yes?), for example:
>
> x->0->0

You can't do that in JS, so I'm not clear why we should allow it.


>
> *) json_object_keys(json) seems to special case to me. how about:
>
> json_each(json) which returns a set of key/value pairs and would work on
> arrays or objects (for arrays the key could be invented from the
> index).

Again, I don't think we should conflate the processing for arrays and 
objects. But I could see doing each(json) => setof (text, json) (and 
maybe a similar function returning setof (text, text), which would 
dequote leaf nodes as json_get_as_text() does).

And similarly a couple of functions to unnest arrays.
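
For instance, something like this (pure sketch, names and signatures not
settled):

   select * from each('{"a": "x", "b": {"c": 1}}'::json);
   --  key | value
   --  a   | "x"
   --  b   | {"c": 1}

   select * from json_unnest('[1, {"a": 2}]'::json);
   -- '1', '{"a": 2}'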

>
> *) json_get_as_text(json, keytext or indexint) => text
>
> prefer json_to_text() naming. also json_to_hstore(), etc.


json_to_text seems rather misleading as a name here. Maybe we could 
remove the "_as" from the name if that's bothering you.

As for json_to_hstore, as I mentioned, the design is intended to enable 
the easy construction of such transformations, although for hstore 
anything except trivial json structure (i.e. an unnested object) might 
have unappealing results. But in any case, the important thing to do 
first is to get the infrastructure in place. Time is very short and I 
don't want to extend this very much.

>
> *) have you considered something like
> anyelement from_json(anyelement, json)
> or
> select <json>::some_type;  (this may or may not be possible given our
> casting mechanics; i don't know).


I have no idea what the semantics of this would be.


cheers

andrew





Re: json accessors

From
Andrew Dunstan
Date:
On 11/28/2012 03:44 PM, Andrew Dunstan wrote:
>
>
> As for json_to_hstore, as I mentioned, the design is intended to 
> enable the easy construction of such transformations, although for 
> hstore anything except trivial json structure (i.e. an unnested 
> object) might have unappealing results. But in any case, the 
> important thing to do first is to get the infrastructure in place. 
> Time is very short and I don't want to extend this very much.


The other thing about doing json_to_hstore() is that, since hstore is 
not itself a core type, we couldn't do that in the core json module, and 
therefore we'd either need to expose an API to the JSON parser or 
replicate it in the hstore module. Exposing it is probably the better 
way to go. Then people could write extensions that process json just by 
supplying the hooked functions.

cheers

andrew





Re: json accessors

From
Merlin Moncure
Date:
On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 11/28/2012 02:08 PM, Merlin Moncure wrote:
>> *) ISTM your keytext operators are a reasonable replacement for a
>> hypothetical json_path.  That said  you're basically forcing json->sql
>> mapping through a highly iterative API, which I don't like. At the
>> very least, I think json_get should return setof json and return all
>> matching constructions.  I won't miss predicate tests: we can do all
>> that in SQL.
>
>
> Yes, it's iterative. And for deeply nested json it might be somewhat
> inefficient, although the parser is pretty fast AFAICT. But it's a start.

not completely buying that: see comments below.  not supporting xpath
style decompositions seems wrong to me.  IOW, json_get should be set
returning (perhaps via wild cards in the keytext) or we need
json_each.

>> Non-trivial json productions in postgres require the creation of
>> special composite types that structure the data that we (I?) rig up in
>> SQL before routing to json.  What about having functions that work in
>> the opposite direction:
>>
>> *) can you access both arrays and records with numeric positional
>> syntax (hopefully, yes?), for example:
>>
>> x->0->0
>
>
> You can't do that in JS, so I'm not clear why we should allow it.

agreed -- withdrawn.

>> *) json_object_keys(json) seems to special case to me. how about:
>>
>> json_each(json) which returns a set of key/value pairs and would on
>> arrays or objects (for arrays the key could be invented from the
>> index).
>
> Again, I don't think we should conflate the processing for arrays and
> objects. But I could see doing each(json) => setof (text, json) (and maybe a
> similar function returning setof (text, text), which would dequote leaf
> nodes as json_get_as_text() does).
>
> And similarly a couple of functions to unnest arrays.

Yeah.  Although, I *do* think you need 'json_each' (or a set returning
json_get) and they should be conflated...exactly as jquery does:
http://api.jquery.com/jQuery.each/.  json objects are associative
arrays, right?

So if the *value* that gets returned by json_each is itself a
collection, we can cast back to json and recurse. at the very least,
we ought to decompose large documents into arbitrary smaller chunks
(as xpath does) without iterating.

In most of the code I'd write, I would decompose to a json object
using your stuff then route to something like:

insert into foo
select (r).*
from json_each('path->to->record_containing_array', json_document) x,
     populate_record(null::foo, json_to_hstore(x)) r;

assuming the json was deliberately constructed to marshal cleanly into
the database, which is perfectly reasonable.

>> *) json_get_as_text(json, keytext or indexint) => text
>>
>> prefer json_to_text() naming. also json_to_hstore(), etc.
>
> json_to_text seems rather misleading as a name here. Maybe we could remove
> the "_as" from the name if that's bothering you.

hm, I think you're right here -- I see the distinction.

> As for json_to_hstore, as I mentioned, the design is intended to enable the
> easy construction of such transformations, although for hstores anything
> except trivial json structure (i.e. an unnested object) it might have
> unappealing results. But in any case, the important thing to do first is to
> get the infrastructure in place. Time is very short and I don't want to
> extend this very much.

yeah, understood.

>> *) have you considered something like
>> anyelement from_json(anyelement, json)
>> or
>> select <json>::some_type;  (this may or may not be possible given our
>> casting mechanics; i don't know).
>
> I have no idea what the semantics of this would be.

Yeah, there's a lot of nuance there.   Don't have to tackle everything
at once I suppose, but spiritually I'm hoping it would serve as
replacement for textual record_in, array_in, etc.  It's just wrong to
have to specify each and every field during parsing when the
receiving structure is well defined in the database.

merlin



Re: json accessors

From
Hannu Krosing
Date:
On 11/29/2012 01:10 AM, Merlin Moncure wrote:
> On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
...
>
>>> *) have you considered something like
>>> anyelement from_json(anyelement, json)
>>> or
>>> select <json>::some_type;  (this may or may not be possible given our
>>> casting mechanics; i don't know).
>> I have no idea what the semantics of this would be.
> Yeah, there's a lot of nuance there.
One way to tackle it would be to give the argument element as a template,
and the result would be the same template filled in from the json.

create table tab1(id serial primary key, ts timestamp default now(), 
data text);

insert into tab1 select from_json(row(null,null,null)::tab1, 
'{"data":"the data"}');
insert into tab1 select from_json(row(null,null,null)::tab1, '{"id":-1, 
"ts":null, "data":""}');
insert into tab1 select from_json(t.*,'{"data":"more data"}') from tab1 
t where id = -1;

hannu=# select row_to_json(t.*) from tab1 t;
                          row_to_json
---------------------------------------------------------------
 {"id":1,"ts":"2012-11-29 02:01:48.379172","data":"the data"}
 {"id":-1,"ts":null, "data":""}
 {"id":2,"ts":"2012-11-29 02:02:34.600164","data":"more data"}
(3 rows)

if extracting the defaults from the table def proves too tricky for the 
first iteration, then just set the missing fields to NULL or, even 
better, carry over the values from the template;

------------------------------
Hannu

PS: good work so far :)

Hannu



Re: json accessors

From
Hannu Krosing
Date:
On 11/29/2012 02:07 AM, Hannu Krosing wrote:
> On 11/29/2012 01:10 AM, Merlin Moncure wrote:
>> On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net> 
>> wrote:
> ...
>>
>>>> *) have you considered something like
>>>> anyelement from_json(anyelement, json)
>>>> or
>>>> select <json>::some_type;  (this may or many not be possible given our
>>>> casting mechanics; i don't know).
>>> I have no idea what the semantics of this would be.
>> Yeah, there's a lot of nuance there.
> One way to tackle it would be to give the argument element as a template,
> and the result would be the same template filled in from the json.
>
> create table tab1(id serial primary key, ts timestamp default now(), 
> data text);
>
> insert into tab1 select from_json(row(null,null,null)::tab1, 
> '{"data":"the data"}');
> insert into tab1 select from_json(row(null,null,null)::tab1, 
> '{"id":-1, "ts":null, "data":""}');
> insert into tab1 select from_json(t.*,'{"data":"more data"}') from 
> tab1 t where id = -1;
>
> hannu=# select row_to_json(t.*) from tab1 t;
>                           row_to_json
> ---------------------------------------------------------------
>  {"id":1,"ts":"2012-11-29 02:01:48.379172","data":"the data"}
>  {"id":-1,"ts":null, "data":""}
>  {"id":2,"ts":"2012-11-29 02:02:34.600164","data":"more data"}
> (3 rows)
>
> if extracting the defaults from table def proves too tricky for first 
> iteration, then
> just set the missing fields to NULL or even better, carry over the 
> values from template;
You could even do a template-less row_from_json which returns a record 
with all fields converted to the JSON-encodable types, and hope that the 
next conversions will be done by PostgreSQL as needed.

insert into tab1 select row_from_json('{"id":100, "ts":"2012-12-21", 
"data":"End of Everything"}');

insert into tab1
select * from row_from_json(
'[{"id":101, "ts":"2012-12-22", "data":"1st day after End of Everything"},
  {"id":102, "ts":"2012-12-22", "data":"2nd day after End of Everything"}
]');

Hannu
>
> ------------------------------
> Hannu
>
> PS: good work so far :)
>
> Hannu
>
>




Re: json accessors

From
Andrew Dunstan
Date:
On 11/28/2012 08:16 PM, Hannu Krosing wrote:
> On 11/29/2012 02:07 AM, Hannu Krosing wrote:
>> On 11/29/2012 01:10 AM, Merlin Moncure wrote:
>>> On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan 
>>> <andrew@dunslane.net> wrote:
>> ...
>>>
>>>>> *) have you considered something like
>>>>> anyelement from_json(anyelement, json)
>>>>> or
>>>>> select <json>::some_type;  (this may or many not be possible given 
>>>>> our
>>>>> casting mechanics; i don't know).
>>>> I have no idea what the semantics of this would be.
>>> Yeah, there's a lot of nuance there.
>> One way to tackle it would be to give the argument element as a template,
>> and the result would be the same template filled in from the json.
>>
>> create table tab1(id serial primary key, ts timestamp default now(), 
>> data text);
>>
>> insert into tab1 select from_json(row(null,null,null)::tab1, 
>> '{"data":"the data"}');
>> insert into tab1 select from_json(row(null,null,null)::tab1, 
>> '{"id":-1, "ts":null, "data":""}');
>> insert into tab1 select from_json(t.*,'{"data":"more data"}') from 
>> tab1 t where id = -1;
>>
>> hannu=# select row_to_json(t.*) from tab1 t;
>>                           row_to_json
>> ---------------------------------------------------------------
>>  {"id":1,"ts":"2012-11-29 02:01:48.379172","data":"the data"}
>>  {"id":-1,"ts":null, "data":""}
>>  {"id":2,"ts":"2012-11-29 02:02:34.600164","data":"more data"}
>> (3 rows)
>>
>> if extracting the defaults from table def proves too tricky for first 
>> iteration, then
>> just set the missing fields to NULL or even better, carry over the 
>> values from template;
> You could even do a template-less row_from_json which returns a 
> record with all fields converted to
> the JSON-encodable types and hope that the next conversions will be 
> done by PostgreSQL as needed.
>
> insert into tab1 select row_from_json('{"id":100, "ts":"2012-12-21", 
> "data":"End of Everything"}');
>
> insert into tab1
> select * from row_from_json(
> '[{"id":101, "ts":"2012-12-22", "data":"1st day after End of Everything"},
>   {"id":102, "ts":"2012-12-22", "data":"2nd day after End of Everything"}
> ]');
>


The real problem here is that for any irregularly shaped json it's 
likely to be a bust, and could only possibly work sanely for nested json 
at all if the target type had corresponding array and composite fields. 
hstore's populate_record works fairly well precisely because hstore is a 
flat structure, unlike json.


In any case, I think this sort of suggestion highlights the possible 
benefits of what I suggested upthread, namely to expose an API that will 
allow easy construction of json transformation functions as extensions.


>>
>>
>> PS: good work so far :)
>>
>> Hannu
>>
>>


Thanks.

cheers

andrew




Re: json accessors

From
Merlin Moncure
Date:
On Thu, Nov 29, 2012 at 7:58 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 11/28/2012 08:16 PM, Hannu Krosing wrote:
>> You could even do a template-less row_from_json which returns a record
>> with all fields converted to
>> the JSON-encodable types and hope that the next conversions will be done
>> by PostgreSQL as needed.
>>
>> insert into tab1 select row_from_json('{"id":100, "ts":"2012-12-21",
>> "data":"End of Everything"}');
>>
>> insert into tab1
>> select * from row_from_json(
>> '[{"id":101, "ts":"2012-12-22", "data":"1st day after End of Everything"},
>>   {"id":102, "ts":"2012-12-22", "data":"2nd day after End of Everything"}
>> ]');
>
> The real problem here is that for any irregularly shaped json it's likely to
> be a bust, and could only possibly work sanely for nested json at all if the
> target type had corresponding array and composite fields.

again, that's a fairly typical case -- crafting json documents
specifically for consumption in postgres.  defining backend types
allows you to skip the intermediate iterative marshaling step.

> hstore's
> populate_record works fairly well precisely because hstore is a flat
> structure, unlike json.

agreed. not trying to drag you into the weeds here.  the above is neat
functionality but doesn't cover all the cases so specific accessor
functions in the vein of your proposal are still needed and the hstore
workaround should work pretty well -- sugaring up the syntax for 'all
in wonder' type translations of complicated structures can be done
later if you want to keep things simple in the short term.

so, just hashing out your proposal and making sure it's a reasonably
analogous implementation of xpath.  Sleeping on it, I say mostly, but
not quite. how about some changes for json_get:

1) return setof (key, value) in the style of jquery each().
2) we need some way of indicating in the keytext path that we want to
unnest the collecton pointed to by keytext or to just return it.  for
example,  ->* as indicator?
3) use double quotes, and make them optional (as hstore)
4) speaking of hstore, prefer => vs ->?

if you do at least #1 and #2, json_get I think can cover all the bases
for parsing json, meaning you could reproduce the behaviors for each
of your four proposed functions just as xpath does for xml.   (you may still
want to add them for posterity or performance though). so no need for
json_each or json_array_unnest etc.
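
e.g. combining #1 and #2, something like (pure sketch, nothing here
exists):

select key, value
from json_get('[{"a": 1}, {"a": 2}]'::json, '*');
-- key | value
-- 0   | {"a": 1}
-- 1   | {"a": 2}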

merlin



Re: json accessors

From
Andrew Dunstan
Date:
On 11/29/2012 01:06 PM, Merlin Moncure wrote:
> so, just hashing out your proposal and making sure it's a reasonably
> analogous implementation of xpath.  Sleeping on it, I say mostly, but
> not quite. how about some changes for json_get:
>
> 1) return setof (key, value) in the style of jquery each().
> 2) we need some way of indicating in the keytext path that we want to
> unnest the collection pointed to by keytext or to just return it.  for
> example,  ->* as indicator?
> 3) use double quotes, and make them optional (as hstore)
> 4) speaking of hstore, prefer => vs ->?

So I don't think your modifications are well thought out.
>
> if you do at least #1 and #2, json_get I think can cover all the bases
> for parsing json, meaning you could reproduce the behaviors for each
> of your four proposed functions just as xpath does for xml.   (you may still
> want to add them for posterity or performance though). so no need for
> json_each or json_array_unnest etc.
>


json_get is designed to return a single thing. What is more, returning a 
(key, value) pair seems quite silly when you're passing the key in as an 
argument. It's not designed to be json_path or json_query, and it's not 
designed either to take a path expression as an argument. So I don't 
think this is a good direction. Your proposed mods to json_get modify it 
out of all recognition. If I offer you a horse and ask what colour you'd 
like, asking for a lion instead isn't a good response :-)

(Repeating myself), I also suggest exposing the transform API so that it 
will be easy to construct further functions as extensions. I'm not 
trying to cover the field. The intention here is to provide some very 
basic json accessors as core functions / operators.


cheers

andrew




Re: json accessors

From
Merlin Moncure
Date:
On Thu, Nov 29, 2012 at 1:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 11/29/2012 01:06 PM, Merlin Moncure wrote:
>>
>> so, just hashing out your proposal and making sure it's a reasonably
>> analogous implementation of xpath.  Sleeping on it, I say mostly, but
>> not quite. how about some changes for json_get:
>>
>> 1) return setof (key, value) in the style of jquery each().
>> 2) we need some way of indicating in the keytext path that we want to
>> unnest the collection pointed to by keytext or to just return it.  for
>> example,  ->* as indicator?
>> 3) use double quotes, and make them optional (as hstore)
>> 4) speaking of hstore, prefer => vs ->?
>
> So I don't think your modifications are well thought out.
>>
>>
>> if you do at least #1 and #2, json_get I think can cover all the bases
>> for parsing json, meaning you could reproduce the behaviors for each
>> of your four proposed functions just as xpath does for xml.   (you may still
>> want to add them for posterity or performance though). so no need for
>> json_each or json_array_unnest etc.
>
>
> json_get is designed to return a single thing. What is more, returning a
> (key, value) pair seems quite silly when you're passing the key in as an
> argument. It's not designed to be json_path or json_query, and it's not
> designed either to take a path expression as an argument. So I don't think
> this is a good direction. Your proposed mods to json_get modify it out of
> all recognition. If I offer you a horse and ask what colour you'd like,
> asking for a lion instead isn't a good response :-)
>
> (Repeating myself), I also suggest exposing the transform API so that it
> will be easy to construct further functions as extensions. I'm not trying to
> cover the field. The intention here is to provide some very basic json
> accessors as core functions / operators.

Right.   But you're not offering a horse to the farm...but to the zoo.
json is in core so I don't think you have the luxury of offering a
clunky API now with the expectation of a sleeker, faster one in the
future as the old functions will sit around forever in the public
namespace.  What is present in the API doesn't have to cover all
reasonable use cases but it certainly should be expected to withstand the
test of time for the cases it does cover.

Sketch out how an object array of indeterminate size would be parsed
and placed into records with a set returning/array returning and
non-set returning json_get: which is a better fit?  xpath() doesn't
work iteratively and nobody has ever complained about that to my
recollection.

table:  create table foo (a int, b int);
document: [{"a": 1, "b": 2}, {"a": 3, "b": 4}, ... {"a": 99999, "b": 100000}]

set returning json_get:
INSERT INTO foo
SELECT * FROM populate_record(null::foo, json_to_hstore((json_get(*)).value));

assuming '*' is the 'expand this' operator in your 'keytext'
expression that I was suggesting. How would this work with your
proposed API?  This is a very typical use case.

merlin



Re: json accessors

From
Andrew Dunstan
Date:
On 11/29/2012 04:52 PM, Merlin Moncure wrote:
> On Thu, Nov 29, 2012 at 1:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 11/29/2012 01:06 PM, Merlin Moncure wrote:
>>> so, just hashing out your proposal and making sure it's a reasonably
>>> analogous implementation of xpath.  Sleeping on it, I say mostly, but
>>> not quite. how about some changes for json_get:
>>>
>>> 1) return setof (key, value) in the style of jquery each().
>>> 2) we need some way of indicating in the keytext path that we want to
>>> unnest the collection pointed to by keytext or to just return it.  for
>>> example,  ->* as indicator?
>>> 3) use double quotes, and make them optional (as hstore)
>>> 4) speaking of hstore, prefer => vs ->?
>>
>> So I don't think your modifications are well thought out.
>>>
>>>
>>> if you do at least #1 and #2, json_get I think can cover all the bases
>>> for parsing json, meaning you could reproduce the behaviors for each
>>> of your four proposed functions just as xpath does for xml.   (you may still
>>> want to add them for posterity or performance though). so no need for
>>> json_each or json_array_unnest etc.
>>
>> json_get is designed to return a single thing. What is more, returning a
>> (key, value) pair seems quite silly when you're passing the key in as an
>> argument. It's not designed to be json_path or json_query, and it's not
>> designed either to take a path expression as an argument. So I don't think
>> this is a good direction. Your proposed mods to json_get modify it out of
>> all recognition. If I offer you a horse and ask what colour you'd like,
>> asking for a lion instead isn't a good response :-)
>>
>> (Repeating myself), I also suggest exposing the transform API so that it
>> will be easy to construct further functions as extensions. I'm not trying to
>> cover the field. The intention here is to provide some very basic json
>> accessors as core functions / operators.
> Right.   But you're not offering a horse to the farm...but to the zoo.
>   json is in core so I don't think you have the luxury of offering a
> clunky API now with the expectation of a sleeker, faster one in the
> future as the old functions will sit around forever in the public
> namespace.  What is present in the API doesn't have to cover all
> reasonable use cases but it certainly should be expected to withstand the
> test of time for the cases it does cover.
>
> Sketch out how an object array of indeterminate size would be parsed
> and placed into records with a set returning/array returning and
> non-set returning json_get: which is a better fit?  xpath() doesn't
> work iteratively and nobody has ever complained about that to my
> recollection.
>
> table:  create table foo (a int, b int);
> document: [{"a": 1, "b": 2}, {"a": 3, "b": 4}, ... {"a": 99999, "b": 100000}]
>
> set returning json_get:
> INSERT INTO foo
> SELECT * FROM populate_record(null::foo, json_to_hstore((json_get(*)).value));
>
> assuming '*' is the 'expand this' operator in your 'keytext'
> expression that I was suggesting. How would this work with your
> proposed API?  This is a very typical use case.
>


There are many things wrong with this. First, converting to hstore so 
you can call populate_record is quite horrible and ugly and inefficient. 
And it's dependent on having hstore loaded - you can't have an 
hstore_to_json in core because hstore itself isn't in core. If you want a 
populate_record that takes data from json we should have one coded 
direct. I'm happy to add it to the list as long as everyone understands 
the limitations. Given a function to unnest the json array, which I 
already suggested upthread, you could do what you suggested above much 
more elegantly and directly. Also, BTW, you can't use * that way.

We are not replicating xpath here for json. Sorry, but that's not my 
goal. If you want to code that up, be my guest.

cheers

andrew



Re: json accessors

From
Merlin Moncure
Date:
On Thu, Nov 29, 2012 at 4:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> There are many things wrong with this. First, converting to hstore so you
> can call populate_record is quite horrible and ugly and inefficient. And
> it's dependent on having hstore loaded - you can't have an hstore_to_json in
> core because hstore itself isn't in core. If you want a populate_record that
> takes data from json we should have one coded direct. I'm happy to add it to
> the list as long as everyone understands the limitations. Given a function
> to unnest the json array, which I already suggested upthread, you could do
> what you suggested above much more elegantly and directly.

I wasn't suggesting you added the hstore stuff and I understand
perfectly well the awkwardness of the hstore route.  That said, this
is how people are going to use your api so it doesn't hurt to go
through the motions; I'm just feeling out how code in the wild would
shape up.

Anyways, my example was busted since you'd need an extra step to move
the set returning output from the json array unnest() into a
'populate_record' type function call.

So, AIUI I think you're proposing (i'm assuming optional quotes)
following my example above:

INSERT INTO foo(a,b)
SELECT json_get_as_text(v, 'a')::int, json_get_as_text(v, 'b')::int
FROM json_each(<document>) v;  /* gives you array of json (a,b) records  */

a hypothetical 'json_to_record (cribbing usage from populate_record)'
variant might look like (please note, I'm not saying 'write this now',
just feeling it out):

INSERT INTO foo(a,b)
SELECT  r.*
FROM json_each(<document>) v,
LATERAL json_to_record(null::foo, v) r;

you're right: that's pretty clean.

A json_object_each(json) => (key, value) couldn't hurt either -- this
would handle those oddball cases of really wide objects that you
occasionally see in json.  Plus as_text variants of both each and
object_each.  If you're buying json_object_each, I think you can scrap
json_object_keys().

merlin



Re: json accessors

From
Andrew Dunstan
Date:
On 11/29/2012 06:34 PM, Merlin Moncure wrote:
> On Thu, Nov 29, 2012 at 4:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> There are many things wrong with this. First, converting to hstore so you
>> can call populate_record is quite horrible and ugly and inefficient. And
>> it's dependent on having hstore loaded - you can't have an hstore_to_json in
>> core because hstore itself isn't in core. If you want a populate_record that
>> takes data from json we should have one coded direct. I'm happy to add it to
>> the list as long as everyone understands the limitations. Given a function
>> to unnest the json array, which I already suggested upthread, you could do
>> what you suggested above much more elegantly and directly.
> I wasn't suggesting you added the hstore stuff and I understand
> perfectly well the awkwardness of the hstore route.  That said, this
> is how people are going to use your api so it doesn't hurt to go
> through the motions; I'm just feeling out how code in the wild would
> shape up.
>
> Anyways, my example was busted since you'd need an extra step to move
> the set returning output from the json array unnest() into a
> 'populate_record' type function call.
>
> So, AIUI I think you're proposing (i'm assuming optional quotes)
> following my example above:
>
> INSERT INTO foo(a,b)
> SELECT
>    json_get_as_text(v, 'a')::int,
>    json_get_as_text(v, 'b')::int
> FROM
>    json_each(<document>) v;  /* gives you array of json (a,b) records  */
>
> a hypothetical 'json_to_record (cribbing usage from populate_record)'
> variant might look like (please note, I'm not saying 'write this now',
> just feeling it out):
>
> INSERT INTO foo(a,b)
> SELECT  r.*
> FROM
>    json_each(<document>) v,
> LATERAL
>    json_to_record(null::foo, v) r;
>
> you're right: that's pretty clean.
>
> A json_object_each(json) => (key, value) couldn't hurt either -- this
> would handle those oddball cases of really wide objects that you
> occasionally see in json.  Plus as_text variants of both each and
> object_each.  If you're buying json_object_each, I think you can scrap
> json_object_keys().
>


OK, so based on this discussion, I'm thinking of the following:
 * keep the original functions and operators. json_keys is still
   required for the case where the json is not flat.
 * json_each(json) => setof (text, text)
   errors if the json is not a flat object
 * json_unnest(json) => setof json
   errors if the json is not an array
 * json_unnest_each => setof (int, text, text)
   errors if the array is not an array of flat objects
 * populate_record(record, json) => record
   errors if the json isn't a flat object
 * populate_recordset(record, json) => setof record
   errors if the json is not an array of flat objects

Note that I've added a couple of things to deal with json that 
represents a recordset (i.e. an array of objects). This is a very common 
pattern and one well worth optimizing for.

I think that would let you do a lot of what you want pretty cleanly.
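
For instance, the recordset case above would reduce to something like
this (a sketch; signatures as proposed, nothing implemented yet):

   INSERT INTO foo
   SELECT * FROM populate_recordset(null::foo,
     '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'::json);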

cheers

andrew



Re: json accessors

From
Merlin Moncure
Date:
On Fri, Nov 30, 2012 at 8:38 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> OK, so based on this discussion, I'm thinking of the following:

ok, this is looking awesome -- couple naming suggestions (see inline):

>  * keep the original functions and operators. json_keys is still
>    required for the case where the json is not flat.
>  * json_each(json) => setof (text, text)
>    errors if the json is not a flat object
>  * json_unnest(json) => setof json
>    errors if the json is not an array

I wonder if usage of 'unnest' is appropriate: sql unnest()
*completely* unwraps the array to a list of scalars whereas json
unnest() only peels off one level.  If you agree with that (it's
debatable), how about json_array_each()?
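
to illustrate (json_unnest per your sketch, still hypothetical):

select unnest(array[[1,2],[3,4]]);          -- 1, 2, 3, 4: fully flattened
select json_unnest('[[1,2],[3,4]]'::json);  -- '[1,2]', '[3,4]': one level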

>  * json_unnest_each => setof (int, text, text)
>    errors if the array is not an array of flat objects

I like this.  Maybe json_object_each() if you agree with my analysis above.

>  * populate_record(record, json) => record
>    errors if the json isn't a flat object
>  * populate_recordset(record, json) => setof record
>    errors if the json is not an array of flat objects

Two questions:
1) is it possible for these to work without a polymorphic object
passed through as hstore does (null::foo)?
select  populate_record(anyelement, record, json)

2) in keeping with naming style of json api, how about json_to_record,
json_to_recordset?
Maybe though keeping similarity with hstore convention is more important.

merlin



Re: json accessors

From
Andrew Dunstan
Date:
On 11/30/2012 09:51 AM, Merlin Moncure wrote:
>
> Two questions:
> 1) is it possible for these to work without a polymorphic object
> passed through as hstore does (null::foo)?
> select  populate_record(anyelement, record, json)

I don't understand the question. The API I'm suggesting is exactly in 
line with hstore's, which uses a polymorphic parameter. I don't see how 
it can not, and I don't understand why you would have 3 parameters.

cheers

andrew




Re: json accessors

From
Merlin Moncure
Date:
On Fri, Nov 30, 2012 at 9:02 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 11/30/2012 09:51 AM, Merlin Moncure wrote:
>>
>>
>> Two questions:
>> 1) is it possible for these to work without a polymorphic object
>> passed through as hstore does (null::foo)?
>> select  populate_record(anyelement, record, json)
>
>
> I don't understand the question. The API I'm suggesting is exactly in line
> with hstore's, which uses a polymorphic parameter. I don't see how it can
> not, and I don't understand why you would have 3 parameters.

my mistake: I misread the function as you write it.  it's good as is.

merlin



Re: json accessors

From
Hannu Krosing
Date:
On 11/30/2012 03:38 PM, Andrew Dunstan wrote:
>
> On 11/29/2012 06:34 PM, Merlin Moncure wrote:
>> On Thu, Nov 29, 2012 at 4:14 PM, Andrew Dunstan <andrew@dunslane.net> 
>> wrote:
>>> There are many things wrong with this. First, converting to hstore 
>>> so you
>>> can call populate_record is quite horrible and ugly and inefficient. 
>>> And
>>> it's dependent on having hstore loaded - you can't have an 
>>> hstore_to_jon in
>>> core because hstore itself isn't in core. If you want a 
>>> populate_record that
>>> takes data from json we should have one coded direct. I'm happy to 
>>> add it to
>>> the list as long as everyone understands the limitations. Given a 
>>> function
>>> to unnest the json array, which I already suggested upthread, you 
>>> could do
>>> what you suggested above much more elegantly and directly.
>> I wasn't suggesting you added the hstore stuff and I understand
>> perfectly well the awkwardness of the hstore route.  That said, this
>> is how people are going to use your api so it doesn't hurt to go
>> through the motions; I'm just feeling out how code in the wild would
>> shape up.
>>
>> Anyways, my example was busted since you'd need an extra step to move
>> the set returning output from the json array unnest() into a
>> 'populate_record' type function call.
>>
>> So, AIUI I think you're proposing (i'm assuming optional quotes)
>> following my example above:
>>
>> INSERT INTO foo(a,b)
>> SELECT
>>    json_get_as_text(v, 'a')::int,
>>    json_get_as_text(v, 'b')::int
>> FROM
>>    json_each(<document>) v;  /* gives you array of json (a,b) 
>> records  */
>>
>> a hypothetical 'json_to_record (cribbing usage from populate_record)'
>> variant might look like (please note, I'm not saying 'write this now',
>> just feeling it out):
>>
>> INSERT INTO foo(a,b)
>> SELECT  r.*
>> FROM
>>    json_each(<document>) v,
>> LATERAL
>>    json_to_record(null::foo, v) r;
>>
>> you're right: that's pretty clean.
>>
>> A json_object_each(json) => (key, value) couldn't hurt either -- this
>> would handle those oddball cases of really wide objects that you
>> occasionally see in json.  Plus as_text variants of both each and
>> object_each.  If you're buying json_object_each, I think you can scrap
>> json_object_keys().
>>
>
>
> OK, so based on this discussion, I'm thinking of the following:
>
>  * keep the original functions and operators. json_keys is still
>    required for the case where the json is not flat.
>  * json_each(json) => setof (text, text)
>    errors if the json is not a flat object
Why not json_each(json) => setof (text, json) ? with no erroring out ?

if the json does represent text it is easy to convert to text on the 
query side.

>  * json_unnest(json) => setof json
>    errors if the json is not an array
>  * json_unnest_each => setof (int, text, text)
>    errors if the array is not an array of flat objects
json_unnest_each => setof (int, text, json)
>  * populate_record(record, json) => record
>    errors if the json isn't a flat object
errors if the values are not castable to the record's field types

nb! some nonflatness is castable. especially to json or hstore or record 
types

>  * populate_recordset(record, json) => setof record
>    errors if the json is not an array of flat objects
ditto
> Note that I've added a couple of things to deal with json that 
> represents a recordset (i.e. an array of objects). This is a very 
> common pattern and one well worth optimizing for.
>
> I think that would let you do a lot of what you want pretty cleanly.
>
> cheers
>
> andrew




Re: json accessors

From
Andrew Dunstan
Date:
On 11/30/2012 10:04 AM, Hannu Krosing wrote:
>>
>>
>> OK, so based on this discussion, I'm thinking of the following:
>>
>>  * keep the original functions and operators. json_keys is still
>>    required for the case where the json is not flat.
>>  * json_each(json) => setof (text, text)
>>    errors if the json is not a flat object
>
> Why not json_each(json) => setof (text, json) ? with no erroring out ?
>
> if the json does represent text it is easy to convert to text on the 
> query side.


Well, it would be possible, sure. I'm not sure how useful. Or we could 
do both fairly easily. It's not as simple or efficient as you might 
think to dequote / de-escape json string values, which is why the 
original API had variants for returning both types of values. Maybe we 
need a function for doing just that.
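
Something like this, say (the name is purely an invention here):

   select json_dequote('"line1\nline2"'::json);
   -- line1
   -- line2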

>
>>  * json_unnest(json) => setof json
>>    errors if the json is not an array
>>  * json_unnest_each => setof (int, text, text)
>>    errors if the array is not an array of flat objects
> json_unnest_each => setof (int, text, json)

ditto.

>>  * populate_record(record, json) => record
>>    errors if the json isn't a flat object
> errors if the values are not castable to records field types
>
> nb! some nonflatness is castable. especially to json or hstore or 
> record types


If the record has a json field, certainly. If it has a record field, 
fairly likely.  hstore could probably be a problem given it's not a core 
type. Similarly to the generation functions discussed in another thread, 
I could possibly look up a cast from json to the non-core type and use 
it. That might work for hstore.

I'll try to keep this as permissive as possible.

>
>>  * populate_recordset(record, json) => setof record
>>    errors if the json is not an array of flat objects
> ditto

ditto ;-)


cheers

andrew




Re: json accessors

From
Hannu Krosing
Date:
On 11/30/2012 04:29 PM, Andrew Dunstan wrote:
>
> On 11/30/2012 10:04 AM, Hannu Krosing wrote:
>>>
>>>
>>> OK, so based on this discussion, I'm thinking of the following:
>>>
>>>  * keep the original functions and operators. json_keys is still
>>>    required for the case where the json is not flat.
>>>  * json_each(json) => setof (text, text)
>>>    errors if the json is not a flat object
>>
>> Why not json_each(json) => setof (text, json) ? with no erroring out ?
>>
>> if the json does represent text it is easy to convert to text on the 
>> query side.
>
>
> Well, it would be possible, sure. I'm not sure how useful. Or we could 
> do both fairly easily. It's not as simple or efficient as you might 
> think to dequote / de-escape json string values, which is why the 
> original API had variants for returning both types of values. Maybe we 
> need a function for doing just that.
>
Btw, how does current json type handle code pages - is json always utf-8 
even when server encoding is not ?

if so then we could at least have a shortcut conversion of json to 
utf8-text which can skip codepage changes.

----------------------
Hannu




Re: json accessors

From
Andrew Dunstan
Date:
On 11/30/2012 10:59 AM, Hannu Krosing wrote:
>
> Btw, how does current json type handle code pages - is json always 
> utf-8 even when server encoding is not ?
>
> if so then we could at least have a shortcut conversion of json to 
> utf8-text which can skip codepage changes.
>
>

IIRC json is stored and processed in the server encoding. Normally it 
would make sense to have that be utf8. It is delivered to the client in 
the client encoding.

cheers

andrew




Re: json accessors

From
Josh Berkus
Date:
Andrew,

What about doing:

json_get(json, json)
returns json

where parameter #2 is a path expressed as JSON?  For example,

json_get(personal_profile, '[ {contact_info {phone numbers {cell phones}
} } ]')
... would return whatever was in that hierarchical object, in this case
an array of cell phone numbers.

Or am I just reinventing jsonpath?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: json accessors

From
Andrew Dunstan
Date:
On 12/03/2012 08:14 PM, Josh Berkus wrote:
> Andrew,
>
> What about doing:
>
> json_get(json, json)
> returns json
>
> where parameter #2 is a path expressed as JSON?  For example,
>
> json_get(personal_profile, '[ {contact_info {phone numbers {cell phones}
> } } ]')
> ... would return whatever was in that hierarchical object, in this case
> an array of cell phone numbers.
>
> Or am I just reinventing jsonpath?
>


Yes, you are, rather. It might be possible to do something like:
    json_get(json, variadic text) => json

as long as it doesn't involve any testing beyond field name  / array 
index equivalence.
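
i.e. something like (sketch only):

    json_get('{"contact_info": {"cell_phones": ["555-0100"]}}'::json,
             'contact_info', 'cell_phones', '0')

which would yield '"555-0100"'.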

cheers

andrew




Re: json accessors

From
Josh Berkus
Date:
> Yes, you are, rather. It might be possible to do something like:
> 
>     json_get(json, variadic text) => json

Given that I already do the equivalent in Python, this would suit me
well.  Not sure about other users ...

> as long as it doesn't involve any testing beyond field name  / array
> index equivalence.

I'm sure people will *ask* for more in the future, but you could do a
LOT with just an equivalence version.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: json accessors

From
"David E. Wheeler"
Date:
On Dec 4, 2012, at 10:05 AM, Josh Berkus <josh@agliodbs.com> wrote:

>>    json_get(json, variadic text) => json
>
> Given that I already do the equivalent in Python, this would suit me
> well.  Not sure about other users ...

Well, given that sometimes you will have mixed arrays and objects, how would you distinguish "42" as an object key or
an array index?

Best,

David




Re: json accessors

From
"David E. Wheeler"
Date:
On Nov 28, 2012, at 4:10 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

>> Yes, it's iterative. And for deeply nested json it might be somewhat
>> inefficient, although the parser is pretty fast AFAICT. But it's a start.
>
> not completely buying that: see comments below.  not supporting xpath
> style decompositions seems wrong to me.  IOW, json_get should be set
> returning (perhaps via wild cards in the keytext) or we need
> json_each.

The problem I see with the current proposal is that this limitation, it seems to me, would prevent the ability to index
nested keys. If you're essentially composing and decomposing JSON values as you drill down, the intermediate JSON values
between the original one and the final return value can't be indexed, can they?

For sufficiently large columns, I expect I would want a GIN index to speed JSON value extraction queries. Possible with
this proposal?

Best,

David

PS: Sorry for the delayed replies, digging my way out of a couple weeks of back posts…




Re: json accessors

From
Andrew Dunstan
Date:
On 12/05/2012 12:11 PM, David E. Wheeler wrote:
> On Dec 4, 2012, at 10:05 AM, Josh Berkus <josh@agliodbs.com> wrote:
>
>>>     json_get(json, variadic text) => json
>> Given that I already do the equivalent in Python, this would suit me
>> well.  Not sure about other users ...
> Well, given that sometimes you will have mixed arrays and objects, how would you distinguish "42" as an object key or
> an array index?
 


if the thing is an array, test to see if the string is a valid integer 
string, and if so use the integer value.
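
For example (variadic json_get as sketched upthread, nothing implemented):

   json_get('["x", "y"]'::json, '1')   -- array: '1' used as index, yields '"y"'
   json_get('{"1": "z"}'::json, '1')   -- object: '1' used as key, yields '"z"'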


cheers

andrew



Re: json accessors

From
Andrew Dunstan
Date:
On 12/05/2012 12:14 PM, David E. Wheeler wrote:
> On Nov 28, 2012, at 4:10 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>>> Yes, it's iterative. And for deeply nested json it might be somewhat
>>> inefficient, although the parser is pretty fast AFAICT. But it's a start.
>> not completely buying that: see comments below.  not supporting xpath
>> style decompositions seems wrong to me.  IOW, json_get should be set
>> returning (perhaps via wild cards in the keytext) or we need
>> json_each.
> The problem I see with the current proposal is that this limitation, it seems to me, would prevent the ability to
> index nested keys. If you're essentially composing and decomposing JSON values as you drill down, the intermediate
> JSON values between the original one and the final return value can't be indexed, can they?
 
>
> For sufficiently large columns, I expect I would want a GIN index to speed JSON value extraction queries. Possible
> with this proposal?
 

Probably not.


cheers

andrew




Re: json accessors

From
"David E. Wheeler"
Date:
On Dec 5, 2012, at 9:21 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

>> For sufficiently large columns, I expect I would want a GIN index to speed JSON value extraction queries. Possible
>> with this proposal?
>
> Probably not.

That greatly reduces its utility for querying, though not, of course, for using it in procedural code.

Wouldn't using a jsonpath-style implementation allow for indexing?

Best,

David


Re: json accessors

From
Merlin Moncure
Date:
On Wed, Dec 5, 2012 at 11:14 AM, David E. Wheeler <david@justatheory.com> wrote:
> On Nov 28, 2012, at 4:10 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>>> Yes, it's iterative. And for deeply nested json it might be somewhat
>>> inefficient, although the parser is pretty fast AFAICT. But it's a start.
>>
>> not completely buying that: see comments below.  not supporting xpath
>> style decompositions seems wrong to me.  IOW, json_get should be set
>> returning (perhaps via wild cards in the keytext) or we need
>> json_each.
>
> The problem I see with the current proposal is that this limitation, it seems to me, would prevent the ability to
> index nested keys. If you're essentially composing and decomposing JSON values as you drill down, the intermediate
> JSON values between the original one and the final return value can't be indexed, can they?
>
> For sufficiently large columns, I expect I would want a GIN index to speed JSON value extraction queries. Possible
> with this proposal?

I think best practices for JSON manipulation (at least in performance
sensitive cases with large documents) are going to be to fully
decompose into sql structures and manipulate after the fact.  JSON's
primary role is to serve as data exchange and Andrew's API (with the
tweaks he came up with) seems to facilitate that pretty well; full
decomposition is a snap.

Indexing large documents for fancy querying is a niche case but also
quite complex.  This isn't very well covered by xmlpath either btw --
I think for inspiration we should be looking at hstore.

That said, how would you do that?  The first thing that jumps into my
mind is to cut right to the chase:  Maybe the semantics could be
defined so that implementing haystack @> needle would reasonably cover
most cases.
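
Something like this, say (the operator is purely hypothetical, semantics
borrowed from hstore/array containment):

select '{"a": 1, "b": {"c": 2}}'::json @> '{"b": {"c": 2}}'::json;  -- true
select '{"a": 1}'::json @> '{"a": 2}'::json;                        -- false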

So my takeaways are:
*) decomposition != precise searching.  andrew's api handles the
former and stands on its own merits.

*) xmlpath/jsonpath do searching (and decomposition) but are very
clunky from sql perspective and probably absolutely nogo in terms of
GIST/GIN.  postgres spiritually wants to do things via operators and
we should (if possible) at least consider that first

merlin



Re: json accessors

From
Andrew Dunstan
Date:
On 12/05/2012 12:29 PM, David E. Wheeler wrote:
> On Dec 5, 2012, at 9:21 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>>> For sufficiently large columns, I expect I would want a GIN index to speed JSON value extraction queries. Possible
>>> with this proposal?
 
>> Probably not.
> That greatly reduces its utility for querying, though not, of course, for using it in procedural code.
>
> Wouldn't using a jsonpath-style implementation allow for indexing?


Indexing tree-like data isn't at all easy. We don't index XML either. 
There has been discussion of this sort of indexing it in the past, and a 
couple of people have said they would work on it, but I have not seen a 
proposal or a single line of code.

Jsonpath on its own would not do what you're suggesting. A first 
approach to indexing treeish data requires that you generate all the 
possible paths and index that. That would be quite explosive in volume. 
And anyway, jsonpath is not on offer here.

I'm sorry what I have offered isn't what you want, but plenty of other 
people have told me it will go a long way meeting their needs.

cheers

andrew





Re: json accessors

From
"David E. Wheeler"
Date:
On Dec 5, 2012, at 9:57 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

> Indexing large documents for fancy querying is a niche case but also
> quite complex.  This isn't very well covered by xmlpath either btw --
> I think for inspiration we should be looking at hstore.

Agreed, although hstore, IIRC, does not support nesting.

> That said, how would you do that?  The first thing that jumps into my
> mind is to cut right to the chase:  Maybe the semantics could be
> defined so that implement hackstack @> needle would reasonable cover
> most cases.

Yes.

> So my takeaways are:
> *) decomposition != precise searching.  andrew's api handles the
> former and stands on its own merits.

Agreed.

> *) xmlpath/jsonpath do searching (and decomposition) but are very
> clunky from sql perspective and probably absolutely nogo in terms of
> GIST/GIN.  postgres spiritually wants to do things via operators and
> we should (if possible) at least consider that first

I don't understand how xmlpath/jsonpath is not able to be implemented with operators.

Best,

David




Re: json accessors

From
"David E. Wheeler"
Date:
On Dec 5, 2012, at 10:04 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

> Indexing tree-like data isn't at all easy. We don't index XML either. There has been discussion of this sort of
> indexing it in the past, and a couple of people have said they would work on it, but I have not seen a proposal or a
> single line of code.

Yeah, I forgot that xmlpath was not indexable.

> Jsonpath on its own would not do what you're suggesting. A first approach to indexing treeish data requires that you
> generate all the possible paths and index that. That would be quite explosive in volume. And anyway, jsonpath is not on
> offer here.

Yeah, explosive for sure, but for sufficiently small JSON values, that shouldn’t be much of an issue. I expect GINs to
be expensive anyway (see full-text indexing).

I am not invested in jsonpath; I just cited it as an example of using a single function call to do a nested search.
Obviously `json_get(json, variadic text)` allows this, too, and could potentially use a GIN index of a JSON tree to
perform the variadic text search at some point in the future, yes?
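
For example (a sketch only, assuming the variadic form keeps that signature;
the table and column names are made up):

    -- drill down through object keys and array indexes in one call
    SELECT json_get(doc, 'a', '0', 'd') FROM docs;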

> I'm sorry that what I have offered isn't what you want, but plenty of other people have told me it will go a long
> way toward meeting their needs.

*Sigh.* I guess I have not been clear.

The stuff you propose is *awesome*. I love it. The syntax with the chaining operators warms my heart, and I can’t wait
to make *extensive* use of it in my procedural code. Maybe I would never *need* to do column queries of JSON contents
often enough to require an expensive index.

So I'm happy with this stuff, as long as it does not get in the way of supporting indexing at some point in the future.
I can’t wait to start using it!

Best,

David




Re: json accessors

From
Josh Berkus
Date:
>> *) xmlpath/jsonpath do searching (and decomposition) but are very
>> clunky from an sql perspective and probably a no-go in terms of
>> GiST/GIN.  postgres spiritually wants to do things via operators and
>> we should (if possible) at least consider that first

Why is it a no-go for GiST?  Ltree works, doesn't it?  If we only support
equality lookups, in what way is a JSON doc different from a collection
of ltree rows?

We'd probably want to use SP-GiST for better index size/performance, but
I don't see that this is impossible.  Just some difficult code.
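
As a strawman (ltree itself is real; the flattening scheme and the names
here are just my sketch):

    CREATE EXTENSION ltree;
    CREATE TABLE doc_paths (doc_id int, path ltree);
    CREATE INDEX doc_paths_idx ON doc_paths USING gist (path);
    -- lookups the GiST index can serve:
    SELECT doc_id FROM doc_paths WHERE path = 'a.b.c';   -- equality
    SELECT doc_id FROM doc_paths WHERE path <@ 'a.b';    -- descendants of a.b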

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: json accessors

From
Merlin Moncure
Date:
On Wed, Dec 5, 2012 at 12:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>>> *) xmlpath/jsonpath do searching (and decomposition) but are very
>>> clunky from an sql perspective and probably a no-go in terms of
>>> GiST/GIN.  postgres spiritually wants to do things via operators and
>>> we should (if possible) at least consider that first
>
> Why is it a no-go for GiST?  Ltree works, doesn't it?  If we only support
> equality lookups, in what way is a JSON doc different from a collection
> of ltree rows?
>
> We'd probably want to use SP-GiST for better index size/performance, but
> I don't see that this is impossible.  Just some difficult code.

huh -- good point.   xpath at least is quite complicated and likely
impractical (albeit not impossible) to marry with GiST in a meaningful
way.   jsonpath (at least AIUI from here:
http://code.google.com/p/json-path/) seems to be lighter weight, as are
all things json when stacked up against xml.

merlin



Re: json accessors

From
Andres Freund
Date:
On 2012-12-05 10:49:35 -0800, Josh Berkus wrote:
>
> >> *) xmlpath/jsonpath do searching (and decomposition) but are very
> >> clunky from an sql perspective and probably a no-go in terms of
> >> GiST/GIN.  postgres spiritually wants to do things via operators and
> >> we should (if possible) at least consider that first
>
> Why is it a no-go for GiST?  Ltree works, doesn't it?  If we only support
> equality lookups, in what way is a JSON doc different from a collection
> of ltree rows?

The space requirements for the paths are quite different. It's not that
hard to build indexing support; it's hard to build efficient support.

The more you hide from postgres (i.e. behind a single very complex
operator/function) the harder it is for the planner to detect whether
your expression is indexable or not.
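
For instance (a hand-written illustration; neither the operator nor the
function shown here exists for json today):

    -- an operator expression the planner could match to an index opclass:
    SELECT * FROM docs WHERE doc @> '{"a": 1}'::json;
    -- an opaque function call it cannot, absent dedicated planner support:
    SELECT * FROM docs WHERE jsonpath_match(doc, '$.a == 1');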

> We'd probably want to use SP-GiST for better index size/performance, but
> I don't see that this is impossible.  Just some difficult code.

I don't immediately see why SP-GiST would be beneficial. What kind of
access structure do you have in mind?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: json accessors

From
Andrew Dunstan
Date:
On 12/05/2012 01:49 PM, Josh Berkus wrote:
>>> *) xmlpath/jsonpath do searching (and decomposition) but are very
>>> clunky from an sql perspective and probably a no-go in terms of
>>> GiST/GIN.  postgres spiritually wants to do things via operators and
>>> we should (if possible) at least consider that first
> Why is it a no-go for GiST?  Ltree works, doesn't it?  If we only support
> equality lookups, in what way is a JSON doc different from a collection
> of ltree rows?
>
> We'd probably want to use SP-GiST for better index size/performance, but
> I don't see that this is impossible.  Just some difficult code.
>

The set of paths for a single json datum can be huge, as opposed to one 
for a single ltree datum. That strikes me as a serious barrier. In any 
case, nobody I know of is even offering to do this - when they do we can 
look at the design. Until then I'm assuming nothing.

cheers

andrew



Re: json accessors

From
Andrew Dunstan
Date:
On 12/05/2012 01:48 PM, David E. Wheeler wrote:
>> I'm sorry that what I have offered isn't what you want, but plenty of other people have told me it will go a long
>> way toward meeting their needs.
> *Sigh.* I guess I have not been clear.
>
> The stuff you propose is *awesome*. I love it. The syntax with the chaining operators warms my heart, and I can’t
> wait to make *extensive* use of it in my procedural code. Maybe I would never *need* to do column queries of JSON
> contents often enough to require an expensive index.


OK, sorry if I misunderstood. I guess I'm trying pretty hard to
concentrate on what can be accomplished now, and other people are
talking about blue sky possibilities.


>
> So I'm happy with this stuff, as long as it does not get in the way of supporting indexing at some point in the
> future. I can’t wait to start using it!

I don't see why it should get in the way of anything like that. If
anything, the parser design changes I have proposed should make later
development much easier.


cheers

andrew



Re: json accessors

From
Merlin Moncure
Date:
On Wed, Dec 5, 2012 at 12:42 PM, David E. Wheeler <david@justatheory.com> wrote:
> On Dec 5, 2012, at 9:57 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> Indexing large documents for fancy querying is a niche case but also
>> quite complex.  This isn't very well covered by xmlpath either btw --
>> I think for inspiration we should be looking at hstore.
>
> Agreed, although hstore, IIRC, does not support nesting.
>
>> That said, how would you do that?  The first thing that jumps into my
>> mind is to cut right to the chase:  Maybe the semantics could be
>> defined so that implementing haystack @> needle would reasonably cover
>> most cases.
>
> Yes.
>
>> So my takeaways are:
>> *) decomposition != precise searching.  andrew's api handles the
>> former and stands on its own merits.
>
> Agreed.
>
>> *) xmlpath/jsonpath do searching (and decomposition) but are very
>> clunky from an sql perspective and probably a no-go in terms of
>> GiST/GIN.  postgres spiritually wants to do things via operators and
>> we should (if possible) at least consider that first
>
> I don't understand why xmlpath/jsonpath couldn't be implemented with operators.

yeah -- i phrased that badly -- by 'operators' I meant that both sides
would be json documents, with an absolute minimum of fanciness such as
wildcards and predicate matches.  basically, 'overlaps' and
(especially) 'contains'.
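
Something like this (purely illustrative; neither operator exists for json):

    SELECT '{"a": 1, "b": 2}'::json @> '{"a": 1}'::json;          -- contains
    SELECT '{"a": 1, "b": 2}'::json && '{"b": 2, "c": 3}'::json;  -- overlaps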

merlin



Re: json accessors

From
"David E. Wheeler"
Date:
On Dec 5, 2012, at 11:51 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

>> So I'm happy with this stuff, as long as it does not get in the way of supporting indexing at some point in the
>> future. I can’t wait to start using it!
>
> I don't see why it should get in the way of anything like that. If anything, the parser design changes I have
> proposed should make later development much easier.

Awesome, thanks!

David