Thread: jsonb existence queries are misimplemented by jsonb_ops
I wrote: > The readability of that comment starts to go downhill with its use of > "reset" to refer to what everything else calls a "recheck" flag, and in > any case it's claiming that we *don't* need a recheck for exists (a > statement I suspect to be false, but more later). And, indeed, it's false: regression=# create table j (f1 jsonb); CREATE TABLE regression=# insert into j values ('{"foo": {"bar": "baz"}}'); INSERT 0 1 regression=# insert into j values ('{"foo": {"blah": "baz"}}'); INSERT 0 1 regression=# insert into j values ('{"fool": {"bar": "baz"}}'); INSERT 0 1 regression=# create index on j using gin(f1); CREATE INDEX regression=# select * from j where f1 ? 'bar';f1 ---- (0 rows) regression=# set enable_seqscan to 0; SET regression=# select * from j where f1 ? 'bar'; f1 --------------------------{"foo": {"bar": "baz"}}{"fool": {"bar": "baz"}} (2 rows) The indexscan is incorrectly returning rows where the queried key exists but isn't at top-level. We could fix this either by giving up on no-recheck for existence queries, or by changing the way that non-top-level keys get indexed. However I suspect the latter would break containment queries, or at least make their lives a lot more difficult. Another idea would be to change the definition of the exists operator so that it *does* look into sub-objects. It seems rather random to me that containment looks into sub-objects but exists doesn't. However, possibly there are good reasons for the non-orthogonality. Thoughts? regards, tom lane
I wrote: > Another idea would be to change the definition of the exists operator > so that it *does* look into sub-objects. It seems rather random to me > that containment looks into sub-objects but exists doesn't. However, > possibly there are good reasons for the non-orthogonality. No, wait, containment *doesn't* look into sub-objects: regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; f1 -------------------------{"foo": {"bar": "baz"}} (1 row) regression=# select * from j where f1 @> '{"bar": "baz"}';f1 ---- (0 rows) This is rather surprising in view of the way that section 8.14.4 goes on about nesting. But I guess the user-facing docs for jsonb are in little better shape than the internal docs. regards, tom lane
On Wed, May 7, 2014 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > No, wait, containment *doesn't* look into sub-objects: > > regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; > f1 > ------------------------- > {"foo": {"bar": "baz"}} > (1 row) > > regression=# select * from j where f1 @> '{"bar": "baz"}'; > f1 > ---- > (0 rows) Yes it does. It's just not intended to work like that. You have to "give a path" to what you're looking for. -- Peter Geoghegan
On Wed, May 7, 2014 at 1:51 PM, Peter Geoghegan <pg@heroku.com> wrote: > Yes it does. It's just not intended to work like that. You have to > "give a path" to what you're looking for. There is exactly one exception explicitly noted as such in the user-visible docs, which is arrays and the containment of single elements. -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes: > On Wed, May 7, 2014 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> No, wait, containment *doesn't* look into sub-objects: >> >> regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; >> f1 >> ------------------------- >> {"foo": {"bar": "baz"}} >> (1 row) >> >> regression=# select * from j where f1 @> '{"bar": "baz"}'; >> f1 >> ---- >> (0 rows) > Yes it does. It's just not intended to work like that. You have to > "give a path" to what you're looking for. Right, so the top-level item in the query has to be at top level in the searched-for row. This means that we *could* distinguish top-level from non-top-level keys in the index items, and still be able to generate correct index probes for a containment search (since we could similarly mark the generated keys as to whether they came from top level in the query object). So the question is whether that's a good idea or not. It seems like it is a good idea for the current existence operator, and about a wash for the current containment operator, but perhaps a bad idea for other definitions of containment. In any case, something here needs to change. Thoughts? regards, tom lane
On Wed, May 7, 2014 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The indexscan is incorrectly returning rows where the queried key exists > but isn't at top-level. > > We could fix this either by giving up on no-recheck for existence queries, > or by changing the way that non-top-level keys get indexed. However > I suspect the latter would break containment queries, or at least make > their lives a lot more difficult. Thanks for looking into this. I don't know that it would make life all that much harder for containment. jsonb_ops is already not compelling for testing containment where there is a lot of nesting. ISTM that make_scalar_key() could be taught to respect that distinction. ISTM that containment will indifferently treat the top-level keys as top level, and the nested keys as nested, because if you're looking at something nested you have to put a top-level key in the rhs value to do so, and it will be marked as top level, or not top level in a bijective fashion. -- Peter Geoghegan