Thread: jsonb existence queries are misimplemented by jsonb_ops

jsonb existence queries are misimplemented by jsonb_ops

From
Tom Lane
Date:
I wrote:
> The readability of that comment starts to go downhill with its use of
> "reset" to refer to what everything else calls a "recheck" flag, and in
> any case it's claiming that we *don't* need a recheck for exists (a
> statement I suspect to be false, but more later).

And, indeed, it's false:

regression=# create table j (f1 jsonb);
CREATE TABLE
regression=# insert into j values ('{"foo": {"bar": "baz"}}');
INSERT 0 1
regression=# insert into j values ('{"foo": {"blah": "baz"}}');
INSERT 0 1
regression=# insert into j values ('{"fool": {"bar": "baz"}}');
INSERT 0 1
regression=# create index on j using gin(f1);
CREATE INDEX
regression=# select * from j where f1 ? 'bar';f1 
----
(0 rows)

regression=# set enable_seqscan to 0;
SET
regression=# select * from j where f1 ? 'bar';           f1            
--------------------------{"foo": {"bar": "baz"}}{"fool": {"bar": "baz"}}
(2 rows)

The indexscan is incorrectly returning rows where the queried key exists
but isn't at top-level.

We could fix this either by giving up on no-recheck for existence queries,
or by changing the way that non-top-level keys get indexed.  However
I suspect the latter would break containment queries, or at least make
their lives a lot more difficult.

Another idea would be to change the definition of the exists operator
so that it *does* look into sub-objects.  It seems rather random to me
that containment looks into sub-objects but exists doesn't.  However,
possibly there are good reasons for the non-orthogonality.

Thoughts?
        regards, tom lane



Re: jsonb existence queries are misimplemented by jsonb_ops

From
Tom Lane
Date:
I wrote:
> Another idea would be to change the definition of the exists operator
> so that it *does* look into sub-objects.  It seems rather random to me
> that containment looks into sub-objects but exists doesn't.  However,
> possibly there are good reasons for the non-orthogonality.

No, wait, containment *doesn't* look into sub-objects:

regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}';          f1            
-------------------------{"foo": {"bar": "baz"}}
(1 row)

regression=# select * from j where f1 @> '{"bar": "baz"}';f1 
----
(0 rows)

This is rather surprising in view of the way that section 8.14.4
goes on about nesting.  But I guess the user-facing docs for jsonb
are in little better shape than the internal docs.
        regards, tom lane



Re: jsonb existence queries are misimplemented by jsonb_ops

From
Peter Geoghegan
Date:
On Wed, May 7, 2014 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> No, wait, containment *doesn't* look into sub-objects:
>
> regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}';
>            f1
> -------------------------
>  {"foo": {"bar": "baz"}}
> (1 row)
>
> regression=# select * from j where f1 @> '{"bar": "baz"}';
>  f1
> ----
> (0 rows)


Yes it does. It's just not intended to work like that. You have to
"give a path" to what you're looking for.

-- 
Peter Geoghegan



Re: jsonb existence queries are misimplemented by jsonb_ops

From
Peter Geoghegan
Date:
On Wed, May 7, 2014 at 1:51 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Yes it does. It's just not intended to work like that. You have to
> "give a path" to what you're looking for.


There is exactly one exception explicitly noted as such in the
user-visible docs, which is arrays and the containment of single
elements.

-- 
Peter Geoghegan



Re: jsonb existence queries are misimplemented by jsonb_ops

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Wed, May 7, 2014 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No, wait, containment *doesn't* look into sub-objects:
>> 
>> regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}';
>> f1
>> -------------------------
>> {"foo": {"bar": "baz"}}
>> (1 row)
>> 
>> regression=# select * from j where f1 @> '{"bar": "baz"}';
>> f1
>> ----
>> (0 rows)

> Yes it does. It's just not intended to work like that. You have to
> "give a path" to what you're looking for.

Right, so the top-level item in the query has to be at top level in the
searched-for row.  This means that we *could* distinguish top-level from
non-top-level keys in the index items, and still be able to generate
correct index probes for a containment search (since we could similarly
mark the generated keys as to whether they came from top level in the
query object).

So the question is whether that's a good idea or not.  It seems like it is
a good idea for the current existence operator, and about a wash for the
current containment operator, but perhaps a bad idea for other definitions
of containment.

In any case, something here needs to change.  Thoughts?
        regards, tom lane



Re: jsonb existence queries are misimplemented by jsonb_ops

From
Peter Geoghegan
Date:
On Wed, May 7, 2014 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The indexscan is incorrectly returning rows where the queried key exists
> but isn't at top-level.
>
> We could fix this either by giving up on no-recheck for existence queries,
> or by changing the way that non-top-level keys get indexed.  However
> I suspect the latter would break containment queries, or at least make
> their lives a lot more difficult.

Thanks for looking into this.

I don't know that it would make life all that much harder for
containment. jsonb_ops is already not compelling for testing
containment where there is a lot of nesting. ISTM that
make_scalar_key() could be taught to respect that distinction. ISTM
that containment will indifferently treat the top-level keys as top
level, and the nested keys as nested, because if you're looking at
something nested you have to put a top-level key in the rhs value to
do so, and it will be marked as top level, or not top level in a
bijective fashion.

-- 
Peter Geoghegan