Home > mailing lists

Minor inaccuracy in jsonb_path_ops documentation - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Minor inaccuracy in jsonb_path_ops documentation
Date	July 24, 2014 21:49:43
Msg-id	CAM3SWZSGV44Agpw3EFrZAcKaF2+=80MWjP8dfspGsWadAH4+vA@mail.gmail.com Whole thread Raw
Responses	Re: Minor inaccuracy in jsonb_path_ops documentation
List	pgsql-hackers

Tree view

The jsonb documentation says of the jsonb_path_ops GIN opclass:

"""
A disadvantage of the jsonb_path_ops approach is that it produces no
index entries for JSON structures not containing any values, such as
{"a": {}}. If a search for documents containing such a structure is
requested, it will require a full-index scan, which is quite slow.
jsonb_path_ops is therefore ill-suited for applications that often
perform such searches.
"""

The reference to a full index scan seems questionable. This text
should indicate that a sequential scan can be expected. Even without
any statistics, in the event of not being able to extract any hash
values as GIN keys, the optimizer prefers a sequential scan.

Example with query where one jsonb_path_ops GIN hash value is generated:

postgres=# explain analyze select count(*) from test where j @>
'{"tags":[{"term":"postgres"}]}';                                                      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=4732.72..4732.73 rows=1 width=0) (actual
 
time=0.513..0.513 rows=1 loops=1)  ->  Bitmap Heap Scan on test  (cost=33.71..4729.59 rows=1253
width=0) (actual time=0.107..0.496 rows=100 loops=1)        Recheck Cond: (j @> '{"tags": [{"term":
"postgres"}]}'::jsonb)       Heap Blocks: exact=100        ->  Bitmap Index Scan on ttt  (cost=0.00..33.40 rows=1253
 
width=0) (actual time=0.076..0.076 rows=100 loops=1)              Index Cond: (j @> '{"tags": [{"term":
"postgres"}]}'::jsonb)Planningtime: 0.083 msExecution time: 0.560 ms
 
(8 rows)

Example of empty query hazard with no such hash values:

postgres=# explain select count(*) from test where j @> '{"tags":[]}';                           QUERY PLAN
------------------------------------------------------------------Aggregate  (cost=191519.46..191519.47 rows=1 width=0)
->  Seq Scan on test  (cost=0.00..191516.33 rows=1253 width=0)        Filter: (j @> '{"tags": []}'::jsonb)Planning
time:0.073 ms
 
(4 rows)

gincostestimate() does at least have the ability to anticipate that a
full index scan will be required, and that actually makes the
optimizer do the right thing here. Maybe the text quoted above is
intended to indicate that if there was an index scan, it would have to
be a full index scan, but that doesn't seem appropriate for user
facing documentation like this.

-- 
Peter Geoghegan

pgsql-hackers by date:

From: Alvaro Herrera
Date: 24 July 2014, 20:13:33
Subject: Re: gaussian distribution pgbench -- splits Bv6

From: Tom Lane
Date: 24 July 2014, 21:53:21
Subject: Why does xlog.c not treat COMMIT_PREPARED/ABORT_PREPARED as commit/abort?

Minor inaccuracy in jsonb_path_ops documentation - Mailing list pgsql-hackers

Previous

Next