Re: Indexing on JSONB field not working - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Indexing on JSONB field not working
Date
Msg-id 21855.1576943356@sss.pgh.pa.us
Whole thread Raw
In response to Re: Indexing on JSONB field not working  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Indexing on JSONB field not working
List pgsql-bugs
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>> "Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>> "  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual
time=0.089..0.230rows=7 loops=1)" 
>> "        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"

> Well, this confirms what I suspected before - the optimizer believes the
> seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
> that mismatches the actual performance. The question is why ...

The planner evidently believes that 867607 rows will match the query
condition, so it expects that the scan will stop (after collecting
100 rows) very quickly.  In reality only 7 rows match, so the scan
has to run to completion.  This is what's bollixing the plan choice.

I suspect that 867607 is just a default estimate, but if ANALYZE has
been run then there should be stats for the index column, so why isn't
it doing better?  When I try a similar case here, I get good estimates:

regression=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
(1 row)

regression=# create table public.assets(_doc jsonb);
CREATE TABLE
regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
CREATE INDEX
regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
INSERT 0 10000
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100; 
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7.71 rows=100 width=32)
   ->  Seq Scan on assets  (cost=0.00..293.44 rows=3808 width=32)
         Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

regression=# analyze assets;
ANALYZE
regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <
3.0limit 100; 
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..8.33 rows=2 width=32)
   ->  Index Scan using assets_float8_idx on assets  (cost=0.29..8.33 rows=2 width=32)
         Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
(3 rows)

The "3808" estimate is just a default for '<' with no stats, but
with stats I get a dead-on estimate.

            regards, tom lane



pgsql-bugs by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Indexing on JSONB field not working
Next
From: Bruce Momjian
Date:
Subject: Re: BUG #16151: startup timing problem