Re: Indexing on JSONB field not working - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: Indexing on JSONB field not working
Date 2019-12-21 11:40:03
Msg-id 20191221114003.yo56lv3o6thdvxtn@development
In response to Indexing on JSONB field not working  ("Zhihong Zhang" <zhihong@gmail.com>)
Responses Re: Indexing on JSONB field not working  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!
>
>
>SET enable_seqscan = off;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>"  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual
time=0.089..0.230rows=7 loops=1)"
 
>"        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"Planning Time: 1.617 ms"
>"Execution Time: 0.276 ms"
>
>SET enable_seqscan = on;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit  (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
>"  ->  Seq Scan on assets  (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7
loops=1)"
>"        Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"        Rows Removed by Filter: 2602824"
>"Planning Time: 0.283 ms"
>"Execution Time: 313501.777 ms"
>

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper than the index scan (107 vs. 313), but
that doesn't match the actual performance. The question is why ...

For LIMIT queries, I can think of two common issues, both stemming from
how the cost of a LIMIT is computed - as a simple linear approximation
of the cost of the input relation. For example, we know the seqscan is
expected to produce 867607 rows at a total cost of 936605, so the cost
of producing just the first 100 rows is estimated as

    100 * 936605 / 867607 = 107.95

But that assumes a couple of things: (a) that the seqscan row estimate
is correct, and (b) that the matching rows are uniformly distributed in
the table. If the row count is misestimated, or if the matching rows are
towards the end of the relation (i.e. produced only after doing most of
the costed work), this estimate may be quite off.
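
A quick way to check (a) would be to count the matching rows directly
and compare that to the planner's estimate, e.g.

    -- actual number of matching rows, to compare with the
    -- estimated 867607
    SELECT count(*)
      FROM assets
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;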

Can you do explain analyze of the query without the LIMIT?
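I.e. something like

    EXPLAIN ANALYZE
    SELECT id, _doc->>'floatValue'
      FROM assets
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

That will show how accurate the 867607 row estimate actually is.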

BTW a LIMIT without an ORDER BY is a bit strange. Also, maybe you could
try using a partial index (if the WHERE condition does not change).
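Something like this, perhaps (just a sketch - the index name is made up
and it assumes the < 3.0 condition never changes):

    CREATE INDEX assets_floatvalue_lt3_idx
        ON assets (((_doc #> '{floatValue}'::text[])::double precision))
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

A partial index like this only contains the matching rows, so it's much
smaller and the index scan should look cheaper to the planner.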


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


