Re: Indexing on JSONB field not working - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: Indexing on JSONB field not working
Date 2019-12-21 11:40:03
Msg-id 20191221114003.yo56lv3o6thdvxtn@development
In response to Indexing on JSONB field not working  ("Zhihong Zhang" <zhihong@gmail.com>)
Responses Re: Indexing on JSONB field not working  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!
>
>
>SET enable_seqscan = off;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>"  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual
time=0.089..0.230rows=7 loops=1)"
 
>"        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"Planning Time: 1.617 ms"
>"Execution Time: 0.276 ms"
>
>SET enable_seqscan = on;
>explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;
>
>"Limit  (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
>"  ->  Seq Scan on assets  (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7
loops=1)"
>"        Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>"        Rows Removed by Filter: 2602824"
>"Planning Time: 0.283 ms"
>"Execution Time: 313501.777 ms"
>

Well, this confirms what I suspected before - the optimizer believes the
seqscan plan is a bit cheaper than the index scan (107 vs. 313), but
that doesn't match the actual performance. The question is why ...

For LIMIT queries, I can think of two common issues, both stemming from
how the cost of a LIMIT is computed - as a simple linear approximation
of the cost of the input relation. For example, we know the seqscan is
expected to produce 867607 rows at a total cost of 936605, so the cost
of producing just the first 100 rows is estimated as

    100 * 936605 / 867607 = 107.95

But that assumes a couple of things: (a) that the seqscan row estimate
is correct, and (b) that the matching rows are uniformly distributed in
the table. If the row count is misestimated, or if the matching rows are
towards the end of the relation (i.e. produced only after doing most of
the costed work), this estimate may be quite off.
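
A quick way to check (a) would be to count the matching rows directly
and compare that to the planner's estimate, e.g.

    -- actual number of matching rows, to compare with the
    -- estimated 867607
    SELECT count(*)
      FROM assets
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;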

Can you do explain analyze of the query without the LIMIT?
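I.e. something like

    EXPLAIN ANALYZE
    SELECT id, _doc->>'floatValue'
      FROM assets
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

That will show how accurate the 867607 row estimate actually is.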

BTW a LIMIT without an ORDER BY is a bit strange. Also, maybe you could
try using a partial index (if the WHERE condition does not change).
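Something like this, perhaps (just a sketch - the index name is made up
and it assumes the < 3.0 condition never changes):

    CREATE INDEX assets_floatvalue_lt3_idx
        ON assets (((_doc #> '{floatValue}'::text[])::double precision))
     WHERE (_doc #> '{floatValue}'::text[])::double precision < 3.0;

A partial index like this only contains the matching rows, so it's much
smaller and the index scan should look cheaper to the planner.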


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


