Re: Indexing on JSONB field not working - Mailing list pgsql-bugs

From Jeff Janes
Subject Re: Indexing on JSONB field not working
Date
Msg-id CAMkU=1z8X1s0WChsgKU7SN8ngrLKHt38UhAqKn-nGTf+xbQ9zg@mail.gmail.com
Whole thread Raw
In response to Re: Indexing on JSONB field not working  (Zhihong Zhang <zhihong@gmail.com>)
Responses Re: Indexing on JSONB field not working
List pgsql-bugs
On Sat, Dec 21, 2019 at 7:00 AM Zhihong Zhang <zhihong@gmail.com> wrote:
Run those 2 EXPLAINs with seqscan off and on. See what difference it makes!


SET enable_seqscan = off;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit  (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
"  ->  Index Scan using assets_floatvalue_idx on assets  (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
"        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
"Planning Time: 1.617 ms"
"Execution Time: 0.276 ms"

313 is a pretty high estimate for fetching an estimated 100 rows.  It must think that nearly every row fetched from the table is going to be a random page fetch.  Which means that it must think the correlation between assets_floatvalue_idx and physical table order is close to 0.

 

SET enable_seqscan = on;
explain analyze select id, _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0 limit 100;

"Limit  (cost=0.00..107.95 rows=100 width=53) (actual time=41021.311..313501.746 rows=7 loops=1)"
"  ->  Seq Scan on assets  (cost=0.00..936605.40 rows=867607 width=53) (actual time=41021.309..313501.732 rows=7 loops=1)"
"        Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
"        Rows Removed by Filter: 2602824"
"Planning Time: 0.283 ms"
"Execution Time: 313501.777 ms"


It thinks it will find 867607 rows which meet the <3.0 condition, but really it only finds 7.  It has to scan the full table, because with only 7 rows it can never stop early due to the LIMIT 100.  Why is the estimate wrong by a factor of over 100,000?  It should be using the statistics from the expression index here (even though it is not using the index during execution), and so should have pretty good statistics. 

Can you show the output of:

select * from pg_stats where tablename ='float_number_index_path2'

For readability, use the output format which shows the columns down the screen, not across.  In psql, that would toggled on with \x.

Cheers,

Jeff

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16176: NULL value returned by category_sql argument to crosstab() causes segmentation fault
Next
From: Tom Lane
Date:
Subject: Re: Indexing on JSONB field not working