Re: Indexing on JSONB field not working - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: Indexing on JSONB field not working |
Date | |
Msg-id | 21855.1576943356@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Indexing on JSONB field not working (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Indexing on JSONB field not working
|
List | pgsql-bugs |
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote: >> "Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)" >> " -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230rows=7 loops=1)" >> " Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)" > Well, this confirms what I suspected before - the optimizer believes the > seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but > that mismatches the actual performance. The question is why ... The planner evidently believes that 867607 rows will match the query condition, so it expects that the scan will stop (after collecting 100 rows) very quickly. In reality only 7 rows match, so the scan has to run to completion. This is what's bollixing the plan choice. I suspect that 867607 is just a default estimate, but if ANALYZE has been run then there should be stats for the index column, so why isn't it doing better? When I try a similar case here, I get good estimates: regression=# select version(); version --------------------------------------------------------------------------------------------------------- PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit (1 row) regression=# create table public.assets(_doc jsonb); CREATE TABLE regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST); CREATE INDEX regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x; INSERT 0 10000 regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0limit 100; QUERY PLAN ------------------------------------------------------------------------------------------------ Limit (cost=0.00..7.71 rows=100 width=32) -> Seq Scan on assets (cost=0.00..293.44 rows=3808 width=32) Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision) (3 rows) regression=# analyze assets; ANALYZE regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision < 3.0limit 100; QUERY PLAN ---------------------------------------------------------------------------------------------------- Limit (cost=0.29..8.33 rows=2 width=32) -> Index Scan using assets_float8_idx on assets (cost=0.29..8.33 rows=2 width=32) Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision) (3 rows) The "3808" estimate is just a default for '<' with no stats, but with stats I get a dead-on estimate. regards, tom lane
pgsql-bugs by date: