Thread: Jsonb first-level key statistics
Hi! I ran into a problem caused by the wrong row count the planner expects when using the ?| operator against a jsonb field. To illustrate the issue:

create table tbl_test (a jsonb);
insert into tbl_test(a) select jsonb_build_object('val1', 1) from generate_series(1, 100000);
analyse tbl_test;

explain analyze select count(*) from tbl_test where a ?| '{val1}'::text[];

                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1986.25..1986.26 rows=1 width=8) (actual time=55.264..55.264 rows=1 loops=1)
   ->  Seq Scan on tbl_test  (cost=0.00..1986.00 rows=100 width=0) (actual time=0.026..43.886 rows=100000 loops=1)
         Filter: (a ?| '{val1}'::text[])
 Planning time: 0.065 ms
 Execution time: 55.310 ms

The planner estimates rows=100 when the real value is 100000. In this simple case it's not an issue, but in real, rather complicated queries it leads to a wrong execution plan. How can I help the planner get accurate statistics about the key distribution in a jsonb field?

Version I'm using: PostgreSQL 9.6.16 on x86_64-pc-linux-gnu
Other than creating a functional index on the keys you most want to access, and including that same expression in your query, I don't know that you can. Functional (expression) indexes get statistics collected for them, while partial indexes do not. If a GIN index using the path operators collected stats on the frequency of the keys, then you would be all set, I think. Perhaps that is a feature that could be implemented.
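As a minimal sketch of what I mean, reusing the tbl_test table and the 'val1' key from your example (the index name here is just made up):

-- expression index on the boolean key test; ANALYZE gathers statistics
-- on indexed expressions, so the planner can estimate the selectivity
-- of this exact expression
create index tbl_test_has_val1_idx on tbl_test ((a ? 'val1'));
analyze tbl_test;
-- the query must use the same expression for those stats to apply
explain analyze select count(*) from tbl_test where a ? 'val1';

Note that this only helps for the specific keys you index, and the query has to match the indexed expression exactly; it won't fix estimates for the ?| form with an arbitrary key array.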
Others will chime in, I expect, but I tend to avoid using jsonb for any values that I need to use to significantly filter a result set (ON and WHERE clauses).
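For what it's worth, one way to act on that advice against the example table above (the plain column and its name are just illustrative):

-- promote the frequently filtered value to a regular column,
-- which gets ordinary per-column statistics from ANALYZE
alter table tbl_test add column val1 integer;
update tbl_test set val1 = (a ->> 'val1')::integer;
analyze tbl_test;
-- filter on the plain column instead of the jsonb field
select count(*) from tbl_test where val1 = 1;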