Re: JSONB index not in use, but is TOAST the real cause of slow query? - Mailing list pgsql-general
From | Shaheed Haque |
---|---|
Subject | Re: JSONB index not in use, but is TOAST the real cause of slow query? |
Date | |
Msg-id | CAHAc2jeCoX=Z9Lzd4b21KdzoozRK8qS-zO6-oC5H9e-Cm5b5aQ@mail.gmail.com Whole thread Raw |
In response to | Re: JSONB index not in use, but is TOAST the real cause of slow query? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: JSONB index not in use, but is TOAST the real cause of slow query?
Is it possible to index "deep" into a JSONB column? |
List | pgsql-general |
Adrian, Tom, thanks for the input. Based on that, it occurred to me to create some "dummy" rows and - almost magically - the index kicks in! Before is 2500ms: ============ Seq Scan on paiyroll_payrun (cost=0.00..52.43 rows=17 width=32) (actual time=53.127..2567.024 rows=104 loops=1) Filter: ((snapshot -> 'employee'::text) ? '2209'::text) Rows Removed by Filter: 1835 Planning Time: 0.060 ms Execution Time: 2567.044 ms (5 rows) ============ After is 300ms: ============ Bitmap Heap Scan on paiyroll_payrun (cost=36.11..64.67 rows=14 width=32) (actual time=4.189..311.932 rows=104 loops=1) Recheck Cond: ((snapshot -> 'employee'::text) ? '2209'::text) Rows Removed by Index Recheck: 1 Heap Blocks: exact=8 -> Bitmap Index Scan on idx1 (cost=0.00..36.10 rows=14 width=0) (actual time=0.087..0.087 rows=105 loops=1) Index Cond: ((snapshot -> 'employee'::text) ? '2209'::text) Planning Time: 0.167 ms Execution Time: 311.962 ms (8 rows) ============ Woot! Unfortunately, the real query which I think should behave very similarly is still at the several-seconds level despite using the index. Before 3600ms: ============ SELECT "paiyroll_payrun"."actual_t" FROM "paiyroll_payrun" WHERE ("paiyroll_payrun"."company_id" = 173 AND ("paiyroll_payrun"."snapshot" -> 'employee') ? '16376' AND NOT (("paiyroll_payrun"."snapshot" #> ARRAY['employee','16376','last_run_of_employment']) = 'true' AND ("paiyroll_payrun"."snapshot" #> ARRAY['employee','16376','pay_graph']) = '0' AND ("paiyroll_payrun"."snapshot" #> ARRAY['employee','16376','state','employment','-1','2']) > '0')) ORDER BY "paiyroll_payrun"."actual_t" DESC LIMIT 1 ============ ============ Limit (cost=31.33..31.33 rows=1 width=4) (actual time=3595.174..3595.176 rows=1 loops=1) -> Sort (cost=31.33..31.33 rows=3 width=4) (actual time=3595.174..3595.174 rows=1 loops=1) Sort Key: actual_t DESC Sort Method: top-N heapsort Memory: 25kB -> Bitmap Heap Scan on paiyroll_payrun (cost=6.43..31.31 rows=3 width=4) (actual time=44.575..3595.082 rows=62 loops=1) Recheck Cond: (company_id = 173) Filter: (((snapshot -> 'employee'::text) ? '16376'::text) AND (((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb) OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb))) Rows Removed by Filter: 242 Heap Blocks: exact=9 -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888 (cost=0.00..6.43 rows=304 width=0) (actual time=0.013..0.013 rows=304 loops=1) Index Cond: (company_id = 173) Planning Time: 0.258 ms Execution Time: 3595.195 ms ============ After, with the index in use, 2200ms: ================ Limit (cost=30.92..30.93 rows=1 width=4) (actual time=2258.989..2258.990 rows=1 loops=1) -> Sort (cost=30.92..30.93 rows=1 width=4) (actual time=2258.988..2258.989 rows=1 loops=1) Sort Key: actual_t DESC Sort Method: top-N heapsort Memory: 25kB -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1 width=4) (actual time=32.488..2258.891 rows=62 loops=1) Recheck Cond: ((company_id = 173) AND ((snapshot -> 'employee'::text) ? '16376'::text)) Filter: (((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb) OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb)) Heap Blocks: exact=5 -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual time=0.038..0.039 rows=0 loops=1) -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888 (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304 loops=1) Index Cond: (company_id = 173) -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9 width=0) (actual time=0.021..0.021 rows=62 loops=1) Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text) Planning Time: 0.245 ms Execution Time: 2259.019 ms =================== IIUC, at the bottom, the indices are doing their thing, but a couple of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I cannot quite see why. Have I missed a needed index or what? I'm pretty much a novice at SQL, so it is quite possible I've missed something very basic. Thanks, Shaheed On Sat, 28 May 2022 at 20:59, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Shaheed Haque <shaheedhaque@gmail.com> writes: > > One last thought about TOAST. If the cost of the -> retrieving the > > data cannot be obviated, is there any way to tweak how that works? > > The only thing that's readily tweakable is to disable data compression > for the out-of-line values (see ALTER TABLE ... SET STORAGE, and note > that you have to rebuild the table for already-stored values to be > changed). It seems unlikely that that will help you much though, > since doing that would save CPU at the cost of more disk I/O, and > it seems that the I/O side is your bottleneck already. It *would* > help if jsonb had logic for partial fetches, because that'd require > uncompressed storage to work. But AFAICS that's not there yet. > I distinctly recall that that was planned for when we were designing > jsonb's on-disk layout, but I see no code actually doing anything > of that sort ... the -> operator certainly doesn't. > > regards, tom lane
pgsql-general by date: