Re: JSONB index not in use, but is TOAST the real cause of slow query? - Mailing list pgsql-general

From Shaheed Haque
Subject Re: JSONB index not in use, but is TOAST the real cause of slow query?
Date
Msg-id CAHAc2jeCoX=Z9Lzd4b21KdzoozRK8qS-zO6-oC5H9e-Cm5b5aQ@mail.gmail.com
Whole thread Raw
In response to Re: JSONB index not in use, but is TOAST the real cause of slow query?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: JSONB index not in use, but is TOAST the real cause of slow query?
Is it possible to index "deep" into a JSONB column?
List pgsql-general
Adrian, Tom, thanks for the input. Based on that, it occurred to me to
create some "dummy" rows and - almost magically - the index kicks in!
Before is 2500ms:

============
Seq Scan on paiyroll_payrun  (cost=0.00..52.43 rows=17 width=32)
(actual time=53.127..2567.024 rows=104 loops=1)
  Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Filter: 1835
Planning Time: 0.060 ms
Execution Time: 2567.044 ms
(5 rows)
============

After is 300ms:

============
Bitmap Heap Scan on paiyroll_payrun  (cost=36.11..64.67 rows=14
width=32) (actual time=4.189..311.932 rows=104 loops=1)
  Recheck Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
  Rows Removed by Index Recheck: 1
  Heap Blocks: exact=8
  ->  Bitmap Index Scan on idx1  (cost=0.00..36.10 rows=14 width=0)
(actual time=0.087..0.087 rows=105 loops=1)
        Index Cond: ((snapshot -> 'employee'::text) ? '2209'::text)
Planning Time: 0.167 ms
Execution Time: 311.962 ms
(8 rows)
============

Woot!

Unfortunately, the real query which I think should behave very
similarly is still at the several-seconds level despite using the
index. Before 3600ms:

============
SELECT "paiyroll_payrun"."actual_t"
  FROM "paiyroll_payrun"
 WHERE ("paiyroll_payrun"."company_id" = 173 AND
("paiyroll_payrun"."snapshot" -> 'employee') ? '16376'
   AND NOT (("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','last_run_of_employment']) = 'true'
   AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','pay_graph']) = '0'
   AND ("paiyroll_payrun"."snapshot" #>
ARRAY['employee','16376','state','employment','-1','2']) > '0'))
 ORDER BY "paiyroll_payrun"."actual_t" DESC
 LIMIT 1
============

============
Limit (cost=31.33..31.33 rows=1 width=4) (actual
time=3595.174..3595.176 rows=1 loops=1)
  -> Sort (cost=31.33..31.33 rows=3 width=4) (actual
time=3595.174..3595.174 rows=1 loops=1)
    Sort Key: actual_t DESC
    Sort Method: top-N heapsort Memory: 25kB
    -> Bitmap Heap Scan on paiyroll_payrun (cost=6.43..31.31 rows=3
width=4) (actual time=44.575..3595.082 rows=62 loops=1)
       Recheck Cond: (company_id = 173)
       Filter: (((snapshot -> 'employee'::text) ? '16376'::text) AND
(((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <>
'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[])
<> '0'::jsonb) OR ((snapshot #>
'{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb)))
Rows Removed by Filter: 242
       Heap Blocks: exact=9
         -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.43 rows=304 width=0) (actual time=0.013..0.013 rows=304
loops=1)
         Index Cond: (company_id = 173)
Planning Time: 0.258 ms
Execution Time: 3595.195 ms
============

After, with the index in use, 2200ms:

================
Limit (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.989..2258.990 rows=1 loops=1)
  -> Sort (cost=30.92..30.93 rows=1 width=4) (actual
time=2258.988..2258.989 rows=1 loops=1)
    Sort Key: actual_t DESC
    Sort Method: top-N heapsort Memory: 25kB
    -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
width=4) (actual time=32.488..2258.891 rows=62 loops=1)
      Recheck Cond: ((company_id = 173) AND ((snapshot ->
'employee'::text) ? '16376'::text))
      Filter: (((snapshot #>
'{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
'0'::jsonb))
      Heap Blocks: exact=5
        -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
time=0.038..0.039 rows=0 loops=1)
          -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
(cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
loops=1)
                Index Cond: (company_id = 173)
          -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
width=0) (actual time=0.021..0.021 rows=62 loops=1)
                Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)
Planning Time: 0.245 ms
Execution Time: 2259.019 ms
===================

IIUC, at the bottom, the indices are doing their thing, but a couple
of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
cannot quite see why. Have I missed a needed index or what? I'm pretty
much a novice at SQL, so it is quite possible I've missed something
very basic.

Thanks, Shaheed

On Sat, 28 May 2022 at 20:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Shaheed Haque <shaheedhaque@gmail.com> writes:
> > One last thought about TOAST. If the cost of the -> retrieving the
> > data cannot be obviated, is there any way to tweak how that works?
>
> The only thing that's readily tweakable is to disable data compression
> for the out-of-line values (see ALTER TABLE ... SET STORAGE, and note
> that you have to rebuild the table for already-stored values to be
> changed).  It seems unlikely that that will help you much though,
> since doing that would save CPU at the cost of more disk I/O, and
> it seems that the I/O side is your bottleneck already.  It *would*
> help if jsonb had logic for partial fetches, because that'd require
> uncompressed storage to work.  But AFAICS that's not there yet.
> I distinctly recall that that was planned for when we were designing
> jsonb's on-disk layout, but I see no code actually doing anything
> of that sort ... the -> operator certainly doesn't.
>
>                         regards, tom lane



pgsql-general by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Improve configurability for IO related behavoir
Next
From: Danny Shemesh
Date:
Subject: Showing alternative query planner plans with explain ?