Re: JSONB index not in use, but is TOAST the real cause of slow query? - Mailing list pgsql-general

From Shaheed Haque
Subject Re: JSONB index not in use, but is TOAST the real cause of slow query?
Date
Msg-id CAHAc2jf04e2kPXbYcNHgcb_K7=PBG-uexk9P8L=e_zv2va9+Qw@mail.gmail.com
Whole thread Raw
In response to Re: JSONB index not in use, but is TOAST the real cause of slow query?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general


On Sun, 29 May 2022, 15:58 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Shaheed Haque <shaheedhaque@gmail.com> writes:
> Unfortunately, the real query which I think should behave very
> similarly is still at the several-seconds level despite using the
> index. ...

>     -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
> width=4) (actual time=32.488..2258.891 rows=62 loops=1)
>       Recheck Cond: ((company_id = 173) AND ((snapshot ->
> 'employee'::text) ? '16376'::text))
>       Filter: (((snapshot #>
> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
> OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
> OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
> '0'::jsonb))
>       Heap Blocks: exact=5
>         -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
> time=0.038..0.039 rows=0 loops=1)
>           -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
> (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
> loops=1)
>                 Index Cond: (company_id = 173)
>           -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
> width=0) (actual time=0.021..0.021 rows=62 loops=1)
>                 Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)

> IIUC, at the bottom, the indices are doing their thing, but a couple
> of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
> cannot quite see why.

I suppose it's the execution of that "Filter" condition, which will
require perhaps as many as three fetches of the "snapshot" column.

Thanks, that's clearly in the frame. 

You really need to rethink that data structure.  Sure, you can store tons
of unorganized data in a jsonb column, but you pay for that convenience
with slow access.  Normalizing the bits you need frequently into a more
traditional relational schema is the route to better-performing queries.

Ack. Indeed, the current design works very well for all of the access patterns other than this one, which only recently came into view as a problem. 

Ahead of contemplating a design change I have been looking at how to optimise this bit. I'm currently mired in a crash course on SQL syntax as pertains to JSONB, jsonpath et. al. And the equally mysterious side effects of "?" and "@>" and so on in terms of the amount of data being fetched etc. (and all wrapped in a dose of ORM for good measure). 

I'll write separately with more specific questions if needed on those details. 

Thanks again for the kind help. 

Shaheed


                        regards, tom lane

pgsql-general by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: autovacuum on primary blocking queries on replica?
Next
From: Alastair McKinley
Date:
Subject: Function definition regression in 15beta1 when specific parameter name (string) is used