Re: JSONB index not in use, but is TOAST the real cause of slow query? - Mailing list pgsql-general
From: Shaheed Haque
Subject: Re: JSONB index not in use, but is TOAST the real cause of slow query?
Date:
Msg-id: CAHAc2jcGDouJad=f4U6E7KOT1AwHLonjcG7eoUD00p7x2qizEw@mail.gmail.com
In response to: Re: JSONB index not in use, but is TOAST the real cause of slow query? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: JSONB index not in use, but is TOAST the real cause of slow query?
           Re: JSONB index not in use, but is TOAST the real cause of slow query?
List: pgsql-general
Tom,

Thanks for the considered advice and insights. My takeaway is that, based on what I've said, you are mostly unsurprised by the results I see. In the longer term the number of rows will increase, but I will have to ponder options for the immediate future. I'll have a play with the knobs you suggested and will report back with anything of note.

One last thought about TOAST: if the cost of the -> operator retrieving the data cannot be obviated, is there any way to tweak how that works?

Thanks, Shaheed

On Sat, 28 May 2022 at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Shaheed Haque <shaheedhaque@gmail.com> writes:
> > ======================
> > foo=# explain analyse SELECT snapshot->'company'->'legal_name' FROM
> > paiyroll_payrun WHERE snapshot->'employee' ? '2209';
> >                         QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------
> >  Seq Scan on paiyroll_payrun  (cost=0.00..29.13 rows=9 width=32)
> >    (actual time=50.185..2520.983 rows=104 loops=1)
> >    Filter: ((snapshot -> 'employee'::text) ? '2209'::text)
> >    Rows Removed by Filter: 835
> >  Planning Time: 0.075 ms
> >  Execution Time: 2521.004 ms
> > (5 rows)
> > ======================
>
> > So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to
> > presume the INDEX is not used because of the number of rows? Is there
> > a way to verify that?
>
> You could do "set enable_seqscan = off" and see if the EXPLAIN
> results change. My guess is that you'll find that the indexscan
> alternative is costed at a bit more than 29.13 units and thus
> the planner thinks seqscan is cheaper.
>
> > And how can I understand the dreadful amount of
> > time (of course, this is just on my dev machine, but still...)?
>
> In the seqscan case, the -> operator is going to retrieve the whole
> JSONB value from each row, which of course is pretty darn expensive
> if it's a few megabytes. Unfortunately the planner doesn't account
> for detoasting costs when making such estimates, so it doesn't
> realize that the seqscan case is going to be expensive. (Fixing
> that has been on the to-do list for a long time, but we seldom
> see cases where it matters this much, so it hasn't gotten done.)
>
> The problem would likely go away by itself if your table had more
> than a few hundred rows, but if you don't anticipate that happening
> then you need some sort of band-aid. I don't recommend turning
> enable_seqscan off as a production fix; it'd likely have negative
> effects on other queries. Personally I'd experiment with reducing
> random_page_cost a bit to see if I could encourage use of the index
> that way. The default value of 4.0 is tuned for spinning-rust
> storage and is not too appropriate for a lot of modern hardware,
> so there's probably room to fix it that way without detuning your
> setup for other queries.
>
> You should probably also rethink whether you really want to store
> your data in this format, because anything at all that you do with
> that big JSONB column is going to be expensive. (Another thing
> that's been on the to-do list for a while is enabling partial
> retrieval of large JSONB values, but AFAIK that hasn't happened
> yet either.)
>
>                         regards, tom lane
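
[Editor's note: a minimal sketch of the two planner experiments Tom suggests, assuming a psql session against the same database. Both SETs are session-local; no server restart is involved.]

======================
-- 1. Check whether the planner can use the index at all: disable
--    seqscans for this session, then compare the index scan's
--    estimated cost against the 29.13-unit seqscan estimate.
SET enable_seqscan = off;
EXPLAIN ANALYSE SELECT snapshot->'company'->'legal_name'
FROM paiyroll_payrun WHERE snapshot->'employee' ? '2209';
RESET enable_seqscan;

-- 2. If the index scan wins, nudge the planner toward it without
--    the enable_seqscan sledgehammer: lower random_page_cost from
--    the spinning-rust default of 4.0 (1.1 is a common SSD value).
SET random_page_cost = 1.1;
EXPLAIN ANALYSE SELECT snapshot->'company'->'legal_name'
FROM paiyroll_payrun WHERE snapshot->'employee' ? '2209';
======================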
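[Editor's note: the poster's actual index definition never appears in the thread, so the following is a hypothetical reconstruction of the kind of index the question presumes, including its name. The default jsonb_ops GIN operator class supports the ? (key-exists) operator used in the query.]

======================
-- Hypothetical expression GIN index over just the 'employee' object,
-- matching the WHERE snapshot->'employee' ? '2209' predicate.
CREATE INDEX paiyroll_payrun_employee_idx
    ON paiyroll_payrun USING GIN ((snapshot -> 'employee'));
======================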
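[Editor's note: on Shaheed's TOAST question, one knob that does exist, though it is not discussed in the thread, is the column's storage strategy. EXTERNAL keeps large values out of line but uncompressed, trading disk space for cheaper detoasting. Treat this as a hedged aside rather than advice from the thread.]

======================
-- Store snapshot out-of-line but uncompressed. This only affects how
-- future values are stored; existing rows keep their old storage
-- until they are rewritten (e.g. by an UPDATE or VACUUM FULL).
ALTER TABLE paiyroll_payrun ALTER COLUMN snapshot SET STORAGE EXTERNAL;
======================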