Re: Query running longer - Mailing list pgsql-general
From:            Laurenz Albe
Subject:         Re: Query running longer
Msg-id:          60efac4f2e23719e71638ca1be5d678e1a02cc6d.camel@cybertec.at
In response to:  Re: Query running longer (veem v <veema0000@gmail.com>)
List:            pgsql-general
On Fri, 2024-02-02 at 10:14 +0530, veem v wrote:
> On Fri, 2 Feb 2024 at 02:43, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Fri, 2024-02-02 at 02:27 +0530, veem v wrote:
> > > We have the below query which is running for ~45 seconds on a postgres
> > > aurora reader instance. I have captured the explain analyze. I want to
> > > understand where exactly the resources are getting spent and if we can
> > > optimize it further.
> >
> > Aurora <> PostgreSQL, but here is what I can see:
> >
> > - The index scan on SCHEMA1."TAB2" has to check 2 million extra
> >   rows because "work_mem" is too small. Almost the complete time
> >   is spent there.
> >
> > - You may be getting a bad plan, because the statistics on
> >   SCHEMA1.TAB4 are either out of date or not detailed enough,
> >   which makes PostgreSQL underestimate the result size.
>
> As you mentioned below, I am wondering how you got to know that this is the
> step where the majority of the DB resources get spent. And as the total time
> the query ran was ~45 seconds, how much of that was spent in this step, and
> how can I get that? And to fix this one with regards to TAB2, should we
> increase "work_mem" here?
>
> - The index scan on SCHEMA1."TAB2" has to check 2 million extra
>   rows because "work_mem" is too small. Almost the complete time
>   is spent there.
>
> ->  Parallel Bitmap Heap Scan on SCHEMA1."TAB2" TAB2  (cost=84860.50..13040301.00 rows=1175611 width=80) (actual time=713.054..26942.082 rows=956249 loops=5)
>       Output: TAB2.TAB2_time, TAB2.PRI, TAB2.AID
>       Recheck Cond: (TAB2.MID = 'XXXXX'::numeric)
>       Rows Removed by Index Recheck: 2137395
>       Filter: ((TAB2.TAB2_time >= '2024-01-01 00:00:00+00'::timestamp with time zone) AND (TAB2.TAB2_time <= '2024-01-31 00:00:00+00'::timestamp with time zone))
>       Heap Blocks: exact=5300 lossy=782577
>       Buffers: shared hit=1651569 read=2245157
>       I/O Timings: shared/local read=29063.286

The bitmap index scan builds a bitmap, the size of which is limited by
"work_mem". If that is big enough, the bitmap will contain a bit for each
table row. If not, it only contains a bit per table row for some table
blocks (the "exact" ones) and degrades to a bit per block for the rest
(the "lossy" ones). For lossy blocks, *all* rows in the block have to be
rechecked, which is overhead.

Given that only a small part of the time,
(26942.082 - 713.054) * 5 ≈ 131145 milliseconds summed over the five
parallel processes, is spent doing I/O (29063.286 milliseconds), I guess
that the rest is spent processing table rows.

> Another point you mentioned, as below: for this, should we run VACUUM
> ANALYZE on the table TAB4?
>
> - You may be getting a bad plan, because the statistics on
>   SCHEMA1.TAB4 are either out of date or not detailed enough,
>   which makes PostgreSQL underestimate the result size.
>
> ->  Parallel Bitmap Heap Scan on SCHEMA1.TAB4 TAB4  (cost=26.39..7042.63 rows=1049 width=37) (actual time=23.650..201.606 rows=27613 loops=5)
>       ->  Bitmap Index Scan on TAB4_idx1  (cost=0.00..25.95 rows=1784 width=0) (actual time=23.938..23.938 rows=138067 loops=1)
>             Index Cond: ((TAB4.TAB4_code)::text = 'XX'::text)
>             Buffers: shared hit=72

An ANALYZE might be enough. If not, you can try to collect more detailed
statistics for the column:

  ALTER TABLE SCHEMA1.TAB4 ALTER TAB4_code SET STATISTICS 1000;
  ANALYZE SCHEMA1.TAB4;

Yours,
Laurenz Albe
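
P.S.: A quick way to test the "work_mem" theory is to raise the parameter
for a single session and look at "Heap Blocks" in a fresh
EXPLAIN (ANALYZE, BUFFERS) output; once the bitmap fits, the "lossy" count
should disappear. This is only a sketch, and the value below is an
illustrative guess; how much memory the bitmap really needs depends on
your data:

  -- raise "work_mem" for this session only ('512MB' is an illustrative guess)
  SET work_mem = '512MB';

  -- re-run the query; with enough memory, the plan should show
  -- "Heap Blocks: exact=..." without a "lossy=..." part
  EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- your original query here

  -- go back to the configured value
  RESET work_mem;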
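
P.S. 2: To see what the planner currently knows about TAB4_code, you can
inspect "pg_stats". The identifier case below is a guess; use whatever case
the objects actually have in the catalogs:

  -- compare the stored statistics with reality; a stale or too coarse
  -- sample here would explain the rows=1784 vs. rows=138067 misestimate
  SELECT n_distinct, null_frac, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE schemaname = 'schema1'
    AND tablename  = 'tab4'
    AND attname    = 'tab4_code';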