Re: CPU Consuming query. Sequential scan despite indexing. - Mailing list pgsql-performance

From aditya desai
Subject Re: CPU Consuming query. Sequential scan despite indexing.
Date
Msg-id CAN0SRDGvmdEYYzQQjq2+HUwpMd5f=ThrjO3CHZ=iCZv0Mjs59w@mail.gmail.com
Whole thread Raw
In response to Re: CPU Consuming query. Sequential scan despite indexing.  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: CPU Consuming query. Sequential scan despite indexing.  (aditya desai <admad123@gmail.com>)
List pgsql-performance
Hi Laurenz,
I created

On Fri, Oct 16, 2020 at 2:06 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2020-10-15 at 20:34 +0530, aditya desai wrote:
> Below query always shows up on top in the CPU matrix. Also despite having indexes it does sequential scans
> (probably because WHERE condition satisfies almost all of the data from table). This query
> runs on the default landing page in application and needs to fetch records in less that 100 ms
>  without consuming too much CPU.
>
>  Any opinions? Table is very huge and due to referential identity and business requirements we could not
>  implement partitioning as well.
>
> There is index on (countrycode,facilitycode,jobstartdatetime)
>
> explain (analyze,buffers) with JobCount as ( select jobstatuscode,count(1) stat_count from job j
>  where 1=1 and j.countrycode = 'TH'
> and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
>  and ((j.jobstartdatetime  between '2020-08-01 00:00:00' and '2020-09-30 00:00:00' ) or j.jobstartdatetime IS NULL )  group by j.jobstatuscode)
>  select js.jobstatuscode,COALESCE(stat_count,0) stat_count from JobCount jc right outer join jobstatus js on jc.jobstatuscode=js.jobstatuscode;
>
>                           QUERY PLAN
>
>  Hash Right Join  (cost=98845.93..98846.10 rows=10 width=12) (actual time=1314.809..1314.849 rows=10 loops=1)
>                              ->  Parallel Seq Scan on job j  (cost=0.00..96837.93 rows=200963 width=4) (actual time=13.010..1144.434 rows=163200 loops=3)
>                                    Filter: (((countrycode)::text = 'TH'::text) AND (((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobst
> artdatetime <= '2020-09-30 00:00:00'::timestamp without time zone)) OR (jobstartdatetime IS NULL)) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1
> ,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
>                                    Rows Removed by Filter: 449035
>                                    Buffers: shared hit=60086 read=11834
>                                    I/O Timings: read=59.194
>

You should rewrite the subquery as a UNION to avoid the OR:

  ... WHERE j.countrycode = 'TH'
        and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
        and j.jobstartdatetime between '2020-08-01 00:00:00' and '2020-09-30 00:00:00'

and

  ... WHERE j.countrycode = 'TH'
        and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1')
        and j.jobstartdatetime IS NULL

These indexes could speed up the resulting query:

  CREATE INDEX ON job (countrycode, facilitycode);
  CREATE INDEX ON job (countrycode, jobstartdatetime);
  CREATE INDEX ON job (countrycode, facilitycode) WHERE jobstartdaytime IS NULL;

I created the indexes you suggested and changed the query with the UNION operator. Please see explain plan below. Performance of the query(execution time has improved mostly because I ran vacuum full). Cost of the query is still high.This is Dev envrionment and has 2 vCPU and 8 GB RAM.

explain (analyze,buffers) with JobCount as ( (select jobstatuscode,count(1) stat_count from job j where 1=1 and j.countrycode = 'TH'   and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1') and j.jobstartdatetime  between '2020-08-01 00:00:00' and '2020-09-30 00:00:00'    group by j.jobstatuscode) UNION (select jobstatuscode,count(1) stat_count from job j where 1=1 and j.countrycode = 'TH'   and j.facilitycode in ('THNPM1','THPRK1','THCNT1','THSPN1','THKRI1','THPKN1','THSBI1','THUTG1','THLRI1','THSRI1','THSUR1','THSKM1') and j.jobstartdatetime is null  group by j.jobstatuscode))
lmp_delivery_jobs->  select js.jobstatuscode,COALESCE(stat_count,0) stat_count from JobCount jc right outer join jobstatus js on jc.jobstatuscode=js.jobstatuscode;
                                                                                                               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------
 Hash Right Join  (cost=79010.89..79011.19 rows=10 width=12) (actual time=444.241..444.256 rows=10 loops=1)
   Hash Cond: ((jc.jobstatuscode)::text = (js.jobstatuscode)::text)
   Buffers: shared hit=8560
   CTE jobcount
     ->  HashAggregate  (cost=79002.35..79002.48 rows=13 width=24) (actual time=444.211..444.213 rows=6 loops=1)
           Group Key: j.jobstatuscode, (count(1))
           Buffers: shared hit=8558
           ->  Append  (cost=78959.64..79002.28 rows=13 width=24) (actual time=444.081..444.202 rows=6 loops=1)
                 Buffers: shared hit=8558
                 ->  Finalize GroupAggregate  (cost=78959.64..78961.41 rows=7 width=12) (actual time=444.079..444.101 rows=6 loops=1)
                       Group Key: j.jobstatuscode
                       Buffers: shared hit=8546
                       ->  Gather Merge  (cost=78959.64..78961.27 rows=14 width=12) (actual time=444.063..444.526 rows=18 loops=1)
                             Workers Planned: 2
                             Workers Launched: 2
                             Buffers: shared hit=17636
                             ->  Sort  (cost=77959.61..77959.63 rows=7 width=12) (actual time=435.748..435.750 rows=6 loops=3)
                                   Sort Key: j.jobstatuscode
                                   Sort Method: quicksort  Memory: 25kB
                                   Worker 0:  Sort Method: quicksort  Memory: 25kB
                                   Worker 1:  Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=17636
                                   ->  Partial HashAggregate  (cost=77959.44..77959.51 rows=7 width=12) (actual time=435.703..435.706 rows=6 loops=3)
                                         Group Key: j.jobstatuscode
                                         Buffers: shared hit=17620
                                         ->  Parallel Bitmap Heap Scan on job j  (cost=11528.22..76957.69 rows=200351 width=4) (actual time=47.682..281.928 rows=163200
loops=3)
                                               Recheck Cond: (((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,T
HPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                               Filter: ((jobstartdatetime >= '2020-08-01 00:00:00'::timestamp without time zone) AND (jobstartdatetime <= '2020-09-30 00
:00:00'::timestamp without time zone))
                                               Heap Blocks: exact=6633
                                               Buffers: shared hit=17620
                                               ->  Bitmap Index Scan on job_list_test1  (cost=0.00..11408.01 rows=482693 width=0) (actual time=49.825..49.826 rows=48960
0 loops=1)
                                                     Index Cond: (((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKR
I1,THPKN1,THSBI1,THUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                                     Buffers: shared hit=1913
                 ->  GroupAggregate  (cost=40.50..40.68 rows=6 width=12) (actual time=0.093..0.094 rows=0 loops=1)
                       Group Key: j_1.jobstatuscode
                       Buffers: shared hit=12
                       ->  Sort  (cost=40.50..40.54 rows=16 width=4) (actual time=0.092..0.092 rows=0 loops=1)
                             Sort Key: j_1.jobstatuscode
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=12
                             ->  Index Scan using job_list_test3 on job j_1  (cost=0.14..40.18 rows=16 width=4) (actual time=0.081..0.082 rows=0 loops=1)
                                   Index Cond: (((countrycode)::text = 'TH'::text) AND ((facilitycode)::text = ANY ('{THNPM1,THPRK1,THCNT1,THSPN1,THKRI1,THPKN1,THSBI1,T
HUTG1,THLRI1,THSRI1,THSUR1,THSKM1}'::text[])))
                                   Buffers: shared hit=12
   ->  CTE Scan on jobcount jc  (cost=0.00..0.26 rows=13 width=24) (actual time=444.215..444.221 rows=6 loops=1)
         Buffers: shared hit=8558
   ->  Hash  (cost=8.29..8.29 rows=10 width=4) (actual time=0.016..0.016 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=2
         ->  Index Only Scan using jobstatus_jobstatuscode_unq on jobstatus js  (cost=0.14..8.29 rows=10 width=4) (actual time=0.006..0.010 rows=10 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=2
 Planning Time: 0.808 ms
 Execution Time: 444.819 ms
(53 rows)


 

Yours,

Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

pgsql-performance by date:

Previous
From: Mats Julian Olsen
Date:
Subject: Re: Query Performance / Planner estimate off
Next
From: aditya desai
Date:
Subject: Re: CPU Consuming query. Sequential scan despite indexing.