Re: slow query with inline function on AWS RDS with RDS 24x large - Mailing list pgsql-performance

From Ayub Khan
Subject Re: slow query with inline function on AWS RDS with RDS 24x large
Date
Msg-id CAHdeyE+BJUwXiyk7Xb5bdbsbke6BMUdH8bEeX6GTtpzmPhXkqA@mail.gmail.com
Whole thread Raw
In response to Re: slow query with inline function on AWS RDS with RDS 24x large  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-performance
You are right, I dropped BRIN and created btree and the performance on 0 rows matching criteria table is good, below is the plan with BTREE. I will test by inserting lot of data.


Hash Join  (cost=50186.91..3765911.10 rows=5397411 width=291) (actual time=1.501..1.504 rows=0 loops=1)
  Hash Cond: (b.restaurant_id = r.restaurant_id)
  ->  Hash Left Join  (cost=49845.88..2078197.48 rows=5397411 width=216) (actual time=0.079..0.081 rows=0 loops=1)
        Hash Cond: (ro.order_id = oom.order_id)
        ->  Hash Join  (cost=933.18..2007856.35 rows=5397411 width=209) (actual time=0.078..0.080 rows=0 loops=1)
              Hash Cond: (b.city_id = c.city_id)
              ->  Hash Join  (cost=930.92..1956181.11 rows=19276467 width=188) (actual time=0.048..0.050 rows=0 loops=1)
                    Hash Cond: (ro.branch_id = b.branch_id)
                    ->  Nested Loop  (cost=0.56..1904639.80 rows=19276467 width=108) (actual time=0.048..0.048 rows=0 loops=1)
                          ->  Function Scan on start_date  (cost=0.00..0.01 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=1)
                          ->  Index Scan using rest_ord_date_brin on restaurant_order ro  (cost=0.56..1711875.12 rows=19276467 width=108) (actual time=0.042..0.042 rows=0 loops=1)
                                Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:48:45.377833+00'::timestamp with time zone))
                    ->  Hash  (cost=668.49..668.49 rows=20949 width=88) (never executed)
                          ->  Seq Scan on branch b  (cost=0.00..668.49 rows=20949 width=88) (never executed)
              ->  Hash  (cost=1.56..1.56 rows=56 width=29) (actual time=0.026..0.027 rows=56 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 12kB
                    ->  Seq Scan on city c  (cost=0.00..1.56 rows=56 width=29) (actual time=0.009..0.016 rows=56 loops=1)
        ->  Hash  (cost=33000.09..33000.09 rows=1273009 width=13) (never executed)
              ->  Seq Scan on order_offer_map oom  (cost=0.00..33000.09 rows=1273009 width=13) (never executed)
  ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual time=1.403..1.403 rows=8609 loops=1)
        Buckets: 16384  Batches: 1  Memory Usage: 465kB
        ->  Index Only Scan using "restaurant_idx$$_274b003d" on restaurant r  (cost=0.29..233.42 rows=8609 width=8) (actual time=0.007..0.634 rows=8609 loops=1)
              Heap Fetches: 0
Planning Time: 1.352 ms
Execution Time: 1.571 ms

On Fri, Jun 4, 2021 at 11:41 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi


pá 4. 6. 2021 v 10:32 odesílatel Ayub Khan <ayub.hp@gmail.com> napsal:
BRIN index is only on the date_time column, I even tried with btree index with no performance gains.

              ->  Bitmap Heap Scan on restaurant_order ro  (cost=5427.94..3353966.60 rows=19275986 width=108) (actual time=1036.793..1036.793 rows=0 loops=1)
                    Recheck Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
                    Rows Removed by Index Recheck: 5039976
                    Heap Blocks: lossy=275230

When the most rows are removed in recheck, then the effectivity of the index is not good
Pavel

 

On Fri, Jun 4, 2021 at 11:23 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 4. 6. 2021 v 10:07 odesílatel Ayub Khan <ayub.hp@gmail.com> napsal:

below query is slow even with no data


explain ANALYZE

WITH business AS( SELECT * FROM get_businessday_utc_f() start_date)
SELECT ro.order_id,
ro.date_time,
round(ro.order_amount, 2) AS order_amount,
b.branch_id,
b.branch_name,
st_x(b.location) AS from_x,
st_y(b.location) AS from_y,
b.user_id AS branch_user_id,
b.contact_info,
r.restaurant_id,
c.city_id,
c.city_name,
c.city_name_ar,
st_linefromtext(((((((('LINESTRING('::text || st_x(b.location)) || ' '::text) || st_y(b.location)) || ','::text) || st_x(ro.location_geometry)) || ' '::text) || st_y(ro.location_geometry)) || ')'::text, 28355) AS from_to,
to_char(ro.date_time, 'HH24:MI'::text) AS order_time,
ro.customer_comment,
'N'::text AS is_new_customer,
ro.picked_up_time,
ro.driver_assigned_date_time,
oom.offer_amount,
oom.offer_type_code AS offer_type,
ro.uk_vat
FROM business, restaurant_order ro

JOIN branch b ON b.branch_id = ro.branch_id
JOIN restaurant r ON r.restaurant_id = b.restaurant_id
JOIN city c ON c.city_id = b.city_id
LEFT JOIN order_offer_map oom using (order_id)
WHERE ro.date_time >= business.start_date AND ro.date_time<= f_now_immutable_with_tz();


Hash Left Join  (cost=55497.32..5417639.59 rows=5397276 width=291) (actual time=1056.926..1056.934 rows=0 loops=1)
  Hash Cond: (ro.order_id = oom.order_id)
  ->  Hash Join  (cost=6584.61..3674143.44 rows=5397276 width=209) (actual time=1056.926..1056.932 rows=0 loops=1)
        Hash Cond: (ro.branch_id = b.branch_id)
        ->  Nested Loop  (cost=5427.94..3546726.47 rows=19275986 width=108) (actual time=1036.809..1036.810 rows=0 loops=1)
              ->  Function Scan on start_date  (cost=0.00..0.01 rows=1 width=8) (actual time=0.006..0.008 rows=1 loops=1)
              ->  Bitmap Heap Scan on restaurant_order ro  (cost=5427.94..3353966.60 rows=19275986 width=108) (actual time=1036.793..1036.793 rows=0 loops=1)
                    Recheck Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
                    Rows Removed by Index Recheck: 5039976
                    Heap Blocks: lossy=275230
                    ->  Bitmap Index Scan on rest_ord_date_brin  (cost=0.00..608.94 rows=19359111 width=0) (actual time=14.037..14.038 rows=2917120 loops=1)
                          Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
        ->  Hash  (cost=1083.35..1083.35 rows=5866 width=109) (actual time=20.106..20.109 rows=20949 loops=1)
              Buckets: 32768 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 3112kB
              ->  Hash Join  (cost=343.29..1083.35 rows=5866 width=109) (actual time=1.620..14.539 rows=20949 loops=1)
                    Hash Cond: (b.restaurant_id = r.restaurant_id)
                    ->  Hash Join  (cost=2.26..726.91 rows=5866 width=109) (actual time=0.029..8.597 rows=20949 loops=1)
                          Hash Cond: (b.city_id = c.city_id)
                          ->  Seq Scan on branch b  (cost=0.00..668.49 rows=20949 width=88) (actual time=0.004..1.609 rows=20949 loops=1)
                          ->  Hash  (cost=1.56..1.56 rows=56 width=29) (actual time=0.020..0.021 rows=56 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 12kB
                                ->  Seq Scan on city c  (cost=0.00..1.56 rows=56 width=29) (actual time=0.004..0.010 rows=56 loops=1)
                    ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual time=1.575..1.575 rows=8609 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 465kB
                          ->  Index Only Scan using "restaurant_idx$$_274b003d" on restaurant r  (cost=0.29..233.42 rows=8609 width=8) (actual time=0.006..0.684 rows=8609 loops=1)
                                Heap Fetches: 0
  ->  Hash  (cost=33000.09..33000.09 rows=1273009 width=13) (never executed)
        ->  Seq Scan on order_offer_map oom  (cost=0.00..33000.09 rows=1273009 width=13) (never executed)
Planning Time: 1.180 ms
Execution Time: 1057.535 ms could some one explain why it is slow, if I insert 50k records the execution time reaches 20 seconds
 
              ->  Bitmap Heap Scan on restaurant_order ro  (cost=5427.94..3353966.60 rows=19275986 width=108) (actual time=1036.793..1036.793 rows=0 loops=1)
                    Recheck Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))
                    Rows Removed by Index Recheck: 5039976
                    Heap Blocks: lossy=275230
                    ->  Bitmap Index Scan on rest_ord_date_brin  (cost=0.00..608.94 rows=19359111 width=0) (actual time=14.037..14.038 rows=2917120 loops=1)
                          Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-04 08:05:32.784199+00'::timestamp with time zone))

Looks so the BRIN index is not in good condition. Maybe you need reindex, maybe BRIN index is not good format for your data.

There are lot of data - few millions of rows

Regards

Pavel
 




--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and kowledge will get you close but attitude will get you there. However, it's the Love
of God that will put you over the top!!

pgsql-performance by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: slow query with inline function on AWS RDS with RDS 24x large
Next
From: Taras Savchuk
Date:
Subject: PgSQL 12 on WinSrv ~3x faster than on Linux