Re: query planner not using index, instead using sequential scan - Mailing list pgsql-performance

From Vijaykumar Jain
Subject Re: query planner not using index, instead using sequential scan
Date
Msg-id CAM+6J94tca-6vfONLrvzroUuGeDS7LaHXYfw4zaTk9PJ96_XKA@mail.gmail.com
In response to Re: query planner not using index, instead using sequential scan (Ayub Khan <ayub.hp@gmail.com>)
List pgsql-performance
Yes, the slowdown was expected :)

I was just interested in the cost estimates.
Also, did you try setting random_page_cost to 1, if your storage is not an HDD?
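
(A quick, session-local way to test that; just a sketch, with the query left as a placeholder:)

postgres=# set random_page_cost TO 1;    -- session only; the default is 4.0
SET
postgres=# -- explain analyze <run the query>

postgres=# reset random_page_cost;
RESET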



On Sun, 6 Jun 2021 at 2:44 AM Ayub Khan <ayub.hp@gmail.com> wrote:

By setting enable_seqscan=OFF, the query execution seems to have slowed down, even though the index is now being used for the left join of order_offer_map:

Hash Left Join  (cost=72639.74..8176118.25 rows=19276467 width=293) (actual time=858.853..3166.994 rows=230222 loops=1)
  Hash Cond: (ro.order_id = oom.order_id)
  ->  Hash Join  (cost=1947.33..2053190.95 rows=19276467 width=211) (actual time=20.550..462.303 rows=230222 loops=1)
        Hash Cond: (b.city_id = c.city_id)
        ->  Hash Join  (cost=1937.65..1998751.06 rows=19276467 width=190) (actual time=20.523..399.979 rows=230222 loops=1)
              Hash Cond: (b.restaurant_id = r.restaurant_id)
              ->  Hash Join  (cost=1596.61..1947784.40 rows=19276467 width=190) (actual time=19.047..339.984 rows=230222 loops=1)
                    Hash Cond: (ro.branch_id = b.branch_id)
                    ->  Nested Loop  (cost=0.56..1895577.38 rows=19276467 width=108) (actual time=0.032..240.278 rows=230222 loops=1)
                          ->  Function Scan on start_date  (cost=0.00..0.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
                          ->  Index Scan using "idx$$_00010001" on restaurant_order ro  (cost=0.56..1702812.70 rows=19276467 width=108) (actual time=0.025..117.525 rows=230222 loops=1)
                                Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-05 21:09:50.161463+00'::timestamp with time zone))
                    ->  Hash  (cost=1334.19..1334.19 rows=20949 width=90) (actual time=18.969..18.969 rows=20949 loops=1)
                          Buckets: 32768  Batches: 1  Memory Usage: 2758kB
                          ->  Index Scan using "branch_idx$$_274b0038" on branch b  (cost=0.29..1334.19 rows=20949 width=90) (actual time=0.008..14.371 rows=20949 loops=1)
              ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual time=1.450..1.451 rows=8609 loops=1)
                    Buckets: 16384  Batches: 1  Memory Usage: 465kB
                    ->  Index Only Scan using "restaurant_idx$$_274b003d" on restaurant r  (cost=0.29..233.42 rows=8609 width=8) (actual time=0.011..0.660 rows=8609 loops=1)
                          Heap Fetches: 0
        ->  Hash  (cost=8.98..8.98 rows=56 width=29) (actual time=0.021..0.021 rows=56 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 12kB
              ->  Index Only Scan using "city_idx$$_274b0022" on city c  (cost=0.14..8.98 rows=56 width=29) (actual time=0.004..0.010 rows=56 loops=1)
                    Heap Fetches: 0
  ->  Hash  (cost=54779.81..54779.81 rows=1273009 width=15) (actual time=836.132..836.133 rows=1273009 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 81629kB
        ->  Index Scan Backward using order_offer_map_order_id on order_offer_map oom  (cost=0.43..54779.81 rows=1273009 width=15) (actual time=0.010..578.226 rows=1273009 loops=1)
Planning Time: 1.229 ms
Execution Time: 3183.248 ms
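
(Note the forced plan above is still a hash join, just built from an index scan. To see the nestloop alternative with per-row index probes into order_offer_map, the tradeoff Tom weighs below, a sketch would be to also disable hash joins for the session; if the planner then picks a merge join instead, enable_mergejoin can be switched off the same way:)

postgres=# set enable_seqscan TO off;
SET
postgres=# set enable_hashjoin TO off;
SET
postgres=# -- explain analyze <run the query>

postgres=# reset enable_seqscan;
RESET
postgres=# reset enable_hashjoin;
RESET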

On Sat, Jun 5, 2021 at 10:52 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Thanks, Tom.

I was trying to simulate some scenarios to show how the plan would change with/without
"Rows Removed by Filter: 73969", i.e. by using a different/correct index.

postgres=# \d t
                             Table "public.t"
   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 id         | integer                     |           | not null |
 created_on | timestamp without time zone |           |          |
 col1       | text                        |           |          |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
    "t_created_on_idx" btree (created_on) WHERE col1 = 'a'::text   --- useless index as all rows have col1 = 'a', but to attempt lossy case
    "t_created_on_idx1" btree (created_on)
Referenced by:
    TABLE "t1" CONSTRAINT "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)

postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 t1_id  | integer |           | not null |
 id     | integer |           |          |
 col2   | text    |           |          |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (t1_id)
Foreign-key constraints:
    "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)



postgres=# update t set col1 = 'a';
UPDATE 1000

postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id) where created_on = '2021-06-01 12:48:45.141123';
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Hash Join  (cost=37.01..39.28 rows=1 width=4) (actual time=0.124..0.125 rows=0 loops=1)
   Hash Cond: (t1.id = t.id)
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.004..0.008 rows=100 loops=1)
   ->  Hash  (cost=37.00..37.00 rows=1 width=4) (actual time=0.109..0.109 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t  (cost=0.00..37.00 rows=1 width=4) (actual time=0.058..0.107 rows=1 loops=1)
               Filter: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone)
                Rows Removed by Filter: 999  -- no selective index helps here (t_created_on_idx matches every row), so all rows are read and then filtered out; expensive
 Planning Time: 0.111 ms
 Execution Time: 0.162 ms
(10 rows)
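
(Side note on the partial index: the planner only considers t_created_on_idx when the query's WHERE clause provably implies its col1 = 'a' predicate, so a hypothetical variant like this would at least make it a candidate:)

postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id)
           where created_on = '2021-06-01 12:48:45.141123' and col1 = 'a';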


postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id) where created_on = '2021-06-01 12:48:45.141123';
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=8.32..33.47 rows=1 width=4) (actual time=0.025..0.026 rows=0 loops=1)
   Hash Cond: (t1.id = t.id)
   ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=4) (actual time=0.009..0.009 rows=1 loops=1)
   ->  Hash  (cost=8.31..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Index Scan using t_created_on_idx1 on t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
                Index Cond: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone) -- exact match via the btree index
 Planning Time: 0.255 ms
 Execution Time: 0.071 ms
(9 rows)


But from Ayub's plan, the number of rows fetched is large, and rows are still being removed after the index scan.
If that can be improved with a btree index that does not filter out unwanted rows, the run may be faster (see the sketch below)?
But I guess that with 156K rows the planner would have found a win in the seq scan anyway.
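
(Illustration only: the filter column in the original plan isn't shown, so the column name below is made up. The idea is to fold the filtered column into the index so the scan never fetches rows it will just discard:)

-- hypothetical: if the original plan filtered on, say, a status column after
-- the date_time index scan, a composite index removes that filter step
CREATE INDEX CONCURRENTLY restaurant_order_dt_status_idx
    ON restaurant_order (date_time, status);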

Ayub,

just for the sake of understanding, can you run the query with:

postgres=# set enable_seqscan TO 0;
SET
postgres=# -- explain analyze <run the query>

postgres=# set enable_seqscan TO 1;
SET
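
(Worth knowing: enable_seqscan = off does not forbid sequential scans; it just adds a huge penalty, about 1e10, to their estimated cost. A query with no usable index still seq scans, only with an inflated estimate; e.g. t1.col2 has no index:)

postgres=# explain select * from t1 where col2 = 'x';
-- still a Seq Scan on t1, but with ~1e10 added to its cost estimate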


On Sun, 6 Jun 2021 at 00:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ayub Khan <ayub.hp@gmail.com> writes:
> could someone clarify why the LEFT JOIN order_offer_map oom using
> (order_id) in the below query is using sequential scan instead of
> using index on order_id which is defined in order_offer_map table.

Probably because it estimates the hash join to restaurant_order is
faster than a nestloop join would be.  I think it's likely right.
You'd need very optimistic assumptions about the cost of an
individual index probe into order_offer_map to conclude that 156K
of them would be faster than the 476ms that are being spent here
to read order_offer_map and join it to the result of the
indexscan on restaurant_order.

If, indeed, that *is* faster on your hardware, you might want
to dial down random_page_cost to get more-relevant estimates.

                        regards, tom lane
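
(For scale, a back-of-envelope sketch of that tradeoff, assuming the default cost settings:)

-- defaults: seq_page_cost = 1.0, random_page_cost = 4.0, so each index probe
-- is costed at roughly 4x a sequential page read; 156K probes therefore look
-- far more expensive than one sequential pass over order_offer_map.
-- To make a lower value stick beyond a session (assumes superuser access):
ALTER SYSTEM SET random_page_cost = 1.1;  -- 1.1 is a common SSD-era guess
SELECT pg_reload_conf();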




--
Thanks,
Vijay
Mumbai, India


--
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude will get you there. However, it's the Love
of God that will put you over the top!!
--
Thanks,
Vijay
Mumbai, India
