Re: query planner not using index, instead using sequential scan - Mailing list pgsql-performance

From: Ayub Khan
Subject: Re: query planner not using index, instead using sequential scan
Msg-id: CAHdeyEJXzj870Zy_5BpRYWUsp==Aky-ZfXePSe3YkwYVwunBYQ@mail.gmail.com
In response to: Re: query planner not using index, instead using sequential scan (Vijaykumar Jain <vijaykumarjain.github@gmail.com>)
List: pgsql-performance
After setting enable_seqscan = off, query execution seems to have slowed down, even though the index on order_offer_map is now used for the left join:
Hash Left Join  (cost=72639.74..8176118.25 rows=19276467 width=293) (actual time=858.853..3166.994 rows=230222 loops=1)
  Hash Cond: (ro.order_id = oom.order_id)
  ->  Hash Join  (cost=1947.33..2053190.95 rows=19276467 width=211) (actual time=20.550..462.303 rows=230222 loops=1)
        Hash Cond: (b.city_id = c.city_id)
        ->  Hash Join  (cost=1937.65..1998751.06 rows=19276467 width=190) (actual time=20.523..399.979 rows=230222 loops=1)
              Hash Cond: (b.restaurant_id = r.restaurant_id)
              ->  Hash Join  (cost=1596.61..1947784.40 rows=19276467 width=190) (actual time=19.047..339.984 rows=230222 loops=1)
                    Hash Cond: (ro.branch_id = b.branch_id)
                    ->  Nested Loop  (cost=0.56..1895577.38 rows=19276467 width=108) (actual time=0.032..240.278 rows=230222 loops=1)
                          ->  Function Scan on start_date  (cost=0.00..0.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1)
                          ->  Index Scan using "idx$$_00010001" on restaurant_order ro  (cost=0.56..1702812.70 rows=19276467 width=108) (actual time=0.025..117.525 rows=230222 loops=1)
                                Index Cond: ((date_time >= start_date.start_date) AND (date_time <= '2021-06-05 21:09:50.161463+00'::timestamp with time zone))
                    ->  Hash  (cost=1334.19..1334.19 rows=20949 width=90) (actual time=18.969..18.969 rows=20949 loops=1)
                          Buckets: 32768  Batches: 1  Memory Usage: 2758kB
                          ->  Index Scan using "branch_idx$$_274b0038" on branch b  (cost=0.29..1334.19 rows=20949 width=90) (actual time=0.008..14.371 rows=20949 loops=1)
              ->  Hash  (cost=233.42..233.42 rows=8609 width=8) (actual time=1.450..1.451 rows=8609 loops=1)
                    Buckets: 16384  Batches: 1  Memory Usage: 465kB
                    ->  Index Only Scan using "restaurant_idx$$_274b003d" on restaurant r  (cost=0.29..233.42 rows=8609 width=8) (actual time=0.011..0.660 rows=8609 loops=1)
                          Heap Fetches: 0
        ->  Hash  (cost=8.98..8.98 rows=56 width=29) (actual time=0.021..0.021 rows=56 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 12kB
              ->  Index Only Scan using "city_idx$$_274b0022" on city c  (cost=0.14..8.98 rows=56 width=29) (actual time=0.004..0.010 rows=56 loops=1)
                    Heap Fetches: 0
  ->  Hash  (cost=54779.81..54779.81 rows=1273009 width=15) (actual time=836.132..836.133 rows=1273009 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 81629kB
        ->  Index Scan Backward using order_offer_map_order_id on order_offer_map oom  (cost=0.43..54779.81 rows=1273009 width=15) (actual time=0.010..578.226 rows=1273009 loops=1)
Planning Time: 1.229 ms
Execution Time: 3183.248 ms
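
For anyone reproducing this: a minimal sketch of running the same experiment transaction-locally, so the disabled setting cannot leak into other sessions. The SELECT is a placeholder for the full report query, which is not shown in this message:

postgres=# BEGIN;
BEGIN
postgres=# SET LOCAL enable_seqscan = off;  -- adds a large cost penalty to seq scans for this transaction only; it does not forbid them
SET
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- placeholder: the original query over restaurant_order / branch / restaurant / city / order_offer_map
postgres=# ROLLBACK;  -- SET LOCAL reverts automatically at transaction end
ROLLBACK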
On Sat, Jun 5, 2021 at 10:52 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
thanks Tom.

I was trying to simulate some scenarios to be able to explain how the plan would change with/without
Rows Removed by Filter: 73969 -- by using a different/correct index.

postgres=# \d t
                          Table "public.t"
   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 id         | integer                     |           | not null |
 created_on | timestamp without time zone |           |          |
 col1       | text                        |           |          |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)
    "t_created_on_idx" btree (created_on) WHERE col1 = 'a'::text  --- useless index as all rows have col1 = 'a', but to attempt lossy case
    "t_created_on_idx1" btree (created_on)
Referenced by:
    TABLE "t1" CONSTRAINT "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)

postgres=# \d t1
                Table "public.t1"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 t1_id  | integer |           | not null |
 id     | integer |           |          |
 col2   | text    |           |          |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (t1_id)
Foreign-key constraints:
    "t1_id_fkey" FOREIGN KEY (id) REFERENCES t(id)

postgres=# update t set col1 = 'a';
UPDATE 1000

postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id) where created_on = '2021-06-01 12:48:45.141123';
                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Hash Join  (cost=37.01..39.28 rows=1 width=4) (actual time=0.124..0.125 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.004..0.008 rows=100 loops=1)
   ->  Hash  (cost=37.00..37.00 rows=1 width=4) (actual time=0.109..0.109 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t  (cost=0.00..37.00 rows=1 width=4) (actual time=0.058..0.107 rows=1 loops=1)
               Filter: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone)
               Rows Removed by Filter: 999  --- as no useful index, t_created_on_idx will fetch all pages and then remove rows from them, expensive
 Planning Time: 0.111 ms
 Execution Time: 0.162 ms
(10 rows)

postgres=# explain analyze select 1 from t join t1 on (t.id = t1.id) where created_on = '2021-06-01 12:48:45.141123';
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=8.32..33.47 rows=1 width=4) (actual time=0.025..0.026 rows=0 loops=1)
   ->  Seq Scan on t1  (cost=0.00..22.00 rows=1200 width=4) (actual time=0.009..0.009 rows=1 loops=1)
   ->  Hash  (cost=8.31..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Index Scan using t_created_on_idx1 on t  (cost=0.29..8.31 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: (created_on = '2021-06-01 12:48:45.141123'::timestamp without time zone)  -- exact match using btree index
 Planning Time: 0.255 ms
 Execution Time: 0.071 ms
(9 rows)

but from Ayub's plan, the number of rows fetched is a lot, but it is also removing rows post index scan.
if that can be improved with a btree index that does not filter unwanted rows, the run may be faster?
but i guess if there are 156k rows, the planner would have found a win in seq scan.

Ayub,
just for the sake of understanding, can you run the query using

postgres=# set enable_seqscan TO 0;
SET
postgres=# -- explain analyze <run the query>
postgres=# set enable_seqscan TO 1;
SET

On Sun, 6 Jun 2021 at 00:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ayub Khan <ayub.hp@gmail.com> writes:
> could someone clarify why the LEFT JOIN order_offer_map oom using
> (order_id) in the below query is using sequential scan instead of
> using index on order_id which is defined in order_offer_map table.
Probably because it estimates the hash join to restaurant_order is
faster than a nestloop join would be. I think it's likely right.
You'd need very optimistic assumptions about the cost of an
individual index probe into order_offer_map to conclude that 156K
of them would be faster than the 476ms that are being spent here
to read order_offer_map and join it to the result of the
indexscan on restaurant_order.
If, indeed, that *is* faster on your hardware, you might want
to dial down random_page_cost to get more-relevant estimates.
regards, tom lane

--
Thanks,
Vijay
Mumbai, India
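
A minimal sketch of the random_page_cost experiment Tom suggests, assuming SSD-class storage; 1.1 is an illustrative value, not a recommendation:

postgres=# SHOW random_page_cost;  -- the default is 4.0, tuned for spinning disks
postgres=# SET random_page_cost = 1.1;  -- session-only; makes index probes look cheaper relative to sequential I/O
postgres=# -- explain analyze <run the query>  -- placeholder: re-run the report query and compare the chosen plan
postgres=# RESET random_page_cost;

If the nestloop-plus-index plan really is faster on this hardware, lowering random_page_cost (per session first, then in postgresql.conf if it holds up) brings the planner's estimates in line with reality.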
--------------------------------------------------------------------
Sun Certified Enterprise Architect 1.5
Sun Certified Java Programmer 1.4
Microsoft Certified Systems Engineer 2000
http://in.linkedin.com/pub/ayub-khan/a/811/b81
mobile:+966-502674604
----------------------------------------------------------------------
It is proved that Hard Work and knowledge will get you close but attitude will get you there. However, it's the Love
of God that will put you over the top!!