Re: Planner incorrectly choosing seq scan over index scan - Mailing list pgsql-performance
From | Meetesh Karia |
---|---|
Subject | Re: Planner incorrectly choosing seq scan over index scan |
Date | |
Msg-id | fc5b04ca05080305483430d0ef@mail.gmail.com Whole thread Raw |
In response to | Re: Planner incorrectly choosing seq scan over index scan (Meetesh Karia <meetesh.karia@gmail.com>) |
List | pgsql-performance |
Btw - I tried playing around with some of the other planner cost constants but I wasn't able to get the planner to choose the index scan. It seems like the issue is that the estimated cost for fetching one row from the index (3.02) is a little high in my case. Is there any way that I can adjust that cost estimate? Are there any side effects of doing that? Or is my best solution to simple set enable_hashjoin to off for this query?
Thanks,
Meetesh
Thanks,
Meetesh
On 8/2/05, Meetesh Karia <meetesh.karia@gmail.com> wrote:
Thanks Tom,
That modifies the query plan slightly, but the planner still decides to do a hash join for the lte_user table aliased 't'. Though, if I make this change and set enable_hashjoin to off, the query plan (and execution time) gets even better.
enable_hashjoin = on
----------------------------------
QUERY PLAN
Sort (cost=10113.35..10122.02 rows=3467 width=48) (actual time=1203.000..1203.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=8711.19..9909.50 rows=3467 width=48) (actual time=1156.000..1203.000 rows=3467 loops=1)
-> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
Index Cond: (617004 = user_id)
-> Hash Join (cost=8711.19..9776.46 rows=3467 width=40) (actual time=1156.000..1187.000 rows=3467 loops=1)
Hash Cond: ("outer".targetid = "inner".user_id)
-> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..16.000 rows=3467 loops=1)
Filter: (sourceid = 617004)
-> Hash (cost=8012.55..8012.55 rows=279455 width=16) (actual time=1141.000..1141.000 rows=0 loops=1)
-> Seq Scan on lte_user t (cost=0.00..8012.55 rows=279455 width=16) (actual time=0.000..720.000 rows=279395 loops=1)
Total runtime: 1218.000 ms
enable_hashjoin = off
-----------------------------------
QUERY PLAN
Sort (cost=10942.56..10951.22 rows=3467 width=48) (actual time=188.000..188.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=0.00..10738.71 rows=3467 width=48) (actual time=0.000..188.000 rows=3467 loops=1)
-> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
Index Cond: (617004 = user_id)
-> Nested Loop (cost=0.00..10605.67 rows=3467 width=40) (actual time=0.000..157.000 rows=3467 loops=1)
-> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..15.000 rows=3467 loops=1)
Filter: (sourceid = 617004)
-> Index Scan using lte_user_pkey on lte_user t (cost=0.00..3.02 rows=1 width=16) (actual time=0.028..0.037 rows=1 loops=3467)
Index Cond: ("outer".targetid = t.user_id)
Total runtime: 188.000 ms
Thanks,
MeeteshOn 8/2/05, Tom Lane < tgl@sss.pgh.pa.us> wrote:Meetesh Karia <meetesh.karia@gmail.com> writes:
> Sure. The lte_user table is just a collection of users. user_id is assigned=
> uniquely using a sequence. During some processing, we create a candidates=
> table (candidates617004 in our case). This table is usually a temp table.=
> sourceid is a user_id (in this case it is always 617004) and targetid is=20
> also a user_id (2860 distinct values out of 3467). The rest of the=20
> information is either only used in the select clause or not used at all=20
> during this processing.
If you know that sourceid has only a single value, it'd probably be
helpful to call out that value in the query, ie,
where ... AND c.sourceId = 617004 ...
regards, tom lane
pgsql-performance by date: