Home > mailing lists

Re: Planner incorrectly choosing seq scan over index scan - Mailing list pgsql-performance

From	Meetesh Karia
Subject	Re: Planner incorrectly choosing seq scan over index scan
Date	August 3, 2005 09:48:53
Msg-id	fc5b04ca05080305483430d0ef@mail.gmail.com Whole thread Raw
In response to	Re: Planner incorrectly choosing seq scan over index scan (Meetesh Karia <meetesh.karia@gmail.com>)
List	pgsql-performance

Tree view

Btw - I tried playing around with some of the other planner cost constants but I wasn't able to get the planner to choose the index scan. It seems like the issue is that the estimated cost for fetching one row from the index (3.02) is a little high in my case. Is there any way that I can adjust that cost estimate? Are there any side effects of doing that? Or is my best solution to simple set enable_hashjoin to off for this query?

Thanks,
Meetesh

On 8/2/05, Meetesh Karia <meetesh.karia@gmail.com> wrote:

Thanks Tom,

That modifies the query plan slightly, but the planner still decides to do a hash join for the lte_user table aliased 't'. Though, if I make this change and set enable_hashjoin to off, the query plan (and execution time) gets even better.

enable_hashjoin = on
----------------------------------
QUERY PLAN
Sort (cost=10113.35..10122.02 rows=3467 width=48) (actual time=1203.000..1203.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=8711.19..9909.50 rows=3467 width=48) (actual time=1156.000..1203.000 rows=3467 loops=1)
        -> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        -> Hash Join (cost=8711.19..9776.46 rows=3467 width=40) (actual time=1156.000..1187.000 rows=3467 loops=1)
              Hash Cond: ("outer".targetid = "inner".user_id)
              -> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..16.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              -> Hash (cost=8012.55..8012.55 rows=279455 width=16) (actual time=1141.000..1141.000 rows=0 loops=1)
                    -> Seq Scan on lte_user t (cost=0.00..8012.55 rows=279455 width=16) (actual time=0.000..720.000 rows=279395 loops=1)
Total runtime: 1218.000 ms

enable_hashjoin = off
-----------------------------------
QUERY PLAN
Sort (cost=10942.56..10951.22 rows=3467 width=48) (actual time=188.000..188.000 rows=3467 loops=1)
Sort Key: c.sourceid, c.targetid
-> Nested Loop (cost=0.00..10738.71 rows=3467 width=48) (actual time=0.000..188.000 rows=3467 loops=1)
        -> Index Scan using lte_user_pkey on lte_user s (cost=0.00..3.02 rows=1 width=16) (actual time=0.000..0.000 rows=1 loops=1)
              Index Cond: (617004 = user_id)
        -> Nested Loop (cost=0.00..10605.67 rows=3467 width=40) (actual time=0.000..157.000 rows=3467 loops=1)
              -> Seq Scan on candidates617004 c (cost=0.00..76.34 rows=3467 width=32) (actual time=0.000..15.000 rows=3467 loops=1)
                    Filter: (sourceid = 617004)
              -> Index Scan using lte_user_pkey on lte_user t (cost=0.00..3.02 rows=1 width=16) (actual time=0.028..0.037 rows=1 loops=3467)
                    Index Cond: ("outer".targetid = t.user_id)
Total runtime: 188.000 ms

Thanks,
Meetesh

On 8/2/05, Tom Lane < tgl@sss.pgh.pa.us> wrote:
Meetesh Karia <meetesh.karia@gmail.com> writes:
> Sure. The lte_user table is just a collection of users. user_id is assigned=
> uniquely using a sequence. During some processing, we create a candidates=
> table (candidates617004 in our case). This table is usually a temp table.=
> sourceid is a user_id (in this case it is always 617004) and targetid is=20
> also a user_id (2860 distinct values out of 3467). The rest of the=20
> information is either only used in the select clause or not used at all=20
> during this processing.

If you know that sourceid has only a single value, it'd probably be
helpful to call out that value in the query, ie,
        where ... AND c.sourceId = 617004 ...

                        regards, tom lane

pgsql-performance by date:

From: Richard Huxton
Date: 03 August 2005, 06:02:44
Subject: Re: "nice"/low priority Query

From: "Lane Van Ingen"
Date: 03 August 2005, 10:12:16
Subject: Is There A Windows Version of Performance Tuning Documents?

Re: Planner incorrectly choosing seq scan over index scan - Mailing list pgsql-performance

Previous

Next