Re: Planner incorrectly choosing seq scan over index scan - Mailing list pgsql-performance

From John Arbash Meinel
Subject Re: Planner incorrectly choosing seq scan over index scan
Date
Msg-id 42EEAD4B.4010905@arbash-meinel.com
Whole thread Raw
In response to Planner incorrectly choosing seq scan over index scan  (Meetesh Karia <meetesh.karia@gmail.com>)
Responses Re: Planner incorrectly choosing seq scan over index scan  (Meetesh Karia <meetesh.karia@gmail.com>)
List pgsql-performance
Meetesh Karia wrote:
> Hi all,
>
> We're using 8.0.3 and we're seeing a problem where the planner is
> choosing a seq scan and hash join over an index scan.  If I set
> enable_hashjoin to off, then I get the plan I'm expecting and the query
> runs a lot faster.  I've also tried lowering the random page cost (even
> to 1) but the planner still chooses to use the hash join.
>
> Does anyone have any thoughts/suggestions?  I saw that there was a
> thread recently in which the planner wasn't correctly estimating the
> cost for queries using LIMIT.  Is it possible that something similar is
> happening here (perhaps because of the sort) and that the patch Tom
> proposed would fix it?
>
> Thanks.  Here are the various queries and plans:
>
> Normal settings

...

> QUERY PLAN
> Sort  (cost=13430.57..13439.24 rows=3467 width=48) (actual
> time=1390.000..1390.000 rows=3467 loops=1)
>   Sort Key: c.sourceid, c.targetid
>   ->  Merge Join  (cost=9912.07..13226.72 rows=3467 width=48) (actual
> time=1344.000..1375.000 rows=3467 loops=1)
>         Merge Cond: ("outer".user_id = "inner".sourceid)
>         ->  Index Scan using lte_user_pkey on lte_user s
> (cost=0.00..16837.71 rows=279395 width=16) (actual time=0.000..95.000
> rows=50034 loops=1)

This is where the planner is messing up, and mis-estimating the
selectivity. It is expecting to get 280k rows, but only needs to get 50k.
I assume lte_user is the bigger table, and that candidates617004 has
some subset.

Has lte_user and candidates617004 been recently ANALYZEd? All estimates,
except for the expected number of rows from lte_user seem to be okay.

Is user_id the primary key for lte_user?
I'm trying to figure out how you can get 50k rows, by searching a
primary key, against a 3.5k rows. Is user_id only part of the primary
key for lte_user?

Can you give us the output of:
\d lte_user
\d candidates617004

So that we have the description of the tables, and what indexes you have
defined?

Also, if you could describe the table layouts, that would help.

John
=:->


>         ->  Sort  (cost=9912.07..9920.73 rows=3467 width=40) (actual
> time=1156.000..1156.000 rows=3467 loops=1)
>               Sort Key: c.sourceid
>               ->  Hash Join  (cost=8710.44..9708.21 rows=3467 width=40)
> (actual time=1125.000..1156.000 rows=3467 loops=1)
>                     Hash Cond: ("outer".targetid = "inner".user_id)
>                     ->  Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000
> rows=3467 loops=1)
>                     ->  Hash  (cost=8011.95..8011.95 rows=279395
> width=16) (actual time=1125.000..1125.000 rows=0 loops=1)
>                           ->  Seq Scan on lte_user t
> (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..670.000
> rows=279395 loops=1)
> Total runtime: 1406.000 ms
>
> enable_hashjoin disabled
> ----------------------------------------
> QUERY PLAN
> Sort  (cost=14355.37..14364.03 rows=3467 width=48) (actual
> time=391.000..391.000 rows=3467 loops=1)
>   Sort Key: c.sourceid, c.targetid
>   ->  Nested Loop  (cost=271.52..14151.51 rows=3467 width=48) (actual
> time=203.000..359.000 rows=3467 loops=1)
>         ->  Merge Join  (cost=271.52..3490.83 rows=3467 width=40)
> (actual time=203.000..218.000 rows=3467 loops=1)
>               Merge Cond: ("outer".user_id = "inner".sourceid)
>               ->  Index Scan using lte_user_pkey on lte_user s
> (cost=0.00..16837.71 rows=279395 width=16) (actual time=0.000..126.000
> rows=50034 loops=1)
>               ->  Sort  (cost=271.52..280.19 rows=3467 width=32) (actual
> time=15.000..30.000 rows=3467 loops=1)
>                     Sort Key: c.sourceid
>                     ->  Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000
> rows=3467 loops=1)
>         ->  Index Scan using lte_user_pkey on lte_user t
> (cost=0.00..3.03 rows=1 width=16) (actual time=0.031..0.036 rows=1
> loops=3467)
>               Index Cond: ("outer".targetid = t.user_id)
> Total runtime: 406.000 ms
>
> random_page_cost set to 1.5
> ----------------------------------------------
> QUERY PLAN
> Sort  (cost=12702.62..12711.29 rows=3467 width=48) (actual
> time=1407.000..1407.000 rows=3467 loops=1)
>   Sort Key: c.sourceid, c.targetid
>   ->  Merge Join  (cost=9912.07..12498.77 rows=3467 width=48) (actual
> time=1391.000..1407.000 rows=3467 loops=1)
>         Merge Cond: ("outer".user_id = "inner".sourceid)
>         ->  Index Scan using lte_user_pkey on lte_user s
> (cost=0.00..12807.34 rows=279395 width=16) (actual time=0.000..46.000
> rows=50034 loops=1)
>         ->  Sort  (cost=9912.07..9920.73 rows=3467 width=40) (actual
> time=1188.000..1188.000 rows=3467 loops=1)
>               Sort Key: c.sourceid
>               ->  Hash Join  (cost=8710.44..9708.21 rows=3467 width=40)
> (actual time=1157.000..1188.000 rows=3467 loops=1)
>                     Hash Cond: ("outer".targetid = "inner".user_id)
>                     ->  Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..15.000
> rows=3467 loops=1)
>                     ->  Hash  (cost=8011.95..8011.95 rows=279395
> width=16) (actual time=1157.000..1157.000 rows=0 loops=1)
>                           ->  Seq Scan on lte_user t
> (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..750.000
> rows=279395 loops=1)
> Total runtime: 1422.000 ms
>
> random_page_cost set to 1.5 and enable_hashjoin set to false
> --------------------------------------------------------------------------------------------------
> QUERY PLAN
> Sort  (cost=13565.58..13574.25 rows=3467 width=48) (actual
> time=390.000..390.000 rows=3467 loops=1)
>   Sort Key: c.sourceid, c.targetid
>   ->  Nested Loop  (cost=271.52..13361.73 rows=3467 width=48) (actual
> time=203.000..360.000 rows=3467 loops=1)
>         ->  Merge Join  (cost=271.52..2762.88 rows=3467 width=40)
> (actual time=203.000..250.000 rows=3467 loops=1)
>               Merge Cond: ("outer".user_id = "inner".sourceid)
>               ->  Index Scan using lte_user_pkey on lte_user s
> (cost=0.00..12807.34 rows=279395 width=16) (actual time=0.000..48.000
> rows=50034 loops=1)
>               ->  Sort  (cost=271.52..280.19 rows=3467 width=32) (actual
> time=15.000..31.000 rows=3467 loops=1)
>                     Sort Key: c.sourceid
>                     ->  Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..15.000
> rows=3467 loops=1)
>         ->  Index Scan using lte_user_pkey on lte_user t
> (cost=0.00..3.02 rows=1 width=16) (actual time=0.023..0.023 rows=1
> loops=3467)
>               Index Cond: ("outer".targetid = t.user_id)
> Total runtime: 406.000 ms
>
> Thanks,
> Meetesh


Attachment

pgsql-performance by date:

Previous
From: Tobias Brox
Date:
Subject: Re: Planner incorrectly choosing seq scan over index scan
Next
From: Meetesh Karia
Date:
Subject: Re: Planner incorrectly choosing seq scan over index scan