Re: Planner incorrectly choosing seq scan over index scan - Mailing list pgsql-performance
From | Meetesh Karia |
---|---|
Subject | Re: Planner incorrectly choosing seq scan over index scan |
Date | |
Msg-id | fc5b04ca05080116564d1b3323@mail.gmail.com Whole thread Raw |
In response to | Re: Planner incorrectly choosing seq scan over index scan (John Arbash Meinel <john@arbash-meinel.com>) |
Responses |
Re: Planner incorrectly choosing seq scan over index scan
|
List | pgsql-performance |
Thanks John. I've answered your questions below:
I ANALYZEd both tables just before putting together my first email. And, unfortunately, modifying the statistics target didn't help either.
Yes
Hmmm ... I missed that before. But, that surprises me too. Especially since sourceId in the candidates table has only 1 value. Also, user_id is the complete primary key for lte_user.
Sure, here they are:
lte=# \d lte_user
Table "public.lte_user"
Column | Type | Modifiers
---------------+-----------------------------+-----------
user_id | bigint | not null
firstname | character varying(255) |
lastname | character varying(255) |
address1 | character varying(255) |
address2 | character varying(255) |
city | character varying(255) |
state | character varying(255) |
zip | character varying(255) |
phone1 | character varying(255) |
phone2 | character varying(255) |
username | character varying(255) |
password | character varying(255) |
deleted | boolean | not null
ext_cust_id | character varying(255) |
aboutme | character varying(255) |
birthday | timestamp without time zone |
fm_id | bigint |
ar | double precision |
Indexes:
"lte_user_pkey" PRIMARY KEY, btree (user_id)
"idx_user_extid" btree (ext_cust_id)
"idx_user_username" btree (username)
Foreign-key constraints:
"fk_user_fm" FOREIGN KEY (fm_id) REFERENCES fm(fm_id)
lte=# \d candidates617004
Table "public.candidates617004"
Column | Type | Modifiers
--------------+------------------+-----------
fmid | bigint |
sourceid | bigint |
sr | double precision |
targetid | bigint |
tr | double precision |
Sure. The lte_user table is just a collection of users. user_id is assigned uniquely using a sequence. During some processing, we create a candidates table (candidates617004 in our case). This table is usually a temp table. sourceid is a user_id (in this case it is always 617004) and targetid is also a user_id (2860 distinct values out of 3467). The rest of the information is either only used in the select clause or not used at all during this processing.
Did I miss something in the table layout description that would be helpful?
Thanks for your help!
Meetesh
Has lte_user and candidates617004 been recently ANALYZEd? All estimates,
except for the expected number of rows from lte_user seem to be okay.
I ANALYZEd both tables just before putting together my first email. And, unfortunately, modifying the statistics target didn't help either.
Is user_id the primary key for lte_user?
Yes
I'm trying to figure out how you can get 50k rows, by searching a
primary key, against a 3.5k rows. Is user_id only part of the primary
key for lte_user?
Hmmm ... I missed that before. But, that surprises me too. Especially since sourceId in the candidates table has only 1 value. Also, user_id is the complete primary key for lte_user.
Can you give us the output of:
\d lte_user
\d candidates617004
Sure, here they are:
lte=# \d lte_user
Table "public.lte_user"
Column | Type | Modifiers
---------------+-----------------------------+-----------
user_id | bigint | not null
firstname | character varying(255) |
lastname | character varying(255) |
address1 | character varying(255) |
address2 | character varying(255) |
city | character varying(255) |
state | character varying(255) |
zip | character varying(255) |
phone1 | character varying(255) |
phone2 | character varying(255) |
username | character varying(255) |
password | character varying(255) |
deleted | boolean | not null
ext_cust_id | character varying(255) |
aboutme | character varying(255) |
birthday | timestamp without time zone |
fm_id | bigint |
ar | double precision |
Indexes:
"lte_user_pkey" PRIMARY KEY, btree (user_id)
"idx_user_extid" btree (ext_cust_id)
"idx_user_username" btree (username)
Foreign-key constraints:
"fk_user_fm" FOREIGN KEY (fm_id) REFERENCES fm(fm_id)
lte=# \d candidates617004
Table "public.candidates617004"
Column | Type | Modifiers
--------------+------------------+-----------
fmid | bigint |
sourceid | bigint |
sr | double precision |
targetid | bigint |
tr | double precision |
Also, if you could describe the table layouts, that would help.
Sure. The lte_user table is just a collection of users. user_id is assigned uniquely using a sequence. During some processing, we create a candidates table (candidates617004 in our case). This table is usually a temp table. sourceid is a user_id (in this case it is always 617004) and targetid is also a user_id (2860 distinct values out of 3467). The rest of the information is either only used in the select clause or not used at all during this processing.
Did I miss something in the table layout description that would be helpful?
Thanks for your help!
Meetesh
> -> Sort (cost=9912.07..9920.73 rows=3467 width=40) (actual
> time=1156.000..1156.000 rows=3467 loops=1)
> Sort Key: c.sourceid
> -> Hash Join (cost=8710.44..9708.21 rows=3467 width=40)
> (actual time=1125.000..1156.000 rows=3467 loops=1)
> Hash Cond: ("outer".targetid = "inner".user_id)
> -> Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000
> rows=3467 loops=1)
> -> Hash (cost=8011.95..8011.95 rows=279395
> width=16) (actual time=1125.000..1125.000 rows=0 loops=1)
> -> Seq Scan on lte_user t
> (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..670.000
> rows=279395 loops=1)
> Total runtime: 1406.000 ms
>
> enable_hashjoin disabled
> ----------------------------------------
> QUERY PLAN
> Sort (cost=14355.37..14364.03 rows=3467 width=48) (actual
> time=391.000..391.000 rows=3467 loops=1)
> Sort Key: c.sourceid, c.targetid
> -> Nested Loop (cost=271.52..14151.51 rows=3467 width=48) (actual
> time=203.000..359.000 rows=3467 loops=1)
> -> Merge Join (cost=271.52..3490.83 rows=3467 width=40)
> (actual time=203.000..218.000 rows=3467 loops=1)
> Merge Cond: ("outer".user_id = "inner".sourceid)
> -> Index Scan using lte_user_pkey on lte_user s
> (cost=0.00..16837.71 rows=279395 width=16) (actual time=0.000..126.000
> rows=50034 loops=1)
> -> Sort (cost=271.52..280.19 rows=3467 width=32) (actual
> time=15.000..30.000 rows=3467 loops=1)
> Sort Key: c.sourceid
> -> Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..0.000
> rows=3467 loops=1)
> -> Index Scan using lte_user_pkey on lte_user t
> (cost=0.00..3.03 rows=1 width=16) (actual time=0.031..0.036 rows=1
> loops=3467)
> Index Cond: ("outer".targetid = t.user_id)
> Total runtime: 406.000 ms
>
> random_page_cost set to 1.5
> ----------------------------------------------
> QUERY PLAN
> Sort (cost= 12702.62..12711.29 rows=3467 width=48) (actual
> time=1407.000..1407.000 rows=3467 loops=1)
> Sort Key: c.sourceid, c.targetid
> -> Merge Join (cost=9912.07..12498.77 rows=3467 width=48) (actual
> time=1391.000..1407.000 rows=3467 loops=1)
> Merge Cond: ("outer".user_id = "inner".sourceid)
> -> Index Scan using lte_user_pkey on lte_user s
> (cost=0.00..12807.34 rows=279395 width=16) (actual time=0.000..46.000
> rows=50034 loops=1)
> -> Sort (cost=9912.07..9920.73 rows=3467 width=40) (actual
> time=1188.000..1188.000 rows=3467 loops=1)
> Sort Key: c.sourceid
> -> Hash Join (cost=8710.44..9708.21 rows=3467 width=40)
> (actual time=1157.000..1188.000 rows=3467 loops=1)
> Hash Cond: ("outer".targetid = "inner".user_id)
> -> Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..15.000
> rows=3467 loops=1)
> -> Hash (cost=8011.95..8011.95 rows=279395
> width=16) (actual time=1157.000..1157.000 rows=0 loops=1)
> -> Seq Scan on lte_user t
> (cost=0.00..8011.95 rows=279395 width=16) (actual time=0.000..750.000
> rows=279395 loops=1)
> Total runtime: 1422.000 ms
>
> random_page_cost set to 1.5 and enable_hashjoin set to false
> --------------------------------------------------------------------------------------------------
> QUERY PLAN
> Sort (cost=13565.58..13574.25 rows=3467 width=48) (actual
> time=390.000..390.000 rows=3467 loops=1)
> Sort Key: c.sourceid, c.targetid
> -> Nested Loop (cost=271.52..13361.73 rows=3467 width=48) (actual
> time=203.000..360.000 rows=3467 loops=1)
> -> Merge Join (cost=271.52..2762.88 rows=3467 width=40)
> (actual time=203.000..250.000 rows=3467 loops=1)
> Merge Cond: ("outer".user_id = "inner".sourceid)
> -> Index Scan using lte_user_pkey on lte_user s
> (cost=0.00..12807.34 rows=279395 width=16) (actual time=0.000..48.000
> rows=50034 loops=1)
> -> Sort (cost=271.52..280.19 rows=3467 width=32) (actual
> time=15.000..31.000 rows=3467 loops=1)
> Sort Key: c.sourceid
> -> Seq Scan on candidates617004 c
> (cost=0.00..67.67 rows=3467 width=32) (actual time=0.000..15.000
> rows=3467 loops=1)
> -> Index Scan using lte_user_pkey on lte_user t
> (cost= 0.00..3.02 rows=1 width=16) (actual time=0.023..0.023 rows=1
> loops=3467)
> Index Cond: ("outer".targetid = t.user_id)
> Total runtime: 406.000 ms
>
> Thanks,
> Meetesh
pgsql-performance by date: