Thread: SeqScan vs. IndexScan

SeqScan vs. IndexScan

From

Vitaliy Garnashevich

Date:

19 April 2018, 01:14:48

Hi,

I'm running the same query with "set enable_seqscan = on;" and "set 
enable_seqscan = off;":

->  Nested Loop Left Join  (cost=0.00..89642.86 rows=1 width=30) (actual 
time=1.612..6924.232 rows=3289 loops=1)
       Join Filter: (sys_user.user_id = j_6634.id)
       Rows Removed by Join Filter: 14330174
       ->  Seq Scan on sys_user  (cost=0.00..89449.85 rows=1 width=16) 
(actual time=0.117..39.802 rows=3289 loops=1)
             Filter: ...
       ->  Seq Scan on cmn_user j_6634  (cost=0.00..138.56 rows=4356 
width=22) (actual time=0.001..0.973 rows=4358 loops=3289)

(Full plan: https://explain.depesz.com/s/plAO)

->  Nested Loop Left Join  (cost=0.56..89643.52 rows=1 width=30) (actual 
time=0.589..39.674 rows=3288 loops=1)
       ->  Index Scan using sys_user_pkey on sys_user 
(cost=0.28..89635.21 rows=1 width=16) (actual time=0.542..29.435 
rows=3288 loops=1)
             Filter: ...
       ->  Index Scan using cmn_user_pkey on cmn_user j_6634 
(cost=0.28..8.30 rows=1 width=22) (actual time=0.002..0.002 rows=1 
loops=3288)
             Index Cond: (sys_user.user_id = id)

(Full plan: https://explain.depesz.com/s/4QXy)

Why optimizer is choosing SeqScan (on cmn_user) in the first query, 
instead of an IndexScan, despite of SeqScan being more costly?

Regards,
Vitaliy

Re: SeqScan vs. IndexScan

From

Tom Lane

Date:

19 April 2018, 01:33:18

Vitaliy Garnashevich <vgarnashevich@gmail.com> writes:
> I'm running the same query with "set enable_seqscan = on;" and "set 
> enable_seqscan = off;":
> ...
> Why optimizer is choosing SeqScan (on cmn_user) in the first query, 
> instead of an IndexScan, despite of SeqScan being more costly?

Because it cares about the total plan cost, not the cost of any one
sub-node.  In this case, the total costs at the join level are fuzzily
the same, but the indexscan-based join has worse estimated startup cost,
so it prefers the first choice.

The real problem here is the discrepancy between estimate and reality
for the number of rows out of the sys_user scan; because of that, you're
going to get garbage choices at the join level no matter what :-(.
You should look into what's causing that misestimate and whether you
can reduce the error, perhaps by providing better stats or reformulating
the filter conditions in a way the optimizer understands better.

            regards, tom lane