Re: Join between 2 tables always executes a sequential scan on the larger table - Mailing list pgsql-performance

From Igor Neyman
Subject Re: Join between 2 tables always executes a sequential scan on the larger table
Date
Msg-id A76B25F2823E954C9E45E32FA49D70EC08FCA68F@mail.corp.perceptron.com
Whole thread Raw
In response to Join between 2 tables always executes a sequential scan on the larger table  (Dieter Rehbein <dieter.rehbein@skiline.cc>)
Responses Re: Re: Join between 2 tables always executes a sequential scan on the larger table
Re: Join between 2 tables always executes a sequential scan on the larger table
List pgsql-performance
From: Dieter Rehbein [mailto:dieter.rehbein@skiline.cc]
Sent: Tuesday, April 02, 2013 4:52 AM
To: pgsql-performance@postgresql.org
Subject: Join between 2 tables always executes a sequential scan on the larger table

Hi everybody,

in a project I have a performance problem, which I (and my colleagues) don't understand. It's a simple join between 2
of3 tables: 

table-1:   user   (id, user_name, ...).   This table has about 1 million rows (999673 rows)
table-2:   competition (57 rows)
table-3:   user_2_competition.  A relation between user and competition. This table has about 100.000 rows

The query is a join between table user_2_competition and user and looks like this:

select u.id, u.user_name
from user_2_competition uc 
          left join "user" u on u.id = uc.user_id 
where uc.competition_id = '3cc1cb9b3ac132ad013ad01316040001'

The query returns the ID and user_name of all users participating in a competition.

What I don't understand: This query executes a sequential scan on user!


The tables have the following indexes:

user_2_competition:  there is an index on user_id and an index on competition_id (competition_id is a VARCHAR(32)
containingUUIDs) 
user:  id is the primary key and has therefore a unique index (the ID is a VARCHAR(32), which contains UUIDs).

The database has just been restored from a backup, I've executed ANALYZE for both tables.

The output of explain analyze (Postgres 9.2.3):

Hash Left Join  (cost=111357.64..126222.29 rows=41396 width=42) (actual time=1982.543..2737.331 rows=41333 loops=1)
  Hash Cond: ((uc.user_id)::text = (u.id)::text)
  ->  Seq Scan on user_2_competition uc  (cost=0.00..4705.21 rows=41396 width=33) (actual time=0.019..89.691 rows=41333
loops=1)
        Filter: ((competition_id)::text = '3cc1cb9b3ac132ad013ad01316040001'::text)
        Rows Removed by Filter: 80684
  ->  Hash  (cost=90074.73..90074.73 rows=999673 width=42) (actual time=1977.604..1977.604 rows=999673 loops=1)
        Buckets: 2048  Batches: 128  Memory Usage: 589kB
        ->  Seq Scan on "user" u  (cost=0.00..90074.73 rows=999673 width=42) (actual time=0.004..1178.827 rows=999673
loops=1)
Total runtime: 2740.723 ms


I expected to see an index-scan on user_2_competition with a hash join to user, not a sequential scan on user.  I've
triedthis with Postgres 9.1 and 9.2.3). 

Any ideas, what's going on here?

With EXPLAIN ANALYZE I can see, which query plan Postgres is using.  Is there any way to find out, WHY postgres uses
thisquery plan?  

best regards
Dieter

-----------------------------------------------

Dieter,
why do you think index-scan on user_2_competition would be better?

Based on huge number of rows returned (41333 out of total ~120000 in the table) from this table optimizer decided that
SeqScan is better than index scan. 
You don't show QUERY TUNING parameters from Postgresql.conf, are they default?
Playing with optimizer parameters (lowering random_page_cost, lowering cpu_index_tuple_cost , increasing
effective_cache_size,or just setting enable_seqscan = off), you could try to force "optimizer" to use index, and see if
youare getting better results. 

Regards,
Igor Neyman


pgsql-performance by date:

Previous
From: Armand du Plessis
Date:
Subject: Re: Problems with pg_locks explosion
Next
From: Tom Lane
Date:
Subject: Re: Re: Join between 2 tables always executes a sequential scan on the larger table