Subquery flattening causing sequential scan - Mailing list pgsql-performance
| From | Jim Crate |
|---|---|
| Subject | Subquery flattening causing sequential scan |
| Date | |
| Msg-id | 0E417F76-FE21-4FB5-8132-0C485778A041@gmail.com Whole thread Raw |
| Responses |
Re: Subquery flattening causing sequential scan
Re: Subquery flattening causing sequential scan |
| List | pgsql-performance |
PostgreSQL 9.0.2
Mac OS X Server 10.6.8
Autovacuum is on, and I have a script that runs vacuum analyze verbose every night along with the backup.
I have a situation where I'm experiencing a seq scan on a table with almost 3M rows when my condition is based on a
subquery. A google search turned up a way to prevent flattening the subquery into a join using OFFSET 0. This does
work,reducing the query from around 1s to around 250ms, most of which is the subquery.
My question is why does it do a seq scan when it flattens this subquery into a JOIN? Is it because the emsg_messages
tableis around 1M rows? Are there some guidelines to when the planner will prefer not to use an available index? I
justhad a look through postgresql.conf and noticed that I forgot to set effective_cache_size to something reasonable
fora machine with 16GB of memory. Would the default setting of 128MB cause this behavior? I can't bounce the
productionserver midday to test that change.
EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
SELECT id
FROM emsg_messages msg
WHERE msg.account_id = 314 AND msg.outgoing = FALSE
AND msg.message_type = 1 AND msg.spam_level < 2
AND msg.deleted_at IS NULL
AND msg.id NOT IN (
SELECT emf.message_id
FROM emsg_message_folders emf
where emf.account_id = 314
)
)
QUERY PLAN
Hash Semi Join (cost=84522.74..147516.35 rows=49545 width=12) (actual time=677.058..1083.685 rows=2 loops=1)
Hash Cond: (ema.message_id = msg.id)
-> Seq Scan on emsg_message_addresses ema (cost=0.00..53654.78 rows=2873478 width=12) (actual time=0.020..424.241
rows=2875437loops=1)
-> Hash (cost=84475.45..84475.45 rows=3783 width=4) (actual time=273.392..273.392 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on emsg_messages msg (cost=7979.35..84475.45 rows=3783 width=4) (actual
time=273.224..273.387rows=1 loops=1)
Recheck Cond: (account_id = 314)
Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1)) AND
(message_type= 1))
-> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.98 rows=34611 width=0)
(actualtime=9.633..9.633 rows=34997 loops=1)
Index Cond: (account_id = 314)
SubPlan 1
-> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4) (actual
time=5.684..38.016rows=34594 loops=1)
Recheck Cond: (account_id = 314)
-> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10 rows=35169
width=0)(actual time=5.175..5.175 rows=34594 loops=1)
Index Cond: (account_id = 314)
Total runtime: 1083.890 ms
EXPLAIN ANALYZE
SELECT ema.message_id, ema.email_address_id, ema.address_type
FROM emsg_message_addresses ema
WHERE ema.message_id IN (
SELECT id
FROM emsg_messages msg
WHERE msg.account_id = 314 AND msg.outgoing = FALSE
AND msg.message_type = 1 AND msg.spam_level < 2
AND msg.deleted_at IS NULL
AND msg.id NOT IN (
SELECT emf.message_id
FROM emsg_message_folders emf
where emf.account_id = 314
)
OFFSET 0
)
QUERY PLAN
Nested Loop (cost=84524.89..87496.74 rows=2619 width=12) (actual time=273.409..273.412 rows=2 loops=1)
-> HashAggregate (cost=84524.89..84526.89 rows=200 width=4) (actual time=273.345..273.346 rows=1 loops=1)
-> Limit (cost=7979.36..84477.60 rows=3783 width=4) (actual time=273.171..273.335 rows=1 loops=1)
-> Bitmap Heap Scan on emsg_messages msg (cost=7979.36..84477.60 rows=3783 width=4) (actual
time=273.169..273.333rows=1 loops=1)
Recheck Cond: (account_id = 314)
Filter: ((NOT outgoing) AND (deleted_at IS NULL) AND (spam_level < 2) AND (NOT (hashed SubPlan 1))
AND(message_type = 1))
-> Bitmap Index Scan on index_emsg_messages_on_account_id (cost=0.00..867.99 rows=34612 width=0)
(actualtime=9.693..9.693 rows=34998 loops=1)
Index Cond: (account_id = 314)
SubPlan 1
-> Bitmap Heap Scan on emsg_message_folders emf (cost=704.90..7022.51 rows=35169 width=4)
(actualtime=5.795..39.420 rows=34594 loops=1)
Recheck Cond: (account_id = 314)
-> Bitmap Index Scan on index_emsg_message_folders_on_account_id (cost=0.00..696.10
rows=35169width=0) (actual time=5.266..5.266 rows=34594 loops=1)
Index Cond: (account_id = 314)
-> Index Scan using index_emsg_message_addresses_on_message_id on emsg_message_addresses ema (cost=0.00..14.69
rows=13width=12) (actual time=0.056..0.058 rows=2 loops=1)
Index Cond: (ema.message_id = msg.id)
Total runtime: 273.679 ms
Jim Crate
pgsql-performance by date: