Thread: An unwanted seqscan

An unwanted seqscan

From
Brian Herlihy
Date:
Hi,

I am having trouble understanding why a seqscan is chosen for this query.

In practice the seqscan is very expensive, whereas the nested loop is usually quite fast, even with several hundred
rowsreturned from meta_keywords_url. 

The server is running version 8.1.3, and both tables were analyzed recently.  meta_keywords contains around 25% dead
rows,meta_keywords_url contains no dead rows. 

I have included the query written both as a subquery and as a join.

Thanks for any assistance!
Brian



live=> explain select * from meta_keywords where url_id in (select url_id from meta_keywords_url where host =
'postgresql.org');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=1755.79..545380.52 rows=9442 width=29)
   Hash Cond: ("outer".url_id = "inner".url_id)
   ->  Seq Scan on meta_keywords  (cost=0.00..507976.54 rows=7110754 width=29)
   ->  Hash  (cost=1754.35..1754.35 rows=576 width=4)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1754.35 rows=576 width=4)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=576 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
(8 rows)

live=> set enable_seqscan=off;
SET
live=> explain select * from meta_keywords where url_id in (select url_id from meta_keywords_url where host =
'postgresql.org');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1755.79..3161748.83 rows=9442 width=29)
   ->  HashAggregate  (cost=1755.79..1761.55 rows=576 width=4)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1754.35 rows=576 width=4)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=576 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
   ->  Index Scan using meta_keywords_url_id on meta_keywords  (cost=0.00..5453.28 rows=2625 width=29)
         Index Cond: (meta_keywords.url_id = "outer".url_id)
(8 rows)

live=> explain select * from meta_keywords join meta_keywords_url using (url_id) where host = 'postgresql.org';
                                                                                   QUERY PLAN                        
---------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1758.52..543685.43 rows=9297 width=107)
   Hash Cond: ("outer".url_id = "inner".url_id)
   ->  Seq Scan on meta_keywords  (cost=0.00..506859.29 rows=6994929 width=28)
   ->  Hash  (cost=1757.08..1757.08 rows=577 width=83)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1757.08 rows=577 width=83)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=577 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
(8 rows)

live=> set enable_seqscan=off;
SET
live=> explain select * from meta_keywords join meta_keywords_url using (url_id) where host = 'postgresql.org';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3348211.21 rows=9297 width=107)
   ->  Index Scan using meta_keywords_url_host_path on meta_keywords_url  (cost=0.00..2230.24 rows=577 width=83)
         Index Cond: ((host)::text = 'postgresql.org'::text)
   ->  Index Scan using meta_keywords_url_id on meta_keywords  (cost=0.00..5765.81 rows=2649 width=28)
         Index Cond: (meta_keywords.url_id = "outer".url_id)
(5 rows)





Re: An unwanted seqscan

From
Tom Lane
Date:
Brian Herlihy <btherl@yahoo.com.au> writes:
> I am having trouble understanding why a seqscan is chosen for this query.

As far as anyone can see from this output, the planner's decisions are
correct: it prefers the plans with the smaller estimated cost.  If you
want us to take an interest, provide some more context --- EXPLAIN
ANALYZE output for starters.

            regards, tom lane

Re: An unwanted seqscan

From
Brian Herlihy
Date:
Hi Tom,

Sorry, I didn't ask the right question.  I meant to ask "Why does it estimate a smaller cost for the seqscan?"

With some further staring I was able to find the bad estimate and fix it by increasing the relevant statistics target.

Thanks,
Brian

----- Original Message ----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Brian Herlihy <btherl@yahoo.com.au>
Cc: Postgresql Performance <pgsql-performance@postgresql.org>
Sent: Wednesday, 14 February, 2007 4:53:54 PM
Subject: Re: [PERFORM] An unwanted seqscan

Brian Herlihy <btherl@yahoo.com.au> writes:
> I am having trouble understanding why a seqscan is chosen for this query.

As far as anyone can see from this output, the planner's decisions are
correct: it prefers the plans with the smaller estimated cost.  If you
want us to take an interest, provide some more context --- EXPLAIN
ANALYZE output for starters.

            regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq