Re: Reason of Slowness of query - Mailing list pgsql-performance

From Vitalii Tymchyshyn
Subject Re: Reason of Slowness of query
Date
Msg-id 4D89B48D.80800@gmail.com
Whole thread Raw
In response to Re: Reason of Slowness of query  (Adarsh Sharma <adarsh.sharma@orkash.com>)
List pgsql-performance
23.03.11 09:30, Adarsh Sharma написав(ла):
Thanks Chetan, here is the output of your updated query :


explain  select distinct(p.crawled_page_id) from page_content p where NOT EXISTS (select 1 from  clause2 c where c.source_id = p.crawled_page_id);


                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8)
   ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8)
         Hash Cond: (p.crawled_page_id = c.source_id)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8)
         ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4)
               ->  Seq Scan on clause2 c  (cost=0.00..771182.96 rows=31876196 width=4)
(6 rows)

And my explain analyze output is :

                                                      QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8) (actual time=56666.181..56669.270 rows=72 loops=1)
   ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8) (actual time=45740.789..56665.816 rows=74 loops=1)
         Hash Cond: (p.crawled_page_id = c.source_id)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.012..715.915 rows=428467 loops=1)
         ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4) (actual time=45310.524..45310.524 rows=31853083 loops=1)
               ->  Seq Scan on clause2 c  (cost=0.00..771182.96 rows=31876196 width=4) (actual time=0.055..23408.884 rows=31853083 loops=1)
 Total runtime: 56687.660 ms
(7 rows)

But Is there is any option to tune it further and one more thing output rows varies from 6 to 7.
You need an index on source_id to prevent seq scan, like the next:
CREATE INDEX idx_clause2_source_id
  ON clause2
  (source_id);

Best regards, Vitalii Tymchyshyn

pgsql-performance by date:

Previous
From: Chetan Suttraway
Date:
Subject: Re: Reason of Slowness of query
Next
From: Adarsh Sharma
Date:
Subject: Re-Reason of Slowness of Query