Re: Re-Reason of Slowness of Query - Mailing list pgsql-performance

From Adarsh Sharma
Subject Re: Re-Reason of Slowness of Query
Date
Msg-id 4D89D7AD.4050503@orkash.com
Whole thread Raw
In response to Re: Re-Reason of Slowness of Query  (Chetan Suttraway <chetan.suttraway@enterprisedb.com>)
Responses Re: Re-Reason of Slowness of Query
List pgsql-performance
Thank U all, for U'r Nice Support.

Let me Conclude the results, below results are obtained after finding the needed queries :

First Option :

pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(# c.source_id) where (c.source_id is null);
                                                                     QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=87927.000..87930.084 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.191..87926.546 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 loops=428467)
               Index Cond: (p.crawled_page_id = c.source_id)
 Total runtime: 87933.882 ms :-(
(6 rows)

Second Option :

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where c.source_id = p.crawled_page_id);
                                                                     QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual time=7047.259..7050.261 rows=72 loops=1)
   ->  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) (actual time=0.039..7046.826 rows=74 loops=1)
         ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
         ->  Index Scan using idx_clause2_source_id on clause2 c  (cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 loops=428467)
               Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 7054.074 ms :-)
(6 rows)


Thanks & best Regards,
Adarsh Sharma




Chetan Suttraway wrote:


On Wed, Mar 23, 2011 at 4:08 PM, <tv@fuzzy.cz> wrote:
> I just want to retrieve that id 's from page_content which do not have
> any entry in clause2 table.

In that case the query probably does not work (at least the query you've
sent in the first post) as it will return even those IDs that have at
least one other row in 'clause2' (not matching the != condition). At least
that's how I understand it.

true.
 
So instead of this

select distinct(p.crawled_page_id)
from page_content p, clause2 c where p.crawled_page_id != c.source_id ;

you should probably do this

select distinct(p.crawled_page_id)
from page_content p left join clause2 c on (p.crawled_page_id =
c.source_id) where (c.source_id is null);

I guess this will be much more efficient too.


This looks like to give expected results. Also note that the where clause "is null" is really required and is not an
optional predicate.

 
regards
Tomas


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.




pgsql-performance by date:

Previous
From: Chetan Suttraway
Date:
Subject: Re: Re-Reason of Slowness of Query
Next
From: Vitalii Tymchyshyn
Date:
Subject: Re: Re-Reason of Slowness of Query