Is it require further tuning - Mailing list pgsql-performance

From Adarsh Sharma
Subject Is it require further tuning
Date
Msg-id 4D6F19BE.9030802@orkash.com
Whole thread Raw
Responses Re: Is it require further tuning  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Dear all,

I have a query that i used to fire many times in our application and
need to be tuned at the deeper level.

Query :

 explain analyze select p.crawled_page_id, p.content,
w.publication_name, w.country_name, p.publishing_date,m.doc_category
,l.display_name as location, l.lat, l.lon, l.pop_rank,
p.crawled_page_url, substring(p.content,1,250)
as display_text, p.heading from page_content_terror p, location l,
loc_context_terror lc, meta_terror  m,
website_master w   where p.crawled_page_id>0 and
p.crawled_page_id=lc.source_id and lc.location_id=l.id
 and p.crawled_page_id=m.doc_id and p.url_id= w.url_id  limit 1000;



QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=483.31..1542.79 rows=1000 width=5460) (actual
time=8.797..125504.603 rows=1000 loops=1)
   ->  Hash Join  (cost=483.31..169719466.71 rows=160190779 width=5460)
(actual time=8.794..125502.974 rows=1000 loops=1)
         Hash Cond: (p.url_id = w.url_id)
         ->  Nested Loop  (cost=0.00..163675973.13 rows=13034265
width=4056) (actual time=1.294..125463.784 rows=1000 loops=1)
               ->  Nested Loop  (cost=0.00..115348580.99 rows=13034265
width=3024) (actual time=1.219..125436.104 rows=1156 loops=1)
                     Join Filter: (p.crawled_page_id = lc.source_id)
                     ->  Nested Loop  (cost=0.00..10960127.98 rows=53553
width=3024) (actual time=0.037..66671.887 rows=1156 loops=1)
                           Join Filter: (p.crawled_page_id = m.doc_id)
                           ->  Seq Scan on page_content_terror p
(cost=0.00..8637.64 rows=2844 width=2816) (actual time=0.013..5.884
rows=1156 loops=1)
                                 Filter: (crawled_page_id > 0)
                           ->  Seq Scan on meta_terror m
(cost=0.00..3803.66 rows=3766 width=208) (actual time=0.003..30.117
rows=45148 loops=1156)
                     ->  Seq Scan on loc_context_terror lc
(cost=0.00..1340.78 rows=48678 width=8) (actual time=0.376..24.675
rows=41658 loops=1156)
               ->  Index Scan using location_pk on location l
(cost=0.00..3.70 rows=1 width=1040) (actual time=0.016..0.017 rows=1
loops=1156)
                     Index Cond: (l.id = lc.location_id)
         ->  Hash  (cost=452.58..452.58 rows=2458 width=1412) (actual
time=7.344..7.344 rows=2458 loops=1)
               ->  Seq Scan on website_master w  (cost=0.00..452.58
rows=2458 width=1412) (actual time=0.013..4.094 rows=2458 loops=1)
 Total runtime: 125506.007 ms

/***********************************After adding
indexes*******************************************/

QUERY
PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------
---------
 Limit  (cost=483.31..583.30 rows=1000 width=5460) (actual
time=9.769..63.374 rows=1000 loops=1)
   ->  Hash Join  (cost=483.31..871571182.73 rows=8716355049 width=5460)
(actual time=9.765..62.314 rows=1000 loops=1)
         Hash Cond: (p.url_id = w.url_id)
         ->  Nested Loop  (cost=0.00..542756386.37 rows=709224822
width=4056) (actual time=1.640..30.895 rows=1000 loops=1)
               ->  Nested Loop  (cost=0.00..2587537.38 rows=3139483
width=3856) (actual time=1.558..22.552 rows=1000 loops=1)
                     ->  Nested Loop  (cost=0.00..157876.94 rows=41693
width=1040) (actual time=1.419..13.039 rows=1000 loops=1)
                           ->  Seq Scan on loc_context_terror lc
(cost=0.00..1270.93 rows=41693 width=8) (actual time=1.346..2.264
rows=1156 loops=1)
                           ->  Index Scan using location_pk on location
l  (cost=0.00..3.74 rows=1 width=1040) (actual time=0.005..0.006 rows=1
loops=1156)
                                 Index Cond: (l.id = lc.location_id)
                     ->  Index Scan using idx_crawled_s9 on
page_content_terror p  (cost=0.00..57.34 rows=75 width=2816) (actual
time=0.005..0.006 rows=1 loo
ps=1000)
                           Index Cond: ((p.crawled_page_id > 0) AND
(p.crawled_page_id = lc.source_id))
               ->  Index Scan using idx_doc_s9 on meta_terror m
(cost=0.00..169.23 rows=226 width=208) (actual time=0.005..0.006 rows=1
loops=1000)
                     Index Cond: (m.doc_id = p.crawled_page_id)
         ->  Hash  (cost=452.58..452.58 rows=2458 width=1412) (actual
time=7.964..7.964 rows=2458 loops=1)
               ->  Seq Scan on website_master w  (cost=0.00..452.58
rows=2458 width=1412) (actual time=0.009..4.495 rows=2458 loops=1)
 Total runtime: 64.396 ms


Don't know why it uses Seq Scan on loc_context_terror as i have indexes
on the desired columns as well.


Thanks & best Regards,

Adarsh Sharma

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: Performance Test for PostgreSQL9
Next
From: Selva manickaraja
Date:
Subject: Re: Performance Test for PostgreSQL9