Re: Why Index is not used - Mailing list pgsql-performance

From Adarsh Sharma
Subject Re: Why Index is not used
Date
Msg-id 4D8C3B86.3040208@orkash.com
Whole thread Raw
In response to Re: Why Index is not used  (Andreas Kretschmer <akretschmer@spamfence.net>)
Responses Re: Why Index is not used  (Thomas Kellerer <spam_eater@gmx.net>)
List pgsql-performance

Thanks Andreas, I was about print the output but it takes too much time.

Below is the output of explain analyze command :
pdc_uima=# explain analyze select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
                                                                     QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053) (actual time=216281.162..630721.636 rows=30473117 loops=1)
   Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
   ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 rows=27471560 width=1993) (actual time=0.130..177599.310 rows=27471560 loops=1)
   ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72) (actual time=216280.596..370507.452 rows=52037763 loops=1)
         ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72) (actual time=216280.591..324707.956 rows=31853083 loops=1)
               Sort Key: c.clause_id, c.source_id, c.sentence_id
               Sort Method:  external merge  Disk: 2616520kB
               ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=72) (actual time=0.025..25018.665 rows=31853083 loops=1)
 Total runtime: 647804.037 ms
(9 rows)


Thanks , Adarsh

Andreas Kretschmer wrote:
Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
 
Dear all,

Today I got to run a query internally from my application by more than  
10 connections.

But The query performed very badly. A the data size of tables are as :

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));   
pg_size_pretty
----------------
4719 MB
(1 row)


I explain the query as after making the  indexes as :

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where  
c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;                                                QUERY PLAN                                                 
--------------------------------------------------------------------------------------------------------------
Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053) Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id)  
AND (s.sentence_id = c.sentence_id)) ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65  
rows=27471560 width=1993) ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)       ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)             Sort Key: c.clause_id, c.source_id, c.sentence_id             ->  Seq Scan on clause2 c  (cost=0.00..770951.84  
rows=31853084 width=72)



Indexes are :

CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id,  
sentence_id);
CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id,  
sentence_id);

I don't know why it not uses the index scan for clause2 table.   
How many rows contains clause2? The planner expected 167324179 returning
rows, can you run the same explain with ANALYSE to see the real amount
of returning rows?


Andreas 

pgsql-performance by date:

Previous
From: Andreas Kretschmer
Date:
Subject: Re: Why Index is not used
Next
From: Chetan Suttraway
Date:
Subject: Re: Why Index is not used