Thread: Why Index is not used

Why Index is not used

From
Adarsh Sharma
Date:
Dear all,

Today I got to run a query internally from my application by more than
10 connections.

But The query performed very badly. A the data size of tables are as :

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
 pg_size_pretty
----------------
 5858 MB
(1 row)

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));
 pg_size_pretty
----------------
 4719 MB
(1 row)


I explain the query as after making the  indexes as :

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where
c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
                                                  QUERY
PLAN
--------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
   Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id)
AND (s.sentence_id = c.sentence_id))
   ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65
rows=27471560 width=1993)
   ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)
         ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)
               Sort Key: c.clause_id, c.source_id, c.sentence_id
               ->  Seq Scan on clause2 c  (cost=0.00..770951.84
rows=31853084 width=72)



Indexes are :

CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id,
sentence_id);
CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id,
sentence_id);

I don't know why it not uses the index scan for clause2 table.

Any suggestions to tune the query.


Thanks & best Regards,
Adarsh Sharma

Re: Why Index is not used

From
Andreas Kretschmer
Date:
Adarsh Sharma <adarsh.sharma@orkash.com> wrote:

> Dear all,
>
> Today I got to run a query internally from my application by more than
> 10 connections.
>
> But The query performed very badly. A the data size of tables are as :
>
> pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
> pg_size_pretty
> ----------------
> 5858 MB
> (1 row)
>
> pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));
> pg_size_pretty
> ----------------
> 4719 MB
> (1 row)
>
>
> I explain the query as after making the  indexes as :
>
> pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where
> c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
> pdc_uima-# sentence_id=s.sentence_id ;
>                                                  QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------
> Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
>   Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id)
> AND (s.sentence_id = c.sentence_id))
>   ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65
> rows=27471560 width=1993)
>   ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)
>         ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)
>               Sort Key: c.clause_id, c.source_id, c.sentence_id
>               ->  Seq Scan on clause2 c  (cost=0.00..770951.84
> rows=31853084 width=72)
>
>
>
> Indexes are :
>
> CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id,
> sentence_id);
> CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id,
> sentence_id);
>
> I don't know why it not uses the index scan for clause2 table.

How many rows contains clause2? The planner expected 167324179 returning
rows, can you run the same explain with ANALYSE to see the real amount
of returning rows?


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Why Index is not used

From
Adarsh Sharma
Date:

Thanks Andreas, I was about print the output but it takes too much time.

Below is the output of explain analyze command :
pdc_uima=# explain analyze select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
                                                                     QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053) (actual time=216281.162..630721.636 rows=30473117 loops=1)
   Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
   ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 rows=27471560 width=1993) (actual time=0.130..177599.310 rows=27471560 loops=1)
   ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72) (actual time=216280.596..370507.452 rows=52037763 loops=1)
         ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72) (actual time=216280.591..324707.956 rows=31853083 loops=1)
               Sort Key: c.clause_id, c.source_id, c.sentence_id
               Sort Method:  external merge  Disk: 2616520kB
               ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=72) (actual time=0.025..25018.665 rows=31853083 loops=1)
 Total runtime: 647804.037 ms
(9 rows)


Thanks , Adarsh

Andreas Kretschmer wrote:
Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
 
Dear all,

Today I got to run a query internally from my application by more than  
10 connections.

But The query performed very badly. A the data size of tables are as :

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));   
pg_size_pretty
----------------
4719 MB
(1 row)


I explain the query as after making the  indexes as :

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where  
c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;                                                QUERY PLAN                                                 
--------------------------------------------------------------------------------------------------------------
Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053) Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id)  
AND (s.sentence_id = c.sentence_id)) ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65  
rows=27471560 width=1993) ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)       ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)             Sort Key: c.clause_id, c.source_id, c.sentence_id             ->  Seq Scan on clause2 c  (cost=0.00..770951.84  
rows=31853084 width=72)



Indexes are :

CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id,  
sentence_id);
CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id,  
sentence_id);

I don't know why it not uses the index scan for clause2 table.   
How many rows contains clause2? The planner expected 167324179 returning
rows, can you run the same explain with ANALYSE to see the real amount
of returning rows?


Andreas 

Re: Why Index is not used

From
Chetan Suttraway
Date:


On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
Dear all,

Today I got to run a query internally from my application by more than 10 connections.

But The query performed very badly. A the data size of tables are as :

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));  pg_size_pretty
----------------
4719 MB
(1 row)


I explain the query as after making the  indexes as :

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
                                                QUERY PLAN                                                 --------------------------------------------------------------------------------------------------------------
Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
 Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
 ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 rows=27471560 width=1993)
 ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)
       ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)
             Sort Key: c.clause_id, c.source_id, c.sentence_id
             ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=72)



Indexes are :

CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, sentence_id);
CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id, sentence_id);

I don't know why it not uses the index scan for clause2 table.


In this case, there are no predicates or filters on individual table. (maybe something like c.source_id=10)
so either of the 2 tables will have to go for simple scan.

Are you expecting seq. scan on svo2 and index scan on clause2?

--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.



Re: Why Index is not used

From
Adarsh Sharma
Date:
Chetan Suttraway wrote:


On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
Dear all,

Today I got to run a query internally from my application by more than 10 connections.

But The query performed very badly. A the data size of tables are as :

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));  pg_size_pretty
----------------
4719 MB
(1 row)


I explain the query as after making the  indexes as :

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
                                                QUERY PLAN                                                 --------------------------------------------------------------------------------------------------------------
Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
 Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
 ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 rows=27471560 width=1993)
 ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)
       ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)
             Sort Key: c.clause_id, c.source_id, c.sentence_id
             ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=72)



Indexes are :

CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, sentence_id);
CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id, sentence_id);

I don't know why it not uses the index scan for clause2 table.


In this case, there are no predicates or filters on individual table. (maybe something like c.source_id=10)
so either of the 2 tables will have to go for simple scan.

Are you expecting seq. scan on svo2 and index scan on clause2?


As per the size consideration and the number of rows, I think index scan on clause2 is better.

Your constraint is valid  but  I need to perform  this query faster. 
What is the reason behind the seq scan of clause2.



Regards,
Adarsh



Re: Why Index is not used

From
Thomas Kellerer
Date:
Adarsh Sharma, 25.03.2011 07:51:
>
> Thanks Andreas, I was about print the output but it takes too much time.
>
> Below is the output of explain analyze command :
> pdc_uima=# explain analyze select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and
s.doc_id=c.source_idand c. 
> pdc_uima-# sentence_id=s.sentence_id ;
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053) (actual time=216281.162..630721.636 rows=30473117
loops=1)
> Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
> -> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993) (actual
time=0.130..177599.310rows=27471560 loops=1) 
> -> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72) (actual time=216280.596..370507.452 rows=52037763
loops=1)
> -> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72) (actual time=216280.591..324707.956 rows=31853083
loops=1)
> Sort Key: c.clause_id, c.source_id, c.sentence_id
> Sort Method: external merge Disk: 2616520kB
> -> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72) (actual time=0.025..25018.665 rows=31853083
loops=1)
> Total runtime: 647804.037 ms
> (9 rows)
>
>
How many rows are there in clause2 in total?

31853084 rows are returned from that table which sounds like the whole table qualifies for the join condition.

Regards
Thomas

Re: Why Index is not used

From
Chetan Suttraway
Date:


On Fri, Mar 25, 2011 at 12:39 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
Chetan Suttraway wrote:


On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
Dear all,

Today I got to run a query internally from my application by more than 10 connections.

But The query performed very badly. A the data size of tables are as :

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)

pdc_uima=#  select pg_size_pretty(pg_total_relation_size('svo2'));  pg_size_pretty
----------------
4719 MB
(1 row)


I explain the query as after making the  indexes as :

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
                                                QUERY PLAN                                                 --------------------------------------------------------------------------------------------------------------
Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
 Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
 ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65 rows=27471560 width=1993)
 ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084 width=72)
       ->  Sort  (cost=5673828.74..5753461.45 rows=31853084 width=72)
             Sort Key: c.clause_id, c.source_id, c.sentence_id
             ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=72)



Indexes are :

CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, sentence_id);
CREATE INDEX idx_svo2  ON svo2  USING btree (clause_id, doc_id, sentence_id);

I don't know why it not uses the index scan for clause2 table.


In this case, there are no predicates or filters on individual table. (maybe something like c.source_id=10)
so either of the 2 tables will have to go for simple scan.

Are you expecting seq. scan on svo2 and index scan on clause2?


As per the size consideration and the number of rows, I think index scan on clause2 is better.

Your constraint is valid  but  I need to perform  this query faster. 
What is the reason behind the seq scan of clause2.



Regards,
Adarsh




Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;

--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.



Re: Why Index is not used

From
Adarsh Sharma
Date:

Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;


As per your instructions, Please  check the below output :-

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
 Hash Join  (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
   Hash Cond: (c.clause_id = s.clause_id)
   ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Hash  (cost=697537.60..697537.60 rows=27471560 width=1993)
         ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
   Merge Cond: (c.source_id = s.doc_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.source_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.doc_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
   Merge Cond: (c.sentence_id = s.sentence_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.sentence_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.sentence_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

Please  let me know if any other information is required.






--
Best Regards,
Adarsh Sharma



Re: Why Index is not used

From
Chetan Suttraway
Date:


On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:

Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;


As per your instructions, Please  check the below output :-

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
 Hash Join  (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
   Hash Cond: (c.clause_id = s.clause_id)
   ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Hash  (cost=697537.60..697537.60 rows=27471560 width=1993)
         ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
   Merge Cond: (c.source_id = s.doc_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.source_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.doc_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
   Merge Cond: (c.sentence_id = s.sentence_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.sentence_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.sentence_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

Please  let me know if any other information is required.






--
Best Regards,
Adarsh Sharma



The ideas is to have maximum filtering occuring on leading column of index.
the first plan with only the predicates on clause_id is returning 379772050555842 rows whereas
in the second plan with doc_id predicates is returning only 20954686217.

So maybe you should consider re-ordering of the index on clause2.

I am thinking that you created the indexes by looking at the columns used in the where clause.
But its not always helpful to create  indexes based on exact order of predicates specified in query.
Instead the idea should be consider the predicate which is going to do filter out the results.
Likewise we should consider all possible uses of index columns across all queries and then decide on the
order of columns for the composite index to be created.

Whats your take on this?

--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.



Re: Why Index is not used

From
tv@fuzzy.cz
Date:
>>     Merge Join  (cost=5673831.05..34033959.87 rows=167324179 width=2053)
>>      Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id =
>>     c.source_id) AND (s.sentence_id = c.sentence_id))
>>      ->  Index Scan using idx_svo2 on svo2 s  (cost=0.00..24489343.65
>>     rows=27471560 width=1993)
>>      ->  Materialize  (cost=5673828.74..6071992.29 rows=31853084
>> width=72)
>>            ->  Sort  (cost=5673828.74..5753461.45 rows=31853084
>> width=72)
>>                  Sort Key: c.clause_id, c.source_id, c.sentence_id
>>                  ->  Seq Scan on clause2 c  (cost=0.00..770951.84
>>     rows=31853084 width=72)

>>
>
> As per the size consideration and the number of rows, I think index scan
> on clause2 is better.

I really doubt that - using index usually involves a lot of random I/O and
that makes slow with a lot of rows. And that's exactly this case, as there
are 27471560 rows in the first table.

You can force the planner to use different plan by disabling merge join,
just set

  set enable_mergejoin = false

and see what happens. There are other similar options:

  http://www.postgresql.org/docs/8.4/static/runtime-config-query.html

And yet another option - you can try to mangle with the cost constants,
namely seq_page_cost and random_page_cost. Decreasing random_page_cost
(default is 4) makes index scans cheaper, so it's more likely the planner
will choose them.

Tomas


Re: Why Index is not used

From
Adarsh Sharma
Date:
Chetan Suttraway wrote:


On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:

Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;


As per your instructions, Please  check the below output :-

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
 Hash Join  (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
   Hash Cond: (c.clause_id = s.clause_id)
   ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Hash  (cost=697537.60..697537.60 rows=27471560 width=1993)
         ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
   Merge Cond: (c.source_id = s.doc_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.source_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.doc_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
   Merge Cond: (c.sentence_id = s.sentence_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.sentence_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.sentence_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

Please  let me know if any other information is required.






--
Best Regards,
Adarsh Sharma



The ideas is to have maximum filtering occuring on leading column of index.
the first plan with only the predicates on clause_id is returning 379772050555842 rows whereas
in the second plan with doc_id predicates is returning only 20954686217.

So maybe you should consider re-ordering of the index on clause2.

I am thinking that you created the indexes by looking at the columns used in the where clause.
But its not always helpful to create  indexes based on exact order of predicates specified in query.
Instead the idea should be consider the predicate which is going to do filter out the results.
Likewise we should consider all possible uses of index columns across all queries and then decide on the
order of columns for the composite index to be created.

Whats your take on this?

I am sorry but I am not able to got your points completely.

My table definitions are as :

Clause2 Table :

CREATE TABLE clause2
(
  id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
  source_id integer,
  sentence_id integer,
  clause_id integer,
  tense character varying(30),
  clause text,
  CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, sentence_id);

svo2 table :--

CREATE TABLE svo2
(
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  "object" character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  verb_attribute character varying(3000),
  subject_concept character varying(100),
  object_concept character varying(100),
  subject_sense character varying(100),
  object_sense character varying(100),
  subject_chain character varying(5000),
  object_chain character varying(5000),
  sub_type_id integer,
  obj_type_id integer,
  CONSTRAINT pk_svo_demo_id PRIMARY KEY (svo_id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX idx_svo2  ON svo2  USING btree  (clause_id, doc_id, sentence_id);

Please correct me if I m wrong.

I need to change the order of columns in indexes according to the filter conditions but in this query .

After making
set enable_mergejoin = false
and random_page_cost =2.0

The problem remains the same.





What is your recommendations for the new index so that the query runs even faster.


I can change my original query to :

explain analyze select c.clause,s.doc_id,s.subject,s.verb,s.object,s.subject_type,s.object_type from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.sentence_id=s.sentence_id ;

And the output is :

                                                             QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..128419720.68 rows=167324179 width=105) (actual time=11.179..285708.966 rows=30473117 loops=1)
   ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=53) (actual time=0.013..19554.222 rows=27471560 loops=1)
   ->  Index Scan using idx_clause on clause2 c  (cost=0.00..4.63 rows=1 width=72) (actual time=0.006..0.007 rows=1 loops=27471560)
         Index Cond: ((c.clause_id = s.clause_id) AND (c.source_id = s.doc_id) AND (c.sentence_id = s.sentence_id))
 Total runtime: 301599.274 ms


Thanks & best Regards,
Adarsh Sharma


Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.




Re: Why Index is not used

From
Shaun Thomas
Date:
On 03/25/2011 04:07 AM, Chetan Suttraway wrote:

> The ideas is to have maximum filtering occuring on leading column of index.
> the first plan with only the predicates on clause_id is returning
> 379772050555842 rows whereas
> in the second plan with doc_id predicates is returning only 20954686217.
>
> So maybe you should consider re-ordering of the index on clause2.

That won't really help him. He's joining a 27M row table against a 31M
row table with basically no WHERE clause. We can see that because he's
getting 30M rows back in the EXPLAIN ANALYZE. At that point, it doesn't
really matter which table gets index scanned. This query will *always*
take several minutes to execute.

It would be completely different if he only wanted to get the results
for *one* source. Or *one* sentence. But getting all of them ever stored
will just take forever.

> I am sorry but I am not able to got your points completely.

He just means that indexes work better if they're placed in order of
selectivity. In your case, it seems sentence_id restricts the result set
better than clause_id. So Chetan suggested remaking your indexes to be
this instead:

CREATE INDEX idx_clause ON clause2
  USING btree (sentence_id, clause_id, source_id);

CREATE INDEX idx_svo2 ON svo2
  USING btree (sentence_id, clause_id, doc_id);

This *might* help. But your fundamental problem is that you're joining
two giant tables with no clause to limit the result set. If you were
only getting back 10,000 rows, or even a million rows, your query could
execute in a fraction of the time. But joining every row in both tables
and returning a 30-million row result set isn't going to be fun for
anyone. Are you actually processing all 30-million rows you get back?
Storing them somewhere?

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

Re: Why Index is not used

From
Maciek Sakrejda
Date:
To expand on what Shaun said:

> But your fundamental problem is that you're joining two
> giant tables with no clause to limit the result set. If you were only
> getting back 10,000 rows, or even a million rows, your query could execute
> in a fraction of the time. But joining every row in both tables and
> returning a 30-million row result set isn't going to be fun for anyone.

Indexes aren't a magical performance fairy dust. An index gives you a
way to look up a single row directly (you can't do that with a scan),
but it's a terrible way to look up 90% (or even 50%) of the rows in a
table, because the per-row cost of lookup is actually higher than in a
scan. That is, once you need to look up more than a certain percentage
of rows in a table, it's actually cheaper to scan it and ignore what
you don't care about rather than going through the index for each row.
It looks like your query is hitting this situation.

Try turning off the merge join, as Tomas suggested, to validate the
assumption that using the index would actually be worse.

To resolve your problem, you shouldn't be trying to make the planner
pick a better plan, you should optimize your settings to get this plan
to perform better or (ideally) optimize your application so you don't
need such an expensive query (because the fundamental problem is that
this query is inherently expensive).

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: Why Index is not used

From
Greg Smith
Date:
On 03/25/2011 12:49 PM, Maciek Sakrejda wrote:
> Indexes aren't a magical performance fairy dust.


One day I intend to use this line for the title of a presentation
slide.  Maybe the title of the whole talk.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books