Re: Why Index is not used - Mailing list pgsql-performance

From Adarsh Sharma
Subject Re: Why Index is not used
Date
Msg-id 4D8C6614.1050007@orkash.com
Whole thread Raw
In response to Re: Why Index is not used  (Chetan Suttraway <chetan.suttraway@enterprisedb.com>)
List pgsql-performance
Chetan Suttraway wrote:


On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:

Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;


As per your instructions, Please  check the below output :-

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
 Hash Join  (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
   Hash Cond: (c.clause_id = s.clause_id)
   ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Hash  (cost=697537.60..697537.60 rows=27471560 width=1993)
         ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
   Merge Cond: (c.source_id = s.doc_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.source_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.doc_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
 Merge Join  (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
   Merge Cond: (c.sentence_id = s.sentence_id)
   ->  Sort  (cost=5596061.24..5675693.95 rows=31853084 width=64)
         Sort Key: c.sentence_id
         ->  Seq Scan on clause2 c  (cost=0.00..770951.84 rows=31853084 width=64)
   ->  Materialize  (cost=38028881.02..38372275.52 rows=27471560 width=1993)
         ->  Sort  (cost=38028881.02..38097559.92 rows=27471560 width=1993)
               Sort Key: s.sentence_id
               ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)

Please  let me know if any other information is required.






--
Best Regards,
Adarsh Sharma



The ideas is to have maximum filtering occuring on leading column of index.
the first plan with only the predicates on clause_id is returning 379772050555842 rows whereas
in the second plan with doc_id predicates is returning only 20954686217.

So maybe you should consider re-ordering of the index on clause2.

I am thinking that you created the indexes by looking at the columns used in the where clause.
But its not always helpful to create  indexes based on exact order of predicates specified in query.
Instead the idea should be consider the predicate which is going to do filter out the results.
Likewise we should consider all possible uses of index columns across all queries and then decide on the
order of columns for the composite index to be created.

Whats your take on this?

I am sorry but I am not able to got your points completely.

My table definitions are as :

Clause2 Table :

CREATE TABLE clause2
(
  id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
  source_id integer,
  sentence_id integer,
  clause_id integer,
  tense character varying(30),
  clause text,
  CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX idx_clause  ON clause2  USING btree  (clause_id, source_id, sentence_id);

svo2 table :--

CREATE TABLE svo2
(
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  "object" character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  verb_attribute character varying(3000),
  subject_concept character varying(100),
  object_concept character varying(100),
  subject_sense character varying(100),
  object_sense character varying(100),
  subject_chain character varying(5000),
  object_chain character varying(5000),
  sub_type_id integer,
  obj_type_id integer,
  CONSTRAINT pk_svo_demo_id PRIMARY KEY (svo_id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX idx_svo2  ON svo2  USING btree  (clause_id, doc_id, sentence_id);

Please correct me if I m wrong.

I need to change the order of columns in indexes according to the filter conditions but in this query .

After making
set enable_mergejoin = false
and random_page_cost =2.0

The problem remains the same.





What is your recommendations for the new index so that the query runs even faster.


I can change my original query to :

explain analyze select c.clause,s.doc_id,s.subject,s.verb,s.object,s.subject_type,s.object_type from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.sentence_id=s.sentence_id ;

And the output is :

                                                             QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..128419720.68 rows=167324179 width=105) (actual time=11.179..285708.966 rows=30473117 loops=1)
   ->  Seq Scan on svo2 s  (cost=0.00..697537.60 rows=27471560 width=53) (actual time=0.013..19554.222 rows=27471560 loops=1)
   ->  Index Scan using idx_clause on clause2 c  (cost=0.00..4.63 rows=1 width=72) (actual time=0.006..0.007 rows=1 loops=27471560)
         Index Cond: ((c.clause_id = s.clause_id) AND (c.source_id = s.doc_id) AND (c.sentence_id = s.sentence_id))
 Total runtime: 301599.274 ms


Thanks & best Regards,
Adarsh Sharma


Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.




pgsql-performance by date:

Previous
From: Vitalii Tymchyshyn
Date:
Subject: Re: Shouldn't we have a way to avoid "risky" plans?
Next
From: Laszlo Nagy
Date:
Subject: Re: Slow query on CLUTER -ed tables