Thread: Why Index is not used
Dear all, Today I got to run a query internally from my application by more than 10 connections. But The query performed very badly. A the data size of tables are as : pdc_uima=# select pg_size_pretty(pg_total_relation_size('clause2')); pg_size_pretty ---------------- 5858 MB (1 row) pdc_uima=# select pg_size_pretty(pg_total_relation_size('svo2')); pg_size_pretty ---------------- 4719 MB (1 row) I explain the query as after making the indexes as : pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c. pdc_uima-# sentence_id=s.sentence_id ; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053) Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id)) -> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993) -> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72) -> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72) Sort Key: c.clause_id, c.source_id, c.sentence_id -> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72) Indexes are : CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id); CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id); I don't know why it not uses the index scan for clause2 table. Any suggestions to tune the query. Thanks & best Regards, Adarsh Sharma
Adarsh Sharma <adarsh.sharma@orkash.com> wrote: > Dear all, > > Today I got to run a query internally from my application by more than > 10 connections. > > But The query performed very badly. A the data size of tables are as : > > pdc_uima=# select pg_size_pretty(pg_total_relation_size('clause2')); > pg_size_pretty > ---------------- > 5858 MB > (1 row) > > pdc_uima=# select pg_size_pretty(pg_total_relation_size('svo2')); > pg_size_pretty > ---------------- > 4719 MB > (1 row) > > > I explain the query as after making the indexes as : > > pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where > c.clause_id=s.clause_id and s.doc_id=c.source_id and c. > pdc_uima-# sentence_id=s.sentence_id ; > QUERY PLAN > > -------------------------------------------------------------------------------------------------------------- > Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053) > Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) > AND (s.sentence_id = c.sentence_id)) > -> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 > rows=27471560 width=1993) > -> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72) > -> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72) > Sort Key: c.clause_id, c.source_id, c.sentence_id > -> Seq Scan on clause2 c (cost=0.00..770951.84 > rows=31853084 width=72) > > > > Indexes are : > > CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, > sentence_id); > CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, > sentence_id); > > I don't know why it not uses the index scan for clause2 table. How many rows contains clause2? The planner expected 167324179 returning rows, can you run the same explain with ANALYSE to see the real amount of returning rows? Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Thanks Andreas, I was about print the output but it takes too much time.
Below is the output of explain analyze command :
pdc_uima=# explain analyze select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053) (actual time=216281.162..630721.636 rows=30473117 loops=1)
Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
-> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993) (actual time=0.130..177599.310 rows=27471560 loops=1)
-> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72) (actual time=216280.596..370507.452 rows=52037763 loops=1)
-> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72) (actual time=216280.591..324707.956 rows=31853083 loops=1)
Sort Key: c.clause_id, c.source_id, c.sentence_id
Sort Method: external merge Disk: 2616520kB
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72) (actual time=0.025..25018.665 rows=31853083 loops=1)
Total runtime: 647804.037 ms
(9 rows)
Thanks , Adarsh
Andreas Kretschmer wrote:
Adarsh Sharma <adarsh.sharma@orkash.com> wrote:Dear all, Today I got to run a query internally from my application by more than 10 connections. But The query performed very badly. A the data size of tables are as : pdc_uima=# select pg_size_pretty(pg_total_relation_size('clause2')); pg_size_pretty ---------------- 5858 MB (1 row) pdc_uima=# select pg_size_pretty(pg_total_relation_size('svo2')); pg_size_pretty ---------------- 4719 MB (1 row) I explain the query as after making the indexes as : pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c. pdc_uima-# sentence_id=s.sentence_id ; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053) Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id)) -> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993) -> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72) -> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72) Sort Key: c.clause_id, c.source_id, c.sentence_id -> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72) Indexes are : CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id); CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id); I don't know why it not uses the index scan for clause2 table.How many rows contains clause2? The planner expected 167324179 returning rows, can you run the same explain with ANALYSE to see the real amount of returning rows? Andreas
On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
In this case, there are no predicates or filters on individual table. (maybe something like c.source_id=10)Dear all,
Today I got to run a query internally from my application by more than 10 connections.
But The query performed very badly. A the data size of tables are as :
pdc_uima=# select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)
pdc_uima=# select pg_size_pretty(pg_total_relation_size('svo2')); pg_size_pretty
----------------
4719 MB
(1 row)
I explain the query as after making the indexes as :
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
QUERY PLAN --------------------------------------------------------------------------------------------------------------
Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053)
Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
-> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993)
-> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72)
-> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72)
Sort Key: c.clause_id, c.source_id, c.sentence_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72)
Indexes are :
CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id);
CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id);
I don't know why it not uses the index scan for clause2 table.
so either of the 2 tables will have to go for simple scan.
Are you expecting seq. scan on svo2 and index scan on clause2?
--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.
Chetan Suttraway wrote:
As per the size consideration and the number of rows, I think index scan on clause2 is better.
Your constraint is valid but I need to perform this query faster.
What is the reason behind the seq scan of clause2.
Regards,
Adarsh
On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:In this case, there are no predicates or filters on individual table. (maybe something like c.source_id=10)Dear all,
Today I got to run a query internally from my application by more than 10 connections.
But The query performed very badly. A the data size of tables are as :
pdc_uima=# select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)
pdc_uima=# select pg_size_pretty(pg_total_relation_size('svo2')); pg_size_pretty
----------------
4719 MB
(1 row)
I explain the query as after making the indexes as :
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
QUERY PLAN --------------------------------------------------------------------------------------------------------------
Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053)
Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
-> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993)
-> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72)
-> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72)
Sort Key: c.clause_id, c.source_id, c.sentence_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72)
Indexes are :
CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id);
CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id);
I don't know why it not uses the index scan for clause2 table.
so either of the 2 tables will have to go for simple scan.
Are you expecting seq. scan on svo2 and index scan on clause2?
As per the size consideration and the number of rows, I think index scan on clause2 is better.
Your constraint is valid but I need to perform this query faster.
What is the reason behind the seq scan of clause2.
Regards,
Adarsh
Adarsh Sharma, 25.03.2011 07:51: > > Thanks Andreas, I was about print the output but it takes too much time. > > Below is the output of explain analyze command : > pdc_uima=# explain analyze select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_idand c. > pdc_uima-# sentence_id=s.sentence_id ; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------------- > Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053) (actual time=216281.162..630721.636 rows=30473117 loops=1) > Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id)) > -> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993) (actual time=0.130..177599.310rows=27471560 loops=1) > -> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72) (actual time=216280.596..370507.452 rows=52037763 loops=1) > -> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72) (actual time=216280.591..324707.956 rows=31853083 loops=1) > Sort Key: c.clause_id, c.source_id, c.sentence_id > Sort Method: external merge Disk: 2616520kB > -> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72) (actual time=0.025..25018.665 rows=31853083 loops=1) > Total runtime: 647804.037 ms > (9 rows) > > How many rows are there in clause2 in total? 31853084 rows are returned from that table which sounds like the whole table qualifies for the join condition. Regards Thomas
On Fri, Mar 25, 2011 at 12:39 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
As per the size consideration and the number of rows, I think index scan on clause2 is better.Chetan Suttraway wrote:On Fri, Mar 25, 2011 at 12:05 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:In this case, there are no predicates or filters on individual table. (maybe something like c.source_id=10)Dear all,
Today I got to run a query internally from my application by more than 10 connections.
But The query performed very badly. A the data size of tables are as :
pdc_uima=# select pg_size_pretty(pg_total_relation_size('clause2'));
pg_size_pretty
----------------
5858 MB
(1 row)
pdc_uima=# select pg_size_pretty(pg_total_relation_size('svo2')); pg_size_pretty
----------------
4719 MB
(1 row)
I explain the query as after making the indexes as :
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.
pdc_uima-# sentence_id=s.sentence_id ;
QUERY PLAN --------------------------------------------------------------------------------------------------------------
Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053)
Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = c.source_id) AND (s.sentence_id = c.sentence_id))
-> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 rows=27471560 width=1993)
-> Materialize (cost=5673828.74..6071992.29 rows=31853084 width=72)
-> Sort (cost=5673828.74..5753461.45 rows=31853084 width=72)
Sort Key: c.clause_id, c.source_id, c.sentence_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=72)
Indexes are :
CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id);
CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id);
I don't know why it not uses the index scan for clause2 table.
so either of the 2 tables will have to go for simple scan.
Are you expecting seq. scan on svo2 and index scan on clause2?
Your constraint is valid but I need to perform this query faster.
What is the reason behind the seq scan of clause2.
Regards,
Adarsh
Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.
Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
As per your instructions, Please check the below output :-
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
QUERY PLAN
---------------------------------------------------------------------------------
Hash Join (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
Hash Cond: (c.clause_id = s.clause_id)
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Hash (cost=697537.60..697537.60 rows=27471560 width=1993)
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
Merge Cond: (c.source_id = s.doc_id)
-> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.source_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Materialize (cost=38028881.02..38372275.52 rows=27471560 width=1993)
-> Sort (cost=38028881.02..38097559.92 rows=27471560 width=1993)
Sort Key: s.doc_id
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
Merge Cond: (c.sentence_id = s.sentence_id)
-> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.sentence_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Materialize (cost=38028881.02..38372275.52 rows=27471560 width=1993)
-> Sort (cost=38028881.02..38097559.92 rows=27471560 width=1993)
Sort Key: s.sentence_id
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)
Please let me know if any other information is required.
--
Best Regards,
Adarsh Sharma
On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
The ideas is to have maximum filtering occuring on leading column of index.As per your instructions, Please check the below output :-
Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
QUERY PLAN
---------------------------------------------------------------------------------
Hash Join (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
Hash Cond: (c.clause_id = s.clause_id)
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Hash (cost=697537.60..697537.60 rows=27471560 width=1993)
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
Merge Cond: (c.source_id = s.doc_id)
-> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.source_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Materialize (cost=38028881.02..38372275.52 rows=27471560 width=1993)
-> Sort (cost=38028881.02..38097559.92 rows=27471560 width=1993)
Sort Key: s.doc_id
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
Merge Cond: (c.sentence_id = s.sentence_id)
-> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.sentence_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Materialize (cost=38028881.02..38372275.52 rows=27471560 width=1993)
-> Sort (cost=38028881.02..38097559.92 rows=27471560 width=1993)
Sort Key: s.sentence_id
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)
Please let me know if any other information is required.
--
Best Regards,
Adarsh Sharma
the first plan with only the predicates on clause_id is returning 379772050555842 rows whereas
in the second plan with doc_id predicates is returning only 20954686217.
So maybe you should consider re-ordering of the index on clause2.
I am thinking that you created the indexes by looking at the columns used in the where clause.
But its not always helpful to create indexes based on exact order of predicates specified in query.
Instead the idea should be consider the predicate which is going to do filter out the results.
Likewise we should consider all possible uses of index columns across all queries and then decide on the
order of columns for the composite index to be created.
Whats your take on this?
--
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.
>> Merge Join (cost=5673831.05..34033959.87 rows=167324179 width=2053) >> Merge Cond: ((s.clause_id = c.clause_id) AND (s.doc_id = >> c.source_id) AND (s.sentence_id = c.sentence_id)) >> -> Index Scan using idx_svo2 on svo2 s (cost=0.00..24489343.65 >> rows=27471560 width=1993) >> -> Materialize (cost=5673828.74..6071992.29 rows=31853084 >> width=72) >> -> Sort (cost=5673828.74..5753461.45 rows=31853084 >> width=72) >> Sort Key: c.clause_id, c.source_id, c.sentence_id >> -> Seq Scan on clause2 c (cost=0.00..770951.84 >> rows=31853084 width=72) >> > > As per the size consideration and the number of rows, I think index scan > on clause2 is better. I really doubt that - using index usually involves a lot of random I/O and that makes slow with a lot of rows. And that's exactly this case, as there are 27471560 rows in the first table. You can force the planner to use different plan by disabling merge join, just set set enable_mergejoin = false and see what happens. There are other similar options: http://www.postgresql.org/docs/8.4/static/runtime-config-query.html And yet another option - you can try to mangle with the cost constants, namely seq_page_cost and random_page_cost. Decreasing random_page_cost (default is 4) makes index scans cheaper, so it's more likely the planner will choose them. Tomas
Chetan Suttraway wrote:
I am sorry but I am not able to got your points completely.
My table definitions are as :
Clause2 Table :
CREATE TABLE clause2
(
id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
source_id integer,
sentence_id integer,
clause_id integer,
tense character varying(30),
clause text,
CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id);
svo2 table :--
CREATE TABLE svo2
(
svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
doc_id integer,
sentence_id integer,
clause_id integer,
negation integer,
subject character varying(3000),
verb character varying(3000),
"object" character varying(3000),
preposition character varying(3000),
subject_type character varying(3000),
object_type character varying(3000),
subject_attribute character varying(3000),
object_attribute character varying(3000),
verb_attribute character varying(3000),
subject_concept character varying(100),
object_concept character varying(100),
subject_sense character varying(100),
object_sense character varying(100),
subject_chain character varying(5000),
object_chain character varying(5000),
sub_type_id integer,
obj_type_id integer,
CONSTRAINT pk_svo_demo_id PRIMARY KEY (svo_id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id);
Please correct me if I m wrong.
I need to change the order of columns in indexes according to the filter conditions but in this query .
After making
I can change my original query to :
explain analyze select c.clause,s.doc_id,s.subject,s.verb,s.object,s.subject_type,s.object_type from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.sentence_id=s.sentence_id ;
And the output is :
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..128419720.68 rows=167324179 width=105) (actual time=11.179..285708.966 rows=30473117 loops=1)
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=53) (actual time=0.013..19554.222 rows=27471560 loops=1)
-> Index Scan using idx_clause on clause2 c (cost=0.00..4.63 rows=1 width=72) (actual time=0.006..0.007 rows=1 loops=27471560)
Index Cond: ((c.clause_id = s.clause_id) AND (c.source_id = s.doc_id) AND (c.sentence_id = s.sentence_id))
Total runtime: 301599.274 ms
Thanks & best Regards,
Adarsh Sharma
On Fri, Mar 25, 2011 at 2:25 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:The ideas is to have maximum filtering occuring on leading column of index.As per your instructions, Please check the below output :-
Could you please post output of below queries:
explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.clause_id=s.clause_id;
QUERY PLAN
---------------------------------------------------------------------------------
Hash Join (cost=7828339.10..4349603998133.96 rows=379772050555842 width=2053)
Hash Cond: (c.clause_id = s.clause_id)
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Hash (cost=697537.60..697537.60 rows=27471560 width=1993)
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(5 rows)
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where s.doc_id=c.source_id;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=43635232.12..358368926.66 rows=20954686217 width=2053)
Merge Cond: (c.source_id = s.doc_id)
-> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.source_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Materialize (cost=38028881.02..38372275.52 rows=27471560 width=1993)
-> Sort (cost=38028881.02..38097559.92 rows=27471560 width=1993)
Sort Key: s.doc_id
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)
pdc_uima=# explain select c.clause, s.* from clause2 c, svo2 s where c.sentence_id=s.sentence_id ;
QUERY PLAN
---------------------------------------------------------------------------------------
Merge Join (cost=43711844.03..241541026048.10 rows=PLeaswidth=2053)
Merge Cond: (c.sentence_id = s.sentence_id)
-> Sort (cost=5596061.24..5675693.95 rows=31853084 width=64)
Sort Key: c.sentence_id
-> Seq Scan on clause2 c (cost=0.00..770951.84 rows=31853084 width=64)
-> Materialize (cost=38028881.02..38372275.52 rows=27471560 width=1993)
-> Sort (cost=38028881.02..38097559.92 rows=27471560 width=1993)
Sort Key: s.sentence_id
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=1993)
(9 rows)
Please let me know if any other information is required.
--
Best Regards,
Adarsh Sharma
the first plan with only the predicates on clause_id is returning 379772050555842 rows whereas
in the second plan with doc_id predicates is returning only 20954686217.
So maybe you should consider re-ordering of the index on clause2.
I am thinking that you created the indexes by looking at the columns used in the where clause.
But its not always helpful to create indexes based on exact order of predicates specified in query.
Instead the idea should be consider the predicate which is going to do filter out the results.
Likewise we should consider all possible uses of index columns across all queries and then decide on the
order of columns for the composite index to be created.
Whats your take on this?
I am sorry but I am not able to got your points completely.
My table definitions are as :
Clause2 Table :
CREATE TABLE clause2
(
id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
source_id integer,
sentence_id integer,
clause_id integer,
tense character varying(30),
clause text,
CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX idx_clause ON clause2 USING btree (clause_id, source_id, sentence_id);
svo2 table :--
CREATE TABLE svo2
(
svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
doc_id integer,
sentence_id integer,
clause_id integer,
negation integer,
subject character varying(3000),
verb character varying(3000),
"object" character varying(3000),
preposition character varying(3000),
subject_type character varying(3000),
object_type character varying(3000),
subject_attribute character varying(3000),
object_attribute character varying(3000),
verb_attribute character varying(3000),
subject_concept character varying(100),
object_concept character varying(100),
subject_sense character varying(100),
object_sense character varying(100),
subject_chain character varying(5000),
object_chain character varying(5000),
sub_type_id integer,
obj_type_id integer,
CONSTRAINT pk_svo_demo_id PRIMARY KEY (svo_id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX idx_svo2 ON svo2 USING btree (clause_id, doc_id, sentence_id);
Please correct me if I m wrong.
I need to change the order of columns in indexes according to the filter conditions but in this query .
After making
set enable_mergejoin = false and random_page_cost =2.0 The problem remains the same.What is your recommendations for the new index so that the query runs even faster.
I can change my original query to :
explain analyze select c.clause,s.doc_id,s.subject,s.verb,s.object,s.subject_type,s.object_type from clause2 c, svo2 s where c.clause_id=s.clause_id and s.doc_id=c.source_id and c.sentence_id=s.sentence_id ;
And the output is :
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..128419720.68 rows=167324179 width=105) (actual time=11.179..285708.966 rows=30473117 loops=1)
-> Seq Scan on svo2 s (cost=0.00..697537.60 rows=27471560 width=53) (actual time=0.013..19554.222 rows=27471560 loops=1)
-> Index Scan using idx_clause on clause2 c (cost=0.00..4.63 rows=1 width=72) (actual time=0.006..0.007 rows=1 loops=27471560)
Index Cond: ((c.clause_id = s.clause_id) AND (c.source_id = s.doc_id) AND (c.sentence_id = s.sentence_id))
Total runtime: 301599.274 ms
Thanks & best Regards,
Adarsh Sharma
Regards,
Chetan Suttraway
EnterpriseDB, The Enterprise PostgreSQL company.
On 03/25/2011 04:07 AM, Chetan Suttraway wrote: > The ideas is to have maximum filtering occuring on leading column of index. > the first plan with only the predicates on clause_id is returning > 379772050555842 rows whereas > in the second plan with doc_id predicates is returning only 20954686217. > > So maybe you should consider re-ordering of the index on clause2. That won't really help him. He's joining a 27M row table against a 31M row table with basically no WHERE clause. We can see that because he's getting 30M rows back in the EXPLAIN ANALYZE. At that point, it doesn't really matter which table gets index scanned. This query will *always* take several minutes to execute. It would be completely different if he only wanted to get the results for *one* source. Or *one* sentence. But getting all of them ever stored will just take forever. > I am sorry but I am not able to got your points completely. He just means that indexes work better if they're placed in order of selectivity. In your case, it seems sentence_id restricts the result set better than clause_id. So Chetan suggested remaking your indexes to be this instead: CREATE INDEX idx_clause ON clause2 USING btree (sentence_id, clause_id, source_id); CREATE INDEX idx_svo2 ON svo2 USING btree (sentence_id, clause_id, doc_id); This *might* help. But your fundamental problem is that you're joining two giant tables with no clause to limit the result set. If you were only getting back 10,000 rows, or even a million rows, your query could execute in a fraction of the time. But joining every row in both tables and returning a 30-million row result set isn't going to be fun for anyone. Are you actually processing all 30-million rows you get back? Storing them somewhere? -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@peak6.com ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
To expand on what Shaun said: > But your fundamental problem is that you're joining two > giant tables with no clause to limit the result set. If you were only > getting back 10,000 rows, or even a million rows, your query could execute > in a fraction of the time. But joining every row in both tables and > returning a 30-million row result set isn't going to be fun for anyone. Indexes aren't a magical performance fairy dust. An index gives you a way to look up a single row directly (you can't do that with a scan), but it's a terrible way to look up 90% (or even 50%) of the rows in a table, because the per-row cost of lookup is actually higher than in a scan. That is, once you need to look up more than a certain percentage of rows in a table, it's actually cheaper to scan it and ignore what you don't care about rather than going through the index for each row. It looks like your query is hitting this situation. Try turning off the merge join, as Tomas suggested, to validate the assumption that using the index would actually be worse. To resolve your problem, you shouldn't be trying to make the planner pick a better plan, you should optimize your settings to get this plan to perform better or (ideally) optimize your application so you don't need such an expensive query (because the fundamental problem is that this query is inherently expensive). --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
On 03/25/2011 12:49 PM, Maciek Sakrejda wrote: > Indexes aren't a magical performance fairy dust. One day I intend to use this line for the title of a presentation slide. Maybe the title of the whole talk. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books