Terrible perfomance during nested "... where x in (select ...)" operator - Mailing list pgsql-bugs
From | Черепанов Леонид |
---|---|
Subject | Terrible perfomance during nested "... where x in (select ...)" operator |
Date | |
Msg-id | D078291436B0D411B08800805F6D1315073F13@fantom.tcnet.ru Whole thread Raw |
Responses |
Re: Terrible perfomance during nested "... where x in (select
...)" operator
|
List | pgsql-bugs |
Leonid (leo@rusmedia.net) reports a bug with a severity of 2(?) Short Description Terrible perfomance during nested "... where x in (select ...)" operator Long Description PostgreSQL 7.1, FreeBSD 4.2-STABLE Analyzing the reasons for terrible perfomance of my query I've found a very strange thing. Here is sublimation. Queries likeselect distinct i from t1 where i in (select distinct i from t2 where j in (select distinct j from t3)); is much-much slower thanselect distinct t1.i from t1,t2,t3 where t1.i=t2.i and t2.j=t3.j; EXPLAIN results. The 1st one: NOTICE: QUERY PLAN: Unique (cost=0.00..62059410444.36 rows=15000 width=4) -> Index Scan using t1_i on t1 (cost=0.00..62059410069.36 rows=150000 width=4) SubPlan -> Materialize (cost=413729.34..413729.34 rows=400 width=4) -> Unique (cost=0.00..413729.34rows=400 width=4) -> Index Scan using t2_i on t2 (cost=0.00..413719.34 rows=4000 width=4) SubPlan -> Materialize (cost=103.38..103.38 rows=225 width=4) -> Unique (cost=0.00..103.38 rows=225 width=4) -> Index Scan using t3_j on t3 (cost=0.00..97.76 rows=2250 wid th=4) EXPLAIN And the second: NOTICE: QUERY PLAN: Unique (cost=362.62..10472.26 rows=2768 width=16) -> Merge Join (cost=362.62..10403.06 rows=27681 width=16) -> Index Scan using t1_i on t1 (cost=0.00..8145.36 rows=150000 width=4) -> Sort (cost=362.62..362.62 rows=1606 width=12) -> Hash Join (cost=102.63..277.13 rows=1606width=12) -> Seq Scan on t3 (cost=0.00..34.50 rows=2250 width=4) -> Hash (cost=62.00..62.00 rows=4000 width=8) -> Seq Scan on t2 (cost=0.00..62.00 rows=4000 width=8) EXPLAIN *** IMPORTANT !!! *** VACUUM and VACUUM ANALIZY make things even worse; NOTICE: QUERY PLAN: Unique (cost=0.00..186176903970.36 rows=15000 width=4) -> Index Scan using t1_i on t1 (cost=0.00..186176903595.36 rows=150000 width=4) SubPlan -> Materialize (cost=1241179.30..1241179.30 rows=1200 width=4) -> Unique (cost=0.00..1241179.30 rows=1200 width=4) -> Index Scan using t2_i on t2 (cost=0.00..1241149.30 rows=12000 width=4) SubPlan -> Materialize (cost=103.38..103.38 rows=225 width=4) -> Unique (cost=0.00..103.38 rows=225 width=4) -> Index Scan using t3_j on t3 (cost=0.00..97.76 rows=2250 wid th=4) EXPLAIN NOTICE: QUERY PLAN: Unique (cost=1027.91..11289.49 rows=7240 width=16) -> Merge Join (cost=1027.91..11108.49 rows=72401 width=16) -> Index Scan using t1_i on t1 (cost=0.00..8145.36 rows=150000 width=4) -> Sort (cost=1027.91..1027.91 rows=4817 width=12) -> Hash Join (cost=40.12..733.25 rows=4817width=12) -> Seq Scan on t2 (cost=0.00..185.00 rows=12000 width=8) -> Hash (cost=34.50..34.50 rows=2250 width=4) -> Seq Scan on t3 (cost=0.00..34.50rows=2250 width=4) EXPLAIN Real working times differ 8 TIMES ! Here is a perl script, which generates a PgSQL files with sample tables and datadump. The "scale" parameter controls the amount of data, generated by the script. Recommended "scale" value is 0.1 for typical computer. --------------------------8<-------------------------- open (F, "> create"); print F (<<CREATE); drop table t1; drop table t2; drop table t3; create table t1 (i int); create table t2 (i int, j int); create table t3 (j int); create index t1_i on t1 (i); create index t2_i on t2 (i); create index t2_j on t2 (j); create index t3_j on t3 (j); CREATE open (F, "> compare"); print F (<<COMPARE); explainselect count(distinct i) from t1 where i in (select distinct i from t2 where j in (select distinct j from t3)); select now();select count(distinct i) from t1 where i in (select distinct i from t2 where j in (select distinct j from t3));select now(); explain select count(distinct t1.i) from t1,t2,t3 where t1.i=t2.i and t2.j=t3.j; select now();select count(distinct t1.i) from t1,t2,t3 where t1.i=t2.i and t2.j=t3.j;select now(); explainselect count(distinct t1.i) from t1 inner join (select distinct t2.i from t2 inner join (select distinct j from t3)as tt3 on tt3.j=t2.j ) as tt2 on tt2.i=t1.i; select now();select count(distinct t1.i) from t1 inner join (select distinct t2.i from t2 inner join (select distinct j from t3) as tt3 on tt3.j=t2.j ) as tt2 on tt2.i=t1.i;select now(); COMPARE print ("Scale: "); $m=<>; chomp $m; open (F, "> data." . $m); open (F, ">> data." . $m); print F ("copy t1 from stdin;\n"); for($k=0;$k<300000*$m;$k++) {$r = int ( rand(4000*$m) );print F ("$r\n"); } print F (<<T1); \\. select count(*) as t1_rows from t1; select count(distinct i) as t1_dist_rows from t1; T1 print F ("\ncopy t2 from stdin;\n"); for($k=0;$k<8000*$m;$k++) {$r = int ( rand(4000*$m) );print F ("$r\t$k\n"); } print F (<<T2); \\. select count(*) as t2_rows from t2; select count(distinct i) as t2_dist_rows from t2; T2 print F ("\ncopy t3 from stdin;\n"); for($k=0;$k<4500*$m;$k++) {$r = int ( rand(8000*$m) );print F ("$r\n"); } print F (<<T3); \\. select count(*) as t3_rows from t3; select count(distinct j) as t3_dist_rows from t3; T3 --------------------------8<--------------------------
pgsql-bugs by date: