Thread: Estimation row error
Hello,
I would like to know how row estimation is calculed by explain ?
In my execution plan, this estimation is extremely wrong (267 instead of 198000)
I reproduced this estimation error in this simple case :
drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;
create table t1 as select generate_Series(1,300000) as c1;
create table t2 as select generate_Series(1,400) as c1;
create table t3 as select generate_Series(1,200000)%100 as c1,generate_Series(1,200000) as c2;
create table t4 as select generate_Series(1,200000) as c1;
alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);
analyze t1;
analyze t2;
analyze t3;
analyze t4;
EXPLAIN (analyze on, buffers on, verbose on)
select
*
from
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1
Explain plan :
I think this error may be problematic because planner will choose nested loop instead of hash joins for ultimate join. Can you help me to improve this row estimation ?
Thank you for answering
Best Regards,
Sorry, I forget to precise Postgresql version
BR
Mathieu VINCENT
2015-12-11 9:53 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Hello,I would like to know how row estimation is calculed by explain ?In my execution plan, this estimation is extremely wrong (267 instead of 198000)I reproduced this estimation error in this simple case :drop table if exists t1;drop table if exists t2;drop table if exists t3;drop table if exists t4;create table t1 as select generate_Series(1,300000) as c1;create table t2 as select generate_Series(1,400) as c1;create table t3 as select generate_Series(1,200000)%100 as c1,generate_Series(1,200000) as c2;create table t4 as select generate_Series(1,200000) as c1;alter table t1 add PRIMARY KEY (c1);alter table t2 add PRIMARY KEY (c1);alter table t3 add PRIMARY KEY (c1,c2);create index on t3 (c1);create index on t3 (c2);alter table t4 add PRIMARY KEY (c1);analyze t1;analyze t2;analyze t3;analyze t4;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1inner join t4 on t3.c2=t4.c1Explain plan :I think this error may be problematic because planner will choose nested loop instead of hash joins for ultimate join. Can you help me to improve this row estimation ?Thank you for answeringBest Regards,
Hello,
Mathieu VINCENT
No one to help me to understand this bad estimation rows ?
Mathieu VINCENT
2015-12-11 12:35 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Sorry, I forget to precise Postgresql version'PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit'
BRMathieu VINCENT
2015-12-11 9:53 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:Hello,I would like to know how row estimation is calculed by explain ?In my execution plan, this estimation is extremely wrong (267 instead of 198000)I reproduced this estimation error in this simple case :drop table if exists t1;drop table if exists t2;drop table if exists t3;drop table if exists t4;create table t1 as select generate_Series(1,300000) as c1;create table t2 as select generate_Series(1,400) as c1;create table t3 as select generate_Series(1,200000)%100 as c1,generate_Series(1,200000) as c2;create table t4 as select generate_Series(1,200000) as c1;alter table t1 add PRIMARY KEY (c1);alter table t2 add PRIMARY KEY (c1);alter table t3 add PRIMARY KEY (c1,c2);create index on t3 (c1);create index on t3 (c2);alter table t4 add PRIMARY KEY (c1);analyze t1;analyze t2;analyze t3;analyze t4;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1inner join t4 on t3.c2=t4.c1Explain plan :I think this error may be problematic because planner will choose nested loop instead of hash joins for ultimate join. Can you help me to improve this row estimation ?Thank you for answeringBest Regards,
Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT: > Hello, > > No one to help me to understand this bad estimation rows ? Well, on a rather beefy machine, I'm getting quite a different plan: http://explain.depesz.com/s/3y5r Which may be related to this setting: perftest=# show default_statistics_target ; default_statistics_target --------------------------- 1000 (1 Zeile) I guess the wrong row assumption (which I get as well!) is caused by the given correlation of t3.c1 and t3.c2 (which the planner doesn't "see"). Tomas Vondra has written a nice blog post, covering that topic as well: http://blog.pgaddict.com/posts/common-issues-with-planner-statistics AFAIK, 9.5 has received some improvements in that field, but I didn't try that yet. Best regards, Nick > > Mathieu VINCENT > > 2015-12-11 12:35 GMT+01:00 Mathieu VINCENT > <mathieu.vincent@pmsipilot.com <mailto:mathieu.vincent@pmsipilot.com>>: > > Sorry, I forget to precise Postgresql version > > 'PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) > 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit' > > > BR > > Mathieu VINCENT > > > > > 2015-12-11 9:53 GMT+01:00 Mathieu VINCENT > <mathieu.vincent@pmsipilot.com <mailto:mathieu.vincent@pmsipilot.com>>: > > Hello, > > I would like to know how row estimation is calculed by explain ? > In my execution plan, this estimation is extremely wrong (267 > instead of 198000) > I reproduced this estimation error in this simple case : > > drop table if exists t1; > drop table if exists t2; > drop table if exists t3; > drop table if exists t4; > > create table t1 as select generate_Series(1,300000) as c1; > create table t2 as select generate_Series(1,400) as c1; > create table t3 as select generate_Series(1,200000)%100 as > c1,generate_Series(1,200000) as c2; > create table t4 as select generate_Series(1,200000) as c1; > > alter table t1 add PRIMARY KEY (c1); > alter table t2 add PRIMARY KEY (c1); > alter table t3 add PRIMARY KEY (c1,c2); > create index on t3 (c1); > create index on t3 (c2); > alter table t4 add PRIMARY KEY (c1); > > analyze t1; > analyze t2; > analyze t3; > analyze t4; > > EXPLAIN (analyze on, buffers on, verbose on) > select > * > from > t1 t1 > inner join t2 on t1.c1=t2.c1 > inner join t3 on t2.c1=t3.c1 > inner join t4 on t3.c2=t4.c1 > > Explain plan : > http://explain.depesz.com/s/wZ3v > > I think this error may be problematic because planner will > choose nested loop instead of hash joins for ultimate join. Can > you help me to improve this row estimation ? > > Thank you for answering > > Best Regards, > <http://www.psih.fr/> PSIH Décisionnel en santé > Mathieu VINCENT > Data Analyst > PMSIpilot - 61 rue Sully - 69006 Lyon - France > > > -- Gunnar "Nick" Bluth DBA ELSTER Tel: +49 911/991-4665 Mobil: +49 172/8853339
Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote: > Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT: > > Hello, > > > > No one to help me to understand this bad estimation rows ? > > Well, > > on a rather beefy machine, I'm getting quite a different plan: > http://explain.depesz.com/s/3y5r you are using 9.5, right? Got the same plan with 9.5. Btw.: Hi Gunnar ;-) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer: > Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote: > >> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT: >>> Hello, >>> >>> No one to help me to understand this bad estimation rows ? >> >> Well, >> >> on a rather beefy machine, I'm getting quite a different plan: >> http://explain.depesz.com/s/3y5r > > you are using 9.5, right? Got the same plan with 9.5. Nope...: version ------------------------------------------------------------------------------------------------------------ PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit So much for those correlation improvements then ;-/ > Btw.: Hi Gunnar ;-) Hi :) -- Gunnar "Nick" Bluth DBA ELSTER Tel: +49 911/991-4665 Mobil: +49 172/8853339
thks Gunnar,
Have a good day
I removed the correlation between t3.c1 and t3.c2 in this sql script :
drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;
create table t1 as select generate_Series(1,300000) as c1;
create table t2 as select generate_Series(1,400) as c1;
create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;
create table t4 as select generate_Series(1,200000) as c1;
alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);
analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
analyze verbose t4;
EXPLAIN (analyze on, buffers on, verbose on)
select
*
from
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1
Now, the estimate is good : http://explain.depesz.com/s/gCX
Have a good day
Mathieu VINCENT
2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.
Nope...:
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
So much for those correlation improvements then ;-/
> Btw.: Hi Gunnar ;-)
Hi :)
--
Gunnar "Nick" Bluth
DBA ELSTER
Tel: +49 911/991-4665
Mobil: +49 172/8853339--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Thank you both for the help!
happy holidays
2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
thks Gunnar,I removed the correlation between t3.c1 and t3.c2 in this sql script :drop table if exists t1;drop table if exists t2;drop table if exists t3;drop table if exists t4;create table t1 as select generate_Series(1,300000) as c1;create table t2 as select generate_Series(1,400) as c1;create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;create table t4 as select generate_Series(1,200000) as c1;alter table t1 add PRIMARY KEY (c1);alter table t2 add PRIMARY KEY (c1);alter table t3 add PRIMARY KEY (c1,c2);create index on t3 (c1);create index on t3 (c2);alter table t4 add PRIMARY KEY (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;analyze verbose t4;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1inner join t4 on t3.c2=t4.c1Now, the estimate is good : http://explain.depesz.com/s/gCX
Have a good dayMathieu VINCENT2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.
Nope...:
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
So much for those correlation improvements then ;-/
> Btw.: Hi Gunnar ;-)
Hi :)
--
Gunnar "Nick" Bluth
DBA ELSTER
Tel: +49 911/991-4665
Mobil: +49 172/8853339--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Here, another issue with row estimate.
And, in this example, there is not correlation beetween columns in a same table.
drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
create table t1 as select generate_Series(1,200000) as c1;
create table t2 as select generate_Series(1,200000)%100 as c1;
create table t3 as select generate_Series(1,1500)%750 as c1;
alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);
analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
EXPLAIN (analyze on, buffers on, verbose on)
select
*
from
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
the explain plan : http://explain.depesz.com/s/YVw
Do you understand how postgresql calculate the row estimate ?
BR
BR
Mathieu VINCENT
2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:
Thank you both for the help!happy holidays2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:thks Gunnar,I removed the correlation between t3.c1 and t3.c2 in this sql script :drop table if exists t1;drop table if exists t2;drop table if exists t3;drop table if exists t4;create table t1 as select generate_Series(1,300000) as c1;create table t2 as select generate_Series(1,400) as c1;create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;create table t4 as select generate_Series(1,200000) as c1;alter table t1 add PRIMARY KEY (c1);alter table t2 add PRIMARY KEY (c1);alter table t3 add PRIMARY KEY (c1,c2);create index on t3 (c1);create index on t3 (c2);alter table t4 add PRIMARY KEY (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;analyze verbose t4;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1inner join t4 on t3.c2=t4.c1Now, the estimate is good : http://explain.depesz.com/s/gCX
Have a good dayMathieu VINCENT2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.
Nope...:
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
So much for those correlation improvements then ;-/
> Btw.: Hi Gunnar ;-)
Hi :)
--
Gunnar "Nick" Bluth
DBA ELSTER
Tel: +49 911/991-4665
Mobil: +49 172/8853339--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Adding foreign key between on t2 and t3, does not change the plan.
drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
create table t1 as select generate_Series(1,200000) as c1;
create table t2 as select generate_Series(1,200000)%100+1 as c1;
create table t3 as select generate_Series(1,1500)%750+1 as c1;
alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);
ALTER TABLE t2 ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);
ALTER TABLE t3 ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);
analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
EXPLAIN (analyze on, buffers on, verbose on)
select
*
from
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t1.c1=t3.c1
2015-12-17 11:37 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Here, another issue with row estimate.And, in this example, there is not correlation beetween columns in a same table.drop table if exists t1;drop table if exists t2;drop table if exists t3;create table t1 as select generate_Series(1,200000) as c1;create table t2 as select generate_Series(1,200000)%100 as c1;create table t3 as select generate_Series(1,1500)%750 as c1;alter table t1 add PRIMARY KEY (c1);create index on t2 (c1);create index on t3 (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1the explain plan : http://explain.depesz.com/s/YVwDo you understand how postgresql calculate the row estimate ?
BRMathieu VINCENT2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:Thank you both for the help!happy holidays2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:thks Gunnar,I removed the correlation between t3.c1 and t3.c2 in this sql script :drop table if exists t1;drop table if exists t2;drop table if exists t3;drop table if exists t4;create table t1 as select generate_Series(1,300000) as c1;create table t2 as select generate_Series(1,400) as c1;create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;create table t4 as select generate_Series(1,200000) as c1;alter table t1 add PRIMARY KEY (c1);alter table t2 add PRIMARY KEY (c1);alter table t3 add PRIMARY KEY (c1,c2);create index on t3 (c1);create index on t3 (c2);alter table t4 add PRIMARY KEY (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;analyze verbose t4;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1inner join t4 on t3.c2=t4.c1Now, the estimate is good : http://explain.depesz.com/s/gCX
Have a good dayMathieu VINCENT2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.
Nope...:
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
So much for those correlation improvements then ;-/
> Btw.: Hi Gunnar ;-)
Hi :)
--
Gunnar "Nick" Bluth
DBA ELSTER
Tel: +49 911/991-4665
Mobil: +49 172/8853339--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Hello,
No one to help me to understand this bad estimation rows ?
It's NOT caused by :
- correlation between columns (cross-correlation)
- bad statistics (i tried with default_statistics_target to 10 000)
- bad number of distinct values
- complexe join conditions
I have no more ideas.
thank you for your help.
Mathieu VINCENT
2015-12-17 11:58 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Adding foreign key between on t2 and t3, does not change the plan.drop table if exists t1;drop table if exists t2;drop table if exists t3;create table t1 as select generate_Series(1,200000) as c1;create table t2 as select generate_Series(1,200000)%100+1 as c1;create table t3 as select generate_Series(1,1500)%750+1 as c1;alter table t1 add PRIMARY KEY (c1);create index on t2 (c1);create index on t3 (c1);ALTER TABLE t2 ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);ALTER TABLE t3 ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t1.c1=t3.c12015-12-17 11:37 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:Here, another issue with row estimate.And, in this example, there is not correlation beetween columns in a same table.drop table if exists t1;drop table if exists t2;drop table if exists t3;create table t1 as select generate_Series(1,200000) as c1;create table t2 as select generate_Series(1,200000)%100 as c1;create table t3 as select generate_Series(1,1500)%750 as c1;alter table t1 add PRIMARY KEY (c1);create index on t2 (c1);create index on t3 (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1the explain plan : http://explain.depesz.com/s/YVwDo you understand how postgresql calculate the row estimate ?
BRMathieu VINCENT2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:Thank you both for the help!happy holidays2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:thks Gunnar,I removed the correlation between t3.c1 and t3.c2 in this sql script :drop table if exists t1;drop table if exists t2;drop table if exists t3;drop table if exists t4;create table t1 as select generate_Series(1,300000) as c1;create table t2 as select generate_Series(1,400) as c1;create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;create table t4 as select generate_Series(1,200000) as c1;alter table t1 add PRIMARY KEY (c1);alter table t2 add PRIMARY KEY (c1);alter table t3 add PRIMARY KEY (c1,c2);create index on t3 (c1);create index on t3 (c2);alter table t4 add PRIMARY KEY (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;analyze verbose t4;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1inner join t4 on t3.c2=t4.c1Now, the estimate is good : http://explain.depesz.com/s/gCX
Have a good dayMathieu VINCENT2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.
Nope...:
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
So much for those correlation improvements then ;-/
> Btw.: Hi Gunnar ;-)
Hi :)
--
Gunnar "Nick" Bluth
DBA ELSTER
Tel: +49 911/991-4665
Mobil: +49 172/8853339--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Hi
2015-12-18 16:21 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Hello,No one to help me to understand this bad estimation rows ?It's NOT caused by :
- correlation between columns (cross-correlation)
- bad statistics (i tried with default_statistics_target to 10 000)
- bad number of distinct values
- complexe join conditions
I have no more ideas.
PostgreSQL has not cross tables statistics - so expect uniform distribution of foreign keys. This expectation is broken in your example.
You can find some prototype solutions by Tomas Vondra in hackars mailing list.
Regards
Pavel
thank you for your help.Mathieu VINCENT2015-12-17 11:58 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:Adding foreign key between on t2 and t3, does not change the plan.drop table if exists t1;drop table if exists t2;drop table if exists t3;create table t1 as select generate_Series(1,200000) as c1;create table t2 as select generate_Series(1,200000)%100+1 as c1;create table t3 as select generate_Series(1,1500)%750+1 as c1;alter table t1 add PRIMARY KEY (c1);create index on t2 (c1);create index on t3 (c1);ALTER TABLE t2 ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);ALTER TABLE t3 ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t1.c1=t3.c12015-12-17 11:37 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:Here, another issue with row estimate.And, in this example, there is not correlation beetween columns in a same table.drop table if exists t1;drop table if exists t2;drop table if exists t3;create table t1 as select generate_Series(1,200000) as c1;create table t2 as select generate_Series(1,200000)%100 as c1;create table t3 as select generate_Series(1,1500)%750 as c1;alter table t1 add PRIMARY KEY (c1);create index on t2 (c1);create index on t3 (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1the explain plan : http://explain.depesz.com/s/YVwDo you understand how postgresql calculate the row estimate ?
BRMathieu VINCENT2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:Thank you both for the help!happy holidays2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:thks Gunnar,I removed the correlation between t3.c1 and t3.c2 in this sql script :drop table if exists t1;drop table if exists t2;drop table if exists t3;drop table if exists t4;create table t1 as select generate_Series(1,300000) as c1;create table t2 as select generate_Series(1,400) as c1;create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;create table t4 as select generate_Series(1,200000) as c1;alter table t1 add PRIMARY KEY (c1);alter table t2 add PRIMARY KEY (c1);alter table t3 add PRIMARY KEY (c1,c2);create index on t3 (c1);create index on t3 (c2);alter table t4 add PRIMARY KEY (c1);analyze verbose t1;analyze verbose t2;analyze verbose t3;analyze verbose t4;EXPLAIN (analyze on, buffers on, verbose on)select*fromt1 t1inner join t2 on t1.c1=t2.c1inner join t3 on t2.c1=t3.c1inner join t4 on t3.c2=t4.c1Now, the estimate is good : http://explain.depesz.com/s/gCX
Have a good dayMathieu VINCENT2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.
Nope...:
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
So much for those correlation improvements then ;-/
> Btw.: Hi Gunnar ;-)
Hi :)
--
Gunnar "Nick" Bluth
DBA ELSTER
Tel: +49 911/991-4665
Mobil: +49 172/8853339--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance