Thread: Estimation row error

Estimation row error

From
Mathieu VINCENT
Date:
Hello,

I would like to know how row estimation is calculed by explain ?
In my execution plan, this estimation is extremely wrong (267 instead of 198000)
I reproduced this estimation error in this simple case :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select generate_Series(1,200000)%100 as c1,generate_Series(1,200000) as c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze t1;
analyze t2;
analyze t3;
analyze t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Explain plan :

I think this error may be problematic because planner will choose nested loop instead of hash joins for ultimate join. Can you help me to improve this row estimation ? 

Thank you for answering

Best Regards,
PSIH Décisionnel en santé
Mathieu VINCENT 
Data Analyst
PMSIpilot - 61 rue Sully - 69006 Lyon - France

Re: Estimation row error

From
Mathieu VINCENT
Date:
Sorry, I forget to precise Postgresql version

'PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit'


BR

Mathieu VINCENT



2015-12-11 9:53 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Hello,

I would like to know how row estimation is calculed by explain ?
In my execution plan, this estimation is extremely wrong (267 instead of 198000)
I reproduced this estimation error in this simple case :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select generate_Series(1,200000)%100 as c1,generate_Series(1,200000) as c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze t1;
analyze t2;
analyze t3;
analyze t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Explain plan :

I think this error may be problematic because planner will choose nested loop instead of hash joins for ultimate join. Can you help me to improve this row estimation ? 

Thank you for answering

Best Regards,
PSIH Décisionnel en santé
Mathieu VINCENT 
Data Analyst
PMSIpilot - 61 rue Sully - 69006 Lyon - France

Re: Estimation row error

From
Mathieu VINCENT
Date:
Hello,

No one to help me to understand this bad estimation rows ?

Mathieu VINCENT

2015-12-11 12:35 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Sorry, I forget to precise Postgresql version

'PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit'


BR

Mathieu VINCENT



2015-12-11 9:53 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Hello,

I would like to know how row estimation is calculed by explain ?
In my execution plan, this estimation is extremely wrong (267 instead of 198000)
I reproduced this estimation error in this simple case :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select generate_Series(1,200000)%100 as c1,generate_Series(1,200000) as c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze t1;
analyze t2;
analyze t3;
analyze t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Explain plan :

I think this error may be problematic because planner will choose nested loop instead of hash joins for ultimate join. Can you help me to improve this row estimation ? 

Thank you for answering

Best Regards,
PSIH Décisionnel en santé
Mathieu VINCENT 
Data Analyst
PMSIpilot - 61 rue Sully - 69006 Lyon - France


Re: Estimation row error

From
"Gunnar \"Nick\" Bluth"
Date:
Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
> Hello,
>
> No one to help me to understand this bad estimation rows ?

Well,

on a rather beefy machine, I'm getting quite a different plan:
http://explain.depesz.com/s/3y5r

Which may be related to this setting:
perftest=# show default_statistics_target ;
 default_statistics_target
---------------------------
 1000
(1 Zeile)


I guess the wrong row assumption (which I get as well!) is caused by the
given correlation of t3.c1 and t3.c2 (which the planner doesn't "see").

Tomas Vondra has written a nice blog post, covering that topic as well:
http://blog.pgaddict.com/posts/common-issues-with-planner-statistics

AFAIK, 9.5 has received some improvements in that field, but I didn't
try that yet.

Best regards,

Nick

>
> Mathieu VINCENT
>
> 2015-12-11 12:35 GMT+01:00 Mathieu VINCENT
> <mathieu.vincent@pmsipilot.com <mailto:mathieu.vincent@pmsipilot.com>>:
>
>     Sorry, I forget to precise Postgresql version
>
>     'PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
>     4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit'
>
>
>     BR
>
>     Mathieu VINCENT
>
>
>
>
>     2015-12-11 9:53 GMT+01:00 Mathieu VINCENT
>     <mathieu.vincent@pmsipilot.com <mailto:mathieu.vincent@pmsipilot.com>>:
>
>         Hello,
>
>         I would like to know how row estimation is calculed by explain ?
>         In my execution plan, this estimation is extremely wrong (267
>         instead of 198000)
>         I reproduced this estimation error in this simple case :
>
>         drop table if exists t1;
>         drop table if exists t2;
>         drop table if exists t3;
>         drop table if exists t4;
>
>         create table t1 as select generate_Series(1,300000) as c1;
>         create table t2 as select generate_Series(1,400) as c1;
>         create table t3 as select generate_Series(1,200000)%100 as
>         c1,generate_Series(1,200000) as c2;
>         create table t4 as select generate_Series(1,200000) as c1;
>
>         alter table t1 add PRIMARY KEY (c1);
>         alter table t2 add PRIMARY KEY (c1);
>         alter table t3 add PRIMARY KEY (c1,c2);
>         create index on t3 (c1);
>         create index on t3 (c2);
>         alter table t4 add PRIMARY KEY (c1);
>
>         analyze t1;
>         analyze t2;
>         analyze t3;
>         analyze t4;
>
>         EXPLAIN (analyze on, buffers on, verbose on)
>         select
>         *
>         from
>         t1 t1
>         inner join t2 on t1.c1=t2.c1
>         inner join t3 on t2.c1=t3.c1
>         inner join t4 on t3.c2=t4.c1
>
>         Explain plan :
>         http://explain.depesz.com/s/wZ3v
>
>         I think this error may be problematic because planner will
>         choose nested loop instead of hash joins for ultimate join. Can
>         you help me to improve this row estimation ?
>
>         Thank you for answering
>
>         Best Regards,
>         <http://www.psih.fr/>    PSIH Décisionnel en santé
>         Mathieu VINCENT
>         Data Analyst
>         PMSIpilot - 61 rue Sully - 69006 Lyon - France
>
>
>


--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


Re: Estimation row error

From
Andreas Kretschmer
Date:
Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:

> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
> > Hello,
> >
> > No one to help me to understand this bad estimation rows ?
>
> Well,
>
> on a rather beefy machine, I'm getting quite a different plan:
> http://explain.depesz.com/s/3y5r

you are using 9.5, right? Got the same plan with 9.5.

Btw.: Hi Gunnar ;-)


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: Estimation row error

From
"Gunnar \"Nick\" Bluth"
Date:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.

Nope...:
                                                  version

------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

So much for those correlation improvements then ;-/


> Btw.: Hi Gunnar ;-)

Hi :)

--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


Re: Estimation row error

From
Mathieu VINCENT
Date:
thks Gunnar,

I removed the correlation between t3.c1 and t3.c2 in this sql script :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
analyze verbose t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Now, the estimate is good : http://explain.depesz.com/s/gCX

Have a good day

Mathieu VINCENT

2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.

Nope...:
                                                  version

------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

So much for those correlation improvements then ;-/


> Btw.: Hi Gunnar ;-)

Hi :)

--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Estimation row error

From
Matteo Grolla
Date:
Thank you both for the help!
happy holidays

2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
thks Gunnar,

I removed the correlation between t3.c1 and t3.c2 in this sql script :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
analyze verbose t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Now, the estimate is good : http://explain.depesz.com/s/gCX

Have a good day

Mathieu VINCENT

2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.

Nope...:
                                                  version

------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

So much for those correlation improvements then ;-/


> Btw.: Hi Gunnar ;-)

Hi :)

--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: Estimation row error

From
Mathieu VINCENT
Date:
Here, another issue with row estimate.
And, in this example, there is not correlation beetween columns in a same table.

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 as select generate_Series(1,200000) as c1; 
create table t2 as select generate_Series(1,200000)%100 as c1; 
create table t3 as select generate_Series(1,1500)%750 as c1;

alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
Do you understand how postgresql calculate the row estimate ?

BR
Mathieu VINCENT

2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:
Thank you both for the help!
happy holidays

2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
thks Gunnar,

I removed the correlation between t3.c1 and t3.c2 in this sql script :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
analyze verbose t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Now, the estimate is good : http://explain.depesz.com/s/gCX

Have a good day

Mathieu VINCENT

2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.

Nope...:
                                                  version

------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

So much for those correlation improvements then ;-/


> Btw.: Hi Gunnar ;-)

Hi :)

--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: Estimation row error

From
Mathieu VINCENT
Date:
Adding foreign key between on t2 and t3, does not change the plan.

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 as select generate_Series(1,200000) as c1; 
create table t2 as select generate_Series(1,200000)%100+1 as c1; 
create table t3 as select generate_Series(1,1500)%750+1 as c1;

alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);
ALTER TABLE t2  ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);
ALTER TABLE t3  ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t1.c1=t3.c1

Cordialement,
PSIH Décisionnel en santé
Mathieu VINCENT 
Data Analyst
PMSIpilot - 61 rue Sully - 69006 Lyon - France

2015-12-17 11:37 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Here, another issue with row estimate.
And, in this example, there is not correlation beetween columns in a same table.

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 as select generate_Series(1,200000) as c1; 
create table t2 as select generate_Series(1,200000)%100 as c1; 
create table t3 as select generate_Series(1,1500)%750 as c1;

alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
Do you understand how postgresql calculate the row estimate ?

BR
Mathieu VINCENT

2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:
Thank you both for the help!
happy holidays

2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
thks Gunnar,

I removed the correlation between t3.c1 and t3.c2 in this sql script :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
analyze verbose t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Now, the estimate is good : http://explain.depesz.com/s/gCX

Have a good day

Mathieu VINCENT

2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.

Nope...:
                                                  version

------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

So much for those correlation improvements then ;-/


> Btw.: Hi Gunnar ;-)

Hi :)

--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




Re: Estimation row error

From
Mathieu VINCENT
Date:
Hello,

No one to help me to understand this bad estimation rows ?
It's NOT caused by :
  • correlation between columns (cross-correlation)
  • bad statistics (i tried with  default_statistics_target to 10 000)
  • bad number of distinct values
  • complexe join conditions
I have no more ideas.

thank you for your help.
Mathieu VINCENT

2015-12-17 11:58 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Adding foreign key between on t2 and t3, does not change the plan.

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 as select generate_Series(1,200000) as c1; 
create table t2 as select generate_Series(1,200000)%100+1 as c1; 
create table t3 as select generate_Series(1,1500)%750+1 as c1;

alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);
ALTER TABLE t2  ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);
ALTER TABLE t3  ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t1.c1=t3.c1

Cordialement,
PSIH Décisionnel en santé
Mathieu VINCENT 
Data Analyst
PMSIpilot - 61 rue Sully - 69006 Lyon - France

2015-12-17 11:37 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Here, another issue with row estimate.
And, in this example, there is not correlation beetween columns in a same table.

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 as select generate_Series(1,200000) as c1; 
create table t2 as select generate_Series(1,200000)%100 as c1; 
create table t3 as select generate_Series(1,1500)%750 as c1;

alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
Do you understand how postgresql calculate the row estimate ?

BR
Mathieu VINCENT

2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:
Thank you both for the help!
happy holidays

2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
thks Gunnar,

I removed the correlation between t3.c1 and t3.c2 in this sql script :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
analyze verbose t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Now, the estimate is good : http://explain.depesz.com/s/gCX

Have a good day

Mathieu VINCENT

2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.

Nope...:
                                                  version

------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

So much for those correlation improvements then ;-/


> Btw.: Hi Gunnar ;-)

Hi :)

--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance





Re: Estimation row error

From
Pavel Stehule
Date:
Hi

2015-12-18 16:21 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Hello,

No one to help me to understand this bad estimation rows ?
It's NOT caused by :
  • correlation between columns (cross-correlation)
  • bad statistics (i tried with  default_statistics_target to 10 000)
  • bad number of distinct values
  • complexe join conditions
I have no more ideas.

PostgreSQL has not cross tables statistics - so expect uniform distribution of foreign keys.  This expectation is broken in your example.

You can find some prototype solutions by Tomas Vondra in hackars mailing list.

Regards

Pavel
 

thank you for your help.
Mathieu VINCENT

2015-12-17 11:58 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Adding foreign key between on t2 and t3, does not change the plan.

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 as select generate_Series(1,200000) as c1; 
create table t2 as select generate_Series(1,200000)%100+1 as c1; 
create table t3 as select generate_Series(1,1500)%750+1 as c1;

alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);
ALTER TABLE t2  ADD CONSTRAINT t2_fk FOREIGN KEY (c1) REFERENCES t1(c1);
ALTER TABLE t3  ADD CONSTRAINT t3_fk FOREIGN KEY (c1) REFERENCES t1(c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t1.c1=t3.c1

Cordialement,
PSIH Décisionnel en santé
Mathieu VINCENT 
Data Analyst
PMSIpilot - 61 rue Sully - 69006 Lyon - France

2015-12-17 11:37 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
Here, another issue with row estimate.
And, in this example, there is not correlation beetween columns in a same table.

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 as select generate_Series(1,200000) as c1; 
create table t2 as select generate_Series(1,200000)%100 as c1; 
create table t3 as select generate_Series(1,1500)%750 as c1;

alter table t1 add PRIMARY KEY (c1);
create index on t2 (c1);
create index on t3 (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
Do you understand how postgresql calculate the row estimate ?

BR
Mathieu VINCENT

2015-12-17 10:14 GMT+01:00 Matteo Grolla <matteo.grolla@gmail.com>:
Thank you both for the help!
happy holidays

2015-12-17 10:10 GMT+01:00 Mathieu VINCENT <mathieu.vincent@pmsipilot.com>:
thks Gunnar,

I removed the correlation between t3.c1 and t3.c2 in this sql script :

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;
drop table if exists t4;

create table t1 as select generate_Series(1,300000) as c1; 
create table t2 as select generate_Series(1,400) as c1; 
create table t3 as select floor(random()*100+1) as c1, c2 from generate_Series(1,200000) c2;
create table t4 as select generate_Series(1,200000) as c1;

alter table t1 add PRIMARY KEY (c1);
alter table t2 add PRIMARY KEY (c1);
alter table t3 add PRIMARY KEY (c1,c2);
create index on t3 (c1);
create index on t3 (c2);
alter table t4 add PRIMARY KEY (c1);

analyze verbose t1;
analyze verbose t2;
analyze verbose t3;
analyze verbose t4;

EXPLAIN (analyze on, buffers on, verbose on)
select 
*
from 
t1 t1
inner join t2 on t1.c1=t2.c1
inner join t3 on t2.c1=t3.c1
inner join t4 on t3.c2=t4.c1

Now, the estimate is good : http://explain.depesz.com/s/gCX

Have a good day

Mathieu VINCENT

2015-12-15 11:21 GMT+01:00 Gunnar "Nick" Bluth <gunnar.bluth.extern@elster.de>:
Am 15.12.2015 um 10:49 schrieb Andreas Kretschmer:
> Gunnar Nick Bluth <gunnar.bluth.extern@elster.de> wrote:
>
>> Am 15.12.2015 um 09:05 schrieb Mathieu VINCENT:
>>> Hello,
>>>
>>> No one to help me to understand this bad estimation rows ?
>>
>> Well,
>>
>> on a rather beefy machine, I'm getting quite a different plan:
>> http://explain.depesz.com/s/3y5r
>
> you are using 9.5, right? Got the same plan with 9.5.

Nope...:
                                                  version

------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

So much for those correlation improvements then ;-/


> Btw.: Hi Gunnar ;-)

Hi :)

--
Gunnar "Nick" Bluth
DBA ELSTER

Tel:   +49 911/991-4665
Mobil: +49 172/8853339


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance