Thread: Inaccurate row count estimation

Inaccurate row count estimation

From
"Vyacheslav Kalinin"
Date:
Hello,

Consider two tables:

contacts:
  cid integer primary key,
  pid integer not null,
  cpid integer
  ...

pinfo:
  pid integer,
  ...

pinfo is a parent table with two partitions pinfo_p00 and pinfo_p01, all three have primary keys on pid and partitions have proper constraints
that guarantee pid uniqueness across them.

Now here's the part of the query:

select *
  from contacts c
  left join pinfo pi on (pi.pid = c.cpid)

QUERY PLAN
                    ->  Nested Loop Left Join  (cost=0.00..444.90 rows=1515 width=408) (actual time=0.108..5.561 rows=44 loops=1)
                          Join Filter: (pi.pid = c.cpid)
                          ->  Index Scan using ix_contacts_pid on contacts c  (cost=0.00..14.84 rows=50 width=26) (actual time=0.038..0.425 rows=44 loops=1)
                                Index Cond: (pid = 167)
                          ->  Append  (cost=0.00..8.56 rows=3 width=386) (actual time=0.067..0.090 rows=1 loops=44)
                                ->  Index Scan using pk_pinfo on pinfo pi  (cost=0.00..1.15 rows=1 width=386) (actual time=0.008..0.008 rows=0 loops=44)
                                      Index Cond: (pi.pid = c.cpid)
                                ->  Index Scan using pk_pinfo_p00 on pinfo_p00 pi  (cost=0.00..3.23 rows=1 width=46) (actual time=0.011..0.014 rows=0 loops=44)
                                      Index Cond: (pi.pid = c.cpid)
                                ->  Index Scan using pk_pinfo_p01 on pinfo_p01 pi  (cost=0.00..4.19 rows=1 width=46) (actual time=0.011..0.014 rows=0 loops=44)
                                      Index Cond: (pi.pid = c.cpid)

How come that outermost join expects 1515 rows given the row estimations of the inner and outer nested loop's parts?

Re: Inaccurate row count estimation

From
Tom Lane
Date:
"Vyacheslav Kalinin" <vka@mgcp.com> writes:
> How come that outermost join expects 1515 rows given the row estimations of
> the inner and outer nested loop's parts?

I couldn't reproduce such a problem.  What PG version are you running?

            regards, tom lane

Re: Inaccurate row count estimation

From
"Vyacheslav Kalinin"
Date:
This is 8.3.0.

Here is the reproduce code:

create table contacts (
  cid integer primary key,
  pid integer not null,
  cpid integer
);

create index ix_contacts_pid on contacts (pid);
create index ix_contacts_cpid on contacts (cpid);

create table pinfo (
  pid integer,
  constraint pk_pinfo primary key (pid)
);

create table pinfo_p00 (
  constraint pk_pinfo_p00 primary key (pid),
  constraint cc_pinfo_p00_pid check(pid > 0 and pid < 100000)
) inherits (pinfo);

create table pinfo_p01 (
  constraint pk_pinfo_p01 primary key (pid),
  constraint cc_pinfo_p01_pid check(pid >= 100000 and pid < 200000)
) inherits (pinfo);

insert into pinfo_p00 (pid)
select i from generate_series(10, 10000) i;

insert into pinfo_p01 (pid)
select i from generate_series(100010, 110000) i;

create sequence contacts_seq start 100;

insert into contacts (cid, pid, cpid)
select  nextval('contacts_seq'), p, cp.pid
  from generate_series(100, 1000) p,
      (
       (select pid from pinfo_p00 order by random() limit 20)
       union all
       (select pid from pinfo_p01 order by random() limit 20)
      ) cp

analyze contacts;
analyze pinfo;
analyze pinfo_p00;
analyze pinfo_p01;

explain analyze
select *
  from contacts c
  left join pinfo pi on (pi.pid = c.cpid)
 where c.pid = 200 ;

QUERY PLAN
Nested Loop Left Join  (cost=4.56..569.17 rows=4364 width=16) (actual time=0.150..4.595 rows=40 loops=1)
  Join Filter: (pi.pid = c.cpid)
  ->  Bitmap Heap Scan on contacts c  (cost=4.56..100.34 rows=39 width=12) (actual time=0.067..0.441 rows=40 loops=1)
        Recheck Cond: (pid = 200)
        ->  Bitmap Index Scan on ix_contacts_pid  (cost=0.00..4.55 rows=39 width=0) (actual time=0.041..0.041 rows=40 loops=1)
              Index Cond: (pid = 200)
  ->  Append  (cost=0.00..11.98 rows=3 width=4) (actual time=0.048..0.076 rows=1 loops=40)
        ->  Index Scan using pk_pinfo on pinfo pi  (cost=0.00..1.40 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=40)
              Index Cond: (pi.pid = c.cpid)
        ->  Index Scan using pk_pinfo_p00 on pinfo_p00 pi  (cost=0.00..5.29 rows=1 width=4) (actual time=0.012..0.015 rows=0 loops=40)
              Index Cond: (pi.pid = c.cpid)
        ->  Index Scan using pk_pinfo_p01 on pinfo_p01 pi  (cost=0.00..5.29 rows=1 width=4) (actual time=0.011..0.015 rows=0 loops=40)
              Index Cond: (pi.pid = c.cpid)
Total runtime: 4.941 ms

Re: Inaccurate row count estimation

From
Tom Lane
Date:
"Vyacheslav Kalinin" <vka@mgcp.com> writes:
> Here is the reproduce code:

After tracing through this I see that the problem is that we don't have
statistics for inheritance trees, and so you're getting a default
estimate for the selectivity of the join condition.  I'm not sure why
the similar example I tried last night didn't show a bogus-looking
rowcount estimate, but in any case the lack of stats is the real issue.

This should get fixed sometime, but don't hold your breath ...

            regards, tom lane

Re: Inaccurate row count estimation

From
"Vyacheslav Kalinin"
Date:
Thanks for the reply, Tom.

After tracing through this I see that the problem is that we don't have
statistics for inheritance trees, and so you're getting a default
estimate for the selectivity of the join condition. 

I might be wrong but I suspect that the inheritance is not the only reason here. If I change the table definitions to:

create table pinfo_p00 (
  pid integer,
  constraint pk_pinfo_p00 primary key (pid),
  constraint cc_pinfo_p00_pid check(pid > 0 and pid < 100000)
);

create table pinfo_p01 (
  pid integer,
  constraint pk_pinfo_p01 primary key (pid),
  constraint cc_pinfo_p01_pid check(pid >= 100000 and pid < 200000)
);

and create a view pinfo, or just do a query with subselect:

explain analyze
select *
  from contacts c
  left join (
      select * from pinfo_p00
      union all
      select * from pinfo_p01
  ) pi on (pi.pid = c.cpid)
 where c.pid = 200 ;

the row-count assessment doesn't seem to be different:

QUERY PLAN
Nested Loop Left Join  (cost=4.56..514.25 rows=3896 width=16) (actual time=0.125..3.976 rows=40 loops=1)
  Join Filter: (pinfo_p00.pid = c.cpid)
  ->  Bitmap Heap Scan on contacts c  (cost=4.56..100.34 rows=39 width=12) (actual time=0.069..0.421 rows=40 loops=1)
        Recheck Cond: (pid = 200)
        ->  Bitmap Index Scan on ix_contacts_pid  (cost=0.00..4.55 rows=39 width=0) (actual time=0.042..0.042 rows=40 loops=1)
              Index Cond: (pid = 200)
  ->  Append  (cost=0.00..10.59 rows=2 width=4) (actual time=0.033..0.061 rows=1 loops=40)
        ->  Index Scan using pk_pinfo_p00 on pinfo_p00  (cost=0.00..5.29 rows=1 width=4) (actual time=0.011..0.015 rows=0 loops=40)
              Index Cond: (pinfo_p00.pid = c.cpid)
        ->  Index Scan using pk_pinfo_p01 on pinfo_p01  (cost=0.00..5.29 rows=1 width=4) (actual time=0.012..0.015 rows=0 loops=40)
              Index Cond: (pinfo_p01.pid = c.cpid)
Total runtime: 4.341 ms

It scares me a bit as it seems that innocent-looking combination of union's and join's could destroy the subsequent plan completely.