Thread: Estimated rows question


From
Sam Ross
Date:
I was wondering why it seems that the query planner can't "see", based
on the histograms, that two join-columns have a very small
intersection, and adjust its row estimation accordingly.  Clearly the
below query returns 1001 rows, since only the keys 60000 through 61000
appear in both tables.  It appears as if much or all of the necessary
machinery exists in mergejoinscansel, and indeed if you inspect
leftstartsel, leftendsel, rightstartsel, and rightendsel during execution,
they are respectively 0.98, 1.00, 0.00, and 0.02, which I believe makes sense.

Am I missing something obvious?
Thanks
Sam

create table table_a as select * from generate_series(1,61000) as pkey;
create table table_b as select * from generate_series(60000,110000) as pkey;
create unique index idx_a on table_a(pkey);
create unique index idx_b on table_b(pkey);
analyse table_a;
analyse table_b;

explain select * from table_a a inner join table_b b on a.pkey = b.pkey;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Merge Join  (cost=1984.88..2550.42 rows=50001 width=8)
   Merge Cond: (a.pkey = b.pkey)
   ->  Index Only Scan using idx_a on table_a a  (cost=0.00..1864.32 rows=61000 width=4)
   ->  Index Only Scan using idx_b on table_b b  (cost=0.00..1531.32 rows=50001 width=4)
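
The four selectivities quoted above can be reproduced with simple range arithmetic. The following is a minimal sketch, not PostgreSQL code: it assumes the keys are spread uniformly between each table's minimum and maximum (true for the generate_series() tables in this message), and the helper name fraction_up_to is hypothetical.

```python
# Sketch only: uniform keys on [lo, hi] are assumed, as produced by
# generate_series() in the test case. fraction_up_to is a made-up helper,
# not a PostgreSQL function.

def fraction_up_to(value, lo, hi):
    """Approximate fraction of a uniform column on [lo, hi] at or below value."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

A_MIN, A_MAX = 1, 61000        # table_a.pkey
B_MIN, B_MAX = 60000, 110000   # table_b.pkey

# Fraction of each input skipped before the first possible match (start)
# and fraction consumed by the time the other input runs out (end).
left_start = fraction_up_to(B_MIN, A_MIN, A_MAX)
left_end = fraction_up_to(B_MAX, A_MIN, A_MAX)
right_start = fraction_up_to(A_MIN, B_MIN, B_MAX)
right_end = fraction_up_to(A_MAX, B_MIN, B_MAX)

# Exact size of the key intersection for two dense integer ranges.
matching_rows = max(0, min(A_MAX, B_MAX) - max(A_MIN, B_MIN) + 1)

print(round(left_start, 2), round(left_end, 2),
      round(right_start, 2), round(right_end, 2))   # 0.98 1.0 0.0 0.02
print(matching_rows)                                # 1001
```

The overlap covers roughly 2% of each table, which is the information the rows=50001 estimate does not reflect.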


Re: Estimated rows question

From
Tom Lane
Date:
[ sorry for slow response, but I'd not gotten time to think about this... ]

Sam Ross <elliptic@gmail.com> writes:
> I was wondering why it seems that the query planner can't "see", based
> on the histograms, that two join-columns have a very small
> intersection, and adjust its row estimation accordingly.

The reason why not is that eqjoinsel() doesn't take any such
consideration into account.  It's possible that it'd be a good idea
to teach it to do so.  I'm not entirely convinced though.  It would
add a fair amount of expense to that function, as well as adding
some possibly shaky assumptions, and I'm not sure how often we'd
get a usefully-better estimate in practice.  OTOH, there are a lot
of shaky assumptions in eqjoinsel() already, and we did decide this
was worth worrying about in mergejoin cost estimation.

Do you want to try it and submit a patch for testing?

            regards, tom lane
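
For readers following the arithmetic: the rows=50001 figure in the plan is consistent with an equality-join selectivity of 1 / max(ndistinct) when neither column has NULLs or most-common-value statistics, which is my understanding of what eqjoinsel() does in that case. The sketch below reproduces that number and then applies the same formula to the overlapping slices only. The second function is purely hypothetical: it illustrates the idea raised in this thread and is not an existing or proposed PostgreSQL implementation. The overlap fractions are taken from the selectivities quoted in the original message.

```python
# Sketch only. eqjoin_rows mirrors the 1 / max(ndistinct) rule under the
# stated assumptions; range_aware_rows is a hypothetical adjustment.

def eqjoin_rows(rows1, rows2, ndistinct1, ndistinct2):
    """Estimated join size with selectivity 1 / max(ndistinct), no NULLs, no MCVs."""
    return rows1 * rows2 / max(ndistinct1, ndistinct2)

def range_aware_rows(rows1, rows2, overlap1, overlap2):
    """Hypothetical: the same rule restricted to the overlapping key range.

    Assumes unique columns, so ndistinct equals the row count of each slice.
    """
    slice1 = rows1 * overlap1
    slice2 = rows2 * overlap2
    if slice1 == 0 or slice2 == 0:
        return 0.0
    return slice1 * slice2 / max(slice1, slice2)

# Both columns are unique, so ndistinct equals the row count.
current = eqjoin_rows(61000, 50001, 61000, 50001)

# Overlap fractions from the quoted values: (leftendsel - leftstartsel)
# and (rightendsel - rightstartsel).
adjusted = range_aware_rows(61000, 50001, 1.00 - 0.98, 0.020 - 0.00)

print(current)          # 50001.0
print(round(adjusted))  # 1000
```

With the rounded selectivities the adjusted figure lands near the true 1001 rows, though real histograms would be coarser than this uniform test case.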