On Sun, Jan 26, 2003 at 11:18:31PM -0500, Tom Lane wrote:
> Bradley Baetz <bbaetz@acm.org> writes:
> > Right, or skip it entirely when selecting stuff with unique constraints.
>
> I'm hesitant to do that until we have some scheme in place for
> invalidating cached plans.
By cached, do you mean PREPARE stuff, or something else?
>
> > I don't think it is. The number of rows is correct if you do product_id
> > IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so on.
>
> But that's a completely different code path; it doesn't even enter the
> routines we're concerned about here.
Yes, but its the same concept. Although we seem to be agreeing about
that now :)
> > What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do
> > that JOIN_IN doesn't?
>
> Uniqify the inner/outer path and then do a normal inner join. See
> joinpath.c.
Ah, OK. If I comment out line 547 of joinrels.c (which adds JOIN_IN to
the set of join paths) so that the UNIQUE joins are all that are left to
try, then I get:
bbaetz=# explain analyze select count(*) FROM bugs where product_id IN
(SELECT product_id FROM bugs);
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3494816.98..3494816.98 rows=1 width=8) (actual
time=579.71..579.71 rows=1 loops=1)
-> Merge Join (cost=5169.41..3494691.43 rows=50218 width=8) (actual
time=111.41..530.16 rows=50000 loops=1)
Merge Cond: ("outer".product_id = "inner".product_id)
-> Index Scan using bugs_product_id_idx on bugs
(cost=0.00..1834.52 rows=50000 width=4) (actual time=0.13..249.57
rows=50000 loops=1)
-> Sort (cost=920.14..920.17 rows=9 width=4) (actual
time=111.25..143.42 rows=44476 loops=1)
Sort Key: public.bugs.product_id
-> HashAggregate (cost=920.00..920.00 rows=9 width=4)
(actual time=111.17..111.18 rows=9 loops=1)
-> Seq Scan on bugs (cost=0.00..795.00 rows=50000
width=4) (actual time=0.00..67.41 rows=50000 loops=1)
Total runtime: 579.84 msec
(9 rows)
(This isn't picked without my hack, because the cost is slightly higher
than the JOIN_IN version)
However, its much faster (although not as fast as sticking the DISTINCT
in there myself), but the actual rows coming from the sort is really odd
- where is that number coming from? How can sorting 9 rows take 44476
anythings? The final mergejoin cost is still way off, too.
> regards, tom lane
Thanks,
Bradley