Thread: Cross Join Problem

Cross Join Problem

From

"Gauri Kanekar"

Date:

18 August 2008, 10:37:43

Hi,

Following is the Query :
SELECT sum(id), sum(cd), sum(ad)
       FROM table1 a , table2 b cross join table3 c
       WHERE a.nkey = b.key
             AND a.dkey = c.key
             AND c.date = '2008-02-01'
             AND b.id = 999 ;

We have fired this on our production system which is postgres 8.1.3, and got the following explain analyse of it

Aggregate (cost=11045.52..11045.53 rows=1 width=24) (actual time=79.290..79.291 rows=1 loops=1)
   -> Nested Loop (cost=49.98..11043.42 rows=279 width=24) (actual time=1.729..50.498 rows=10473 loops=1)
         -> Nested Loop (cost=0.00..6.05 rows=1 width=8) (actual time=0.028..0.043 rows=1 loops=1)
               -> Index Scan using rnididx on table2 b (cost=0.00..3.02 rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=1)
                     Index Cond: (id = 999)
               -> Index Scan using rddtidx on table3 c (cost=0.00..3.02 rows=1 width=4) (actual time=0.010..0.016 rows=1 loops=1)
                     Index Cond: (date = '2008-02-01 00:00:00'::timestamp without time zone)
         -> Bitmap Heap Scan on table1 a (cost=49.98..10954.93 rows=5496 width=32) (actual time=1.694..19.006 rows=10473 loops=1)
               Recheck Cond: ((a.nkey = "outer"."key") AND (a.dkey = "outer"."key"))
               -> Bitmap Index Scan on rndateidx (cost=0.00..49.98 rows=5496 width=0) (actual time=1.664..1.664 rows=10473 loops=1)
                     Index Cond: ((a.nkey = "outer"."key") AND (a.dkey = "outer"."key"))
Total runtime: 79.397 ms

Time: 80.752 ms

Same Query when we fire on postgres 8.3.3, following is the explain analyse
                                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1171996.35..1171996.36 rows=1 width=24) (actual time=6360.783..6360.785 rows=1 loops=1)
   -> Nested Loop (cost=0.00..1171994.28 rows=275 width=24) (actual time=3429.309..6330.424 rows=10473 loops=1)
         Join Filter: (a.nkey = b.key)
         -> Index Scan using rnididx on table2 b (cost=0.00..4.27 rows=1 width=4) (actual time=0.030..0.033 rows=1 loops=1)
               Index Cond: (id = 999)
         -> Nested Loop (cost=0.00..1169411.17 rows=206308 width=28) (actual time=0.098..4818.450 rows=879480 loops=1)
               -> Index Scan using rddtidx on table1 c (cost=0.00..4.27 rows=1 width=4) (actual time=0.031..0.034 rows=1 loops=1)
                     Index Cond: (date = '2008-02-01 00:00:00'::timestamp without time zone)
               -> Index Scan using rdnetidx on table1 a (cost=0.00..1156050.51 rows=1068511 width=32) (actual time=0.047..1732.229 rows=879480 loops=1)
                     Index Cond: (a.dkey = c.key)
Total runtime: 6360.978 ms

The Query on postgres 8.1.3 use to take only 80.752 ms is now taking 6364.950 ms.

We have done vacuum analyse on all the tables.

Can anybody helpout over here ... was may b wrong... and why the query seems to take time on postgres 8.3.3.

Is it 8.3.3 problem or its cross join problem on 8.3.3

Thanx

--
Regards
Gauri

Re: Cross Join Problem

From

Tom Lane

Date:

18 August 2008, 11:02:42

"Gauri Kanekar" <meetgaurikanekar@gmail.com> writes:
> Following is the Query :
> SELECT sum(id), sum(cd), sum(ad)
>        FROM table1 a , table2 b cross join table3 c
>        WHERE a.nkey = b.key
>              AND a.dkey = c.key
>              AND c.date = '2008-02-01'
>              AND b.id = 999 ;


> We have fired this on our production system which is postgres 8.1.3, and got
> the following explain analyse of it

>  Aggregate  (cost=11045.52..11045.53 rows=1 width=24) (actual
> time=79.290..79.291 rows=1 loops=1)
>    ->  Nested Loop  (cost=49.98..11043.42 rows=279 width=24) (actual
> time=1.729..50.498 rows=10473 loops=1)
>          ->  Nested Loop  (cost=0.00..6.05 rows=1 width=8) (actual
> time=0.028..0.043 rows=1 loops=1)
>                ->  Index Scan using rnididx on table2 b  (cost=0.00..3.02
> rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=1)
>                      Index Cond: (id = 999)
>                ->  Index Scan using rddtidx on table3 c  (cost=0.00..3.02
> rows=1 width=4) (actual time=0.010..0.016 rows=1 loops=1)
>                      Index Cond: (date = '2008-02-01 00:00:00'::timestamp
> without time zone)
>          ->  Bitmap Heap Scan on table1 a  (cost=49.98..10954.93 rows=5496
> width=32) (actual time=1.694..19.006 rows=10473 loops=1)
>                Recheck Cond: ((a.nkey = "outer"."key") AND (a.dkey =
> "outer"."key"))
>                ->  Bitmap Index Scan on rndateidx  (cost=0.00..49.98
> rows=5496 width=0) (actual time=1.664..1.664 rows=10473 loops=1)
>                      Index Cond: ((a.nkey = "outer"."key") AND (a.dkey =
> "outer"."key"))
>  Total runtime: 79.397 ms

No PG release since 7.3 would have voluntarily planned that query that
way.  Maybe you were using join_collapse_limit = 1 to force the join
order?

            regards, tom lane

Re: Cross Join Problem

From

Tom Lane

Date:

18 August 2008, 12:20:15

[ please keep the list cc'd for the archives' sake ]

"Gauri Kanekar" <meetgaurikanekar@gmail.com> writes:
> On Mon, Aug 18, 2008 at 7:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No PG release since 7.3 would have voluntarily planned that query that
>> way.  Maybe you were using join_collapse_limit = 1 to force the join
>> order?

> Yes, We have set join_collapse_limit set to 1.

Ah, so really your question is why join_collapse_limit isn't working as
you expect.  That code changed quite a bit in 8.2, and the way it works
now is that the critical decision occurs while deciding whether to fold
the cross-join (a sub-problem of size 2) into the top-level join
problem.  Which is a decision that's going to be driven by
from_collapse_limit not join_collapse_limit.

So one way you could make it work is to reduce from_collapse_limit to
less than 3, but I suspect you'd find that that has too many bad
consequences for other queries.  What's probably best is to write the
problem query like this:

    FROM table1 a cross join ( table2 b cross join table3 c )

which will cause join_collapse_limit to be the relevant number at both
steps.

            regards, tom lane

Re: Cross Join Problem

From

"Gauri Kanekar"

Date:

19 August 2008, 10:34:44

Thanx alot... its solved my problem

On Mon, Aug 18, 2008 at 8:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[ please keep the list cc'd for the archives' sake ]

"Gauri Kanekar" <meetgaurikanekar@gmail.com> writes:
> On Mon, Aug 18, 2008 at 7:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No PG release since 7.3 would have voluntarily planned that query that
>> way. Maybe you were using join_collapse_limit = 1 to force the join
>> order?

> Yes, We have set join_collapse_limit set to 1.

Ah, so really your question is why join_collapse_limit isn't working as
you expect. That code changed quite a bit in 8.2, and the way it works
now is that the critical decision occurs while deciding whether to fold
the cross-join (a sub-problem of size 2) into the top-level join
problem. Which is a decision that's going to be driven by
from_collapse_limit not join_collapse_limit.

So one way you could make it work is to reduce from_collapse_limit to
less than 3, but I suspect you'd find that that has too many bad
consequences for other queries. What's probably best is to write the
problem query like this:

FROM table1 a cross join ( table2 b cross join table3 c )

which will cause join_collapse_limit to be the relevant number at both
steps.

regards, tom lane

--
Regards
Gauri