Thread: Cartesian product bug?

Cartesian product bug?

From
Åsmund Kveim Lie
Date:
Hi,

We have found a possible bug in 7.3.1. It seems that using CROSS JOIN and
doing plain Cartesian product, listing to tables in the from clause, gives
different results. According to the documentation this should be
equivalent. The following example should explain the problem:

CREATE TABLE a (a1 text, a2 text);
CREATE TABLE b (b1 text, b2 text);
CREATE TABLE c (a1 text, b1 text, c1 text);

INSERT INTO a VALUES('a1', 'a2');
INSERT INTO b VALUES('b1', 'b2');
INSERT INTO c VALUES('a3', 'b1', 'c1');

SELECT * FROM a,b NATURAL JOIN c;
a1 | a2 | b1 | b2 | a1 | c1
----+----+----+----+----+----
  a1 | a2 | b1 | b2 | a3 | c1
(1 row)

SELECT * FROM a CROSS JOIN b NATURAL JOIN c;
a1 | b1 | a2 | b2 | c1
----+----+----+----+----
(0 rows)

These two example queries should give the same result. In the first query,
it seems like it’s doing the natural join between b and c, and then does
the Cartesian product on that result with a. On the second query, it does
as we assume it should, namely does the Cartesian product first.

Is this the correct behavior?

Regards

Åsmund


--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Re: Cartesian product bug?

From
Greg Stark
Date:
Åsmund Kveim Lie <asmundkl@skipthis.ifi.uio.no> writes:

> SELECT * FROM a,b NATURAL JOIN c;

This parses as

select * from a, (b natural join c)

> SELECT * FROM a CROSS JOIN b NATURAL JOIN c;

This parses as

select * from (a cross join b) natural join c

> These two example queries should give the same result. In the first query, it
> seems like it’s doing the natural join between b and c, and then does
> the Cartesian product on that result with a. On the second query, it does as
> we assume it should, namely does the Cartesian product first.
>
> Is this the correct behavior?

yes

You can put parentheses to change the explicit joins like

select * from a cross join (b natural join c);

But the implicit join is harder to fix. I think you either need to use an
explicit join like above or a subquery like

select * from (select * from a,b) as ab natural join c

I tend to find it's easier to stick to all explicit or all implicit joins and
not mix them. Personally I like explicit joins for aesthetic reasons
especially in 7.4 where they get optimized as well as implicit joins.


--
greg

Re: Cartesian product bug?

From
Tom Lane
Date:
=?utf-8?Q?=C3=85smund_Kveim_Lie?= <asmundkl@skipthis.ifi.uio.no> writes:

> SELECT * FROM a,b NATURAL JOIN c;

> SELECT * FROM a CROSS JOIN b NATURAL JOIN c;

> These two example queries should give the same result.

No, they shouldn't, because JOIN binds more tightly than comma.  The
first is equivalent to

SELECT * FROM a CROSS JOIN (b NATURAL JOIN c);

while in the second case the JOINs associate left-to-right, giving

SELECT * FROM (a CROSS JOIN b) NATURAL JOIN c;

Because you have columns with the same names in A and C, the second
NATURAL JOIN has a different implicit join clause than the first.

(Personally I think NATURAL JOIN is an evil, bug-prone construct,
precisely because coincidental matches of column names will mess up your
results.)

> In the first query, it seems like it’s doing the natural
> join between b and c, and then does the Cartesian product on that
> result with a. On the second query, it does as we assume it should,
> namely does the Cartesian product first.

I think your expectations have been set by MySQL, which last I heard
interprets all joins as being done left-to-right.  That's not compliant
with the SQL standard, however.

            regards, tom lane

Re: Cartesian product bug?

From
"scott.marlowe"
Date:
On Fri, 31 Oct 2003, Tom Lane wrote:

> (Personally I think NATURAL JOIN is an evil, bug-prone construct,
> precisely because coincidental matches of column names will mess up your
> results.)

Me too.  When I first saw it, I figured it would "naturally join" the two
tables on their fk/pk relation if there was one.  That seems natural.
Joining on two fields that just happen to have the same name is unnatural
to me.


Re: Cartesian product bug?

From
Greg Stark
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:

> On Fri, 31 Oct 2003, Tom Lane wrote:
>
> > (Personally I think NATURAL JOIN is an evil, bug-prone construct,
> > precisely because coincidental matches of column names will mess up your
> > results.)
>
> Me too.  When I first saw it, I figured it would "naturally join" the two
> tables on their fk/pk relation if there was one.  That seems natural.
> Joining on two fields that just happen to have the same name is unnatural
> to me.

Well 99% of the time I impose on myself a constraint to only use the same name
iff they refer to the same attribute. So if they have the same name then they
really ought to be a reasonable join clause.

However the 1% are things like "date_created, date_updated" or even flags like
"active", "deleted" etc. Which are more than enough to make it utterly
useless.

Too bad really, it would be a handy thing for ad-hoc queries typed at psql. It
would still seem too fragile for production queries though.

--
greg

Re: Cartesian product bug?

From
Bruce Momjian
Date:
Greg Stark wrote:
>
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
>
> > On Fri, 31 Oct 2003, Tom Lane wrote:
> >
> > > (Personally I think NATURAL JOIN is an evil, bug-prone construct,
> > > precisely because coincidental matches of column names will mess up your
> > > results.)
> >
> > Me too.  When I first saw it, I figured it would "naturally join" the two
> > tables on their fk/pk relation if there was one.  That seems natural.
> > Joining on two fields that just happen to have the same name is unnatural
> > to me.
>
> Well 99% of the time I impose on myself a constraint to only use the same name
> iff they refer to the same attribute. So if they have the same name then they
> really ought to be a reasonable join clause.
>
> However the 1% are things like "date_created, date_updated" or even flags like
> "active", "deleted" etc. Which are more than enough to make it utterly
> useless.
>
> Too bad really, it would be a handy thing for ad-hoc queries typed at psql. It
> would still seem too fragile for production queries though.

I think the reason they don't use pk/fk in natural joins is because you can
join all sorts of results, like SELECT in FROM, that doesn't always have
a meaningful pk/fk.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073