Thread: Simulating an outer join

Simulating an outer join

From
Bruce Momjian
Date:
I have been thinking about how to simulate an outer join.  It seems the
best way is to do:

    SELECT tab1.col1, tab2.col3
    FROM tab1, tab2
    WHERE tab1.col1 = tab2.col2
    UNION ALL
    SELECT tab1.col1, NULL
    FROM tab1
    WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2)

Comments?  I know someone was asking about this recently.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Simulating an outer join

From
Mike Mascari
Date:
Bruce Momjian wrote:
>
> I have been thinking about how to simulate an outer join.  It seems the
> best way is to do:
>
>         SELECT tab1.col1, tab2.col3
>         FROM tab1, tab2
>         WHERE tab1.col1 = tab2.col2
>         UNION ALL
>         SELECT tab1.col1, NULL
>         FROM tab1
>         WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2)
>
> Comments?  I know someone was asking about this recently.
>

I wouldn't use IN ;-)

SELECT table1.key, table2.value
FROM table1, table2
WHERE table1.key = table2.key
UNION ALL
SELECT table1.key, NULL
FROM table1 WHERE NOT EXISTS
(SELECT table2.key FROM table2 WHERE table1.key = table2.key);

Mike Mascari

Re: [GENERAL] Simulating an outer join

From
Sarah Officer
Date:
Can somebody comment on using EXISTS vs. IN in a subselect?  I have
some statements with subselects, and I'd like to understand the
ramifications of choosing EXISTS or IN.

Sarah Officer
officers@aries.tucson.saic.com

Mike Mascari wrote:
>
> Bruce Momjian wrote:
> >
> > I have been thinking about how to simulate an outer join.  It seems the
> > best way is to do:
> >
> >         SELECT tab1.col1, tab2.col3
> >         FROM tab1, tab2
> >         WHERE tab1.col1 = tab2.col2
> >         UNION ALL
> >         SELECT tab1.col1, NULL
> >         FROM tab1
> >         WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2)
> >
> > Comments?  I know someone was asking about this recently.
> >
>
> I wouldn't use IN ;-)
>
> SELECT table1.key, table2.value
> FROM table1, table2
> WHERE table1.key = table2.key
> UNION ALL
> SELECT table1.key, NULL
> FROM table1 WHERE NOT EXISTS
> (SELECT table2.key FROM table2 WHERE table1.key = table2.key);
>
> Mike Mascari
>
> ************

Re: [GENERAL] Simulating an outer join

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> >
> > I have been thinking about how to simulate an outer join.  It seems the
> > best way is to do:
> >
> >         SELECT tab1.col1, tab2.col3
> >         FROM tab1, tab2
> >         WHERE tab1.col1 = tab2.col2
> >         UNION ALL
> >         SELECT tab1.col1, NULL
> >         FROM tab1
> >         WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2)
> >
> > Comments?  I know someone was asking about this recently.
> >
>
> I wouldn't use IN ;-)
>
> SELECT table1.key, table2.value
> FROM table1, table2
> WHERE table1.key = table2.key
> UNION ALL
> SELECT table1.key, NULL
> FROM table1 WHERE NOT EXISTS
> (SELECT table2.key FROM table2 WHERE table1.key = table2.key);

Yes, this is our brain-damaged parser/optmizer that likes the usually
slower EXISTS with correlated subquery to the much clearer NOT IN.
Bummer.

I want to avoid having to put this workaround into my book, but I may
have no choice.  The work around is so non-obvious as to be a terrible
hinderance for normal users.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Simulating an outer join

From
Bruce Momjian
Date:
> Can somebody comment on using EXISTS vs. IN in a subselect?  I have
> some statements with subselects, and I'd like to understand the
> ramifications of choosing EXISTS or IN.

We have some brain-damaged code that is faster with EXISTS than IN.
With IN, the subquery is evaluated and the result put in a temp
relation.  Every test for IN sequentially scans the subquery result
looking for a match.  EXISTS uses an index on the subquery result. Not
sure why we can't just fix this, but I don't understand enough to know
the reason.  People who do understand say there is no good solution
until we redesign the query tree in 7.1.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Simulating an outer join

From
Julian Scarfe
Date:
> Bruce Momjian wrote:
> >
> > I have been thinking about how to simulate an outer join.  It seems the
> > best way is to do:
> >
> >         SELECT tab1.col1, tab2.col3
> >         FROM tab1, tab2
> >         WHERE tab1.col1 = tab2.col2
> >         UNION ALL
> >         SELECT tab1.col1, NULL
> >         FROM tab1
> >         WHERE tab1.col1 NOT IN (SELECT tab2.col2 FROM tab2)
> >
> > Comments?  I know someone was asking about this recently.

Mike Mascari wrote:
>
> I wouldn't use IN ;-)
>
> SELECT table1.key, table2.value
> FROM table1, table2
> WHERE table1.key = table2.key
> UNION ALL
> SELECT table1.key, NULL
> FROM table1 WHERE NOT EXISTS
> (SELECT table2.key FROM table2 WHERE table1.key = table2.key);

FWIW, that's exactly Joe Celko's SQL-89 workaround for OUTER JOINs in 'SQL for
Smarties'. Well in fact he uses (SELECT * FROM table2 WHERE table1.key =
table2.key) as the subquery, but I presume that's an insignificant difference.

Julian Scarfe

Re: [GENERAL] Simulating an outer join

From
Bruce Momjian
Date:
> > UNION ALL
> > SELECT table1.key, NULL
> > FROM table1 WHERE NOT EXISTS
> > (SELECT table2.key FROM table2 WHERE table1.key = table2.key);
>
> FWIW, that's exactly Joe Celko's SQL-89 workaround for OUTER JOINs in 'SQL for
> Smarties'. Well in fact he uses (SELECT * FROM table2 WHERE table1.key =
> table2.key) as the subquery, but I presume that's an insignificant difference.

I am just writing the EXISTS section from my book.  I don't think it
matters what fields are returned from an EXISTS subquery.  If I am
wrong, someone please let me know.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Simulating an outer join

From
Julian Scarfe
Date:
Bruce Momjian wrote:

> I am just writing the EXISTS section from my book.  I don't think it
> matters what fields are returned from an EXISTS subquery.  If I am
> wrong, someone please let me know.

Celko also writes (in his chapter on EXISTS in "SQL for Smarties"):

"In general the SELECT * option should perform better than the actual column.
It lets the query optimizer decide which column to use.  If a column has an
index on it, then simply seeing a pointer to the index is enough to determine
that something exists."

Obviously you're in a much better position than me to judge whether that's the
case in pgsql!  But it might be worth a test.

Julian Scarfe

Re: [GENERAL] Simulating an outer join

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
>
> > I am just writing the EXISTS section from my book.  I don't think it
> > matters what fields are returned from an EXISTS subquery.  If I am
> > wrong, someone please let me know.
>
> Celko also writes (in his chapter on EXISTS in "SQL for Smarties"):
>
> "In general the SELECT * option should perform better than the actual column.
> It lets the query optimizer decide which column to use.  If a column has an
> index on it, then simply seeing a pointer to the index is enough to determine
> that something exists."
>
> Obviously you're in a much better position than me to judge whether that's the
> case in pgsql!  But it might be worth a test.

In psql, I think * would generate all the columns, then throw it away,
while  a specific column would only carry around that column in the
subquery result. so  a single column is better.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

text field data transfer?

From
Date:
I need to transfer data from pg to m$sql.
m$sql has bcp (like pg's copy). bcp use delimiter.
there are 4  ways

1) use pg's copy then m$sql's bcp:
bcp does not recognize null. so, I have to use pg's copy
with null as '' (BTW, it does not work in 6.5.1, right?)

a more serious problem is that
text field usually contains ugly-dirty charactors,
so, what is the safe delimiter? -- the key here is that
delimiter approach does not escape
within the content (for speed).

2) to solve that headache, I try to use "pg_dump -D" .
however, pg put a lot of escape there (e.g.
\015 ).

also, int8 can not be used here. -- I rechecked the design,
and change it to int4 and then dump it :-)

more seriously, m$sql is strangely sensitive to the word "exit"
(quotation mark not included) within the sql single quotation
mark! m$shit!

3) dynamically connect two db. using perl.
  or, if can not connected directly, use perl write a
  "pg_dump-D-like" program.

4) bcp can use fix-length format. However, seems pg's copy can not.
of course, I can write a perl to do that. but text field can not
have fixed length!

Do I have to use 3) ? is there a short cut?