Thread: Multiple selects versus join

Multiple selects versus join

From
Jean-Christian Imbeault
Date:
Is it more efficient to do multiple selects on multiple tables to get
data or do a join of those tables and extract the data from the
resulting temp table?

For example:

select name from a where id=1;
select pub_date from b where id=1;
pub_id = select publisher_id from c where id=1;
select pub_name from d where id = pub_id;

or (I don't know the syntax for join so this is just pseudo-sql);

select name, pub_date, pub_name from join((join(A, B, C) on id), D) on
pub_id where id = 1;

All my tables are related to each other so I am wondering which is more
efficient, do multile selects or joining the tables. (Joining the table
would return just one row).

Thanks for the advice and sorry if the pseudo-sql is hard to understand.

Jc


Re: Multiple selects versus join

From
Adrian 'Dagurashibanipal' von Bidder
Date:
On Tue, 2002-08-20 at 04:00, Jean-Christian Imbeault wrote:
> Is it more efficient to do multiple selects on multiple tables to get
> data or do a join of those tables and extract the data from the
> resulting temp table?
>
> For example:
>
> select name from a where id=1;
> select pub_date from b where id=1;
> pub_id = select publisher_id from c where id=1;
> select pub_name from d where id = pub_id;
>
> or (I don't know the syntax for join so this is just pseudo-sql);
>
> select name, pub_date, pub_name from join((join(A, B, C) on id), D) on
> pub_id where id = 1;
>
> All my tables are related to each other so I am wondering which is more
> efficient, do multile selects or joining the tables. (Joining the table
> would return just one row).

Yo!

If you *know* that you will get just one row, you know more than the
database system can know. Generally: a join of tables with n and m rows
can return at most n*m rows, so joins are not very efficient compared to
multiple selects. A join is basically a select on both tables with
additional processing done. If you do multiple selects and can do the
join 'by hand' because you know much about the data you expect, this
will almost always be faster.

Of course, for very small databases, the overhead of executing multiple
statements instead of only one statement may be significant.

NOTE: I'm not a db pro or anything.

cheers
-- vbi

--
secure email with gpg                         http://fortytwo.ch/gpg

Attachment

Re: Multiple selects versus join

From
"scott.marlowe"
Date:
On 20 Aug 2002, Adrian 'Dagurashibanipal' von Bidder wrote:

> On Tue, 2002-08-20 at 04:00, Jean-Christian Imbeault wrote:
> > Is it more efficient to do multiple selects on multiple tables to get
> > data or do a join of those tables and extract the data from the
> > resulting temp table?
> >
> > For example:
> >
> > select name from a where id=1;
> > select pub_date from b where id=1;
> > pub_id = select publisher_id from c where id=1;
> > select pub_name from d where id = pub_id;
> >
> > or (I don't know the syntax for join so this is just pseudo-sql);
> >
> > select name, pub_date, pub_name from join((join(A, B, C) on id), D) on
> > pub_id where id = 1;
> >
> > All my tables are related to each other so I am wondering which is more
> > efficient, do multile selects or joining the tables. (Joining the table
> > would return just one row).
>
> Yo!
>
> If you *know* that you will get just one row, you know more than the
> database system can know. Generally: a join of tables with n and m rows
> can return at most n*m rows, so joins are not very efficient compared to
> multiple selects. A join is basically a select on both tables with
> additional processing done. If you do multiple selects and can do the
> join 'by hand' because you know much about the data you expect, this
> will almost always be faster.

Not true.  What if you're joining on fields that have a unique constraint
on them?  There can be only one (to paraphrase Highlander) so you know how
many rows you're gonna get back.  It's quite possible you're searching for
a single employee record and all the tables joining that one (even if
there are dozens) are all joined on a unique field that references the
main employee table.  In that case, it WILL be faster to let the database
do it in one query, unless it has a brain dead planner.

The real issue is how many rows are you tossing around versus how many is
the database tossing around.  So, just like you'd pick the table that
would return the fewest rows (most likely anyway) to process first by
hand, so you should build your query using join syntax that forces
postgresql to use that table first.  Indexes make all the difference on
<10% table being returned too.

Basically, if you're gonna return many of the rows in a table, then it may
well be faster to do it in code, but if you know it will be <10% it's much
faster to let the database do it.

Of course, the real reason to let the database do it is to make sure it
gets done right.  It's way likelier that you or I will write code that
makes some fatal mistake we don't see than it is that the database will do
that.  If you make a mistake in your join syntax you'll know right up
front, when you get 0 or 1,000,000,000,000 rows   :-)