Thread: Re: [SQL] optimizing 2-table join w/millions of rows

Re: [SQL] optimizing 2-table join w/millions of rows

From: Michael Olivier
Date:
---Herouth Maoz <herouth@oumail.openu.ac.il> wrote:
>
> At 4:26 +0200 on 20/11/98, Michael Olivier wrote:
>
>
> >
> > select U.acctname from usertest U, bgndtest B where
> >     B.part_needed=3 and B.loc_needed=5 and
> >     B.acctname=U.acctname and U.acctname in
> >         (select acctname from usertest where part=2 and loc=3)
>
> Can you explain *verbally* what you meant to do here? It seems as if the
> subselect is redundant.

It's not redundant, but an attempt to optimize the overall query
performance by using a subquery. What this does is compare two users'
parameters against each other: "find all users who need part 3 and
need loc 5 and who have part 2 and loc 3".

> How about:
>
> SELECT U.acctname
> FROM   usertest U, bgndtest B
> WHERE  B.acctname = U.acctname
>   AND  B.part_needed=3 AND B.loc_needed=5
>   AND  U.part=2 AND U.loc=3;

Yes, that looks equivalent. My problem is this is too slow an
operation as I'm benchmarking it right now. And if I add less-than or
greater-than comparisons, the performance goes _way_ down from there.
How can I get the best performance out of this kind of operation?

Is there any way to force postgres to hold certain tables in memory
all the time? As I said, cost of memory isn't an issue, but
performance is.

thanks,
Michael



Re: [SQL] optimizing 2-table join w/millions of rows

From: Herouth Maoz
Date:
At 5:05 +0200 on 23/11/98, Michael Olivier wrote:


>
> > How about:
> >
> > SELECT U.acctname
> > FROM   usertest U, bgndtest B
> > WHERE  B.acctname = U.acctname
> >   AND  B.part_needed=3 AND B.loc_needed=5
> >   AND  U.part=2 AND U.loc=3;
>
> Yes, that looks equivalent. My problem is this is too slow an
> operation as I'm benchmarking it right now. And if I add less-than or
> greater-than comparisons, the performance goes _way_ down from there.
> How can I get the best performance out of this kind of operation?
>
> Is there any way to force postgres to hold certain tables in memory
> all the time? As I said, cost of memory isn't an issue, but
> performance is.

This is very strange. As a general rule, a simple join should be faster
than a query with a subquery, especially when that subquery uses "in"
rather than "exists".

In principle, if there are separate indices on part, loc and acctname in
usertest, and on part_needed, loc_needed and acctname in bgndtest, the
optimal plan should be: restrict each of the tables by its literal
comparisons, and then match on acctname.
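
As a rough sketch of that setup (the index names below are made up for
illustration; only the table and column names come from this thread), the
separate indices could be created as follows, with a VACUUM ANALYZE so the
optimizer has fresh statistics and an EXPLAIN to see which plan it picks:

```sql
-- Hypothetical index names; tables and columns are the ones discussed above.
CREATE INDEX usertest_part_idx        ON usertest (part);
CREATE INDEX usertest_loc_idx         ON usertest (loc);
CREATE INDEX usertest_acctname_idx    ON usertest (acctname);

CREATE INDEX bgndtest_part_needed_idx ON bgndtest (part_needed);
CREATE INDEX bgndtest_loc_needed_idx  ON bgndtest (loc_needed);
CREATE INDEX bgndtest_acctname_idx    ON bgndtest (acctname);

-- Refresh the statistics the optimizer uses to estimate row counts.
VACUUM ANALYZE usertest;
VACUUM ANALYZE bgndtest;

-- Show the chosen plan without running the query.
EXPLAIN
SELECT U.acctname
FROM   usertest U, bgndtest B
WHERE  B.acctname = U.acctname
  AND  B.part_needed=3 AND B.loc_needed=5
  AND  U.part=2 AND U.loc=3;
```

If the plan shows a sequential scan over a table with millions of rows,
the index on that restriction is not being used.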

I don't know who wrote the optimiser, and I'm not sure he is on this list.
There are optimizer settings which can be changed, such as SET R_PLANS ON.

In any case, if you see in "real life" that a subquery actually works
better for you, then avoid the join. No reason to do both:

SELECT acctname
FROM bgndtest
WHERE part_needed=3 AND loc_needed=5
  AND acctname IN (
      SELECT acctname
      FROM usertest
      WHERE part=2 AND loc=3
  );

Or better yet:

SELECT acctname
FROM bgndtest B
WHERE part_needed=3 AND loc_needed=5
  AND EXISTS (
      SELECT *
      FROM usertest U
      WHERE part=2 AND loc=3
        AND U.acctname = B.acctname
  );

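Of course, you may do it the other way around. Which is better depends on
which of your tables is supposed to have more records, and on which of the
comparisons are actually "=" comparisons. That is very important, because
less-than and greater-than comparisons often don't get to use the indices:
a hash index cannot serve them at all, and even with a btree index the
optimizer may still choose a sequential scan.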

What I mean by "the other way around" is:

SELECT acctname
FROM usertest U
WHERE part=2 AND loc=3
  AND EXISTS (
      SELECT *
      FROM bgndtest B
      WHERE part_needed=3 AND loc_needed=5
        AND U.acctname = B.acctname
  );
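
To choose between the two directions without guessing, one option is to
put EXPLAIN in front of each and compare the estimated costs. This is only
a sketch: the output format and the numbers depend on the server version
and on the data. It also assumes acctname is unique in each table (the
thread does not say so); without that, the plain join can return the same
acctname more than once where these forms would not.

```sql
-- Direction 1: restrict bgndtest first, then probe usertest per row.
EXPLAIN
SELECT acctname
FROM bgndtest B
WHERE part_needed=3 AND loc_needed=5
  AND EXISTS (
      SELECT *
      FROM usertest U
      WHERE part=2 AND loc=3
        AND U.acctname = B.acctname
  );

-- Direction 2: restrict usertest first, then probe bgndtest per row.
EXPLAIN
SELECT acctname
FROM usertest U
WHERE part=2 AND loc=3
  AND EXISTS (
      SELECT *
      FROM bgndtest B
      WHERE part_needed=3 AND loc_needed=5
        AND U.acctname = B.acctname
  );
```

The outer table is the one that gets scanned and filtered first, so it
usually pays to put there the table whose literal comparisons leave the
fewest rows.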

As for your question - there is no way to keep an entire table in memory,
unless you have a ram disk.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma