Re: [SQL] optimizing 2-table join w/millions of rows - Mailing list pgsql-sql

From Tom Lane
Subject Re: [SQL] optimizing 2-table join w/millions of rows
Date
Msg-id 18394.911837459@sss.pgh.pa.us
Responses Re: [SQL] optimizing 2-table join w/millions of rows
List pgsql-sql
Michael Olivier <molivier@yahoo.com> writes:
>> How about:
>> SELECT U.acctname
>> FROM   usertest U, bgndtest B
>> WHERE  B.acctname = U.acctname
>> AND  B.part_needed=3 AND B.loc_needed=5
>> AND  U.part=2 AND U.loc=3;

> Yes, that looks equivalent. My problem is this is too slow an
> operation as I'm benchmarking it right now. And if I add less-than or
> greater-than comparisons, the performance goes _way_ down from there.
> How can I get the best performance out of this kind of operation?

The main thing is to make sure you have appropriate indexes and that
the query optimizer is making reasonable use of them.  My guess here
is that you want b-tree indexes on B.acctname and U.acctname, which
the optimizer can use to drive a merge join.  (Use EXPLAIN with the
query to see what plan the optimizer will actually use; don't forget
that if you haven't VACUUMed the tables lately, the optimizer may be
working from bogus statistics.)
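
For instance, something along these lines (index names are arbitrary,
and I'm just taking the column names from your query) would give the
optimizer both the indexes and up-to-date statistics to work with:

CREATE INDEX usertest_acctname_idx ON usertest (acctname);
CREATE INDEX bgndtest_acctname_idx ON bgndtest (acctname);

-- refresh the planner's statistics, then see what plan it picks
VACUUM ANALYZE usertest;
VACUUM ANALYZE bgndtest;

EXPLAIN SELECT U.acctname
FROM   usertest U, bgndtest B
WHERE  B.acctname = U.acctname
AND  B.part_needed=3 AND B.loc_needed=5
AND  U.part=2 AND U.loc=3;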

A different possibility is to do the primary search on the part number,
ie rewrite to

SELECT U.acctname
FROM   usertest U, bgndtest B
WHERE  B.part_needed = 3
  AND  U.part = 2
  AND  B.loc_needed = 5
  AND  B.acctname = U.acctname
  AND  U.loc = 3;

Here, indexes on the part-number columns would be the ones doing the
work.  You'd have to experiment to see which is faster on your data.
(As far as I know,
the Postgres optimizer will always evaluate your AND clauses left to
right, which means that the key to performance is to make the leftmost
clause reject as large a fraction of candidate rows as possible.)
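
To test that ordering you'd want indexes keyed on the part columns
instead (again, the names here are just placeholders):

CREATE INDEX usertest_part_idx ON usertest (part);
CREATE INDEX bgndtest_part_idx ON bgndtest (part_needed);

Then compare the EXPLAIN output and timings of the two orderings.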

It's possible that indexes on some of the secondary fields would help,
depending on what the statistics of your data are.  But every index
you add slows down insert/update performance, so keep an eye on that
when you're deciding how many indexes are worth having.
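
For example, if the loc values are reasonably selective in your data,
something like

CREATE INDEX usertest_loc_idx ON usertest (loc);
CREATE INDEX bgndtest_loc_idx ON bgndtest (loc_needed);

might pay for itself; if they're not selective, those indexes are just
extra overhead on every insert and update.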

> Is there any way to force postgres to hold certain tables in memory
> all the time? As I said, cost of memory isn't an issue, but
> performance is.

I don't think you can lock a specific table into memory.  What you can
do is increase the size of the shared buffer area used by the backends
(-B switch during postmaster startup, IIRC).  This should give you
pretty much the same result, if you're always hitting the same table,
and it's much more flexible if your access patterns change over time.
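
For instance, something like

postmaster -B 2048 -D /usr/local/pgsql/data

(adjust the buffer count and data directory to suit your setup) would
give the backends about 16MB of shared buffer cache, since each buffer
is one 8K disk block, IIRC.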

            regards, tom lane
