Thread: Large IN query optimization

Large IN query optimization

From

"Worky Workerson"

Date:

13 December 2006, 17:23:04

I'm wondering (given what I have read about PG and IN), if there is a
better way to execute this query.  I have (essentially) two tables in
a read-only DB, ip_info and events, where many events can map to one
ip_info.  I'd like to get all the ip_info.info columns where a
specific event occurred.  This is about what I have come up with (with
indexes on all the columns):

CREATE TABLE ip_info ( ip IP4, info VARCHAR );
CREATE TABLE events ( ip IP4, event_name VARCHAR, event_type VARCHAR);

SELECT ip, info
  FROM ip_info
  WHERE ip IN (SELECT ip FROM events WHERE event_name = 'somename');

This works fine when there are few events named 'somename', however
this subquery can return a fairly large set of rows (>1 million) and
this query takes quite long to execute.  Is there a better way to
write this query?  What would be the optimal plan for this query,
given a roughly even distribution of event_name?  My current plan
looks something like (as I have partitioned the events table by ip):

Hash Join
  Hash Cond ("outer".ip = "inner".ip)
  -> Seq Scan on ip_info
  -> Hash
    -> HashAggregate
      -> Append
        -> Index Scan using "events_ip_01_event_name_idx" on events_ip_01 events
          Index Cond ...
        -> Index Scan using "events_ip_02_event_name_idx" on events_ip_02 events
          Index Cond ...

Is this the optimal plan for this query?  BTW, ip_info has about 5M
rows, and the collective events tables have about 50M rows.

Also, slightly off-topic ... are there any optimizations that I can
use to tell PG that this is a read-only database?

PG 8.1.3 on RHEL4.3 x86_64 ... thinking about upgrading to 8.2 when I
get the time.

Thanks!

Re: Large IN query optimization

From

Tom Lane

Date:

13 December 2006, 18:08:55

"Worky Workerson" <worky.workerson@gmail.com> writes:
> I'm wondering (given what I have read about PG and IN), if there is a
> better way to execute this query.
> ... (as I have partitioned the events table by ip):

Are you on 8.2?  8.1 is pretty stupid about joins to inheritance trees.

            regards, tom lane

Re: Large IN query optimization

From

Karsten Hilbert

Date:

13 December 2006, 20:16:55

On Wed, Dec 13, 2006 at 02:08:44PM -0500, Tom Lane wrote:

> Are you on 8.2?  8.1 is pretty stupid about joins to inheritance trees.

To put up a number for "stupid": For GNUmed a particular
query involving one-level inheritance went down from over
ten minutes to under one second just by running against 8.2
instead of below 8.2.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346