Thread: Speeding up a query

Speeding up a query

From
Andrew Perrin - Demography
Date:
Greetings.

The following query returns the last names of people in our database whom
we consider 'students' during the time period 1-jan-1998 to 1-jun-1998.
It's based on selecting events from a table in which each event can
determine that a person is a Student, NotStudent, or Neutral; the idea is
that a given person is a student from the date of their first Student
event to the date of their next NotStudent event, etc.  

The problem is that it's very slow - on the order of 45-60 seconds to
generate a list that's only about 30 names long.  Can anyone offer some
advice on speeding it up? The dates need to be user-selectable.

Many thanks.

select last_name from personal_data where id in (
select distinct id from event_detail, event_universe where
event_detail.eventtypeid = event_universe.eventtypeid and
event_universe.studentstatus = 'Student' and
event_detail.date_effective <= datetime(date('1-jan-1998')))
and
id not in (
select distinct id from event_detail, event_universe where
event_detail.eventtypeid = event_universe.eventtypeid and
event_universe.studentstatus = 'NotStudent' and
event_detail.date_effective <= datetime(date('1-jun-1998')))


---------------------------------------------------------------------
Andrew J. Perrin - aperrin@demog.berkeley.edu - NT/Unix Admin/Support
Department of Demography    -    University of California at Berkeley
2232 Piedmont Avenue #2120  -    Berkeley, California, 94720-2120 USA
http://demog.berkeley.edu/~aperrin --------------------------SEIU1199



Re: Speeding up a query

From
"Moray McConnachie"
Date:
----- Original Message -----
From: Andrew Perrin - Demography <aperrin@demog.berkeley.edu>
To: <pgsql-sql@postgresql.org>
Sent: Friday, April 07, 2000 8:12 PM
Subject: [SQL] Speeding up a query


>
> select last_name from personal_data where id in (
> select distinct id from event_detail, event_universe where
> event_detail.eventtypeid = event_universe.eventtypeid and
> event_universe.studentstatus = 'Student' and
> event_detail.date_effective <= datetime(date('1-jan-1998')))
> and
> id not in (
> select distinct id from event_detail, event_universe where
> event_detail.eventtypeid = event_universe.eventtypeid and
> event_universe.studentstatus = 'NotStudent' and
> event_detail.date_effective <= datetime(date('1-jun-1998')))

Assuming you have indices set up on all appropriate fields, and that you
replace = by LIKE for text fields (don't know if that makes any speed
difference, but it makes it standard SQL), then you need to modify your
query to use EXISTS instead of IN. NOT IN can be particularly slow. You
don't need the DISTINCT's either, I think.

SELECT last_name FROM personal_data WHERE EXISTS (SELECT id FROM
event_detail,event_universe WHERE id=personal_data.id AND
event_detail.eventtypeid=event_universe.eventtypeid AND
event_universe.studentstatus LIKE 'Student' AND
event_detail.date_effective<=datetime(date('1-jan-1998')))
AND NOT EXISTS
(SELECT id FROM event_detail,event_universe WHERE id=personal_data.id AND
event_detail.eventtypeid=event_universe.eventtypeid AND
event_universe.studentstatus LIKE 'NotStudent' AND
event_detail.date_effective<=datetime(date('1-jun-1998')));

Should be right...
M.