Re: [GENERAL] Expensive query - Mailing list pgsql-general

From Dustin Sallings
Subject Re: [GENERAL] Expensive query
Date
Msg-id Pine.SGI.3.95.981030140928.28572A-100000@bleu.west.spy.net
Whole thread Raw
In response to Expensive query  (Christophe Pettus <pettus@postdirect.com>)
Responses Re: [GENERAL] Expensive query
List pgsql-general
On Fri, 30 Oct 1998, Christophe Pettus wrote:

    That users table looks kinda useless.  I'd recommend changing id
to name and making id an int, then putting your events users in as ints.
Currently, it doesn't do anything at all, and you might as well be doing
this query:

    select distinct id from events where code = 'Whatever'
        and age('now',when) <= ' 1 day';

    I would also recommend not making code a char(10).  If you put an
index on code and an index on date, you should be able to get your results
pretty quick.  If you change the userid and the code to integers and do a
three table join, it might be faster, but it would use *much* less disk
space.

# I have two tables, structured as:
#
# users:
#
#     id        varchar(70) not null unique
#
# events:
#
#     userid        varchar(70) not null,
#     code        char(10) not null,
#     when        datetime not null
#
# The query I need to perform answers the question, "Which users do NOT
# have a particular event (selected by code), and which do not have ANY
# event for the last day?"  The query I use is:
#
#     select id from users
#        where
#           id not in (
#              select unique id from events
#              where code = 'some code'
#           )
#           and
#           id not in (
#              select unique id from events
#              where age('now',when) <= '1 day'
#           );
#
# This query is *very* expensive.  With 10,000 users and 40,000 events,
# it can take up to 20 minutes (!) to execute, and the postgres process
# grows to 40 megabytes (!!) of memory (on a Pentium II system running
# BSDI).  This seems surprising, since each of the subqueries only needs
# to be evaluated once, rather than once per row of users.  Is there a
# way to reformulate this query to make it less expensive?
#
#

--
SA, beyond.com           My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


pgsql-general by date:

Previous
From: "Michael A. Koerber SR"
Date:
Subject: Re: [GENERAL] Creating web images from postgres data
Next
From: Adam Miller
Date:
Subject: thrashing like a beached shark