Simple query: how to optimize - Mailing list pgsql-performance

From Collin Peters
Subject Simple query: how to optimize
Date
Msg-id df01c91b0510281453v5c7ed502rfb3757e886046607@mail.gmail.com
Whole thread Raw
List pgsql-performance
I have two tables, one is called 'users' the other is 'user_activity'.
 The 'users' table simply contains the users in the system there is
about 30,000 rows.  The 'user_activity' table stores the activities
the user has taken.  This table has about 430,000 rows and also
(notably) has a column which tracks the type of activity.  90% of the
table is type 7 which indicates the user logged into the system.

I am trying to write a simple query that returns the last time each
user logged into the system.  This is how the query looks at the
moment:

SELECT u.user_id, MAX(ua.activity_date)
FROM pp_users u
LEFT OUTER JOIN user_activity ua ON (u.user_id = ua.user_id AND
ua.user_activity_type_id = 7)
WHERE u.userstatus_id <> 4
AND age(u.joined_date) < interval '30 days'
GROUP BY u.user_id

The above query takes about 5 seconds but I'm wondering how it can be
optimized.  When the query is formatted as above it does use an index
on the user_id column of the user_activity table... but the cost is
huge (cost=0.00..1396700.80).

I have tried formatting it another way with a sub-query but it takes
about the same amount to completed:

SELECT u.user_id, ua.last
FROM pp_users u
LEFT OUTER JOIN (SELECT max(activity_date) as last, user_id FROM
user_activity WHERE user_activity_type_id = 7 GROUP BY user_id) as ua
ON (u.user_id = ua.user_id)
WHERE u.userstatus_id <> 4
AND age(u.joined_date) < interval '30 days'

Can anybody offer any pointers on this scenario?

Regards,
Collin

pgsql-performance by date:

Previous
From: Rodrigo Madera
Date:
Subject: Best way to check for new data.
Next
From: Havasvölgyi Ottó
Date:
Subject: Re: Best way to check for new data.