Re: efficient data reduction (and deduping) - Mailing list pgsql-performance

From Claudio Freire
Subject Re: efficient data reduction (and deduping)
Msg-id CAGTBQpYfx9Vt6zMK40jziMcHHRLZABDYQiTwq1SNkkE4F4k03w@mail.gmail.com
In response to efficient data reduction (and deduping)  (Alessandro Gagliardi <alessandro@path.com>)
Responses Re: efficient data reduction (and deduping)
List pgsql-performance
On Thu, Mar 1, 2012 at 3:27 PM, Alessandro Gagliardi
<alessandro@path.com> wrote:
> INSERT INTO hourly_activity
>     SELECT DISTINCT date_trunc('hour', hr_timestamp) AS activity_hour, activity_unlogged.user_id,
>                     client_ip, hr_timestamp, locale, log_id, method, server_ip, uri, user_agent
>         FROM activity_unlogged,
>             (SELECT user_id, MAX(hr_timestamp) AS last_timestamp
>                 FROM activity_unlogged GROUP BY user_id, date_trunc('hour', hr_timestamp)) AS last_activity
>     WHERE activity_unlogged.user_id = last_activity.user_id AND activity_unlogged.hr_timestamp = last_activity.last_timestamp;

Try

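INSERT INTO hourly_activity
SELECT DISTINCT date_trunc('hour', au1.hr_timestamp) AS activity_hour,
       au1.user_id, au1.client_ip, au1.hr_timestamp, au1.locale, au1.log_id,
       au1.method, au1.server_ip, au1.uri, au1.user_agent
FROM activity_unlogged au1
-- Look for a later row from the same user in the same hour.
LEFT JOIN activity_unlogged au2
       ON au2.user_id = au1.user_id
      AND date_trunc('hour', au2.hr_timestamp) = date_trunc('hour', au1.hr_timestamp)
      AND au2.hr_timestamp > au1.hr_timestamp
-- Keep au1 only if no such row exists, i.e. it is the latest row in its hour
-- (the same rows the MAX(hr_timestamp) subquery picks; DISTINCT drops exact duplicates).
WHERE au2.user_id IS NULL;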
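To check that the anti-join keeps the same rows as the GROUP BY/MAX subquery, here is a small sketch that runs both against an in-memory SQLite database from Python. It is an illustration under stated assumptions, not a PostgreSQL benchmark. SQLite has no date_trunc, so a hypothetical hour() helper truncates timestamps with strftime. The sample rows and the reduced column set (user_id, hr_timestamp, uri) are made up. The join condition is `au2.hr_timestamp > au1.hr_timestamp`, so, like MAX(hr_timestamp), it keeps the latest row per user per hour. Flipping it to `<` would keep the earliest row instead.

```python
import sqlite3

# Hypothetical sample data: (user_id, hr_timestamp, uri).
# ISO-8601 text timestamps sort chronologically, so > and MAX() work on them.
ROWS = [
    (1, "2012-03-01 10:05:00", "/a"),
    (1, "2012-03-01 10:45:00", "/b"),
    (1, "2012-03-01 11:10:00", "/c"),
    (2, "2012-03-01 10:30:00", "/d"),
    (2, "2012-03-01 10:30:00", "/d"),  # exact duplicate row
]


def hour(col: str) -> str:
    """Hypothetical stand-in for PostgreSQL's date_trunc('hour', col) in SQLite."""
    return f"strftime('%Y-%m-%d %H:00:00', {col})"


# Anti-join: keep au1 unless a later row exists for the same user and hour.
ANTI_JOIN = f"""
SELECT DISTINCT {hour('au1.hr_timestamp')} AS activity_hour,
       au1.user_id, au1.hr_timestamp, au1.uri
FROM activity_unlogged au1
LEFT JOIN activity_unlogged au2
       ON au2.user_id = au1.user_id
      AND {hour('au2.hr_timestamp')} = {hour('au1.hr_timestamp')}
      AND au2.hr_timestamp > au1.hr_timestamp
WHERE au2.user_id IS NULL
ORDER BY au1.user_id, activity_hour
"""

# The quoted approach: join each row to the per-user, per-hour MAX timestamp.
GROUP_BY_MAX = f"""
SELECT DISTINCT {hour('a.hr_timestamp')} AS activity_hour,
       a.user_id, a.hr_timestamp, a.uri
FROM activity_unlogged a
JOIN (SELECT user_id, MAX(hr_timestamp) AS last_timestamp
      FROM activity_unlogged
      GROUP BY user_id, {hour('hr_timestamp')}) AS last_activity
  ON a.user_id = last_activity.user_id
 AND a.hr_timestamp = last_activity.last_timestamp
ORDER BY a.user_id, activity_hour
"""


def run(query: str) -> list:
    """Load ROWS into a fresh in-memory table and return the query's result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE activity_unlogged "
            "(user_id INTEGER, hr_timestamp TEXT, uri TEXT)"
        )
        conn.executemany("INSERT INTO activity_unlogged VALUES (?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run(ANTI_JOIN):
        print(row)
```

The anti-join only discards a row when a strictly later row exists in the same hour. Exact duplicates therefore both survive the join, which is why the DISTINCT from the original query is still needed.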
