Re: efficient data reduction (and deduping) - Mailing list pgsql-performance

From Alessandro Gagliardi
Subject Re: efficient data reduction (and deduping)
Date
Msg-id CAAB3BBL_6ju5QS2qEbmRAtePN2BOi==dBig925RjXH2QyyGzwA@mail.gmail.com
In response to Re: efficient data reduction (and deduping)  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: efficient data reduction (and deduping)  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-performance
Interesting solution. If I'm not mistaken, this does solve the problem of having two entries for the same user at the exact same time (which violates my primary key constraint), but it does so by leaving both of them out (since there is no au1.hr_timestamp > au2.hr_timestamp in that case). Is that right?
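The tie case can be checked concretely. Below is a minimal sketch that runs the anti-join from the query quoted below against a tiny in-memory table. It assumes Python's built-in sqlite3 module rather than PostgreSQL, so strftime('%Y-%m-%d %H', ...) stands in for date_trunc('hour', ...), and the rows are made up for illustration. Because the join uses a strict <, neither of two rows sharing an identical timestamp counts as "earlier" than the other, so in this sketch both tied rows pass the IS NULL filter and only the later 10:40 row is dropped.

```python
import sqlite3

# Hypothetical miniature of activity_unlogged: (user_id, hr_timestamp).
# SQLite has no date_trunc, so strftime('%Y-%m-%d %H', ...) stands in for
# date_trunc('hour', ...) in this sketch.
ROWS = [
    (1, "2012-03-01 10:05:00"),
    (1, "2012-03-01 10:05:00"),  # exact duplicate timestamp (the PK clash)
    (1, "2012-03-01 10:40:00"),  # later row in the same hour
    (2, "2012-03-01 11:15:00"),
]

ANTI_JOIN = """
SELECT au1.user_id, au1.hr_timestamp
FROM activity_unlogged au1
LEFT JOIN activity_unlogged au2
       ON au2.user_id = au1.user_id
      AND strftime('%Y-%m-%d %H', au2.hr_timestamp) = strftime('%Y-%m-%d %H', au1.hr_timestamp)
      AND au2.hr_timestamp < au1.hr_timestamp
WHERE au2.user_id IS NULL
ORDER BY au1.user_id, au1.hr_timestamp
"""


def earliest_per_hour(rows):
    """Return the rows the anti-join keeps: those with no strictly earlier
    row for the same user in the same hour."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity_unlogged (user_id INTEGER, hr_timestamp TEXT)")
    conn.executemany("INSERT INTO activity_unlogged VALUES (?, ?)", rows)
    result = conn.execute(ANTI_JOIN).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in earliest_per_hour(ROWS):
        print(row)
```

Here the result contains user 1's 10:05 row twice plus user 2's single row, which is exactly the pair that would collide on a (user_id, hour) primary key.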

On Thu, Mar 1, 2012 at 10:35 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
Try

INSERT INTO hourly_activity
SELECT ... everything from au1 ...
FROM activity_unlogged au1
-- anti-join: keep an au1 row only when no strictly earlier row
-- exists for the same user within the same hour
LEFT JOIN activity_unlogged au2
       ON au2.user_id = au1.user_id
      AND date_trunc('hour', au2.hr_timestamp) = date_trunc('hour', au1.hr_timestamp)
      AND au2.hr_timestamp < au1.hr_timestamp
WHERE au2.user_id IS NULL;
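If exact-timestamp duplicates must collapse to a single row, one option is to add a tie-breaker to the join condition so that, among rows with equal timestamps, only the one with the smallest key survives. The sketch below assumes a hypothetical unique `id` column (not part of the original schema) and again uses Python's sqlite3 with strftime standing in for date_trunc; it illustrates the idea rather than the thread's actual tables.

```python
import sqlite3

# Same hypothetical miniature table, plus a unique `id` column used only to
# break ties between rows that share an identical hr_timestamp.
ROWS = [
    (1, 1, "2012-03-01 10:05:00"),
    (2, 1, "2012-03-01 10:05:00"),  # same user, same instant: would clash on the PK
    (3, 1, "2012-03-01 10:40:00"),
    (4, 2, "2012-03-01 11:15:00"),
]

TIE_SAFE_ANTI_JOIN = """
SELECT au1.user_id, au1.hr_timestamp
FROM activity_unlogged au1
LEFT JOIN activity_unlogged au2
       ON au2.user_id = au1.user_id
      AND strftime('%Y-%m-%d %H', au2.hr_timestamp) = strftime('%Y-%m-%d %H', au1.hr_timestamp)
      -- au1 is beaten by any strictly earlier row, or by an equal-time row with a smaller id
      AND (au2.hr_timestamp < au1.hr_timestamp
           OR (au2.hr_timestamp = au1.hr_timestamp AND au2.id < au1.id))
WHERE au2.user_id IS NULL
ORDER BY au1.user_id, au1.hr_timestamp
"""


def one_row_per_hour(rows):
    """Return exactly one (user_id, hr_timestamp) row per user per hour."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity_unlogged "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, hr_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO activity_unlogged VALUES (?, ?, ?)", rows)
    result = conn.execute(TIE_SAFE_ANTI_JOIN).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in one_row_per_hour(ROWS):
        print(row)
```

With the tie-breaker, user 1 keeps only one 10:05 row. In PostgreSQL specifically, SELECT DISTINCT ON (user_id, date_trunc('hour', hr_timestamp)) ... ORDER BY user_id, date_trunc('hour', hr_timestamp), hr_timestamp is another way to pick a single row per group.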
