Re: Deletion Challenge - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Deletion Challenge
Date
Msg-id 56688FA7.9090800@aklaver.com
Whole thread Raw
In response to Re: Deletion Challenge  (Berend Tober <btober@computer.org>)
Responses Re: Deletion Challenge  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-general
On 12/09/2015 12:24 AM, Berend Tober wrote:
> Adrian Klaver wrote:
>> On 12/05/2015 08:08 AM, Berend Tober wrote:
>>> /*
>>>
>>> Deletion Challenge
>>>
>>> I want to delete all but the most recent transaction, per person, from a
>>> table that records a transaction history because at some point the
>>> transaction history grows large enough to adversely effect performance,
>>> and also becomes less relevant for retention.
>>>
>>> ...
>>>
>>
>> test=> delete from cash_journal where ARRAY[click, cash_journal_id]
>> NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
>> group by fairian_id);
>> DELETE 7
>>
>> test=> SELECT * FROM cash_journal order by fairian_id, click,
>> cash_journal_id;
>>   click | cash_journal_id | fairian_id | debit | credit | balance
>> |           description
>> -------+-----------------+------------+-------+--------+---------+----------------------------------
>>
>>     412 |               1 |          7 |     5 |        |      14 |
>> Sold food quantity 7 units.
>>      37 |               7 |          8 |     8 |        |       8 |
>> Ratified contract f1abd670358e03
>>      37 |               9 |          9 |     7 |        |       7 |
>> Ratified contract 1574bddb75c78a
>>      36 |              14 |         18 |     0 |      0 |       0 |
>> initial cash balance
>>     413 |               1 |         25 |       |    995 |       0 |
>> Redeemed bond 7719a1c782a1ba
>> (5 rows)
>>
>
> Nice.
>
> The idea of a NOT IN query had occurred to me briefly, but I failed to
> pursue it because at some point in the distant past I had gained the
> impression that NOT IN queries were not computationally efficient.
> During one round of testing I had like a million rows. I'll have to run
> some EXPLAIN query testing with a larger data sample for comparison.
> Thanks!

Plan B:

WITH d AS
     (SELECT * FROM
         cash_journal
     LEFT JOIN
         (SELECT
             MAX(ARRAY[click,cash_journal_id]) AS mx
         FROM
             cash_journal
         GROUP BY
             fairian_id)
         AS
             mxa
     ON
         mxa.mx=ARRAY[click, cash_journal_id]
     WHERE
         mx IS NULL)
DELETE FROM
     cash_journal
USING
     d
WHERE
     d.click = cash_journal.click
AND
     d.cash_journal_id = cash_journal.cash_journal_id;


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: HELP!!! The WAL Archive is taking up all space
Next
From: "David G. Johnston"
Date:
Subject: Re: Deletion Challenge