Re: Deletion Challenge - Mailing list pgsql-general

From Steve Crawford
Subject Re: Deletion Challenge
Date
Msg-id CAEfWYyxdAMfqpSXrnzHHXvSVmDtGFcFxTnhgT9=HdHDm+0J-Bg@mail.gmail.com
Whole thread Raw
In response to Deletion Challenge  (Berend Tober <btober@computer.org>)
Responses Re: Deletion Challenge  (Berend Tober <btober@computer.org>)
List pgsql-general
If I understand correctly the value of "click" always advances and within a "click" the "cash_journal_id" always advances - not necessarily by single steps so within a fairian_id, ordering by "click" plus "cash_journal_id" would return the records in order from which you want the most recent 5 for each farian_id.

Typing without testing and ignoring performance optimizations, something along the lines of the following should work and covers the "last 5" issue as well.

with stuff_to_delete as (
select farian_id, click, cash_journal_id,
rank() over (partition by farian_id order by (click, cash_journal_id) desc) as howold)
from cash_journal)
delete from cash_journal
using stuff_to_delete
where
cash_journal.farian_id = stuff_to_delete.farian_id
and cash_journal.click = stuff_to_delete.click
and cash_journal.cash_journal_id = stuff_to_delete.cash_journal_id
and stuff_to_delete.howold > 5;

Cheers,
Steve


On Sat, Dec 5, 2015 at 8:08 AM, Berend Tober <btober@computer.org> wrote:
/*

Deletion Challenge

I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely effect performance,
and also becomes less relevant for retention.

I have devised a way to accomplish this, but it is a 'two-stage'
approach: that is, it requires two delete statements. I would like to
know if there is a way to do it in a single statement.

Bonus challenge: Same question, except preserving the most recent N, for
N > 1, rows for each person so that a short history is retained after
the deletion.

I have included below an annotated test case and my current solution for
the N = 1 case.

*/

DROP TABLE IF EXISTS cash_journal;


CREATE TABLE cash_journal (
    click bigint NOT NULL,
    cash_journal_id bigint NOT NULL,
    fairian_id bigint NOT NULL,
    debit double precision,
    credit double precision,
    balance real DEFAULT 0,
    description text
);

COMMENT ON COLUMN cash_journal.click            IS 'Time of transaction.';
COMMENT ON COLUMN cash_journal.cash_journal_id  IS 'Sequence of transaction within current click.';
COMMENT ON COLUMN cash_journal.fairian_id       IS 'Fairian account effected.';
COMMENT ON COLUMN cash_journal.debit            IS 'Account balance increase amount.';
COMMENT ON COLUMN cash_journal.credit           IS 'Account balance decrease amount.';
COMMENT ON COLUMN cash_journal.balance          IS 'Account balance, per Fairian running total.';
COMMENT ON COLUMN cash_journal.description      IS 'Transaction description.';

/*

Below is some sample data, listed in the click/sequence order that the
data would actually be entered. That is, the 'click' column represents
advancing time, and within each click, transactions are sequenced by the
'cash_journal_id' column. Note there are some missing cash_journal_id
sequence numbers. This is an artifact of having presented here only
an illustrative sample. Generally, within each click, the sequence
would start at one and increment uniformly by one for each new row
in the same click, and then reset to one for the next click. The
missing increments in the sample data should not make any difference
in the solution.

The 'balance' column is a per-player running total, which is a
deliberate denormalization. It is calculated in a before insert trigger
by starting with the per-player previous balance, and then adding
the new row debit, if any, and subtracting the new row credit, if any.

Note, not all Fairians will have a transaction in every click, but any
number of Fairians may have multiple transactions in any click.

*/

copy cash_journal (click,cash_journal_id,fairian_id,debit,credit,balance,description) from stdin;
36      3       7       0       0       0       Initial cash balance
36      4       8       0       0       0       Initial cash balance
36      5       9       0       0       0       Initial cash balance
36      14      18      0       0       0       initial cash balance
37      5       7       9       \N      9       Ratified contract fa35e192121eab
37      7       8       8       \N      8       Ratified contract f1abd670358e03
37      9       9       7       \N      7       Ratified contract 1574bddb75c78a
411     1       25      0       0       0       Initial cash balance
411     2       25      1000    \N      1000    Issued bond 7719a1c782a1ba
412     1       7       5       \N      14      Sold food quantity 7 units.
412     2       25      \N      5       995     Bought food quantity 7 units.
413     1       25      \N      995     0       Redeemed bond 7719a1c782a1ba
\.


SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;

/*

The sample starting data is shown here in order by Fairian so that it is
perhaps easier to see what is happening for each player. Note that the
result of the deletion should be the last row for each player.

 click | cash_journal_id | fairian_id | debit | credit | balance |           description
-------+-----------------+------------+-------+--------+---------+----------------------------------
    36 |               3 |          7 |     0 |      0 |       0 | Initial cash balance
    37 |               5 |          7 |     9 |        |       9 | Ratified contract fa35e192121eab
   412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
    36 |               4 |          8 |     0 |      0 |       0 | Initial cash balance
    37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
    36 |               5 |          9 |     0 |      0 |       0 | Initial cash balance
    37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
    36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
   411 |               1 |         25 |     0 |      0 |       0 | Initial cash balance
   411 |               2 |         25 |  1000 |        |    1000 | Issued bond 7719a1c782a1ba
   412 |               2 |         25 |       |      5 |     995 | Bought food quantity 7 units.
   413 |               1 |         25 |       |    995 |       0 | Redeemed bond 7719a1c782a1ba
(12 rows)

*/


/*

Here is the current, two-stage solution in use. Is there a way to do it
with a single statement?

Can you create a solution that retains an arbitrarily specified number
of rows per player?

*/
BEGIN;

WITH max_click AS (
  SELECT
    cash_journal.fairian_id,
    max(cash_journal.click) AS click
    FROM cash_journal
    GROUP BY cash_journal.fairian_id
    )
  delete from cash_journal j
    using max_click b
    where j.fairian_id = b.fairian_id
    and j.click        < b.click;

WITH max_journal_id AS (
  SELECT
    cash_journal.fairian_id,
    cash_journal.click,
    max(cash_journal.cash_journal_id) AS cash_journal_id
    FROM cash_journal
    GROUP BY cash_journal.fairian_id, cash_journal.click
    )
  delete from cash_journal j
     using max_journal_id b
     where j.fairian_id    = b.fairian_id
     and j.click           = b.click
     and j.cash_journal_id < b.cash_journal_id;

COMMIT;

SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;

/*

 click | cash_journal_id | fairian_id | debit | credit | balance |           description
-------+-----------------+------------+-------+--------+---------+----------------------------------
   412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
    37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
    37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
    36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
   413 |               1 |         25 |       |    995 |       0 | Redeemed bond 7719a1c782a1ba
(5 rows)


*/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: error on pg_restore
Next
From: Adrian Klaver
Date:
Subject: Re: Deletion Challenge