Deletion Challenge - Mailing list pgsql-general

From Berend Tober
Subject Deletion Challenge
Date
Msg-id 56630BE5.3080001@computer.org
Whole thread Raw
Responses Re: Deletion Challenge  (Steve Crawford <scrawford@pinpointresearch.com>)
Re: Deletion Challenge  (Adrian Klaver <adrian.klaver@aklaver.com>)
Re: Deletion Challenge  (Benjamin Smith <lists@benjamindsmith.com>)
List pgsql-general
/*

Deletion Challenge

I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely effect performance,
and also becomes less relevant for retention.

I have devised a way to accomplish this, but it is a 'two-stage'
approach: that is, it requires two delete statements. I would like to
know if there is a way to do it in a single statement.

Bonus challenge: Same question, except preserving the most recent N, for
N > 1, rows for each person so that a short history is retained after
the deletion.

I have included below an annotated test case and my current solution for
the N = 1 case.

*/

DROP TABLE IF EXISTS cash_journal;


CREATE TABLE cash_journal (
     click bigint NOT NULL,
     cash_journal_id bigint NOT NULL,
     fairian_id bigint NOT NULL,
     debit double precision,
     credit double precision,
     balance real DEFAULT 0,
     description text
);

COMMENT ON COLUMN cash_journal.click        IS 'Time of transaction.';
COMMENT ON COLUMN cash_journal.cash_journal_id    IS 'Sequence of transaction within current click.';
COMMENT ON COLUMN cash_journal.fairian_id    IS 'Fairian account effected.';
COMMENT ON COLUMN cash_journal.debit        IS 'Account balance increase amount.';
COMMENT ON COLUMN cash_journal.credit        IS 'Account balance decrease amount.';
COMMENT ON COLUMN cash_journal.balance        IS 'Account balance, per Fairian running total.';
COMMENT ON COLUMN cash_journal.description    IS 'Transaction description.';

/*

Below is some sample data, listed in the click/sequence order that the
data would actually be entered. That is, the 'click' column represents
advancing time, and within each click, transactions are sequenced by the
'cash_journal_id' column. Note there are some missing cash_journal_id
sequence numbers. This is an artifact of having presented here only
an illustrative sample. Generally, within each click, the sequence
would start at one and increment uniformly by one for each new row
in the same click, and then reset to one for the next click. The
missing increments in the sample data should not make any difference
in the solution.

The 'balance' column is a per-player running total, which is a
deliberate denormalization. It is calculated in a before insert trigger
by starting with the per-player previous balance, and then adding
the new row debit, if any, and subtracting the new row credit, if any.

Note, not all Fairians will have a transaction in every click, but any
number of Fairians may have multiple transactions in any click.

*/

copy cash_journal (click,cash_journal_id,fairian_id,debit,credit,balance,description) from stdin;
36    3    7    0    0    0    Initial cash balance
36    4    8    0    0    0    Initial cash balance
36    5    9    0    0    0    Initial cash balance
36    14    18    0    0    0    initial cash balance
37    5    7    9    \N    9    Ratified contract fa35e192121eab
37    7    8    8    \N    8    Ratified contract f1abd670358e03
37    9    9    7    \N    7    Ratified contract 1574bddb75c78a
411    1    25    0    0    0    Initial cash balance
411    2    25    1000    \N    1000    Issued bond 7719a1c782a1ba
412    1    7    5    \N    14    Sold food quantity 7 units.
412    2    25    \N    5    995    Bought food quantity 7 units.
413    1    25    \N    995    0    Redeemed bond 7719a1c782a1ba
\.


SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;

/*

The sample starting data is shown here in order by Fairian so that it is
perhaps easier to see what is happening for each player. Note that the
result of the deletion should be the last row for each player.

  click | cash_journal_id | fairian_id | debit | credit | balance |           description
-------+-----------------+------------+-------+--------+---------+----------------------------------
     36 |               3 |          7 |     0 |      0 |       0 | Initial cash balance
     37 |               5 |          7 |     9 |        |       9 | Ratified contract fa35e192121eab
    412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
     36 |               4 |          8 |     0 |      0 |       0 | Initial cash balance
     37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
     36 |               5 |          9 |     0 |      0 |       0 | Initial cash balance
     37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
     36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
    411 |               1 |         25 |     0 |      0 |       0 | Initial cash balance
    411 |               2 |         25 |  1000 |        |    1000 | Issued bond 7719a1c782a1ba
    412 |               2 |         25 |       |      5 |     995 | Bought food quantity 7 units.
    413 |               1 |         25 |       |    995 |       0 | Redeemed bond 7719a1c782a1ba
(12 rows)

*/


/*

Here is the current, two-stage solution in use. Is there a way to do it
with a single statement?

Can you create a solution that retains an arbitrarily specified number
of rows per player?

*/
BEGIN;

WITH max_click AS (
   SELECT
     cash_journal.fairian_id,
     max(cash_journal.click) AS click
     FROM cash_journal
     GROUP BY cash_journal.fairian_id
     )
   delete from cash_journal j
     using max_click b
     where j.fairian_id = b.fairian_id
     and j.click        < b.click;

WITH max_journal_id AS (
   SELECT
     cash_journal.fairian_id,
     cash_journal.click,
     max(cash_journal.cash_journal_id) AS cash_journal_id
     FROM cash_journal
     GROUP BY cash_journal.fairian_id, cash_journal.click
     )
   delete from cash_journal j
      using max_journal_id b
      where j.fairian_id    = b.fairian_id
      and j.click           = b.click
      and j.cash_journal_id < b.cash_journal_id;

COMMIT;

SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;

/*

  click | cash_journal_id | fairian_id | debit | credit | balance |           description
-------+-----------------+------------+-------+--------+---------+----------------------------------
    412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
     37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
     37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
     36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
    413 |               1 |         25 |       |    995 |       0 | Redeemed bond 7719a1c782a1ba
(5 rows)


*/


pgsql-general by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Support for hardware tokens for server/replication private key
Next
From: anj patnaik
Date:
Subject: error on pg_restore