Re: Replacing old tuples on insert - Mailing list pgsql-novice

From Michael Fuhr
Subject Re: Replacing old tuples on insert
Date 2004-11-27
Msg-id 20041127075717.GA76802@winnie.fuhr.org
In response to Replacing old tuples on insert  ("Pradeepkumar, Pyatalo (IE10)" <Pradeepkumar.Pyatalo@honeywell.com>)
List pgsql-novice
On Wed, Nov 24, 2004 at 02:20:42AM -0700, Pradeepkumar, Pyatalo (IE10) wrote:
>
> I have a table as follows -
> table TrendData( TrendID bigserial,
>                  TrendName varchar(30),
>                  .....
>                  .....);
>
> Now for this table I want a maximum of only 5000 tuples. After the maximum
> is reached, the new tuples that are inserted should replace the older ones.
> Could anyone help me out in writing the query? A trigger would also be
> fine.

How accurate does the tuple count need to be?  A simple but flawed
approach would be to use a rule like the following:

CREATE RULE TrendTrunc AS ON INSERT TO TrendData DO ALSO
  DELETE FROM TrendData
  WHERE TrendID <= currval('trenddata_trendid_seq') - 5000;
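
For concreteness in the examples below, here's a minimal, hypothetical
version of the table (the original post elides most of its columns).
Note that the rule assumes the default sequence name that bigserial
generates:

CREATE TABLE TrendData (
    TrendID   bigserial PRIMARY KEY,  -- implicitly creates trenddata_trendid_seq
    TrendName varchar(30)
);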

This rule is flawed because it doesn't account for gaps in the
sequence that could exist due to deleted rows or rolled back
transactions.  Such gaps could cause the rule to truncate the
table to a lower row count than the desired 5000.
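
For example, a rolled-back INSERT still consumes sequence values, so the
next committed row's TrendID skips ahead and the currval() - 5000 cutoff
reaches past the 5000th-newest row (a hypothetical session):

BEGIN;
INSERT INTO TrendData (TrendName) VALUES ('temp');  -- calls nextval()
ROLLBACK;  -- the row disappears, but the sequence does not rewind

INSERT INTO TrendData (TrendName) VALUES ('kept');  -- TrendID skips a value,
                                                    -- so the rule deletes one
                                                    -- row too many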

Here's a rule that handles gaps but is also flawed:

CREATE RULE TrendTrunc AS ON INSERT TO TrendData DO ALSO
  DELETE FROM TrendData
  WHERE TrendID < (SELECT MIN(TrendID)
                   FROM (SELECT TrendID
                         FROM TrendData
                         ORDER BY TrendID DESC
                         LIMIT 5000) AS s);

This rule could leave the table with more than 5000 rows if multiple
transactions are inserting at the same time.
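
To see why, consider two READ COMMITTED sessions inserting while the
table already holds exactly 5000 rows (a hypothetical interleaving):

Session A: BEGIN; INSERT INTO TrendData (TrendName) VALUES ('a');
           -- A's DELETE sees 5001 rows and removes the oldest one
Session B: BEGIN; INSERT INTO TrendData (TrendName) VALUES ('b');
           -- B's snapshot doesn't include A's uncommitted row, so B's
           -- DELETE targets the same oldest row; it blocks on A's row
           -- lock, then skips the already-deleted row
Session A: COMMIT;
Session B: COMMIT;  -- 5000 - 1 + 2 = 5001 rows remain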

The following trigger solution should handle gaps and strictly enforce
the 5000-row limit: it locks the table before deleting excess records,
so it waits for other transactions that have inserted or deleted
records to complete (unfortunately, this will hurt performance when
concurrent transactions are doing inserts or deletes).  It uses a
trigger because rules don't allow LOCK.  The trigger fires after each
statement rather than after each row, so an INSERT that adds multiple
rows deletes the excess only once, after all new rows are in place.

I haven't considered all possible situations and I've done only
simple testing, so the solution could have flaws.  One improvement
would be to make the function reusable for multiple tables with
differing column names and maximum row counts; a rough sketch of
that follows after the trigger below.

CREATE FUNCTION TrendTrunc() RETURNS TRIGGER AS '
BEGIN
    -- Block other writers until this transaction commits so the
    -- cutoff below is computed against a stable set of rows.
    LOCK TABLE TrendData IN SHARE ROW EXCLUSIVE MODE;

    -- Delete everything older than the 5000th-newest TrendID.
    DELETE FROM TrendData
    WHERE TrendID < (SELECT MIN(TrendID)
                     FROM (SELECT TrendID
                           FROM TrendData
                           ORDER BY TrendID DESC
                           LIMIT 5000) AS s);

    -- The return value is ignored for AFTER triggers.
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER TrendAfterInsert AFTER INSERT ON TrendData
  FOR EACH STATEMENT EXECUTE PROCEDURE TrendTrunc();
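
Here's the rough sketch of a reusable version mentioned above: the
table name, ID column, and row limit come in as trigger arguments
(TG_ARGV) and the statements are built with EXECUTE.  It's untested,
and the name TruncToLimit is made up; note that quote_ident() makes
the names case-sensitive, so pass them in lowercase if the table was
created without quotes:

CREATE FUNCTION TruncToLimit() RETURNS TRIGGER AS '
DECLARE
    tbl text;
    col text;
    lim text;
BEGIN
    tbl := quote_ident(TG_ARGV[0]);  -- table to truncate
    col := quote_ident(TG_ARGV[1]);  -- serial ID column
    lim := TG_ARGV[2];               -- maximum row count

    EXECUTE ''LOCK TABLE '' || tbl || '' IN SHARE ROW EXCLUSIVE MODE'';

    EXECUTE ''DELETE FROM '' || tbl
         || '' WHERE '' || col || '' < (SELECT MIN('' || col || '')''
         || '' FROM (SELECT '' || col || '' FROM '' || tbl
         || '' ORDER BY '' || col || '' DESC LIMIT '' || lim || '') AS s)'';

    RETURN NULL;
END;
' LANGUAGE plpgsql;

-- To use it, drop the trigger above and recreate it with arguments:
DROP TRIGGER TrendAfterInsert ON TrendData;
CREATE TRIGGER TrendAfterInsert AFTER INSERT ON TrendData
  FOR EACH STATEMENT
  EXECUTE PROCEDURE TruncToLimit('trenddata', 'trendid', '5000');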

If anybody sees problems or has suggestions for improvements, please
follow up.  In particular, is there a more efficient way to get the
cutoff TrendID?
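
One candidate I haven't benchmarked: skip the aggregate and fetch the
5000th-newest TrendID directly with OFFSET (this assumes an index on
TrendID, e.g. from a primary key):

SELECT TrendID AS cutoff
FROM TrendData
ORDER BY TrendID DESC
LIMIT 1 OFFSET 4999;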

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
