Re: [PERFORMANCE] slow small delete on large table - Mailing list pgsql-performance

From Christopher Browne
Subject Re: [PERFORMANCE] slow small delete on large table
Date
Msg-id m3oerp6stu.fsf@wolfe.cbbrowne.com
In response to [PERFORMANCE] slow small delete on large table  ("Ed L." <pgsql@bluepolka.net>)
List pgsql-performance
pgsql@bluepolka.net ("Ed L.") wrote:
> A 7.3.4 question...
>
> I want to "expire" some data after 90 days, but not delete too
> much at once so as not to overwhelm a system with precariously
> balanced disk I/O and on a table with millions of rows.  If I
> could say it the way I think for a simple example, it'd be
> like this:
>
>     delete from mytable
>     where posteddatetime < now() - '90 days'
>     limit 100;
>
> Of course, that's not legal 7.3.4 syntax.  These are both too
> slow due to sequential scan of table:
>
>     delete from mytable where key in (
>         select key
>         from mytable
>         where posteddatetime < now() - '90 days'
>         limit 100);
> or
>     delete from mytable where exists (
>         select m.key
>         from mytable m
>         where m.key = mytable.key
>           and m.posteddatetime < now() - '90 days'
>         limit 100);
>
> Tried to use a cursor, but couldn't figure out the syntax
> for select-for-delete yet, or find appropriate example on
> google.  Any clues?

I'm hoping that there's an index on posteddatetime, right?
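If it isn't, something along these lines (the index name is whatever
you like) is the first order of business:

  create index mytable_posteddatetime_idx
      on mytable (posteddatetime);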

There are several approaches that would be quite sensible to consider...

1.  Delete records as often as possible, so that the number deleted at
any given time stays small.
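A minimal sketch of that, assuming the posteddatetime index is in
place and the job runs often enough (every few minutes from cron,
say) that it never falls far behind:

  -- each run only matches the rows that crossed the 90-day boundary
  -- since the previous run, so the delete stays small
  delete from mytable
   where posteddatetime < now() - '90 days'::interval;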

2.  Or find an hour at which the system isn't busy, and blow through a
lot of them then.

3.  Open a cursor querying records in your acceptable range, e.g.

  declare nukem cursor for select key from mytable where posteddatetime <
  now() - '90 days'::interval;

  Fetch 100 entries from the cursor, then, on another connection,
  delete those 100 entries as a single transaction and commit it.

  Sleep a bit, and fetch another 100.

  Note that the cursor will draw groups of 100 entries into memory;
  it's good to immediately delete them, as they'll be in buffers.
  Keeping the number of rows deleted small, and sleeping a bit, means
  you're not trashing buffers too badly.  The query doesn't enforce
  any particular order on things; in effect it chews out old entries in
  any order the query finds them.  If you can't keep up with
  insertions, there could be rather old entries that would linger
  around...

  This parallels the "sleepy vacuum" approach, which uses a similar
  strategy to keep vacuums from destroying performance.
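
  In practice the exchange looks something like this (the keys in the
  delete are, of course, just whatever the preceding fetch returned):

    -- connection 1: hold the cursor open inside a transaction
    begin;
    declare nukem cursor for
      select key from mytable
       where posteddatetime < now() - '90 days'::interval;
    fetch 100 from nukem;

    -- connection 2: delete that batch as one short transaction
    begin;
    delete from mytable where key in (10001, 10002, 10003);  -- the 100 fetched keys
    commit;

    -- sleep a bit, fetch the next 100 on connection 1, and repeat;
    -- when the fetch comes back empty, close nukem and commit there too.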

4.  Rotor tables.

Have "mytable" be a view on a sequence of tables.

create view mytable as
  select * from mytable1
   union all
  select * from mytable2
   union all
  select * from mytable3
   union all
  select * from mytable4
   union all
  select * from mytable5
   union all
  select * from mytable6
   union all
  select * from mytable7
   union all
  select * from mytable8
   union all
  select * from mytable9
   union all
  select * from mytable10

A rule can choose an appropriate table from the ten to _actually_ insert
into.
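
Something along these lines, say, while mytable3 is the current insert
target (the column list here is just illustrative):

  create rule mytable_ins as
    on insert to mytable
    do instead
    insert into mytable3 (key, posteddatetime)
    values (new.key, new.posteddatetime);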

Every ten days or so, you truncate the eldest table and rotate on to
inserting into it; the other nine tables between them still cover the
90-day window.
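
The rotation itself is then roughly this, with mytable4 being the
eldest bucket that is about to become the new current one:

  truncate table mytable4;
  drop rule mytable_ins on mytable;
  create rule mytable_ins as
    on insert to mytable
    do instead
    insert into mytable4 (key, posteddatetime)
    values (new.key, new.posteddatetime);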

Truncation takes mere moments, which saves you all the I/O the deletes
would have cost.

There is another unfortunate problem with this: joins against mytable
get pretty bad, and self-joins effectively turn into a union all of
100 joins.  (Table 1 against tables 1-10, table 2 against tables 1-10,
and so forth...)

For this not to suck rather incredibly requires fairly carefully
structuring queries on the table.  That may or may not be compatible
with your needs...
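
For instance, a query that only cares about recent data can often be
pointed straight at the current bucket rather than at the view:

  -- scans one underlying table instead of all ten
  select count(*)
    from mytable3
   where posteddatetime > now() - '1 day'::interval;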
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/x.html
A Linux  machine!  because  a 486  is a terrible  thing to  waste!
-- <jjs@wintermute.ucr.edu> Joe Sloan
