Re: Deleting Rows From Large Tables - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Deleting Rows From Large Tables
Date
Msg-id CAMkU=1y1+nTdK0QBgxhV4ZQAXn8Gqe2XgdYESbt9nSfXC2qO1g@mail.gmail.com
Whole thread Raw
In response to Deleting Rows From Large Tables  (Rob Emery <re-pgsql@codeweavers.net>)
List pgsql-performance
On Fri, May 17, 2013 at 4:26 AM, Rob Emery <re-pgsql@codeweavers.net> wrote:
Hi All,

We've got 3 quite large tables that due to an unexpected surge in
usage (!) have grown to about 10GB each, with 72, 32 and 31 million
rows in. I've been tasked with cleaning out about half of them, the
problem I've got is that even deleting the first 1,000,000 rows seems
to take an unreasonable amount of time. Unfortunately this is on quite
an old server (Dell 2950 with a RAID-10 over 6 disks) running Postgres
8.4; which serves other things like our logging systems.

How many Cores do you have?  I think the Dell 2950 could have anywhere from 1 to 8.

Pick a smaller number of rows to delete, and run it with "explain analyze" to see what it is going on.  I would say to use "explain (analyze, buffers)" with track_io_timing on, but those don't exist back in 8.4.

Perhaps this would be a good excuse to upgrade!

If I run a sustained (more than about 5 minutes) delete it'll have a
detrimental effect on the other services.


Do you know why?  Can you identify the affected queries from those other services and run explain analyze on them?


 
I'm trying to batch up the
deletes into small chunks of approximately 1 month of data ; even this
seems to take too long, I originally reduced this down to a single
day's data and had the same problem. I can keep decreasing the size of
the window I'm deleting but I feel I must be doing something either
fundamentally wrong or over-complicating this enormously.


If your server is sized only to do its typical workload, then any substantial extra work load is going to cause problems.  Trying to delete 1 day's work in a few seconds stills seems like it is very likely excessive.  Why not jump all the way down to 5 minutes, or limit it to a certain number of rows from table a, say 100 per unit?  If you start large and work your way down, you will often be working in the dark because you won't have the patience to let the large ones run to completion, slowing down the whole system.  If you start at the bottom and work up, you will always know where you are as the previous one ran to completion and you have the timings from it.

How fast do you need to clean this up?  If it took months to get into the situation, can't you take a few weeks to get out of it?

 
I've
switched over to retrieving a list of IDs to delete, storing them in
temporary tables and deleting based on the primary keys on each of the
tables with something similar to this:

BEGIN TRANSACTION;

CREATE TEMPORARY TABLE table_a_ids_to_delete (id INT);
CREATE TEMPORARY TABLE table_b_ids_to_delete (id INT);

INSERT INTO table_a_ids_to_delete
    SELECT id FROM table_a WHERE purchased ='-infinity' AND created_at
< '2007-01-01T00:00:00';

I'd probably add a "LIMIT 100" in there.  Then you can set created_at to the final time point desired, rather than trying to increment it each time and deciding how much to increment.
 

INSERT INTO table_b_ids_to_delete
    SELECT table_b_id FROM table_a_table_b_xref
    INNER JOIN table_a_ids_to_delete ON (table_a_ids_to_delete.id =
table_a_table_b.quote_id);

Do these to queries slow down other operations?  Or is it just the deletes?
 

DELETE FROM table_a_table_b_xref USING table_a_ids_to_delete
    WHERE table_a_table_b_xref.table_a_id = table_a_ids_to_delete.id;

DELETE FROM table_b USING table_b_ids_to_delete
    WHERE table_b.id = table_b_ids_to_delete.id;

DELETE FROM table_a USING table_a_ids_to_delete
    WHERE table_a.id =  table_a_ids_to_delete.id;

COMMIT;

How much time to do the 3 deletes take relative to each other and to the inserts?

Cheers,

Jef

pgsql-performance by date:

Previous
From: jonranes
Date:
Subject: Re: PostgreSQL 9.2.3 performance problem caused Exclusive locks
Next
From: Greg Spiegelberg
Date:
Subject: Re: Deleting Rows From Large Tables