Re: Best way to delete big amount of records from big table - Mailing list pgsql-performance

From Laurenz Albe
Subject Re: Best way to delete big amount of records from big table
Date
Msg-id 5e44d4e8ae73193efa82fb2744dbd95cc5130669.camel@cybertec.at
Whole thread Raw
In response to Best way to delete big amount of records from big table  (Ekaterina Amez <ekaterina.amez@zunibal.com>)
Responses Re: Best way to delete big amount of records from big table  (Ekaterina Amez <ekaterina.amez@zunibal.com>)
List pgsql-performance
On Fri, 2020-03-27 at 15:13 +0100, Ekaterina Amez wrote:
> I'm trying to clean up a database with millions of records of 
> useless-but-don't-remove-just-in-case data. [...]
> 
> But also I'm cleaning tables with 150million records where I'm going to 
> remove 60% of existing data and after a few tests I'm not sure what's 
> the best approach as all seem to take similar time to run. These tables 
> are grouped in 4 tables group with master, detail, master_history, 
> detail_history structure. None of the tables have primary key nor 
> foreign key or any constraint but the sequence used for what should be 
> the PK column, though this column is not defined as PK.

You should define primary and foreign keys if you can, but I guess
I don't have to tell you that.

> I've decided to delete from the last one in chunks (10 days of data per 
> chunk but it coud be any other quantity) so I've created a function.  
> I've tested it with indexes (in master_hist for filtering data and in 
> detail_hist for the fk and pk), without indexes, after analyzing table, 
> and no matter what I always end up with more or less the same execution 
> time. I can afford the time it's getting to run but I'd like to know if 
> it's there a better way to do this.

There is no need to delete in batches unless you have a need to keep
transactions short (danger of deadlock because the data are still
modified, or you cannot afford to block autovacuum that long).

If you can drop the indexes while you do it (downtime), go for it.
Perhaps there is a way to use partial indexes that exclude all the
data that you have to delete, then work could go on as normal.

> I'm testing on version 9.2 BUT 
> production server is 8.4 (legacy application, supposed to be in at least 
> 9.2 but recently discovered it was 8.4, planning upgrade but not now). 
> Config parameters are default ones.

Now that is a seriously bad idea.  You should test on the same version
as you have running in production.  And you should insist in an upgrade.
People who insist in running ancient software often insist in ancient
hardware as well, and both is a good way to get data corruption.
If the system blows up, they are going to blame you.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




pgsql-performance by date:

Previous
From: Michael Lewis
Date:
Subject: Re: Best way to delete big amount of records from big table
Next
From: Ekaterina Amez
Date:
Subject: Re: Best way to delete big amount of records from big table