Re: automated row deletion - Mailing list pgsql-general

From Dave Huber
Subject Re: automated row deletion
Date
Msg-id 7CDADB576E07AC4FA71E1B12566C9126540E0A0C35@lti-mb-1.LTI.com
In response to Re: automated row deletion  (John R Pierce <pierce@hogranch.com>)
List pgsql-general
John, I got your previous post, but I think I misunderstood something. You didn't mean a disk partition. I think I get
what you're describing now. I had previously missed the link in your earlier post, too. Please accept my apologies for
not being more diligent in my reading. I'll look into this partitioned table bit.

Thanks,
Dave

-----Original Message-----
From: John R Pierce [mailto:pierce@hogranch.com]
Sent: Wednesday, October 07, 2009 12:01 PM
To: Dave Huber
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] automated row deletion

Dave Huber wrote:
>
> A colleague gave me the following query to run:
>
>
>
> DELETE FROM data_log_20msec_table WHERE (log_id IN (SELECT log_id FROM
> data_log_20msec_table ORDER BY log_id DESC OFFSET 10000000))
>
> ...
>
> This query keeps the most recent 10 million rows and deletes the
> remaining ones. If I call this once a minute, it would be deleting
> 3000 rows each time. Is there a way to optimize this statement?
> Postgres was setup with default configuration. Is there anything we
> can change in the configuration to make this run more efficiently? The
> table is defined as below:
>
> ...
>
> Is there anything we can do here that can optimize the deletion of rows?
>
>
>

as I previously wrote...


I think you'll find row deletes would kill your performance.   For time
aged data like that, we use partitioned tables; we typically do it by
the week (keeping 6 months of history), but you might end up doing it by
N*1000 PK values or some such, so you can use your PK to determine the
partition.   With a partitioning scheme, it's much faster to add a new
partition and drop the oldest at whatever interval you need.   See
http://www.postgresql.org/docs/current/static/ddl-partitioning.html
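
A minimal sketch of the inheritance-based scheme from that page (the child
table names here are made up, and I've trimmed the column list down to the
log_id key from your query):

    -- Parent table; it holds no rows itself, the children carry the data.
    CREATE TABLE data_log_20msec_table (
        log_id bigint NOT NULL
        -- ... remaining columns from your actual definition ...
    );

    -- One child per block of 100000 log_id values; the named CHECK
    -- constraint lets constraint_exclusion skip partitions a query
    -- can't possibly match.
    CREATE TABLE data_log_p0 (
        CONSTRAINT data_log_p0_range
            CHECK (log_id >= 0 AND log_id < 100000)
    ) INHERITS (data_log_20msec_table);

    CREATE TABLE data_log_p1 (
        CONSTRAINT data_log_p1_range
            CHECK (log_id >= 100000 AND log_id < 200000)
    ) INHERITS (data_log_20msec_table);

    -- Children don't inherit indexes, so each needs its own.
    CREATE INDEX data_log_p0_log_id_idx ON data_log_p0 (log_id);
    CREATE INDEX data_log_p1_log_id_idx ON data_log_p1 (log_id);

You'd also need to route inserts to the current child, either with an insert
trigger on the parent or by having the application insert into the child
directly; the doc page covers both.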



based on the numbers you give above, I think I'd do it by 100000 log_id
values, so you'd end up with 101 partition tables, and every half hour
or so you'd truncate the oldest partition and start a new one (reusing
the previously oldest in round-robin fashion).   TRUNCATE is thousands of
times faster than DELETE.
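
The half-hourly maintenance step then becomes a truncate plus a constraint
swap (again a sketch, reusing the names from above and assuming the next
block starts at log_id 10100000):

    BEGIN;
    -- Empty the oldest partition; TRUNCATE reclaims the space
    -- immediately instead of leaving dead rows for VACUUM the way
    -- DELETE does.
    TRUNCATE data_log_p0;

    -- Re-point its CHECK constraint at the next log_id range so the
    -- same table can be reused for incoming rows.
    ALTER TABLE data_log_p0 DROP CONSTRAINT data_log_p0_range;
    ALTER TABLE data_log_p0 ADD CONSTRAINT data_log_p0_range
        CHECK (log_id >= 10100000 AND log_id < 10200000);
    COMMIT;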
