Thread: Fast way to delete big table?

Fast way to delete big table?

From
Haiming Zhang
Date:

Hi All,

 

I have a big table (200 GB, 728 million records) that slows down lots of things. It's time to clean the data up. The requirement is that when I delete, I must summarise a few columns into a new table for backup purposes. That means I need to go through each row and add the columns' values to the summary table (under the corresponding category).

 

The table had indexes (multicolumn indexes) before; I removed them because I read on a forum that indexes heavily reduce the speed of deleting. That's true if I only delete from the table, but in my case I first SELECT the row, then add a few values to the summary table, and then DELETE the row by its multi-column primary key. This now seems to have been a bad idea, as it takes a very long time to DELETE each row (the delete needs to find the row first).

 

Here are the two versions of the delete function; please help me point out how I can speed them up.

1.

CREATE OR REPLACE FUNCTION summary_delete_table()
RETURNS integer AS
$BODY$
DECLARE
    rec    RECORD;
    subrec RECORD;
BEGIN
    FOR rec IN SELECT * FROM tableA LIMIT 100 LOOP
        BEGIN
            -- accumulate this row's counts into the summary row for its category
            UPDATE summaryTable SET count1 = count1 + rec.count1, count2 = ... WHERE category match;

            DELETE FROM tableA
            WHERE tableA.primaryKeyA = rec.primaryKeyA
              AND tableA.primaryKeyB = rec.primaryKeyB;
        END;
    END LOOP;

    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

 

And then I have a .bat script that loops the above function a million times.

2.

 

CREATE OR REPLACE FUNCTION summary_delete_table()
RETURNS integer AS
$BODY$
DECLARE
    rec       RECORD;
    td_cursor CURSOR FOR SELECT * FROM tableA LIMIT 100;
BEGIN
    FOR rec IN td_cursor LOOP
        BEGIN
            UPDATE summaryTable SET count1 = count1 + rec.count1, count2 = ... WHERE category match;

            -- delete the row the cursor is currently on
            DELETE FROM tableA WHERE CURRENT OF td_cursor;
        END;
    END LOOP;

    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

 

 

Method 2 is a bit faster, but not by much: the delete speed is 2478 rows/s for method 2 versus 2008 rows/s for method 1.

Any suggestions are welcome.

BTW, I guess a reindex may take a few days to finish.

Also, I tried deleting 100, 1000, and 2000 rows at a time; the results showed that 1000 rows at a time is faster than either 100 or 2000.

Thanks and Regards,

Haiming

 

 

Haiming Zhang | Engineer | Redflex Group
T +61 3 9674 1868 | F +61 3 9699 3566 | E Haiming.Zhang@redflex.com.au
31 Market Street, (PO Box 720), South Melbourne VIC 3205, Australia


Re: Fast way to delete big table?

From
Gavin Flower
Date:
On 16/05/16 17:09, Haiming Zhang wrote:
> [...]

In one transaction:

 1. populate the summary table
 2. drop indexes on tableA
 3. delete selected entries in tableA
 4. recreate indexes for tableA

If deleting all entries, then simply truncate it!
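
Something like the following, schematically (the summary columns and the index definition here are my own placeholders, not taken from your post):

BEGIN;

-- 1. populate the summary table
INSERT INTO summaryTable (category, count1, count2)
SELECT category, sum(count1), sum(count2)
FROM tableA
GROUP BY category;

-- 2. drop indexes on tableA
DROP INDEX tablea_category_idx;

-- 3. delete selected entries in tableA (or all of them)
DELETE FROM tableA;

-- 4. recreate indexes for tableA
CREATE INDEX tablea_category_idx ON tableA (category);

COMMIT;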

N.B. I have NOT checked the fine print in the documentation, nor tested
this - so be warned!  :-)


Cheers,
Gavin




Re: Fast way to delete big table?

From
John R Pierce
Date:
On 5/15/2016 10:23 PM, Gavin Flower wrote:
> 4. recreate indexes for tableA

Note that on a large table, this step can take a LONG time. It's greatly facilitated by setting maintenance_work_mem = 1GB beforehand.
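
For example, in the session that rebuilds the indexes (the value is the one suggested above; the setting only lasts for the session):

SET maintenance_work_mem = '1GB';
-- then run the CREATE INDEX / REINDEX statements in this same session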





-- 
john r pierce, recycling bits in santa cruz

Re: Fast way to delete big table?

From
Haiming Zhang
Date:
Hi Gavin,

Thanks for the suggestion. What you said is what I tried to do (except the final reindexing, which I planned to do once the table is cleaned up); however, it was too slow. It ran for two days and nothing much happened.

Truncate does not work for my purpose.

Regards,
Haiming


 


Re: Fast way to delete big table?

From
Haiming Zhang
Date:

Hi John,

 

Thanks. Agreed, the reindexing would take forever to finish. Do you think increasing temp_buffers would help too?

 

Regards,

Haiming

 

 


Re: Fast way to delete big table?

From
Charles Clavadetscher
Date:
Hello

I am not sure if this is an option, but would partial indexes be of help? You could create your indexes in such a way that they exclude the rows that are not needed for your current queries, i.e. those that you would summarize in a separate table and delete, using the same condition that you use for selecting which rows must be deleted.

CREATE INDEX ... ON tablea (...) WHERE ...;

This would not make deletion quicker, but other operations should see a gain.
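
For instance, if the rows you still query were all newer than some cut-off (the created_at column here is purely hypothetical):

CREATE INDEX tablea_current_idx ON tablea (primarykeya, primarykeyb)
WHERE created_at >= DATE '2016-01-01';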

What I also see is that you don't have any condition in the SELECT that drives the loop, i.e. you actually go through all rows and decide on the fly which ones to delete. Maybe reducing the number of rows to be processed by the function would help.

Finally, I don't know the structure of the table, but you could select only those columns that you need for your summary instead of *. If you have many columns, this could have an impact on the performance of the SELECT statement.
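
That is, the query driving the loop would name only the needed columns, e.g. (the column list is assumed from your function):

SELECT primaryKeyA, primaryKeyB, category, count1, count2
FROM tableA
LIMIT 100;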

Regards
Charles

On 05/16/2016 07:09 AM, Haiming Zhang wrote:
> [...]

--
Swiss PostgreSQL Users Group
c/o Charles Clavadetscher
Motorenstrasse 18
CH - 8005 Zürich

http://www.swisspug.org


Re: Fast way to delete big table?

From
Adrian Klaver
Date:
On 05/15/2016 10:33 PM, Haiming Zhang wrote:
> Hi Gavin,
>
> Thanks for the suggestion. What you said is what I tried to do (except the
> final reindexing, which I planned to do once the table is cleaned up);
> however, it was too slow. It ran for two days and nothing much happened.

From your original post I gathered that you did not do what Gavin suggested, in that you dropped the indexes first and combined the UPDATE of the summary and the DELETE of the row into one operation. So I am a little confused about what you are asking. Are you looking for suggestions on what to do in the future, on how to make the existing situation (no indexes on the big table) work better, or both?

>
> Truncate does not work for my purpose.
>
> [...]


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Fast way to delete big table?

From
Jeff Janes
Date:
On Sun, May 15, 2016 at 10:09 PM, Haiming Zhang <Haiming.Zhang@redflex.com.au> wrote:

> I have a big table (200 GB, 728 million records) that slows down lots of
> things. It's time to clean the data up. The requirement is that when I
> delete, I must summarise a few columns into a new table for backup
> purposes. That means I need to go through each row and add the columns'
> values to the summary table (under the corresponding category).


Is the performance slow-down an emergency? If not, then delete and summarize the rows in bite-sized chunks, over weeks or months. It took a long time to get into this situation; spend some time calmly getting out of it.
 

 

> The table had indexes (multicolumn indexes) before; I removed them because
> I read on a forum that indexes heavily reduce the speed of deleting.


That generally isn't true in PostgreSQL.  Deletes do not directly perform any index maintenance (triggers might cause them to do some indirectly).  Doing the deletes does build up "vacuum debt" which must eventually be paid, and more indexes does make that more expensive.  But that happens in the background.
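
You can watch that debt in the standard statistics view, for example:

SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'tablea';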

 

> That's true if I only delete from the table, but in my case I first SELECT
> the row, then add a few values to the summary table, and then DELETE the
> row by its multi-column primary key. This now seems to have been a bad
> idea, as it takes a very long time to DELETE each row (the delete needs to
> find the row first).


This doesn't match your example, as in your examples there are no WHERE clauses on the main table select, so no index could be used to find the rows.

Your examples also have no selection criteria or stopping criteria (other than when your outer driving script stops calling the function).  That means you might be deleting any arbitrary rows in the master table (not just the oldest ones) and might proceed with these deletions until the table is empty.  Given that, why not just summarize the entire table and then truncate the whole thing?

> [...]


You could try to rewrite this into a single SQL query, using a CTE with "with t as (delete .... returning * )".  The hardest part is that PostgreSQL alas doesn't allow LIMIT on DELETE statements, so you have to go through some ugly contortions with a sub-select, or a where clause on your delete using, for example, a date to do the limiting, and have your script increment the date on each invocation.  I've not done this with summarization, but have done it with partitioning where I had to move tuples from the parent table to their new partition (or the reverse, moving from an ill-conceived partition back to the parent) with a minimum of disruption.
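
A minimal sketch of that shape, using the column names from the functions above and the ugly sub-select contortion in place of a LIMIT (only count1 is shown; the other counts would follow the same pattern):

WITH t AS (
    DELETE FROM tableA
    WHERE (primaryKeyA, primaryKeyB) IN (
        SELECT primaryKeyA, primaryKeyB FROM tableA LIMIT 1000
    )
    RETURNING category, count1
)
UPDATE summaryTable s
SET count1 = s.count1 + d.sum1
FROM (SELECT category, sum(count1) AS sum1 FROM t GROUP BY category) d
WHERE s.category = d.category;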

It should be faster, but I don't know how much.  It would depend a lot on whether your bottleneck is CPU or IO, amongst other things.
 

 

 

> Method 2 is a bit faster, but not by much: the delete speed is 2478 rows/s
> for method 2 versus 2008 rows/s for method 1.
>
> Any suggestions are welcome.


Why is this not fast enough?  So, it would take 4 days.  How fast do you need it to be?  

But there is the possibility that it will slow down over time, as each execution might have to scan over the debris of all of the previously deleted rows before it gets to the ones you want to work on.

Cheers,

Jeff

Re: Fast way to delete big table?

From
hmzha2
Date:
Hi Charles,

Thanks.

The reason I don't have a condition when selecting is that it's faster than having one. My aim is to go through every row, take the values of a few columns and append them to a summary table, then delete the row from the table. So finding the rows on the fly is fine for me.

I have tried reducing the number of rows to be processed; even when I process one row on the production machine, it takes 24 hours to finish.

Regards,
Haiming





Re: Fast way to delete big table?

From
hmzha2
Date:
Hi Jeff,

Thank you very much.

> Your examples also have no selection criteria or stopping criteria (other
> than when your outer driving script stops calling the function). That
> means you might be deleting any arbitrary rows in the master table (not
> just the oldest ones) and might proceed with these deletions until the
> table is empty. Given that, why not just summarize the entire table and
> then truncate the whole thing?

Yeah, that's my aim: go through every row and delete the entire table. The problem with summarizing first and then truncating is that I can't guarantee the database and system keep running during the summarizing, as the customer sometimes stops the application and Postgres manually. Plus, new data goes into the table, so I need to process it chunk by chunk until it finishes.

> You could try to rewrite this into a single SQL query, using a CTE with
> "with t as (delete .... returning * )". The hardest part is that
> PostgreSQL alas doesn't allow LIMIT on DELETE statements, so you have to
> go through some ugly contortions with a sub-select, or a where clause on
> your delete using, for example, a date to do the limiting, and have your
> script increment the date on each invocation.

Worth a try, e.g. delete the oldest 100 rows at a time. But won't the sorting take more time than just selecting on the fly?

> Why is this not fast enough? So, it would take 4 days. How fast do you
> need it to be?

It works OK in my testing environment but not on the production machine:

SELECT * FROM tableA LIMIT 1; takes milliseconds to finish.
Updating the summary table from the previous select result takes milliseconds.
DELETE FROM tableA WHERE primaryKey = ...; takes 2.9 hours to finish.

Regards,
Haiming


Re: Fast way to delete big table?

From
"Charles Clavadetscher"
Date:
Hello

> The reason I don't have a condition when selecting is that it's faster
> than having one. My aim is to go through every row, take the values of a
> few columns and append them to a summary table, then delete the row from
> the table. So finding the rows on the fly is fine for me.
>
> I have tried reducing the number of rows to be processed; even when I
> process one row on the production machine, it takes 24 hours to finish.

This is quite weird... Is there any other process locking the record or the table?
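
One quick way to check, using the standard pg_locks and pg_stat_activity views (on 9.2 or later):

SELECT l.pid, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;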
Bye
Charles




Re: Fast way to delete big table?

From
Jeff Janes
Date:
On Mon, May 16, 2016 at 10:22 PM, hmzha2 <Haiming.Zhang@redflex.com.au> wrote:


> It works OK in my testing environment but not on the production machine:
> SELECT * FROM tableA LIMIT 1; takes milliseconds to finish. Updating the
> summary table from the previous select result takes milliseconds.
> DELETE FROM tableA WHERE primaryKey = ...; takes 2.9 hours to finish.

Perhaps you have a foreign key constraint pointing to your tableA from another table, and the other table is unindexed on those columns (in production). So every delete from tableA leads to a full scan of that other table to make sure no offending rows exist.
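
If that is the case, an index on the referencing columns of the other table should fix it (all names here are hypothetical):

CREATE INDEX other_table_fk_idx
ON other_table (ref_primarykeya, ref_primarykeyb);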

If that is not the case, then I'd try to do a "strace -ttt -T -y -p
<PID>" on the process which is taking so long, and see what it is
doing.  Unfortunately, you have to do that on production, as that is
the only system where the problem appears.

Cheers,

Jeff


Re: Fast way to delete big table?

From
hmzha2
Date:
Charles Clavadetscher wrote
> Hello
>
> This is quite weird... It there any other process locking the record or
> the table?
> Bye
> Charles

Yes, weird. The table is being used by another process (which keeps inserting rows into it) at the same time, but nothing should lock the rows, as we don't touch rows after they've been inserted.







Re: Fast way to delete big table?

From
hmzha2
Date:
Hi Jeff and all,

Thank you for your help. Just to update on my progress: I finally made it work in production by using the second solution. The extra thing I did was restart Postgres, and the script then started deleting 2558 records per second. I guess locked records or something else must have been blocking the delete, and the restart removed the obstacle.

Thanks again for the help.

Regards,
Haiming


