Thread: Batch process

Batch process

From
Rama Krishnan
Date:
Hi All, 

I am doing purge activity my sales table contains 5M records I am going to delete more than 1 year data (which was 3M) records so it was running more so I want to do batch wise deletion through plsql




 created or replace function data_purge() returns void as$$
Declare
Begin
Drop table test_old;
Create table test_old as select * from sales where bill_date<now() -interval '1 year';

Delete table sales where sales_id in (select sales_id from test_old;

End;
$$ language plpgsql;


Kindly guide me

Re: Batch process

From
Ron
Date:
On 7/20/22 00:08, Rama Krishnan wrote:
> Hi All,
>
> I am doing purge activity my sales table contains 5M records I am going to 
> delete more than 1 year data (which was 3M) records so it was running more 
> so I want to do batch wise deletion through plsql
>
>
>
>
>  created or replace function data_purge() returns void as$$
> Declare
> Begin
> Drop table test_old;
> Create table test_old as select * from sales where bill_date<now() 
> -interval '1 year';
>
> Delete table sales where sales_id in (select sales_id from test_old;
>
> End;
> $$ language plpgsql;
>
>
> Kindly guide me

Why not just DELETE FROM sales WHERE bill_date<NOW() -INTERVAL '1 year';?

-- 
Angular momentum makes the world go 'round.



Re: Batch process

From
Adrian Klaver
Date:
On 7/20/22 01:28, Ron wrote:
> On 7/20/22 00:08, Rama Krishnan wrote:
>> Hi All,
>>
>> I am doing purge activity my sales table contains 5M records I am 
>> going to delete more than 1 year data (which was 3M) records so it was 
>> running more so I want to do batch wise deletion through plsql
>>
>>
>>
>>
>>  created or replace function data_purge() returns void as$$
>> Declare
>> Begin
>> Drop table test_old;
>> Create table test_old as select * from sales where bill_date<now() 
>> -interval '1 year';
>>
>> Delete table sales where sales_id in (select sales_id from test_old;
>>
>> End;
>> $$ language plpgsql;
>>
>>
>> Kindly guide me
> 
> Why not just DELETE FROM sales WHERE bill_date<NOW() -INTERVAL '1 year';?
> 

Because it looks like the OP is saving the old records in test_old.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Batch process

From
Adrian Klaver
Date:
On 7/19/22 22:08, Rama Krishnan wrote:
> Hi All,
> 
> I am doing purge activity my sales table contains 5M records I am going 
> to delete more than 1 year data (which was 3M) records so it was running 
> more so I want to do batch wise deletion through plsql
> 
> 
> 
> 
>   created or replace function data_purge() returns void as$$
> Declare
> Begin
> Drop table test_old;
> Create table test_old as select * from sales where bill_date<now() 
> -interval '1 year';
> 
> Delete table sales where sales_id in (select sales_id from test_old;
> 
> End;
> $$ language plpgsql;
> 
> 
> Kindly guide me
> 

This looks like a case, going forward for partitioning:

https://www.postgresql.org/docs/14/ddl-partitioning.html


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Batch process

From
Rama Krishnan
Date:
Hi Adrian, 

Thanks for the update. 


Is it possible to achieve 10000 records deletion on every iteration because my original table contains 5M records during the deletion process it consumes more cpu and resources. 

On Wed, 20 Jul, 2022, 20:37 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:
On 7/19/22 22:08, Rama Krishnan wrote:
> Hi All,
>
> I am doing purge activity my sales table contains 5M records I am going
> to delete more than 1 year data (which was 3M) records so it was running
> more so I want to do batch wise deletion through plsql
>
>
>
>
>   created or replace function data_purge() returns void as$$
> Declare
> Begin
> Drop table test_old;
> Create table test_old as select * from sales where bill_date<now()
> -interval '1 year';
>
> Delete table sales where sales_id in (select sales_id from test_old;
>
> End;
> $$ language plpgsql;
>
>
> Kindly guide me
>

This looks like a case, going forward for partitioning:

https://www.postgresql.org/docs/14/ddl-partitioning.html


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Batch process

From
Rob Sargent
Date:
On 7/20/22 09:26, Rama Krishnan wrote:
Hi Adrian, 

Thanks for the update. 


Is it possible to achieve 10000 records deletion on every iteration because my original table contains 5M records during the deletion process it consumes more cpu and resources.


See here for date manipulation functions.  you can delete by month or week, which ever makes it fit your hardware

Re: Batch process

From
Adrian Klaver
Date:
On 7/20/22 9:38 AM, Rama Krishnan wrote:

Reply to list also
Ccing list.
> Hi ALL,
> 
> I have created the batch wise query but the variable is not working in 
> the delete.
> 
> 
> create or replace function sports_sale() returns void as $$
> declare
>     counter integer := 0;
>     row_count integer :=0;
>     start integer :=1;
> 
> begin
> 
>     SELECT sale_start_count INTO start FROM sale_delete_counter ORDER BY 
> ID DESC LIMIT 1;
>     SELECT sale_loop_counter INTO counter FROM sale_delete_counter ORDER 
> BY ID DESC LIMIT 1;
>     SELECT count(*) INTO row_count FROM sports_sale_archive;
>     SELECT ceil(row_count/10000) INTO row_count;
>       while counter < row_count loop
>         raise notice 'Counter %', counter;
>         delete from sports_sale where id in (select id from 
> sports_sale_archive WHERE id between start and start+9999);
>         counter := counter + 1;
>         start :=start+10000;
>         INSERT INTO 
> sale_delete_counter(sale_start_count,sale_loop_counter) 
> VALUES(start,counter);
>         exit when counter>5;
>       end loop;
> end;$$ language plpgsql;

1) I thought this was a date based deletion?

2) How do you know that the ids in "id between start and start+9999" 
actually exist?

3) Could this not be simplified to something like?:

create or replace function sports_sale() returns void as $$
declare
    counter integer;

begin
     select count(*) into counter from sports_sale_archive where 
<date/id> between <start> and <end>;

      while counter > 0 loop
        raise notice 'Counter %', counter;
        delete from sports_sale where id in (select id from 
sports_sale_archive WHERE id between <start> and <end> order by 
<date/id> limit 10000);
        counter := counter - 10000;
      end loop;
end;$$ language plpgsql;


Not tested and should be taken as starting point as it is not entirely 
clear to me what you are trying to achieve.
> 
> 
> Here I have created the archive table based on created_date with more 
> one year data. i want to execute this delete query using batch wise(each 
> iteration 10K totally 50K records deletion per execution ). The issue 
> was that the variable was not working in deletion subquery.
> Regards
> 
> A.Rama Krishnan
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Batch process

From
Ron
Date:
But the first thing he does is drop test_old.

On 7/20/22 09:52, Adrian Klaver wrote:
> On 7/20/22 01:28, Ron wrote:
>> On 7/20/22 00:08, Rama Krishnan wrote:
>>> Hi All,
>>>
>>> I am doing purge activity my sales table contains 5M records I am going 
>>> to delete more than 1 year data (which was 3M) records so it was running 
>>> more so I want to do batch wise deletion through plsql
>>>
>>>
>>>
>>>
>>>  created or replace function data_purge() returns void as$$
>>> Declare
>>> Begin
>>> Drop table test_old;
>>> Create table test_old as select * from sales where bill_date<now() 
>>> -interval '1 year';
>>>
>>> Delete table sales where sales_id in (select sales_id from test_old;
>>>
>>> End;
>>> $$ language plpgsql;
>>>
>>>
>>> Kindly guide me
>>
>> Why not just DELETE FROM sales WHERE bill_date<NOW() -INTERVAL '1 year';?
>>
>
> Because it looks like the OP is saving the old records in test_old.
>
>

-- 
Angular momentum makes the world go 'round.



Re: Batch process

From
Adrian Klaver
Date:
On 7/20/22 11:16 AM, Ron wrote:
> But the first thing he does is drop test_old.

Then:

Create table test_old as select * from sales where bill_date<now() 
-interval '1 year';

At that point you could do either:

Delete from sales where sales_id in (select sales_id from test_old);

or

DELETE FROM sales WHERE bill_date<NOW() -INTERVAL '1 year';


> 
> On 7/20/22 09:52, Adrian Klaver wrote:
>> On 7/20/22 01:28, Ron wrote:
>>> On 7/20/22 00:08, Rama Krishnan wrote:
>>>> Hi All,
>>>>
>>>> I am doing purge activity my sales table contains 5M records I am 
>>>> going to delete more than 1 year data (which was 3M) records so it 
>>>> was running more so I want to do batch wise deletion through plsql
>>>>
>>>>
>>>>
>>>>
>>>>  created or replace function data_purge() returns void as$$
>>>> Declare
>>>> Begin
>>>> Drop table test_old;
>>>> Create table test_old as select * from sales where bill_date<now() 
>>>> -interval '1 year';
>>>>
>>>> Delete table sales where sales_id in (select sales_id from test_old;
>>>>
>>>> End;
>>>> $$ language plpgsql;
>>>>
>>>>
>>>> Kindly guide me
>>>
>>> Why not just DELETE FROM sales WHERE bill_date<NOW() -INTERVAL '1 
>>> year';?
>>>
>>
>> Because it looks like the OP is saving the old records in test_old.
>>
>>
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Batch process

From
Ron
Date:
How will DELETE WHERE sales_id IN (...); , given that test_old has no index?

Of course, we don't know if there's an index on sales.bill_date, since OP's 
only response has been another "tell me how to do it".

On 7/20/22 13:32, Adrian Klaver wrote:
> On 7/20/22 11:16 AM, Ron wrote:
>> But the first thing he does is drop test_old.
>
> Then:
>
> Create table test_old as select * from sales where bill_date<now() 
> -interval '1 year';
>
> At that point you could do either:
>
> Delete from sales where sales_id in (select sales_id from test_old);
>
> or
>
> DELETE FROM sales WHERE bill_date<NOW() -INTERVAL '1 year';
>
>
>>
>> On 7/20/22 09:52, Adrian Klaver wrote:
>>> On 7/20/22 01:28, Ron wrote:
>>>> On 7/20/22 00:08, Rama Krishnan wrote:
>>>>> Hi All,
>>>>>
>>>>> I am doing purge activity my sales table contains 5M records I am 
>>>>> going to delete more than 1 year data (which was 3M) records so it was 
>>>>> running more so I want to do batch wise deletion through plsql
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  created or replace function data_purge() returns void as$$
>>>>> Declare
>>>>> Begin
>>>>> Drop table test_old;
>>>>> Create table test_old as select * from sales where bill_date<now() 
>>>>> -interval '1 year';
>>>>>
>>>>> Delete table sales where sales_id in (select sales_id from test_old;
>>>>>
>>>>> End;
>>>>> $$ language plpgsql;
>>>>>
>>>>>
>>>>> Kindly guide me
>>>>
>>>> Why not just DELETE FROM sales WHERE bill_date<NOW() -INTERVAL '1 year';?
>>>>
>>>
>>> Because it looks like the OP is saving the old records in test_old.
>>>
>>>
>>
>
>

-- 
Angular momentum makes the world go 'round.



Re: Batch process

From
Adrian Klaver
Date:
On 7/20/22 11:56, Ron wrote:
> How will DELETE WHERE sales_id IN (...); , given that test_old has no 
> index?
> 
> Of course, we don't know if there's an index on sales.bill_date, since 
> OP's only response has been another "tell me how to do it".
> 

This awaits more information. At this point it is about the journey not 
the destination:)


-- 
Adrian Klaver
adrian.klaver@aklaver.com