Re: “Loop” in plpgsql Function - PostgreSQL 9.2 - Mailing list pgsql-general

From mariusz
Subject Re: “Loop” in plpgsql Function - PostgreSQL 9.2
Date
Msg-id 1453196385.20194.75.camel@marlap
Whole thread Raw
In response to “Loop” in plpgsql Function - PostgreSQL 9.2  ("drum.lucas@gmail.com" <drum.lucas@gmail.com>)
List pgsql-general
hello,


On Tue, 2016-01-19 at 10:37 +1300, drum.lucas@gmail.com wrote:
> I've created a function that allows me to do an huge update.
>
> But I need to limit this function. I need to do 50k rows (example) and
> then stop it. After that I need to continue from the rows that I've
> stopped... I'll have to modify the call function select
> batch_number() as well.
>
> How can I do that? Using for?
>
>         The function below is already working, but, th
>         e table has 40m rows. And it's taking years.
>
>
> FUNCTION:
>
> CREATE or REPLACE FUNCTION batch_number()
> RETURNS INTEGER AS $$
>
> declare
>    batch_num integer;
>    offset_num integer;
> begin
>     offset_num = 0;
>     batch_num = 1;
>
>     while (select true from gorfs.nfs_data where batch_number is null limit 1) loop
>         with ids(id) as
>         (
>             select
>                 file_id
>             from
>                 gorfs.nfs_data
>             order by
>                 file_id
>             offset offset_num
>             limit 1000
>         )
>         update
>             gorfs.nfs_data
>         set
>             batch_number = batch_num
>         from ids
>         where file_id = ids.id;
>
>         offset_num = offset_num + 1000;
>         batch_num = batch_num + 1;
>     end loop;
>
>     return batch_num;
> end
>
> $$ language 'plpgsql';

do i read correctly you want to do something like that:

WITH b(n,i) AS
  (SELECT row_number() OVER (ORDER BY file_id), file_id
     FROM gorfs.nfs_data)
UPDATE gorfs.nfs_data SET batch_number = ((b.n-1)/1000)+1
  FROM b WHERE file_id = b.i

apply math to match your desired result, this one based on your fun.

which should not scan that many times for batch_number being null. i
don't know implementation details and i do not know if you have any
partial index where batch_number is null which could be used to speed
this up.

of cource this is for one run to change all at once.
but if you really want multiple tx on this how about that:

CREATE INDEX ON gorfs.nfs_data (file_id) WHERE batch_number IS NULL;

find next batchnum to use eg max(batch_number)+1 from gorfs.nfs_data,
or some sequence, locking or whatever if concurency migth byte you.
let it be YourNextBN

WITH b(i) AS
  (SELECT file_id
     FROM gorfs.nfs_data
     WHERE batch_number IS NULL ORDER BY file_id LIMIT 1000
     FOR UPDATE)
UPDATE gorfs.nfs_data SET batch_number = YourNextBN
  FROM b WHERE batch_number IS NULL AND file_id = b.i

now, wouldn't it be little simpler then your loop with multiple scanning
same data and throwing it out via offset?

on the other hand, if you think creating simple small index on file_id
where batch_number is null is not an option, then this may not give you
real speedup

of cource your real life example might be more complicated than one
column update, but looking at your fun, it scared me, i mean, your fun
updates every one row in table anyway, so why so many times with
additional scan to check loop condition? it gets worse if there is no
partial index where batch_number is null.

regards, mariusz




pgsql-general by date:

Previous
From: Harald Fuchs
Date:
Subject: Re: “Loop” in plpgsql Function -PostgreSQL 9.2
Next
From: DerekW
Date:
Subject: Re: Error installing 9.5 on Win 2012 R2: data dir not created