Re: “Loop” in plpgsql Function - PostgreSQL 9.2 - Mailing list pgsql-general
From | mariusz |
---|---|
Subject | Re: “Loop” in plpgsql Function - PostgreSQL 9.2 |
Date | |
Msg-id | 1453196385.20194.75.camel@marlap |
In response to | “Loop” in plpgsql Function - PostgreSQL 9.2 ("drum.lucas@gmail.com" <drum.lucas@gmail.com>) |
List | pgsql-general |
hello,

On Tue, 2016-01-19 at 10:37 +1300, drum.lucas@gmail.com wrote:
> I've created a function that allows me to do a huge update.
>
> But I need to limit this function. I need to do 50k rows (for example) and
> then stop it. After that I need to continue from the rows where I
> stopped... I'll have to modify the call, select batch_number(), as well.
>
> How can I do that? Using FOR?
>
> The function below is already working, but the table has 40m rows,
> and it's taking years.
>
> FUNCTION:
>
> CREATE or REPLACE FUNCTION batch_number()
> RETURNS INTEGER AS $$
>
> declare
>     batch_num integer;
>     offset_num integer;
> begin
>     offset_num = 0;
>     batch_num = 1;
>
>     while (select true from gorfs.nfs_data where batch_number is null limit 1) loop
>         with ids(id) as
>         (
>             select file_id
>             from gorfs.nfs_data
>             order by file_id
>             offset offset_num
>             limit 1000
>         )
>         update gorfs.nfs_data
>         set batch_number = batch_num
>         from ids
>         where file_id = ids.id;
>
>         offset_num = offset_num + 1000;
>         batch_num = batch_num + 1;
>     end loop;
>
>     return batch_num;
> end
>
> $$ language 'plpgsql';

Do I read correctly that you want to do something like this:

WITH b(n,i) AS (
    SELECT row_number() OVER (ORDER BY file_id), file_id
    FROM gorfs.nfs_data
)
UPDATE gorfs.nfs_data
SET batch_number = ((b.n - 1) / 1000) + 1
FROM b
WHERE file_id = b.i;

Adjust the math to match your desired result; this one is based on your function. It should not scan the table so many times checking for batch_number being null. I don't know the implementation details, and I don't know whether you have a partial index WHERE batch_number IS NULL that could be used to speed things up. Of course this is for one run that changes everything at once. But if you really want multiple transactions for this, how about the following:

CREATE INDEX ON gorfs.nfs_data (file_id) WHERE batch_number IS NULL;

Find the next batch number to use, e.g. max(batch_number)+1 from gorfs.nfs_data, or some sequence, with locking or whatever if concurrency might bite you. Let it be YourNextBN:

WITH b(i) AS (
    SELECT file_id
    FROM gorfs.nfs_data
    WHERE batch_number IS NULL
    ORDER BY file_id
    LIMIT 1000
    FOR UPDATE
)
UPDATE gorfs.nfs_data
SET batch_number = YourNextBN
FROM b
WHERE batch_number IS NULL
  AND file_id = b.i;

Now, wouldn't that be a little simpler than your loop, which scans the same data many times and throws it away via OFFSET? On the other hand, if creating a small partial index on file_id WHERE batch_number IS NULL is not an option, then this may not give you a real speedup.

Of course your real-life case might be more complicated than a one-column update, but looking at your function scared me: it updates every row in the table anyway, so why do it in so many passes, each with an additional scan to check the loop condition? It gets worse if there is no partial index WHERE batch_number IS NULL.

regards,
mariusz
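A minimal sketch, assuming the partial index above exists and that a hypothetical sequence gorfs.nfs_batch_seq supplies batch numbers, of how the second query could be wrapped in a plpgsql function close to what the original post asked for: each call tags one batch of 1000 rows and returns how many rows it updated, so the caller simply repeats until it gets 0, each call in its own transaction.

-- Sketch only: gorfs.nfs_batch_seq is a hypothetical sequence, not from the thread.
CREATE OR REPLACE FUNCTION batch_number_step()
RETURNS integer AS $$
declare
    next_bn   integer;
    rows_done integer;
begin
    -- pick the batch number for this call from the (hypothetical) sequence
    next_bn := nextval('gorfs.nfs_batch_seq');

    -- lock and tag the next 1000 untagged rows; the partial index on
    -- (file_id) WHERE batch_number IS NULL keeps this cheap
    WITH b(i) AS (
        SELECT file_id
        FROM gorfs.nfs_data
        WHERE batch_number IS NULL
        ORDER BY file_id
        LIMIT 1000
        FOR UPDATE
    )
    UPDATE gorfs.nfs_data
    SET batch_number = next_bn
    FROM b
    WHERE batch_number IS NULL
      AND file_id = b.i;

    -- report how many rows this call tagged; 0 means nothing is left to do
    GET DIAGNOSTICS rows_done = ROW_COUNT;
    return rows_done;
end
$$ LANGUAGE plpgsql;

Calling SELECT batch_number_step(); repeatedly from a script (stopping after, say, 50 calls for the 50k-row limit) gives the stop-and-continue behaviour without rescanning already-tagged rows via OFFSET.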