Thread: seemingly slow for-loop in plpgsql
Hi. I have a table with a hashcode-field which is a md5-checksum of a file. I updated all null-entries using a rails-script by calling '/sbin/md5 -q' (on FreeBSD). When all null-entries were updated I found out that '\n' was added to the md5-checksum. :-) So I wanted to update the table using plpgsql. As I understand it from the docs (http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html) in section 38.6.1.2. RETURN NEXT and RETURN QUERY that ' ... if a PL/pgSQL function produces a very large result set, performance might be poor: ...'. select count(*) from duplicates;count ---------- 134673 select count(*) from duplicates where length(hashcode) = 33;count -------- 31731 \d duplicates Table "public.duplicates" Column | Type | Modifiers --------------+----------+---------------------------------------------------------id | integer | not null default nextval('duplicates_id_seq'::regclass)uid | integer |filename | text |hashcode | text | Indexes: "duplicates_hashcode_idx" btree (hashcode) "duplicates_uid_idx" btree (uid) create or replace function update_hashcode() returns setof duplicates as $body$ declare d duplicates%rowtype; h text; begin for d in select * from duplicates where length(hashcode) = 33 loop h := rtrim(d.hashcode, E'\n'); update duplicatesset hashcode = h where id = d.id; return next d; end loop; end $body$ language 'plpgsql' ; select count(*) from update_hashcode(); Postgres is 8.3.3 on FreeBSD current on a test-server with an opteron at 2 GHz and 4 GB ram. The server is not the fastest around but I have another table with 85 mill. entries where 12 mill. have '\n' as part of the hashcode. The prod.server is a HP DL360 with a p800-controller so it's much faster but the script will still be too slow to make this solution viable. How can I tune the plpgsql-script? Using cursors? I tried with a cursor-based script and ended up with this skeleton-script: create or replace function update_hashcode(refcursor) returns refcursor as ' declare d duplicates%rowtype; h text; begin open $1 for select * from duplicates; return $1; end; ' language plpgsql; begin; select update_hashcode('funccursor'); fetch next in funccursor; commit; which fetches the next row. But how can I iterate over the rows using cursors? -- regards Claus When lenity and cruelty play for a kingdom, the gentler gamester is the soonest winner. Shakespeare
"Claus Guttesen" <kometen@gmail.com> writes: > create or replace function update_hashcode() returns setof duplicates as > $body$ > declare > d duplicates%rowtype; > h text; > begin > for d in select * from duplicates where length(hashcode) = 33 loop > h := rtrim(d.hashcode, E'\n'); > update duplicates set hashcode = h where id = d.id; > return next d; > end loop; > end > $body$ > language 'plpgsql' ; Why in the world are you using a for-loop for this at all? It would be tremendously faster as a single SQL command: update duplicates set hashcode = rtrim(hashcode, E'\n') where length(hashcode) = 33; regards, tom lane
> Why in the world are you using a for-loop for this at all? It would be > tremendously faster as a single SQL command: > > update duplicates set hashcode = rtrim(hashcode, E'\n') where length(hashcode) = 33; Thank you. I was "caught" in the plpgsql-lane. This is of course much simpler and much faster. -- regards Claus When lenity and cruelty play for a kingdom, the gentler gamester is the soonest winner. Shakespeare