Thread: seemingly slow for-loop in plpgsql

seemingly slow for-loop in plpgsql

From

"Claus Guttesen"

Date:

02 September 2008, 19:29:52

Hi.

I have a table with a hashcode-field which is a md5-checksum of a
file. I updated all null-entries using a rails-script by calling
'/sbin/md5 -q' (on FreeBSD). When all null-entries were updated I
found out that '\n' was added to the md5-checksum. :-)

So I wanted to update the table using plpgsql. As I understand it from
the docs (http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html)
in section 38.6.1.2. RETURN NEXT and RETURN QUERY that ' ... if a
PL/pgSQL function produces a very large result set, performance might
be poor: ...'.

select count(*) from duplicates;count
----------
134673

select count(*) from duplicates where length(hashcode) = 33;count
--------
31731

\d duplicates

Table "public.duplicates"
Column     |  Type   | Modifiers
--------------+----------+---------------------------------------------------------id              | integer | not null
default
nextval('duplicates_id_seq'::regclass)uid            | integer |filename   | text      |hashcode | text      |
Indexes:   "duplicates_hashcode_idx" btree (hashcode)   "duplicates_uid_idx" btree (uid)


create or replace function update_hashcode() returns setof duplicates as
$body$
declare d duplicates%rowtype; h text;
begin for d in select * from duplicates where length(hashcode) = 33 loop   h := rtrim(d.hashcode, E'\n');   update
duplicatesset hashcode = h where id = d.id;   return next d; end loop;
 
end
$body$
language 'plpgsql' ;

select count(*) from update_hashcode();

Postgres is 8.3.3 on FreeBSD current on a test-server with an opteron
at 2 GHz and 4 GB ram. The server is not the fastest around but I have
another table with 85 mill. entries where 12 mill. have '\n' as part
of the hashcode. The prod.server is a HP DL360 with a p800-controller
so it's much faster but the script will still be too slow to make this
solution viable.

How can I tune the plpgsql-script? Using cursors? I tried with a
cursor-based script and ended up with this skeleton-script:

create or replace function update_hashcode(refcursor) returns refcursor as '
declare d duplicates%rowtype; h text;
begin open $1 for select * from duplicates; return $1;
end;
' language plpgsql;

begin;
select update_hashcode('funccursor');
fetch next in funccursor;
commit;

which fetches the next row. But how can I iterate over the rows using cursors?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare

Re: seemingly slow for-loop in plpgsql

From

Tom Lane

Date:

02 September 2008, 19:48:25

"Claus Guttesen" <kometen@gmail.com> writes:
> create or replace function update_hashcode() returns setof duplicates as
> $body$
> declare
>   d duplicates%rowtype;
>   h text;
> begin
>   for d in select * from duplicates where length(hashcode) = 33 loop
>     h := rtrim(d.hashcode, E'\n');
>     update duplicates set hashcode = h where id = d.id;
>     return next d;
>   end loop;
> end
> $body$
> language 'plpgsql' ;

Why in the world are you using a for-loop for this at all?  It would be
tremendously faster as a single SQL command:

update duplicates set hashcode = rtrim(hashcode, E'\n') where length(hashcode) = 33;
        regards, tom lane

Re: seemingly slow for-loop in plpgsql

From

"Claus Guttesen"

Date:

03 September 2008, 04:20:47

> Why in the world are you using a for-loop for this at all?  It would be
> tremendously faster as a single SQL command:
>
> update duplicates set hashcode = rtrim(hashcode, E'\n') where length(hashcode) = 33;

Thank you. I was "caught" in the plpgsql-lane. This is of course much
simpler and much faster.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare