Thread: commiting in/between functions

commiting in/between functions

From
Magnus Månsson
Date:
Hey ppl,
I am trying to convert some huge tables in a quite huge database (about
50gigs).  The convertion requires me to loop through 30 tables with 30
million rows each with a script an I chose plpgsql to do the job though I
have worked with oracles pl/sql before.

Making the script wasnt any big problem, the problem is the amount of memory
it wants to use, if I had a terabyte of memory I do not think that would be
enough, if I try to convert a 100th of the database at a time it uses all
the swap, 1G and the 512M memory.
Though all tables are statistics and static I know that noone changes in
them, so I would like to commit from time to time, but what I have found
commits are not allowed in a function, it has to be one transaction.

Is there any way to pass my problem besides making my plpgsql take arguments
and call the function about 500 times with different arguments? (that would
work if I could commit between to functions, one function to call and one to
do the work, but not even that is allowed)

And if this still isnt supported, is it planned in some how?

thanks in advance..


--
Magnus Månsson

Re: commiting in/between functions

From
Tom Lane
Date:
Magnus =?iso-8859-1?Q?M=E5nsson?= <ganja@0x63.nu> writes:
> Making the script wasnt any big problem, the problem is the amount of memory
> it wants to use, if I had a terabyte of memory I do not think that would be
> enough, if I try to convert a 100th of the database at a time it uses all
> the swap, 1G and the 512M memory.

It would help if you showed us the function and told us what version of
Postgres you're talking about.  This could just be a garden-variety
memory leakage problem.

            regards, tom lane

Re: commiting in/between functions

From
Tom Lane
Date:
Magnus =?iso-8859-1?Q?M=E5nsson?= <ganja@0x63.nu> writes:
> On Tue, Mar 11, 2003 at 10:34:08AM -0500, Tom Lane wrote:
>> It would help if you showed us the function and told us what version of
>> Postgres you're talking about.  This could just be a garden-variety
>> memory leakage problem.

> The version of postgresql running on the server is 7.1 and if needed I can
> test it on data I have duplic�ted to another server with 7.3.

Please try 7.3 before we look any further; 7.1 was a long time back and
we've fixed a number of intra-function memory leaks since then.

If you still see it in 7.3, please send along the CREATE TABLE commands
for the tables referenced by the function, and I'll try to duplicate the
problem here.

            regards, tom lane

Re: commiting in/between functions

From
Magnus Månsson
Date:
On Tue, Mar 11, 2003 at 10:34:08AM -0500, Tom Lane wrote:
> Magnus =?iso-8859-1?Q?M=E5nsson?= <ganja@0x63.nu> writes:
> > Making the script wasnt any big problem, the problem is the amount of memory
> > it wants to use, if I had a terabyte of memory I do not think that would be
> > enough, if I try to convert a 100th of the database at a time it uses all
> > the swap, 1G and the 512M memory.
>
> It would help if you showed us the function and told us what version of
> Postgres you're talking about.  This could just be a garden-variety
> memory leakage problem.
>
>             regards, tom lane
>

I have included my function below, a short explaination:
The connectionstats_yyyy_mm_dd tables contains information about connections
on the network, one tables for every day. hosts_yyyy_mm_dd contains
information about the hosts that made the connections, also one table for
every day.
connectionstats_2003_01_01 contains 1.7 million rows and hosts_2003_01_01
98.000 rows.

My goal is to get all connectionstats in one table and all hosts in one
table, though the connectionstats are referring to hosts using the id I have
to recreate the ids and give the connection-row the right id. I also have to
check that I haven't already inserted the host in the hosts-table from
another day (ip and hostname together should be a unique key).

If I run this function it will in some minutes eat up all the avail memory
(512+1024M) and after that the postmaster-process will be killed by the
kernel.

My workaround that I have made now is a version where I have starthour as an
parameter and it runs it in 1 hour-intervals to us less memory (1 day takes
24 function-calls).

The version of postgresql running on the server is 7.1 and if needed I can
test it on data I have duplicæted to another server with 7.3.


--create or replace function constats()
drop function constats()
create function constats()
returns int2 as '
declare
  rec1_ RECORD;
  rec2_ RECORD;
  rec3_ RECORD;
  client_ int;
  server_ int;
begin
  for rec1_ in select * from connectionstats_2003_01_01 loop
    select into rec2_ address, hostname, is_local from hosts_2003_01_01 where id=rec1_.client;
    if not found then
      raise notice ''foo: %'', rec1_.client;
    end if;
    select into rec3_ id from new_hosts where address=rec2_.address and hostname=rec2_.hostname;
    if not found then
        --The host was not found in the new table, insert it.
      insert into new_hosts (address, hostname, is_local) values (rec2_.address, rec2_.hostname, rec2_.is_local);
      get diagnostics client_ = result_oid;
    else
      client_ := rec3_.id;
    end if;
    select into rec2_ address, hostname, is_local from hosts_2003_01_01 where id=rec1_.server;
    if not found then
      raise notice ''bar'';
    end if;
    select into rec3_ id from new_hosts where address=rec2_.address and hostname=rec2_.hostname;
    if not found then
        --The host was not found in the new table, insert it.
      insert into new_hosts (address, hostname, is_local) values (rec2_.address, rec2_.hostname, rec2_.is_local);
      get diagnostics server_ = result_oid;
    else
      server_ := rec3_.id;
    end if;
    insert into new_connectionstats (time, client, server, server_port, protocol, service, inbound, outbound) values
    (rec1_.time, client_, server_, rec1_.server_port, rec1_.protocol, rec1_.service, rec1_.inbound, rec1_.outbound);
  end loop;
  return (0);
end;'
LANGUAGE 'plpgsql';

--
Magnus Månsson