Re: Newbie questions relating to transactions - Mailing list pgsql-general

From Carl Sopchak
Subject Re: Newbie questions relating to transactions
Date
Msg-id 200903081317.13158.carl.sopchak@cegis123.com
Whole thread Raw
In response to Re: Newbie questions relating to transactions  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-general
On Sunday, March 08, 2009, Alvaro Herrera wrote:
> Carl Sopchak wrote:
> > Well, the upgrade to 8.3 seemed to rid me of the command limit, but now
> > I'm running out of memory.  I have 2Gb physical and 8Gb swap (after
> > adding 4Gb).
>
> Do you have AFTER triggers on the involved tables?  They are recorded on
> memory and we have no mechanism to spill to disk, so it's frequent that
> those cause out-of-memory.  If that's the explanation, your workaround
> would be to get rid of them.
No triggers have been defined on any tables in the database...

>
> > Is there a way for me to run this outside of one huge transaction?  This
> > really shouldn't be using more than a few hundred megs of RAM (assuming
> > cursor records are all stored in memory)...
>
> Hmm, maybe you're holding too many cursors open and not closing them
> timely?  Did you post your function for review?
I am only using one cursor, which is opened and closed repeatedly.  It pulls
the base data from the database for the calculations.

I have not posted the function for review yet because its function is
proprietary.  However, I have stripped out the proprietary stuff, and include
the code below.  I marked everything stripped out by placing a brief
description enclosed within {{ and }}.  I left all of the places that the
database is accessed in the code.  I changed some of the line wrapping to fit
a reasonable width (which I mention in case you see syntax type errors).

There is a few lines that save the calculation details based on a flag in the
trial_header table.  This flag is set to N for the run that I am having
issues with, so these records are not being created.  I left that code in
below for completeness...

I realize this isn't probably the cleanest code out there (I'm sure using
prepared statements would help speed), but it was really meant to be a "quick
and dirty" way to calculate the data I need.  Any comments or suggestions on
improving the code is welcome.

create or replace function Run_Trial (tid integer) returns void as $proc$

declare
  {{ declarations }}
begin
   -- Set start time...
   program_version := '1.16';
   update trial_header set start_timestamp = clock_timestamp(),
      run_version = program_version, end_timestamp = null
      where trial_id = tid;

   -- get rid of prior run, if any:
   delete from trial_results where trial_id = tid;
   delete from trial_calc_detail where trial_id = tid;

   -- Get the trial parameters:
   select * into trial_hdr from trial_header where trial_id = tid;

   {{ Do some calculations  }}

   -- Create temp table of data.  This simplifies the coding below A LOT.
   {{ conditional calc }}

   -- (I can't figure out how to do this with a dynamic select and
   -- "insert into trial_data select" and I get an error
   -- if I make the dynamic SQL a "select into temp table trial_data"...)
   -- Do it by brute force, I guess...
   drop table if exists trial_data;
   create temp table trial_data(
      {{ fields }}
   );
   for row in execute
      'select {{ select statement }}' loop
         execute 'insert into trial_data values(' || {{ fields }} || ')';
   end loop;

   create index trial_data_idx on trial_data (data_yyyymm, data_date);
   create index trial_data_idx2 on trial_data (data_date);

   -- Get date range for the data set we're using
   for row in execute 'select min(data_date) as min_date,
              max(data_date) as max_date from trial_data' loop
      low_data_date := row.min_date;
      high_data_date := row.max_date;
   end loop;

   -- Calculate maximum number of years that data covers
   max_years = floor((high_data_date - low_data_date) / 365.25);

   --  Loop through all possible "x year" periods
   for cur_years in 1 .. max_years loop

      -- start from the first period on file:
      next_iteration_start := low_data_date;

      num_periods := trial_hdr.periods_per_year * cur_years
                                 + trial_hdr.{{ field }};

      for row in execute 'select count(*) as cnt from (
               select data_date from trial_data where data_date >= ' ||
                         quote_literal(next_iteration_start) ||
               ' Limit ' || to_char(num_periods, '9999999999') || ') a' loop
         data_periods := row.cnt;
      end loop;

      -- Do each "x year" period in data
      while data_periods = num_periods loop

         -- Initialize calculation

         -- used to set sucessive values for next_iteration_start:
         iteration_counter := 0;

         {{ some calculations }}

         for row in execute 'select max(data_date) as max_date' ||
                  ' from ( select data_date from trial_data' ||
                  ' where data_date >= ' ||
                   quote_literal(next_iteration_start) ||
                  ' order by data_date' ||
                  ' Limit ' || to_char(num_periods, '999999999') || ') a' loop
            per_end_date := row.max_date;
         end loop;

         -- Get data for calculation
         open data_cursor for execute 'select * from trial_data' ||
            ' where data_date >= ' || quote_literal(next_iteration_start) ||
            ' order by data_date ' ||
            ' Limit ' || to_char(num_periods, '999999999');

         loop  -- through periods for calculation
            fetch data_cursor into data;
            if not found then
               exit;
            end if;

            -- determine next iteration start date:
            iteration_counter := iteration_counter + 1;
            if iteration_counter = 1 then
               {{ calculations }}
            end if;
            if iteration_counter = 2 then
               next_iteration_start := data.data_date;
            end if;

            {{ calculations based on row data }}

            -- save details if requested:
            if upper(trial_hdr.save_calc_details) = 'Y' then
               insert into trial_calc_detail values( {{ fields }} );
            end if;

            {{ Calculation }}
         end loop;  -- through periods for calculation

         -- Final calculations:
         {{ calculations }}

         -- save results:
         insert into trial_results values( {{ fields }} );

         close data_cursor;

         for row in execute 'select count(*) as cnt from (
                  select data_date from trial_data where data_date >= ' ||
                  quote_literal(next_iteration_start) ||
                  ' Limit ' || to_char(num_periods, '9999999999') || ') a'
           loop
            data_periods := row.cnt;
         end loop;

      end loop;  -- Do each "x year" period in data

   end loop;  -- loop through all possible "x year" periods

   -- mark finish time on trial:
   update trial_header set end_timestamp = clock_timestamp()
             where trial_id = tid;

   -- clean up:
   drop table trial_data;
end;

$proc$ language plpgsql;

pgsql-general by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Newbie questions relating to transactions
Next
From: Christian Schröder
Date:
Subject: Re: Performance of subselects