intagg memory leak - Mailing list pgsql-general

From Sam Mason
Subject intagg memory leak
Date
Msg-id 20080606144655.GL1723@frubble.xen.chris-lamb.co.uk
Whole thread Raw
Responses Re: intagg memory leak
List pgsql-general
Hi,

I've been using the intagg code to perform aggregations under the
assumption that it's going to be more efficient than the array_accum
documented elsewhere[1].

I'm seeing big memory leaks when doing a query like:

  SELECT d.source_loc_id, d.movement_date - '2006-1-1',
    array_to_string(int_array_aggregate(l.source_ls_id),' ') AS livestockids
  FROM movedates d, livestock_locations l
  WHERE l.source_loc_id = d.source_loc_id
    AND d.movement_date BETWEEN l.start_date AND COALESCE(l.end_date,'2500-1-1')
  GROUP BY d.source_loc_id, d.movement_date
  ORDER BY d.movement_date, d.source_loc_id;

Explain gives the following reasonable plan:

 Sort  (cost=340841771.28..340843520.38 rows=699639 width=12)
   Sort Key: d.movement_date, d.source_loc_id
   ->  HashAggregate  (cost=340761605.76..340773849.45 rows=699639 width=12)
         ->  Merge Join  (cost=19364430.15..327907117.88 rows=1713931718 width=12)
               Merge Cond: (d.source_loc_id = l.source_loc_id)
               Join Filter: ((d.movement_date >= l.start_date) AND (d.movement_date <= COALESCE(l.end_date,
'2500-01-01'::date)))
               ->  Sort  (cost=899684.97..917175.93 rows=6996384 width=8)
                     Sort Key: d.source_loc_id
                     ->  Seq Scan on movedates d  (cost=0.00..104259.84 rows=6996384 width=8)
               ->  Sort  (cost=18464745.18..18733010.76 rows=107306232 width=16)
                     Sort Key: l.source_loc_id
                     ->  Seq Scan on livestock_locations l  (cost=0.00..2134386.32 rows=107306232 width=16)
(12 rows)

But I'm getting an out of memory (i.e. RAM, not disk space) error after
a while.  I've broken the query down into chunks to solve my immediate
problem, but if anyone has any better solutions that would be great.

I'm assuming the arrays that int_array_aggregate() returns aren't ever
getting released.  Memory usage goes to a few gigs (it's a 32bit build)
before bombing out.


Thanks,
  Sam

 [1] http://www.postgresql.org/docs/current/static/xaggr.html

pgsql-general by date:

Previous
From: Zoltan Boszormenyi
Date:
Subject: Heavily fragmented table and index data in 8.0.3
Next
From: "Ken Winter"
Date:
Subject: Re: Extracting data from deprecated MONEY fields