Hi,
I've been using the intagg contrib code to perform aggregations, on the
assumption that it will be more efficient than the array_accum aggregate
documented elsewhere [1].
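For reference, array_accum is the user-defined aggregate from that manual
page, which (if I'm remembering the example correctly) is declared along
these lines:

CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);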
I'm seeing a large memory leak when running a query like this:
SELECT d.source_loc_id, d.movement_date - '2006-1-1',
       array_to_string(int_array_aggregate(l.source_ls_id), ' ') AS livestockids
FROM movedates d, livestock_locations l
WHERE l.source_loc_id = d.source_loc_id
  AND d.movement_date BETWEEN l.start_date AND COALESCE(l.end_date, '2500-1-1')
GROUP BY d.source_loc_id, d.movement_date
ORDER BY d.movement_date, d.source_loc_id;
EXPLAIN gives the following reasonable plan:
 Sort  (cost=340841771.28..340843520.38 rows=699639 width=12)
   Sort Key: d.movement_date, d.source_loc_id
   ->  HashAggregate  (cost=340761605.76..340773849.45 rows=699639 width=12)
         ->  Merge Join  (cost=19364430.15..327907117.88 rows=1713931718 width=12)
               Merge Cond: (d.source_loc_id = l.source_loc_id)
               Join Filter: ((d.movement_date >= l.start_date) AND (d.movement_date <= COALESCE(l.end_date, '2500-01-01'::date)))
               ->  Sort  (cost=899684.97..917175.93 rows=6996384 width=8)
                     Sort Key: d.source_loc_id
                     ->  Seq Scan on movedates d  (cost=0.00..104259.84 rows=6996384 width=8)
               ->  Sort  (cost=18464745.18..18733010.76 rows=107306232 width=16)
                     Sort Key: l.source_loc_id
                     ->  Seq Scan on livestock_locations l  (cost=0.00..2134386.32 rows=107306232 width=16)
(12 rows)
But after a while I get an out-of-memory error (RAM, not disk space). I've
broken the query down into chunks to work around my immediate problem, but
if anyone has a better solution that would be great; the chunked version is
sketched below.
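Roughly speaking, the chunking just restricts each run to a window of
movement dates and concatenates the results afterwards (the window below
is illustrative, not the exact one I used):

SELECT d.source_loc_id, d.movement_date - '2006-1-1',
       array_to_string(int_array_aggregate(l.source_ls_id), ' ') AS livestockids
FROM movedates d, livestock_locations l
WHERE l.source_loc_id = d.source_loc_id
  AND d.movement_date BETWEEN l.start_date AND COALESCE(l.end_date, '2500-1-1')
  AND d.movement_date BETWEEN '2006-1-1' AND '2006-3-31'  -- one chunk; repeat per window
GROUP BY d.source_loc_id, d.movement_date
ORDER BY d.movement_date, d.source_loc_id;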
I'm assuming the arrays that int_array_aggregate() returns are never
getting released. Memory usage climbs to a few gigabytes (it's a 32-bit
build) before the backend bombs out.
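If it helps anyone else poke at this, the aggregate can be exercised
standalone with synthetic data along these lines (untested as a
reproduction; generate_series just stands in for my real tables):

SELECT i % 1000 AS grp,
       int_array_aggregate(i) AS ids          -- intagg's int4 aggregate
FROM generate_series(1, 10000000) AS s(i)     -- ~10M rows, 1000 groups
GROUP BY i % 1000;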
Thanks,
Sam
[1] http://www.postgresql.org/docs/current/static/xaggr.html