Speeding aggregates by splitting the work - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Speeding aggregates by splitting the work
Date
Msg-id 20011004134911.A25410@svana.org
Whole thread Raw
List pgsql-general
I currently have a query that calculates a sum of a field grouping by some
fields in other tables, like this:

select a.aid, b.bid, c.cid, sum(d.amount)
from a, b, c, d
where a.aid = b.aid
and b.bid = c.bid
and c.cid = d.cid
group by a.aid, b.bid, c.cid;

Those ids are all primary keys of their respective tables. A has about 20
entries, B has about 2000, C has about 3000 and D has about 2 million.

What happens is that a huge join is done, sorted and then aggregated. The
sort takes a huge amount of memory and slows everything down.

My idea is to have the planner do the equivalent of:

select d.cid, sum(d.amount) from d;

Which will be about 3,000 rows, and *then* do the join with A, B and C to
produce the final result. As a bonus (at least under the current
arrangement), the result would be sorted by d.cid, lending itself to merge
joining.

This works because (as far as I can see) sum(x) == sum(sum(x)). Is this
feasable or am I missing some corner cases?

Generalising is harder. count(x) == sum(count(x)) and avg doesn't work
obviously at all. However, avg = sum/count, both of which can be simplified
according to the earlier rules. Naturally you could handcraft this but it
would possibly be nicer for the planner to do it.

Basically, it works because by sorting earlier the width is less thus not as
much memory is required to do the sorting. If my logic is wrong, let me know.
When I get back home I'll have to check whether the amount of required
memory is taken into account for a sort.
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

pgsql-general by date:

Previous
From: "Michael R. Fahey"
Date:
Subject: Re: Can't add PL/PGSQL function even after running createlang.
Next
From: "Sameer Maggon"
Date:
Subject: Unique Identifier