Re: Aggregate ORDER BY patch - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Aggregate ORDER BY patch
Date
Msg-id 871vjzl82a.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Aggregate ORDER BY patch  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: Aggregate ORDER BY patch
List pgsql-hackers
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> Performance.
Andrew> tuplesort_getdatum etc. seems to be substantially faster thanAndrew> tuplesort_gettupleslot especially for the
casewhere you'reAndrew> sorting a pass-by-value datum such as an integer (since theAndrew> datum is then stored only in
thesort tuple header andAndrew> doesn't require a separate space allocation forAndrew> itself). Using a slot in all
caseswould have slowed downAndrew> some common cases like count(distinct id) by a measurableAndrew> amount.
 
Andrew> Cases like array_agg(x order by x) benefit from the fasterAndrew> code path too.
Andrew> The memory management between the two cases is sufficientlyAndrew> different that combining them into one
functionwhile stillAndrew> maintaining the slot vs. datum distinction would be ugly andAndrew> probably error-prone.
Therelatively minor duplication ofAndrew> logic seemed much clearer to me.
 

Just to quantify this, using a production-quality build (optimized and
without assertions), it turns out that the fast code path
(process_ordered_aggregate_single) is faster by 300% for pass-by-value
types, and by approximately 20% for short values of pass-by-reference
types, as compared to disabling that code path and forcing even the
one-arg case to use the slot interface.

So using the slot interface for everything would have constituted a
300% slowdown over the older code for count(distinct id), obviously
undesirable.

As it stands, I can't detect any performance regression over the
previous code.

This means that agg(x order by y) is rather noticably slower than
agg(x order by x), but this is pretty much unavoidable given how the
sorting code works.

Future performance enhancements (which I have no particular plans to
tackle) would involve having the planner consult the desired aggregate
orderings and estimating the cost of sorting as opposed to the cost of
producing a plan with the input already ordered. Also combining the
sort step for aggregates that share a single ordering.

-- 
Andrew (irc:RhodiumToad)


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Summary and Plan for Hot Standby
Next
From: "David E. Wheeler"
Date:
Subject: Re: Summary and Plan for Hot Standby