Re: Improving avg performance for numeric - Mailing list pgsql-hackers

From Hadi Moshayedi
Subject Re: Improving avg performance for numeric
Date
Msg-id CAK=1=WoQLNiGF0WL9E821iN_m1mAFK0MUJFzuhO_3uCOzJEMDg@mail.gmail.com
Whole thread Raw
In response to Re: Improving avg performance for numeric  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Improving avg performance for numeric  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
Hello,

I updated the patch by taking ideas from your patch, and unifying the transition struct and update function for different aggregates. The speed of avg improved even more. It now has 60% better performance than the current committed version.

This patch optimizes numeric/int8 sum, avg, stddev_pop, stddev_samp, var_pop, var_samp.

I also noticed that this patch makes matview test fail. It seems that it just changes the ordering of rows for queries like "SELECT * FROM tv;". Does this seem like a bug in my patch, or should we add "ORDER BY" clauses to this test to make it more deterministic?

I also agree with you that adding sum functions to use preallocated buffers will make even more optimization. I'll try to see if I can find a simple way to do this.

Thanks,
  -- Hadi

On Mon, Mar 18, 2013 at 5:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

I played with sum(numeric) optimization

Now it is based on generic numeric_add function - this code is
relative old - now we can design aggregates with internal transition
buffers, and probably we can do this work more effective.

I just removed useles palloc/free operations and I got a 30% better
performance! My patch is ugly - because I used a generic add_var
function. Because Sum, Avg and almost all aggregates functions is
limited by speed of sum calculation I thing so we need a new numeric
routines optimized for calculation "sum", that use a only preallocated
buffers. A speed of numeric is more important now, because there are
more and more warehouses, where CPU is botleneck.

Regards

Pavel


2013/3/18 Hadi Moshayedi <hadi@moshayedi.net>:
> Hi Pavel,
>
> Thanks a lot for your feedback.
>
> I'll work more on this patch this week, and will send a more complete patch
> later this week.
>
> I'll also try to see how much is the speed up of this method for other
> types.
>
> Thanks,
>   -- Hadi
>
>
> On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>>
>> 2013/3/16 Hadi Moshayedi <hadi@moshayedi.net>:
>> > Revisiting:
>> > http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>> >
>> > I think the reasons which the numeric average was slow were:
>> >   (1) Using Numeric for count, which is slower than int8 to increment,
>> >   (2) Constructing/deconstructing arrays at each transition step.
>> >
>> > This is also discussed at:
>> >
>> > http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>> >
>> > So, I think we can improve the speed of numeric average by keeping the
>> > transition state as an struct in the aggregate context, and just passing
>> > the
>> > pointer to that struct from/to the aggregate transition function.
>> >
>> > The attached patch uses this method.
>> >
>> > I tested it using the data generated using:
>> > CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
>> > generate_series(1, 10000000) s;
>> >
>> > After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
>> > improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
>> > improvement.
>> >
>> > I think we may also be able to use a similar method to improve
>> > performance
>> > of some other numeric aggregates (like stddev). But I want to know your
>> > feedback first.
>> >
>> > Is this worth working on?
>>
>> I checked this patch and it has a interesting speedup - and a price of
>> this methoud should not be limited to numeric type only
>>
>> Pavel
>>
>> >
>> > Thanks,
>> >   -- Hadi
>> >
>> >
>> >
>> > --
>> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-hackers
>> >
>
>

Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Trust intermediate CA for client certificates
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used