Thread: Improving avg performance for numeric

Improving avg performance for numeric

From
Hadi Moshayedi
Date:
Revisiting: http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz

I think the reasons the numeric average was slow were:
  (1) Using Numeric for count, which is slower than int8 to increment,
  (2) Constructing/deconstructing arrays at each transition step.


So, I think we can improve the speed of numeric average by keeping the transition state as a struct in the aggregate context, and just passing the pointer to that struct from/to the aggregate transition function.

The attached patch uses this method. 
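
To make the idea concrete, here is a minimal sketch of the pattern, not
the patch itself: the struct and function names are hypothetical, null
inputs are omitted for brevity, and a real implementation also avoids
the per-row re-allocation of the sum that this sketch still performs.

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"
    #include "utils/datum.h"
    #include "utils/numeric.h"

    /* Hypothetical transition state: a plain struct instead of numeric[] */
    typedef struct MyAvgState
    {
        int64   N;      /* row count kept as int8, cheap to increment */
        Numeric sumX;   /* running sum */
    } MyAvgState;

    PG_FUNCTION_INFO_V1(my_numeric_avg_accum);

    Datum
    my_numeric_avg_accum(PG_FUNCTION_ARGS)
    {
        MemoryContext aggcontext;
        MemoryContext oldcontext;
        MyAvgState *state;
        Numeric     newval = PG_GETARG_NUMERIC(1);

        if (!AggCheckCallContext(fcinfo, &aggcontext))
            elog(ERROR, "my_numeric_avg_accum called in non-aggregate context");

        if (PG_ARGISNULL(0))
        {
            /* first row: allocate the state once, in the aggregate context */
            state = (MyAvgState *) MemoryContextAllocZero(aggcontext,
                                                          sizeof(MyAvgState));
        }
        else
            state = (MyAvgState *) PG_GETARG_POINTER(0);

        /* keep the running sum in the long-lived aggregate context */
        oldcontext = MemoryContextSwitchTo(aggcontext);
        if (state->N == 0)
            state->sumX = DatumGetNumeric(datumCopy(NumericGetDatum(newval),
                                                    false, -1));
        else
            state->sumX = DatumGetNumeric(DirectFunctionCall2(numeric_add,
                                              NumericGetDatum(state->sumX),
                                              NumericGetDatum(newval)));
        MemoryContextSwitchTo(oldcontext);

        state->N++;
        PG_RETURN_POINTER(state);
    }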

I tested it using the data generated using:
CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM generate_series(1, 10000000) s;

After applying this patch, run time of "SELECT avg(d) FROM avg_test;" improves from 10.701 seconds to 5.204 seconds, which seems to be a huge improvement.

I think we may also be able to use a similar method to improve performance of some other numeric aggregates (like stddev). But I want to know your feedback first.

Is this worth working on?

Thanks,
  -- Hadi

Attachment

Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello

2013/3/16 Hadi Moshayedi <hadi@moshayedi.net>:
> Revisiting:
> http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>
> I think the reasons the numeric average was slow were:
>   (1) Using Numeric for count, which is slower than int8 to increment,
>   (2) Constructing/deconstructing arrays at each transition step.
>
> This is also discussed at:
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>
> So, I think we can improve the speed of numeric average by keeping the
> transition state as a struct in the aggregate context, and just passing the
> pointer to that struct from/to the aggregate transition function.
>
> The attached patch uses this method.
>
> I tested it using the data generated using:
> CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
> generate_series(1, 10000000) s;
>
> After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
> improvement.
>
> I think we may also be able to use a similar method to improve performance
> of some other numeric aggregates (like stddev). But I want to know your
> feedback first.
>
> Is this worth working on?

nice

+1

Regards

Pavel


>
> Thanks,
>   -- Hadi



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
2013/3/16 Hadi Moshayedi <hadi@moshayedi.net>:
> Revisiting:
> http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>
> I think the reasons the numeric average was slow were:
>   (1) Using Numeric for count, which is slower than int8 to increment,
>   (2) Constructing/deconstructing arrays at each transition step.
>
> This is also discussed at:
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>
> So, I think we can improve the speed of numeric average by keeping the
> transition state as a struct in the aggregate context, and just passing the
> pointer to that struct from/to the aggregate transition function.
>
> The attached patch uses this method.
>
> I tested it using the data generated using:
> CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
> generate_series(1, 10000000) s;
>
> After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
> improvement.
>
> I think we may also be able to use a similar method to improve performance
> of some other numeric aggregates (like stddev). But I want to know your
> feedback first.
>
> Is this worth working on?

I checked this patch and it shows an interesting speedup - and the
benefit of this method should not be limited to the numeric type only

Pavel

>
> Thanks,
>   -- Hadi



Re: Improving avg performance for numeric

From
Hadi Moshayedi
Date:
Hi Pavel,

Thanks a lot for your feedback. 

I'll work more on this patch this week, and will send a more complete patch later this week.

I'll also try to see how big the speedup of this method is for other types. 

Thanks,
  -- Hadi

On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2013/3/16 Hadi Moshayedi <hadi@moshayedi.net>:
> Revisiting:
> http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>
> I think the reasons the numeric average was slow were:
>   (1) Using Numeric for count, which is slower than int8 to increment,
>   (2) Constructing/deconstructing arrays at each transition step.
>
> This is also discussed at:
> http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>
> So, I think we can improve the speed of numeric average by keeping the
> transition state as a struct in the aggregate context, and just passing the
> pointer to that struct from/to the aggregate transition function.
>
> The attached patch uses this method.
>
> I tested it using the data generated using:
> CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
> generate_series(1, 10000000) s;
>
> After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
> improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
> improvement.
>
> I think we may also be able to use a similar method to improve performance
> of some other numeric aggregates (like stddev). But I want to know your
> feedback first.
>
> Is this worth working on?

I checked this patch and it shows an interesting speedup - and the
benefit of this method should not be limited to the numeric type only

Pavel

>
> Thanks,
>   -- Hadi

Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello

I played with sum(numeric) optimization.

Now it is based on the generic numeric_add function - this code is
relatively old - now we can design aggregates with internal transition
buffers, and we can probably do this work more effectively.

I just removed useless palloc/free operations and I got 30% better
performance! My patch is ugly - because I used the generic add_var
function. Because sum, avg and almost all aggregate functions are
limited by the speed of the sum calculation, I think we need new
numeric routines optimized for calculating "sum", that use only
preallocated buffers. The speed of numeric is more important now,
because there are more and more data warehouses where the CPU is the
bottleneck.
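
For illustration, the shape such a routine could take - a sketch only,
since add_var, NumericVar and init_var_from_num are private to
numeric.c, and SumAggState is a hypothetical name:

    typedef struct SumAggState
    {
        int64      N;      /* inputs seen so far */
        NumericVar sumX;   /* running sum; its digit buffer lives in the
                            * aggregate context and is reused across rows */
    } SumAggState;

    static void
    sum_accum_var(SumAggState *state, Numeric newval)
    {
        NumericVar  X;

        /* point a NumericVar at the input datum without copying digits */
        init_var_from_num(newval, &X);

        /* accumulate in place: no per-row Numeric is constructed or freed */
        add_var(&state->sumX, &X, &state->sumX);
        state->N++;
    }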

Regards

Pavel


2013/3/18 Hadi Moshayedi <hadi@moshayedi.net>:
> Hi Pavel,
>
> Thanks a lot for your feedback.
>
> I'll work more on this patch this week, and will send a more complete patch
> later this week.
>
> I'll also try to see how big the speedup of this method is for other
> types.
>
> Thanks,
>   -- Hadi
>
>
> On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>>
>> 2013/3/16 Hadi Moshayedi <hadi@moshayedi.net>:
>> > Revisiting:
>> > http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>> >
> I think the reasons the numeric average was slow were:
>> >   (1) Using Numeric for count, which is slower than int8 to increment,
>> >   (2) Constructing/deconstructing arrays at each transition step.
>> >
>> > This is also discussed at:
>> >
>> > http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>> >
>> > So, I think we can improve the speed of numeric average by keeping the
> transition state as a struct in the aggregate context, and just passing
>> > the
>> > pointer to that struct from/to the aggregate transition function.
>> >
>> > The attached patch uses this method.
>> >
>> > I tested it using the data generated using:
>> > CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
>> > generate_series(1, 10000000) s;
>> >
>> > After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
>> > improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
>> > improvement.
>> >
>> > I think we may also be able to use a similar method to improve
>> > performance
>> > of some other numeric aggregates (like stddev). But I want to know your
>> > feedback first.
>> >
>> > Is this worth working on?
>>
>> I checked this patch and it shows an interesting speedup - and the
>> benefit of this method should not be limited to the numeric type only
>>
>> Pavel
>>
>> >
>> > Thanks,
>> >   -- Hadi

Attachment

Re: Improving avg performance for numeric

From
Hadi Moshayedi
Date:
Hello,

I updated the patch by taking ideas from your patch, and unifying the transition struct and update function for different aggregates. The speed of avg improved even more. It now has 60% better performance than the current committed version.

This patch optimizes numeric/int8 sum, avg, stddev_pop, stddev_samp, var_pop, var_samp.

I also noticed that this patch makes matview test fail. It seems that it just changes the ordering of rows for queries like "SELECT * FROM tv;". Does this seem like a bug in my patch, or should we add "ORDER BY" clauses to this test to make it more deterministic?

I also agree with you that changing the sum functions to use preallocated buffers will allow even more optimization. I'll try to see if I can find a simple way to do this.

Thanks,
  -- Hadi

On Mon, Mar 18, 2013 at 5:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

I played with sum(numeric) optimization.

Now it is based on the generic numeric_add function - this code is
relatively old - now we can design aggregates with internal transition
buffers, and we can probably do this work more effectively.

I just removed useless palloc/free operations and I got 30% better
performance! My patch is ugly - because I used the generic add_var
function. Because sum, avg and almost all aggregate functions are
limited by the speed of the sum calculation, I think we need new
numeric routines optimized for calculating "sum", that use only
preallocated buffers. The speed of numeric is more important now,
because there are more and more data warehouses where the CPU is the
bottleneck.

Regards

Pavel


2013/3/18 Hadi Moshayedi <hadi@moshayedi.net>:
> Hi Pavel,
>
> Thanks a lot for your feedback.
>
> I'll work more on this patch this week, and will send a more complete patch
> later this week.
>
> I'll also try to see how big the speedup of this method is for other
> types.
>
> Thanks,
>   -- Hadi
>
>
> On Mon, Mar 18, 2013 at 10:36 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>>
>> 2013/3/16 Hadi Moshayedi <hadi@moshayedi.net>:
>> > Revisiting:
>> > http://www.postgresql.org/message-id/45661BE7.4050205@paradise.net.nz
>> >
>> > I think the reasons the numeric average was slow were:
>> >   (1) Using Numeric for count, which is slower than int8 to increment,
>> >   (2) Constructing/deconstructing arrays at each transition step.
>> >
>> > This is also discussed at:
>> >
>> > http://www.citusdata.com/blog/53-postgres-performance-to-avg-or-to-sum-divided-by-count
>> >
>> > So, I think we can improve the speed of numeric average by keeping the
>> > transition state as a struct in the aggregate context, and just passing
>> > the
>> > pointer to that struct from/to the aggregate transition function.
>> >
>> > The attached patch uses this method.
>> >
>> > I tested it using the data generated using:
>> > CREATE TABLE avg_test AS SELECT (random() * 999)::decimal(5,2) as d FROM
>> > generate_series(1, 10000000) s;
>> >
>> > After applying this patch, run time of "SELECT avg(d) FROM avg_test;"
>> > improves from 10.701 seconds to 5.204 seconds, which seems to be a huge
>> > improvement.
>> >
>> > I think we may also be able to use a similar method to improve
>> > performance
>> > of some other numeric aggregates (like stddev). But I want to know your
>> > feedback first.
>> >
>> > Is this worth working on?
>>
>> I checked this patch and it shows an interesting speedup - and the
>> benefit of this method should not be limited to the numeric type only
>>
>> Pavel
>>
>> >
>> > Thanks,
>> >   -- Hadi

Attachment

Re: Improving avg performance for numeric

From
Kevin Grittner
Date:
Hadi Moshayedi <hadi@moshayedi.net> wrote:

> I updated the patch by taking ideas from your patch, and unifying
> the transition struct and update function for different
> aggregates. The speed of avg improved even more. It now has 60%
> better performance than the current committed version.

Outstanding!

> I also noticed that this patch makes matview test fail. It seems
> that it just changes the ordering of rows for queries like
> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
> should we add "ORDER BY" clauses to this test to make it more
> deterministic?

I added some ORDER BY clauses.  That is probably a good thing
anyway for purposes of code coverage.  Does that fix it for you?
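
For illustration, the kind of change involved, using the tv view from
the failing test (the sort column here is an assumption, not a quote
from the committed test):

    -- before: row order depends on whether the planner picks hashed
    -- or sorted aggregation for the underlying query
    SELECT * FROM tv;

    -- after: deterministic output regardless of the chosen plan
    SELECT * FROM tv ORDER BY type;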

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
2013/3/19 Kevin Grittner <kgrittn@ymail.com>:
> Hadi Moshayedi <hadi@moshayedi.net> wrote:
>
>> I updated the patch by taking ideas from your patch, and unifying
>> the transition struct and update function for different
>> aggregates. The speed of avg improved even more. It now has 60%
>> better performance than the current committed version.
>
> Outstanding!

I did some tests on OLAP-style queries, and I think a ~40% speedup
for queries with AVG is realistic, depending on other conditions.

But there are a lot of situations where the data is in shared buffers
or filesystem cache, and then this patch can bring a significant
speedup - which could probably be even better with a better algorithm
for summing two numerics in the aggregate.

Regards

Pavel

>
>> I also noticed that this patch makes matview test fail. It seems
>> that it just changes the ordering of rows for queries like
>> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
>> should we add "ORDER BY" clauses to this test to make it more
>> deterministic?
>
> I added some ORDER BY clauses.  That is probably a good thing
> anyway for purposes of code coverage.  Does that fix it for you?
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



Re: Improving avg performance for numeric

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Hadi Moshayedi <hadi@moshayedi.net> wrote:
>> I also noticed that this patch makes matview test fail. It seems
>> that it just changes the ordering of rows for queries like
>> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
>> should we add "ORDER BY" clauses to this test to make it more
>> deterministic?

> I added some ORDER BY clauses.  That is probably a good thing
> anyway for purposes of code coverage.  Does that fix it for you?

Uh, what?  Fooling around with the implementation of avg() should surely
not change any planning decisions.
        regards, tom lane



Re: Improving avg performance for numeric

From
Hadi Moshayedi
Date:
I am not sure how this works, but I also changed numeric sum(), and the views in question had a numeric sum() column. Can that have any impact?

I am going to dig deeper to see why this happens.

On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kevin Grittner <kgrittn@ymail.com> writes:
> Hadi Moshayedi <hadi@moshayedi.net> wrote:
>> I also noticed that this patch makes matview test fail. It seems
>> that it just changes the ordering of rows for queries like
>> "SELECT * FROM tv;". Does this seem like a bug in my patch, or
>> should we add "ORDER BY" clauses to this test to make it more
>> deterministic?

> I added some ORDER BY clauses.  That is probably a good thing
> anyway for purposes of code coverage.  Does that fix it for you?

Uh, what?  Fooling around with the implementation of avg() should surely
not change any planning decisions.

                        regards, tom lane

Re: Improving avg performance for numeric

From
Tom Lane
Date:
[ please do not top-reply ]

Hadi Moshayedi <hadi@moshayedi.net> writes:
> On Tue, Mar 19, 2013 at 6:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh, what?  Fooling around with the implementation of avg() should surely
>> not change any planning decisions.

> I am not sure how this works, but I also changed numeric sum(), and the
> views in question had a numeric sum() column. Can that have any impact?

[ looks at patch... ]  Oh, I see what's affecting the plan: you changed
the aggtranstypes to internal for a bunch of aggregates.  That's not
very good, because right now the planner takes that to mean that the
aggregate could eat a lot of space.  We don't want that to happen for
these aggregates, I think.
        regards, tom lane
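
For context, the planner logic Tom refers to looks roughly like this in
clauses.c of that era (a paraphrase, not the exact committed code):

    if (!get_typbyval(aggtranstype))
    {
        /* pass-by-reference state: estimate from the type's average width */
        int32   avgwidth = get_typavgwidth(aggtranstype, aggtranstypmod);

        costs->transitionSpace += MAXALIGN(avgwidth) + 2 * sizeof(void *);
    }
    else if (aggtranstype == INTERNALOID)
    {
        /* "internal" state: hardwired guess of one allocation block, ~8K */
        costs->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
    }

This estimate feeds the hash-vs-sort aggregation choice, which is why
switching aggregates to an internal transition type changed the plan and
hence the row ordering.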



Re: Improving avg performance for numeric

From
Tom Lane
Date:
I wrote:
> [ looks at patch... ]  Oh, I see what's affecting the plan: you changed
> the aggtranstypes to internal for a bunch of aggregates.  That's not
> very good, because right now the planner takes that to mean that the
> aggregate could eat a lot of space.  We don't want that to happen for
> these aggregates, I think.

After thinking about that for awhile: if we pursue this type of
optimization, what would probably be appropriate is to add an aggregate
property (stored in pg_aggregate) that allows direct specification of
the size that the planner should assume for the aggregate's transition
value.  We were getting away with a hardwired assumption of 8K for
"internal" because the existing aggregates that used that transtype all
had similar properties, but it was always really a band-aid not a proper
solution.  A per-aggregate override could be useful in other cases too.

This was looking like 9.4 material already, but adding such a property
would definitely put it over the top of what we could think about
squeezing into 9.3, IMO.
        regards, tom lane
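
The property eventually surfaced in CREATE AGGREGATE as SSPACE; a
hypothetical use, with placeholder function names:

    CREATE AGGREGATE my_avg (numeric) (
        SFUNC     = my_avg_trans,   -- placeholder transition function
        STYPE     = internal,
        SSPACE    = 128,            -- planner's assumed transition size, bytes
        FINALFUNC = my_avg_final    -- placeholder final function
    );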



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello

2013/3/19 Tom Lane <tgl@sss.pgh.pa.us>:
> I wrote:
>> [ looks at patch... ]  Oh, I see what's affecting the plan: you changed
>> the aggtranstypes to internal for a bunch of aggregates.  That's not
>> very good, because right now the planner takes that to mean that the
>> aggregate could eat a lot of space.  We don't want that to happen for
>> these aggregates, I think.
>
> After thinking about that for awhile: if we pursue this type of
> optimization, what would probably be appropriate is to add an aggregate
> property (stored in pg_aggregate) that allows direct specification of
> the size that the planner should assume for the aggregate's transition
> value.  We were getting away with a hardwired assumption of 8K for
> "internal" because the existing aggregates that used that transtype all
> had similar properties, but it was always really a band-aid not a proper
> solution.  A per-aggregate override could be useful in other cases too.
>
> This was looking like 9.4 material already, but adding such a property
> would definitely put it over the top of what we could think about
> squeezing into 9.3, IMO.
>

Postgres is not an "in-memory" OLAP database, but a lot of companies use
it for OLAP queries because it is comfortable to use. This feature can
be very interesting for these users - and can bring a significant
speedup at a relatively low price.

Regards

Pavel

>                         regards, tom lane



Re: Improving avg performance for numeric

From
Hadi Moshayedi
Date:
Hi Tom,

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After thinking about that for awhile: if we pursue this type of
> optimization, what would probably be appropriate is to add an aggregate
> property (stored in pg_aggregate) that allows direct specification of
> the size that the planner should assume for the aggregate's transition
> value.  We were getting away with a hardwired assumption of 8K for
> "internal" because the existing aggregates that used that transtype all
> had similar properties, but it was always really a band-aid not a proper
> solution.  A per-aggregate override could be useful in other cases too.

Cool.

I created a patch which adds an aggregate property to pg_aggregate, so
the transition space can be overridden. This patch doesn't contain
the numeric optimizations. It uses "0" (meaning not-set) for all
existing aggregates.

I manually tested it a bit, by changing this value for aggregates and
observing the changes in plan. I also updated some docs and pg_dump.
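
For example, with this in place the new property can be inspected from
the catalog (aggtransspace is the column name used later in the thread;
the query itself is only illustrative):

    SELECT aggfnoid::regprocedure, aggtransspace
    FROM   pg_aggregate
    WHERE  aggtransspace <> 0;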

Does this look like something along the lines of what you meant?

Thanks,
  -- Hadi

Attachment

Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello

2013/3/20 Hadi Moshayedi <hadi@moshayedi.net>:
> Hi Tom,
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After thinking about that for awhile: if we pursue this type of
>> optimization, what would probably be appropriate is to add an aggregate
>> property (stored in pg_aggregate) that allows direct specification of
>> the size that the planner should assume for the aggregate's transition
>> value.  We were getting away with a hardwired assumption of 8K for
>> "internal" because the existing aggregates that used that transtype all
>> had similar properties, but it was always really a band-aid not a proper
>> solution.  A per-aggregate override could be useful in other cases too.
>
> Cool.
>
> I created a patch which adds an aggregate property to pg_aggregate, so
> the transition space can be overridden. This patch doesn't contain
> the numeric optimizations. It uses "0" (meaning not-set) for all
> existing aggregates.
>
> I manually tested it a bit, by changing this value for aggregates and
> observing the changes in plan. I also updated some docs and pg_dump.
>
> Does this look like something along the lines of what you meant?

please, can you subscribe your patch to the next commitfest?

I tested this patch, and it increases performance by about 20%, which
is interesting. Moreover, it allows more comfortable custom aggregates
for custom types, with better hash agg support.

Regards

Pavel

>
> Thanks,
>   -- Hadi



Re: Improving avg performance for numeric

From
Hadi Moshayedi
Date:
I am attaching the updated patch, which also fixes a bug which
caused one of the regression tests to fail.

I'll subscribe this patch to the commitfest in the next hour.

Can you please review the patch?

Thanks,
   -- Hadi

Attachment

Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
2013/7/8 Hadi Moshayedi <hadi@moshayedi.net>:
> I am attaching the updated patch, which also fixes a bug which
> caused one of the regression tests to fail.
>
> I'll subscribe this patch to the commitfest in the next hour.
>
> Can you please review the patch?

sure, :)

Pavel

>
> Thanks,
>    -- Hadi



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello

I am testing your code, and it increases the speed of sum by about 24%
over the original implementation.

But I am surprised by the AVG speed - it should be as fast as sum in
the new implementation, but it is 2x slower than sum - which signals
something strange: the wrong transition function was being used.

I am sending a fixed version.

postgres=# create table bubu(a int, b float, c numeric);
CREATE TABLE
postgres=# insert into bubu select i, i+1, i+1.122 from
generate_series(1,1000000) g(i);
INSERT 0 1000000

After the fix, the speed of sum and avg for numeric is similar

postgres=# select avg(c) from bubu;
         avg
---------------------
 500001.622000000000
(1 row)

Time: 228.483 ms
postgres=# select sum(c) from bubu;
       sum
------------------
 500001622000.000
(1 row)

Time: 222.791 ms

Regards

Pavel





2013/7/8 Hadi Moshayedi <hadi@moshayedi.net>:
> I am attaching the updated patch, which also fixes a bug which
> caused one of the regression tests to fail.
>
> I'll subscribe this patch to the commitfest in the next hour.
>
> Can you please review the patch?
>
> Thanks,
>    -- Hadi

Attachment

Re: Improving avg performance for numeric

From
Josh Berkus
Date:
On 07/07/2013 09:14 PM, Hadi Moshayedi wrote:
> I am attaching the updated patch, which also fixes a bug which
> caused one of the regression tests to fail.
> 
> I'll subscribe this patch to the commitfest in the next hour.
> 
> Can you please review the patch?

I'm afraid that, since this patch wasn't included anywhere near the
first week of the CommitFest, I can't possibly include it in the June
commitfest now.  Accordingly, I have moved it to the September
commitfest.  Hopefully someone can look at it before then.

Sorry for missing this in my "patch sweep" at the beginning of the CF.
Searching email for patches is, at best, inexact.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello

2013/7/8 Josh Berkus <josh@agliodbs.com>:
> On 07/07/2013 09:14 PM, Hadi Moshayedi wrote:
>> I am attaching the updated patch, which also fixes a bug which
>> caused one of the regression tests to fail.
>>
>> I'll subscribe this patch to the commitfest in the next hour.
>>
>> Can you please review the patch?
>
> I'm afraid that, since this patch wasn't included anywhere near the
> first week of the CommitFest, I can't possibly include it in the June
> commitfest now.  Accordingly, I have moved it to the September
> commitfest.  Hopefully someone can look at it before then.
>
> Sorry for missing this in my "patch sweep" at the beginning of the CF.
> Searching email for patches is, at best, inexact.
>

sure, it should be in the September CF. It is a relatively simple patch
without global impacts. But I like it; it increases the speed of
sum(numeric) by about 25% and avg(numeric) by about 50%

Regards

Pavel

> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com



Re: Improving avg performance for numeric

From
Josh Berkus
Date:
Pavel,

> 
> sure, it should be in the September CF. It is a relatively simple patch
> without global impacts. But I like it; it increases the speed of
> sum(numeric) by about 25% and avg(numeric) by about 50%

Do you think you could give this a review after CF1 ends, but before
September?  I hate to make Hadi wait just because I didn't see his patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
2013/7/8 Josh Berkus <josh@agliodbs.com>:
> Pavel,
>
>>
>> sure, it should be in the September CF. It is a relatively simple patch
>> without global impacts. But I like it; it increases the speed of
>> sum(numeric) by about 25% and avg(numeric) by about 50%
>
> Do you think you could give this a review after CF1 ends, but before
> September?  I hate to make Hadi wait just because I didn't see his patch.

yes, I can.

Regards

Pavel


>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello

here is a rebased patch. Hadi, please, can you verify this version?

Regards

Pavel

p.s. Performance tests

postgres=# create table foo(a int, b float, c double precision, d numeric, gr int);
CREATE TABLE
postgres=#
postgres=# insert into foo select 1, 2.0, 3.0, 3.14, random()*10000 from generate_series(1,10000000);

postgres=# \d foo
          Table "public.foo"
 Column |       Type       | Modifiers
--------+------------------+-----------
 a      | integer          |
 b      | double precision |
 c      | double precision |
 d      | numeric          |
 gr     | integer          |


set work_mem to '2MB';

postgres=# show debug_assertions;
 debug_assertions
------------------
 off
(1 row)



postgres=# explain (analyze, timing off) select sum(a) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=4) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=4) (actual rows=10000000 loops=1)
 Total runtime: 1210.321 ms (1195.117 ms) -- patched (original)
(3 rows)

Time: 1210.709 ms
postgres=# explain (analyze, timing off) select sum(a) from foo group by gr;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233431.71 rows=9984 width=8) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=8) (actual rows=10000000 loops=1)
 Total runtime: 2923.987 ms (2952.292 ms)
(3 rows)

Time: 2924.384 ms

postgres=# explain (analyze, timing off) select avg(a) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=4) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=4) (actual rows=10000000 loops=1)
 Total runtime: 1331.627 ms (1312.140 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(a) from foo group by gr;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233456.67 rows=9984 width=8) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=8) (actual rows=10000000 loops=1)
 Total runtime: 3139.296 ms (3079.479 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(b) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=8) (actual rows=10000000 loops=1)
 Total runtime: 1327.841 ms (1339.214 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(b) from foo group by gr;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233431.71 rows=9984 width=12) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=12) (actual rows=10000000 loops=1)
 Total runtime: 3047.893 ms (3095.591 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(b) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=8) (actual rows=10000000 loops=1)
 Total runtime: 1454.665 ms (1471.413 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(b) from foo group by gr;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233456.67 rows=9984 width=12) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=12) (actual rows=10000000 loops=1)
 Total runtime: 3282.838 ms (3187.157 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(c) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=8) (actual rows=10000000 loops=1)
 Total runtime: 1348.555 ms (1364.585 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(c) from foo group by gr;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233431.71 rows=9984 width=12) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=12) (actual rows=10000000 loops=1)
 Total runtime: 3028.663 ms (3069.710 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(c) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=8) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=8) (actual rows=10000000 loops=1)
 Total runtime: 1488.980 ms (1463.813 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(c) from foo group by gr;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233456.67 rows=9984 width=12) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=12) (actual rows=10000000 loops=1)
 Total runtime: 3252.972 ms (3149.986 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(d) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=7) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=7) (actual rows=10000000 loops=1)
 Total runtime: 2301.769 ms (2784.430 ms)
(3 rows)

postgres=# explain (analyze, timing off) select sum(d) from foo group by gr;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233456.67 rows=9984 width=11) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=11) (actual rows=10000000 loops=1)
 Total runtime: 4189.272 ms (4440.335 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(d) from foo;
                                            QUERY PLAN                                            
---------------------------------------------------------------------------------------------------
 Aggregate  (cost=208332.23..208332.24 rows=1 width=7) (actual rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=7) (actual rows=10000000 loops=1)
 Total runtime: 2308.493 ms (5195.970 ms)
(3 rows)

postgres=# explain (analyze, timing off) select avg(d) from foo group by gr;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 HashAggregate  (cost=233331.87..233456.67 rows=9984 width=11) (actual rows=10001 loops=1)
   ->  Seq Scan on foo  (cost=0.00..183332.58 rows=9999858 width=11) (actual rows=10000000 loops=1)
 Total runtime: 4179.978 ms (6828.398 ms)
(3 rows)


int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error .. cleaner code
numeric sum 6490 ms (7224 ms) --  10% faster
numeric avg 6487 ms (12023 ms) -- 46% faster





2013/8/22 Hadi Moshayedi <hadi@moshayedi.net>
Hello Pavel,

> > Do you think you could give this a review after CF1 ends, but before
> > September?  I hate to make Hadi wait just because I didn't see his patch.
>
> yes, I can.

When do you think you will have time to review this patch?

Thanks,
  -- Hadi

Attachment

Re: Improving avg performance for numeric

From
Hadi Moshayedi
Date:
Hello,

> int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error ..
> cleaner code
> numeric sum 6490 ms (7224 ms) --  10% faster
> numeric avg 6487 ms (12023 ms) -- 46% faster

I also got very similar results.

On the other hand, initially I was receiving sigsegvs whenever I
tried to run an aggregate function. gdb showed that this
was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
find the reason, I had to reboot my computer at some point, and after
the reboot the sigsegvs went away. I want to look into this and find
the reason; I think I have missed something here. Any thoughts about
why this would happen?

--Hadi



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:



2013/8/29 Hadi Moshayedi <hadi@moshayedi.net>
Hello,

> int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error ..
> cleaner code
> numeric sum 6490 ms (7224 ms) --  10% faster
> numeric avg 6487 ms (12023 ms) -- 46% faster

I also got very similar results.

On the other hand, initially I was receiving sigsegvs whenever I
tried to run an aggregate function. gdb showed that this
was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
find the reason, I had to reboot my computer at some point, and after
the reboot the sigsegvs went away. I want to look into this and find
the reason; I think I have missed something here. Any thoughts about
why this would happen?

I found a few bugs, which I fixed. There was an issue with empty sets. I didn't find any other issues.
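
The empty-set case is the one where the final function is called with a
NULL transition state, because no row ever reached the transition
function. A sketch of the required guard, following the same pattern as
the code quoted later in the thread:

    state = PG_ARGISNULL(0) ? NULL : (NumericAggState *) PG_GETARG_POINTER(0);

    if (state == NULL || state->N == 0)
        PG_RETURN_NULL();       /* e.g. avg() over zero rows is SQL NULL */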

Regards

Pavel
 

--Hadi

Re: Improving avg performance for numeric

From
Peter Eisentraut
Date:
On 7/8/13 10:05 AM, Pavel Stehule wrote:
> I am testing your code, and it increases the speed of sum by about 24%
> over the original implementation.

This patch needs to be rebased (and/or the later version registered in
the commit fest).




Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
2013/9/4 Peter Eisentraut <peter_e@gmx.net>:
> On 7/8/13 10:05 AM, Pavel Stehule wrote:
>> I am testing your code, and it increases the speed of sum by about 24%
>> over the original implementation.
>
> This patch needs to be rebased (and/or the later version registered in
> the commit fest).

I updated the commit fest info

Regards

Pavel

Re: Improving avg performance for numeric

From
Peter Eisentraut
Date:
On 9/4/13 2:26 PM, Pavel Stehule wrote:
> 
> 
> 
> 2013/9/4 Peter Eisentraut <peter_e@gmx.net>
> 
>     On 7/8/13 10:05 AM, Pavel Stehule wrote:
>     > I am testing your code, and it increases the speed of sum by about 24%
>     > over the original implementation.
> 
>     This patch needs to be rebased (and/or the later version registered in
>     the commit fest).
> 
> 
> I updated the commit fest info

The new patch also needs to be rebased.






Re: Improving avg performance for numeric

From
Pavel Stehule
Date:



2013/9/4 Peter Eisentraut <peter_e@gmx.net>
On 9/4/13 2:26 PM, Pavel Stehule wrote:
>
>
>
> 2013/9/4 Peter Eisentraut <peter_e@gmx.net>
>
>     On 7/8/13 10:05 AM, Pavel Stehule wrote:
>     > I am testing your code, and it increases the speed of sum by about 24%
>     > over the original implementation.
>
>     This patch needs to be rebased (and/or the later version registered in
>     the commit fest).
>
>
> I updated the commit fest info

The new patch also needs to be rebased.



rebased


Attachment

Re: Improving avg performance for numeric

From
Tomas Vondra
Date:
Hi,

I've reviewed the v6 of the "numeric optimize" patch
(http://www.postgresql.org/message-id/CAFj8pRDQhG7Pqmf8XqXY0PnHfakkPQLPHnoRLJ_=EKFSbOAWeA@mail.gmail.com),
as Pavel did some hacking on the patch and asked me to do the review.

The patch seems fine to me, the following comments are mostly nitpicking:

1) Applies cleanly to the HEAD (although only by "patch" and not "git
apply").

2) I think we should use "estimate" instead of "approximation" in the
docs, it seems more correct / natural to me (but maybe I'm wrong on this
one).

3) state_data_size does not make much sense to me - it should be
state_size. This probably comes from the state_data_type, but that's
('state' + 'data type') and by replacing the second part with 'size'
you'll get state_size.

4) currently the state size may be specified for all data types, no
matter if their size is fixed or variable. Wouldn't it be safer to allow
this option only for variable-length data types? Right now you can
specify stype=int and sspace=200 which does not make much sense to me.
We can always relax the restrictions if needed, the opposite is much
more difficult.

5) in numeric.c, there are no comments for
   - fields of the NumericAggState (some are obvious, some are not)
   - multiple functions are missing the initial comments (e.g.
     numeric_accum, do_numeric_accum)

6) I think the "first" variable in do_numeric_accum is rather useless,
it's trivial to remove it - just use the state->first instead and move
the assignment to the proper branch (ok, this is nitpicking).

7) I think the error message in makeNumericAggState should be something 
like "This must not be called in non-aggregate context!" or something 
like that.

8) The records in pg_aggregate.c are using either 0 (for fixed-length) 
or 128. This seems slightly excessive to me. What is the reasoning 
behind this? Is that because of the two NumericVar fields?

regards
Tomas



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:
Hello


2013/9/22 Tomas Vondra <tv@fuzzy.cz>
Hi,

I've reviewed the v6 of the "numeric optimize" patch
(http://www.postgresql.org/message-id/CAFj8pRDQhG7Pqmf8XqXY0PnHfakkPQLPHnoRLJ_=EKFSbOAWeA@mail.gmail.com),
as Pavel did some hacking on the patch and asked me to do the review.

The patch seems fine to me, the following comments are mostly nitpicking:

1) Applies cleanly to the HEAD (although only by "patch" and not "git
apply").

2) I think we should use "estimate" instead of "approximation" in the
docs, it seems more correct / natural to me (but maybe I'm wrong on this
one).

3) state_data_size does not make much sense to me - it should be
state_size. This probably comes from the state_data_type, but that's
('state' + 'data type') and by replacing the second part with 'size'
you'll get state_size.

This name is consistent with the previous field state_data_type - I expected it to mean 'state data' + 'type'. I am not a native speaker, so my position is not strong, but at this moment I think state_data_size makes sense. In this case both variants make sense - 'state data' + 'type' or 'state' + 'data type'.
 

4) currently the state size may be specified for all data types, no
matter if their size is fixed or variable. Wouldn't it be safer to allow
this option only for variable-length data types? Right now you can
specify stype=int and sspace=200 which does not make much sense to me.
We can always relax the restrictions if needed, the opposite is much
more difficult.


good idea
 
5) in numeric.c, there are no comments for
  - fields of the NumericAggState (some are obvious, some are not)
  - multiple functions are missing the initial comments (e.g.
numeric_accum, do_numeric_accum)

ok

6) I think the "first" variable in do_numeric_accum is rather useless,
it's trivial to remove it - just use the state->first instead and move
the assignment to the proper branch (ok, this is nitpicking).

ok
 

7) I think the error message in makeNumericAggState should be something like "This must not be called in non-aggregate context!" or something like that.

I used "a numeric aggregate function called in non-aggregate context" - it is similar to other related messages
 

8) The records in pg_aggregate.c are using either 0 (for fixed-length) or 128. This seems slightly excessive to me. What is the reasoning behind this? Is that because of the two NumericVar fields?

NumericAggState is 96 bytes - but you have to add space for the digits of the included numeric values (inside the NumericVars) -- so that is another 16 + 16 = 128. I am not able to specify exactly how many digits will be used - 16 bytes is just a good enough estimate - it is not used for memory allocation, it is used for some planner magic.
 

regards
Tomas



Attachment

Re: Improving avg performance for numeric

From
"Tomas Vondra"
Date:
On 23 September 2013, 18:18, Pavel Stehule wrote:
> Hello
>
>
> 2013/9/22 Tomas Vondra <tv@fuzzy.cz>
>
>> Hi,
>>
>> I've reviewed the v6 of the "numeric optimize" patch
>> (http://www.postgresql.org/message-id/CAFj8pRDQhG7Pqmf8XqXY0PnHfakkPQLPHnoRLJ_=EKFSbOAWeA@mail.gmail.com),
>> as Pavel did some hacking on the patch and asked me to do the review.
>>
>> The patch seems fine to me, the following comments are mostly
>> nitpicking:
>>
>> 1) Applies cleanly to the HEAD (although only by "patch" and not "git
>> apply").
>>
>> 2) I think we should use "estimate" instead of "approximation" in the
>> docs, it seems more correct / natural to me (but maybe I'm wrong on this
>> one).
>>
>> 3) state_data_size does not make much sense to me - it should be
>> state_size. This probably comes from the state_data_type, but that's
>> ('state' + 'data type') and by replacing the second part with 'size'
>> you'll get state_size.
>>
>
> This name is consistent with the previous field state_data_type - I
> expected it to mean 'state data' + 'type'. I am not a native speaker,
> so my position is not strong, but at this moment I think
> state_data_size makes sense. In this case both variants make sense -
> 'state data' + 'type' or 'state' + 'data type'.

OK, let's leave this up to a native speaker.

>>
>> 8) The records in pg_aggregate.c are using either 0 (for fixed-length)
>> or
>> 128. This seems slightly excessive to me. What is the reasoning behind
>> this? Is that because of the two NumericVar fields?
>>
>
> NumericAggState is 96 bytes - but you have to add space for the digits
> of the included numeric values (inside the NumericVars) -- so that is
> another 16 + 16 = 128. I am not able to specify exactly how many digits
> will be used - 16 bytes is just a good enough estimate - it is not used
> for memory allocation, it is used for some planner magic.

OK, makes sense.

I've made some basic tests on a 40M table with random numerics, for
example this query:
  select avg(val), sum(val), var_pop(val) from numeric_test ;

takes ~57 seconds on current master, but only ~26 seconds with the v7
patch. Granted, in practice the improvements won't be as good because of
I/O costs etc., but it's a nice gain.

Seems "ready for committer" to me. I'll wait a few days for others to
comment, and then I'll update the commitfest page.

regards
Tomas




Re: Improving avg performance for numeric

From
Robert Haas
Date:
On Mon, Sep 23, 2013 at 4:15 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Seems "ready for committer" to me. I'll wait a few days for others to
> comment, and then I'll update the commitfest page.

Some thoughts:

The documentation doesn't reflect the restriction to type internal.
On a related note, why restrict this to type internal?

Formatting fixes are needed:

+               if (aggtransspace > 0)
+               {
+                       costs->transitionSpace += aggtransspace;
+               }

Project style is not to use curly-braces for single statements.  Also,
the changes to numeric.c add blank lines before and after function
header comments, which is not the usual style.

!       if (state == NULL)
!               PG_RETURN_NULL();
!       else
!               PG_RETURN_POINTER(state);

I think this should just say PG_RETURN_POINTER(state).  PG_RETURN_NULL
is for returning an SQL NULL, not (void *) 0.  Is there some reason
why we need an SQL NULL here, rather than a NULL pointer?

On the whole this looks fairly solid on a first read-through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improving avg performance for numeric

From
Pavel Stehule
Date:



2013/9/24 Robert Haas <robertmhaas@gmail.com>
On Mon, Sep 23, 2013 at 4:15 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Seems "ready for committer" to me. I'll wait a few days for others to
> comment, and then I'll update the commitfest page.

Some thoughts:

The documentation doesn't reflect the restriction to type internal.
On a related note, why restrict this to type internal?


Now, for almost all types Postgres estimates the size of the state value well. Only arrays with unknown size can be different. When we enable this value for all types, users can specify bad values for scalar built-in types. The next argument is simple but weak - I don't see a good use case for customizing this value for types other than the internal type.

I have no strong position here - I prefer tying it to the internal type due to its slightly higher robustness.
 

Formatting fixes are needed:

+               if (aggtransspace > 0)
+               {
+                       costs->transitionSpace += aggtransspace;
+               }

 
Project style is not to use curly-braces for single statements.  Also,
the changes to numeric.c add blank lines before and after function
header comments, which is not the usual style.

!       if (state == NULL)
!               PG_RETURN_NULL();
!       else
!               PG_RETURN_POINTER(state);

I think this should just say PG_RETURN_POINTER(state).  PG_RETURN_NULL
is for returning an SQL NULL, not (void *) 0.  Is there some reason
why we need an SQL NULL here, rather than a NULL pointer?

fixed
 

On the whole this looks fairly solid on a first read-through.

Thank you

Pavel
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Improving avg performance for numeric

From
Tomas Vondra
Date:
On 24.9.2013 17:57, Pavel Stehule wrote:
> 
> 
> 
> 2013/9/24 Robert Haas <robertmhaas@gmail.com>
> 
> On Mon, Sep 23, 2013 at 4:15 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Seems "ready for committer" to me. I'll wait a few days for others
>> to comment, and then I'll update the commitfest page.
> 
> Some thoughts:
> 
> The documentation doesn't reflect the restriction to type internal. 
> On a related note, why restrict this to type internal?
> 
> 
> 
> Now, for almost all types Postgres estimates the size of the state
> value well. Only arrays with unknown size can be different. When we
> enable this value for all types, users can specify bad values for
> scalar built-in types. The next argument is simple but weak - I don't
> see a good use case for customizing this value for types other than
> the internal type.
> 
> I have no strong position here - I prefer tying it to the internal
> type due to its slightly higher robustness.

I share this opinion. I was not able to come up with a single use case
benefiting from allowing this for types other than internal. Seems like
a footgun to me, with no potential benefit.

So +1 to keeping the patch 'type internal only' from me.

With the formatting issues now fixed, I believe the patch is ready for
committer (and someone already switched it to that state).

Tomas



Re: Improving avg performance for numeric

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ numeric-optimize-v8.patch ]

I've committed this with some adjustments.  Aside from cosmetic
changes and documentation/comment improvements:

* I don't agree at all with the claim that aggtransspace is only useful
for INTERNAL transition data.  There are lots of cases with regular
SQL types where the planner doesn't have a good idea of how large the
transition value will be, and with the current code it tends to guess
on the small side (frequently 32 bytes :-().  It seems to me to be
useful to give users a knob to twiddle there.  Consider for example
an aggregate that uses an integer array as transition state; the author
of the aggregate might know that there are usually hundreds of entries
in the array, and wish to dial up aggtransspace to prevent the planner
from optimistically choosing hash aggregation.

As committed, I just took out the restriction in CREATE AGGREGATE
altogether; it will let you set SSPACE for any transition datatype.
The planner will ignore it for pass-by-value types, where the behavior
is known, but otherwise honor it.  It's possible that we should put in
a restriction to INTERNAL plus pass-by-reference types, or even INTERNAL
plus pass-by-reference variable-length types, but I can't get excited
about that.  The error message would be too confusing I think.

* There was a stray "else" added to clauses.c that disabled space
accounting for polymorphic aggregates.

* I simplified the summing code in do_numeric_accum; the copying and
reallocating it was doing was not only unnecessary but contained
unpleasant assumptions about what you can do with a NumericVar.  This
probably makes the committed patch a bit faster than what was submitted,
but I didn't try to benchmark the change.

* I made sure all the functions accessed their input state pointer with

    state = PG_ARGISNULL(0) ? NULL : (NumericAggState *) PG_GETARG_POINTER(0);

There were places that omitted the ARGISNULL test, and thus only happened
to work if the uninitialized Datum value they got was zero.  While that
might chance to be true in the current state of the code, it's not an
assumption you're supposed to make.

* numeric sum/avg failed to check for NaN inputs.

* I fixed incorrect tests in the code added to pg_dump.
        regards, tom lane