Re: [PATCHES] Avg performance for int8/numeric - Mailing list pgsql-hackers

From Mark Kirkwood
Subject Re: [PATCHES] Avg performance for int8/numeric
Date
Msg-id 456A1076.6080400@paradise.net.nz
In response to Re: [PATCHES] Avg performance for int8/numeric  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCHES] Avg performance for int8/numeric  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> On Sat, 2006-11-25 at 18:57 +1300, Mark Kirkwood wrote:
>>> Also Neil suggested investigating using a single composite type
>>> {int8, numeric} for the {N,sum(X)} transition values. This could well
>>> be a faster way to do this (not sure how to make it work yet... but it
>>> sounds promising...).
> 
>> If that is true it implies that any fixed length array is more expensive
>> than using a composite type.
> 
> Not sure how you derived that conclusion from this statement, but it
> doesn't appear to me to follow at all.  The reason for Neil's suggestion
> was to avoid using numeric arithmetic to run a simple counter, and the
> reason that this array stuff is expensive is that the array *components*
> are variable-length, which is something that no amount of array
> redesigning will eliminate.
> 

Here is what I think the major contributors to the time spent in avg are:

1/ maintaining sum of squares in the transition array ~ 33%
2/ calling numeric_inc on a numeric counter ~ 10%
3/ deconstruct/construct array of 2 numerics ~ 16%

I derived these by constructing a (deliberately inefficient) version of 
sum that used an array of numerics and calculated extra stuff in its 
transition array, and then started removing code a bit at a time to see 
what happened (I'm sure there are smarter ways... but this worked ok...).
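
To make the approach concrete, here is a rough sketch of the same 
"remove one cost at a time and re-time it" idea. It is Python rather than 
the backend C code, Decimal is only a stand-in for numeric, and the 
function names and input data are made up for illustration, so the 
timings it gives say nothing about the percentages above.

```python
import timeit
from decimal import Decimal

VALUES = [Decimal(i) for i in range(1, 10001)]  # made-up input data


def accum_full(values):
    """Deliberately wasteful: numeric counter, sum and sum of squares in a rebuilt array."""
    state = (Decimal(0), Decimal(0), Decimal(0))
    for x in values:
        n, sx, sx2 = state                    # "deconstruct" the array
        state = (n + 1, sx + x, sx2 + x * x)  # "construct" a new one
    return state[1] / state[0]


def accum_no_squares(values):
    """Same as accum_full with the sum of squares removed."""
    state = (Decimal(0), Decimal(0))
    for x in values:
        n, sx = state
        state = (n + 1, sx + x)
    return state[1] / state[0]


def accum_int_counter(values):
    """Sum of squares removed, counter held as a plain integer, no array rebuilt."""
    n, sx = 0, Decimal(0)
    for x in values:
        n += 1
        sx += x
    return sx / n


def time_variants(repeat=5):
    """Return the best-of-N wall time in seconds for each variant."""
    variants = (accum_full, accum_no_squares, accum_int_counter)
    return {f.__name__: min(timeit.repeat(lambda: f(VALUES), number=1, repeat=repeat))
            for f in variants}
```

Calling time_variants() and comparing neighbouring entries gives a crude 
estimate of what each removed piece was costing, provided all variants 
still return the same average.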

The current patch addresses 1/, and doing a composite type of {int8, numeric} 
would let us use an int8 counter instead of a numeric one, which would 
pretty much sort out 2/.
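
As a rough illustration of the {N, sum(X)} idea, here is a minimal 
sketch. It is Python, not the PostgreSQL C code: Decimal stands in for 
numeric, a plain int for the int8 counter, and the names AvgState, 
avg_accum and avg_final are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class AvgState:
    """Hypothetical transition state {N, sum(X)} for avg."""
    n: int = 0                    # stand-in for an int8 counter
    total: Decimal = Decimal(0)   # stand-in for a numeric running sum


def avg_accum(state: AvgState, value: Optional[Decimal]) -> AvgState:
    """Transition step: skip NULLs, otherwise bump the counter and add to the sum."""
    if value is None:
        return state
    state.n += 1           # cheap integer increment, no numeric arithmetic
    state.total += value   # the only numeric operation per row
    return state


def avg_final(state: AvgState) -> Optional[Decimal]:
    """Final step: NULL for no input rows, otherwise sum / N."""
    if state.n == 0:
        return None
    return state.total / state.n
```

The point of the split is that only the sum needs variable-length 
arithmetic; the counter never does.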

The array cost is more tricky - as Tom mentioned, the issue is related to 
the variable-length nature of the array components, so just changing to 
a composite type may not in itself save any of the (so-called) 'array 
cost'. Having said that - the profiles suggest that we are doing a whole 
lot more alloc'ing (i.e. copying? detoasting?) of memory for numerics 
than we perhaps need... I'm not sure how deeply buried the decision 
about alloc'ing is, so doing anything about this may be hard.

It looks to me like trying out a composite type is the next obvious 
step, and then (once I've figured out how to do that) we can check its 
performance again!

Cheers

Mark

