Re: Built-in binning functions - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: Built-in binning functions
Date
Msg-id 5403A8DF.6030008@2ndquadrant.com
Whole thread Raw
In response to Re: Built-in binning functions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Built-in binning functions
List pgsql-hackers
On 31/08/14 22:33, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 30/08/14 19:24, Tom Lane wrote:
>>> I wasn't terribly happy about that either.  I still think we should
>>> reduce this to a single polymorphic function, as in the attached.
>
>> I did try to write generic function very similar to what you wrote but
>> discarded it because of the performance reasons. If we really want to
>> support any data type I am all for having generic function but I still
>> would like to have one optimized for float8 because that seems to be the
>> most used use-case (at least that one is the reason why I even wrote
>> this) for performance reasons.
>
> Well, part of the reason why your v3 float8 function looks so fast is that
> it's cheating: it will not produce the right answers for comparisons
> involving NaNs.  I'm not sure how good it would look once you'd added
> some isnan() tests to make the behavior actually equivalent to
> float8_cmp_internal().
>

True, it increases the time of the test to around 285ms, still almost 
30% speed difference.

> However, assuming it still seems worthwhile once that's accounted for,
> I don't have a strong objection to having an additional code path for
> float8.  There are two ways we could do that:
>
> 1. Add a runtime test in the polymorphic function, eg
>
>     if (element_type == FLOAT8OID)
>         result = width_bucket_float8(...);
>     else if (typentry->typlen > 0)
>         result = width_bucket_fixed(...);
>     else
>         result = width_bucket_variable(...);
>

Yeah I think I prefer this one too, will see how much performance it eats.

>
>> The difference between my generic and Tom's generic is because Tom's is
>> slowed down by the deconstruct_array.
>
> Meh.  It looked to me like your version would have O(N^2) performance
> problems from computing array offsets repeatedly, depending on exactly
> which array element it ended up on.  It would avoid repeat calculations
> only if it always moved right.
>

I actually think that worst case (when you go always left) for my 
version is O(N) since you only need to seek for the half of previous 
interval (it's doing binary search after all) and you do O(N) in the 
deconstruct_array. It would be very different if we could cache the 
array somehow (ie, if this was an aggregate) then it would obviously 
make a lot of sense to use deconstruct_array and in that case it would 
make even sense to resort probably, but sadly we can't cache afaik.

Also, I made more tests with various array sizes (3-10000) and 
distributions and mine version was never slower.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Built-in binning functions
Next
From: David G Johnston
Date:
Subject: Re: Built-in binning functions