Home > mailing lists

Re: Built-in binning functions - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Built-in binning functions
Date	August 31, 2014 23:34:10
Msg-id	17571.1409517238@sss.pgh.pa.us Whole thread Raw
In response to	Re: Built-in binning functions (Petr Jelinek <petr@2ndquadrant.com>)
Responses	Re: Built-in binning functions
List	pgsql-hackers

Tree view

Petr Jelinek <petr@2ndquadrant.com> writes:
> On 30/08/14 19:24, Tom Lane wrote:
>> I wasn't terribly happy about that either.  I still think we should
>> reduce this to a single polymorphic function, as in the attached.

> I did try to write generic function very similar to what you wrote but 
> discarded it because of the performance reasons. If we really want to 
> support any data type I am all for having generic function but I still 
> would like to have one optimized for float8 because that seems to be the 
> most used use-case (at least that one is the reason why I even wrote 
> this) for performance reasons.

Well, part of the reason why your v3 float8 function looks so fast is that
it's cheating: it will not produce the right answers for comparisons
involving NaNs.  I'm not sure how good it would look once you'd added
some isnan() tests to make the behavior actually equivalent to
float8_cmp_internal().

However, assuming it still seems worthwhile once that's accounted for,
I don't have a strong objection to having an additional code path for
float8.  There are two ways we could do that:

1. Add a runtime test in the polymorphic function, eg
if (element_type == FLOAT8OID)    result = width_bucket_float8(...);else if (typentry->typlen > 0)    result =
width_bucket_fixed(...);else   result = width_bucket_variable(...);

2. Define a SQL function width_bucket(float8, float8[]) alongside
the polymorphic one.

While #2 might be marginally faster at runtime, the main difference
between these is what the parser would choose to do with ambiguous cases.

I experimented a bit and it seemed that the parser would prefer the
float8 function in any situation where the inputs were of numeric types,
probably because float8 is the preferred type in the numeric category.
I'm not real sure that would be a good thing: in particular it would
cast integer and numeric inputs to float8, which risks precision loss,
and there doesn't seem to be any way to get the parser to make the
other choice if the inputs are of numeric-category types.

As against that objection, it would make cross-type numeric cases easier
to use, for example "width_bucket(1, array[2.4])" would be accepted.
If we just offer the anyelement/anyarray version then the parser would
make you cast "1" to numeric before it would consider the function to
match.

On balance the runtime-test approach looks like a better idea, in that
it doesn't risk any unexpected semantic behaviors.

> The difference between my generic and Tom's generic is because Tom's is 
> slowed down by the deconstruct_array.

Meh.  It looked to me like your version would have O(N^2) performance
problems from computing array offsets repeatedly, depending on exactly
which array element it ended up on.  It would avoid repeat calculations
only if it always moved right.
        regards, tom lane

pgsql-hackers by date:

From: Tom Lane
Date: 31 August 2014, 23:03:42
Subject: Re: On partitioning

From: Hannu Krosing
Date: 31 August 2014, 23:38:25
Subject: Re: On partitioning

Re: Built-in binning functions - Mailing list pgsql-hackers

Previous

Next