Re: Built-in binning functions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Built-in binning functions |
Date | |
Msg-id | 17571.1409517238@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Built-in binning functions (Petr Jelinek <petr@2ndquadrant.com>) |
Responses |
Re: Built-in binning functions
|
List | pgsql-hackers |
Petr Jelinek <petr@2ndquadrant.com> writes: > On 30/08/14 19:24, Tom Lane wrote: >> I wasn't terribly happy about that either. I still think we should >> reduce this to a single polymorphic function, as in the attached. > I did try to write generic function very similar to what you wrote but > discarded it because of the performance reasons. If we really want to > support any data type I am all for having generic function but I still > would like to have one optimized for float8 because that seems to be the > most used use-case (at least that one is the reason why I even wrote > this) for performance reasons. Well, part of the reason why your v3 float8 function looks so fast is that it's cheating: it will not produce the right answers for comparisons involving NaNs. I'm not sure how good it would look once you'd added some isnan() tests to make the behavior actually equivalent to float8_cmp_internal(). However, assuming it still seems worthwhile once that's accounted for, I don't have a strong objection to having an additional code path for float8. There are two ways we could do that: 1. Add a runtime test in the polymorphic function, eg if (element_type == FLOAT8OID) result = width_bucket_float8(...);else if (typentry->typlen > 0) result = width_bucket_fixed(...);else result = width_bucket_variable(...); 2. Define a SQL function width_bucket(float8, float8[]) alongside the polymorphic one. While #2 might be marginally faster at runtime, the main difference between these is what the parser would choose to do with ambiguous cases. I experimented a bit and it seemed that the parser would prefer the float8 function in any situation where the inputs were of numeric types, probably because float8 is the preferred type in the numeric category. I'm not real sure that would be a good thing: in particular it would cast integer and numeric inputs to float8, which risks precision loss, and there doesn't seem to be any way to get the parser to make the other choice if the inputs are of numeric-category types. As against that objection, it would make cross-type numeric cases easier to use, for example "width_bucket(1, array[2.4])" would be accepted. If we just offer the anyelement/anyarray version then the parser would make you cast "1" to numeric before it would consider the function to match. On balance the runtime-test approach looks like a better idea, in that it doesn't risk any unexpected semantic behaviors. > The difference between my generic and Tom's generic is because Tom's is > slowed down by the deconstruct_array. Meh. It looked to me like your version would have O(N^2) performance problems from computing array offsets repeatedly, depending on exactly which array element it ended up on. It would avoid repeat calculations only if it always moved right. regards, tom lane
pgsql-hackers by date: