Re: Built-in binning functions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Built-in binning functions
Date
Msg-id 14915.1404778444@sss.pgh.pa.us
Whole thread Raw
In response to Built-in binning functions  (Petr Jelinek <petr@2ndquadrant.com>)
Responses Re: Built-in binning functions
Re: Built-in binning functions
List pgsql-hackers
Petr Jelinek <petr@2ndquadrant.com> writes:
> here is a patch implementing varwidth_bucket (naming is up for 
> discussion) function which does binning with variable bucket width.

I didn't see any discussion of the naming question in this thread.
I'd like to propose that it should be just "width_bucket()"; we can
easily determine which function is meant, considering that the
SQL-spec variants don't take arrays and don't even have the same
number of actual arguments.

Also, the set of functions provided still needs more thought, because the
resolution of overloaded functions doesn't really work as nicely as all
that.  I illustrate:

regression=# create function myf(int8) returns int as 'select 0' language sql;
CREATE FUNCTION
regression=# create function myf(float8) returns int as 'select 1' language sql; 
CREATE FUNCTION
regression=# create function myf(anyelement) returns int as 'select 2' language sql;
CREATE FUNCTION
regression=# select myf(1);myf 
-----  1
(1 row)

So given plain integer arguments, we'll invoke the float8 version not the
int8 version, which may be undesirable.  The same for int2 arguments.
We could fix that by adding bespoke int4 and maybe int2 variants, but
TBH, I'm not sure that the specific-type functions are worth the trouble.
Maybe we should just have one generic function, and take the trouble to
optimize its array-subscripting calculations for either fixed-length or
variable-length array elements?  Have you got performance measurements
demonstrating that multiple implementations really buy enough to justify
the extra code?

Also, I'm not convinced by this business of throwing an error for a
NULL array element.  Per spec, null arguments to width_bucket()
produce a null result, not an error --- shouldn't this flavor act
similarly?  In any case I think the test needs to use
array_contains_nulls() not just ARR_HASNULL.

Lastly, the spec defines behaviors for width_bucket that allow either
ascending or descending buckets.  We could possibly do something similar
in these functions by initially comparing the two endpoint elements to see
which one is larger, and then reversing the sense of all the comparisons
if the first one is larger.  I'm not sure if that's worth the trouble or
not, but if the SQL committee thought descending bucket numbering was
worthwhile, maybe it's worthwhile here too.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [PATCH] Improve bytea error messages
Next
From: Fujii Masao
Date:
Subject: Re: tab completion for setting search_path