Re: Built-in binning functions - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Built-in binning functions
Date
Msg-id CAFj8pRBOuoD3ntt-M_sFVwTsRes3FDwTiDiTnu8-MYBBGX32tg@mail.gmail.com
Whole thread Raw
In response to Re: Built-in binning functions  (Petr Jelinek <petr@2ndquadrant.com>)
Responses Re: Built-in binning functions
List pgsql-hackers
Hi

I am looking to your last patch and I have a few questions, notes

1. I am thinking so reduction to only numeric types is not necessary - although we can live without it - but there are lot of non numeric categories: chars, date, ...

2. Still I strongly afraid about used searching method - there is not possible to check a  validity of input. Did you check how much linear searching is slower - you spoke, so you have a real data and real use case? Used methods can returns wrong result without any warning, what is acceptable for extensions, but I am not sure, for core feature.

3. I am thinking about name - I don't think so varwidth_bucket is correct -- in relation to name "width_bucket" ... what about "range_bucket" or "scope_bucket" ?

last patch is very simple, there are no new compilation or regress tests issues.

Regards

Pavel





2014-08-25 16:15 GMT+02:00 Petr Jelinek <petr@2ndquadrant.com>:
Hi,

I finally had some time to get back to this.

I attached version3 of the patch which "fixes" Tom's complaint about int8 version by removing the int8 version as it does not seem necessary (the float8 can handle integers just fine).

This patch now basically has just one optimized function for float8 and one for numeric datatypes, just like width_bucket.

On 08/07/14 02:14, Tom Lane wrote:
Also, I'm not convinced by this business of throwing an error for a
NULL array element.  Per spec, null arguments to width_bucket()
produce a null result, not an error --- shouldn't this flavor act
similarly?  In any case I think the test needs to use
array_contains_nulls() not just ARR_HASNULL.

I really think there should be difference between NULL array and NULL inside array and that NULL inside the array is wrong input. I changed the check to array_contains_nulls() though.

On 08/07/14 02:14, Tom Lane wrote:
Lastly, the spec defines behaviors for width_bucket that allow either
ascending or descending buckets.  We could possibly do something
similar

I am not sure it's worth it here as we require input to be sorted anyway. It might be worthwhile if we decided to do this as an aggregate (since there input would not be required to be presorted) instead of function but I am not sure if that's desirable either as that would limit usability to only the single main use-case (group by and count()).



On 20/07/14 11:01, Simon Riggs wrote:
On 16 July 2014 20:35, Pavel Stehule <pavel.stehule@gmail.com> wrote:

On 08/07/14 02:14, Tom Lane wrote:

I didn't see any discussion of the naming question in this thread.
I'd like to propose that it should be just "width_bucket()"; we can
easily determine which function is meant, considering that the
SQL-spec variants don't take arrays and don't even have the same
number of actual arguments.

I did mention in submission that the names are up for discussion, I am all
for naming it just width_bucket.

I had this idea too - but I am not sure if it is good idea. A distance
between ANSI SQL with_bucket and our enhancing is larger than in our
implementation of "median" for example.

I can live with both names, but current name I prefer.


So I suggest that we use the more generic function name bin(), with a
second choice of discretize()  (though that seems annoyingly easy to
spell incorrectly)


I really don't think bin() is good choice here, bucket has same meaning in this context and bin is often used as shorthand for binary which would be confusing here.

Anyway I currently left the name as it is, I would not be against using width_bucket() or even just bucket(), not sure about the discretize() option.


--
 Petr Jelinek                  http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: On partitioning
Next
From: Tomas Vondra
Date:
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg