Thread: onlyvalue aggregate (was: First Aggregate Funtion?)

onlyvalue aggregate (was: First Aggregate Funtion?)

From

Marko Tiikkaja

Date:

28 October 2015, 16:50:26

Hi,

Here's a patch for the aggregate function outlined by Corey Huinker in
CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com .  I
called it "onlyvalue", which is a horrible name, but I have nothing
better to offer.  (Corey called it "only", but that doesn't really work
since ONLY is a fully reserved keyword.)

I'll add this to September's commit fest, but if you want to bash me or
the patch in the meanwhile, go ahead.


.m

Attachment

onlyvalue_v1.patch

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Pavel Stehule

Date:

28 October 2015, 16:53:53

2015-10-28 17:50 GMT+01:00 Marko Tiikkaja <marko@joh.to>:

Hi,

Here's a patch for the aggregate function outlined by Corey Huinker in CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com . I called it "onlyvalue", which is a horrible name, but I have nothing better to offer. (Corey called it "only", but that doesn't really work since ONLY is a fully reserved keyword.)

I'll add this to September's commit fest, but if you want to bash me or the patch in the meanwhile, go ahead.

what is use case for this function and why it should be in core?

Regards

Pavel

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Marko Tiikkaja

Date:

28 October 2015, 17:04:04

On 10/28/15 5:53 PM, Pavel Stehule wrote:
> what is use case for this function and why it should be in core?

Corey had one example in his email, but I can offer another one which 
came up this week at $work.  The query looked something like this:

SELECT a, sum(amount), onlyvalue(rolling_count)
FROM
(    SELECT a, amount, count(*) OVER (ORDER BY a) AS rolling_count    FROM tbl
) ss
GROUP BY a;

We know that all the values for the column are going to be the same 
value for every "a", so we could use min() or max().  But the advantage 
of "onlyvalue" is that it actually checks that, so if someone went and 
changed the window frame to do something slightly different, the query 
would blow up instead of silently returning the (now likely incorrect) 
minimum or maximum value.  It's also self-documenting for the reader of 
such queries.

In my experience this problem comes up often enough that it would be 
make sense to have this aggregate in core.


.m

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Pavel Stehule

Date:

28 October 2015, 17:22:25

2015-10-28 18:03 GMT+01:00 Marko Tiikkaja <marko@joh.to>:

On 10/28/15 5:53 PM, Pavel Stehule wrote:
what is use case for this function and why it should be in core?

Corey had one example in his email, but I can offer another one which came up this week at $work. The query looked something like this:

SELECT a, sum(amount), onlyvalue(rolling_count)
FROM
(
SELECT a, amount, count(*) OVER (ORDER BY a) AS rolling_count
FROM tbl
) ss
GROUP BY a;

We know that all the values for the column are going to be the same value for every "a", so we could use min() or max(). But the advantage of "onlyvalue" is that it actually checks that, so if someone went and changed the window frame to do something slightly different, the query would blow up instead of silently returning the (now likely incorrect) minimum or maximum value. It's also self-documenting for the reader of such queries.

In my experience this problem comes up often enough that it would be make sense to have this aggregate in core.

This function is pretty inconsistent with any other builtin aggregate function, so it is one argument against to push it to core. Next, this function can be pretty simply implemented in PLpgSQL. And my last argument - I don't remember too often request for this functionality. It looks like module for PGXN much more, than PG core functionality.

Regards

Pavel

.m

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Tom Lane

Date:

28 October 2015, 17:38:40

Marko Tiikkaja <marko@joh.to> writes:
> Here's a patch for the aggregate function outlined by Corey Huinker in 
> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com .  I 
> called it "onlyvalue", which is a horrible name, but I have nothing 
> better to offer.  (Corey called it "only", but that doesn't really work 
> since ONLY is a fully reserved keyword.)

On the name front, maybe think "single" rather than "only"?  This might
lead to "single()" or "single_value()", or "singleton()" if you want to
sound highbrow.

On the semantics front, I'm not sure that I like excluding nulls from the
input domain.  I'd rather that it acted like IS NOT DISTINCT, ie, nulls
are fine as long as all the input values are nulls.  The implementation
would need some work for that.
        regards, tom lane

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Pavel Stehule

Date:

28 October 2015, 18:00:37

2015-10-28 18:38 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Marko Tiikkaja <marko@joh.to> writes:
> Here's a patch for the aggregate function outlined by Corey Huinker in
> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com . I
> called it "onlyvalue", which is a horrible name, but I have nothing
> better to offer. (Corey called it "only", but that doesn't really work
> since ONLY is a fully reserved keyword.)

On the name front, maybe think "single" rather than "only"? This might
lead to "single()" or "single_value()", or "singleton()" if you want to
sound highbrow.

this function should to have some distinguish name than other aggregates because important work of this func is not some calculation but some constraint check.

On the semantics front, I'm not sure that I like excluding nulls from the
input domain. I'd rather that it acted like IS NOT DISTINCT, ie, nulls
are fine as long as all the input values are nulls. The implementation
would need some work for that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

"David G. Johnston"

Date:

28 October 2015, 18:45:02

On Wed, Oct 28, 2015 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marko Tiikkaja <marko@joh.to> writes:
> Here's a patch for the aggregate function outlined by Corey Huinker in
> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com . I
> called it "onlyvalue", which is a horrible name, but I have nothing
> better to offer. (Corey called it "only", but that doesn't really work
> since ONLY is a fully reserved keyword.)

On the name front, maybe think "single" rather than "only"? This might
lead to "single()" or "single_value()", or "singleton()" if you want to
sound highbrow.

On the semantics front, I'm not sure that I like excluding nulls from the
input domain. I'd rather that it acted like IS NOT DISTINCT, ie, nulls
are fine as long as all the input values are nulls. The implementation
would need some work for that.

homogeneous() ?

It is basically an assertion function since in most cases where you would use this you needn't use a function at all but instead simply place the column in the GROUP BY clause.

I have at various times desired having various assertion functions that return the input value if the assertion is met but causes a query error if it is not. That said functions can be both scalar and aggregate oriented makes sense. Adding just this one function in seems like it adds a partial feature to -core. I'd rather it bake more in PGXN before considering putting it into core. There doesn't seem to be any hard need or benefit to doing so at this time.

I would probably stick to the concept of assertion and call it something like

"assert_nongrouping()"

to denote that the input does not cause multiple groups to be created due to their being multiple values.

David J.

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Robert Haas

Date:

30 October 2015, 09:33:47

On Wed, Oct 28, 2015 at 5:50 PM, Marko Tiikkaja <marko@joh.to> wrote:
> Here's a patch for the aggregate function outlined by Corey Huinker in
> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com .  I
> called it "onlyvalue", which is a horrible name, but I have nothing better
> to offer.  (Corey called it "only", but that doesn't really work since ONLY
> is a fully reserved keyword.)

I've written an aggregate that does something like this a few times.
I think one time I called it "the", which is probably too clever, but
then you could query for the(project_manager) and similar.  I've
usually written it to not error-check and just return the first
non-NULL value it runs across, which suggests a name like any_old() or
whatever().

I actually think by comparison with those ideas, onlyvalue() - or
maybe only_value() - is not bad.

> I'll add this to September's commit fest,

November, probably.

> but if you want to bash me or the
> patch in the meanwhile, go ahead.

What if we want to say nice things?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Dean Rasheed

Date:

02 November 2015, 08:32:34

On 28 October 2015 at 16:50, Marko Tiikkaja <marko@joh.to> wrote:
> Hi,
>
> Here's a patch for the aggregate function outlined by Corey Huinker in
> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com .

+1. I've wanted something like this a few times. Of the names
suggested so far, I think I prefer "single_value", and yes I think it
should work with NULLs.

Regards,
Dean

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Simon Riggs

Date:

02 November 2015, 09:10:20

On 2 November 2015 at 09:32, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

On 28 October 2015 at 16:50, Marko Tiikkaja <marko@joh.to> wrote:
> Hi,
>
> Here's a patch for the aggregate function outlined by Corey Huinker in
> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com .

+1. I've wanted something like this a few times. Of the names
suggested so far, I think I prefer "single_value", and yes I think it
should work with NULLs.

I think we should avoid using ONLY or VALUE within the names since those already have other meanings in Postgres.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Marko Tiikkaja

Date:

02 November 2015, 11:00:06

On 11/2/15 9:32 AM, Dean Rasheed wrote:
> On 28 October 2015 at 16:50, Marko Tiikkaja <marko@joh.to> wrote:
>> Here's a patch for the aggregate function outlined by Corey Huinker in
>> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com .
>
> +1. I've wanted something like this a few times. Of the names
> suggested so far, I think I prefer "single_value", and yes I think it
> should work with NULLs.

This was actually a last-minute design change I made before submitting 
the patch.  The two times I've previously written this aggregate both 
accepted NULLs and only enforced that there must not be more than one 
non-NULL value, but that's only because I was thinking about the "poor 
man's FILTER" case, which is obsolete since version 9.4.  The reason I 
changed in this version is that accepting NULLs can also hide bugs, and 
it's (usually) easy to filter them out with FILTER.

Did you have some specific use case in mind where accepting NULLs would 
be beneficial?

.m

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Dean Rasheed

Date:

02 November 2015, 11:38:58

On 2 November 2015 at 09:10, Simon Riggs <simon@2ndquadrant.com> wrote:
> I think we should avoid using ONLY or VALUE within the names since those
> already have other meanings in Postgres.
>

We already have window functions called first_value, last_value and
nth_value, so I think use of "value" in the name is OK (and
consistent). Actually thinking some more, I think the name
"unique_value" is better because it's more suggestive of the fact that
uniqueness is being enforced.

Regards,
Dean

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Dean Rasheed

Date:

02 November 2015, 11:40:44

On 2 November 2015 at 10:59, Marko Tiikkaja <marko@joh.to> wrote:
> On 11/2/15 9:32 AM, Dean Rasheed wrote:
>>
>> On 28 October 2015 at 16:50, Marko Tiikkaja <marko@joh.to> wrote:
>>>
>>> Here's a patch for the aggregate function outlined by Corey Huinker in
>>> CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com .
>>
>>
>> +1. I've wanted something like this a few times. Of the names
>> suggested so far, I think I prefer "single_value", and yes I think it
>> should work with NULLs.
>
>
> This was actually a last-minute design change I made before submitting the
> patch.  The two times I've previously written this aggregate both accepted
> NULLs and only enforced that there must not be more than one non-NULL value,
> but that's only because I was thinking about the "poor man's FILTER" case,
> which is obsolete since version 9.4.  The reason I changed in this version
> is that accepting NULLs can also hide bugs, and it's (usually) easy to
> filter them out with FILTER.
>
> Did you have some specific use case in mind where accepting NULLs would be
> beneficial?
>

I'm not sure what you mean when you say accepting NULLs can hide bugs.
I think that if the input values to the aggregate were
1,1,1,NULL,1,1,1 then it should raise an error. ITSM that that is more
likely to reveal problems with your underlying data or the query. If
you want to ignore NULLs, you can always add a FILTER(WHERE val IS NOT
NULL) clause.

Regards,
Dean

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Marko Tiikkaja

Date:

02 November 2015, 11:52:20

On 11/2/15 12:40 PM, Dean Rasheed wrote:
> I'm not sure what you mean when you say accepting NULLs can hide bugs.
> I think that if the input values to the aggregate were
> 1,1,1,NULL,1,1,1 then it should raise an error. ITSM that that is more
> likely to reveal problems with your underlying data or the query. If
> you want to ignore NULLs, you can always add a FILTER(WHERE val IS NOT
> NULL) clause.

Ah, I see.  So you're arguing that the aggregate should accept NULLs as 
input, but consider them distinct from any non-NULL values.  I thought 
you meant accepting NULLs and *not* considering them distinct, which 
could easily hide problems.

In that case, I don't oppose to changing the behavior.  I'll make the 
necessary changes.

.m

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Marko Tiikkaja

Date:

21 November 2015, 03:54:26

Hi Dean,

Here's v2 of the patch.  How's this look?


.m

Attachment

single_value_v2.patch

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Corey Huinker

Date:

22 November 2015, 01:17:23

<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Nov 20, 2015 at 10:54 PM, Marko Tiikkaja <span
dir="ltr"><<ahref="mailto:marko@joh.to" target="_blank">marko@joh.to</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Dean,<br /><br /> Here's v2
ofthe patch.  How's this look?<span class="HOEnZb"><font color="#888888"><br /><br /><br /> .m<br /></font></span><br
/><br/> --<br /> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" rel="noreferrer"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div><br /></div><div
class="gmail_extra">SorryI didn't see this thread sooner.</div><div class="gmail_extra"><br /></div><div
class="gmail_extra">Ihaven't applied the patch, but the test cases illustrate that the function does everything I want,
andhas inspired me to think a bit about some other assert-ish functions. Thanks!</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra"><br /></div></div>

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Dean Rasheed

Date:

23 November 2015, 09:29:49

On 21 November 2015 at 03:54, Marko Tiikkaja <marko@joh.to> wrote:
> Here's v2 of the patch.  How's this look?
>

Here are some initial review comments:

* My first thought on reading this patch is that it is somewhat
under-commented. For example, I would expect at least a block comment
at the top of the new code added by this patch. Also the fields of the
new structure could use some comments -- it might be obvious what
datum and isnull are for, but fcinfo is less obvious until you read
the code that uses it. Likewise the transfn is quite long, with almost
no explanatory comments.

* There's a clear bug in the null handling in the second branch of the
transfn -- when the new value is null and the previous value wasn't.
For example:

SELECT single_value(x) FROM (VALUES ('x'), (null)) t(x);
server closed the connection unexpectedly

* In the finalfn, I think calling AggCheckCallContext() should be the
first thing it does. Compare for example array_agg_array_finalfn().

* There's another less obvious bug in the way these functions handle
complex types. For example:

SELECT single_value(t) FROM (VALUES (1,'One'), (1,'One')) t(x,y);
ERROR:  cache lookup failed for type 2139062143

You might want to look at how array_agg() handles that. Also the code
behind array_position() has some elements in common with this patch.

* Consider collation handling, as illustrated by array_position().

So I'm afraid there's still some work to do, but there are plenty of
examples in existing code to borrow from.

Regards,
Dean

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Greg Stark

Date:

23 November 2015, 12:15:18

On Wed, Oct 28, 2015 at 5:03 PM, Marko Tiikkaja <marko@joh.to> wrote:
> SELECT a, sum(amount), onlyvalue(rolling_count)
> FROM
> (
>     SELECT a, amount, count(*) OVER (ORDER BY a) AS rolling_count
>     FROM tbl
> ) ss
> GROUP BY a;

The same thing would happen even in the more common case of having
functionally dependent columns if they happen to be buried in a
subquery. That might well be convenient if you have some expression
you want to use in multiple aggregates such as:

SELECT pk, acol, avg(x), min(x), max(x)  FROM (     SELECT a,pk, a,acol, b.c+b.d+b.e AS x         FROM a JOIN b ON
(a.pk= b.fk)  ) GROUP BY pk

Postgres would happily accept that if you collapsed the subquery and
ran the group by directly on the join but the subquery in between is
actually enough to hide the functional dependency information so it
complains that acol is not functionally dependent on the group by
column.

-- 
greg

Re: onlyvalue aggregate (was: First Aggregate Funtion?)

From

Michael Paquier

Date:

24 December 2015, 02:49:10

On Mon, Nov 23, 2015 at 6:29 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 21 November 2015 at 03:54, Marko Tiikkaja <marko@joh.to> wrote:
>> Here's v2 of the patch.  How's this look?
>>
>
> Here are some initial review comments:
>
> * My first thought on reading this patch is that it is somewhat
> under-commented. For example, I would expect at least a block comment
> at the top of the new code added by this patch. Also the fields of the
> new structure could use some comments -- it might be obvious what
> datum and isnull are for, but fcinfo is less obvious until you read
> the code that uses it. Likewise the transfn is quite long, with almost
> no explanatory comments.
>
> * There's a clear bug in the null handling in the second branch of the
> transfn -- when the new value is null and the previous value wasn't.
> For example:
>
> SELECT single_value(x) FROM (VALUES ('x'), (null)) t(x);
> server closed the connection unexpectedly
>
> * In the finalfn, I think calling AggCheckCallContext() should be the
> first thing it does. Compare for example array_agg_array_finalfn().
>
> * There's another less obvious bug in the way these functions handle
> complex types. For example:
>
> SELECT single_value(t) FROM (VALUES (1,'One'), (1,'One')) t(x,y);
> ERROR:  cache lookup failed for type 2139062143
>
> You might want to look at how array_agg() handles that. Also the code
> behind array_position() has some elements in common with this patch.
>
> * Consider collation handling, as illustrated by array_position().
>
> So I'm afraid there's still some work to do, but there are plenty of
> examples in existing code to borrow from.

There is a review, but no input from the author for more than 1 month,
hence patch has been marked as "returned with feedback". Feel free to
move it to next CF if you want to post a new version.
-- 
Michael