Home > mailing lists

Re: First Aggregate Funtion? - Mailing list pgsql-hackers

From	Corey Huinker
Subject	Re: First Aggregate Funtion?
Date	July 20, 2015 16:03:01
Msg-id	CADkLM=foA_oC_Ri23F9PbfLnfwXFbC3Lt8bBzRu3=CB77G9_qw@mail.gmail.com Whole thread Raw
In response to	Re: First Aggregate Funtion? (Paul A Jungwirth <pj@illuminatedcomputing.com>)
Responses	Re: First Aggregate Funtion?
List	pgsql-hackers

Tree view

On Mon, Jul 20, 2015 at 11:06 AM, Paul A Jungwirth <pj@illuminatedcomputing.com> wrote:

> The above implementation of "first" aggregate returns the first non-NULL item
> value.

I'm curious what advantages this approach has over these FIRST/LAST
functions from the Wiki?:

https://wiki.postgresql.org/wiki/First/last_%28aggregate%29

Also to get the "first non-null value" you can apply an ordering to
just the aggregate function, e.g.:

select first(id order by start_time nulls last) from events;

If you want speed you should probably write a C version.

Is there something I'm missing?

Also since we're on the hackers list is this a proposal to add these
functions to core Postgres?

Yours,
Paul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

If it is a proposal to add to core, I'd like to suggest a close cousin function of first()/last(): only(). [1]

It would behave like first() but would throw an error if it encountered more than one distinct value in the window.

This would be helpful in dependent grouping situations like this:

select a.keyval, a.name_of_the thing, sum(b.metric_value) as metric_value

from a

join b on b.a_keyval = a.keyval

group by a.keyval, a.name_of_the_thing

Now, everyone's made this optimization to reduce group-by overhead:

select a.keyval, min(a.name_of_the_thing) as name_of_the_thing, sum(b.metric_value) as metric_value

from a

join b on b.a_keyval = a.keyval

group by a.keyval

Which works fine, but it's self-anti-documenting:

- it implies that name of the thing *could* be different across rows with the same keyval

- it implies we have some business preference for names that are first in alphabetical order.

- it implies that the string has more in common with the summed metrics (imagine this query has dozens of them) than the key values to the left.

Using first(a.name_of_the_thing) is less overhead than min()/max(), but has the same issues listed above.

By using only(a.name_of_the_thing) we'd have a bit more clarity that the author expected all of those values to be the same across the aggregate window, and discovering otherwise was reason enough to fail the query.

*IF* we're considering adding these to core, I think that only() would be just a slight modification of the last() implementation, and could be done at the same time.

[1] I don't care what it gets named. I just want the functionality.

pgsql-hackers by date:

From: Emre Hasegeli
Date: 20 July 2015, 15:39:26
Subject: Re: TABLESAMPLE patch is really in pretty sad shape

From: Marko Tiikkaja
Date: 20 July 2015, 16:07:10
Subject: Re: First Aggregate Funtion?

Re: First Aggregate Funtion? - Mailing list pgsql-hackers

Previous

Next