Thread: range_agg with multirange inputs

range_agg with multirange inputs

From
Paul Jungwirth
Date:
Here is a patch adding range_agg(anymultirange). Previously range_agg 
only accepted anyrange.

Here is a bug report from last month requesting this addition:

https://www.postgresql.org/message-id/CAOC8YUcOtAGscPa31ik8UEMzgn8uAWA09s6CYOGPyP9_cBbWTw%40mail.gmail.com

As that message points out, range_intersect_agg accepts either anyrange 
or anymultirange, so it makes sense for range_agg to do the same.

I noticed that the docs only mentioned range_intersect_agg(anyrange), so 
I added the anymultirange versions of both on the aggregate functions page.

I also added a few more tests for range_intersect_agg since the coverage 
there seemed light.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Attachment

Re: range_agg with multirange inputs

From
Chapman Flack
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

This applies (with some fuzz) and passes installcheck-world, but a rebase
is needed, because 3 lines of context aren't enough to get the doc changes
in the right place in the aggregate function table. (I think generating
the patch with 4 lines of context would be enough to keep that from being
a recurring issue.)

One thing that seems a bit funny is this message in the new
multirange_agg_transfn:

+   if (!type_is_multirange(mltrngtypoid))
+       ereport(ERROR,
+               (errcode(ERRCODE_DATATYPE_MISMATCH),
+                errmsg("range_agg must be called with a multirange")));

It's clearly copied from the corresponding test and message in
range_agg_transfn. They both say "range_agg must be called ...", which
makes perfect sense, as from the user's perspective both messages come
from (different overloads of) a function named range_agg.

Still, it could be odd to have (again from the user's perspective)
a function named range_agg that sometimes says "range_agg must be
called with a range" and other times says "range_agg must be called
with a multirange".

I'm not sure how to tweak the wording (of either message or both) to
make that less weird, but there's probably a way.

I kind of wonder whether either message is really reachable, at least
through the aggregate machinery in the expected way. Won't that machinery
ensure that it is calling the right transfn with the right type of
argument? If that's the case, maybe the messages could only be seen
by someone calling the transfn directly ... which also seems ruled out
for these transfns because the state type is internal. Is this whole test
more of the nature of an assertion?

Regards,
-Chap

The new status of this patch is: Waiting on Author

Re: range_agg with multirange inputs

From
Paul Jungwirth
Date:
On 2/26/22 17:13, Chapman Flack wrote:
> This applies (with some fuzz) and passes installcheck-world, but a rebase
> is needed, because 3 lines of context aren't enough to get the doc changes
> in the right place in the aggregate function table. (I think generating
> the patch with 4 lines of context would be enough to keep that from being
> a recurring issue.)

Thank you for the review and the tip re 4 lines of context! Rebase attached.

> One thing that seems a bit funny is this message in the new
> multirange_agg_transfn:
> 
> +   if (!type_is_multirange(mltrngtypoid))
> +       ereport(ERROR,
> +               (errcode(ERRCODE_DATATYPE_MISMATCH),
> +                errmsg("range_agg must be called with a multirange")));

I agree it would be more helpful to users to let them know we can take 
either kind of argument. I changed the message to "range_agg must be 
called with a range or multirange". How does that seem?

> I kind of wonder whether either message is really reachable, at least
> through the aggregate machinery in the expected way. Won't that machinery
> ensure that it is calling the right transfn with the right type of
> argument? If that's the case, maybe the messages could only be seen
> by someone calling the transfn directly ... which also seems ruled out
> for these transfns because the state type is internal. Is this whole test
> more of the nature of an assertion?

I don't think they are reachable, so perhaps they are more like asserts. 
Do you think I should change it? It seems like a worthwhile check in any 
case.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Attachment

Re: range_agg with multirange inputs

From
Chapman Flack
Date:
On 02/28/22 23:31, Paul Jungwirth wrote:
> On 2/26/22 17:13, Chapman Flack wrote:
>> (I think generating
>> the patch with 4 lines of context would be enough to keep that from being
>> a recurring issue.)
> 
> Thank you for the review and the tip re 4 lines of context! Rebase attached.

I think the 4 lines should suffice, but it looks like this patch was
generated from a rebase of the old one (with three lines) that ended up
putting the new 'range_agg' entry ahead of 'max' in func.sgml, which
position is now baked into the 4 lines of context. :)

So I think it needs a bit of manual attention to get the additions back
in the right places, and then a 4-context-lines patch generated from that.

> I changed the message to "range_agg must be called
> with a range or multirange". How does that seem?

That works for me.

>> I kind of wonder whether either message is really reachable, at least
>> through the aggregate machinery in the expected way. Won't that machinery
>> ensure that it is calling the right transfn with the right type of
>> argument? If that's the case, maybe the messages could only be seen
>> by someone calling the transfn directly ... which also seems ruled out
>> for these transfns because the state type is internal. Is this whole test
>> more of the nature of an assertion?
> 
> I don't think they are reachable, so perhaps they are more like asserts. Do
> you think I should change it? It seems like a worthwhile check in any case.

I would not change them to actual Assert, which would blow up the whole
process on failure. If it's a genuine "not expected to happen" case,
maybe changing it to elog (or ereport with errmsg_internal) would save
a little workload for translators. But as you were copying an existing
ereport with a translatable message, there's also an argument for sticking
to that style, and maybe mentioning the question to an eventual committer
who might have a stronger opinion.

I did a small double-take seeing the C range_agg_finalfn being shared
by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that
the reason it works is get_fn_expr_rettype works equally well with
either parameter type.

Do you think it would be worth adding a comment at the C function
explaining that? In a quick query I just did, I found no other aggregate
final functions sharing a C function that way, so this could be the first.

Regards,
-Chap



Re: range_agg with multirange inputs

From
Paul Jungwirth
Date:
On 3/1/22 13:33, Chapman Flack wrote:
> I think the 4 lines should suffice, but it looks like this patch was
> generated from a rebase of the old one (with three lines) that ended up
> putting the new 'range_agg' entry ahead of 'max' in func.sgml, which
> position is now baked into the 4 lines of context. :)

You're right, my last rebase messed up the docs. Here it is fixed. Sorry 
about that!

> I would not change them to actual Assert, which would blow up the whole
> process on failure. If it's a genuine "not expected to happen" case,
> maybe changing it to elog (or ereport with errmsg_internal) would save
> a little workload for translators.

I like the elog solution. I've changed them in both places.

> I did a small double-take seeing the C range_agg_finalfn being shared
> by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that
> the reason it works is get_fn_expr_rettype works equally well with
> either parameter type.
> 
> Do you think it would be worth adding a comment at the C function
> explaining that? In a quick query I just did, I found no other aggregate
> final functions sharing a C function that way, so this could be the first.

I see 13 other shared finalfns (using select 
array_agg(aggfnoid::regproc) as procs, array_agg(aggtransfn) as 
transfns, aggfinalfn from pg_aggregate where aggfinalfn is distinct from 
0 group by aggfinalfn having count(*) > 1;) but a comment can't hurt! Added.

Thanks,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Attachment

Re: range_agg with multirange inputs

From
Chapman Flack
Date:
On 03/05/22 15:53, Paul Jungwirth wrote:
> On 3/1/22 13:33, Chapman Flack wrote:
>> I think the 4 lines should suffice, but it looks like this patch was
>> generated from a rebase of the old one (with three lines) that ended up
>> putting the new 'range_agg' entry ahead of 'max' in func.sgml, which
>> position is now baked into the 4 lines of context. :)
> 
> You're right, my last rebase messed up the docs. Here it is fixed. Sorry
> about that!

When I apply this patch, I get a func.sgml with two entries for
range_intersect_agg(anymultirange).

> I like the elog solution. I've changed them in both places.

It looks like you've now got elog in three places: the "must be called
with a range or multirange" in multirange_agg_transfn and
multirange_intersect_agg_transfn, and the "called in non-aggregate
context" in multirange_agg_transfn.

I think that last is also ok, given that its state type is internal,
so it shouldn't be reachable in a user call.

In range_agg_transfn, you've changed the message in the "must be called
with a range or multirange"; that seems like another good candidate to
be an elog.

> I see 13 other shared finalfns (using select array_agg(aggfnoid::regproc) as
> procs, array_agg(aggtransfn) as transfns, aggfinalfn from pg_aggregate where
> aggfinalfn is distinct from 0 group by aggfinalfn having count(*) > 1;) but
> a comment can't hurt! Added.

I think your query finds aggregate declarations that share the same SQL
function declaration as their finalizer functions. That seems to be more
common.

The query I used looks for cases where different SQL-declared functions
appear as finalizers of aggregates, but the different SQL declared functions
share the same internal C implementation. That's the query where this seems
to be the unique result.

WITH
 finals(regp) AS (
  SELECT DISTINCT
    CAST(aggfinalfn AS regprocedure)
   FROM
    pg_aggregate
   WHERE
    aggfinalfn <> 0 -- InvalidOid
 )
SELECT
  prosrc, array_agg(regp)
 FROM
  pg_proc, finals
 WHERE
  oid = regp AND prolang = 12 -- INTERNALlanguageId
 GROUP BY
  prosrc
 HAVING
  count(*) > 1;

In other words, I think the interesting thing to say in the C comment
is not "shared by range_agg(anyrange) and range_agg(anymultirange)", but
"shared by range_agg_finalfn(internal,anyrange) and
multirange_agg_finalfn(internal,anymultirange)".

It seems a little extra surprising to have one C function declared in SQL
with two different names and parameter signatures. It ends up working
out because it relies on get_fn_expr_rettype, which can do its job for
either polymorphic type it might find in the parameter declaration.
But that's a bit subtle. :)

Regards,
-Chap



Re: range_agg with multirange inputs

From
Paul Jungwirth
Date:
On 3/10/22 14:07, Chapman Flack wrote:
> When I apply this patch, I get a func.sgml with two entries for
> range_intersect_agg(anymultirange).

Arg, fixed.

> In range_agg_transfn, you've changed the message in the "must be called
> with a range or multirange"; that seems like another good candidate to
> be an elog.

Agreed. Updated here.

> I think your query finds aggregate declarations that share the same SQL
> function declaration as their finalizer functions. That seems to be more
> common.
> 
> The query I used looks for cases where different SQL-declared functions
> appear as finalizers of aggregates, but the different SQL declared functions
> share the same internal C implementation.

Okay, I see. I believe that is quite common for ordinary SQL functions. 
Sharing a prosrc seems even less remarkable than sharing an aggfinalfn. 
You're right there are no cases for other finalfns yet, but I don't 
think there is anything special about finalfns that would make this a 
weirder thing to do there than with ordinary functions. Still, noting it 
with a comment does seem helpful. I've updated the remark to match what 
you suggested.

Thank you again for the review, and sorry for so many iterations! :-)

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Attachment

Re: range_agg with multirange inputs

From
Chapman Flack
Date:
On 03/11/22 22:18, Paul Jungwirth wrote:
> Arg, fixed.
> 
>> In range_agg_transfn, you've changed the message in the "must be called
>> with a range or multirange"; that seems like another good candidate to
>> be an elog.
> 
> Agreed. Updated here.

This looks good to me and passes installcheck-world, so I'll push
the RfC button.

> Sharing a prosrc seems even less remarkable than sharing an aggfinalfn.
> You're right there are no cases for other finalfns yet, but I don't think
> there is anything special about finalfns that would make this a weirder
> thing to do there than with ordinary functions.

You sent me back to look at how many of those there are. I get 42 cases
of shared prosrc (43 now).

The chief subgroup of those looks to involve sharing between parameter
signatures where the types have identical layouts and the semantic
differences are unimportant to the function in question (comparisons
between bit or between varbit, overlaps taking timestamp or timestamptz,
etc.).

The other prominent group is range and multirange constructors, where
the C function has an obviously generic name like range_constructor2
and gets shared by a bunch of SQL declarations.

I think here we've added the first instance where the C function is
shared by SQL-declared functions accepting two different polymorphic
pseudotypes. But it's clearly simple and works, nothing objectionable
about it.

I had experimented with renaming multirange_agg_finalfn to just
range_agg_finalfn so it would just look like two overloads of one
function sharing a prosrc, and ultimately gave up because genbki.pl
couldn't resolve the OID where the name is used in pg_aggregate.dat.

That's why it surprised me to see three instances where other functions
(overlaps, isfinite, name) do use the same SQL name for different
overloads, but the explanation seems to be that nothing else at genbki
time refers to those, so genbki's unique-name limitation doesn't affect
them.

Neither here nor there for this patch, but an interesting new thing
I learned while reviewing it.

Regards,
-Chap



Re: range_agg with multirange inputs

From
Greg Stark
Date:
Fwiw the cfbot is failing due to a duplicate OID. Traditionally we
didn't treat duplicate OIDs as reason to reject a patch because
they're inevitable as other patches get committed and the committer
can just renumber them.

I think the cfbot kind of changes this calculus since it's a pain lose
the visibility into whether the rest of the tests are passing that the
cfbot normally gives us.

If it's not to much of a hassle could you renumber resubmit the patch
with an updated OID?

[10:54:57.606] su postgres -c "make -s -j${BUILD_JOBS} world-bin"
[10:54:57.927] Duplicate OIDs detected:
[10:54:57.927] 8000



Re: range_agg with multirange inputs

From
Peter Eisentraut
Date:
This patch has been committed.  I split it into a few pieces.

On 12.03.22 04:18, Paul Jungwirth wrote:
> On 3/10/22 14:07, Chapman Flack wrote:
>> When I apply this patch, I get a func.sgml with two entries for
>> range_intersect_agg(anymultirange).
> 
> Arg, fixed.
> 
>> In range_agg_transfn, you've changed the message in the "must be called
>> with a range or multirange"; that seems like another good candidate to
>> be an elog.
> 
> Agreed. Updated here.

I kept those messages as "range" or "multirange" separately, instead of 
"range or multirange".  This way, we don't have to update all the 
messages of this kind when a new function is added.  Since these are 
only internal messages anyway, I opted for higher maintainability.