Thread: range_agg with multirange inputs
Here is a patch adding range_agg(anymultirange). Previously range_agg only accepted anyrange. Here is a bug report from last month requesting this addition: https://www.postgresql.org/message-id/CAOC8YUcOtAGscPa31ik8UEMzgn8uAWA09s6CYOGPyP9_cBbWTw%40mail.gmail.com As that message points out, range_intersect_agg accepts either anyrange or anymultirange, so it makes sense for range_agg to do the same. I noticed that the docs only mentioned range_intersect_agg(anyrange), so I added the anymultirange versions of both on the aggregate functions page. I also added a few more tests for range_intersect_agg since the coverage there seemed light. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested This applies (with some fuzz) and passes installcheck-world, but a rebase is needed, because 3 lines of context aren't enough to get the doc changes in the right place in the aggregate function table. (I think generating the patch with 4 lines of context would be enough to keep that from being a recurring issue.) One thing that seems a bit funny is this message in the new multirange_agg_transfn: + if (!type_is_multirange(mltrngtypoid)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("range_agg must be called with a multirange"))); It's clearly copied from the corresponding test and message in range_agg_transfn. They both say "range_agg must be called ...", which makes perfect sense, as from the user's perspective both messages come from (different overloads of) a function named range_agg. Still, it could be odd to have (again from the user's perspective) a function named range_agg that sometimes says "range_agg must be called with a range" and other times says "range_agg must be called with a multirange". I'm not sure how to tweak the wording (of either message or both) to make that less weird, but there's probably a way. I kind of wonder whether either message is really reachable, at least through the aggregate machinery in the expected way. Won't that machinery ensure that it is calling the right transfn with the right type of argument? If that's the case, maybe the messages could only be seen by someone calling the transfn directly ... which also seems ruled out for these transfns because the state type is internal. Is this whole test more of the nature of an assertion? Regards, -Chap The new status of this patch is: Waiting on Author
On 2/26/22 17:13, Chapman Flack wrote: > This applies (with some fuzz) and passes installcheck-world, but a rebase > is needed, because 3 lines of context aren't enough to get the doc changes > in the right place in the aggregate function table. (I think generating > the patch with 4 lines of context would be enough to keep that from being > a recurring issue.) Thank you for the review and the tip re 4 lines of context! Rebase attached. > One thing that seems a bit funny is this message in the new > multirange_agg_transfn: > > + if (!type_is_multirange(mltrngtypoid)) > + ereport(ERROR, > + (errcode(ERRCODE_DATATYPE_MISMATCH), > + errmsg("range_agg must be called with a multirange"))); I agree it would be more helpful to users to let them know we can take either kind of argument. I changed the message to "range_agg must be called with a range or multirange". How does that seem? > I kind of wonder whether either message is really reachable, at least > through the aggregate machinery in the expected way. Won't that machinery > ensure that it is calling the right transfn with the right type of > argument? If that's the case, maybe the messages could only be seen > by someone calling the transfn directly ... which also seems ruled out > for these transfns because the state type is internal. Is this whole test > more of the nature of an assertion? I don't think they are reachable, so perhaps they are more like asserts. Do you think I should change it? It seems like a worthwhile check in any case. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 02/28/22 23:31, Paul Jungwirth wrote: > On 2/26/22 17:13, Chapman Flack wrote: >> (I think generating >> the patch with 4 lines of context would be enough to keep that from being >> a recurring issue.) > > Thank you for the review and the tip re 4 lines of context! Rebase attached. I think the 4 lines should suffice, but it looks like this patch was generated from a rebase of the old one (with three lines) that ended up putting the new 'range_agg' entry ahead of 'max' in func.sgml, which position is now baked into the 4 lines of context. :) So I think it needs a bit of manual attention to get the additions back in the right places, and then a 4-context-lines patch generated from that. > I changed the message to "range_agg must be called > with a range or multirange". How does that seem? That works for me. >> I kind of wonder whether either message is really reachable, at least >> through the aggregate machinery in the expected way. Won't that machinery >> ensure that it is calling the right transfn with the right type of >> argument? If that's the case, maybe the messages could only be seen >> by someone calling the transfn directly ... which also seems ruled out >> for these transfns because the state type is internal. Is this whole test >> more of the nature of an assertion? > > I don't think they are reachable, so perhaps they are more like asserts. Do > you think I should change it? It seems like a worthwhile check in any case. I would not change them to actual Assert, which would blow up the whole process on failure. If it's a genuine "not expected to happen" case, maybe changing it to elog (or ereport with errmsg_internal) would save a little workload for translators. But as you were copying an existing ereport with a translatable message, there's also an argument for sticking to that style, and maybe mentioning the question to an eventual committer who might have a stronger opinion. I did a small double-take seeing the C range_agg_finalfn being shared by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that the reason it works is get_fn_expr_rettype works equally well with either parameter type. Do you think it would be worth adding a comment at the C function explaining that? In a quick query I just did, I found no other aggregate final functions sharing a C function that way, so this could be the first. Regards, -Chap
On 3/1/22 13:33, Chapman Flack wrote: > I think the 4 lines should suffice, but it looks like this patch was > generated from a rebase of the old one (with three lines) that ended up > putting the new 'range_agg' entry ahead of 'max' in func.sgml, which > position is now baked into the 4 lines of context. :) You're right, my last rebase messed up the docs. Here it is fixed. Sorry about that! > I would not change them to actual Assert, which would blow up the whole > process on failure. If it's a genuine "not expected to happen" case, > maybe changing it to elog (or ereport with errmsg_internal) would save > a little workload for translators. I like the elog solution. I've changed them in both places. > I did a small double-take seeing the C range_agg_finalfn being shared > by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that > the reason it works is get_fn_expr_rettype works equally well with > either parameter type. > > Do you think it would be worth adding a comment at the C function > explaining that? In a quick query I just did, I found no other aggregate > final functions sharing a C function that way, so this could be the first. I see 13 other shared finalfns (using select array_agg(aggfnoid::regproc) as procs, array_agg(aggtransfn) as transfns, aggfinalfn from pg_aggregate where aggfinalfn is distinct from 0 group by aggfinalfn having count(*) > 1;) but a comment can't hurt! Added. Thanks, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 03/05/22 15:53, Paul Jungwirth wrote: > On 3/1/22 13:33, Chapman Flack wrote: >> I think the 4 lines should suffice, but it looks like this patch was >> generated from a rebase of the old one (with three lines) that ended up >> putting the new 'range_agg' entry ahead of 'max' in func.sgml, which >> position is now baked into the 4 lines of context. :) > > You're right, my last rebase messed up the docs. Here it is fixed. Sorry > about that! When I apply this patch, I get a func.sgml with two entries for range_intersect_agg(anymultirange). > I like the elog solution. I've changed them in both places. It looks like you've now got elog in three places: the "must be called with a range or multirange" in multirange_agg_transfn and multirange_intersect_agg_transfn, and the "called in non-aggregate context" in multirange_agg_transfn. I think that last is also ok, given that its state type is internal, so it shouldn't be reachable in a user call. In range_agg_transfn, you've changed the message in the "must be called with a range or multirange"; that seems like another good candidate to be an elog. > I see 13 other shared finalfns (using select array_agg(aggfnoid::regproc) as > procs, array_agg(aggtransfn) as transfns, aggfinalfn from pg_aggregate where > aggfinalfn is distinct from 0 group by aggfinalfn having count(*) > 1;) but > a comment can't hurt! Added. I think your query finds aggregate declarations that share the same SQL function declaration as their finalizer functions. That seems to be more common. The query I used looks for cases where different SQL-declared functions appear as finalizers of aggregates, but the different SQL declared functions share the same internal C implementation. That's the query where this seems to be the unique result. WITH finals(regp) AS ( SELECT DISTINCT CAST(aggfinalfn AS regprocedure) FROM pg_aggregate WHERE aggfinalfn <> 0 -- InvalidOid ) SELECT prosrc, array_agg(regp) FROM pg_proc, finals WHERE oid = regp AND prolang = 12 -- INTERNALlanguageId GROUP BY prosrc HAVING count(*) > 1; In other words, I think the interesting thing to say in the C comment is not "shared by range_agg(anyrange) and range_agg(anymultirange)", but "shared by range_agg_finalfn(internal,anyrange) and multirange_agg_finalfn(internal,anymultirange)". It seems a little extra surprising to have one C function declared in SQL with two different names and parameter signatures. It ends up working out because it relies on get_fn_expr_rettype, which can do its job for either polymorphic type it might find in the parameter declaration. But that's a bit subtle. :) Regards, -Chap
On 3/10/22 14:07, Chapman Flack wrote: > When I apply this patch, I get a func.sgml with two entries for > range_intersect_agg(anymultirange). Arg, fixed. > In range_agg_transfn, you've changed the message in the "must be called > with a range or multirange"; that seems like another good candidate to > be an elog. Agreed. Updated here. > I think your query finds aggregate declarations that share the same SQL > function declaration as their finalizer functions. That seems to be more > common. > > The query I used looks for cases where different SQL-declared functions > appear as finalizers of aggregates, but the different SQL declared functions > share the same internal C implementation. Okay, I see. I believe that is quite common for ordinary SQL functions. Sharing a prosrc seems even less remarkable than sharing an aggfinalfn. You're right there are no cases for other finalfns yet, but I don't think there is anything special about finalfns that would make this a weirder thing to do there than with ordinary functions. Still, noting it with a comment does seem helpful. I've updated the remark to match what you suggested. Thank you again for the review, and sorry for so many iterations! :-) Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 03/11/22 22:18, Paul Jungwirth wrote: > Arg, fixed. > >> In range_agg_transfn, you've changed the message in the "must be called >> with a range or multirange"; that seems like another good candidate to >> be an elog. > > Agreed. Updated here. This looks good to me and passes installcheck-world, so I'll push the RfC button. > Sharing a prosrc seems even less remarkable than sharing an aggfinalfn. > You're right there are no cases for other finalfns yet, but I don't think > there is anything special about finalfns that would make this a weirder > thing to do there than with ordinary functions. You sent me back to look at how many of those there are. I get 42 cases of shared prosrc (43 now). The chief subgroup of those looks to involve sharing between parameter signatures where the types have identical layouts and the semantic differences are unimportant to the function in question (comparisons between bit or between varbit, overlaps taking timestamp or timestamptz, etc.). The other prominent group is range and multirange constructors, where the C function has an obviously generic name like range_constructor2 and gets shared by a bunch of SQL declarations. I think here we've added the first instance where the C function is shared by SQL-declared functions accepting two different polymorphic pseudotypes. But it's clearly simple and works, nothing objectionable about it. I had experimented with renaming multirange_agg_finalfn to just range_agg_finalfn so it would just look like two overloads of one function sharing a prosrc, and ultimately gave up because genbki.pl couldn't resolve the OID where the name is used in pg_aggregate.dat. That's why it surprised me to see three instances where other functions (overlaps, isfinite, name) do use the same SQL name for different overloads, but the explanation seems to be that nothing else at genbki time refers to those, so genbki's unique-name limitation doesn't affect them. Neither here nor there for this patch, but an interesting new thing I learned while reviewing it. Regards, -Chap
Fwiw the cfbot is failing due to a duplicate OID. Traditionally we didn't treat duplicate OIDs as reason to reject a patch because they're inevitable as other patches get committed and the committer can just renumber them. I think the cfbot kind of changes this calculus since it's a pain lose the visibility into whether the rest of the tests are passing that the cfbot normally gives us. If it's not to much of a hassle could you renumber resubmit the patch with an updated OID? [10:54:57.606] su postgres -c "make -s -j${BUILD_JOBS} world-bin" [10:54:57.927] Duplicate OIDs detected: [10:54:57.927] 8000
This patch has been committed. I split it into a few pieces. On 12.03.22 04:18, Paul Jungwirth wrote: > On 3/10/22 14:07, Chapman Flack wrote: >> When I apply this patch, I get a func.sgml with two entries for >> range_intersect_agg(anymultirange). > > Arg, fixed. > >> In range_agg_transfn, you've changed the message in the "must be called >> with a range or multirange"; that seems like another good candidate to >> be an elog. > > Agreed. Updated here. I kept those messages as "range" or "multirange" separately, instead of "range or multirange". This way, we don't have to update all the messages of this kind when a new function is added. Since these are only internal messages anyway, I opted for higher maintainability.