Re: Properly mark NULL returns in numeric aggregates - Mailing list pgsql-hackers
From | Jesse Zhang |
---|---|
Subject | Re: Properly mark NULL returns in numeric aggregates |
Date | |
Msg-id | CAGf+fX4PDv8k9t95Ue7GO5dmj73cmKqCKVB3=_2i-sj8TH12JA@mail.gmail.com Whole thread Raw |
In response to | Re: Properly mark NULL returns in numeric aggregates (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
Hi David, On Mon, Apr 13, 2020 at 10:46 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 14 Apr 2020 at 06:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Yeah, they're relying exactly on the assumption that nodeAgg is not > > going to try to copy a value declared "internal", and therefore they > > can be loosey-goosey about whether the value pointer is null or not. > > However, if you want to claim that that's wrong, you have to explain > > why it's okay for some other code to be accessing a value that's > > declared "internal". I'd say that the meaning of that is precisely > > "keepa u hands off". > > > > In the case at hand, the current situation is that we only expect the > > values returned by these combine functions to be read by the associated > > final functions, which are on board with the null-pointer representation > > of an empty result. Your argument is essentially that it should be > > possible to feed the values to the aggregate's associated serialization > > function as well. But the core code never does that, so I'm not convinced > > that we should add it to the requirements; we'd be unable to test it. > > Casting my mind back to when I originally wrote that code, I attempted > to do so in such a way so that it could one day be used for a 3-stage > aggregation. e.g Parallel Partial Aggregate -> Gather -> Combine > Serial Aggregate on one node, then on some master node a Deserial > Combine Finalize Aggregate. You're very right that we can't craft > such a plan with today's master (We didn't even add a supporting enum > for it in AggSplit). However, it does appear that there are > extensions or forks out there which attempt to use the code in this > way, so it would be good to not leave those people out in the cold > regarding this. Greenplum plans split-aggregation quite similarly from Postgres: while it doesn't pass partial results through a intra-cluster "Gather" -- using a reshuffle-by-hash type operation instead -- Greenplum _does_ split an aggregate into final and partial halves, running them on different nodes. In short, the relation ship among the combine, serial, and deserial functions are similar to how they are in Postgres today (serial->deserial->combine), in the context of splitting aggregates. The current problem arises because Greenplum spills the hash table in hash aggregation (a diff we're working actively to upstream), a process in which we have to touch (read: serialize and copy) the internal trans values. However, we are definitely eyeing what you described as something to move towards. As a fork, we'd like to carry as thin a diff as possible. So the current situation is pretty much forcing us to diverge in the functions mentioned up-thread. In hindsight, "sloppy" might not have been a wise choice of words, apologies for the possible offense, David! > > For testing, can't we just have an Assert() in > advance_transition_function that verifies isnull matches the > nullability of the return value for INTERNAL returning transfns? i.e, > the attached > > I don't have a test case to hand that could cause this to fail, but it > sounds like Jesse might. One easy way to cause this is "sum(x) FILTER (WHERE false)" which will for sure make the partial results NULL. Is that what you're looking for? I'll be happy to send in the SQL. Cheers, Jesse
pgsql-hackers by date: