Re: [PATCH] Negative Transition Aggregate Functions (WIP) - Mailing list pgsql-hackers

From David Rowley
Subject Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date
Msg-id CAApHDvrzxh2v9SUyVZKK19NQx+E-1bjgCCZ9dbLQ+XsyCSA_uw@mail.gmail.com
Whole thread Raw
In response to [PATCH] Negative Transition Aggregate Functions (WIP)  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: [PATCH] Negative Transition Aggregate Functions (WIP)
List pgsql-hackers
On Sun, Dec 15, 2013 at 12:17 PM, David Rowley <dgrowleyml@gmail.com> wrote:

One thing that is currently on my mind is what to do when passing volatile functions to the aggregate. Since the number of times we execute a volatile function will much depend on the window frame options, I think we should include some sort of warning in the documentation that "The number of times that the expression is evaluated within the aggregate function when used in the context of a WINDOW is undefined". The other option would be to disable this optimisation if the aggregate expression contained a volatile function, but doing this, to me seems a bit weird as is anyone actually going to be depending on a volatile function being executed so many times?



I just wanted to bring this back into people's minds.
During writing this patch I found and removed a comment which was a todo item to implement these negative transition functions. This comment said about maybe disallowing the use of these if the expression of the function contained a volatile function. I wondered why this was important and I still don't really see why we would disallow this only to enforce that we call that function an undefined number of times anyway.

nextval was the only volatile function that I could think of that would allow me to give an example which was easy to understand what is going on here.

CREATE SEQUENCE test_seq;
SELECT currval('test_seq'),
       COUNT(*) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
       SUM(nextval('test_seq')) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM generate_series(1,2) x (x);
DROP SEQUENCE test_seq;

The results are:
 currval | count | sum
---------+-------+-----
       2 |     2 |   3
       3 |     1 |   3

I've not looked to see if the spec has anything about this but, the first row will have sum as 1+2 then the 2nd row will just have 1 row to aggregate and the value will be 3 due to nextval returning 3, I could see an argument that the results for this should actually be:

 currval | count | sum
---------+-------+-----
       2 |     2 |   3
       3 |     1 |   2 

If it was then the solution would have to be to materalise the expression by evaluating it once for each tuple which sounds like a big change. I thought maybe if we're going to be playing around with the number of times these expressions are evaluated then we should stick a node in the docs to tell our users not to depend on this.

Something like the attached maybe.
Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Next
From: Heikki Linnakangas
Date:
Subject: Re: autovacuum_work_mem