Thread: ntile() throws ERROR when hashagg is false
Hi
ntile() throws ERROR when hashagg is false, test case given below.
postgres=# create table foo (a int, b int, c text);
CREATE TABLE
postgres=# insert into foo select i%20, i%30, to_char(i%12, 'FM0000') from generate_series(0, 36) i;
INSERT 0 37
postgres=# explain select ntile(a) OVER () from foo GROUP BY a;
QUERY PLAN
-------------------------------------------------------------------
WindowAgg (cost=25.00..29.50 rows=200 width=8)
-> HashAggregate (cost=25.00..27.00 rows=200 width=4)
Group Key: a
-> Seq Scan on foo (cost=0.00..22.00 rows=1200 width=4)
(4 rows)
postgres=# select ntile(a) OVER () from foo GROUP BY a;
ntile
-------
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
11
(20 rows)
postgres=# set enable_hashagg to false;
SET
postgres=# explain select ntile(a) OVER () from foo GROUP BY a;
QUERY PLAN
-------------------------------------------------------------------------
WindowAgg (cost=83.37..91.87 rows=200 width=8)
-> Group (cost=83.37..89.37 rows=200 width=4)
Group Key: a
-> Sort (cost=83.37..86.37 rows=1200 width=4)
Sort Key: a
-> Seq Scan on foo (cost=0.00..22.00 rows=1200 width=4)
(6 rows)
postgres=# select ntile(a) OVER () from foo GROUP BY a;
ERROR: argument of ntile must be greater than zero
ntile() throws ERROR when hashagg is false, test case given below.
postgres=# create table foo (a int, b int, c text);
CREATE TABLE
postgres=# insert into foo select i%20, i%30, to_char(i%12, 'FM0000') from generate_series(0, 36) i;
INSERT 0 37
postgres=# explain select ntile(a) OVER () from foo GROUP BY a;
QUERY PLAN
-------------------------------------------------------------------
WindowAgg (cost=25.00..29.50 rows=200 width=8)
-> HashAggregate (cost=25.00..27.00 rows=200 width=4)
Group Key: a
-> Seq Scan on foo (cost=0.00..22.00 rows=1200 width=4)
(4 rows)
postgres=# select ntile(a) OVER () from foo GROUP BY a;
ntile
-------
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
11
(20 rows)
postgres=# set enable_hashagg to false;
SET
postgres=# explain select ntile(a) OVER () from foo GROUP BY a;
QUERY PLAN
-------------------------------------------------------------------------
WindowAgg (cost=83.37..91.87 rows=200 width=8)
-> Group (cost=83.37..89.37 rows=200 width=4)
Group Key: a
-> Sort (cost=83.37..86.37 rows=1200 width=4)
Sort Key: a
-> Seq Scan on foo (cost=0.00..22.00 rows=1200 width=4)
(6 rows)
postgres=# select ntile(a) OVER () from foo GROUP BY a;
ERROR: argument of ntile must be greater than zero
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
>>>>> "Rajkumar" == Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> writes: Rajkumar> Hi Rajkumar> ntile() throws ERROR when hashagg is false, test case given Rajkumar> below. Rajkumar> postgres=# create table foo (a int, b int, c text); Rajkumar> CREATE TABLE Rajkumar> postgres=# insert into foo select i%20, i%30, to_char(i%12, 'FM0000') from Rajkumar> generate_series(0, 36) i; Rajkumar> INSERT 0 37 Rajkumar> postgres=# explain select ntile(a) OVER () from foo GROUP BY a; This query isn't actually legal per the spec; the argument of ntile is restricted to being a constant or parameter, so it can't change from row to row. PG is more flexible, but that doesn't make the query any more meaningful. What I think pg is actually doing is taking the value of the ntile() argument from the first row and using that for the whole partition. In your example, enabling or disabling hashagg changes the order of the input rows for the window function (since you've specified no ordering in the window definition), and with hashagg off, you get the smallest value of a first (which is 0 and thus an error). -- Andrew (irc:RhodiumToad)
On 14 June 2018 at 18:57, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > What I think pg is actually doing is taking the value of the ntile() > argument from the first row and using that for the whole partition. > In your example, enabling or disabling hashagg changes the order of the > input rows for the window function (since you've specified no ordering > in the window definition), and with hashagg off, you get the smallest > value of a first (which is 0 and thus an error). Seems that's the case. I'd guess it was written that way so we could allow PARAM_EXTERN Params rather than requiring the arg to be a Const. I wonder if it would be worth adding a run-time check in window_ntile() that causes an ERROR on first call if there are any Vars or PARAM_EXEC Params in the function argument. An ERROR might be better than doing something that the user does not expect. Ideally, something would alert the user much sooner than the executor, but I think doing it that way would be quite a bit more work. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > On 14 June 2018 at 18:57, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >> What I think pg is actually doing is taking the value of the ntile() >> argument from the first row and using that for the whole partition. Yes, easily verified by looking at window_ntile(): the argument is only examined on first call. > I wonder if it would be worth adding a run-time check in > window_ntile() that causes an ERROR on first call if there are any > Vars or PARAM_EXEC Params in the function argument. An ERROR might be > better than doing something that the user does not expect. -1, that would break cases that are legal and useful, such as where a PARAM_EXEC Param represents an outer-query-level variable, while still failing to catch some problematic cases (eg. volatile functions). I think also that there are cases that are not legal per spec but can still be useful, as long as the user knows what they're doing. It might be worth some documentation changes though. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> I wonder if it would be worth adding a run-time check in >> window_ntile() that causes an ERROR on first call if there are any >> Vars or PARAM_EXEC Params in the function argument. An ERROR might >> be better than doing something that the user does not expect. Tom> -1, that would break cases that are legal and useful, such as Tom> where a PARAM_EXEC Param represents an outer-query-level variable, Tom> while still failing to catch some problematic cases (eg. volatile Tom> functions). The only sane run-time check (that I can think of) that could be applied would be to check that the value is the same on each row of a partition. Tom> I think also that there are cases that are not legal per spec but Tom> can still be useful, as long as the user knows what they're doing. Yes, it would make sense for example to allow the value to change between partitions. -- Andrew (irc:RhodiumToad)