Thread: Wrong aggregate result when sorting by a NULL value

Wrong aggregate result when sorting by a NULL value

From
Ondřej Bouda
Date:
Dear PostgreSQLers,

the following seems as a bug to me on Postgres 11.0:


CREATE FUNCTION first_arg(ANYELEMENT, ANYELEMENT) RETURNS ANYELEMENT
AS $function$
     SELECT $1
$function$ LANGUAGE SQL IMMUTABLE STRICT;

CREATE AGGREGATE first(ANYELEMENT) (
     SFUNC = first_arg,
     STYPE = ANYELEMENT
);

CREATE TABLE t (
     x TEXT,
     y INT,
     z DATE
);
INSERT INTO t (x, y, z) VALUES ('val', 42, NULL);

SELECT first(x ORDER BY y) FROM t; -- returns 'val', as expected
SELECT first(x ORDER BY y, z) FROM t; -- returns NULL, which seems wrong


I would expect both the SELECT statements to return 'val'. Additional 
order by "z" should make no difference as there is just one row in the 
table.

More interestingly, if "z" is not NULL, the result is correct:

UPDATE t SET z = CURRENT_DATE;

SELECT first(x ORDER BY y) FROM t; -- returns 'val'
SELECT first(x ORDER BY y, z) FROM t; -- returns 'val'


The documentation [https://www.postgresql.org/docs/11/static/xaggr.html] 
says that if the state function is STRICT, the first non-NULL value is 
automatically used as the initial state. The ORDER BY option is not 
documented to have any effect on this - the documentation just says that 
"[DISTINCT and ORDER BY] options are implemented behind the scenes and 
are not the concern of the aggregate's support functions."

Do I miss something, or is it really a bug?

Best regards,
Ondrej Bouda


Re: Wrong aggregate result when sorting by a NULL value

From
Tom Lane
Date:
=?UTF-8?Q?Ond=c5=99ej_Bouda?= <obouda@email.cz> writes:
> the following seems as a bug to me on Postgres 11.0:

Yeah, somebody broke this between 10.x and 11.0.  You don't need the
custom aggregate, even plain min() fails:

regression=# SELECT min(x ORDER BY z) FROM t;
 min 
-----
 
(1 row)

while 10.5 delivers the expected result:

regression=# SELECT min(x ORDER BY z) FROM t;
 min 
-----
 val
(1 row)

I've not looked at the code yet, but it's acting like somebody changed the
STRICT logic from "are any of the aggregate's arguments null" to "is any
part of the whole row (including ordering values) null".  Wrong ...

            regards, tom lane


Re: Wrong aggregate result when sorting by a NULL value

From
Tom Lane
Date:
I wrote:
> =?UTF-8?Q?Ond=c5=99ej_Bouda?= <obouda@email.cz> writes:
>> the following seems as a bug to me on Postgres 11.0:

> Yeah, somebody broke this between 10.x and 11.0. ...
> I've not looked at the code yet, but it's acting like somebody changed the
> STRICT logic from "are any of the aggregate's arguments null" to "is any
> part of the whole row (including ordering values) null".  Wrong ...

git bisect fingers this:

commit 69c3936a1499b772a749ae629fc59b2d72722332
Author: Andres Freund <andres@anarazel.de>
Date:   Tue Jan 9 13:25:38 2018 -0800

    Expression evaluation based aggregate transition invocation.

Andres, do you have time to look at this right now?

            regards, tom lane


Re: Wrong aggregate result when sorting by a NULL value

From
Andres Freund
Date:
Hi,

On 2018-11-02 11:34:46 -0400, Tom Lane wrote:
> I wrote:
> > =?UTF-8?Q?Ond=c5=99ej_Bouda?= <obouda@email.cz> writes:
> >> the following seems as a bug to me on Postgres 11.0:
> 
> > Yeah, somebody broke this between 10.x and 11.0. ...
> > I've not looked at the code yet, but it's acting like somebody changed the
> > STRICT logic from "are any of the aggregate's arguments null" to "is any
> > part of the whole row (including ordering values) null".  Wrong ...
> 
> git bisect fingers this:
> 
> commit 69c3936a1499b772a749ae629fc59b2d72722332
> Author: Andres Freund <andres@anarazel.de>
> Date:   Tue Jan 9 13:25:38 2018 -0800
> 
>     Expression evaluation based aggregate transition invocation.
> 
> Andres, do you have time to look at this right now?

Thanks for bisecting.  I'll take a look later today.

Greetings,

Andres Freund


Re: Wrong aggregate result when sorting by a NULL value

From
Andrew Gierth
Date:
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> Andres, do you have time to look at this right now?

 Andres> Thanks for bisecting.  I'll take a look later today.

Looks like this:

+           scratch.d.agg_strict_input_check.nargs = numInputs;

should have been pertrans->numTransInputs instead?

-- 
Andrew (irc:RhodiumToad)


Re: Wrong aggregate result when sorting by a NULL value

From
Andres Freund
Date:
Hi,

On 2018-11-02 17:34:43 +0000, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
> 
>  >> Andres, do you have time to look at this right now?
> 
>  Andres> Thanks for bisecting.  I'll take a look later today.
> 
> Looks like this:
> 
> +           scratch.d.agg_strict_input_check.nargs = numInputs;
> 
> should have been pertrans->numTransInputs instead?

That looks like it's precisely the reason. I'll push something tomorrow
PST morning, crediting both you (diagnosis) and Tom (bisecting)
obviously, unless you prefer to do so yourself.

It's a bit sad that we don't have any tests that test this :/. I'll add
something for the specific case, but that'll obviously not be
exhaustive.

Greetings,

Andres Freund


Re: Wrong aggregate result when sorting by a NULL value

From
Andres Freund
Date:
Hi,

On 2018-11-02 22:57:00 -0700, Andres Freund wrote:
> On 2018-11-02 17:34:43 +0000, Andrew Gierth wrote:
> > >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
> > 
> >  >> Andres, do you have time to look at this right now?
> > 
> >  Andres> Thanks for bisecting.  I'll take a look later today.
> > 
> > Looks like this:
> > 
> > +           scratch.d.agg_strict_input_check.nargs = numInputs;
> > 
> > should have been pertrans->numTransInputs instead?
> 
> That looks like it's precisely the reason. I'll push something tomorrow
> PST morning, crediting both you (diagnosis) and Tom (bisecting)
> obviously, unless you prefer to do so yourself.
> 
> It's a bit sad that we don't have any tests that test this :/. I'll add
> something for the specific case, but that'll obviously not be
> exhaustive.

And pushed. Thanks Ondřej for the report, thanks Tom & Andrew for
identifying the issue.

Greetings,

Andres Freund


Re: Wrong aggregate result when sorting by a NULL value

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> And pushed. Thanks Ondřej for the report, thanks Tom & Andrew for
> identifying the issue.

Hm, buildfarm seems less than pleased.  Did you miss making a
corresponding change in the JIT code?

            regards, tom lane


Re: Wrong aggregate result when sorting by a NULL value

From
Andres Freund
Date:
On 2018-11-03 18:18:40 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > And pushed. Thanks Ondřej for the report, thanks Tom & Andrew for
> > identifying the issue.
> 
> Hm, buildfarm seems less than pleased.  Did you miss making a
> corresponding change in the JIT code?

Hm, I'm somewhat confused, let me look into that. The JIT code shouldn't
really need to be changed here - it's the *generation* of expression
steps that's going wrong - which then later get turned into JITed code,
but that part worked previously for other expressions.

Greetings,

Andres Freund


Re: Wrong aggregate result when sorting by a NULL value

From
Andres Freund
Date:
Hi,

On 2018-11-03 15:27:09 -0700, Andres Freund wrote:
> On 2018-11-03 18:18:40 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > And pushed. Thanks Ondřej for the report, thanks Tom & Andrew for
> > > identifying the issue.
> > 
> > Hm, buildfarm seems less than pleased.  Did you miss making a
> > corresponding change in the JIT code?
> 
> Hm, I'm somewhat confused, let me look into that. The JIT code shouldn't
> really need to be changed here - it's the *generation* of expression
> steps that's going wrong - which then later get turned into JITed code,
> but that part worked previously for other expressions.

Turns out it's not a great idea to generate EEOP_AGG_STRICT_INPUT_CHECK
expressions with nargs = 0. Head -> Desk.  Pushed a fix (+ new
assertion).

Thanks!

Andres Freund


Re: Wrong aggregate result when sorting by a NULL value

From
Jeff Janes
Date:
On Sat, Nov 3, 2018 at 7:08 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hm, I'm somewhat confused, let me look into that. The JIT code shouldn't
> really need to be changed here - it's the *generation* of expression
> steps that's going wrong - which then later get turned into JITed code,
> but that part worked previously for other expressions.

Turns out it's not a great idea to generate EEOP_AGG_STRICT_INPUT_CHECK
expressions with nargs = 0. Head -> Desk.  Pushed a fix (+ new
assertion).


I'm now getting a compiler warning:

execExpr.c: In function 'ExecBuildAggTrans':
execExpr.c:2864:7: warning: unused variable 'numInputs' [-Wunused-variable]
   int   numInputs = pertrans->numInputs;

Cheers,

Jeff

Re: Wrong aggregate result when sorting by a NULL value

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> I'm now getting a compiler warning:

> execExpr.c: In function 'ExecBuildAggTrans':
> execExpr.c:2864:7: warning: unused variable 'numInputs' [-Wunused-variable]
>    int   numInputs = pertrans->numInputs;

Used-for-asserts-only problem.  Will fix.

            regards, tom lane


Re: Wrong aggregate result when sorting by a NULL value

From
Andres Freund
Date:
On 2018-11-04 11:19:59 -0500, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
> > I'm now getting a compiler warning:
> 
> > execExpr.c: In function 'ExecBuildAggTrans':
> > execExpr.c:2864:7: warning: unused variable 'numInputs' [-Wunused-variable]
> >    int   numInputs = pertrans->numInputs;
> 
> Used-for-asserts-only problem.  Will fix.

Thanks.  I wonder if we shouldn't turn Asserts() into something roughly
akin to if (0) {expr};. That way we'd not deal with errors about unused
variables anymore - we're not safe against unreachable code warnings
anyway.

Greetings,

Andres Freund


Re: Wrong aggregate result when sorting by a NULL value

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-11-04 11:19:59 -0500, Tom Lane wrote:
>> Used-for-asserts-only problem.  Will fix.

> Thanks.  I wonder if we shouldn't turn Asserts() into something roughly
> akin to if (0) {expr};. That way we'd not deal with errors about unused
> variables anymore - we're not safe against unreachable code warnings
> anyway.

Meh.  I'm unexcited about getting rid of one type of compiler warning by
introducing another one.

            regards, tom lane


Re: Wrong aggregate result when sorting by a NULL value

From
Andres Freund
Date:
On 2018-11-04 18:49:40 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-11-04 11:19:59 -0500, Tom Lane wrote:
> >> Used-for-asserts-only problem.  Will fix.
> 
> > Thanks.  I wonder if we shouldn't turn Asserts() into something roughly
> > akin to if (0) {expr};. That way we'd not deal with errors about unused
> > variables anymore - we're not safe against unreachable code warnings
> > anyway.
> 
> Meh.  I'm unexcited about getting rid of one type of compiler warning by
> introducing another one.

It's impracticable to enable dead code warnings in postgres
anyway. There's way way too many of them (all the returns after
elog(ERROR) etc are enough to make it infeasible to change that).

Greetings,

Andres Freund


Re: Wrong aggregate result when sorting by a NULL value

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-11-04 18:49:40 -0500, Tom Lane wrote:
>> Meh.  I'm unexcited about getting rid of one type of compiler warning by
>> introducing another one.

> It's impracticable to enable dead code warnings in postgres
> anyway. There's way way too many of them (all the returns after
> elog(ERROR) etc are enough to make it infeasible to change that).

I think you have a very limited conception of what sorts of warnings
code like that would trigger.  In particular, even given the
unsubstantiated assumption that nobody's compiler would complain,
I think static analyzers like Coverity might.

            regards, tom lane