Re: MAX/MIN optimization via rewrite (plus query rewrites generally) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: MAX/MIN optimization via rewrite (plus query rewrites generally)
Date
Msg-id 12475.1100213177@sss.pgh.pa.us
In response to Re: MAX/MIN optimization via rewrite (plus query rewrites generally)  (Greg Stark <gsstark@mit.edu>)
Responses Re: MAX/MIN optimization via rewrite (plus query rewrites generally)  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Oh?  How is a first() aggregate going to know what sort order you want
>> within the group?

> It would look something like

> select x,first(a),first(b) from (select x,a,b from table order by x,y) group by x

> which is equivalent to

> select DISTINCT ON (x) x,a,b from table ORDER BY x,y

No, it is not.  The GROUP BY has no commitment to preserve order ---
consider for example the possibility that we implement the GROUP BY by
hashing.
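
To make the hashing point concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL's executor: the row data, NBUCKETS, the modulo "hash", and both function names are made up for illustration. It feeds the same pre-sorted rows to a sort-based and a hash-based grouping step, with first() modeled as "keep the first value that reaches the group".

```python
# Toy model of two GROUP BY strategies; an illustration, not PostgreSQL's executor.
from itertools import groupby

# Rows (x, y, a) as the subquery "ORDER BY x, y" would deliver them.
ROWS = [
    (3, 1, "a3-first"), (3, 2, "a3-second"),
    (5, 1, "a5-first"), (5, 2, "a5-second"),
    (8, 1, "a8-first"), (8, 2, "a8-second"),
]

NBUCKETS = 4  # hypothetical hash table size


def group_by_sorted(rows):
    """Sort-based grouping: adjacent rows with equal x form one group."""
    # next(grp) is the first row of the group; index 2 is column a.
    return [(x, next(grp)[2]) for x, grp in groupby(rows, key=lambda r: r[0])]


def group_by_hashed(rows):
    """Hash-based grouping: groups come out in bucket order, not in x order."""
    buckets = [dict() for _ in range(NBUCKETS)]
    for x, _y, a in rows:
        # first(): keep the first value that reaches this group's state.
        buckets[x % NBUCKETS].setdefault(x, a)
    return [(x, a) for bucket in buckets for x, a in bucket.items()]


sorted_result = group_by_sorted(ROWS)
hashed_result = group_by_hashed(ROWS)

print(sorted_result)  # groups in x order: 3, 5, 8
print(hashed_result)  # groups in bucket order: 8, 5, 3
```

Both strategies produce the same set of groups, but the hashed one emits them in bucket order rather than x order. In this toy the rows within each group still happen to arrive in the subquery's order; nothing in the language obliges a real implementation to deliver them that way.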

> The group by can see that the subquery is already sorted by x and
> doesn't need to be resorted. In fact I believe you added the smarts to
> detect that condition in response to a user asking about precisely
> this type of scenario.

The fact that an optimization is present does not make it part of the
guaranteed semantics of the language.

Basically, first() is a broken concept in SQL.  Of course DISTINCT ON
is broken too for the same reasons, but I do not see that first() is
one whit less of a kluge than DISTINCT ON.
        regards, tom lane

