Re: Parallel Aggregates for string_agg and array_agg - Mailing list pgsql-hackers

From David Rowley
Subject Re: Parallel Aggregates for string_agg and array_agg
Date
Msg-id CAKJS1f-MuzHQ=Em4J+xWC4bRXMJF99Hb-N0TmC=UrJCM9tAYVw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Aggregates for string_agg and array_agg  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 28 March 2018 at 03:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 27 March 2018 at 13:26, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>> synchronized_seqscans is another piece of precedent in the area, FWIW.
>
>> This is true. I guess the order of aggregation could be made more
>> certain if we remove the cost based optimiser completely, and just
>> rely on a syntax based optimiser.
>
> None of this is responding to my point.  I think the number of people
> who actually don't care about aggregation order for these aggregates
> is negligible, and none of you have argued against that; you've instead
> selected straw men to attack.

I think what everyone else is getting at is that we're not offering to
improve the performance of the people who care about the order which
the values are aggregated. This patch only offers a possible
performance benefit to those who don't. I mentioned and linked to a
thread about someone from PostGIS asking for this, which as far as I
understand has quite a large user base. You've either ignored that or
think the number of people using PostGIS is negligible.

As far as I understand your argument, it's about there being a
possibility a group of people existing who rely on the aggregation
order being defined without an ORDER BY in the aggregate function.
Unfortunately, It appears from the responses from many of the other's
who voiced an opinion about this is that there is no shortage of other
reasons why relying on values being aggregated in a defined order
without an ORDER BY in the aggregate function arguments is a dangerous
assumption to make.  Several reasons were listed why this is undefined
and I mentioned the great lengths we'd need to go to do make the order
more defined without an explicit ORDER BY, and I still imagine I'd
have missed some of the reasons.  I imagine the number of people which
rely on the order being defined without an ORDER BY is diminishing
each release as we add parallelism support for more node types.  Both
Andres and I agree that it's a shame to block useful optimisations due
to the needs of a small diminishing group of people who are not very
good at reading our carefully crafted documentation... for the past 8
years [1].

I imagine this small group of people, if they do exist, must slowly be
waking up to the fact that we've been devising new ways to ruin their
query results for many years now.  It seems pretty strange for us to
call a truce now after we've been wreaking havoc on this group for so
many releases... I really do hope this part is not true, but if such a
person appeared in -novice or -general asking for help, we'd be
telling them to add an ORDER BY, and we'd be quoting the 8-year-old
line in the documents which states that we make no guarantees in this
area in the absence of an ORDER BY.  If they truly have no other
choice then we might consider suggesting they may get what they want
if they disable parallel query, and we'd probably rhyme off a few
other reasons why it might suddenly break on them again.

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=doc/src/sgml/func.sgml;h=7d6125c97e5203c9d092ceec3aaf351c1a5fcf1b;hp=f2906cc82230150f72353609e9c831e90dcc10ca;hb=34d26872ed816b299eef2fa4240d55316697f42d;hpb=6a6efb964092902bf53965649c3ed78b1868b37e


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Parallel Aggregates for string_agg and array_agg
Next
From: Tom Lane
Date:
Subject: Re: Proposal: http2 wire format