Re: Order by in a sub query when aggregating the main query - Mailing list pgsql-general

From	Federico
Subject	Re: Order by in a sub query when aggregating the main query
Date	September 27, 2022 21:11:35
Msg-id	CAN19dyeYeyPYZF45EnAwHvmLA6V6A+SmNjdAHDT9gVzoToV9ew@mail.gmail.com
In response to	Re: Order by in a sub query when aggregating the main query (Federico <cfederico87@gmail.com>)
List	pgsql-general

I've changed the code to use order by in the aggregate and it seems
there are no noticeable changes in the query performance.
Thanks for the help.

Best,
Federico Caselli

On Sun, 25 Sept 2022 at 00:30, Federico <cfederico87@gmail.com> wrote:
>
> Understood, thanks for the explanation.
> I'll work on updating the queries used by sqlalchemy to do array_agg(x
> order by x) instead of the order by in the subquery.
>
> > I think that right now that'd
> > incur additional sorting overhead, which is annoying.  But work is
> > ongoing to recognize when the input is already correctly sorted
> > for an aggregate, so it should get better in PG 16 or so.
>
> Nice to know, hopefully it's not too bad for this use case.
>
> Thanks, Federico Caselli
>
> On Sun, 25 Sept 2022 at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Federico <cfederico87@gmail.com> writes:
> > > A basic example of the type of query in question is the following (see
> > > below for the actual query):
> >
> > >   select w, array_agg(x)
> > >   from (
> > >     select v, v / 10 as w
> > >     from pg_catalog.generate_series(25, 0, -1) as t(v)
> > >     order by v
> > >   ) as t(x)
> > >   group by w
> >
> > > This query will return an ordered array as specified by the order by
> > > clause in the subquery.
> > > Can this behaviour be relied upon?
> >
> > No, not really.  It might always work given a particular set of
> > circumstances.  As long as the planner chooses to do the outer
> > query's grouped aggregation as a HashAgg, there'd be no reason
> > for it to reshuffle the subquery output before feeding that to
> > array_agg.  However, if it decided that sort-group-and-aggregate
> > was better, it'd insert a sort by w above the subquery, and then
> > you'd lose any certainty of the ordering by v continuing to hold.
> > (Maybe the sort by w would be stable for equal keys, but that's
> > not guaranteed.)
> >
> > What you really ought to do is write
> >
> >   select w, array_agg(x order by x)
> >   from ...
> >
> > to be in the clear per SQL standard.  I think that right now that'd
> > incur additional sorting overhead, which is annoying.  But work is
> > ongoing to recognize when the input is already correctly sorted
> > for an aggregate, so it should get better in PG 16 or so.
> >
> >                         regards, tom lane
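Tom's point can be illustrated with a small Python toy model. This is only a sketch under stated assumptions, not PostgreSQL's executor: the function names are hypothetical, the "sort that promises nothing about equal keys" is simulated by shuffling rows before Python's stable sort, and only the order of elements *within* each group is modelled (a real HashAggregate also emits the groups themselves in no particular order). It shows that plain `array_agg(x)` simply keeps arrival order, so the subquery's ORDER BY survives only when nothing re-sorts the rows in between, while `array_agg(x order by x)` is sorted whatever the plan does.

```python
import random
from collections import defaultdict


def subquery_rows():
    """Toy version of the inner query: generate_series(25, 0, -1) ORDER BY v.

    w = v / 10 is integer division in PostgreSQL, which matches Python's //
    for these non-negative values. Each row is (x, w), x being the renamed v.
    """
    return [(v, v // 10) for v in sorted(range(25, -1, -1))]


def array_agg_arrival_order(rows):
    """Hypothetical model of array_agg(x) with no ORDER BY:
    each x is appended to its group in the order the rows arrive."""
    groups = defaultdict(list)
    for x, w in rows:
        groups[w].append(x)
    return dict(groups)


def sort_by_w_unspecified_ties(rows, rng):
    """Hypothetical stand-in for a Sort node on w only.

    Shuffling first and then applying Python's stable sort leaves rows with
    equal w in a random relative order, i.e. no guarantee for ties.
    """
    out = list(rows)
    rng.shuffle(out)
    out.sort(key=lambda row: row[1])
    return out


def array_agg_ordered(rows):
    """Hypothetical model of array_agg(x ORDER BY x): each group is sorted
    by the aggregate itself, so input order no longer matters."""
    return {w: sorted(xs) for w, xs in array_agg_arrival_order(rows).items()}


rows = subquery_rows()
rng = random.Random(42)

# HashAgg-like plan: the subquery output goes straight into the aggregate.
print(array_agg_arrival_order(rows)[2])  # [20, 21, 22, 23, 24, 25]

# Sort-group-and-aggregate plan: an extra sort by w sits in between,
# so the order of x within a group is whatever the sort left behind.
print(array_agg_arrival_order(sort_by_w_unspecified_ties(rows, rng))[2])

# ORDER BY inside the aggregate: sorted regardless of the plan chosen.
print(array_agg_ordered(sort_by_w_unspecified_ties(rows, rng))[2])  # [20, 21, 22, 23, 24, 25]
```

In the model, the only thing that makes the result independent of the plan is sorting inside the aggregate, which is what moving the ORDER BY into `array_agg` achieves in the real query.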
