Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y - Mailing list pgsql-general

From Thorsten Glaser
Subject Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y
Date
Msg-id 4f11e322-31cf-fc3d-86aa-401a3bf9d86@evolvis.org
Whole thread Raw
In response to Re: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)y  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
On Sat, 4 Mar 2023, Alban Hertroys wrote:

>> But isn’t that the same as with a regular LEFT JOIN?
>
>Similar, but not the same, I’d say.
>
>I do now notice that I made some copying errors there, I was a bit
>nauseous at that time.

np, I’m under pollen attack currently so also not at my best.

>That should have read:
>
>>> select jsonb_build_object('opening_times’,
>>>     jsonb_agg(obj
>>>     ORDER BY
>>>         obj->>'weekday’,
>>>         obj->>'from_hour’,
>>>         obj->>'to_hour')
>>> )
>>> from cot
>>> cross join lateral jsonb_build_object(
>>>         'weekday', cot.weekday,
>>>         'from_hour', cot.from_hour,
>>>         'to_hour', cot.to_hour) obj
>
>The lateral join applies the function to each row returned from the
>left side of the join and enriches that row with the function result. I
>used a cross join because there is no join condition to apply to the
>lateral, otherwise you could also use an inner join on true.

Okay, so I got it now. But that’s still identical to the LEFT JOIN
which I’m using in that example, because…

>A left join wouldn’t make much sense here, unless the function could

… I have a LEFT JOIN already and can just use the CTE there,
so I don’t have to add an extra lateral join.

But good to know for the future/when I don’t have that.

>return NULL - for example if it were a function marked as STRICT and
>some of the input parameter values (from the table) could be NULL.

OK.

>You need a sub-select, which in turn creates its own result set. It’s
>up to the planner whether the left or the right side gets executed
>first, after which the results of the other side of the join get merged
>to this, or whether this can all be collected in one go. That’s up to
>the query planner to decide though, and it could be right.

OK, but that shouldn’t make a difference here as it needs to run over
all rows of the cgwa table anyway (possibly reduced by filtering on
users).

While not the case here, I see that for other entries the lateral
join would cause more work: for the “qualification” kinds of tables,
for example, the individual qualification table has very few rows
(these are entered by the site admin), but the m:n connection table
(whatever the correct name for these is) has a lot because many of
the users have many of these qualifications. If I use a CTE to add
a JSON object to the individual qualification table first, it doesn’t
run on each qualification multiple times; if I use a lateral join,
it possibly, modulo planner optimisations, runs the jsonb_build_object
function many times per qualification despite them all giving the same
result. And, even if the optimisations catch that, it’s mentally not
the same.

>In my experience, lateral joins go well with the jsonb functions. They
>tend to reduce code repetition when referencing object members, such as
>in your case.

Right. In my case I can get the same by adding a CTE instead though,
and it’s hard to see which is better, performance-wise.

This is a lot to take in, and I r̲e̲a̲l̲l̲y̲ appreciate the detailed
explanations given alongside ☻

bye,
//mirabilos
--
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)



pgsql-general by date:

Previous
From: Christoph Moench-Tegeder
Date:
Subject: Re: shp2pgsql error under windows
Next
From: Ron
Date:
Subject: Re: Dropping behavior for unique CONSTRAINTs