Thread: DISTINCT *and* ORDER BY in aggregate functions on expressions(!)
Hi, I’ve got a… rather large query (see below), in which I join a complex data structure (whose exact contents do not, at this point, matter) together to get some auxiliary data to expose as JSON field. In this query I can use, for example… jsonb_build_object('user_permissions', jsonb_agg(DISTINCT ap.name ORDER BY ap.name)) … to get a distinct, sorted, list of “user permissions” from a table ap which I joined to the user table which is the main subject of the query. (For some reason I need to add DISTINCT because else duplica‐ tes are shown.) Wrapping this as… jsonb_build_object('user_permissions', COALESCE( jsonb_agg(DISTINCT ap.name ORDER BY ap.name) FILTER (WHERE ap.id IS NOT NULL))) … gets me the JSON object’s value for the user_permissions set to null if there’s nothing in the m:n intermediate table for the user in question. This works well. However, what I seem to be not allowed to do is (without the extra COALESCE, to simplify): jsonb_build_object('opening_times', jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) ORDER BY cot.weekday, cot.from_hour, cot.to_hour) ) This is because, when I use DISTINCT (but only then‽), the ORDER BY arguments must be… arguments to the function, or something. In the above case, the sort key is ap.name which is also the argument to the jsonb_agg function, so no problem there, but here, the jsonb_agg argument is the return value of a function so… it has no name. What I’m looking for is something like… jsonb_build_object('opening_times', jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) AS jbo ORDER BY jbo->>'weekday', jbo->>'from_hour', jbo->>'to_hour') ) … except I cannot define aliases in that place. Any other syntax would also work. The suggested solution for this is apparently to do… CREATE OR REPLACE FUNCTION core_openingtime_jsonb_sort(JSONB) RETURNS JSONB AS $$ SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'from_hour', e->>'to_hour') FROM jsonb_array_elements($1) AS e $$ LANGUAGE SQL IMMUTABLE; … and then query as… jsonb_build_object('opening_times', core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour))) ) … which involves internally subquerying for each output row (i.e. row of the user table) times amount of sub-JSONArrays that need to be sorted like this, which is currently 3. All other solutions I can find involve subqueries in the first place; I am somewhat proud I even managed to write this with JOINs and without any subqueries in the first place so I’m hesitant to go that route. Any advice here? And, somewhat related: when outputting JSONB, the order of JSONObject elements is indeterminate, which is… okay, but forcing sorted (ASCIIbetically, i.e. by codepoint order) keys would be very very welcome, for reproducibility of the output. (I’m sure the reproducible-builds project would also love if this could be changed, or at least added, in a way it can be enabled for queries, as session parameter perhaps?) The query in its current incarnanation is as follows: -- -----BEGIN SQL----- CREATE OR REPLACE FUNCTION core_openingtime_jsonb_sort(JSONB) RETURNS JSONB AS $$ SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'from_hour', e->>'to_hour') FROM jsonb_array_elements($1) AS e $$ LANGUAGE SQL IMMUTABLE; CREATE OR REPLACE FUNCTION core_generalworkavailability_jsonb_sort(JSONB) RETURNS JSONB AS $$ SELECT jsonb_agg(e ORDER BY e->>'weekday', e->>'forenoon', e->>'afternoon', e->>'evening') FROM jsonb_array_elements($1) AS e $$ LANGUAGE SQL IMMUTABLE; COPY (SELECT cp.email, cp.first_name, cp.last_name, cp.street, cp.number, jsonb_build_object('groups', COALESCE(jsonb_agg(DISTINCT ag.name ORDER BY ag.name) FILTER (WHERE ag.id IS NOT NULL))) || jsonb_build_object('help_operations', COALESCE(jsonb_agg(DISTINCT cho.name ORDER BY cho.name) FILTER (WHERE cho.id IS NOT NULL))) || jsonb_build_object('emergency_opening_times', COALESCE( core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object( 'weekday', ceot.weekday, 'from_hour', ceot.from_hour, 'to_hour', ceot.to_hour)) FILTER (WHERE ceot.id IS NOT NULL)))) || jsonb_build_object('opening_times', COALESCE( core_openingtime_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)) FILTER (WHERE cot.id IS NOT NULL)))) || jsonb_build_object('possible_work_times', COALESCE( core_generalworkavailability_jsonb_sort(jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cgwa.weekday, 'forenoon', cgwa.forenoon, 'afternoon', cgwa.afternoon, 'evening', cgwa.evening)) FILTER (WHERE cgwa.id IS NOT NULL)))) || jsonb_build_object('qualifications', jsonb_build_object('administrative', COALESCE(jsonb_agg(DISTINCT cqa.name ORDER BY cqa.name) FILTER (WHERE cqa.id IS NOT NULL))) || jsonb_build_object('health', COALESCE(jsonb_agg(DISTINCT cqh.name ORDER BY cqh.name) FILTER (WHERE cqh.id IS NOT NULL))) || jsonb_build_object('language', COALESCE(jsonb_agg(DISTINCT cqlang.name ORDER BY cqlang.name) FILTER (WHERE cqlang.id IS NOT NULL))) || jsonb_build_object('license', COALESCE(jsonb_agg(DISTINCT cqlic.name ORDER BY cqlic.name) FILTER (WHERE cqlic.id IS NOT NULL))) || jsonb_build_object('technical', COALESCE(jsonb_agg(DISTINCT cqt.name ORDER BY cqt.name) FILTER (WHERE cqt.id IS NOT NULL)))) || jsonb_build_object('restrictions', COALESCE(jsonb_agg(DISTINCT cr.name ORDER BY cr.name) FILTER (WHERE cr.id IS NOT NULL))) || jsonb_build_object('user_permissions', COALESCE(jsonb_agg(DISTINCT ap.name ORDER BY ap.name) FILTER (WHERE ap.id IS NOT NULL))) AS "other_data" FROM core_person cp LEFT JOIN core_person_emergency_opening_times cpeot ON cpeot.person_id=cp.id LEFT JOIN core_openingtime ceot ON ceot.id=cpeot.openingtime_id LEFT JOIN core_person_groups cpg ON cpg.person_id=cp.id LEFT JOIN auth_group ag ON ag.id=cpg.group_id LEFT JOIN core_person_help_operations cpho ON cpho.person_id=cp.id LEFT JOIN core_helpoperation cho ON cho.id=cpho.helpoperation_id LEFT JOIN core_person_opening_times cpot ON cpot.person_id=cp.id LEFT JOIN core_openingtime cot ON cot.id=cpot.openingtime_id LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id LEFT JOIN core_generalworkavailability cgwa ON cgwa.id=cppwt.generalworkavailability_id LEFT JOIN core_person_qualifications_administrative cpqa ON cpqa.person_id=cp.id LEFT JOIN core_qualificationadministrative cqa ON cqa.id=cpqa.qualificationadministrative_id LEFT JOIN core_person_qualifications_health cpqh ON cpqh.person_id=cp.id LEFT JOIN core_qualificationhealth cqh ON cqh.id=cpqh.qualificationhealth_id LEFT JOIN core_person_qualifications_language cpqlang ON cpqlang.person_id=cp.id LEFT JOIN core_qualificationlanguage cqlang ON cqlang.id=cpqlang.qualificationlanguage_id LEFT JOIN core_person_qualifications_license cpqlic ON cpqlic.person_id=cp.id LEFT JOIN core_qualificationlicense cqlic ON cqlic.id=cpqlic.qualificationlicense_id LEFT JOIN core_person_qualifications_technical cpqt ON cpqt.person_id=cp.id LEFT JOIN core_qualificationtechnical cqt ON cqt.id=cpqt.qualificationtechnical_id LEFT JOIN core_person_restrictions cpr ON cpr.person_id=cp.id LEFT JOIN core_restriction cr ON cr.id=cpr.restriction_id LEFT JOIN core_person_user_permissions cpup ON cpup.person_id=cp.id LEFT JOIN auth_permission ap ON ap.id=cpup.permission_id GROUP BY cp.email, cp.first_name, cp.last_name, cp.street, cp.number ORDER BY cp.email ) TO STDOUT WITH (FORMAT csv, HEADER, FORCE_QUOTE *, ENCODING 'UTF-8'); DROP FUNCTION core_openingtime_jsonb_sort(JSONB); DROP FUNCTION core_generalworkavailability_jsonb_sort(JSONB); -- -----END SQL----- Thanks in advance, //mirabilos -- „Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund, mksh auf jedem System zu installieren.“ -- XTaran auf der OpenRheinRuhr, ganz begeistert (EN: “[…]uhr.gz is a reason to install mksh on every system.”)
mirabilos <tg@evolvis.org> writes: > This works well. However, what I seem to be not allowed to do is > (without the extra COALESCE, to simplify): > ... > This is because, when I use DISTINCT (but only then‽), the ORDER BY > arguments must be… arguments to the function, or something. Well, yeah. Simplify it to SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable; If there are several rows containing the same value of x and different values of y, which y value are we supposed to sort the unique-ified x value by? It's an ill-defined query. For the specific example you give, it's true that any specific possible output of jsonb_build_object() would correspond to a unique set of cot.weekday, cot.from_hour, cot.to_hour values. But the parser can't be expected to know that. Many functions can produce the same output for different sets of inputs. I'd suggest moving the distinct-ification into an earlier processing step (i.e. a sub-select), or maybe thinking harder about why you're getting duplicates in the first place. regards, tom lane
On Mon, Feb 27, 2023 at 4:11 PM mirabilos <tg@evolvis.org> wrote:
jsonb_build_object('opening_times',
jsonb_agg(DISTINCT jsonb_build_object(
'weekday', cot.weekday,
'from_hour', cot.from_hour,
'to_hour', cot.to_hour)
ORDER BY cot.weekday, cot.from_hour, cot.to_hour)
)
This is because, when I use DISTINCT (but only then‽), the ORDER BY
arguments must be… arguments to the function, or something.
So long as the function call itself is at least stable...:
DISTINCT func_call(...) ORDER BY func_call(...)
Order By is evaluated AFTER and over the contents of the distinct-ified expression
All other solutions I can find involve subqueries in the first
place; I am somewhat proud I even managed to write this with
JOINs and without any subqueries in the first place so I’m
hesitant to go that route.
That pride seems misplaced. Related to Tom's comment, the presence of the DISTINCTs is telling you that what you did is not good. DISTINCT is almost always a code smell, and given the prevalence of direct table joins in your query, it is indeed a valid signal.
Lastly, if you do need to care about normalizing the output of JSON you should consider writing a function that takes arbitrary json input and reformats it, rather than trying to build up json from scratch where every individual component needs to be aware and take action. i.e., get rid of the ORDER BY also. Maybe this belongs in an application layer with tooling that already provides this capability.
David J.
On Mon, Feb 27, 2023 at 5:22 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
Lastly, if you do need to care about normalizing the output of JSON you should consider writing a function that takes arbitrary json input and reformats it, rather than trying to build up json from scratch where every individual component needs to be aware and take action. i.e., get rid of the ORDER BY also. Maybe this belongs in an application layer with tooling that already provides this capability.
Just to clarify/correct myself - if building up json arrays you'll just remove the distinct and then do a normal aggregate order by based upon whatever ordering the base data presents. I was thinking you were doing order by to get keys in order (i.e., jsonb_object_agg) but that isn't the case here.
David J.
On Mon, 27 Feb 2023, Tom Lane wrote: >Well, yeah. Simplify it to > > SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable; That’s… a bit too simple for this case. >If there are several rows containing the same value of x and different >values of y, which y value are we supposed to sort the unique-ified x >value by? It's an ill-defined query. The problem here is that I do not have an ‘x’. >For the specific example you give, it's true that any specific >possible output of jsonb_build_object() would correspond to >a unique set of cot.weekday, cot.from_hour, cot.to_hour values. Not necessarily (see directly below), but why would that matter? It should sort the generated JSON objects within the array. >, or maybe thinking harder >about why you're getting duplicates in the first place. The application developer informed me that it’s entirely possible that some user entered matching information twice. I don’t have the exact query that produced duplicates easily in the history (the downside of working with \i) and tests on other users didn’t produce duplicates. So, yes, filtering them out is indeed part of the task here. On Mon, 27 Feb 2023, David G. Johnston wrote: >So long as the function call itself is at least stable...: > >DISTINCT func_call(...) ORDER BY func_call(...) aieee really? (I’d hope jsonb_build_object to be.) Is that better or worse than using the extra functions to sort…? >Order By is evaluated AFTER and over the contents of the distinct-ified >expression That’s right and good, but the problem is that I do not seem to have a syntax with which to refer to the distinct-ified expression to use in the ORDER BY clause. >> place; I am somewhat proud I even managed to write this with >> JOINs and without any subqueries in the first place so I’m >> hesitant to go that route. > >That pride seems misplaced. Related to Tom's comment, the presence of the >DISTINCTs is telling you that what you did is not good. DISTINCT is almost >always a code smell Not about the DISTINCTs. I haven’t used JOIN much (usually WHERE as inner join) nor on tables this massive, and this is my second foray into aggregate functions only. >Lastly, if you do need to care about normalizing the output of JSON you >should consider writing a function that takes arbitrary json input and >reformats it, rather than trying to build up json from scratch where every >individual component needs to be aware and take action. i.e., get rid of >the ORDER BY also. Maybe this belongs in an application layer with tooling >that already provides this capability. Funnily enough, both here and in the other place where I tried to use JSON output, PostgreSQL (with COPY TO STDOUT) *is* the application layer. Here I’m generating a CSV file; in the other situation I was going to put the generated JSON directly into an HTTP result filehandle. In the latter, I dropped that approach, output CSV and converted that (by replacing newlines with “],[” and prepending “[[” and appending “]]”) to JSON myself, which worked there as it was all-numeric. But the frustration there was about unnecessary whitespace instead. Both cases have in common that a, possibly huge, result set can be directly streamed from PostgreSQL to the consumer, but the former lacks just that tiny bit of functionality that would make it really rock :/ I was asking here because perhaps either that missing functionality can be considered, or to find out if there’s better ways to produce that output, due to my inexperience with SQL. The ways I’m using do work, and I’m relatively happy, but… bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)
Thorsten Glaser <tg@evolvis.org> writes: > On Mon, 27 Feb 2023, Tom Lane wrote: >> Well, yeah. Simplify it to >> SELECT array_agg(DISTINCT x ORDER BY y) FROM mytable; > That’s… a bit too simple for this case. Sure, I was just trying to explain the rule. >> For the specific example you give, it's true that any specific >> possible output of jsonb_build_object() would correspond to >> a unique set of cot.weekday, cot.from_hour, cot.to_hour values. > Not necessarily (see directly below), but why would that matter? > It should sort the generated JSON objects within the array. Well, that may be what you want, but it's not what you wrote in the query. Follow David's advice and do jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) ORDER BY jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)) I'm pretty sure that this will only incur one evaluation of the common subexpression, so even though it's tedious to type it's not inefficient. regards, tom lane
On Mon, Feb 27, 2023 at 6:22 PM Thorsten Glaser <tg@evolvis.org> wrote:
Not about the DISTINCTs. I haven’t used JOIN much (usually WHERE
as inner join) nor on tables this massive, and this is my second
foray into aggregate functions only.
Fair.
Consider this then as a jumping point to a more precise query form:
create table base (base_id integer primary key, base_val text not null);
insert into base values (1, 'one');create table subone (subone_id serial primary key, base_id integer, subone_value text not null);
insert into subone (base_id, subone_value) values (1, 'subone-one'), (1, 'subone-two');
create table subtwo (subtwo_id serial primary key, base_id integer, subtwo_value text not null);
insert into subtwo (base_id, subtwo_value) values (1, 'subtwo-one'), (1, 'subtwo-two');
--BAD cross joining going on with multiple one-to-many relationships
select * from base join subone using (base_id) join subtwo using (base_id); -- not good
--GOOD, only joining one-to-one relationships
select jsonb_build_object('base_id', base_id, 'subone_arr', subone_arr, 'subtwo_arr', subtwo_arr)
from base
join lateral (select json_agg(jsonb_build_object('key', subone_value) order by subone_value)
from subone where subone.base_id = base.base_id) as so (subone_arr) on true
join (select base_id, json_agg(subtwo_value order by subtwo_value desc)
from subtwo group by base_id) as st (base_id, subtwo_arr) using (base_id)
from base
join lateral (select json_agg(jsonb_build_object('key', subone_value) order by subone_value)
from subone where subone.base_id = base.base_id) as so (subone_arr) on true
join (select base_id, json_agg(subtwo_value order by subtwo_value desc)
from subtwo group by base_id) as st (base_id, subtwo_arr) using (base_id)
I used a mix of forms in the two joins, and there are other variants, but the basic concept holds - produce single rows in subqueries then join those various single rows together to produce your desired json output.
David J.
On Mon, 27 Feb 2023, David G. Johnston wrote: >Consider this then as a jumping point to a more precise query form: […] >the basic concept holds - produce single rows in subqueries then join those >various single rows together to produce your desired json output. Ouch. I’ll have to read up and experiment with that, I guess. But wouldn’t I rather then do a CTA for each referenced table that does the aggregating and GROUP BY person_id, then join that (which has single rows for each user row) to users? >--BAD cross joining going on with multiple one-to-many relationships >--GOOD, only joining one-to-one relationships Not quite. This is many-to-many created by Django… On Mon, 27 Feb 2023, Tom Lane wrote: >Sure, I was just trying to explain the rule. Aaah, okay. Sorry, I misunderstood that, and the… general direction of the replies, then. >Well, that may be what you want, but it's not what you wrote in >the query. Follow David's advice and do […] >I'm pretty sure that this will only incur one evaluation of the >common subexpression, so even though it's tedious to type it's not >inefficient. Thanks. But I fear it’s not as simple as you wrote. More like: jsonb_build_object('opening_times', jsonb_agg(DISTINCT jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) ORDER BY jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)->>'weekday', jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)->>'from_hour', jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour)->>'to_hour') ) Isn’t that more like it? (Though I guess at that point I could just drop at least the to_hour fallback sort, hoping nobody inserts overlapping hours with indentical start times…) Now that I see that, it sort of re-raises a question I had during developing this but didn’t pursue. How about, instead of creating a JSONB object here, I create a (not-JSON) ARRAY['weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour] in the aggregate function. The array is something I could naturally use to sort (its elements already are in sort key order), and its elements *also* are in the order jsonb_build_object expects its arguments (assuming I can do something like passing the array to it instead of “spreading” the arguments). Pseudocode: … hmm, not that easy. The aggregate would return e.g. this… ARRAY[['weekday',1,…],['weekday',2,…]] … and there’s no array_map function that could be used to pass each inner array, one by one, to jsonb_build_object; converting the outer array to JSON gets us json_array_elements (same problem), other ways I can think of also don’t seem to have matching functions (jq has… interesting ones). As I write this, I fear that won’t fly because nōn-JSON arrays cannot contain mixed types (text and numbers) in PostgreSQL… at least I ran into that at some point in the past vaguely remembering… Oh well, //mirabilos -- Solange man keine schmutzigen Tricks macht, und ich meine *wirklich* schmutzige Tricks, wie bei einer doppelt verketteten Liste beide Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r
> On 28 Feb 2023, at 3:54, Thorsten Glaser <tg@evolvis.org> wrote: (…) >> Well, that may be what you want, but it's not what you wrote in >> the query. Follow David's advice and do > […] >> I'm pretty sure that this will only incur one evaluation of the >> common subexpression, so even though it's tedious to type it's not >> inefficient. > > Thanks. But I fear it’s not as simple as you wrote. More like: > > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) > > Isn’t that more like it? Perhaps you can use a lateral cross join to get the result of jsonb_build_object as a jsonb value to pass around? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
On Tue, 28 Feb 2023, Alban Hertroys wrote: >Perhaps you can use a lateral cross join to get the result of >jsonb_build_object as a jsonb value to pass around? I don’t see how. (But then I’ve not yet worked with lateral JOINs.) But I c̲a̲n̲ just generate the objects first, I t̲h̲i̲n̲k̲, given one of them corresponds to exactly one of the rows of an m:n-linked table and nothing else. Something like… WITH cgwaj AS ( SELECT cgwa.id AS id, jsonb_build_object( 'weekday', cgwa.weekday, 'forenoon', cgwa.forenoon, 'afternoon', cgwa.afternoon, 'evening', cgwa.evening) AS obj FROM core_generalworkavailability cgwa ), -- … same for opening times SELECT cp.email, …, -- … jsonb_build_object('possible_work_times', COALESCE( jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday', cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon', cgwaj.obj->>'evening') FILTER (WHERE cgwaj.id IS NOT NULL))) || -- … FROM core_person cp -- … LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id -- … That is, add a CTE for each m:n-attached table whose “value” is an object, not a single field, keep the id field; LEFT JOIN that (instead of the original table), then we have a field to use in ORDER BY. I think. I’ve not yet tried it (I don’t have access to that DB normally, I was just helping out). This avoids sub-SELECTs in the sense of needing to run one for each user row, because the innermost JSON object building needs to be done for each (connected (if the query is not filtering on specific users)) row of the “property table”, anyway. (And even if filtered, that can be passed down.) bye, //mirabilos -- Solange man keine schmutzigen Tricks macht, und ich meine *wirklich* schmutzige Tricks, wie bei einer doppelt verketteten Liste beide Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r
> On 3 Mar 2023, at 0:02, Thorsten Glaser <tg@evolvis.org> wrote: > > On Tue, 28 Feb 2023, Alban Hertroys wrote: > >> Perhaps you can use a lateral cross join to get the result of >> jsonb_build_object as a jsonb value to pass around? > > I don’t see how. (But then I’ve not yet worked with lateral JOINs.) You posted this bit: > jsonb_build_object('opening_times', > jsonb_agg(DISTINCT jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) > ORDER BY > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'weekday', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'from_hour', > jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour)->>'to_hour') > ) You can rewrite that into something like this: select jsonb_build_object('opening_times’, obj ORDER BY obj->>'weekday’, obj->>'from_hour’, obj->>'to_hour') ) from cot cross join lateral jsonb_agg(jsonb_build_object( 'weekday', cot.weekday, 'from_hour', cot.from_hour, 'to_hour', cot.to_hour) obj That’s off the top of my head and I did leave out the DISTINCT. Just to show the concept here. A bit of experimenting andreading should get you there, I’m keeping $work waiting :P (…) > WITH > cgwaj AS ( > SELECT cgwa.id AS id, jsonb_build_object( > 'weekday', cgwa.weekday, > 'forenoon', cgwa.forenoon, > 'afternoon', cgwa.afternoon, > 'evening', cgwa.evening) AS obj > FROM core_generalworkavailability cgwa > ), > -- … same for opening times > SELECT cp.email, …, > -- … > jsonb_build_object('possible_work_times', COALESCE( > jsonb_agg(DISTINCT cgwaj.obj ORDER BY cgwaj.obj->>'weekday', > cgwaj.obj->>'forenoon', cgwaj.obj->>'afternoon', > cgwaj.obj->>'evening') > FILTER (WHERE cgwaj.id IS NOT NULL))) || > -- … > FROM core_person cp > -- … > LEFT JOIN core_person_possible_work_times cppwt ON cppwt.person_id=cp.id > LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > -- … > > That is, add a CTE for each m:n-attached table whose “value” is > an object, not a single field, keep the id field; LEFT JOIN that > (instead of the original table), then we have a field to use in > ORDER BY. > > I think. I’ve not yet tried it (I don’t have access to that DB > normally, I was just helping out). > > This avoids sub-SELECTs in the sense of needing to run one for > each user row, because the innermost JSON object building needs > to be done for each (connected (if the query is not filtering on > specific users)) row of the “property table”, anyway. (And even > if filtered, that can be passed down.) > > bye, > //mirabilos > -- > Solange man keine schmutzigen Tricks macht, und ich meine *wirklich* > schmutzige Tricks, wie bei einer doppelt verketteten Liste beide > Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz > hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r > Alban Hertroys -- There is always an exception to always.
On Fri, 3 Mar 2023, Alban Hertroys wrote: >You can rewrite that into something like this: > >select jsonb_build_object('opening_times’, > obj > ORDER BY > obj->>'weekday’, > obj->>'from_hour’, > obj->>'to_hour') >) >from cot >cross join lateral jsonb_agg(jsonb_build_object( > 'weekday', cot.weekday, > 'from_hour', cot.from_hour, > 'to_hour', cot.to_hour) obj But isn’t that the same as with a regular LEFT JOIN? >> cgwaj AS ( >> SELECT cgwa.id AS id, jsonb_build_object( >> 'weekday', cgwa.weekday, >> 'forenoon', cgwa.forenoon, >> 'afternoon', cgwa.afternoon, >> 'evening', cgwa.evening) AS obj >> FROM core_generalworkavailability cgwa plus >> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id With the addition that I can aggregate… bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)
> On 3 Mar 2023, at 20:32, Thorsten Glaser <tg@evolvis.org> wrote: > > On Fri, 3 Mar 2023, Alban Hertroys wrote: > >> You can rewrite that into something like this: >> >> select jsonb_build_object('opening_times’, >> obj >> ORDER BY >> obj->>'weekday’, >> obj->>'from_hour’, >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_agg(jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj > > But isn’t that the same as with a regular LEFT JOIN? Similar, but not the same, I’d say. I do now notice that I made some copying errors there, I was a bit nauseous at that time. That should have read: >> select jsonb_build_object('opening_times’, >> jsonb_agg(obj >> ORDER BY >> obj->>'weekday’, >> obj->>'from_hour’, >> obj->>'to_hour') >> ) >> from cot >> cross join lateral jsonb_build_object( >> 'weekday', cot.weekday, >> 'from_hour', cot.from_hour, >> 'to_hour', cot.to_hour) obj The lateral join applies the function to each row returned from the left side of the join and enriches that row with thefunction result. I used a cross join because there is no join condition to apply to the lateral, otherwise you could also use an inner joinon true. I think you could also have used an implicit Cartesian product (using ‘,’ for the join), and that in that case the lateralwould be implied. I prefer explicit notation though. A left join wouldn’t make much sense here, unless the function could return NULL - for example if it were a function markedas STRICT and some of the input parameter values (from the table) could be NULL. >>> cgwaj AS ( >>> SELECT cgwa.id AS id, jsonb_build_object( >>> 'weekday', cgwa.weekday, >>> 'forenoon', cgwa.forenoon, >>> 'afternoon', cgwa.afternoon, >>> 'evening', cgwa.evening) AS obj >>> FROM core_generalworkavailability cgwa > > plus There are some differences. You need a sub-select, which in turn creates its own result set. It’s up to the planner whether the left or the right sidegets executed first, after which the results of the other side of the join get merged to this, or whether this can allbe collected in one go. That’s up to the query planner to decide though, and it could be right. >>> LEFT JOIN cgwaj ON cgwaj.id=cppwt.generalworkavailability_id > > With the addition that I can aggregate… You can do so in both situations, but I guess that confusion stems from my copy/paste mistake. In my experience, lateral joins go well with the jsonb functions. They tend to reduce code repetition when referencing objectmembers, such as in your case. Regards, Alban Hertroys -- There is always an exception to always.
On Sat, 4 Mar 2023, Alban Hertroys wrote: >> But isn’t that the same as with a regular LEFT JOIN? > >Similar, but not the same, I’d say. > >I do now notice that I made some copying errors there, I was a bit >nauseous at that time. np, I’m under pollen attack currently so also not at my best. >That should have read: > >>> select jsonb_build_object('opening_times’, >>> jsonb_agg(obj >>> ORDER BY >>> obj->>'weekday’, >>> obj->>'from_hour’, >>> obj->>'to_hour') >>> ) >>> from cot >>> cross join lateral jsonb_build_object( >>> 'weekday', cot.weekday, >>> 'from_hour', cot.from_hour, >>> 'to_hour', cot.to_hour) obj > >The lateral join applies the function to each row returned from the >left side of the join and enriches that row with the function result. I >used a cross join because there is no join condition to apply to the >lateral, otherwise you could also use an inner join on true. Okay, so I got it now. But that’s still identical to the LEFT JOIN which I’m using in that example, because… >A left join wouldn’t make much sense here, unless the function could … I have a LEFT JOIN already and can just use the CTE there, so I don’t have to add an extra lateral join. But good to know for the future/when I don’t have that. >return NULL - for example if it were a function marked as STRICT and >some of the input parameter values (from the table) could be NULL. OK. >You need a sub-select, which in turn creates its own result set. It’s >up to the planner whether the left or the right side gets executed >first, after which the results of the other side of the join get merged >to this, or whether this can all be collected in one go. That’s up to >the query planner to decide though, and it could be right. OK, but that shouldn’t make a difference here as it needs to run over all rows of the cgwa table anyway (possibly reduced by filtering on users). While not the case here, I see that for other entries the lateral join would cause more work: for the “qualification” kinds of tables, for example, the individual qualification table has very few rows (these are entered by the site admin), but the m:n connection table (whatever the correct name for these is) has a lot because many of the users have many of these qualifications. If I use a CTE to add a JSON object to the individual qualification table first, it doesn’t run on each qualification multiple times; if I use a lateral join, it possibly, modulo planner optimisations, runs the jsonb_build_object function many times per qualification despite them all giving the same result. And, even if the optimisations catch that, it’s mentally not the same. >In my experience, lateral joins go well with the jsonb functions. They >tend to reduce code repetition when referencing object members, such as >in your case. Right. In my case I can get the same by adding a CTE instead though, and it’s hard to see which is better, performance-wise. This is a lot to take in, and I r̲e̲a̲l̲l̲y̲ appreciate the detailed explanations given alongside ☻ bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)