Thread: Multiple Aggregations Order

Multiple Aggregations Order

From
João Haas
Date:
Hi there,

I'm working on a query where I need to fetch information from a table along with some data from a many-to-many connection table in a single query. My idea is to do an outer join with the connection query and aggregate the needed data in multiple 'array_agg's, and then handle this aggregated data later in code.

The issue is, there are a lot of aggs (4 by now, may increase later), and I need to order these by a 'order' field on the connection table. I can put an 'ORDER BY "order"' statement inside each 'array_agg', but I don't think that would be the most efficient way. Doing the join with a sorted connection table didn't work for me as well, probably due to other joins on the query. I tried doing some stuff with subqueries, but all attempts ended up in either failure or increased query time.

The structure is a graph based structure, and both query and structure themselves are quite complex, but a simplified example would look like this:

CREATE TABLE tb (
id SERIAL PRIMARY KEY
);
CREATE TABLE conn (
parent_id INTEGER,
child_id INTEGER,
"order" INTEGER,
kind INTEGER
);

INSERT INTO tb VALUES (1), (2), (3);
INSERT INTO conn VALUES (1, 2, 2, 10), (1, 3, 1, 20);

SELECT tb.*, array_agg(conn.child_id), array_agg(conn.kind)
FROM tb
LEFT OUTER JOIN conn
ON conn.parent_id = tb.id
GROUP BY tb.id;

Not sure if there's really a solution, but does anyone have any idea on how to solve this issue without doing multiple 'ORDER BY's inside each aggregation?

Thank you!
--
João C. Haas

Re: Multiple Aggregations Order

From
Michael Lewis
Date:
"handle this aggregated data later in code"

What is your end goal though? Also, approx how many rows in these tables? Can you share an example query and plan? What version are you using?

Re: Multiple Aggregations Order

From
"David G. Johnston"
Date:
On Tuesday, January 14, 2020, João Haas <joao.ca.haas@gmail.com> wrote:
SELECT tb.*, array_agg(conn.child_id), array_agg(conn.kind)


Create a custom type (using row(...) might work...?) with the relevant fields and “...array_agg((child_id, kind)::custom_type order by ...”?

David J.

Re: Multiple Aggregations Order

From
João Haas
Date:
I'm aiming to serialize some graph data into a JSON format, and some of the data needed for the serialization is in the relation tables, like, "this node connects to this other node in this way". These are served to IOT devices and the data changes a lot, so there's a ton of requests and caching is not that efficient. Due to that, it would be ideal if I could fetch everything in a single query, with the data needed aggregated in arrays, so that I can denormalize them later in code (eg.: zip(data['child_id_set'], data['child_kind_set']) in python).

Each query should have from 100~1000 items from the 'tb' table. The amount of child relations each item has vary a lot depending on the node, so it can be from 1~10000.

The tables themselves have some millions of rows, but I don't have access to the production database to check how many exactly. Due to this, although I can share the query plan, it wouldn't be so meaningful, since it is considering my dev env, which only have ~100 rows, and it's throwing seq scans for everything.

The query is the following:
WITH RECURSIVE tree(tree_id, tree_path) AS (
    SELECT "conn"."child_id",
           ARRAY["conn"."parent_id"]::integer[]
      FROM "conn"
     WHERE "parent_id" IN (643) -- Starting point of graph
     UNION
    SELECT DISTINCT ON ("conn"."child_id")
           "conn"."child_id",
           "tree"."tree_path" || "conn"."parent_id"
      FROM "tree", "conn"
     WHERE "conn"."parent_id" = "tree"."tree_id"
       AND NOT "conn"."child_id" = ANY("tree"."tree_path")
) SELECT "tb".*,
         array_length("tree"."tree_path", 1) AS "depth",
         array_agg("conn"."child_id" ORDER BY ("conn"."order", "conn"."kind")) FILTER (WHERE "conn"."child_id" IS NOT NULL) AS "child_id_set",
         array_agg("conn"."kind" ORDER BY ("conn"."order", "conn"."kind")) FILTER (WHERE "conn"."child_id" IS NOT NULL) AS "child_kind_set",
         array_agg("conn"."restrictions" ORDER BY ("conn"."order", "conn"."kind")) FILTER (WHERE "conn"."child_id" IS NOT NULL) AS "child_restrictions_set",
         array_agg("conn"."meta" ORDER BY ("conn"."order", "conn"."kind")) FILTER (WHERE "conn"."child_id" IS NOT NULL) AS "child_meta_set"
    FROM "tb"
    LEFT OUTER JOIN "conn"
      ON "tb"."id" = "conn"."parent_id"
   INNER JOIN (SELECT DISTINCT ON ("tree_id") * FROM "tree") AS "tree"
      ON "tree"."tree_id" = "tb"."id"
GROUP BY "tb"."id", "tree"."tree_path";

I'm currently using Postgres 11

On Tue, Jan 14, 2020 at 4:49 PM Michael Lewis <mlewis@entrata.com> wrote:
"handle this aggregated data later in code"

What is your end goal though? Also, approx how many rows in these tables? Can you share an example query and plan? What version are you using?

Re: Multiple Aggregations Order

From
João Haas
Date:
I also thought about that. The 'issue', is that when you call array_agg in a row/type, it casts the entire thing in a string. But some of the aggregation fields are jsonb fields, which are also casted to strings, with extra escapes. Then, I'm also using Django, which seems to mess the string even more. I could try to denormalize this whole mess, but I don't think the overhead of sorting each agg is enough for this. Also, Django handles these extra fields really well on raw queries, so it's not an issue having a lot of fields.

On Tue, Jan 14, 2020 at 5:43 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, January 14, 2020, João Haas <joao.ca.haas@gmail.com> wrote:
SELECT tb.*, array_agg(conn.child_id), array_agg(conn.kind)


Create a custom type (using row(...) might work...?) with the relevant fields and “...array_agg((child_id, kind)::custom_type order by ...”?

David J.

Re: Multiple Aggregations Order

From
Thomas Kellerer
Date:
João Haas schrieb am 14.01.2020 um 18:26:
> I'm working on a query where I need to fetch information from a table
> along with some data from a many-to-many connection table in a single
> query. My idea is to do an outer join with the connection query and
> aggregate the needed data in multiple 'array_agg's, and then handle
> this aggregated data later in code.
>
> The issue is, there are a lot of aggs (4 by now, may increase later),
> and I need to order these by a 'order' field on the connection table.
> I can put an 'ORDER BY "order"' statement inside each 'array_agg',
> but I don't think that would be the most efficient way. Doing the
> join with a sorted connection table didn't work for me as well,
> probably due to other joins on the query. I tried doing some stuff
> with subqueries, but all attempts ended up in either failure or
> increased query time.
>

What about aggregating into a single jsonb array?
You lose some of the data type information, but maybe that's OK for the backend that processes the data.

Something along the lines:

   SELECT tb.*,
          array_length(tree.tree_path, 1) AS depth,
          jsonb_agg(jsonb_build_object('child_id', conn.child_id, 'kind', conn.kind, 'restrictions', conn.restrictions)
orderby conn."order") 
   FROM tb
   ...
   GROUP BY ...