Re: Slow GroupAggregate and Sort - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Slow GroupAggregate and Sort
Msg-id CAMkU=1wJHC0rZrWTErUceFLez6bhn_1R2Y_OCEF0mRJPYeCRZg@mail.gmail.com
In response to Re: Slow GroupAggregate and Sort  (Darwin Correa <dcorrea@jedai.group>)
Responses Re: Slow GroupAggregate and Sort  (Darwin Correa <dcorrea@jedai.group>)
List pgsql-performance
On Mon, Jan 1, 2024 at 9:57 AM Darwin Correa <dcorrea@jedai.group> wrote:
Hello, Happy New Year! I've added my responses in blue.



---- On Thu, 28 Dec 2023 13:06:18 -0500, Jeff Janes <jeff.janes@gmail.com> wrote ----

I thought the point of sharding was to bring more CPU and RAM to bear than can feasibly be obtained in one machine.  Doesn't that make 24 shards per machine completely nuts?

Based on the Citus docs, the recommended number of shards is 2x the number of CPU cores. In my case I've tested with a few shards and with 1:1 and 2:1 shard ratios, but the query is always slow in the last step (sorting and grouping) on the master node.

That might make sense if PostgreSQL didn't do parallelization itself.  But according to your plan, PostgreSQL itself tries to parallelize 4 ways (although it fails, as it can't find any available workers) and then you have 24 nodes all doing the same thing, all with only 12 CPUs.  That doesn't seem good, although it now does seem unrelated to the issue at hand.


I'd break this down into more manageable chunks for investigation.  Populate one scratch table (on one node, not a hypertable) with all 2.6 million rows.  See how long it takes to populate it based on the citus query, and separately see how long it takes to run the aggregate query on the populated scratch table.

After the scratch table was filled, the sort took 32s. Explain plan: https://explain.dalibo.com/plan/8a3h26hcc6328c11

So that plan shows the sort to be egregiously slow, with no involvement of citus and no apparent reason for the slowness.  I'm thinking you have a pathological collation being used.  What is your default collation?  (Your DDL shows that no non-default collations are in use, but doesn't indicate what the default is.)
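
One way to check the default collation, and to test whether it explains the slow sort, might look like the sketch below. It is only an illustration under stated assumptions: the table name `scratch` and the column name `group_key` are hypothetical placeholders (substitute the real scratch table and sort column), and the helper functions are made up for this example. The `pg_database` columns `datcollate` and `datctype` and the `COLLATE "C"` clause are standard PostgreSQL. The Python only assembles the SQL text; the printed statements are meant to be run in psql.

```python
def quote_ident(name):
    """Quote an SQL identifier: wrap in double quotes, double any embedded ones."""
    return '"' + name.replace('"', '""') + '"'


# Reports the collation and ctype of the database the session is connected to.
DEFAULT_COLLATION_SQL = (
    "SELECT datname, datcollate, datctype "
    "FROM pg_database WHERE datname = current_database();"
)


def sort_timing_sql(table, column, collation=None):
    """Build an EXPLAIN ANALYZE statement that sorts one column.

    With collation=None the column's own (default) collation is used;
    passing "C" forces plain byte-wise comparison for the sort.
    """
    order_by = quote_ident(column)
    if collation is not None:
        order_by += " COLLATE " + quote_ident(collation)
    return (
        "EXPLAIN (ANALYZE, BUFFERS) "
        f"SELECT {quote_ident(column)} FROM {quote_ident(table)} "
        f"ORDER BY {order_by};"
    )


if __name__ == "__main__":
    # "scratch" and "group_key" are hypothetical names, not from the thread.
    print(DEFAULT_COLLATION_SQL)
    print(sort_timing_sql("scratch", "group_key"))
    print(sort_timing_sql("scratch", "group_key", collation="C"))
```

If the sort with `COLLATE "C"` turns out to be dramatically faster than the one using the default, that would point to the collation as the likely cause.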

Cheers,

Jeff
