Re: pg16: XX000: could not find pathkey item to sort - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: pg16: XX000: could not find pathkey item to sort |
Date | |
Msg-id | CAExHW5tu5Wz1149+73cOTTQqses73e_aJdhyxi-Yh=FNGXWOdA@mail.gmail.com Whole thread Raw |
In response to | Re: pg16: XX000: could not find pathkey item to sort (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: pg16: XX000: could not find pathkey item to sort
|
List | pgsql-hackers |
On Thu, Mar 14, 2024 at 3:45 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 14 Mar 2024 at 18:23, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> I don't understand why root->query_pathkeys has both a and b. "a" is there because of GROUP BY and ORDER BY clause. But why "b"?
So that the ORDER BY aggregate function can be evaluated without
nodeAgg.c having to perform the sort. See
adjust_group_pathkeys_for_groupagg().
Thanks. To me, it looks like we are gathering pathkeys, which if used to sort the result of overall join, would avoid sorting in as many as aggregates as possible.
relation_can_be_sorted_early() finds, pathkeys which if used to sort the given relation, would help sorting the overall join. Contrary to what I said earlier, it might help if the base relation is sorted on "a" and "b". What I find weird is that the sorting is not pushed down to the partitions, where it would help most.
#explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=362.21..398.11 rows=200 width=12)
Output: t.a, sum(t.b ORDER BY t.b)
Group Key: t.a
-> Sort (cost=362.21..373.51 rows=4520 width=8)
Output: t.a, t.b
Sort Key: t.a, t.b
-> Append (cost=0.00..87.80 rows=4520 width=8)
-> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8)
Output: t_1.a, t_1.b
-> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8)
Output: t_2.a, t_2.b
(11 rows)
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=362.21..398.11 rows=200 width=12)
Output: t.a, sum(t.b ORDER BY t.b)
Group Key: t.a
-> Sort (cost=362.21..373.51 rows=4520 width=8)
Output: t.a, t.b
Sort Key: t.a, t.b
-> Append (cost=0.00..87.80 rows=4520 width=8)
-> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8)
Output: t_1.a, t_1.b
-> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8)
Output: t_2.a, t_2.b
(11 rows)
and that's the case even without parallel plans
#explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=362.21..398.11 rows=200 width=12)
Output: t.a, sum(t.b ORDER BY t.b)
Group Key: t.a
-> Sort (cost=362.21..373.51 rows=4520 width=8)
Output: t.a, t.b
Sort Key: t.a, t.b
-> Append (cost=0.00..87.80 rows=4520 width=8)
-> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8)
Output: t_1.a, t_1.b
-> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8)
Output: t_2.a, t_2.b
(11 rows)
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=362.21..398.11 rows=200 width=12)
Output: t.a, sum(t.b ORDER BY t.b)
Group Key: t.a
-> Sort (cost=362.21..373.51 rows=4520 width=8)
Output: t.a, t.b
Sort Key: t.a, t.b
-> Append (cost=0.00..87.80 rows=4520 width=8)
-> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8)
Output: t_1.a, t_1.b
-> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8)
Output: t_2.a, t_2.b
(11 rows)
But it could be just because the corresponding plan was not found to be optimal. May be because there isn't enough data in those tables.
If the problem you speculate is different from this one, I am not able to see it. It might help give an example query or explain more.
--
Best Wishes,
Ashutosh Bapat
pgsql-hackers by date: