Re: pg16: XX000: could not find pathkey item to sort - Mailing list pgsql-hackers

From David Rowley
Subject Re: pg16: XX000: could not find pathkey item to sort
Date
Msg-id CAApHDvo7RzcQYw-gnkZr6QCijCqf8vJLkJ4XFk-KawvyAw109Q@mail.gmail.com
Whole thread Raw
In response to Re: pg16: XX000: could not find pathkey item to sort  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: pg16: XX000: could not find pathkey item to sort
List pgsql-hackers
On Tue, 3 Oct 2023 at 09:11, David Rowley <dgrowleyml@gmail.com> wrote:
> I'm concerned that this patch will be too much overhead when creating
> paths when a PathKey's EquivalenceClass has a large number of members
> from partitioned tables.

I just tried out the patch to see how much it affects the performance
of the planner.  I think we need to find a better way to strip off the
pathkeys for the columns that have been aggregated.

Setup:
create table lp (a int, b int) partition by list (a);
select 'create table lp'||x||' partition of lp for values in('||x||')'
from generate_series(0,999)x;
\gexec

\pset pager off
set enable_partitionwise_aggregate=1;

Benchmark query:
explain (summary on) select a,count(*) from lp group by a;

Master:
Planning Time: 23.945 ms
Planning Time: 23.887 ms
Planning Time: 23.927 ms

perf top:
   7.39%  libc.so.6         [.] __memmove_avx_unaligned_erms
   6.98%  [kernel]          [k] clear_page_rep
   5.69%  postgres          [.] bms_is_subset
   5.07%  postgres          [.] fetch_upper_rel
   4.41%  postgres          [.] bms_equal

Patched:
Planning Time: 41.410 ms
Planning Time: 41.474 ms
Planning Time: 41.488 ms

perf top:
 19.02%  postgres          [.] bms_is_subset
   6.91%  postgres          [.] find_ec_member_matching_expr
   5.93%  libc.so.6         [.] __memmove_avx_unaligned_erms
   5.55%  [kernel]          [k] clear_page_rep
   4.07%  postgres          [.] fetch_upper_rel
   3.46%  postgres          [.] bms_equal

> I wondered if we should instead just check
> if the subpath's pathkeys match root->group_pathkeys and if they do
> set the AggPath's pathkeys to list_copy_head(subpath->pathkeys,
> root->num_groupby_pathkeys),  that'll be much cheaper, but it just
> feels a bit too much like a special case.

I tried this approach (patch attached) and it does perform better than
the other patch:

create_agg_path_fix2.patch:
Planning Time: 24.357 ms
Planning Time: 24.293 ms
Planning Time: 24.259 ms

   7.45%  libc.so.6         [.] __memmove_avx_unaligned_erms
   6.90%  [kernel]          [k] clear_page_rep
   5.56%  postgres          [.] bms_is_subset
   5.38%  postgres          [.] bms_equal

I wonder if the attached patch is too much of a special case fix.  I
guess from the lack of complaints previously that there are no other
cases where we could possibly have pathkeys that belong to columns
that are aggregated.  I've not gone to much effort to see if I can
craft a case that hits this without the ORDER BY/DISTINCT aggregate
optimisation, however.

David

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: pg_stat_get_activity(): integer overflow due to (int) * (int) for MemoryContextAllocHuge()
Next
From: Michael Paquier
Date:
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss