Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 - Mailing list pgsql-hackers

From James Coleman
Subject Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Date
Msg-id CAAaqYe8qA6hSiUA-QZcYmJct_jo+vJU2WfwWVKuSehOWw_dgpw@mail.gmail.com
Whole thread Raw
In response to Re: "could not find pathkey item to sort" for TPC-DS queries 94-96  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Apr 14, 2021 at 8:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 14, 2021 at 5:43 PM James Coleman <jtc331@gmail.com> wrote:
> > The query in question is:
> > select  count(*)
> >         from store_sales
> >             ,household_demographics
> >             ,time_dim, store
> >         where ss_sold_time_sk = time_dim.t_time_sk
> >             and ss_hdemo_sk = household_demographics.hd_demo_sk
> >             and ss_store_sk = s_store_sk
> >             and time_dim.t_hour = 15
> >             and time_dim.t_minute >= 30
> >             and household_demographics.hd_dep_count = 7
> >             and store.s_store_name = 'ese'
> >         order by count(*)
> >         limit 100;
> >
> > From debugging output it looks like this is the plan being chosen
> > (cheapest total path):
> >         Gather(store_sales household_demographics time_dim) rows=60626
> > cost=3145.73..699910.15
> >                 HashJoin(store_sales household_demographics time_dim)
> > rows=25261 cost=2145.73..692847.55
> >                   clauses: store_sales.ss_hdemo_sk =
> > household_demographics.hd_demo_sk
> >                         HashJoin(store_sales time_dim) rows=252609
> > cost=1989.73..692028.08
> >                           clauses: store_sales.ss_sold_time_sk =
> > time_dim.t_time_sk
> >                                 SeqScan(store_sales) rows=11998564
> > cost=0.00..658540.64
> >                                 SeqScan(time_dim) rows=1070
> > cost=0.00..1976.35
> >                         SeqScan(household_demographics) rows=720
> > cost=0.00..147.00
>
> This doesn't really make sense to me given the strack trace in the OP.
> That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop ->
> NestLoop -> GatherMerge -> Sort. If the plan were as you have it here,
> there would be no Sort and no Gather Merge, so where would be getting
> a failure related to pathkeys?

Ah, yeah, I'm not sure where the original stacktrace came from, but
here's the stack for the query I reproduced it with (perhaps it does
so on different queries or there was some other GUC change in the
reported plan):

#0  errfinish (filename=filename@entry=0x56416eefa845 "createplan.c",
lineno=lineno@entry=6186,
    funcname=funcname@entry=0x56416eefb660 <__func__.24872>
"prepare_sort_from_pathkeys") at elog.c:514
#1  0x000056416eb6ed52 in prepare_sort_from_pathkeys
(lefttree=0x564170552658, pathkeys=0x5641704f2640, relids=0x0,
reqColIdx=reqColIdx@entry=0x0,
    adjust_tlist_in_place=adjust_tlist_in_place@entry=false,
p_numsortkeys=p_numsortkeys@entry=0x7fff1252817c,
p_sortColIdx=0x7fff12528170,
    p_sortOperators=0x7fff12528168, p_collations=0x7fff12528160,
p_nullsFirst=0x7fff12528158) at createplan.c:6186
#2  0x000056416eb6ee69 in make_sort_from_pathkeys (lefttree=<optimized
out>, pathkeys=<optimized out>, relids=<optimized out>) at
createplan.c:6313
#3  0x000056416eb71fc7 in create_sort_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f650, flags=flags@entry=1)
    at createplan.c:2118
#4  0x000056416eb6f638 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f650,
flags=flags@entry=1) at createplan.c:489
#5  0x000056416eb72e06 in create_gather_merge_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f6e8) at createplan.c:1885
#6  0x000056416eb6f723 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f6e8,
flags=flags@entry=4) at createplan.c:541
#7  0x000056416eb726fb in create_agg_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f8c8) at createplan.c:2238
#8  0x000056416eb6f67b in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f8c8,
flags=flags@entry=3) at createplan.c:509
#9  0x000056416eb71f8e in create_sort_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054f560, flags=flags@entry=1)
    at createplan.c:2109
#10 0x000056416eb6f638 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054f560,
flags=flags@entry=1) at createplan.c:489
#11 0x000056416eb72c83 in create_limit_plan
(root=root@entry=0x564170511a70,
best_path=best_path@entry=0x56417054ffa0, flags=flags@entry=1)
    at createplan.c:2784
#12 0x000056416eb6f713 in create_plan_recurse
(root=root@entry=0x564170511a70, best_path=0x56417054ffa0,
flags=flags@entry=1) at createplan.c:536
#13 0x000056416eb6f79d in create_plan (root=root@entry=0x564170511a70,
best_path=<optimized out>) at createplan.c:349
#14 0x000056416eb7fe93 in standard_planner (parse=0x564170437268,
query_string=<optimized out>, cursorOptions=2048,
boundParams=<optimized out>)
    at planner.c:407

> I think if we can get the correct plan the thing to look at would be
> the tlists at the relevant levels of the plan.

Does the information in [1] help at all? The tlist does have an
Aggref, as expected, but its aggsplit value doesn't match the
pathkey's Aggref's aggsplit value.

James

1: https://www.postgresql.org/message-id/CAAaqYe_NU4hO9COoJdcXWqjtH%3DdGMknYdsSdJjZ%3DJOHPTea-Nw%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Next
From: James Coleman
Date:
Subject: Re: "could not find pathkey item to sort" for TPC-DS queries 94-96