Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 - Mailing list pgsql-hackers

From James Coleman
Subject Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Date
Msg-id CAAaqYe90ASC2deAaVou6ev5ppXMGG_epWK+D5yeqpwxG3-OEzw@mail.gmail.com
Whole thread Raw
In response to Re: "could not find pathkey item to sort" for TPC-DS queries 94-96  (Luc Vlaming <luc@swarm64.com>)
Responses Re: "could not find pathkey item to sort" for TPC-DS queries 94-96  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Thu, Apr 15, 2021 at 5:33 AM Luc Vlaming <luc@swarm64.com> wrote:
>
> On 15-04-2021 04:01, James Coleman wrote:
> > On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331@gmail.com> wrote:
> >>
> >> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
> >> <tomas.vondra@enterprisedb.com> wrote:
> >>>
> >>> On 4/12/21 2:24 PM, Luc Vlaming wrote:
> >>>> Hi,
> >>>>
> >>>> When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
> >>>> 95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
> >>>> When I disable enable_gathermerge the problem goes away and then the
> >>>> plan for query 94 looks like below. I tried figuring out what the
> >>>> problem is but to be honest I would need some pointers as the code that
> >>>> tries to matching equivalence members in prepare_sort_from_pathkeys is
> >>>> something i'm really not familiar with.
> >>>>
> >>>
> >>> Could be related to incremental sort, which allowed some gather merge
> >>> paths that were impossible before. We had a couple issues related to
> >>> that fixed in November, IIRC.
> >>>
> >>>> To reproduce you can either ingest and test using the toolkit I used too
> >>>> (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
> >>>> alternatively just use the schema (see
> >>>> https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)
> >>>>
> >>>
> >>> Thanks, I'll see if I can reproduce that with your schema.
> >>>
> >>>
> >>> regards
> >>>
> >>> --
> >>> Tomas Vondra
> >>> EnterpriseDB: http://www.enterprisedb.com
> >>> The Enterprise PostgreSQL Company
> >>
> >> The query in question is:
> >>
> >> select  count(*)
> >>          from store_sales
> >>              ,household_demographics
> >>              ,time_dim, store
> >>          where ss_sold_time_sk = time_dim.t_time_sk
> >>              and ss_hdemo_sk = household_demographics.hd_demo_sk
> >>              and ss_store_sk = s_store_sk
> >>              and time_dim.t_hour = 15
> >>              and time_dim.t_minute >= 30
> >>              and household_demographics.hd_dep_count = 7
> >>              and store.s_store_name = 'ese'
> >>          order by count(*)
> >>          limit 100;
> >>
> >>  From debugging output it looks like this is the plan being chosen
> >> (cheapest total path):
> >>          Gather(store_sales household_demographics time_dim) rows=60626
> >> cost=3145.73..699910.15
> >>                  HashJoin(store_sales household_demographics time_dim)
> >> rows=25261 cost=2145.73..692847.55
> >>                    clauses: store_sales.ss_hdemo_sk =
> >> household_demographics.hd_demo_sk
> >>                          HashJoin(store_sales time_dim) rows=252609
> >> cost=1989.73..692028.08
> >>                            clauses: store_sales.ss_sold_time_sk =
> >> time_dim.t_time_sk
> >>                                  SeqScan(store_sales) rows=11998564
> >> cost=0.00..658540.64
> >>                                  SeqScan(time_dim) rows=1070
> >> cost=0.00..1976.35
> >>                          SeqScan(household_demographics) rows=720
> >> cost=0.00..147.00
> >>
> >> prepare_sort_from_pathkeys fails to find a pathkey because
> >> tlist_member_ignore_relabel returns null -- which seemed weird because
> >> the sortexpr is an Aggref (in a single member equivalence class) and
> >> the tlist contains a single member that's also an Aggref. It turns out
> >> that the only difference between the two Aggrefs is that the tlist
> >> entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
> >> aggsplit = AGGSPLIT_SIMPLE.
> >>
> >> That's as far as I've gotten so far, but I figured I'd get that info
> >> out to see if it means anything obvious to anyone else.
> >
> > This really goes back to [1] where we fixed a similar issue by making
> > find_em_expr_usable_for_sorting_rel parallel the rules in
> > prepare_sort_from_pathkeys.
> >
> > Most of those conditions got copied, and the case we were trying to
> > handle is the fact that prepare_sort_from_pathkeys can generate a
> > target list entry under those conditions if one doesn't exist. However
> > there's a further restriction there I don't remember looking at: it
> > uses pull_var_clause and tlist_member_ignore_relabel to ensure that
> > all of the vars that feed into the sort expression are found in the
> > target list. As I understand it, that is: it will build a target list
> > entry for something like "md5(column)" if "column" (and that was one
> > of our test cases for the previous fix) is in the target list already.
> >
> > But there's an additional detail here: the call to pull_var_clause
> > requests aggregates, window functions, and placeholders be treated as
> > vars. That means for our Aggref case it would require that the two
> > Aggrefs be fully equal, so the differing aggsplit member would cause a
> > target list entry not to be built, hence our error here.
> >
> > I've attached a quick and dirty patch that encodes that final rule
> > from prepare_sort_from_pathkeys into
> > find_em_expr_usable_for_sorting_rel. I can't help but think that
> > there's a cleaner way to do with this with less code duplication, but
> > hindering that is that prepare_sort_from_pathkeys is working with a
> > TargetList while find_em_expr_usable_for_sorting_rel is working with a
> > list of expressions.
> >
> > James
> >
> > 1: https://www.postgresql.org/message-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_%3DNS9GWXuNiFANg%40mail.gmail.com
> >
>
> Hi,
>
> The patch seems to make the planner proceed and not error out anymore.
> Cannot judge if it's doing the right thing however or if its enough :)
> It works for me for all reported queries however (queries 94-96).
>
> And sorry for the confusion wrt the stacktrace and plan. I tried to
> produce a plan to possibly help with debugging but that would ofc then
> not have the problem of the missing sortkey as otherwise i cannot
> present a plan :) The stacktrace was however correct, and the plan
> considered involved a gather-merge with a sort. Unfortunately I could
> not (easily) get the plan outputted in the end; even when setting the
> costs to 0 somehow...
>
> Regards,
> Luc

Same patch, but with a test case now.

James

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Forget close an open relation in ReorderBufferProcessTXN()
Next
From: Matthias van de Meent
Date:
Subject: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation