Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 - Mailing list pgsql-hackers

From James Coleman
Subject Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Date
Msg-id CAAaqYe8hst0zS4FDc-gS++u=NRkdKYEn5-zi8FLjDp2+DELBdw@mail.gmail.com
Whole thread Raw
In response to Re: "could not find pathkey item to sort" for TPC-DS queries 94-96  (James Coleman <jtc331@gmail.com>)
Responses Re: "could not find pathkey item to sort" for TPC-DS queries 94-96  (Luc Vlaming <luc@swarm64.com>)
List pgsql-hackers
On Wed, Apr 14, 2021 at 5:42 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Mon, Apr 12, 2021 at 8:37 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > On 4/12/21 2:24 PM, Luc Vlaming wrote:
> > > Hi,
> > >
> > > When trying to run on master (but afaik also PG-13) TPC-DS queries 94,
> > > 95 and 96 on a SF10 I get the error "could not find pathkey item to sort".
> > > When I disable enable_gathermerge the problem goes away and then the
> > > plan for query 94 looks like below. I tried figuring out what the
> > > problem is but to be honest I would need some pointers as the code that
> > > tries to matching equivalence members in prepare_sort_from_pathkeys is
> > > something i'm really not familiar with.
> > >
> >
> > Could be related to incremental sort, which allowed some gather merge
> > paths that were impossible before. We had a couple issues related to
> > that fixed in November, IIRC.
> >
> > > To reproduce you can either ingest and test using the toolkit I used too
> > > (see https://github.com/swarm64/s64da-benchmark-toolkit/), or
> > > alternatively just use the schema (see
> > > https://github.com/swarm64/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds/schemas/psql_native)
> > >
> >
> > Thanks, I'll see if I can reproduce that with your schema.
> >
> >
> > regards
> >
> > --
> > Tomas Vondra
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
>
> The query in question is:
>
> select  count(*)
>         from store_sales
>             ,household_demographics
>             ,time_dim, store
>         where ss_sold_time_sk = time_dim.t_time_sk
>             and ss_hdemo_sk = household_demographics.hd_demo_sk
>             and ss_store_sk = s_store_sk
>             and time_dim.t_hour = 15
>             and time_dim.t_minute >= 30
>             and household_demographics.hd_dep_count = 7
>             and store.s_store_name = 'ese'
>         order by count(*)
>         limit 100;
>
> From debugging output it looks like this is the plan being chosen
> (cheapest total path):
>         Gather(store_sales household_demographics time_dim) rows=60626
> cost=3145.73..699910.15
>                 HashJoin(store_sales household_demographics time_dim)
> rows=25261 cost=2145.73..692847.55
>                   clauses: store_sales.ss_hdemo_sk =
> household_demographics.hd_demo_sk
>                         HashJoin(store_sales time_dim) rows=252609
> cost=1989.73..692028.08
>                           clauses: store_sales.ss_sold_time_sk =
> time_dim.t_time_sk
>                                 SeqScan(store_sales) rows=11998564
> cost=0.00..658540.64
>                                 SeqScan(time_dim) rows=1070
> cost=0.00..1976.35
>                         SeqScan(household_demographics) rows=720
> cost=0.00..147.00
>
> prepare_sort_from_pathkeys fails to find a pathkey because
> tlist_member_ignore_relabel returns null -- which seemed weird because
> the sortexpr is an Aggref (in a single member equivalence class) and
> the tlist contains a single member that's also an Aggref. It turns out
> that the only difference between the two Aggrefs is that the tlist
> entry has "aggsplit = AGGSPLIT_INITIAL_SERIAL" while the sortexpr has
> aggsplit = AGGSPLIT_SIMPLE.
>
> That's as far as I've gotten so far, but I figured I'd get that info
> out to see if it means anything obvious to anyone else.

This really goes back to [1] where we fixed a similar issue by making
find_em_expr_usable_for_sorting_rel parallel the rules in
prepare_sort_from_pathkeys.

Most of those conditions got copied, and the case we were trying to
handle is the fact that prepare_sort_from_pathkeys can generate a
target list entry under those conditions if one doesn't exist. However
there's a further restriction there I don't remember looking at: it
uses pull_var_clause and tlist_member_ignore_relabel to ensure that
all of the vars that feed into the sort expression are found in the
target list. As I understand it, that is: it will build a target list
entry for something like "md5(column)" if "column" (and that was one
of our test cases for the previous fix) is in the target list already.

But there's an additional detail here: the call to pull_var_clause
requests aggregates, window functions, and placeholders be treated as
vars. That means for our Aggref case it would require that the two
Aggrefs be fully equal, so the differing aggsplit member would cause a
target list entry not to be built, hence our error here.

I've attached a quick and dirty patch that encodes that final rule
from prepare_sort_from_pathkeys into
find_em_expr_usable_for_sorting_rel. I can't help but think that
there's a cleaner way to do with this with less code duplication, but
hindering that is that prepare_sort_from_pathkeys is working with a
TargetList while find_em_expr_usable_for_sorting_rel is working with a
list of expressions.

James

1: https://www.postgresql.org/message-id/CAAaqYe9C3f6A_tZCRfr9Dm7hPpgGwpp4i-K_%3DNS9GWXuNiFANg%40mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: New IndexAM API controlling index vacuum strategies
Next
From: Peter Geoghegan
Date:
Subject: Re: New IndexAM API controlling index vacuum strategies