Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20190707210205.4lxkbq6tpjcj2hlk@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Sun, Jul 07, 2019 at 09:01:43AM -0400, James Coleman wrote:
>On Sun, Jul 7, 2019 at 8:34 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote:
>> >On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra
>> ><tomas.vondra@2ndquadrant.com> wrote:
>> >>
>> >> On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote:
>> >> >
>> >> >Unrelated: if you or someone else you know that's more familiar with
>> >> >the parallel code, I'd be interested in their looking at the patch at
>> >> >some point, because I have a suspicion it might not be operating in
>> >...
>> >> So I've looked into that, and the reason seems fairly simple - when
>> >> generating the Gather Merge paths, we only look at paths that are in
>> >> partial_pathlist. See generate_gather_paths().
>> >>
>> >> And we only have sequential + index paths in partial_pathlist, not
>> >> incremental sort paths.
>> >>
>> >> IMHO we can do two things:
>> >>
>> >> 1) modify generate_gather_paths to also consider incremental sort for
>> >> each sorted path, similarly to what create_ordered_paths does
>> >>
>> >> 2) modify build_index_paths to also generate an incremental sort path
>> >> for each index path
>> >>
>> >> IMHO (1) is the right choice here, because it automatically does the
>> >> trick for all other types of ordered paths, not just index scans. So,
>> >> something like the attached patch, which gives me plans like this:
>> >...
>> >> But I'm not going to claim those are total fixes, it's the minimum I
>> >> needed to do to make this particular type of plan work.
>> >
>> >Thanks for looking into this!
>> >
>> >I intended to apply this to my most recent version of the patch (just
>> >sent a few minutes ago), but when I apply it I noticed that the
>> >partition_aggregate regression tests have several of these failures:
>> >
>> >ERROR:  could not find pathkey item to sort
>> >
>> >I haven't had time to look into the cause yet, so I decided to wait
>> >until the next patch revision.
>> >
>>
>> I wanted to investigate this today, but I can't reprodure it. How are
>> you building and running the regression tests?
>>
>> Attached is a patch adding the incremental sort below gather merge, and
>> also tweaking the costing. But that's mostly for and better planning
>> decisions, I don't get any pathkey errors even with the first patch.
>
>On 12be7f7f997debe4e05e84b69c03ecf7051b1d79 (the last patch I sent,
>which is based on top of 5683b34956b4e8da9dccadc2e3a53b86104ebb33), I
>did this:
>
>patch -p1 < ~/Downloads/parallel-incremental-sort.patch
><rebuild> (FWIW I configure with ./configure
>--prefix=$HOME/postgresql-test --enable-cassert --enable-debug
>--enable-depend CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer
>-DOPTIMIZER_DEBUG")
>make check-world
>
>And I get the attached regression failures.
>

OK, thanks. Apparently it's the costing changes that make it go away, if
I try just the patch that tweaks generate_gather_paths() I see the same
failures. The failure happens during plan construction, so I think the
costing changes simply mean the path with incremental sort end up not
being the cheapest one (for the problematic queries), but that's just
pure luck - it's definitely an issue that needs fixing.

That error message is triggered in two places in createplan.c, and after
changing them to Assert(false) I get a core dump with this backtrace:

#0  0x0000702b3328857f in raise () from /lib64/libc.so.6
#1  0x0000702b33272895 in abort () from /lib64/libc.so.6
#2  0x0000000000a59a9d in ExceptionalCondition (conditionName=0xc52e84 "!(0)", errorType=0xc51f96 "FailedAssertion",
fileName=0xc51fe6"createplan.c", lineNumber=5937) at assert.c:54
 
#3  0x00000000007d4ab5 in prepare_sort_from_pathkeys (lefttree=0x2bbbce0, pathkeys=0x2b7a130, relids=0x0,
reqColIdx=0x0,adjust_tlist_in_place=false, p_numsortkeys=0x7ffe1abcfd6c, p_sortColIdx=0x7ffe1abcfd60,
p_sortOperators=0x7ffe1abcfd58,p_collations=0x7ffe1abcfd50, 
 
    p_nullsFirst=0x7ffe1abcfd48) at createplan.c:5937
#4  0x00000000007d4e7f in make_incrementalsort_from_pathkeys (lefttree=0x2bbbce0, pathkeys=0x2b7a130, relids=0x0,
presortedCols=1)at createplan.c:6101
 
#5  0x00000000007cdd3f in create_incrementalsort_plan (root=0x2b787c0, best_path=0x2bb92b0, flags=1) at
createplan.c:2019
#6  0x00000000007cb7ad in create_plan_recurse (root=0x2b787c0, best_path=0x2bb92b0, flags=1) at createplan.c:469
#7  0x00000000007cd778 in create_gather_merge_plan (root=0x2b787c0, best_path=0x2bb94a0) at createplan.c:1764
#8  0x00000000007cb8fb in create_plan_recurse (root=0x2b787c0, best_path=0x2bb94a0, flags=4) at createplan.c:516
#9  0x00000000007cdf10 in create_agg_plan (root=0x2b787c0, best_path=0x2bb9b28) at createplan.c:2115
#10 0x00000000007cb834 in create_plan_recurse (root=0x2b787c0, best_path=0x2bb9b28, flags=3) at createplan.c:484
#11 0x00000000007cdc16 in create_sort_plan (root=0x2b787c0, best_path=0x2bba1e8, flags=1) at createplan.c:1986
#12 0x00000000007cb78e in create_plan_recurse (root=0x2b787c0, best_path=0x2bba1e8, flags=1) at createplan.c:464
#13 0x00000000007cb4ae in create_plan (root=0x2b787c0, best_path=0x2bba1e8) at createplan.c:330
#14 0x00000000007db63c in standard_planner (parse=0x2bf5bc8, cursorOptions=256, boundParams=0x0) at planner.c:413
#15 0x00000000007db3b4 in planner (parse=0x2bf5bc8, cursorOptions=256, boundParams=0x0) at planner.c:275
#16 0x00000000008e404f in pg_plan_query (querytree=0x2bf5bc8, cursorOptions=256, boundParams=0x0) at postgres.c:878
#17 0x0000000000657afa in ExplainOneQuery (query=0x2bf5bc8, cursorOptions=256, into=0x0, es=0x2bf5fc0,
queryString=0x2a74be8"EXPLAIN (COSTS OFF)\nSELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3
ORDERBY 1, 2, 3;", params=0x0, queryEnv=0x0) at explain.c:371
 
#18 0x0000000000657804 in ExplainQuery (pstate=0x2a95ff8, stmt=0x2a76628, queryString=0x2a74be8 "EXPLAIN (COSTS
OFF)\nSELECTa, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;", params=0x0,
queryEnv=0x0,dest=0x2a95f60) at explain.c:259
 
#19 0x00000000008ec75e in standard_ProcessUtility (pstmt=0x2b8a050, queryString=0x2a74be8 "EXPLAIN (COSTS OFF)\nSELECT
a,sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0,queryEnv=0x0, dest=0x2a95f60, 
 
    completionTag=0x7ffe1abd0430 "") at utility.c:675
#20 0x00000000008ebfbe in ProcessUtility (pstmt=0x2b8a050, queryString=0x2a74be8 "EXPLAIN (COSTS OFF)\nSELECT a,
sum(b),count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0,queryEnv=0x0, dest=0x2a95f60, 
 
    completionTag=0x7ffe1abd0430 "") at utility.c:360
#21 0x00000000008eafab in PortalRunUtility (portal=0x2adc558, pstmt=0x2b8a050, isTopLevel=true, setHoldSnapshot=true,
dest=0x2a95f60,completionTag=0x7ffe1abd0430 "") at pquery.c:1175
 
#22 0x00000000008eacb2 in FillPortalStore (portal=0x2adc558, isTopLevel=true) at pquery.c:1035
#23 0x00000000008ea60e in PortalRun (portal=0x2adc558, count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x2b8a148,altdest=0x2b8a148, completionTag=0x7ffe1abd0620 "") at pquery.c:765
 
#24 0x00000000008e45be in exec_simple_query (query_string=0x2a74be8 "EXPLAIN (COSTS OFF)\nSELECT a, sum(b), count(*)
FROMpagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;") at postgres.c:1215
 
#25 0x00000000008e8912 in PostgresMain (argc=1, argv=0x2aa07c0, dbname=0x2aa0530 "regression", username=0x2a707e8
"user")at postgres.c:4249
 
#26 0x000000000083ed83 in BackendRun (port=0x2a99c50) at postmaster.c:4431
#27 0x000000000083e551 in BackendStartup (port=0x2a99c50) at postmaster.c:4122
#28 0x000000000083a977 in ServerLoop () at postmaster.c:1704
#29 0x000000000083a223 in PostmasterMain (argc=8, argv=0x2a6e670) at postmaster.c:1377
#30 0x000000000075a823 in main (argc=8, argv=0x2a6e670) at main.c:228

I think it's pretty obvious what's happening - I'll resist the urge
to just write "The proof is obvious and is left to the reader as a
homework." because I always despised that during math lectures ;-)

We're running query like this:

  SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3

but we're trying to add the incremental sort *before* the aggregation,
because the optimizer also considers group aggregate with a sorted
input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
actually can do this, but clearly that's nonsense, because we can't
possibly know the aggregates yet. Hence the error.

If this is the actual issue, we need to ensure we actually can evaluate
all the pathkeys. I don't know how to do that yet. I thought that maybe
we should modify pathkeys_common_contained_in() to set presorted_keys to
0 in this case.

But then I started wondering why we don't see this issue even for
regular (non-incremental-sort) paths built in create_ordered_paths().
How come we don't see these failures there? I've modified costing to
make all incremental sort paths very cheap, and still nothing.

So presumably there's a check elsewhere (either implicit or explicit),
because create_ordered_paths() uses pathkeys_common_contained_in() and
does not have the same issue.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Steven Pousty
Date:
Subject: Re: Switching PL/Python to Python 3 by default in PostgreSQL 12
Next
From: Alexander Korotkov
Date:
Subject: Re: SQL/JSON path issues/questions