Re: Fix generate_useful_gather_paths for parallel unsafe pathkeys - Mailing list pgsql-hackers

From James Coleman
Subject Re: Fix generate_useful_gather_paths for parallel unsafe pathkeys
Date
Msg-id CAAaqYe-m04ZGSYgvf3h-i_R6=PBkxSWHAFbAvxH8Y5p0wm1tWw@mail.gmail.com
Whole thread Raw
In response to Re: Fix generate_useful_gather_paths for parallel unsafe pathkeys  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Fix generate_useful_gather_paths for parallel unsafe pathkeys  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Sun, Nov 22, 2020 at 4:59 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 11/21/20 2:55 AM, James Coleman wrote:
> > Over on the -bugs list we had a report [1] of a seg fault with
> > incremental sort. The short of the investigation there was that a
> > subplan wasn't being serialized since it wasn't parallel safe, and
> > that attempting to initialize that subplan resulted in the seg fault.
> > Tom pushed a change to raise an ERROR when this occurs so that at
> > least we don't crash the server.
> >
> > There was some small amount of discussion about this on a thread about
> > a somewhat similar issue [2] (where volatile expressions were the
> > issue), but that thread resulted in a committed patch, and beyond that
> > it seems that this issue deserves its own discussion.
> >
> > I've been digging further into the problem, and my first question was
> > "why doesn't this result in a similar error but with a full sort when
> > incremental sort is disabled?" After all, we have code to consider a
> > gather merge + full sort on the cheapest partial path in
> > generate_useful_gather_paths(), and the plan that I get on Luis's test
> > case with incremental sort disabled has an full sort above a gather,
> > which presumably it'd be cheaper to push down into the parallel plan
> > (using gather merge instead of gather).
> >
> > It turns out that there's a separate bug here: I'm guessing that in
> > the original commit of generate_useful_gather_paths() we had a
> > copy/paste thinko in this code:
> >
> > /*
> >  * If the path has no ordering at all, then we can't use either
> >  * incremental sort or rely on implicit sorting with a gather
> >  * merge.
> >  */
> > if (subpath->pathkeys == NIL)
> >      continue;
> >
> > The comment claims that we should skip subpaths that aren't sorted at
> > all since we can't possibly use a gather merge directly or with an
> > incremental sort applied first. It's true that we can't do either of
> > those things unless the subpath is already at least partially sorted.
> > But subsequently we attempt to construct a gather merge path on top of
> > a full sort (for the cheapest path), and there's no reason to limit
> > that to partially sorted paths (indeed in the presence of incremental
> > sort it seems unlikely that that would be the cheapest path).
> >
> > We still don't want to build incremental sort paths if the subpath has
> > no pathkeys, but that will imply presorted_keys = 0, which we already
> > check.
> >
> > So 0001 removes that branch entirely. And, as expected, we now get a
> > gather merge + full sort the resulting error about subplan not being
> > initialized.
> >
>
> Yeah, that's clearly a thinko in generate_useful_gather_paths. Thanks
> for spotting it! The 0001 patch seems fine to me.
>
> > With that oddity out of the way, I started looking at the actually
> > reported problem, and nominally issue is that we have a correlated
> > subquery (in the final target list and the sort clause), and
> > correlated subqueries (not sure why exactly in this case; see [3])
> > aren't considered parallel-safe.
> >
> > As to why that's happening:
> > 1. find_em_expr_usable_for_sorting_rel doesn't exclude that
> > expression, and arguably correctly so in the general case since one
> > might (though we don't now) use this same kind of logic for
> > non-parallel plans.
> > 2. generate_useful_pathkeys_for_relation is noting that it'd be useful
> > to sort on that expression and sees it as safe in part due to the (1).
> > 3. We create a sort node that includes that expression in its output.
> > That seems pretty much the same (in terms of safety given generalized
> > input paths/plans, not in terms of what Robert brought up in [4]) as
> > what would happen in the prepare_sort_from_pathkeys call in
> > create_gather_merge_plan.
> >
> > So to fix this problem I've added the option to find_em_expr_for_rel
> > to restrict to parallel-safe expressions in 0002.
> >
>
> OK, that seems like a valid fix. I wonder if we can have EC with members
> mixing parallel-safe and parallel-unsafe expressions? I guess no, but
> it's more a feeling than a clear reasoning and so I wonder what would
> happen in such cases.

An immediately contrived case would be something like "t.x =
unsafe(t.y)". As to whether that can actually result in us wanting to
sort by the safe member before we can execute the unsafe member (e.g.,
in a filter), I'm less sure, but it seems at least plausible to me (or
I don't see anything inherently preventing it).

> > Given point 3 above (and the discussion with Robert earlier) above it
> > seems to me that there's a bigger problem here that just hasn't
> > happened to have been exposed yet: in particular the
> > prepare_sort_from_pathkeys call in create_gather_merge_plan seems
> > suspect to me. But it's also possible other usages of
> > prepare_sort_from_pathkeys could be suspect given other seemingly
> > unrelated changed to path generation, since nothing other than
> > implicit knowledge about the conventions here prevents us from
> > introducing paths generating sorts with expressions not in the target
> > list (in fact a part of the issue here IMO is that non-var expression
> > pathkeys are added to the target list after path generation and
> > selection is completed).
> >
> > Alternatively we could "fix" this by being even more conservative in
> > find_em_expr_usable_for_sorting_rel and disregard any expressions not
> > currently found in the target list. But that seems unsatisfactory to
> > me since, given we know that there are cases where
> > prepare_sort_from_paths is explicitly intended to add pathkey
> > expressions to the target list, we'd be excluding possible useful
> > cases (and indeed we already have, albeit contrived, test cases that
> > fail if that change is made).
> >
> > There exists a very similar path in the postgres fdw code (in fact the
> > function name is the same: get_useful_pathkeys_for_relation) since
> > find_em_expr_for_rel is much simpler (it doesn't do many safety checks
> > at all like we do in find_em_expr_usable_for_sorting_rel), but I think
> > that's safe (though I haven't thought about it a ton) because I
> > believe those are sort keys we're going to send to the foreign server
> > anyway, as opposed to sorting ourselves (but I haven't verified that
> > that's the case and that we never create sort nodes there).
> >
>
> Yeah. I think the various FDW-related restrictions may easily make that
> safe, for the reasons you already mentioned. But also because IIRC FDWs
> in general are not parallel-safe, so there's no risk of running foreign
> scans under Gather [Merge].

There are some comments (IIRC in the parallel docs) about FDWs being
able to be marked as parallel-safe, but I'm not sure if that facility
is in play. Either way, it's quite a different case.

> Thanks for the patches, I'll take a closer look next week. The 0002
> patch is clearly something we need to backpatch, not sure about 0001 as
> it essentially enables new behavior in some cases (Sort on unsorted
> paths below Gather Merge).

Thanks for taking a look.

I had the same question re: backporting. On the one hand it will
change behavior (this is always a bit of a gray area since in one
sense bugs change behavior), but IMO it's not a new feature, because
the code clearly intended to have that feature in the first place (it
creates gather merges on top of a full sort). So I'd lean towards
considering it a bug fix, but I'm also not going to make that a hill
to die on.

One semi-related question: do you think we should add a comment to
prepare_sort_from_pathkeys explaining that modifying it may mean
find_em_expr_for_rel needs to be updated also?

James



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: Why does create_gather_merge_plan need make_sort?
Next
From: Daniil Zakhlystov
Date:
Subject: Re: libpq compression