Re: enable_incremental_sort changes query behavior - Mailing list pgsql-hackers

From James Coleman
Subject Re: enable_incremental_sort changes query behavior
Date
Msg-id CAAaqYe_izhFTq1FLgUxa+J+_bY8FGBNiHjTj=EzrSA72tvacEw@mail.gmail.com
Whole thread Raw
In response to Re: enable_incremental_sort changes query behavior  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: enable_incremental_sort changes query behavior
List pgsql-hackers
On Fri, Oct 30, 2020 at 5:03 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Oct 30, 2020 at 01:26:08PM -0400, James Coleman wrote:
> >On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
> >> >On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
> >> ><jaime.casanova@2ndquadrant.com> wrote:
> >> >> Can you please create an entry in the commitfest for this one so we
> >> >> don't lose track of it?
> >> >
> >>
> >> We're not too far from the next minor release, so I've been looking at
> >> this fix again and I intend to get it committed shortly (on Monday or
> >> so). Attached is a slightly modified version of the v4 patch that:
> >>
> >> (a) Removes comments about projections added to code that is not
> >> directly related to the fix. I'm not against adding such comments
> >> separately.
> >
> >Seems fine. Do you want to commit them at the same time (just a
> >separate commit)? Or have a separate patch? It seems a bit overkill to
> >start a new thread just for those.
> >
>
> Probably sometime later, as a separate patch. I haven't thought very
> much about those comments, it just seemed unrelated to the fix so I've
> removed that for now. Let's not bother with this until after the minor
> release.

For whatever it's worth, I didn't originally consider them unrelated;
the purpose was to explain why those places were safe relative to the
same kinds of issues under discussion here: whether an expr will be in
the target and when it might need to be added.

EIther way I've split it into a separate patch in this series.

> >> (b) Fixes comment in expected output of incremental_sort test.
> >
> >Thanks.
> >
> >> (c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
> >> not quite needed thanks to the "return" in the "if" branch. IMO this
> >> makes it more elegant.
> >
> >No objection.
> >
> >> I do have two questions about find_em_expr_usable_for_sorting_rel.
> >>
> >> (a) Where does the em_is_const check come from? The comment claims we
> >> should not be trying to sort by equivalence class members, but I can't
> >> convince myself it's actually true and I don't see it discussed in this
> >> thread.
> >
> >That comes from find_ec_member_for_tle which is called by
> >prepare_sort_from_pathkeys which we note in the comments contains the
> >set of rules we need to mirror.
> >
>
> Thanks for the pointer. I'll take a look.
>
> >> (b) In find_em_expr_for_rel, which was what was used before, the
> >> condition on relids was this:
> >>
> >>      if (bms_is_subset(em->em_relids, rel->relids) &&
> >>          !bms_is_empty(em->em_relids))
> >>      {
> >>          return em->em_expr;
> >>      }
> >>
> >> but here we're using only
> >>
> >>      if (!bms_is_subset(em->em_relids, rel->relids))
> >>          continue;
> >>
> >> Isn't this missing the second bms_is_empty condition?
> >
> >I definitely remember intentionally removing that condition. For one,
> >it's not present in prepare_sort_from_pathkeys. My memory is a bit
> >fuzzy on all of the whys, but isn't it the case that if we've already
> >excluded const expressions, that we must have vars (and therefore
> >rels) present, and therefore the relids bms must be non-empty?
> >Probably more importantly, we're going to check that an actual em expr
> >matches, which means the bms subset check is really just an
> >optimization to avoid unnecessarily scanning the exprs for equality.
> >Perhaps it's worth adding a comment saying as such?
> >
> >By the way, the fact that this is parallel to
> >prepare_sort_from_pathkeys is the reason the XXX comment is still
> >there about possibly needing to remove relabel types (since
> >prepare_sort_from_pathkeys does that). I don't think it's a hard
> >requirement: the worst case is that by not digging into relabel types
> >we're being unnecessarily strict.
> >
> >Both of these I can add if desired.
> >
>
> Yeah, it'd be good to explain the reasoning why it's fine to have the
> conditions like this. Thanks.

I was looking at this some more, and I'm still convinced that this is
correct, but I don't think a comment about it being an optimization
(though I suspect that that its major benefit), since it came from,
and must parallel, find_ec_member_for_tle, and there is no such
explanation there. I don't want to backfill reasoning onto that
decision, but I do think it's important to parallel it since it's
ultimately what is going to cause errors if we're out of sync with it.

As to why find_em_expr_for_rel is different, I think the answer is
that it's used by the FDW code, and it seems to me to make sense that
in that case we'd only send sort conditions to the foreign server if
the em actually contains rels.

The new series attached includes the primary fix for this issue, the
additional comments that could either go in at the same time or not,
and my patch with semi-related questions (which isn't intended to be
committed, but to keep track of the questions).

James

Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: making update/delete of inheritance trees scale better
Next
From: Tomas Vondra
Date:
Subject: Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?