Re: segfault with incremental sort - Mailing list pgsql-bugs

From James Coleman
Subject Re: segfault with incremental sort
Date
Msg-id CAAaqYe-97SFzDo98=OvoZa2oTEZb4jZ_xJKfp1FB-x5cgv9sNQ@mail.gmail.com
Whole thread Raw
In response to Re: segfault with incremental sort  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: segfault with incremental sort
List pgsql-bugs
On Mon, Nov 23, 2020 at 7:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 21, 2020 at 1:21 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Tue, Nov 3, 2020 at 11:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > 1. The SELECT DISTINCT gives rise to a sort key expression that
> > > contains non-parallel-safe SubPlans.  (It's not immediately apparent
> > > to me why we don't consider these particular subqueries parallel safe,
> > > but they aren't.  Anyway such a situation surely has to be allowed for.)
> >
> > parallel.sgml says that parallel query is excluded any time we have
> > "Plan nodes which reference a correlated SubPlan". That would include
> > this query, though I'm not sure why that's actually unsafe. I haven't
> > thought much about the general case, but this query itself looks like
> > it'd be safe.
> >
>
> IIRC, the reason was that for correlated subplans each time we need to
> send the param for execution to workers, and for that, we don't have
> an implementation yet. Basically, if the param size changes each time
> (say for varchar type of params), the amount of memory required would
> be different each time. It is not that we can't implement it but I
> think we have left it originally because we were not sure of the
> number of cases it can benefit and certainly it needs some more work.
> I am not following this and related discussions closely but can
> explain to me why you think the query/plan you are talking about is
> safe with respect to the above hazard?

Thanks for the explanation.

In this particular case we're not dealing with variable length fields
(it's an int), so that particular problem wouldn't inherently apply
(though I understand the generalized rule).

But I'm not really quite sure how sending params to workers (from the
leader I assume) is relevant here. In another thread you can see the
full plan [1], but effectively we have:

Gather Merge
  Sort
    Nested Loop
      Bitmap Heap Scan
      Index Only Scan
      Subplan 1
      Subplan 2

where the two subplans are returning the single result of a correlated
subquery (in a SELECT). As I understand it this doesn't require any
coordination with/from the leader at all; all of the information in
this case should actually be local to the individual worker. Each
worker needs to, for each tuple in its index scan, do another index
lookup based on that tuple.

James

1: https://www.postgresql.org/message-id/attachment/116274/explain_verbose.txt



pgsql-bugs by date:

Previous
From: James Coleman
Date:
Subject: Re: segfault with incremental sort
Next
From: Tom Lane
Date:
Subject: Re: BUG #16739: Temporary files not deleting from data folder on disk