Re: SegFault on 9.6.14 - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: SegFault on 9.6.14 |
Date | |
Msg-id | CAA4eK1KXZwH+3ZsQi7PMQo53o+tDkK_gi=ASbPThAySsUg4cYA@mail.gmail.com Whole thread Raw |
In response to | Re: SegFault on 9.6.14 (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: SegFault on 9.6.14
|
List | pgsql-hackers |
On Thu, Oct 17, 2019 at 10:51 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Sep 13, 2019 at 1:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Robert, Thomas, do you have any more suggestions related to this. I > > > am planning to commit the above-discussed patch (Forbid Limit node to > > > shutdown resources.) coming Monday, so that at least the reported > > > problem got fixed. > > > > I think that your commit message isn't very clear about what the > > actual issue is. And the patch itself doesn't add any comments or > > anything to try to clear it up. So I wouldn't favor committing it in > > this form. > > Is the proposed commit message at the bottom of this email an improvement? > > Do I understand correctly that, with this patch, we can only actually > lose statistics in the case where we rescan? > No, it will lose without rescan as well. To understand in detail, you might want to read the emails pointed by me in one of the above email [1] in this thread. > That is, precisely the > case that crashes (9.6) or spews warnings (10+)? In a quick > non-rescan test with the ExecShutdownNode() removed, I don't see any > problem with the buffer numbers on my screen: > Try by removing aggregate function. Basically, the Limit node has to finish before consuming all the rows sent by a parallel node beneath it. > > === > Don't shut down Gather[Merge] early under Limit. > > Revert part of commit 19df1702f5. > > Early shutdown was added by that commit so that we could collect > statistics from workers, but unfortunately it interacted badly with > rescans. Rescanning a Limit over a Gather node could produce a SEGV > on 9.6 and resource leak warnings on later releases. By reverting the > early shutdown code, we might lose statistics in some cases of Limit > over Gather, but that will require further study to fix. > How about some text like below? I have added slightly different text to explain the reason for the problem. "Early shutdown was added by that commit so that we could collect statistics from workers, but unfortunately, it interacted badly with rescans. The problem is that we ended up destroying the parallel context which is required for rescans. This leads to rescans of a Limit node over a Gather node to produce unpredictable results as it tries to access destroyed parallel context. By reverting the early shutdown code, we might lose statistics in some cases of Limit over Gather, but that will require further study to fix." I am not sure but we can even add a comment in the place where we are removing some code (in nodeLimit.c) to indicate that 'Ideally we should shutdown parallel resources here to get the correct stats, but that would lead to rescans misbehaving when there is a Gather [Merge] node beneath it. (Explain the reason for misbehavior and the ideas we discussed in this thread to fix the same) ........." I can try to come up with comments in nodeLimit.c on the above lines if we think that is a good idea? [1] - https://www.postgresql.org/message-id/CAA4eK1Ja-eoavXcr0eq7w7hP%3D64VP49k%3DNMFxwhtK28NHfBOdA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: