Re: SegFault on 9.6.14 - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: SegFault on 9.6.14
Date:
Msg-id: CA+hUKG+BO-inHQzeKVr4b9yxnUhGzgS=JKMiUZioyrULPV7P4Q@mail.gmail.com
In response to: Re: SegFault on 9.6.14 (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: SegFault on 9.6.14
List: pgsql-hackers
On Fri, Jul 26, 2019 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Right, that will be lesser code churn and it can also work. However,
> > one thing that needs some thought is till now es_top_eflags is only
> > set in ExecutorStart and same is mentioned in comments where it is
> > declared and it seems we are going to change that with this idea. How
> > about having a separate function ExecBlahShutdown which will clean up
> > resources as parallel context and can be called only from ExecutePlan
> > where we are calling ExecShutdownNode? I think both these and the
> > other solution we have discussed are on similar lines and another idea
> > could be to relax the assert which again is not a superb idea.
>
> It seems we don't have a clear preference for any particular solution
> among these and neither there appears to be any better idea. I guess
> we can wait for a few days to see if Robert has any views on this,
> otherwise, pick one of the above and move ahead.

I take the EXEC_FLAG_DONE idea back. It's ugly and too hard to verify that every appropriate path sets it, and a flag that means the opposite would be even more of a kluge. Generally, I think I was looking at this too myopically: I was looking for a way to shut down processes ASAP without giving up the shared memory we'll need for rescanning, but what I should have been looking at is the reason you did that in the first place: to get the instrumentation data.

Can you explain why it's necessary to do that explicitly for Limit? Wouldn't the right place to collect instrumentation be at the end of execution, when Shutdown will run in all cases anyway (and possibly also during ExecParallelReinitialize() or something like that, if it's being clobbered by rescans; I didn't check)? What's special about Limit?
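[Editorial sketch] The tension discussed here — shutting a child node down early to harvest instrumentation, versus keeping its resources alive for a possible rescan — can be modelled with a toy executor. This is plain Python with invented names (ScanNode, LimitNode, run), not PostgreSQL code; it only illustrates the hazard, assuming a shutdown that irrevocably frees a resource the rescan still needs.

```python
# Toy model of the early-shutdown-vs-rescan hazard. All names are
# invented for illustration; this is not PostgreSQL source code.

class ScanNode:
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0
        self.resource_open = True   # stands in for parallel shared memory
        self.tuples_emitted = 0     # stands in for instrumentation counters

    def next(self):
        if not self.resource_open:
            raise RuntimeError("scan used after shutdown")
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        self.tuples_emitted += 1
        return row

    def rescan(self):
        self.pos = 0   # note: does NOT reacquire the freed resource

    def shutdown(self):
        self.resource_open = False


class LimitNode:
    def __init__(self, child, count, eager_shutdown):
        self.child = child
        self.count = count
        # eager_shutdown mimics Limit calling shutdown on its subtree
        # as soon as the limit is reached, rather than at end of execution
        self.eager_shutdown = eager_shutdown
        self.returned = 0

    def next(self):
        if self.returned >= self.count:
            if self.eager_shutdown:
                self.child.shutdown()  # frees what a rescan still needs
            return None
        self.returned += 1
        return self.child.next()

    def rescan(self):
        self.returned = 0
        self.child.rescan()


def run(limit):
    out = []
    while (row := limit.next()) is not None:
        out.append(row)
    return out
```

With eager_shutdown=True the first pass succeeds but a rescan blows up on the already-released resource; deferring shutdown to end of execution makes both passes work, at the cost of collecting instrumentation later.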
Today while poking at this and trying to answer those questions for myself, I realised that the repro I posted earlier[1] crashes exactly as Jerry reported on REL9_6_STABLE, but in later release branches it runs to completion. That's because the crashing code was removed in commit 41b0dd98 "Separate reinitialization of shared parallel-scan state from ExecReScan.". So newer branches get past that problem, but they all spit out tons of each of these three warnings:

WARNING: buffer refcount leak: [172] (rel=base/12558/16390, blockNum=5, flags=0x93800000, refcount=1 2998)
...
WARNING: relcache reference leak: relation "join_bar" not closed
...
WARNING: Snapshot reference leak: Snapshot 0x7ff20383bfb0 still referenced
...

Oops. I don't know exactly why yet, but the problem goes away if you just comment out the offending ExecShutdownNode() call in nodeLimit.c. I tried to understand whether the buffer stats were wrong with that code commented out (Adrien Nayrat's original complaint[2]), but I ran out of time for debugging adventures today.

[1] https://www.postgresql.org/message-id/CA%2BhUKGJyqDp9FZSHLTjiNMcz-c6%3DRdStB%2BUjVZsR8wfHnJXy8Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/86137f17-1dfb-42f9-7421-82fd786b04a1%40anayrat.info

--
Thomas Munro
https://enterprisedb.com