Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb - Mailing list pgsql-hackers

From Neha Sharma
Subject Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Msg-id CANiYTQtb1WJ+ZyHdJr_FJDSDZDh89VkGLcbRBh4P4RrnnBDejg@mail.gmail.com
In response to Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb  (Amul Sul <sulamul@gmail.com>)
Responses Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers



On Tue, Mar 23, 2021 at 10:08 AM Amul Sul <sulamul@gmail.com> wrote:
On Mon, Mar 22, 2021 at 3:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 5:26 PM Amul Sul <sulamul@gmail.com> wrote:
> > In heapam_relation_copy_for_cluster(), begin_heap_rewrite() sets
> > rwstate->rs_new_rel->rd_smgr correctly, but on the next line
> > tuplesort_begin_cluster() gets called, which causes a system cache
> > invalidation. With the CCA setting, that wipes out
> > rwstate->rs_new_rel->rd_smgr, which is not restored for the subsequent
> > operations, and this causes the segmentation fault.
> >
> > Calling RelationOpenSmgr() before calling smgrimmedsync() in
> > end_heap_rewrite() would fix the failure. I did that in the attached patch.
>
> That makes sense.  I see a few commits in the git history adding
> RelationOpenSmgr() before a smgr* operation, whenever such a problem
> would have been discovered: 4942ee656ac, afa8f1971ae, bf347c60bdd7,
> for example.
>

Thanks for the confirmation.
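
For readers following the thread, the failure mode discussed above (a cached
rd_smgr pointer reset by a cache invalidation and then dereferenced) can be
modeled in a few lines of standalone C. This is only a simplified sketch, not
PostgreSQL code: ToyRelation, ToySmgr, and the toy_* functions are hypothetical
stand-ins for Relation, SMgrRelation, RelationOpenSmgr(), and an smgr* call
such as smgrimmedsync(). It assumes the reopen step is a cheap no-op when the
cached pointer is still set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SMgrRelation; not the real PostgreSQL struct. */
typedef struct ToySmgr
{
    int         nblocks;
} ToySmgr;

/* Hypothetical stand-in for Relation. */
typedef struct ToyRelation
{
    ToySmgr     storage;    /* underlying storage object, always present */
    ToySmgr    *rd_smgr;    /* cached handle; an invalidation may reset it */
} ToyRelation;

/* Models RelationOpenSmgr(): re-establish the cached handle if it is NULL. */
static void
toy_open_smgr(ToyRelation *rel)
{
    if (rel->rd_smgr == NULL)
        rel->rd_smgr = &rel->storage;
}

/* Models a cache invalidation, which a CCA build forces very frequently. */
static void
toy_invalidate(ToyRelation *rel)
{
    rel->rd_smgr = NULL;
}

/* Models an smgr* operation: it dereferences the handle unconditionally. */
static int
toy_sync(ToySmgr *smgr)
{
    return smgr->nblocks;
}

/*
 * Models the fixed pattern: reopen immediately before the smgr* operation,
 * so an invalidation that happened earlier cannot leave a NULL handle.
 */
static int
toy_end_rewrite(ToyRelation *rel)
{
    toy_open_smgr(rel);
    return toy_sync(rel->rd_smgr);
}
```

Without the toy_open_smgr() call in toy_end_rewrite(), calling it after
toy_invalidate() would dereference a NULL pointer, which is the analogue of
the reported segfault.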

> I do wonder if there are still other smgr* operations in the source
> code that are preceded by operations that would invalidate the
> SMgrRelation that those smgr* operations would be called with.  For
> example, the smgrnblocks() in gistBuildCallback() may run too long
> after the corresponding RelationOpenSmgr() on the index relation.
>

I checked gistBuildCallback() by adding Assert(index->rd_smgr) before
smgrnblocks() with the CCA setting and didn't see any problem there.

I think the easiest way to find such cases is to run the regression suite with
a CCA build. There is no guarantee that the regression tests will hit all
smgr* operations, but they might hit most of them.
 
Sure, I will do a regression run with CCA enabled.

Regards,
Amul

Regards,
Neha Sharma
