Home > mailing lists

Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Date	March 24, 2021 14:39:42
Msg-id	1056468.1616596782@sss.pgh.pa.us Whole thread Raw
In response to	Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb (Amul Sul <sulamul@gmail.com>)
Responses	Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
List	pgsql-hackers

Tree view

Amul Sul <sulamul@gmail.com> writes:
> On Tue, Mar 23, 2021 at 8:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On closer inspection, I believe the true culprit is c6b92041d,
>> which did this:
>> -               heap_sync(state->rs_new_rel);
>> +               smgrimmedsync(state->rs_new_rel->rd_smgr, MAIN_FORKNUM);
>> heap_sync was careful about opening rd_smgr, the new code not so much.

> So we also need to make sure of the
> RelationOpenSmgr() call before smgrimmedsync() as proposed previously.

I wonder if we should try to get rid of this sort of bug by banning
direct references to rd_smgr?  That is, write the above and all
similar code like

    smgrimmedsync(RelationGetSmgr(state->rs_new_rel), MAIN_FORKNUM);

where we provide something like

static inline struct SMgrRelationData *
RelationGetSmgr(Relation rel)
{
    if (unlikely(rel->rd_smgr == NULL))
        RelationOpenSmgr(rel);
    return rel->rd_smgr;
}

and then we could get rid of most or all other RelationOpenSmgr calls.

This might create more code bloat than it's really worth, but
it'd be a simple and mechanically-checkable scheme.

            regards, tom lane

pgsql-hackers by date:

From: Alvaro Herrera
Date: 24 March 2021, 14:11:41
Subject: psql lacking clearerr()

From: Robert Haas
Date: 24 March 2021, 14:43:22
Subject: Re: pg_amcheck contrib application

Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb - Mailing list pgsql-hackers

Previous

Next