Re: DSM robustness failure (was Re: Peripatus/failures) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: DSM robustness failure (was Re: Peripatus/failures)
Date
Msg-id CAEepm=0Oz7m_1EV8Bhc80mKn6XCXeFkRsVuFrXdFvjRcKBXR+A@mail.gmail.com
In response to DSM robustness failure (was Re: Peripatus/failures)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: DSM robustness failure (was Re: Peripatus/failures)  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Thu, Oct 18, 2018 at 9:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 2018-10-17 13:43:24.235 CDT [46467:6] LOG:  dynamic shared memory control segment is corrupt
> TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 181)
>
> It looks to me like what's happening is
>
> (1) crashing process corrupts the DSM control segment somehow.

I wonder how.  Apparently the mapped size was too small to hold the
header (the least likely explanation), control->magic was wrong, or
control->maxitems and control->nitems were inconsistent with each other
or with the mapped size.
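
For readers without dsm.c open, the conditions listed above can be
sketched roughly as follows.  This is a simplified, self-contained
illustration, not the code in the tree: the struct layout, the magic
value and all sketch_* names are assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the magic value here is arbitrary. */
#define SKETCH_CONTROL_MAGIC 0x12345678u

typedef struct sketch_control_item
{
	uint32_t	handle;
	uint32_t	refcnt;
} sketch_control_item;

typedef struct sketch_control_header
{
	uint32_t	magic;
	uint32_t	nitems;
	uint32_t	maxitems;
	sketch_control_item item[];	/* flexible array of slots */
} sketch_control_header;

/* Bytes needed for a header plus maxitems slots. */
static size_t
sketch_control_bytes_needed(uint32_t maxitems)
{
	return offsetof(sketch_control_header, item)
		+ sizeof(sketch_control_item) * (size_t) maxitems;
}

/* Returns false if any of the consistency conditions fails. */
static bool
sketch_control_segment_sane(const sketch_control_header *control,
							size_t mapped_size)
{
	assert(control != NULL);

	if (mapped_size < offsetof(sketch_control_header, item))
		return false;			/* mapping too small to read the header */
	if (control->magic != SKETCH_CONTROL_MAGIC)
		return false;			/* magic number does not match */
	if (sketch_control_bytes_needed(control->maxitems) > mapped_size)
		return false;			/* maxitems slots would not fit in the mapping */
	if (control->nitems > control->maxitems)
		return false;			/* more items in use than slots */
	return true;
}
```

Any one of the four tests failing is enough to produce the "corrupt"
verdict, which is why the log line alone does not say which field was
damaged.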

> (2) dsm_postmaster_shutdown notices that, bleats to the log, and
> figures its job is done.

Right, that seems to be the main problem.

> (3) dsm_postmaster_startup crashes on Assert because
> dsm_control_mapped_size isn't 0, because the old seg is still mapped.

Right.

> I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
> are broken here; the former because it makes no attempt to unmap
> the old control segment (which it oughta be able to do no matter how badly
> broken the contents are), and the latter because it should not let
> garbage old state prevent it from establishing a valid new segment.

Looking.
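
To make the two complaints concrete, here is a toy model of the
behaviour being asked for.  It is only a sketch under stated
assumptions, not a proposed patch: malloc/free stand in for mapping and
unmapping the control segment, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct sketch_state
{
	void	   *control;		/* stand-in for the mapped control segment */
	size_t		mapped_size;	/* 0 means "nothing mapped" */
} sketch_state;

/* Unmapping does not look at the contents, so corruption cannot block it. */
static void
sketch_unmap(sketch_state *st)
{
	assert(st != NULL);
	free(st->control);
	st->control = NULL;
	st->mapped_size = 0;
}

/* Shutdown: complain if the contents are bad, but unmap either way. */
static void
sketch_shutdown(sketch_state *st, bool contents_sane)
{
	if (!contents_sane)
		fprintf(stderr, "LOG:  control segment is corrupt\n");
	/* (orderly cleanup of tracked segments would go here when sane) */
	sketch_unmap(st);
}

/* Startup: discard stale state instead of asserting that none exists. */
static bool
sketch_startup(sketch_state *st, size_t size)
{
	if (st->mapped_size != 0)
		sketch_unmap(st);
	st->control = calloc(1, size);
	if (st->control == NULL)
		return false;
	st->mapped_size = size;
	return true;
}
```

In this model a shutdown that sees corrupt contents still leaves
mapped_size at 0, and a startup that finds leftover state throws it
away, so neither step depends on the other having behaved.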

> BTW, the header comment on dsm_postmaster_startup is a lie, which
> is probably not unrelated to its failure to consider this situation.

Agreed.

-- 
Thomas Munro
http://www.enterprisedb.com

