Re: O(1) DSM handle operations - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: O(1) DSM handle operations
Date
Msg-id CAEepm=0wa5qWAJgmFbHjFEMEFwtvSGM+7-Qib_MGeqq5j3f9ug@mail.gmail.com
In response to Re: O(1) DSM handle operations  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: O(1) DSM handle operations  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Mar 28, 2017 at 3:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 27, 2017 at 5:13 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> This is just a thought for discussion, no patch attached...
>>
>> DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
>> linear searches of the dsm_control->item array for either a free slot
>> or a slot matching a given handle.  Maybe no one thinks this is a
>> problem, because in practice the number of DSM slots you need to scan
>> should be something like number of backends * some small factor at
>> peak.
>
> One thing I thought about when designing the format of the DSM control
> segment was that we need to (attempt to) reread the old segment after
> recovering from a crash, even if it's borked.  With the current
> design, I think that nothing too bad can happen even if some or all of
> the old control segment has been overwritten with gibberish.  I mean,
> if we get particularly unlucky, we might manage to remove a DSM
> segment that some other cluster is using, but we'd have to be very
> unlucky for things to even get that bad, and we shouldn't crash
> outright.
>
> If we replace the array with some more complicated data structure,
> we'd have to be sure that reading it is robust against it having been
> scrambled by a previous crash.  Otherwise, it won't be possible to
> restart the cluster without manual intervention.

Couldn't the cleanup code continue to work in just the same way, though?
The only extra structure is an intrusive freelist, and that could be
completely ignored by code that wants to scan the whole array after a
crash.  It would only be used to find a free slot after a successful
restart, once the freelist has been rebuilt and is known to be sane, and
it could be sanity-checked when accessed by dsm_create().  So idea 2
doesn't seem to make that code any less robust, does it?
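
A minimal standalone sketch of what such an intrusive freelist could
look like follows.  It is not PostgreSQL code: control_t, slot_t,
NSLOTS, free_head, next_free and the three functions are hypothetical
names invented for illustration, a zero refcnt is assumed to mark a free
slot, and locking is left out entirely.  The point it tries to show is
that the rebuild step reads only the per-slot refcnt and never trusts
the links, so a scrambled freelist cannot mislead a full-array scan.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS 8                  /* hypothetical fixed size for the sketch */
#define INVALID_SLOT UINT32_MAX   /* end-of-list marker */

typedef struct
{
    uint32_t handle;      /* segment handle stored in this slot */
    uint32_t refcnt;      /* assumed convention: 0 means the slot is free */
    uint32_t next_free;   /* intrusive link, meaningful only while free */
} slot_t;

typedef struct
{
    uint32_t free_head;   /* index of first free slot, or INVALID_SLOT */
    slot_t   item[NSLOTS];
} control_t;

/*
 * Full-array scan, as cleanup after a crash would do.  It ignores
 * free_head and next_free completely and rebuilds them from refcnt.
 */
static void
rebuild_freelist(control_t *c)
{
    c->free_head = INVALID_SLOT;
    for (uint32_t i = NSLOTS; i-- > 0;)
    {
        if (c->item[i].refcnt == 0)
        {
            c->item[i].next_free = c->free_head;
            c->free_head = i;
        }
    }
}

/*
 * O(1) in the normal case.  If the head looks insane (out of range, or
 * pointing at a slot that is in use), fall back to the linear rebuild.
 */
static uint32_t
slot_alloc(control_t *c, uint32_t handle)
{
    uint32_t idx = c->free_head;

    if (idx != INVALID_SLOT &&
        (idx >= NSLOTS || c->item[idx].refcnt != 0))
    {
        rebuild_freelist(c);
        idx = c->free_head;
    }
    if (idx == INVALID_SLOT)
        return INVALID_SLOT;      /* no free slots */

    c->free_head = c->item[idx].next_free;
    c->item[idx].handle = handle;
    c->item[idx].refcnt = 1;
    return idx;
}

/* O(1): push the slot back on the front of the freelist. */
static void
slot_release(control_t *c, uint32_t idx)
{
    assert(idx < NSLOTS && c->item[idx].refcnt != 0);
    c->item[idx].refcnt = 0;
    c->item[idx].next_free = c->free_head;
    c->free_head = idx;
}
```

In this sketch a corrupted free_head costs one linear scan and nothing
more, which is no worse than what the existing search does on every
call.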

Deterministic key_t values for SysV IPC do seem problematic though,
for multiple PostgreSQL clusters.  Maybe that is a serious problem for
idea 1.

-- 
Thomas Munro
http://www.enterprisedb.com