Thread: Re: O(1) DSM handle operations

Re: O(1) DSM handle operations

From
Robert Haas
Date:
On Mon, Mar 27, 2017 at 5:13 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> This is just a thought for discussion, no patch attached...
>
> DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
> linear searches of the dsm_control->item array for either a free slot
> or a slot matching a given handle.  Maybe no one thinks this is a
> problem, because in practice the number of DSM slots you need to scan
> should be something like number of backends * some small factor at
> peak.

One thing I thought about when designing the format of the DSM control
segment was that we need to (attempt to) reread the old segment after
recovering from a crash, even if it's borked.  With the current
design, I think that nothing too bad can happen even if some or all of
the old control segment has been overwritten with gibberish.  I mean,
if we get particularly unlucky, we might manage to remove a DSM
segment that some other cluster is using, but we'd have to be very
unlucky for things to even get that bad, and we shouldn't crash
outright.

If we replace the array with some more complicated data structure,
we'd have to be sure that reading it is robust against it having been
scrambled by a previous crash.  Otherwise, it won't be possible to
restart the cluster without manual intervention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: O(1) DSM handle operations

From
Thomas Munro
Date:
On Tue, Mar 28, 2017 at 3:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 27, 2017 at 5:13 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> This is just a thought for discussion, no patch attached...
>>
>> DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
>> linear searches of the dsm_control->item array for either a free slot
>> or a slot matching a given handle.  Maybe no one thinks this is a
>> problem, because in practice the number of DSM slots you need to scan
>> should be something like number of backends * some small factor at
>> peak.
>
> One thing I thought about when designing the format of the DSM control
> segment was that we need to (attempt to) reread the old segment after
> recovering from a crash, even if it's borked.  With the current
> design, I think that nothing too bad can happen even if some or all of
> the old control segment has been overwritten with gibberish.  I mean,
> if we get particularly unlucky, we might manage to remove a DSM
> segment that some other cluster is using, but we'd have to be very
> unlucky for things to even get that bad, and we shouldn't crash
> outright.
>
> If we replace the array with some more complicated data structure,
> we'd have to be sure that reading it is robust against it having been
> scrambled by a previous crash.  Otherwise, it won't be possible to
> restart the cluster without manual intervention.

Couldn't cleanup code continue to work just the same way though?  The
only extra structure is an intrusive freelist, but that could be
completely ignored by code that wants to scan the whole array after
crash.  It would only be used to find a free slot after successful
restart, once the freelist is rebuilt and known to be sane, and could
be sanity checked when accessed by dsm_create.  So idea 2 doesn't seem
to make that code any less robust, does it?

Deterministic key_t values for SysV IPC do seem problematic thought,
for multiple PostgreSQL clusters.  Maybe that is a serious problem for
idea 1.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: O(1) DSM handle operations

From
Robert Haas
Date:
On Mon, Mar 27, 2017 at 11:47 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Couldn't cleanup code continue to work just the same way though?  The
> only extra structure is an intrusive freelist, but that could be
> completely ignored by code that wants to scan the whole array after
> crash.  It would only be used to find a free slot after successful
> restart, once the freelist is rebuilt and known to be sane, and could
> be sanity checked when accessed by dsm_create.  So idea 2 doesn't seem
> to make that code any less robust, does it?

Agreed.

> Deterministic key_t values for SysV IPC do seem problematic thought,
> for multiple PostgreSQL clusters.  Maybe that is a serious problem for
> idea 1.

It might be.

Mostly, even if we had no PostgreSQL-imposed limits here at all, I
don't share your confidence that we create tons of DSM segments and
everything will just work.  I think we'll run into operating system
limits pretty quickly -- either hard limits, like limits on the number
of segments supported, or soft limits, like difficulties handling
large numbers of memory mappings with good performance.  I personally
think it would be more worthwhile to work on
http://postgr.es/m/CA+TgmoZVqXATOGEKFdnG_sMugx_iT8_B4L0OKJZeowHburMkiQ@mail.gmail.com
instead, but I will be a little too busy to write that patch this
week.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company