Re: DSM robustness failure (was Re: Peripatus/failures) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: DSM robustness failure (was Re: Peripatus/failures)
Date
Msg-id CAEepm=0CHccUe35-G_9D-Xwpm_Nrpgqb6Z1jECc1tOo5ukh9Ow@mail.gmail.com
Whole thread Raw
In response to Re: DSM robustness failure (was Re: Peripatus/failures)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: DSM robustness failure (was Re: Peripatus/failures)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Oct 18, 2018 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The below code seems to be problemetic:
> dsm_cleanup_using_control_segment()
> {
> ..
> if (!dsm_control_segment_sane(old_control, mapped_size))
> {
> dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
> &mapped_address, &mapped_size, LOG);
> ..
> }
>
> Here, don't we need to use dsm_control_* variables instead of local
> variable mapped_* variables?

I was a little fuzzy on when exactly
dsm_cleanup_using_control_segment() and dsm_postmaster_shutdown() run,
but after some more testing I think I have this straight now.  You can
test by setting dsm_control->magic to 42 in a debugger and trying
three cases:

1.  Happy shutdown: dsm_postmaster_shutdown() complains on shutdown.
2.  kill -9 a non-postmaster process: dsm_postmaster_shutdown()
complains during auto-restart.
3.  kill -9 the postmaster, manually start up again:
dsm_cleanup_using_control_segment() runs.  It ignores the old segment
quietly if it doesn't pass the sanity test.

So to answer your question: no, dsm_cleanup_using_control_segment() is
case 3.  This entirely new postmaster process has never had the
segment mapped in, so the dsm_control_* variables are not relevant
here.

Hmm.... but if you're running N other independent clusters on the same
machine that started up after this cluster crashed in case 3, I think
there is an N-in-four-billion chance that the segment with that ID now
belongs to another cluster and happens to be its DSM control segment,
and therefore passes the magic-number sanity test, and then we'll nuke
it and all the segments it references.  Am I missing something?  Could
we fix that by putting some kind of cluster ID in the control segment?

So on second thoughts, I think I should remove the
dsm_cleanup_using_control_segment() change from the patch, because it
widens the risk of case 3 nuking someone else's stuff to include even
segments that don't pass the sanity test.  That's not good.  Hunk
removed.

> ... Do we need to reset the dsm_control_*
> variables in dsm_cleanup_for_mmap?

I don't think so -- that unlinks the files in file-backed DSM, but we
also call the other code in that case.

> @@ -336,13 +333,14 @@ dsm_postmaster_shutdown(int code, Datum arg)
>   * stray shared memory segments, but there's not much we can do about that
>   * if the metadata is gone.
>   */
> - nitems = dsm_control->nitems;
>   if (!dsm_control_segment_sane(dsm_control, dsm_control_mapped_size))
>   {
>   ereport(LOG,
>   (errmsg("dynamic shared memory control segment is corrupt")));
> - return;
> + nitems = 0;
>
> The comment above this code says:
> /*
> * If some other backend exited uncleanly, it might have corrupted the
> * control segment while it was dying.  In that case, we warn and ignore
> * the contents of the control segment.  This may end up leaving behind
> * stray shared memory segments, but there's not much we can do about that
> * if the metadata is gone.
> */
>
> Don't we need to adjust this comment, if we want to go with what you
> have proposed?

The comment seems correct: we ignore the *contents* of the control
segment.  But to be clearer I'll add a sentence to say that we still
attempt to destroy the control segment itself.

>  BTW, it doesn't appear insane to me that if the
> control segment got corrupted, then we leave it as it is rather than
> trying to destroy it.  I am not sure, but if it is corrupted, then is
> it guaranteed that destroy will always succeed?

You're right that it might fail.  If the shm_unlink() fails it doesn't
matter, we just produce LOG, but if the unmap() fails we still
consider the segment to be mapped (because it is!), so ... hmm, we'd
better forget about it if we plan to continue running, by clearing
those variables explicitly.  That's actually another pre-existing way
for the assertion to fail in current master.  (Realistically, that
unmmap() could only fail if our global variables are trashed so we try
to unmap() a junk address; AFAIK our model here is that the postmaster
is sane, even if everything else is insane, so that's sort of outside
the model.  But this new patch addresses it anyway.)

Thanks for the review!

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: DSM robustness failure (was Re: Peripatus/failures)
Next
From: Andrew Dunstan
Date:
Subject: lowering pg_regress privileges on Windows