Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION
Date
Msg-id 20160225172958.t7pdlt4puydblchx@alap3.anarazel.de
Whole thread Raw
In response to Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION
List pgsql-bugs
Hi,

On 2016-02-25 12:20:06 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-02-25 09:51:49 -0500, Tom Lane wrote:
> >> Marking pgprocno volatile is silly.  What *is* missing is this:
> >>
> >> -    ProcArrayStruct *arrayP = procArray;
> >> +    volatile ProcArrayStruct *arrayP = procArray;
>
> > Well, that'll also force arrayP->numProcs to be loaded from memory every
> > loop iteration. Not sure if we really want that.
>
> I think we do.  The entire point here is not to assume that that storage
> isn't changing.

Well, but what does it buy us? Given there's no locks involved, the
index < arrayP->numProcs check fundamentally cannot be anything than an
optimization, preventing us from looking at all PGPROC/PGXACT entries,
no?  Reloading it at every loop iteration doesn't seem to give us
anything other than some sense of false security?  It's a) perfectly
possible that numProcs is outdated because we're hitting a local
cacheline whereas another backend has it modified locally (store
buffers) b) that it's decremented after we entered the loop.

> > What bothers me about this right now is that we currently write the
> > pgprocno array with:
> >     memmove(&arrayP->pgprocnos[index + 1], &arrayP->pgprocnos[index],
> >             (arrayP->numProcs - index) * sizeof(int));
> >     arrayP->pgprocnos[index] = proc->pgprocno;
> >     arrayP->numProcs++;
> > afaics there's absolutely no guarantee that memmov() will only use
> > aligned sizeof(int) writes.
>
> Ugh.  That's a separate problem, but yes, it's a problem.

While it's a separate bug, it seems to me that this could just as well
be the reason for crashes triggering this report :/. If pgprocno points
behind the end of the allPgXact array, it's quite possible that we're
reading zeroes :(

> Seems like we can either (1) get rid of that memmove in favor of
> a handwritten loop, or (2) give up on unlocked access to the
> pgprocnos array.  Which performance hit would you rather take?

I think 1), while it'll be noticeable, it's probably going to degrade
much more gracefully.

I think we additionally also should add a security check to
MinimumActiveBackends that prevents a pgprocno above the max number of
backends to be seen as valid.

- Andres

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION
Next
From: Christopher Browne
Date:
Subject: Re: Query-Sending mail from PostgresSQL