On Mon, Mar 30, 2020 at 2:10 PM Andres Freund <andres@anarazel.de> wrote:
> One important issue seems to me to be the size of the array that
> TransactionIdIsInProgress() allocates:
> /*
> * If first time through, get workspace to remember main XIDs in. We
> * malloc it permanently to avoid repeated palloc/pfree overhead.
> */
> if (xids == NULL)
> {
> /*
> * In hot standby mode, reserve enough space to hold all xids in the
> * known-assigned list. If we later finish recovery, we no longer need
> * the bigger array, but we don't bother to shrink it.
> */
> int maxxids = RecoveryInProgress() ? TOTAL_MAX_CACHED_SUBXIDS :
arrayP->maxProcs;
>
> xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));
> if (xids == NULL)
> ereport(ERROR,
> (errcode(ERRCODE_OUT_OF_MEMORY),
> errmsg("out of memory")));
> }
>
> Which I think means we'll just overrun the xids array in some cases,
> e.g. if KnownAssignedXids overflowed. Obviously we should have a
> crosscheck in the code (which we don't), but it was previously a
> supposedly unreachable path.
I think this patch needs to be reverted. The only places where it
changes anything are places where we were about to throw some error
anyway. But as Andres's analysis shows, that's not nearly good enough.
I am kind of surprised that Peter thought that would be good enough.
It is necessary, for something like this, to investigate all the
places where the code may be relying on a certain assumption, not just
assume that there's an error check everywhere that we rely on that
assumption and change only those places.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company