Re: pgsql: Improve handling of parameter differences in physical replication - Mailing list pgsql-hackers
From:           Fujii Masao
Subject:        Re: pgsql: Improve handling of parameter differences in physical replication
Date:
Msg-id:         b8802e12-85d3-4784-142d-24671c9eebfd@oss.nttdata.com
In response to: Re: pgsql: Improve handling of parameter differences in physical replication (Andres Freund <andres@anarazel.de>)
List:           pgsql-hackers
On 2020/03/31 3:10, Andres Freund wrote:
> Hi,
>
> On 2020-03-30 19:41:43 +0900, Fujii Masao wrote:
>> On 2020/03/30 16:58, Peter Eisentraut wrote:
>>> Improve handling of parameter differences in physical replication
>>>
>>> When certain parameters are changed on a physical replication primary,
>>> this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
>>> record.  The standby then checks whether its own settings are at least
>>> as big as the ones on the primary.  If not, the standby shuts down
>>> with a fatal error.
>>>
>>> The correspondence of settings between primary and standby is required
>>> because those settings influence certain shared memory sizings that
>>> are required for processing WAL records that the primary might send.
>>> For example, if the primary sends a prepared transaction, the standby
>>> must have had max_prepared_transactions set appropriately or it won't
>>> be able to process those WAL records.
>>>
>>> However, fatally shutting down the standby immediately upon receipt of
>>> the parameter change record might be a bit of an overreaction.  The
>>> resources related to those settings are not required immediately at
>>> that point, and might never be required if the activity on the primary
>>> does not exhaust all those resources.  If we just let the standby roll
>>> on with recovery, it will eventually produce an appropriate error when
>>> those resources are used.
>>>
>>> So this patch relaxes this a bit.  Upon receipt of
>>> XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
>>> warning and set a global flag if there is a problem.  Then when we
>>> actually hit the resource issue and the flag was set, we issue another
>>> warning message with relevant information.
>
> I find it somewhat hostile that we don't display the actual resource
> error once the problem is hit - we just pause. Sure, there's going to be
> some previous log entry explaining what the actual parameter difference
> is - but that could have been weeks ago. So either hard to find, or even
> rotated out.
>
>
>>> At that point we pause recovery, so a hot standby remains usable.
>>> We also repeat the last warning message once a minute so it is
>>> harder to miss or ignore.
>
>
> I can't really imagine that the adjustments made in this patch are
> sufficient.
>
> One important issue seems to me to be the size of the array that
> TransactionIdIsInProgress() allocates:
>
> 	/*
> 	 * If first time through, get workspace to remember main XIDs in. We
> 	 * malloc it permanently to avoid repeated palloc/pfree overhead.
> 	 */
> 	if (xids == NULL)
> 	{
> 		/*
> 		 * In hot standby mode, reserve enough space to hold all xids in the
> 		 * known-assigned list. If we later finish recovery, we no longer need
> 		 * the bigger array, but we don't bother to shrink it.
> 		 */
> 		int			maxxids = RecoveryInProgress() ? TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;
>
> 		xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));
> 		if (xids == NULL)
> 			ereport(ERROR,
> 					(errcode(ERRCODE_OUT_OF_MEMORY),
> 					 errmsg("out of memory")));
> 	}
>
> Which I think means we'll just overrun the xids array in some cases,
> e.g. if KnownAssignedXids overflowed. Obviously we should have a
> crosscheck in the code (which we don't), but it was previously a
> supposedly unreachable path.
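For illustration, here is a minimal sketch of the kind of cross-check being
talked about, written against the variable names (xids, nxids, maxxids, pxid)
used in the quoted TransactionIdIsInProgress() code. The exact placement,
error code, and message wording are assumptions made for the example, not
part of any actual patch:

	/*
	 * Hypothetical guard before appending another XID to the fixed-size
	 * workspace: fail cleanly instead of writing past the end of xids[]
	 * when the standby's settings make maxxids smaller than the number
	 * of XIDs the primary can actually produce.
	 */
	if (nxids >= maxxids)
		ereport(ERROR,
				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
				 errmsg("too many in-progress transactions to track (limit is %d)",
						maxxids),
				 errhint("Increase the standby's settings to at least the values used on the primary.")));
	xids[nxids++] = pxid;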
> Similarly, the allocation in GetSnapshotData() will be too small, I
> think:
>
> 	if (snapshot->xip == NULL)
> 	{
> 		/*
> 		 * First call for this snapshot. Snapshot is same size whether or not
> 		 * we are in recovery, see later comments.
> 		 */
> 		snapshot->xip = (TransactionId *)
> 			malloc(GetMaxSnapshotXidCount() * sizeof(TransactionId));
> 		if (snapshot->xip == NULL)
> 			ereport(ERROR,
> 					(errcode(ERRCODE_OUT_OF_MEMORY),
> 					 errmsg("out of memory")));
> 		Assert(snapshot->subxip == NULL);
> 		snapshot->subxip = (TransactionId *)
> 			malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
> 		if (snapshot->subxip == NULL)
> 			ereport(ERROR,
> 					(errcode(ERRCODE_OUT_OF_MEMORY),
> 					 errmsg("out of memory")));
> 	}
>
> I think basically all code using TOTAL_MAX_CACHED_SUBXIDS,
> GetMaxSnapshotSubxidCount(), PROCARRAY_MAXPROCS needs to be reviewed
> much more carefully than done here.
>
>
> Also, shouldn't dynahash be adjusted as well? There's e.g. the
> following HASH_ENTER path:
>
> 				/* report a generic message */
> 				if (hashp->isshared)
> 					ereport(ERROR,
> 							(errcode(ERRCODE_OUT_OF_MEMORY),
> 							 errmsg("out of shared memory")));
>
>
> I'm also not sure it's ok to not have the waiting in
> ProcArrayAdd(). Is it guaranteed that can't be hit now, due to the WAL
> replay path sometimes adding procs?
>
>
> How is it safe to just block in StandbyParamErrorPauseRecovery() before
> raising an error? The error would have released lwlocks, but now we
> don't? If we e.g. block in PrepareRedoAdd() we'll continue to hold
> TwoPhaseStateLock(), but shutting the database down might also acquire
> that (CheckPointTwoPhase()). Similar with the error in
> KnownAssignedXidsAdd().
>
>
> This does not seem ready.
>
>
>> I encountered a problem maybe related to this commit.
>>
>> First I set up the master and the standby with max_connections=100
>> (the default value). Then I decreased max_connections to 1 only on the
>> standby and restarted the server. Thanks to the commit, I saw the
>> following warning message on the standby:
>>
>> WARNING:  insufficient setting for parameter max_connections
>> DETAIL:  max_connections = 1 is a lower setting than on the master server (where its value was 100).
>> HINT:  Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
>>
>> Then I made a script that inserted 1,000,000 rows in one transaction,
>> and ran it 30 times at the same time. That is, 30 transactions inserting
>> lots of rows were running concurrently.
>>
>> I confirmed that the master contained the expected number of rows,
>> but unexpectedly found 0 rows in the standby.
>
> Have the relevant records actually been replicated?

Yeah, I thought I had confirmed that, but when I tried to reproduce the
issue in a clean environment, I failed to do so. Which means that I made
a mistake somewhere... :( Sorry for the noise.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters