Re: pgsql: Improve handling of parameter differences in physical replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: pgsql: Improve handling of parameter differences in physical replication
Date
Msg-id b8802e12-85d3-4784-142d-24671c9eebfd@oss.nttdata.com
In response to Re: pgsql: Improve handling of parameter differences in physical replication (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

On 2020/03/31 3:10, Andres Freund wrote:
> Hi,
> 
> On 2020-03-30 19:41:43 +0900, Fujii Masao wrote:
>> On 2020/03/30 16:58, Peter Eisentraut wrote:
>>> Improve handling of parameter differences in physical replication
>>>
>>> When certain parameters are changed on a physical replication primary,
>>> this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
>>> record.  The standby then checks whether its own settings are at least
>>> as big as the ones on the primary.  If not, the standby shuts down
>>> with a fatal error.
>>>
>>> The correspondence of settings between primary and standby is required
>>> because those settings influence certain shared memory sizings that
>>> are required for processing WAL records that the primary might send.
>>> For example, if the primary sends a prepared transaction, the standby
>>> must have had max_prepared_transactions set appropriately or it won't
>>> be able to process those WAL records.
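For context, the pre-patch behavior of this check is roughly the following (a simplified sketch modeled on RecoveryRequiresIntParameter() in xlog.c; message text abbreviated):

    /*
     * Simplified sketch: compare a standby setting against the value the
     * primary recorded via XLOG_PARAMETER_CHANGE.  Before this commit, a
     * too-small setting was an immediate hard stop (in the startup
     * process this ERROR is promoted to FATAL).
     */
    static void
    RecoveryRequiresIntParameter(const char *param_name,
                                 int currValue, int minValue)
    {
        if (currValue < minValue)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("hot standby is not possible because %s = %d is a lower setting than on the master server (its value was %d)",
                            param_name, currValue, minValue)));
    }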
>>>
>>> However, fatally shutting down the standby immediately upon receipt of
>>> the parameter change record might be a bit of an overreaction.  The
>>> resources related to those settings are not required immediately at
>>> that point, and might never be required if the activity on the primary
>>> does not exhaust all those resources.  If we just let the standby roll
>>> on with recovery, it will eventually produce an appropriate error when
>>> those resources are used.
>>>
>>> So this patch relaxes this a bit.  Upon receipt of
>>> XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
>>> warning and set a global flag if there is a problem.  Then when we
>>> actually hit the resource issue and the flag was set, we issue another
>>> warning message with relevant information.
> 
> I find it somewhat hostile that we don't display the actual resource
> error once the problem is hit - we just pause. Sure, there's going to be
> some previous log entry explaining what the actual parameter difference
> is - but that could have been weeks ago. So either hard to find, or even
> rotated out.
> 
> 
>>> At that point we pause recovery, so a hot standby remains usable.
>>> We also repeat the last warning message once a minute so it is
>>> harder to miss or ignore.
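The post-patch shape of that check, as described above, is then roughly this (again a simplified sketch; the flag name is hypothetical, and the pause entry point is the StandbyParamErrorPauseRecovery() discussed below):

    /* hypothetical flag name, for illustration only */
    static bool insufficient_standby_param = false;

    static void
    RecoveryRequiresIntParameter(const char *param_name,
                                 int currValue, int minValue)
    {
        if (currValue < minValue)
        {
            /* Only warn and remember the problem; don't shut down yet. */
            ereport(WARNING,
                    (errmsg("insufficient setting for parameter %s",
                            param_name),
                     errdetail("%s = %d is a lower setting than on the master server (where its value was %d).",
                               param_name, currValue, minValue),
                     errhint("Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.")));
            insufficient_standby_param = true;
        }
    }

    /*
     * Later, when the undersized resource is actually exhausted: if the
     * flag is set, repeat the warning once a minute and pause recovery
     * instead of erroring out, so a hot standby stays usable.
     */
    if (insufficient_standby_param)
        StandbyParamErrorPauseRecovery();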
> 
> 
> I can't really imagine that the adjustments made in this patch are
> sufficient.
> 
> One important issue seems to me to be the size of the array that
> TransactionIdIsInProgress() allocates:
>     /*
>      * If first time through, get workspace to remember main XIDs in. We
>      * malloc it permanently to avoid repeated palloc/pfree overhead.
>      */
>     if (xids == NULL)
>     {
>         /*
>          * In hot standby mode, reserve enough space to hold all xids in the
>          * known-assigned list. If we later finish recovery, we no longer need
>          * the bigger array, but we don't bother to shrink it.
>          */
>         int            maxxids = RecoveryInProgress() ? TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;
> 
>         xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));
>         if (xids == NULL)
>             ereport(ERROR,
>                     (errcode(ERRCODE_OUT_OF_MEMORY),
>                      errmsg("out of memory")));
>     }
> 
> Which I think means we'll just overrun the xids array in some cases,
> e.g. if KnownAssignedXids overflowed.  Obviously we should have a
> crosscheck in the code (which we don't), but it was previously a
> supposedly unreachable path.
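The missing crosscheck could be as simple as the following, assuming nxids counts entries as they are copied into the workspace (illustrative only):

    /* hypothetical guard before appending to the fixed-size workspace */
    if (nxids >= maxxids)
        elog(ERROR, "too many known-assigned xids for workspace array (max %d)",
             maxxids);
    xids[nxids++] = pxid;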
> 
> Similarly, the allocation in GetSnapshotData() will be too small, I
> think:
>     if (snapshot->xip == NULL)
>     {
>         /*
>          * First call for this snapshot. Snapshot is same size whether or not
>          * we are in recovery, see later comments.
>          */
>         snapshot->xip = (TransactionId *)
>             malloc(GetMaxSnapshotXidCount() * sizeof(TransactionId));
>         if (snapshot->xip == NULL)
>             ereport(ERROR,
>                     (errcode(ERRCODE_OUT_OF_MEMORY),
>                      errmsg("out of memory")));
>         Assert(snapshot->subxip == NULL);
>         snapshot->subxip = (TransactionId *)
>             malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
>         if (snapshot->subxip == NULL)
>             ereport(ERROR,
>                     (errcode(ERRCODE_OUT_OF_MEMORY),
>                      errmsg("out of memory")));
>     }
> 
> I think basically all code using TOTAL_MAX_CACHED_SUBXIDS,
> GetMaxSnapshotSubxidCount(), PROCARRAY_MAXPROCS needs to be reviewed
> much more carefully than done here.
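For reference, all of these limits are derived from the standby's *own* GUCs (definitions as in procarray.c of this era; MaxBackends is itself computed from max_connections plus the various worker and walsender limits), which is why a standby configured smaller than the primary ends up with arrays too small for what the primary's WAL can legitimately contain:

    /* procarray.c */
    #define PROCARRAY_MAXPROCS  (MaxBackends + max_prepared_xacts)

    /* one top-level xid plus up to PGPROC_MAX_CACHED_SUBXIDS subxids per proc */
    #define TOTAL_MAX_CACHED_SUBXIDS \
        ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)

    int
    GetMaxSnapshotXidCount(void)
    {
        return procArray->maxProcs;
    }

    int
    GetMaxSnapshotSubxidCount(void)
    {
        return TOTAL_MAX_CACHED_SUBXIDS;
    }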
> 
> 
> Also, shouldn't dynahash be adjusted as well? There's e.g. the
> following HASH_ENTER path:
>                 /* report a generic message */
>                 if (hashp->isshared)
>                     ereport(ERROR,
>                             (errcode(ERRCODE_OUT_OF_MEMORY),
>                              errmsg("out of shared memory")));
> 
> 
> I'm also not sure it's OK to not have the waiting in
> ProcArrayAdd(). Is it guaranteed that it can't be hit now, given that
> the WAL replay path sometimes adds procs?
> 
> 
> How is it safe to just block in StandbyParamErrorPauseRecovery() before
> raising an error? The error would have released lwlocks, but now we
> don't? If we e.g. block in PrepareRedoAdd() we'll continue to hold
> TwoPhaseStateLock, but shutting the database down might also acquire
> that (CheckPointTwoPhase()). Similarly with the error in
> KnownAssignedXidsAdd().
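To spell the hazard out with a simplified interleaving (comments only; not actual code from the patch):

    /* the caller of PrepareRedoAdd() holds the lock (simplified): */
    LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
    ...
    /*
     * An ereport(ERROR) here would release the lwlock during error
     * cleanup.  Blocking in StandbyParamErrorPauseRecovery() instead
     * keeps TwoPhaseStateLock held, so a shutdown path that needs it,
     * e.g. CheckPointTwoPhase(), can deadlock against the paused
     * startup process.
     */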
> 
> 
> This does not seem ready.
> 
> 
>> I encountered a problem that may be related to this commit.
>>
>> First I set up the master and the standby with max_connections=100 (the default).
>> Then I decreased max_connections to 1 on the standby only and restarted
>> the server. Thanks to the commit, I saw the following warning message
>> on the standby.
>>
>>      WARNING:  insufficient setting for parameter max_connections
>>      DETAIL:  max_connections = 1 is a lower setting than on the master server (where its value was 100).
>>      HINT:  Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
>>
>> Then I wrote a script that inserted 1,000,000 rows in one transaction,
>> and ran 30 instances of it concurrently. That is, 30 transactions
>> inserting lots of rows were running at the same time.
>>
>> I confirmed that the expected number of rows was present on the master,
>> but unexpectedly found 0 rows on the standby.
> 
> Have the relevant records actually been replicated?

Yeah, I thought I had confirmed that, but when I tried to reproduce
the issue in a clean environment, I could not. So it seems I made a
mistake... :( Sorry for the noise.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters


