Re: pgsql: Improve handling of parameter differences in physicalreplicatio - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pgsql: Improve handling of parameter differences in physicalreplicatio
Date
Msg-id 20200330181030.z5yiftm65mxvr46p@alap3.anarazel.de
Whole thread Raw
Responses Re: pgsql: Improve handling of parameter differences in physicalreplicatio  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Re: pgsql: Improve handling of parameter differences in physicalreplicatio  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Re: pgsql: Improve handling of parameter differences in physical replicatio  (Robert Haas <robertmhaas@gmail.com>)
Re: pgsql: Improve handling of parameter differences in physical replicatio  (Robert Haas <robertmhaas@gmail.com>)
Re: pgsql: Improve handling of parameter differences in physicalreplicatio  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2020-03-30 19:41:43 +0900, Fujii Masao wrote:
> On 2020/03/30 16:58, Peter Eisentraut wrote:
> > Improve handling of parameter differences in physical replication
> > 
> > When certain parameters are changed on a physical replication primary,
> > this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
> > record.  The standby then checks whether its own settings are at least
> > as big as the ones on the primary.  If not, the standby shuts down
> > with a fatal error.
> > 
> > The correspondence of settings between primary and standby is required
> > because those settings influence certain shared memory sizings that
> > are required for processing WAL records that the primary might send.
> > For example, if the primary sends a prepared transaction, the standby
> > must have had max_prepared_transaction set appropriately or it won't
> > be able to process those WAL records.
> > 
> > However, fatally shutting down the standby immediately upon receipt of
> > the parameter change record might be a bit of an overreaction.  The
> > resources related to those settings are not required immediately at
> > that point, and might never be required if the activity on the primary
> > does not exhaust all those resources.  If we just let the standby roll
> > on with recovery, it will eventually produce an appropriate error when
> > those resources are used.
> > 
> > So this patch relaxes this a bit.  Upon receipt of
> > XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
> > warning and set a global flag if there is a problem.  Then when we
> > actually hit the resource issue and the flag was set, we issue another
> > warning message with relevant information.

I find it somewhat hostile that we don't display the actual resource
error once the problem is hit - we just pause. Sure, there's going to be
some previous log entry explaining what the actual parameter difference
is - but that could have been weeks ago. So either hard to find, or even
rotated out.


> > At that point we pause recovery, so a hot standby remains usable.
> > We also repeat the last warning message once a minute so it is
> > harder to miss or ignore.


I can't really imagine that the adjustments made in this patch are
sufficient.

One important issue seems to me to be the size of the array that
TransactionIdIsInProgress() allocates:
    /*
     * If first time through, get workspace to remember main XIDs in. We
     * malloc it permanently to avoid repeated palloc/pfree overhead.
     */
    if (xids == NULL)
    {
        /*
         * In hot standby mode, reserve enough space to hold all xids in the
         * known-assigned list. If we later finish recovery, we no longer need
         * the bigger array, but we don't bother to shrink it.
         */
        int            maxxids = RecoveryInProgress() ? TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;

        xids = (TransactionId *) malloc(maxxids * sizeof(TransactionId));
        if (xids == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of memory")));
    }

Which I think means we'll just overrun the xids array in some cases,
e.g. if KnownAssignedXids overflowed.  Obviously we should have a
crosscheck in the code (which we don't), but it was previously a
supposedly unreachable path.

Similarly, the allocation in GetSnapshotData() will be too small, I
think:
    if (snapshot->xip == NULL)
    {
        /*
         * First call for this snapshot. Snapshot is same size whether or not
         * we are in recovery, see later comments.
         */
        snapshot->xip = (TransactionId *)
            malloc(GetMaxSnapshotXidCount() * sizeof(TransactionId));
        if (snapshot->xip == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of memory")));
        Assert(snapshot->subxip == NULL);
        snapshot->subxip = (TransactionId *)
            malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
        if (snapshot->subxip == NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of memory")));
    }

I think basically all code using TOTAL_MAX_CACHED_SUBXIDS,
GetMaxSnapshotSubxidCount(), PROCARRAY_MAXPROCS needs to be reviewed
much more carefully than done here.


Also, shouldn't dynahash be adjusted as well? There's e.g. the
following HASH_ENTER path:
                /* report a generic message */
                if (hashp->isshared)
                    ereport(ERROR,
                            (errcode(ERRCODE_OUT_OF_MEMORY),
                             errmsg("out of shared memory")));


I'm also not sure it's ok to not have the waiting in
ProcArrayAdd(). Is it guaranteed that can't be hit now, due to the WAL
replay path sometimes adding procs?


How is it safe to just block in StandbyParamErrorPauseRecovery() before
raising an error? The error would have released lwlocks, but now we
don't? If we e.g. block in PrepareRedoAdd() we'll continue to hold
TwoPhaseStateLock(), but shutting the database down might also acquire
that (CheckPointTwoPhase()). Similar with the error in
KnownAssignedXidsAdd().


This does not seem ready.


> I encountered the trouble maybe related to this commit.
> 
> Firstly I set up the master and the standby with max_connections=100 (default value).
> Then I decreased max_connections to 1 only in the standby and restarted
> the server. Thanks to the commit, I saw the following warning message
> in the standby.
> 
>     WARNING:  insufficient setting for parameter max_connections
>     DETAIL:  max_connections = 1 is a lower setting than on the master server (where its value was 100).
>     HINT:  Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
> 
> Then I made the script that inserted 1,000,000 rows in one transaction,
> and ran it 30 times at the same time. That is, 30 transactions inserting
> lots of rows were running at the same time.
> 
> I confirmed that there are expected number of rows in the master,
> but found 0 row in the standby unxpectedly.

Have the relevant records actually been replicated?


> Also I suspected that issue
> happened because recovery is paused, but pg_is_wal_replay_paused()
> returned false in the standby.

As far as I understand the commit it shouldn't cause a pause until
there's an actual resource exhaustion - in which case there should be
another message first (WARNING: recovery paused because of insufficient
parameter settings).


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: [PATCH] remove condition always true (/src/backend/access/heap/vacuumlazy.c)
Next
From: Julien Rouhaud
Date:
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)