Thread: Optimization for hot standby XLOG_STANDBY_LOCK redo

Optimization for hot standby XLOG_STANDBY_LOCK redo

From
邱宇航
Date:
I noticed that in hot standby, XLOG_STANDBY_LOCK redo is sometimes blocked by another query, and all the rest of redo is blocked behind this lock acquisition. This often happens in my database, so the hot standby falls behind and the master stores a lot of WAL which can’t be purged.

So here is the idea:
We can do XLOG_STANDBY_LOCK redo asynchronously, so that the rest of redo can continue.
I also wonder whether LogStandbySnapshot would affect consistency on the hot standby, since redo would no longer happen in order, and how to avoid that.

// ------ startup process ------
StartupXLOG()
{
    while (readRecord())
    {
        // Replay records whose locks were acquired since the last iteration.
        check_lock_get_state();

        if (record.tx is in pending_tx)
        {
            // Defer this record until all locks of its transaction are held.
            append record to pending_tx[record.tx];
            continue;
        }
        redo_record();
    }
}

check_lock_get_state()
{
    for (tx in pending_tx)
    {
        if (all locks of tx have been acquired)
        {
            redo the deferred records of tx;
            free tx;
        }
    }
}

standby_redo()
{
    if (XLOG_STANDBY_LOCK redo failed)
    {
        add_lock_to_pending_tx_tbl();
    }
}

// ------ worker process ------
main()
{
    while (true)
    {
        for (lock in pending locks, ordered by LSN)
            try_to_get_lock_from_pending_tbl();
    }
}
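
The bookkeeping in the pseudocode above can be sketched in plain C. This is only a toy, single-process model under stated assumptions: the names PendingXact, defer_xact, route_record, mark_lock_acquired and drain_xact are hypothetical, the table sizes are arbitrary, and nothing here is PostgreSQL code. It shows how records could be routed and later drained in order; it says nothing about whether deferring them is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8    /* toy capacity, not a PostgreSQL limit */
#define MAX_QUEUED  16   /* toy capacity per transaction */

typedef enum { ROUTE_REDO_NOW, ROUTE_DEFERRED, ROUTE_QUEUE_FULL } RouteResult;

/* Hypothetical entry of the "pending tx" table from the proposal. */
typedef struct PendingXact
{
    bool          in_use;
    bool          lock_acquired;           /* would be set by the worker */
    unsigned      xid;
    size_t        nqueued;
    unsigned long queued_lsn[MAX_QUEUED];  /* deferred records, arrival order */
} PendingXact;

static PendingXact pending[MAX_PENDING];

static PendingXact *
find_pending(unsigned xid)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (pending[i].in_use && pending[i].xid == xid)
            return &pending[i];
    return NULL;
}

/* A lock record for xid could not be applied: start deferring that xid. */
static bool
defer_xact(unsigned xid)
{
    if (find_pending(xid) != NULL)
        return true;                       /* already being deferred */
    for (size_t i = 0; i < MAX_PENDING; i++)
    {
        if (!pending[i].in_use)
        {
            pending[i] = (PendingXact) { .in_use = true, .xid = xid };
            return true;
        }
    }
    return false;                          /* table full: caller must block */
}

/* Decide whether a record is redone now or queued behind a pending lock. */
static RouteResult
route_record(unsigned xid, unsigned long lsn)
{
    PendingXact *p = find_pending(xid);

    if (p == NULL)
        return ROUTE_REDO_NOW;
    if (p->nqueued == MAX_QUEUED)
        return ROUTE_QUEUE_FULL;
    p->queued_lsn[p->nqueued++] = lsn;
    return ROUTE_DEFERRED;
}

/* The worker reports that the lock for xid is now held. */
static void
mark_lock_acquired(unsigned xid)
{
    PendingXact *p = find_pending(xid);

    assert(p != NULL);
    p->lock_acquired = true;
}

/*
 * Copy the deferred LSNs of xid into out[] in order and free the entry.
 * Returns the record count, or -1 if xid is not pending, its lock is not
 * held yet, or out[] is too small.
 */
static int
drain_xact(unsigned xid, unsigned long *out, size_t cap)
{
    PendingXact *p = find_pending(xid);

    if (p == NULL || !p->lock_acquired || p->nqueued > cap)
        return -1;
    for (size_t i = 0; i < p->nqueued; i++)
        out[i] = p->queued_lsn[i];
    p->in_use = false;
    return (int) p->nqueued;
}
```

In this model a record for a transaction that is not in the table is redone immediately, while records for a deferred transaction are queued and come back from drain_xact in their original order once the lock is reported as held.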


regards.
Yuhang

Re: Optimization for hot standby XLOG_STANDBY_LOCK redo

From
Amit Kapila
Date:
On Thu, Apr 30, 2020 at 4:07 PM 邱宇航 <iamqyh@gmail.com> wrote:
>
> I noticed that in hot standby, XLOG_STANDBY_LOCK redo is sometimes blocked by another query, and all the rest of redo is
> blocked behind this lock acquisition. This often happens in my database, so the hot standby falls behind and the master
> stores a lot of WAL which can’t be purged.
>
> So here is the idea:
> We can do XLOG_STANDBY_LOCK redo asynchronously, so that the rest of redo can continue.
>

Hmm, I don't think we can do this.  The XLOG_STANDBY_LOCK WAL is used
for AccessExclusiveLock on a Relation which means it is a lock for a
DDL operation.  If you skip processing the WAL for this lock, the
behavior of queries running on standby will be unpredictable.
Consider a case where, on the master, the user has dropped the table
<t1>.  When that operation is replayed on the standby, concurrent
queries on t1 are blocked by the replay of the XLOG_STANDBY_LOCK WAL.
If you skip that WAL, the drop of the table and a query on the same
table can happen simultaneously, leading to unpredictable behavior.
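
The scenario above can be made concrete with a toy model in C. The types and functions here (ToyLockMode, ToyRelLock, toy_try_lock, toy_drop_is_unsafe) are hypothetical and model only two lock modes on one relation; this is a sketch of the conflict, not PostgreSQL's lock manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock modes; only the two that matter for this example. */
typedef enum { TOY_ACCESS_SHARE, TOY_ACCESS_EXCLUSIVE } ToyLockMode;

/* Toy lock state for one relation (hypothetical, not PostgreSQL's LOCK). */
typedef struct ToyRelLock
{
    int  share_holders;    /* standby queries reading the relation */
    bool exclusive_held;   /* taken by replay of the lock record */
} ToyRelLock;

/* Try to take the lock without waiting; returns true on success. */
static bool
toy_try_lock(ToyRelLock *l, ToyLockMode mode)
{
    assert(l != NULL);
    if (mode == TOY_ACCESS_SHARE)
    {
        if (l->exclusive_held)
            return false;              /* reader has to wait for replay */
        l->share_holders++;
        return true;
    }
    /* The exclusive mode conflicts with every other holder. */
    if (l->exclusive_held || l->share_holders > 0)
        return false;
    l->exclusive_held = true;
    return true;
}

/* True if replaying the DROP now would overlap with a reader. */
static bool
toy_drop_is_unsafe(const ToyRelLock *l)
{
    assert(l != NULL);
    return l->share_holders > 0;
}
```

If the lock record is replayed first, a later toy_try_lock(..., TOY_ACCESS_SHARE) fails and the drop cannot overlap with a reader. If the lock record is skipped, the reader gets in and toy_drop_is_unsafe reports the overlap.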

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Optimization for hot standby XLOG_STANDBY_LOCK redo

From
邱宇航
Date:
I mean that all resources protected by XLOG_STANDBY_LOCK should also be redone later.
The semantics of XLOG_STANDBY_LOCK are still kept.

On Apr 30, 2020, at 7:12 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Thu, Apr 30, 2020 at 4:07 PM 邱宇航 <iamqyh@gmail.com> wrote:
> >
> > I noticed that in hot standby, XLOG_STANDBY_LOCK redo is sometimes blocked by another query, and all the rest of redo is blocked behind this lock acquisition. This often happens in my database, so the hot standby falls behind and the master stores a lot of WAL which can’t be purged.
> >
> > So here is the idea:
> > We can do XLOG_STANDBY_LOCK redo asynchronously, so that the rest of redo can continue.
> >
>
> Hmm, I don't think we can do this.  The XLOG_STANDBY_LOCK WAL is used
> for AccessExclusiveLock on a Relation which means it is a lock for a
> DDL operation.  If you skip processing the WAL for this lock, the
> behavior of queries running on standby will be unpredictable.
> Consider a case where on the master, the user has dropped the table
> <t1> and when it will replay such an operation on standby the
> concurrent queries on t1 will be blocked due to replay of
> XLOG_STANDBY_LOCK WAL and if you skip that WAL, the drop of table and
> query on the same table can happen simultaneously leading to
> unpredictable behavior.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Re: Optimization for hot standby XLOG_STANDBY_LOCK redo

From
邱宇航
Date:
And one more question: what is LogAccessExclusiveLocks in LogStandbySnapshot used for? Can we remove this?

On May 6, 2020, at 10:36 AM, 邱宇航 <iamqyh@gmail.com> wrote:

> I mean that all resources protected by XLOG_STANDBY_LOCK should also be redone later.
> The semantics of XLOG_STANDBY_LOCK are still kept.


Re: Optimization for hot standby XLOG_STANDBY_LOCK redo

From
Amit Kapila
Date:
On Wed, May 6, 2020 at 8:35 AM 邱宇航 <iamqyh@gmail.com> wrote:
>
> And one more question: what is LogAccessExclusiveLocks in LogStandbySnapshot used for?
>

As far as I understand, this is required to ensure that we have
acquired all the AccessExclusiveLocks on relations before we can say
standby has reached STANDBY_SNAPSHOT_READY and allow read-only queries
in standby.  Read comments above LogStandbySnapshot.

> Can we remove this?
>

I don't think so.  In general, if you want to change and/or remove
some code, it is your responsibility to come up with a reason/theory
why it is OK to do so.

> On May 6, 2020, at 10:36 AM, 邱宇航 <iamqyh@gmail.com> wrote:
>
> I mean that all resources protected by XLOG_STANDBY_LOCK should also be redone later.
> The semantics of XLOG_STANDBY_LOCK are still kept.
>

I don't think we can postpone it. If we delay applying
XLOG_STANDBY_LOCK and apply others then the result could be
unpredictable as explained in my previous email.

Note - Please don't top-post. Use the style that I and/or others are
using in this list as that will make it easier to understand and
respond to your emails.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com