Thread: Optimization for hot standby XLOG_STANDBY_LOCK redo
I noticed that on a hot standby, redo of XLOG_STANDBY_LOCK is sometimes blocked by a running query, and all subsequent redo is then stuck behind this lock acquisition. This happens often in my database: the hot standby falls behind, and the master accumulates a lot of WAL that can't be purged.
So here is the idea:
We can do the XLOG_STANDBY_LOCK redo asynchronously, so that the rest of the redo can continue.
I also wonder whether LogStandbySnapshot would affect consistency on the hot standby, since redo would no longer be applied in order, and how to avoid that.
// ------ startup ------
StartupXLOG()
{
    while (readRecord())
    {
        check_lock_get_state();
        if (record.tx is in pending tbl):
            append this record to the pending lock for further redo.
        else:
            redo_record();
    }
}

check_lock_get_state()
{
    for (tx in pending_tx):
        if (tx.all_lock are got):
            redo the rest record for this tx
            free this tx
}

standby_redo()
{
    if (XLOG_STANDBY_LOCK redo failed)
    {
        add_lock_to_pending_tx_tbl();
    }
}

// ------ worker process ------
main()
{
    while (true)
    {
        for (lock in pending locks order by lsn)
            try_to_get_lock_from_pending_tbl();
    }
}
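To make the scheme concrete, here is a minimal Python sketch of the pseudocode above. It is a toy model, not PostgreSQL code: `DeferredReplayer`, `lock_is_free`, and the record tuples are all hypothetical names, and the separate worker process is folded into a polling check for brevity.

```python
from collections import OrderedDict


class DeferredReplayer:
    """Toy model of the proposal; every name here is hypothetical."""

    def __init__(self, lock_is_free):
        # lock_is_free(rel) -> bool stands in for a non-blocking lock attempt.
        self.lock_is_free = lock_is_free
        self.pending = OrderedDict()  # xid -> {"locks": set, "records": list}
        self.applied = []             # records in the order they were applied

    def check_lock_get_state(self):
        # Apply the deferred records of every transaction whose locks are free.
        for xid in list(self.pending):
            tx = self.pending[xid]
            if all(self.lock_is_free(rel) for rel in tx["locks"]):
                self.applied.extend(tx["records"])
                del self.pending[xid]

    def replay(self, record):
        # Body of the startup loop: called once per WAL record.
        self.check_lock_get_state()
        kind, xid, payload = record
        if kind == "LOCK":
            # Defer the transaction if its lock cannot be taken right now.
            if xid in self.pending or not self.lock_is_free(payload):
                tx = self.pending.setdefault(xid, {"locks": set(), "records": []})
                tx["locks"].add(payload)
            return
        if xid in self.pending:
            self.pending[xid]["records"].append(record)  # redo later
        else:
            self.applied.append(record)                  # redo now


busy = {"t1"}  # relations currently locked by standby queries
replayer = DeferredReplayer(lambda rel: rel not in busy)
wal = [("LOCK", 10, "t1"), ("DATA", 10, "drop t1"), ("DATA", 11, "insert t2")]
for rec in wal:
    replayer.replay(rec)
applied_while_blocked = list(replayer.applied)

busy.clear()                     # the conflicting query finishes
replayer.check_lock_get_state()  # deferred records for xid 10 are applied
```

In this run the record for xid 11 is applied before the deferred record for xid 10, which is the out-of-order redo that the question about LogStandbySnapshot refers to.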
Regards,
Yuhang
On Thu, Apr 30, 2020 at 4:07 PM 邱宇航 <iamqyh@gmail.com> wrote:
>
> I noticed that in hot standby, XLOG_STANDBY_LOCK redo is sometimes blocked by another query, and all the rest of the redo is blocked by this lock acquisition, which is not good and often happens in my database, so the hot standby will be left behind and the master will store a lot of WAL which can't be purged.
>
> So here is the idea:
> We can do XLOG_STANDBY_LOCK redo asynchronously, and the rest of the redo will continue.
>

Hmm, I don't think we can do this. The XLOG_STANDBY_LOCK WAL is used for AccessExclusiveLock on a relation, which means it is a lock for a DDL operation. If you skip processing the WAL for this lock, the behavior of queries running on the standby will be unpredictable. Consider a case where, on the master, the user has dropped the table <t1>. When that operation is replayed on the standby, concurrent queries on t1 will be blocked by the replay of the XLOG_STANDBY_LOCK WAL. If you skip that WAL, the drop of the table and a query on the same table can happen simultaneously, leading to unpredictable behavior.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
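The hazard described above can be illustrated with a small Python toy model. This is only a sketch under stated assumptions: `ToyStandby`, `scan`, and `replay_drop` are hypothetical names, and a simple reader count stands in for the real lock manager.

```python
class ToyStandby:
    """Hypothetical model: one table, readers counted instead of real locks."""

    def __init__(self):
        self.tables = {"t1": [1, 2, 3]}
        self.readers = {}  # relation -> number of queries currently scanning it

    def scan(self, rel):
        # A read-only query: holds a "share lock" for the duration of the scan.
        self.readers[rel] = self.readers.get(rel, 0) + 1
        try:
            for i in range(len(self.tables[rel])):
                yield self.tables[rel][i]  # KeyError if the table was dropped
        finally:
            self.readers[rel] -= 1

    def replay_drop(self, rel, take_lock):
        # Replay of DROP TABLE. With take_lock=True the exclusive lock is
        # respected, so replay cannot proceed while a reader is active.
        if take_lock and self.readers.get(rel, 0) > 0:
            return False
        del self.tables[rel]
        return True


standby = ToyStandby()
query = standby.scan("t1")
first_row = next(query)  # the query is now in the middle of its scan

blocked = standby.replay_drop("t1", take_lock=True)   # replay has to wait
skipped = standby.replay_drop("t1", take_lock=False)  # lock record skipped

try:
    next(query)
    outcome = "scan continued"
except KeyError:
    outcome = "table vanished mid-scan"
```

With the lock respected, replay of the drop waits for the query. With the lock record skipped, the table disappears underneath the running scan.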
I mean that the redo of all resources protected by XLOG_STANDBY_LOCK would be deferred until later.
The semantics of XLOG_STANDBY_LOCK are still preserved.
On Apr 30, 2020, at 7:12 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Apr 30, 2020 at 4:07 PM 邱宇航 <iamqyh@gmail.com> wrote:
>> I noticed that in hot standby, XLOG_STANDBY_LOCK redo is sometimes blocked by another query, and all the rest of the redo is blocked by this lock acquisition, which is not good and often happens in my database, so the hot standby will be left behind and the master will store a lot of WAL which can't be purged.
>>
>> So here is the idea:
>> We can do XLOG_STANDBY_LOCK redo asynchronously, and the rest of the redo will continue.
>
> Hmm, I don't think we can do this. The XLOG_STANDBY_LOCK WAL is used
> for AccessExclusiveLock on a relation, which means it is a lock for a
> DDL operation. If you skip processing the WAL for this lock, the
> behavior of queries running on the standby will be unpredictable.
> Consider a case where, on the master, the user has dropped the table
> <t1>. When that operation is replayed on the standby, concurrent
> queries on t1 will be blocked by the replay of the
> XLOG_STANDBY_LOCK WAL. If you skip that WAL, the drop of the table and
> a query on the same table can happen simultaneously, leading to
> unpredictable behavior.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
One more question: what is LogAccessExclusiveLocks in LogStandbySnapshot used for? Can we remove it?
On May 6, 2020, at 10:36 AM, 邱宇航 <iamqyh@gmail.com> wrote:
> I mean that all resources protected by XLOG_STANDBY_LOCK should redo later.
> The semantics of XLOG_STANDBY_LOCK is still kept.
On Wed, May 6, 2020 at 8:35 AM 邱宇航 <iamqyh@gmail.com> wrote:
>
> And one more question, what LogAccessExclusiveLocks in LogStandbySnapshot is used for?
>

As far as I understand, this is required to ensure that we have acquired all the AccessExclusiveLocks on relations before we can say the standby has reached STANDBY_SNAPSHOT_READY and allow read-only queries on the standby. Read the comments above LogStandbySnapshot.

> Can We remove this.
>

I don't think so. In general, if you want to change and/or remove some code, it is your responsibility to come up with a reason/theory why it is OK to do so.

> On May 6, 2020, at 10:36 AM, 邱宇航 <iamqyh@gmail.com> wrote:
>
> I mean that all resources protected by XLOG_STANDBY_LOCK should redo later.
> The semantics of XLOG_STANDBY_LOCK is still kept.
>

I don't think we can postpone it. If we delay applying XLOG_STANDBY_LOCK and apply the other records, the result could be unpredictable, as explained in my previous email.

Note - Please don't top-post. Use the style that I and/or others are using on this list, as that will make it easier to understand and respond to your emails.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
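The role of the lock records in the standby snapshot can be pictured with a short Python toy model. It is a sketch only: `log_standby_snapshot` and `standby_state` are hypothetical helpers written for illustration, not the PostgreSQL functions, and the record tuples are an assumed simplification of the WAL.

```python
def log_standby_snapshot(held_locks, running_xids, include_locks=True):
    """Primary side (toy): emit lock records first, then the running xacts."""
    records = []
    if include_locks and held_locks:
        records.append(("STANDBY_LOCK", list(held_locks)))
    records.append(("RUNNING_XACTS", list(running_xids)))
    return records


def standby_state(records):
    """Standby side (toy): collect locks until the snapshot becomes ready."""
    locks = set()
    for kind, payload in records:
        if kind == "STANDBY_LOCK":
            locks.update(payload)
        elif kind == "RUNNING_XACTS":
            # Snapshot is ready here; read-only queries would be allowed.
            return locks, set(payload)
    return locks, None


# xid 7 took an exclusive lock on t1 before the standby started replaying.
held = [(7, "t1")]
running = [7]

with_locks = standby_state(log_standby_snapshot(held, running))
without_locks = standby_state(
    log_standby_snapshot(held, running, include_locks=False)
)
```

In the second case the standby reaches its snapshot knowing that xid 7 is running but not that it holds an exclusive lock on t1, so a query on t1 would not be held back.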