Thread: unnecessary (same) restart_lsn handing in LogicalIncreaseRestartDecodingForSlot
unnecessary (same) restart_lsn handing in LogicalIncreaseRestartDecodingForSlot
From
Ashutosh Bapat
Date:
Hi All, At the beginning of LogicalIncreaseRestartDecodingForSlot(), we have codeine ``` 1623 /* don't overwrite if have a newer restart lsn */ 1624 if (restart_lsn <= slot->data.restart_lsn) 1625 { 1626 } 1627 1628 /* 1629 * We might have already flushed far enough to directly accept this lsn, 1630 * in this case there is no need to check for existing candidate LSNs 1631 */ 1632 else if (current_lsn <= slot->data.confirmed_flush) 1633 { 1634 slot->candidate_restart_valid = current_lsn; 1635 slot->candidate_restart_lsn = restart_lsn; 1636 1637 /* our candidate can directly be used */ 1638 updated_lsn = true; 1639 } ``` This code avoids directly writing slot's persistent data multiple times if the restart_lsn does not change between successive running transactions WAL records and the confirmed flush LSN is higher than all of those. But if the downstream is fast enough to send at least one newer confirmed flush between every two running transactions WAL records, we end up overwriting slot's restart LSN even if it does not change because of following code ``` 1641 /* 1642 * Only increase if the previous values have been applied, otherwise we 1643 * might never end up updating if the receiver acks too slowly. A missed 1644 * value here will just cause some extra effort after reconnecting. 1645 */ 1646 if (slot->candidate_restart_valid == InvalidXLogRecPtr) 1647 { 1648 slot->candidate_restart_valid = current_lsn; 1649 slot->candidate_restart_lsn = restart_lsn; 1650 SpinLockRelease(&slot->mutex); 1651 1652 elog(LOG, "got new restart lsn %X/%X at %X/%X", 1653 LSN_FORMAT_ARGS(restart_lsn), 1654 LSN_FORMAT_ARGS(current_lsn)); 1655 } ``` Let's say RLSN is the restart LSN computed when processing successive running transaction WAL records at RT_LSN1, RT_LSN2, RT_LSN3 .... Let's say downstream sends confirmed flush LSNs CF_LSN1, CF_LSN2, CF_LSN3, ... such that RT_LSNx <= CF_LSNx <= RT_LSN(x+1). CF_LSNx arrives between processing records at RT_LSNx and RT_LSN(x+1). So the candidate_restart_lsn is always InvalidXlogRecPtr and we install the same RLSN as candidate_restart_lsn again and again with a different candidate_restart_valid. Every installed candidate is updated in the slot by LogicalConfirmReceivedLocation() when the next confirmed flush arrives. Such an update also causes a disk write which looks unnecessary. I think the function should ignore a restart_lsn older than data.restart_lsn right away at the beginning of this function. -- Best Wishes, Ashutosh Bapat
Re: unnecessary (same) restart_lsn handing in LogicalIncreaseRestartDecodingForSlot
From
Amit Kapila
Date:
On Wed, Mar 16, 2022 at 5:02 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Hi All, > At the beginning of LogicalIncreaseRestartDecodingForSlot(), we have codeine > ``` > 1623 /* don't overwrite if have a newer restart lsn */ > 1624 if (restart_lsn <= slot->data.restart_lsn) > 1625 { > 1626 } > 1627 > 1628 /* > 1629 * We might have already flushed far enough to directly > accept this lsn, > 1630 * in this case there is no need to check for existing candidate LSNs > 1631 */ > 1632 else if (current_lsn <= slot->data.confirmed_flush) > 1633 { > 1634 slot->candidate_restart_valid = current_lsn; > 1635 slot->candidate_restart_lsn = restart_lsn; > 1636 > 1637 /* our candidate can directly be used */ > 1638 updated_lsn = true; > 1639 } > ``` > This code avoids directly writing slot's persistent data multiple > times if the restart_lsn does not change between successive running > transactions WAL records and the confirmed flush LSN is higher than > all of those. > > But if the downstream is fast enough to send at least one newer > confirmed flush between every two running transactions WAL records, we > end up overwriting slot's restart LSN even if it does not change > because of following code > ``` > 1641 /* > 1642 * Only increase if the previous values have been applied, otherwise we > 1643 * might never end up updating if the receiver acks too > slowly. A missed > 1644 * value here will just cause some extra effort after reconnecting. > 1645 */ > 1646 if (slot->candidate_restart_valid == InvalidXLogRecPtr) > 1647 { > 1648 slot->candidate_restart_valid = current_lsn; > 1649 slot->candidate_restart_lsn = restart_lsn; > 1650 SpinLockRelease(&slot->mutex); > 1651 > 1652 elog(LOG, "got new restart lsn %X/%X at %X/%X", > 1653 LSN_FORMAT_ARGS(restart_lsn), > 1654 LSN_FORMAT_ARGS(current_lsn)); > 1655 } > ``` > Are you worried that in corner cases we will update the persistent copy of slot with same restart_lsn multiple times? AFAICS, we update persistent copy via LogicalConfirmReceivedLocation() which is called only when 'updated_lsn' is set and that doesn't get set in the if check (slot->candidate_restart_valid == InvalidXLogRecPtr) you quoted. It is not very clear to me after reading your email what exactly is your concern, so I might be missing something. -- With Regards, Amit Kapila.