On Tue, Jan 25, 2022 at 11:35 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Jan 25, 2022 at 5:52 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>>
>> On 25.01.22 06:18, Amit Kapila wrote:
>> > I think to avoid this we can send a message to clear this (at least to
>> > clear XID in the view) after skipping the xact but there is no
>> > guarantee that it will be received by the stats collector.
>> > Additionally, the worker can periodically (say after every N (100,
>> > 500, etc.) successful transactions) send a clear message after
>> > successful apply. This will ensure that eventually the error entry
>> > will be cleared.
>>
>> Well, I think we need *some* solution for now. We can't leave a footgun
>> where you say, "skip transaction 700", somehow transaction 700 doesn't
>> happen, the whole thing gets forgotten, but then 3 months later, the
>> next transaction 700 mysteriously gets dropped.
>
>
> This is indeed part of why I feel that the xid being skipped should be validated. As the feature is presented the
> user is supposed to read the xid from the system (the new stat view or the error log) and supply it and then the worker,
> when it goes to skip, should find that the very first transaction xid it encounters is the one it is being told to skip.
> It skips that transaction, clears the skipxid, and puts the system back into normal operating mode. If that first
> transaction xid isn't the one being specified to skip the worker should error with "skipping transaction failed, xid 123
> expected but 456 found".

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.
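
To make the proposed behavior concrete, here is a minimal sketch in C, assuming a simplified worker that consults the skip xid only at the start of each remote transaction. All names (SubState, begin_remote_xact, XACT_SKIP, xid_t) are hypothetical and are not the actual apply worker code; skipxid just stands in for subskipxid. The point it illustrates is that skipxid is cleared after the first transaction whether or not it matched, so a stale value cannot silently drop an unrelated transaction that later reuses the same xid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for PostgreSQL's 32-bit TransactionId. */
typedef uint32_t xid_t;
#define INVALID_XID ((xid_t) 0)

/* Hypothetical per-subscription state; skipxid mirrors subskipxid. */
typedef struct
{
    xid_t skipxid;      /* xid the user asked to skip, or INVALID_XID */
} SubState;

typedef enum
{
    XACT_APPLY,         /* apply the remote transaction normally */
    XACT_SKIP           /* discard the remote transaction's changes */
} XactAction;

/*
 * Decide what to do with the next remote transaction.  If a skip xid is
 * set, only the very first transaction is compared against it, and the
 * skip xid is cleared afterwards regardless of whether it matched.
 */
XactAction
begin_remote_xact(SubState *sub, xid_t remote_xid)
{
    XactAction action = XACT_APPLY;

    assert(sub != NULL);

    if (sub->skipxid == INVALID_XID)
        return XACT_APPLY;

    if (sub->skipxid == remote_xid)
        action = XACT_SKIP;
    else
        fprintf(stderr, "WARNING: skip xid %u not seen, first xid was %u\n",
                (unsigned) sub->skipxid, (unsigned) remote_xid);

    /* Clear unconditionally so the setting cannot be "forgotten" and fire later. */
    sub->skipxid = INVALID_XID;
    return action;
}
```

Under David's stricter variant, the else branch would instead report an error along the lines of "skipping transaction failed, xid 123 expected but 456 found" rather than only warning.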
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/