Re: Bug on update timing of walrcv->flushedUpto variable - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Bug on update timing of walrcv->flushedUpto variable |
Date | |
Msg-id | 20210329.105441.1978082841561262877.horikyota.ntt@gmail.com |
In response to | Bug on update timing of walrcv->flushedUpto variable ("蔡梦娟(玊于)" <mengjuan.cmj@alibaba-inc.com>) |
Responses | Re: Bug on update timing of walrcv->flushedUpto variable |
List | pgsql-hackers |
Hi.

(Added Nathan, Andrey and Heikki to Cc:)

At Fri, 26 Mar 2021 23:44:21 +0800, "蔡梦娟(玊于)" <mengjuan.cmj@alibaba-inc.com> wrote in
> Hi, all
>
> Recently, I found a bug in the update timing of the walrcv->flushedUpto variable. Consider the following scenario: there is one Primary node and one Standby node streaming from the Primary.
> A large number of SQL statements are running on the Primary, and the xlog records they generate may be longer than the space left on the current page, so they have to be written across pages. As shown below, last_xlog, the last record of wal_1, is longer than the space left in last_page, so it has to continue into wal_2. If the Primary crashes after flushing the last_page of wal_1 to disk but before the remaining content of last_xlog has been flushed, then last_xlog in wal_1 will be incomplete. In this case the Standby has also received wal_1 through WAL streaming.

It seems to be the same issue as the one discussed in [1].

There are two symptoms of the issue. One is that the archive ends with a segment that ends with an immature WAL record, which causes an inconsistency between the archive and the pg_wal directory. The other is, as you saw, that the walreceiver receives an immature record at the end of a segment, which prevents recovery from proceeding.

That thread tries to solve this by preventing such immature records at a segment boundary from being archived and from being sent to the standby.

> [log1.png]

It doesn't seem to be attached.

> The confusing point is: why is walrcv->flushedUpto updated only at the first startup of the walreceiver on a specific timeline, and not each time xlog streaming is requested? In the above case, it would also be reasonable to update walrcv->flushedUpto to wal_1 when the Standby re-receives wal_1. So I changed it to update walrcv->flushedUpto each time xlog streaming is requested; that is the patch I want to share with you, based on postgresql-13.2. What do you think of this change?
>
> By the way, I also want to know why the pgstat_reset_all function is called during the recovery process.

We shouldn't rewind flushedUpto backward. The variable tells recovery (that is, the startup process) how far it can safely read WAL content. Once the startup process has read the beginning of a record, XLogReadRecord tries to continue fetching *only the rest* of the record, and in this scenario that rest is inconsistent with the first part it has already read. So this fix alone doesn't work.

We also need to fix the archive inconsistency, perhaps as part of the fix for this issue.

We are trying to fix this by refraining from archiving (or streaming) until a record crossing a segment boundary is completely flushed.

regards.

[1] https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
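The objection to rewinding flushedUpto can be illustrated with a small toy model in C. Everything here is hypothetical and heavily simplified: ToyStandby, ToyReader, toy_request_streaming, toy_receive and toy_read are illustrative names, segments are 8 bytes, there is a single 8-byte record, and toy_request_streaming merely stands in for the proposed behavior of resetting flushedUpto whenever streaming is requested. Like XLogReadRecord, the toy reader keeps the head of a record it has already consumed and afterwards fetches only the rest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy sizes (hypothetical); real WAL segments default to 16 MB. */
#define TOY_SEG_SIZE 8
#define TOY_REC_LEN 8

/* Hypothetical standby state: received WAL bytes plus a flushedUpto-like
 * pointer that bounds how far the reader may safely read. */
typedef struct {
    char wal[2 * TOY_SEG_SIZE];
    size_t flushed_upto;
} ToyStandby;

/* Hypothetical reader: once it has consumed the head of a record it keeps
 * it and only fetches the remaining bytes later. */
typedef struct {
    char rec[TOY_REC_LEN + 1]; /* assembled record, NUL-terminated */
    size_t have;               /* bytes of the record consumed so far */
    size_t next;               /* next WAL position to fetch */
} ToyReader;

/* Models the proposed change: re-requesting streaming from pos resets
 * flushed_upto to pos, possibly moving it backward. */
static void toy_request_streaming(ToyStandby *s, size_t pos)
{
    s->flushed_upto = pos;
}

/* Store len received bytes at pos and advance flushed_upto past them. */
static void toy_receive(ToyStandby *s, size_t pos, const char *bytes, size_t len)
{
    assert(pos + len <= sizeof s->wal);
    memcpy(s->wal + pos, bytes, len);
    s->flushed_upto = pos + len;
}

/* Consume as much of the current record as flushed_upto allows, resuming
 * at r->next (never re-reading the head). Returns 1 once complete. */
static int toy_read(ToyReader *r, const ToyStandby *s)
{
    while (r->have < TOY_REC_LEN && r->next < s->flushed_upto)
        r->rec[r->have++] = s->wal[r->next++];
    r->rec[r->have] = '\0';
    return r->have == TOY_REC_LEN;
}
```

Walking through the scenario, assuming the restarted primary wrote a different record B at the same position as the lost record A: the standby receives the old segment 0 ("xxxxAAAA") and the reader consumes the head "AAAA" of the record starting at position 4. Streaming is then requested again from position 0, the new segment 0 ("xxxxBBBB") and segment 1 ("BBBByyyy") arrive, and the reader completes the record as "AAAABBBB": the head of one record joined to the tail of another. A real reader validates each assembled record (it carries a CRC), so in practice such a mix would be rejected rather than replayed; the toy omits that check.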
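The approach of refraining from archiving (or streaming) until a record crossing a segment boundary is completely flushed can be sketched as a rule for how many whole segments are safe to hand out. This is a toy under stated assumptions, not the patch discussed in [1]: ToyLSN, ToyRecord and toy_safe_segments are hypothetical names, and the sender is assumed to know every record's start and end positions as a plain array.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for PostgreSQL's XLogRecPtr: a byte position in the WAL stream. */
typedef uint64_t ToyLSN;

/* Hypothetical record descriptor covering [start, end), with end > start. */
typedef struct {
    ToyLSN start;
    ToyLSN end;
} ToyRecord;

/* Return how many leading segments may be archived or streamed when WAL is
 * flushed up to `flushed`. A segment qualifies only if all of its bytes are
 * flushed and it does not hold the head of a boundary-crossing record whose
 * tail is still unflushed. Segments go out in order, so everything after
 * such a segment is held back as well. */
static uint64_t toy_safe_segments(const ToyRecord *recs, int nrecs,
                                  ToyLSN flushed, uint64_t seg_size)
{
    assert(seg_size > 0);
    uint64_t nseg = flushed / seg_size; /* segments 0..nseg-1 fully flushed */

    for (int i = 0; i < nrecs; i++) {
        uint64_t first_seg = recs[i].start / seg_size;
        uint64_t last_seg = (recs[i].end - 1) / seg_size;

        /* A record that crosses a boundary and is not yet fully flushed
         * pins back the segment where it starts. */
        if (first_seg != last_seg && recs[i].end > flushed && first_seg < nseg)
            nseg = first_seg;
    }
    return nseg;
}
```

For example, with 16-byte segments and records [0,10), [10,20) and [20,32), a flush position of 16 leaves no segment safe, because [10,20) straddles the first boundary and its tail is not flushed yet; once WAL is flushed to 20, segment 0 is released. Holding back the whole segment, rather than trimming it, keeps the archive from ever ending in an immature record.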