Re: Infinite loop in XLogPageRead() on standby - Mailing list pgsql-hackers
From: Michael Paquier
Subject: Re: Infinite loop in XLogPageRead() on standby
Date:
Msg-id: ZeEQcMBA98eZOHpC@paquier.xyz
In response to: Re: Infinite loop in XLogPageRead() on standby (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses: Re: Infinite loop in XLogPageRead() on standby
List: pgsql-hackers
On Thu, Feb 29, 2024 at 05:44:25PM +0100, Alexander Kukushkin wrote:
> On Thu, 29 Feb 2024 at 08:18, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>> In the first place, it's important to note that we do not guarantee
>> that an async standby can always switch its replication connection to
>> the old primary or another sibling standby. This is due to the
>> variations in replication lag among standbys. pg_rewind is required to
>> adjust such discrepancies.
>
> Sure, I know. But in this case the async standby received and flushed
> absolutely the same amount of WAL as the promoted one.

Ugh. If this can happen transparently, without the user knowing about
it, that does not sound good to me. I have not looked very closely at
the monitoring tools available out there, but if both standbys flushed
the same WAL locations, a rewind should not be required. It is not
something that monitoring tools would be able to detect, because they
only look at LSNs.

>> In the repro script, the replication connection of the second standby
>> is switched from the old primary to the first standby after its
>> promotion. After the switching, replication is expected to continue
>> from the beginning of the last replayed segment.
>
> Well, maybe, but apparently the standby is busy trying to decode a record
> that spans multiple pages, and it is just infinitely waiting for the next
> page to arrive. Also, the restart "fixes" the problem, because indeed it is
> reading the file from the beginning.

What happens if the continuation record spans multiple segment file
boundaries in this case? We would go back to the beginning of the
segment where the record spanning multiple segments began, right? (I
may recall this part of contrecords incorrectly, feel free to correct
me if necessary.)

>> But with the script, the second standby copies the intentionally broken
>> file, which differs from the data that should be received via streaming.
>
> As I already said, this is a simple way to emulate a primary crash while
> the standbys are receiving WAL. It could easily happen that a record
> spanning multiple pages is not fully received and flushed.

I think that it's OK to do so at the test level to force the scenario
in the backend, FWIW, because that's cheaper, and 039_end_of_wal.pl has
proved that this can be designed to be cheap and stable across the
buildfarm fleet. For anything like that, the result had better have
solid test coverage, where perhaps we'd better refactor some of the
routines of 039_end_of_wal.pl into a module so that they can be reused
here, rather than duplicate the code. The other test makes a few
assumptions about the calculation of page boundaries, and I'd rather
not duplicate that across the tree.

Seeing that Alexander is a maintainer of Patroni, which is very
probably used by his employer across a large set of PostgreSQL
instances, if he says that he has seen this in the field, that's good
enough for me to respond to the problem, especially when it involves
reconnecting a standby to a promoted node after both have flushed the
same LSN. Now the level of response also depends on the invasiveness of
the change, and we need a very careful evaluation here.
--
Michael
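[Editor's illustration] For readers following the contrecord question above, here is a minimal, standalone sketch of the segment arithmetic involved. It only mirrors the idea behind the XLByteToSeg()/XLogSegNoOffsetToRecPtr() macros from xlog_internal.h with made-up LSN values; it shows where "the beginning of the segment where the record began" lands for a record that spills past a segment boundary, and is not the backend's actual restart logic.

/*
 * Illustrative sketch only: compute the candidate restart point at the
 * boundary of the WAL segment in which a multi-segment record begins.
 * The arithmetic follows the same idea as the XLByteToSeg() and
 * XLogSegNoOffsetToRecPtr() macros; all values are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* Segment number containing a given LSN (same idea as XLByteToSeg) */
static uint64_t
lsn_to_segno(XLogRecPtr lsn, uint64_t wal_segment_size)
{
	return lsn / wal_segment_size;
}

/* First LSN of a segment (XLogSegNoOffsetToRecPtr with offset 0) */
static XLogRecPtr
segno_to_start_lsn(uint64_t segno, uint64_t wal_segment_size)
{
	return segno * wal_segment_size;
}

int
main(void)
{
	uint64_t	wal_segment_size = 16 * 1024 * 1024;	/* default 16MB */

	/* Hypothetical record starting near the end of segment 5 */
	XLogRecPtr	record_start = 5 * wal_segment_size + wal_segment_size - 100;

	/*
	 * The question raised in the mail is whether, when streaming has to be
	 * restarted while such a record is only partially received, we go back
	 * to the start of the segment holding the beginning of the record.
	 * This computes that candidate restart point.
	 */
	XLogRecPtr	restart_at =
		segno_to_start_lsn(lsn_to_segno(record_start, wal_segment_size),
						   wal_segment_size);

	printf("record start  %X/%X\n",
		   (unsigned) (record_start >> 32), (unsigned) record_start);
	printf("restart point %X/%X\n",
		   (unsigned) (restart_at >> 32), (unsigned) restart_at);
	return 0;
}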