Thread: [BUGS] BUG #14702: Streaming replication broken after server closed connection unexpectedly
[BUGS] BUG #14702: Streaming replication broken after server closed connection unexpectedly
From
girgen@pingpong.net
Date:
The following bug has been logged on the website:

Bug reference:      14702
Logged by:          Palle Girgensohn
Email address:      girgen@pingpong.net
PostgreSQL version: 9.6.2
Operating system:   FreeBSD 11.0
Description:

Hi!

Setup is simple streaming replication: master -> slave. There is a replication slot at the master, so xlogs should not be removed until the client has received them properly.

Suddenly, the slave logs this:

[75678-1] db=,user= LOG: restartpoint complete: wrote 2795 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=269.858 s, sync=3.768 s, total=273.636 s; sync files=242, longest=3.763 s, average=0.015 s; distance=26953 kB, estimate=58915 kB
[75679-1] db=,user= LOG: recovery restart point at 226B/E9DF4E30
[75679-2] db=,user= DETAIL: last completed transaction was at log time 2017-05-29 00:57:19.221918+02
[75680-1] db=,user= LOG: restartpoint starting: time
[5-1] db=,user= FATAL: could not send data to WAL stream: server closed the connection unexpectedly
[5-2] This probably means the server terminated abnormally
[5-3] before or while processing the request.
[5-4]
[246-1] db=,user= FATAL: invalid memory alloc request size 1600487424

The master has no relevant information in its log. After this, the slave could not be started again; each attempt failed with the same error about "invalid memory alloc request size 1600487424".

Looking more closely, the last xlog file, let's call it 0000EB, is corrupt on the slave, having a different checksum from the proper one at the master. The oldest xlog saved on the master is 0000EB; all older ones have been properly deleted due to retention, since they had been delivered to the slave. On the slave, the youngest one is 0000EA (the file before 0000EB). Hence there is no common xlog file on both systems, since the 0000EB file is corrupt on the slave.
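For reference, the segment file names above map directly to the LSNs printed in the log messages. A minimal sketch of that mapping (`lsn_to_segment` is my own helper, not a PostgreSQL tool; it assumes the default 16 MB segment size and timeline 1):

```shell
# lsn_to_segment: map an LSN as printed in the logs (e.g. 226B/E9DF4E30)
# to the name of the 16 MB WAL segment file that contains it.
lsn_to_segment() {
    tli=$1                            # timeline id, 1 in this setup
    hi=${2%/*}                        # hex part before the slash
    lo=${2#*/}                        # hex part after the slash
    seg=$(( 0x$lo / 0x1000000 ))      # which 16 MB segment within the high word
    printf '%08X%08X%08X\n' "$tli" "0x$hi" "$seg"
}

lsn_to_segment 1 226B/E9DF4E30   # -> 000000010000226B000000E9 (the "0000E9" file)
lsn_to_segment 1 226B/EA000000   # -> 000000010000226B000000EA (the "0000EA" file)
```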
Removing the corrupt 0000EB from the slave, and starting the slave again, changes the error message from "invalid memory alloc" to:

[1-1] db=,user= LOG: ending log output to stderr
[1-2] db=,user= HINT: Future log output will go to log destination "syslog".
[2-1] db=,user= LOG: database system was interrupted while in recovery at log time 2017-05-29 00:48:11 CEST
[2-2] db=,user= HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
[3-1] db=,user= LOG: entering standby mode
[4-1] db=,user= LOG: redo starts at 226B/E9DF4E30
[2-1] db=,user= LOG: started streaming WAL from primary at 226B/EA000000 on timeline 1
[3-1] db=,user= FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000226B000000EA has already been removed

OK, better... I have the 0000EA on the slave. So I copied that 0000EA back over to the master's pg_xlog directory. The slave was happy without further ado:

[2-1] db=,user= LOG: started streaming WAL from primary at 226B/EA000000 on timeline 1
[5-1] db=,user= LOG: consistent recovery state reached at 226B/EBE89CB8
[2-1] db=,user= LOG: database system is ready to accept read only connections
[2-1] db=,user= LOG: restartpoint starting: time

And life is peachy again.

Now, I don't know exactly what happened when the slave lost track, but the bug, I think, is that the streamed WAL was corrupt, yet still accepted by the slave *and* hence removed from the master. It required too much experience to fix that. The slave should not accept a WAL file that has not been fully transported. It seems it happened during some connection failure between the slave and master, but it should still preferably fail more gracefully. Are the mechanisms implemented to support that, and they failed, or is it just not implemented?

Palle

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
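For the record, the manual fix described above boils down to the steps below. Paths, the host name, and the rc command are assumptions for illustration, and this was only safe because 0000EA was already complete on the slave before being copied back:

```shell
SEG=000000010000226B000000EA           # the complete segment the slave still has
BAD=000000010000226B000000EB           # the corrupt, partially streamed segment
PGXLOG=/usr/local/pgsql/data/pg_xlog   # hypothetical data directory layout

# 1. On the slave: move the corrupt segment out of the way (keep a copy).
mv "$PGXLOG/$BAD" /var/tmp/

# 2. Copy the complete previous segment from the slave back to the master,
#    so the walsender can serve the stream again from 226B/EA000000.
scp "$PGXLOG/$SEG" master:"$PGXLOG/$SEG"

# 3. Restart the slave; recovery reconnects and streaming resumes.
service postgresql restart
```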
Re: [BUGS] BUG #14702: Streaming replication broken after server closed connection unexpectedly
From
Michael Paquier
Date:
On Tue, Jun 13, 2017 at 6:52 AM, <girgen@pingpong.net> wrote:
> Setup is simple streaming replication: master -> slave. There is a
> replication slot at the master, so xlogs should not be removed until the
> client has received them properly.

Hm. There has been the following discussion as well, which refers to a legit bug where WAL segments could be removed even if a slot is used:
https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP+JoJAjoGx=GNuOAshEDWCext7BFvCQ@mail.gmail.com
The circumstances to trigger the problem are quite particular, though, as it needs an incomplete WAL record at the end of a segment.

> After this, the slave could not be started again, each time the same error
> about "invalid memory alloc request size 1600487424".

Hm. That smells of data corruption. Be careful going forward.

> Looking more closely, the last xlog file, let's call it 0000EB, is corrupt
> on the slave, having a different checksum from the proper one at the master.

To which checksum are you referring here? Did you do the calculation yourself using what is on disk? Note that during streaming replication the content of an unfinished segment may differ from what is on the primary.

> Now, I don't know exactly what happened when the slave lost track, but the
> bug, I think, is that the streamed WAL was corrupt, and still accepted by
> the slave *and* hence removed from the master. It required too much
> experience to fix that. The slave should not accept a not fully transported
> WAL file. It seems it happened during some connection failure between the
> slave and master, but still it should preferably fail more gracefully. Are
> the mechanisms implemented to support that, and they failed, or is it just
> not implemented?

There is a per-record CRC calculation to check the validity of each record, and this is processed when fetching each record at recovery as a sanity check. That's one way to prevent applying an incorrect record.
In the event of such an error you would have seen "incorrect resource manager data checksum in record at" or similar. It seems to me that you should be careful with the primary as well.
--
Michael
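One way to exercise the per-record CRC check described above without going through recovery is pg_xlogdump (shipped with 9.6): it revalidates each record as it reads and stops with an error at the first record it cannot verify. A sketch, with the segment name taken from the report (the path is an assumption):

```shell
# Scan every record in the suspect segment; pg_xlogdump recomputes the
# per-record CRC as it goes and aborts at the first record that fails.
pg_xlogdump /usr/local/pgsql/data/pg_xlog/000000010000226B000000EB >/dev/null

# Note: an error at the very end of the newest segment is normal -- the
# WAL simply stops there -- so only a failure in the middle of a segment
# points at real damage.
```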
Re: [BUGS] BUG #14702: Streaming replication broken after server closed connection unexpectedly
From
Palle Girgensohn
Date:
> On 13 June 2017, at 03:57, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Tue, Jun 13, 2017 at 6:52 AM, <girgen@pingpong.net> wrote:
>> Setup is simple streaming replication: master -> slave. There is a
>> replication slot at the master, so xlogs should not be removed until the
>> client has received them properly.
>
> Hm. There has been the following discussion as well, which refers to a
> legit bug where WAL segments could be removed even if a slot is used:
> https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP+JoJAjoGx=GNuOAshEDWCext7BFvCQ@mail.gmail.com
> The circumstances to trigger the problem are quite particular though
> as it needs an incomplete WAL record at the end of a segment.
>
>> After this, the slave could not be started again, each time the same error
>> about "invalid memory alloc request size 1600487424".
>
> Hm. That smells of data corruption. Be careful going forward.

I believe that corruption was in the broken WAL file, though. I saw some notes pointing in that direction on the list, but sure, I could be mistaken.

>> Looking more closely, the last xlog file, let's call it 0000EB, is corrupt
>> on the slave, having a different checksum from the proper one at the master.
>
> To which checksum are you referring here? Did you do yourself a
> calculation using what is on-disk? Note that during streaming
> replication the content of an unfinished segment may be different than
> what is on the primary.

I calculated that myself using sha256 from the command line. As you say, it was probably an unfinished segment. The problem is that the slave expects the *previous* WAL file to still be saved on the master, but it was already removed. The slave *has* it, though, so why would it require it to be transferred again? 0000EA was requested, although it had already been completely transferred to the slave. I had to copy that 0000EA back to the master so it could be transferred again.
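For what it's worth, the comparison I did looks roughly like the sketch below (FreeBSD's sha256(1); on Linux it would be sha256sum, and the path is an assumption). As Michael points out, the newest, still-open segment is expected to differ between nodes, so only completed segments are worth comparing this way:

```shell
SEG=000000010000226B000000EA           # a completed segment
PGXLOG=/usr/local/pgsql/data/pg_xlog   # hypothetical path

sha256 -q "$PGXLOG/$SEG"               # digest on the slave
ssh master "sha256 -q $PGXLOG/$SEG"    # same file on the master
# Matching digests mean the segment was transferred intact.
```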
>> Now, I don't know exactly what happened when the slave lost track, but the
>> bug, I think, is that the streamed WAL was corrupt, and still accepted by
>> the slave *and* hence removed from the master. It required too much
>> experience to fix that. The slave should not accept a not fully transported
>> WAL file. It seems it happened during some connection failure between the
>> slave and master, but still it should preferably fail more gracefully. Are
>> the mechanisms implemented to support that, and they failed, or is it just
>> not implemented?
>
> There is a per-record CRC calculation to check the validity of each
> record, and this is processed when fetching each record at recovery as
> a sanity check. That's one way to prevent applying an incorrect
> record. In the event of such an error you would have seen "incorrect
> resource manager data checksum in record at" or similar. It seems to
> me that you should be careful with the primary as well.

OK. "Be careful" is somewhat vague, but I get it. Would a pg_dump + pg_restore, for example, reveal any data corruption? Or is it just not possible to be totally sure unless checksums had been activated (they're not; this is an old database)?

> --
> Michael
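On the pg_dump question: a dump forces a full sequential read of every table, so it should trip over tuples it cannot decode, but it does not read indexes, and without data checksums a page that is wrong yet still well-formed could slip through. A minimal smoke test might look like this (the database name is a placeholder):

```shell
# Read every table in full; pg_dump fails loudly on tuples it cannot decode.
pg_dump --format=custom --file=/dev/null mydb && echo "all tables readable"

# Indexes are not exercised by a dump; rebuilding them covers that side:
# psql -d mydb -c 'REINDEX DATABASE mydb;'
```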