Thread: [BUGS] BUG #14702: Streaming replication broken after server closed connection unexpectedly

The following bug has been logged on the website:

Bug reference:      14702
Logged by:          Palle Girgensohn
Email address:      girgen@pingpong.net
PostgreSQL version: 9.6.2
Operating system:   FreeBSD 11.0
Description:

Hi!

Setup is simple streaming replication: master -> slave. There is a
replication slot at the master, so xlogs should not be removed until the
client has received them properly. 
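
For reference, the slot was created in the usual way; roughly like this (the
slot name here is made up for this report):

# on the master:
psql -c "SELECT pg_create_physical_replication_slot('slave_slot');"

# on the slave, recovery.conf has (besides primary_conninfo):
#   standby_mode = 'on'
#   primary_slot_name = 'slave_slot'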

Suddenly, the slave logs this:

[75678-1] db=,user= LOG:  restartpoint complete: wrote 2795 buffers (1.6%);
0 transaction log file(s) added, 0 removed, 1 recycled; write=269.858 s,
sync=3.768 s, total=273.636 s; sync files=242, longest=3.763 s,
average=0.015 s; distance=26953 kB, estimate=58915 kB
[75679-1] db=,user= LOG:  recovery restart point at 226B/E9DF4E30
[75679-2] db=,user= DETAIL:  last completed transaction was at log time
2017-05-29 00:57:19.221918+02
[75680-1] db=,user= LOG:  restartpoint starting: time
[5-1] db=,user= FATAL:  could not send data to WAL stream: server closed the
connection unexpectedly
[5-2]          This probably means the server terminated abnormally
[5-3]          before or while processing the request.
[5-4]
[246-1] db=,user= FATAL:  invalid memory alloc request size 1600487424


The master has no relevant information in its log.

After this, the slave could not be started again, each time the same error
about "invalid memory alloc request size 1600487424".

Looking more closely, the last xlog file, let's call it 0000EB, is corrupt
on the slave, having a different checksum from the proper one at the master.
The oldest xlog saved on the master is 0000EB, all older ones are properly
deleted due to retention; they have been delivered to the slave. On the
slave, the youngest one is 0000EA (the file before 0000EB). Hence there is
no common xlog file on both systems, since the 0000EB file is corrupt on the
slave.

Removing the corrupt 0000EB from the slave, and starting the slave again,
changes the error message from "invalid memory alloc" to 

[1-1] db=,user= LOG:  ending log output to stderr
[1-2] db=,user= HINT:  Future log output will go to log destination
"syslog".
[2-1] db=,user= LOG:  database system was interrupted while in recovery at
log time 2017-05-29 00:48:11 CEST
[2-2] db=,user= HINT:  If this has occurred more than once some data might
be corrupted and you might need to choose an earlier recovery target.
[3-1] db=,user= LOG:  entering standby mode
[4-1] db=,user= LOG:  redo starts at 226B/E9DF4E30
[2-1] db=,user= LOG:  started streaming WAL from primary at 226B/EA000000 on
timeline 1
[3-1] db=,user= FATAL:  could not receive data from WAL stream: ERROR:
requested WAL segment 000000010000226B000000EA has already been removed


OK, better... I have the 0000EA on the slave. So I copied that 0000EA over
back to the master's pg_xlog directory. The slave was happy without further
ado:

[2-1] db=,user= LOG:  started streaming WAL from primary at 226B/EA000000 on
timeline 1
[5-1] db=,user= LOG:  consistent recovery state reached at 226B/EBE89CB8
[2-1] db=,user= LOG:  database system is ready to accept read only
connections
[2-1] db=,user= LOG:  restartpoint starting: time


And life is peachy again.
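
For the record, the copy back to the master was roughly this (run on the
slave; the data directory path is from memory and may not match exactly):

# copy the already-received segment from the slave back into the master's pg_xlog
scp /usr/local/pgsql/data/pg_xlog/000000010000226B000000EA \
    master:/usr/local/pgsql/data/pg_xlog/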

Now, I don't know exactly what happened when the slave lost track, but the
bug, I think, is that the streamed WAL was corrupt, yet still accepted by
the slave *and* hence removed from the master. Recovering from that required
too much expertise. The slave should not accept a WAL file that has not been
fully transferred. It seems to have happened during some connection failure
between the slave and the master, but it should still preferably fail more
gracefully. Are the mechanisms to support that implemented and they failed,
or is it just not implemented?

Palle




On Tue, Jun 13, 2017 at 6:52 AM,  <girgen@pingpong.net> wrote:
> Setup is simple streaming replication: master -> slave. There is a
> replication slot at the master, so xlogs should not be removed until the
> client has received them properly.

Hm. There has been the following discussion as well, which refers to a
legit bug where WAL segments could be removed even if a slot is used:
https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP+JoJAjoGx=GNuOAshEDWCext7BFvCQ@mail.gmail.com
The circumstances to trigger the problem are quite particular though
as it needs an incomplete WAL record at the end of a segment.
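
It may be worth checking what the slot reports on the primary, for example:

psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"

If restart_lsn points into a segment that is no longer present in pg_xlog,
then the slot did not hold back WAL as it should and you are likely looking
at that problem.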

> After this, the slave could not be started again, each time the same error
> about "invalid memory alloc request size 1600487424".

Hm. That smells of data corruption.. Be careful going forward.

> Looking more closely, the last xlog file, let's call it 0000EB, is corrupt
> on the slave, having a different checksum from the proper one at the master.

To which checksum are you referring here? Did you do yourself a
calculation using what is on-disk? Note that during streaming
replication the content of an unfinished segment may be different than
what is on the primary.

> Now, I don't know exactly what happened when the slave lost track, but the
> bug, I think, is that the streamed WAL was corrupt, and still accepted by
> the slave *and* hence removed from the master. It required too much
> experience to fix that. The slave should not accept a not fully transported
> WAL file. It seems it happened during some connection failure between the
> slave and master, but still it should preferrably fail more gracefully. Are
> the mechanisms implemented to support that, and they failed, or is it just
> not implemented?

There is a per-record CRC calculation to check the validity of each
record, and this is processed when fetching each record at recovery as
a sanity check. That's one way to prevent applying an incorrect
record. In the event of such an error you would have seen "incorrect
resource manager data checksum in record at" or similar. It seems to
me that you should be careful with the primary as well.
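
If you still have a copy of the suspect segment, pg_xlogdump can be used to
walk through its records; it stops with an error at the first record that
fails those sanity checks, for example:

pg_xlogdump /path/to/000000010000226B000000EB >/dev/null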
-- 
Michael



> On 13 June 2017, at 03:57, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Tue, Jun 13, 2017 at 6:52 AM,  <girgen@pingpong.net> wrote:
>> Setup is simple streaming replication: master -> slave. There is a
>> replication slot at the master, so xlogs should not be removed until the
>> client has received them properly.
>
> Hm. There has been the following discussion as well, which refers to a
> legit bug where WAL segments could be removed even if a slot is used:
> https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP+JoJAjoGx=GNuOAshEDWCext7BFvCQ@mail.gmail.com
> The circumstances to trigger the problem are quite particular though
> as it needs an incomplete WAL record at the end of a segment.
>
>> After this, the slave could not be started again, each time the same error
>> about "invalid memory alloc request size 1600487424".
>
> Hm. That smells of data corruption.. Be careful going forward.

I believe that corruption was in the broken WAL file though. I saw some notes
pointing in that direction on the list, but sure, I could be mistaken.

>
>> Looking more closely, the last xlog file, let's call it 0000EB, is corrupt
>> on the slave, having a different checksum from the proper one at the master.
>
> To which checksum are you referring here? Did you do yourself a
> calculation using what is on-disk? Note that during streaming
> replication the content of an unfinished segment may be different than
> what is on the primary.

I calculated that myself using sha256 from the command line.
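
Roughly like this, on both machines (the data directory path here is from
memory, not exact):

sha256 /usr/local/pgsql/data/pg_xlog/000000010000226B000000EB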

As you say, it was probably an unfinished segment. The problem is that the
slave expects the *previous* WAL file to still be saved on the master, but it
was already removed. The slave *has* it though, so why would it require it to
be transferred again? 0000EA was requested, although it had already been
completely transferred to the slave. I had to copy that 0000EA back to the
master so it could be transferred again.
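
For what it's worth, this is roughly how I compared what the slave would ask
for with what the master still had (paths again from memory):

# on the slave: where recovery restarts from (matches "redo starts at" in the log)
pg_controldata /usr/local/pgsql/data | grep 'REDO location'

# on the master: the oldest segment still present in pg_xlog
ls /usr/local/pgsql/data/pg_xlog | sort | head -1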

>
>> Now, I don't know exactly what happened when the slave lost track, but the
>> bug, I think, is that the streamed WAL was corrupt, and still accepted by
>> the slave *and* hence removed from the master. It required too much
>> experience to fix that. The slave should not accept a not fully transported
>> WAL file. It seems it happened during some connection failure between the
>> slave and master, but still it should preferrably fail more gracefully. Are
>> the mechanisms implemented to support that, and they failed, or is it just
>> not implemented?
>
> There is a per-record CRC calculation to check the validity of each
> record, and this is processed when fetching each record at recovery as
> a sanity check. That's one way to prevent applying an incorrect
> record. In the event of such an error you would have seen "incorrect
> resource manager data checksum in record at" or similar. It seems to
> me that you should be careful with the primary as well.

OK. "Be careful" is somewhat vague, but I get it. Would a pg_dump + pg_restore, for example, reveal any data
corruption.Or is it just not possibly to be totally sure unless checksums would have been activated (they're not, this
isan old datbase). 
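
I was thinking of something along the lines of

pg_dumpall >/dev/null

just to force a full read of every row; that should at least trip over broken
heap pages, even if it says nothing about the indexes.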



> --
> Michael