Re: Reliable WAL file shipping over unreliable network - Mailing list pgsql-admin

From Rui DeSousa
Subject Re: Reliable WAL file shipping over unreliable network
Date
Msg-id 21AF0CB4-0873-4960-BA14-E72FA08B352E@icloud.com
In response to Re: Reliable WAL file shipping over unreliable network  (Rui DeSousa <rui.desousa@icloud.com>)
Responses Re: Reliable WAL file shipping over unreliable network  (Dianne Skoll <dfs@roaringpenguin.com>)
Re: Reliable WAL file shipping over unreliable network  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List pgsql-admin
I’ve tested this, and it seems there is still a bug in rsync (rsync version 3.1.2, protocol version 31). I used
a 1GB archive filesystem to allow for an out-of-space test case. I’m not sure of the actual cause, as it seems to work
a few times; however, it then fails, leaving a truncated file and returning a success code.

Example: 00000001000000590000003E - failed 4 times, and on the fifth try rsync returned success and left a truncated
file.

When there is actually no space left, rsync fails immediately and never returns a success code; e.g.
00000001000000590000003F. After freeing up space, archiving resumes. It seems that if rsync is already syncing when
the filesystem fills up, there is a high risk the bug will occur; e.g. 000000010000005A00000001 is also truncated,
again with an rsync success code.

Since rsync is returning success on a failed sync, even the "-c" option will not help here.

Archive Script critical code:

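# Excerpt only: STS is presumably initialized to a non-zero value earlier in the script (not shown).
OUTPUT=$(rsync -ac "$XLOGFILE" "$ARCH_SERVER:$ARCH_DIR/$WALFILE")
if [ $? -eq 0 ]; then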
   STS=0
   echo "Success: $WALFILE" >> /tmp/waltest.log
else
   echo "Failed: $WALFILE" >> /tmp/waltest.log
fi

exit $STS
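
Because the exit status alone cannot be trusted in this failure mode, one possible workaround is to verify the
archived copy independently before reporting success to PostgreSQL. The following is only a minimal sketch, not
something from the tested script: files_match and copy_verified are hypothetical helper names, and cp stands in for
rsync so the example can run against local paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): succeed only when
# both files exist and produce identical POSIX cksum output (CRC plus byte count).
files_match() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    src_sum=$(cksum < "$1") || return 1
    dst_sum=$(cksum < "$2") || return 1
    [ "$src_sum" = "$dst_sum" ]
}

# Hypothetical wrapper: run the copy, then report success only if the
# destination verifies. cp is a local stand-in for the rsync call.
copy_verified() {
    cp "$1" "$2" || return 1
    files_match "$1" "$2"
}
```

For a remote archive, the destination checksum would have to come from the archive host instead, for example via
something like ssh "$ARCH_SERVER" "cksum < $ARCH_DIR/$WALFILE"; that remote form is an assumption and was not part of
the test above.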


Archive Directory (Note: using 64MB WALs):

[postgres@hades ~/arch/dbc1/wal]$ ls -al
total 1044351
drwxr-xr-x  2 postgres  postgres        57 Feb 28 19:46 .
drwxr-xr-x  3 postgres  postgres         3 Feb 28 17:33 ..
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000000B
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000000C
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000000D
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000000E
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000000F
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000010
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000011
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000012
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000013
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000014
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000015
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000016
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000017
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000018
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000019
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000001A
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000001B
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000001C
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000001D
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000001E
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000001F
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000020
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000021
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000022
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000023
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000024
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000025
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000026
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000027
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000028
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000029
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000002A
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000002B
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000002C
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000002D
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000002E
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000002F
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000030
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000031
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000032
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000033
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000034
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000035
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000036
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000037
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000038
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005900000039
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000003A
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000003B
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000003C
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000003D
-rw-------  1 postgres  postgres   3670016 Feb 28 17:33 00000001000000590000003E
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 00000001000000590000003F
-rw-------  1 postgres  postgres  67108864 Feb 28 17:33 000000010000005A00000000
-rw-------  1 postgres  postgres  12713984 Feb 28 17:33 000000010000005A00000001
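
The truncated segments in the listing above stand out only by their size (3670016 and 12713984 bytes instead of
67108864). A check along the following lines could flag such files automatically. It is a rough sketch under stated
assumptions: find_truncated_wals is a hypothetical helper name, the expected segment size is passed in by the caller,
and any file whose name contains a dot (such as .history or .backup files) is skipped because those are legitimately
smaller.

```shell
#!/bin/sh
# Hypothetical helper: print "name size" for every WAL segment in a directory
# whose byte size differs from the expected segment size.
find_truncated_wals() {
    dir=$1
    expected=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        # Skip names containing a dot (e.g. .history or .backup files),
        # which are legitimately smaller than a full segment.
        case $name in *.*) continue ;; esac
        # tr strips the padding some wc implementations add around the count.
        size=$(wc -c < "$f" | tr -d '[:space:]')
        if [ "$size" -ne "$expected" ]; then
            printf '%s %s\n' "$name" "$size"
        fi
    done
}
```

With 64MB segments it would be called as find_truncated_wals ~/arch/dbc1/wal 67108864. A segment that is still being
written by an in-progress transfer would also be reported, so the check is best run when archiving is idle.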


Success/Failure Log:

[postgres@hades ~/arch/dbc1/wal]$ cat /tmp/waltest.log
Success: 000000010000005900000008
Success: 000000010000005900000009
Success: 00000001000000590000000A
Success: 00000001000000590000000B
Success: 00000001000000590000000C
Success: 00000001000000590000000D
Success: 00000001000000590000000E
Success: 00000001000000590000000F
Success: 000000010000005900000010
Success: 000000010000005900000011
Success: 000000010000005900000012
Success: 000000010000005900000013
Success: 000000010000005900000014
Success: 000000010000005900000015
Success: 000000010000005900000016
Success: 000000010000005900000017
Success: 000000010000005900000018
Success: 000000010000005900000019
Success: 00000001000000590000001A
Success: 00000001000000590000001B
Success: 00000001000000590000001C
Success: 00000001000000590000001D
Success: 00000001000000590000001E
Success: 00000001000000590000001F
Success: 000000010000005900000020
Success: 000000010000005900000021
Success: 000000010000005900000022
Success: 000000010000005900000023
Success: 000000010000005900000024
Success: 000000010000005900000025
Success: 000000010000005900000026
Success: 000000010000005900000027
Success: 000000010000005900000028
Success: 000000010000005900000029
Success: 00000001000000590000002A
Success: 00000001000000590000002B
Success: 00000001000000590000002C
Success: 00000001000000590000002D
Success: 00000001000000590000002E
Success: 00000001000000590000002F
Success: 000000010000005900000030
Success: 000000010000005900000031
Success: 000000010000005900000032
Success: 000000010000005900000033
Success: 000000010000005900000034
Success: 000000010000005900000035
Success: 000000010000005900000036
Success: 000000010000005900000037
Success: 000000010000005900000038
Success: 000000010000005900000039
Success: 00000001000000590000003A
Success: 00000001000000590000003B
Success: 00000001000000590000003C
Success: 00000001000000590000003D
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Success: 00000001000000590000003E
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Success: 00000001000000590000003F
Success: 000000010000005A00000000
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Success: 000000010000005A00000001
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
