RE: prevent immature WAL streaming - Mailing list pgsql-hackers

From Jakub Wartak
Subject RE: prevent immature WAL streaming
Date
Msg-id VI1PR0701MB69608F806BD102C8B4C3C6FDF6C69@VI1PR0701MB6960.eurprd07.prod.outlook.com
In response to Re: prevent immature WAL streaming  ("alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org>)
Responses Re: prevent immature WAL streaming  ("alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Hi Álvaro, -hackers,

> I attach the patch with the change you suggested.

I gave the v02 patch a shot on top of REL_12_STABLE (which already includes
5065aeafb0b7593c04d3bc5bc2a86037f32143fc). Previously (yesterday), without the v02 patch, I was always getting standby
corruption in a simulation with a separate dedicated /pg_xlog fs, archive_mode=on, wal_keep_segments=120,
archive_command set to rsync to a different dir on the same fs, and wal_init_zero at its default (true).

Today (with v02) I got corruption on the standby in only the initial 2 runs out of more than ~30 tries. Probably those
2 failures were somehow my fault (?) or some rare condition (and in 1 of those 2 cases simply restarting the standby
did help). To be honest, I've tried to force this error, but with v02 I simply cannot force it anymore, so that's good! :)

> I didn't have a lot of luck with a reliable reproducer script.  I was able to
> reproduce the problem starting with Ryo Matsumura's script and attaching
> a replica; most of the time the replica would recover by restarting from a
> streaming position earlier than where the problem occurred; but a few
> times it would just get stuck with a WAL segment containing a bogus
> record.

To get a reliable reproducer with proper fault injection, instead of really filling up the fs, apparently one can
substitute the fd with an fd of /dev/full using e.g. dup2(), so that every write throws this error too:

root@hive:~# ./t & # simple while(1) { fprintf() flush () } testcase
root@hive:~# ls -l /proc/27296/fd/3
lrwx------ 1 root root 64 Aug 25 06:22 /proc/27296/fd/3 -> /tmp/testwrite
root@hive:~# gdb -q -p 27296
-- 1089 is bitmask O_WRONLY|..
(gdb) p dup2(open("/dev/full", 1089, 0777), 3)
$1 = 3
(gdb) c
Continuing.
==>
fflush/write(): : No space left on device

So I've also tried to be malicious while writing to the DB and inject ENOSPC near places like:

a) XLogWrite()->XLogFileInit() near line 3322 // assuming wal_init_zero is true, one gets the classic "PANIC:  could
not write to file "pg_wal/xlogtemp.90670": No space left on device"
b) XLogWrite() near line 2547, just after pg_pwrite // one can get "PANIC:  could not write to log file
000000010000003B000000A8 at offset 0, length 15466496: No space left on device" (would that be possible with
wal_init_zero=false?)
c) XLogWrite() near line 2592, just before issue_xlog_fsync // to get "PANIC:  could not fdatasync file
"000000010000004300000004": Invalid argument", which would pretty much mean the same as above but with the last
possible offset near the end of WAL?

This was done with gdb voodoo:
handle SIGUSR1 noprint nostop
break xlog.c:<LINE> // https://github.com/postgres/postgres/blob/REL_12_STABLE/src/backend/access/transam/xlog.c#L3311
c
print fd or openLogFile -- to verify it is 3
p dup2(open("/dev/full", 1089, 0777), 3) -- during most of walwriter runtime it has current log as fd=3

After restarting the master and inspecting the standby, in all 3 of the above cases the standby didn't exhibit the
"invalid contrecord length" error, at least here, while without this v02 patch the corruption is notorious. So if it
passes the worst-case code review assumptions, I would be wondering if it shouldn't even be committed as it stands right now.

-J.


