On Thu, Aug 4, 2022 at 11:59 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sun, Jul 31, 2022 at 8:36 PM mahendrakar s
> <mahendrakarforpg@gmail.com> wrote:
> >
> >> On Mon, 25 Jul 2022 at 16:42, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> >> Here's the v3 patch after rebasing.
> >>
> >> I just would like to reiterate the issue the patch is trying to solve:
> >> At times (when no space is left on the device or when the process is
> >> taken down forcibly (VM/container crash)), there can be leftover
> >> uninitialized .partial files (note that padding i.e. filling 16MB WAL
> >> files with all zeros is done in non-compression cases) due to which
> >> pg_receivewal fails to come up after the crash. To address this, the
> >> proposed patch makes the padding 16MB WAL files atomic (first write a
> >> .temp file, pad it and durably rename it to the original .partial
> >> file, ready to be filled with received WAL records). This approach is
> >> similar to what core postgres achieves atomicity while creating new
> >> WAL file (see how XLogFileInitInternal() creates xlogtemp.%d file
> >> first and then how InstallXLogFileSegment() durably renames it to
> >> original WAL file).
> >>
> >> Thoughts?
> >
> > Hi Bharath,
> >
> > Idea to atomically allocate WAL file by creating tmp file and renaming it is nice.
>
> Thanks for reviewing it.
>
> > I have one question though:
> > How is partially temp file created will be cleaned if the VM crashes or out of disk space cases? Does it endup
creatingmultiple files for every VM crash/disk space during process of pg_receivewal?
> >
> > Thoughts?
>
> It is handled in the patch, see [1].
>
> Attaching v4 patch herewith which now uses the temporary file suffix
> '.tmp' as opposed to v3 patch '.temp'. This is just to be in sync with
> other atomic file write codes in the core - autoprewarm,
> pg_stat_statement, slot, basebacup, replorigin, snapbuild, receivelog
> and so on.
>
> Please review the v4 patch.
I've done some more testing today (hacked the code a bit by adding
pg_usleep(10000L); in pre-padding loop and crashing the pg_receivewal
process to produce the warning [1]) and found that the better place to
remove ".partial.tmp" leftover files is in FindStreamingStart()
because there we do a traversal of all the files in target directory
along the way to remove if ".partial.tmp" file(s) is/are found.
Please review the v5 patch further.
[1] pg_receivewal: warning: segment file
"0000000100000006000000B9.partial" has incorrect size 15884288,
skipping
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/