Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
Date
Msg-id CAHg+QDdPrVig+h2X5s-2MRR_19Vmehpv0upktyRc85ZA_UQHbg@mail.gmail.com
Whole thread Raw
In response to Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory  (Michael Paquier <michael@paquier.xyz>)
Responses Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
List pgsql-hackers
Thanks Michael!

On Sun, Jan 2, 2022 at 11:56 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Jan 02, 2022 at 09:27:43PM -0800, SATYANARAYANA NARLAPURAM wrote:
> I noticed that pg_receivewal fails to stream when the partial file to write
> is not fully initialized and fails with the error message something like
> below. This requires an extra step of deleting the partial file that is not
> fully initialized before starting the pg_receivewal. Attaching a simple
> patch that creates a temp file, fully initialize it and rename the file to
> the desired wal segment name.

Are you referring to the pre-padding when creating a new partial
segment, aka when we write chunks of XLOG_BLCKSZ full of zeros until
the file is fully created?  What kind of error did you see?  I guess
that a write() with ENOSPC would be more likely, but you got a
different problem?

I see two cases, 1/ when no space  is left on the device and 2/ when the process is taken down forcibly (a VM/container crash)
 
  I don't disagree with improving such cases, but we
should not do things so as there is a risk of leaving behind an
infinite set of segments in case of repeated errors
 
Do you see a problem with the proposed patch that leaves the files behind, at least in my testing I don't see any files left behind?
 
, and partial
segments are already a kind of temporary file.

if the .partial file exists with not zero-padded up to the wal segment size (WalSegSz), then open_walfile fails with the below error. I have two options here, 1/ to continue padding the existing partial file and let it zero up to WalSegSz , 2/create a temp file as I did in the patch. I thought the latter is safe because it can handle corrupt cases as described below. Thoughts?

* When streaming to files, if an existing file exists we verify that it's
* either empty (just created), or a complete WalSegSz segment (in which
* case it has been created and padded). Anything else indicates a corrupt
* file. Compressed files have no need for padding, so just ignore this
* case.
 

-       if (dir_data->sync)
+       if (shouldcreatetempfile)
+       {
+               if (durable_rename(tmpsuffixpath, targetpath) != 0)
+               {
+                       close(fd);
+                       unlink(tmpsuffixpath);
+                       return NULL;
+               }
+       }

durable_rename() does a set of fsync()'s, but --no-sync should not
flush any data.
I need to look into this further, without this I am seeing random file close and rename failures and disconnecting the stream. Also it appears we are calling durable_rename when we are closing the file (dir_close) even without --no-sync. Should we worry about the padding case?
--
Michael

pgsql-hackers by date:

Previous
From: Chapman Flack
Date:
Subject: Re: TYPCATEGORY_{NETWORK,USER} [was Dubious usage of TYPCATEGORY_STRING]
Next
From: Stanislav Bashkyrtsev
Date:
Subject: Re: PostgreSQL stops when adding a breakpoint in CLion