Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Date
Msg-id CA+hUKGL8RneKhL5u6giWbDcna1puD5_6OaUgxSjnq10pw4Es=w@mail.gmail.com
Whole thread Raw
In response to Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
List pgsql-hackers
On Sat, Sep 24, 2022 at 8:24 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> +#ifdef WIN32
> +        /*
> +         * XXX: It looks like on Windows, we need an explicit lseek() call here
> +         * despite using pwrite() implementation from win32pwrite.c. Otherwise
> +         * an error occurs.
> +         */
>
> I think this comment is too vague.  Can we describe the error in more
> detail?  Or better yet, can we fix it as a prerequisite to this patch set?

Although WriteFile() with a synchronous file handle and an explicit
offset doesn't use the current file position, it appears that it still
changes it.  :-(

You'd think from the documentation[1] that that isn't the case, because it says:

"Considerations for working with synchronous file handles:

 * If lpOverlapped is NULL, the write operation starts at the current
file position and WriteFile does not return until the operation is
complete, and the system updates the file pointer before WriteFile
returns.

 * If lpOverlapped is not NULL, the write operation starts at the
offset that is specified in the OVERLAPPED structure and WriteFile
does not return until the write operation is complete. The system
updates the OVERLAPPED Internal and InternalHigh fields before
WriteFile returns."

So it's explicitly saying the file pointer is updated in the first
bullet point and not the second, but src/port/win32pwrite.c is doing
the second thing.  Yet the following assertion added to Bharath's code
fails[2]:

+++ b/src/bin/pg_basebackup/walmethods.c
@@ -221,6 +221,10 @@ dir_open_for_write(WalWriteMethod *wwmethod,
const char *pathname,
        if (pad_to_size && wwmethod->compression_algorithm ==
PG_COMPRESSION_NONE)
        {
                ssize_t rc;
+               off_t before_offset;
+               off_t after_offset;
+
+               before_offset = lseek(fd, 0, SEEK_CUR);

                rc = pg_pwritev_zeros(fd, pad_to_size);

@@ -231,6 +235,9 @@ dir_open_for_write(WalWriteMethod *wwmethod, const
char *pathname,
                        return NULL;
                }

+               after_offset = lseek(fd, 0, SEEK_CUR);
+               Assert(before_offset == after_offset);
+

[1] https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile#synchronization-and-file-position
[2]
https://api.cirrus-ci.com/v1/artifact/task/6201051266154496/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Next
From: Amit Kapila
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply