Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes? - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Date
Msg-id CALj2ACW5Cy2uC_Hu1V-aq5uJx+4i=UDabQr_WvQwOPsPQULEEA@mail.gmail.com
Whole thread Raw
In response to Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
List pgsql-hackers
On Sat, Sep 24, 2022 at 9:44 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Although WriteFile() with a synchronous file handle and an explicit
> offset doesn't use the current file position, it appears that it still
> changes it.  :-(
>
> You'd think from the documentation[1] that that isn't the case, because it says:
>
> "Considerations for working with synchronous file handles:
>
>  * If lpOverlapped is NULL, the write operation starts at the current
> file position and WriteFile does not return until the operation is
> complete, and the system updates the file pointer before WriteFile
> returns.
>
>  * If lpOverlapped is not NULL, the write operation starts at the
> offset that is specified in the OVERLAPPED structure and WriteFile
> does not return until the write operation is complete. The system
> updates the OVERLAPPED Internal and InternalHigh fields before
> WriteFile returns."
>
> So it's explicitly saying the file pointer is updated in the first
> bullet point and not the second, but src/port/win32pwrite.c is doing
> the second thing.

The WriteFile() and pwrite() implementation in win32pwrite.c are
changing the file pointer on Windows, see the following and [4] for
more details.

# Running: pg_basebackup --no-sync -cfast -D
C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/backup2
--no-manifest --waldir
C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/xlog2
pg_basebackup: offset_before 0 and offset_after 16777216 aren't the same
Assertion failed: offset_before == offset_after, file
../src/bin/pg_basebackup/walmethods.c, line 254

Irrespective of what Windows does with file pointers in WriteFile(),
should we add lseek(SEEK_SET) in our own pwrite()'s implementation,
something like [5]? This is rather hackish without fully knowing what
Windows does internally in WriteFile(), but this does fix inherent
issues that our pwrite() callers (there are quite a number of places
that use pwrite() and presumes file pointer doesn't change on Windows)
may have on Windows. See the regression tests passing [6] with the fix
[5].

> Yet the following assertion added to Bharath's code
> fails[2]:

That was a very quick patch though, here's another version adding
offset checks and an assertion [3], see the assertion failures here
[4].

I also think that we need to discuss this issue separately.

Thoughts?

> [1]
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile#synchronization-and-file-position
> [2]
https://api.cirrus-ci.com/v1/artifact/task/6201051266154496/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup
[3] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v2
[4] -
https://api.cirrus-ci.com/v1/artifact/task/5294264635621376/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup
[5]
diff --git a/src/port/win32pwrite.c b/src/port/win32pwrite.c
index 7f2e62e8a7..542b548279 100644
--- a/src/port/win32pwrite.c
+++ b/src/port/win32pwrite.c
@@ -37,5 +37,16 @@ pwrite(int fd, const void *buf, size_t size, off_t offset)
                return -1;
        }

+       /*
+        * On Windows, it is found that WriteFile() changes the file
pointer and we
+        * want pwrite() to not change. Hence, we explicitly reset the
file pointer
+        * to beginning of the file.
+        */
+       if (lseek(fd, 0, SEEK_SET) != 0)
+       {
+               _dosmaperr(GetLastError());
+               return -1;
+       }
+
        return result;
 }
[6] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v3

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Convert *GetDatum() and DatumGet*() macros to inline functions
Next
From: "Jonathan S. Katz"
Date:
Subject: PostgreSQL 15 GA release date