Thread: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
Hi, I noticed that dir_open_for_write() in walmethods.c uses write() for WAL file initialization (note that this code is used by pg_receivewal and pg_basebackup) as opposed to core using pg_pwritev_with_retry() in XLogFileInitInternal() to avoid partial writes. Do we need to fix this? Thoughts? -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Fri, Aug 05, 2022 at 03:55:26PM +0530, Bharath Rupireddy wrote: > I noticed that dir_open_for_write() in walmethods.c uses write() for > WAL file initialization (note that this code is used by pg_receivewal > and pg_basebackup) as opposed to core using pg_pwritev_with_retry() in > XLogFileInitInternal() to avoid partial writes. Do we need to fix > this? 0d56acfb has moved pg_pwritev_with_retry to be backend-only in fd.c :/ > Thoughts? Makes sense to me for the WAL segment pre-padding initialization, as we still want to point to the beginning of the segment after we are done with the pre-padding, and the code has an extra lseek(). -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sat, Aug 6, 2022 at 12:11 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Aug 05, 2022 at 03:55:26PM +0530, Bharath Rupireddy wrote: > > I noticed that dir_open_for_write() in walmethods.c uses write() for > > WAL file initialization (note that this code is used by pg_receivewal > > and pg_basebackup) as opposed to core using pg_pwritev_with_retry() in > > XLogFileInitInternal() to avoid partial writes. Do we need to fix > > this? > > 0d56acfb has moved pg_pwritev_with_retry to be backend-only in fd.c :/ Yeah. pg_pwritev_with_retry can also be part of common/file_utils.c/.h so that everyone can use it. > > Thoughts? > > Makes sense to me for the WAL segment pre-padding initialization, as > we still want to point to the beginning of the segment after we are > done with the pre-padding, and the code has an extra lseek(). Thanks. I attached the v1 patch, please review it. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Sun, Aug 7, 2022 at 1:12 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Sat, Aug 6, 2022 at 12:11 PM Michael Paquier <michael@paquier.xyz> wrote: > Yeah. pg_pwritev_with_retry can also be part of common/file_utils.c/.h > so that everyone can use it. > > > > Thoughts? > > > > Makes sense to me for the WAL segment pre-padding initialization, as > > we still want to point to the beginning of the segment after we are > > done with the pre-padding, and the code has an extra lseek(). > > Thanks. I attached the v1 patch, please review it. Hi Bharath, +1 for moving pg_pwritev_with_retry() to file_utils.c, but I think the commit message should say that is happening. Maybe the move should even be in a separate patch (IMHO it's nice to separate mechanical change patches from new logic/work patches). +/* + * A convenience wrapper for pg_pwritev_with_retry() that zero-fills the given + * file of size total_sz in batches of size block_sz. + */ +ssize_t +pg_pwritev_with_retry_and_init(int fd, int total_sz, int block_sz) Hmm, why not give it a proper name that says it writes zeroes? Size arguments around syscall-like facilities are usually size_t. + memset(zbuffer.data, 0, block_sz); I believe the modern fashion as of a couple of weeks ago is to tell the compiler to zero-initialise. Since it's a union you'd need designated initialiser syntax, something like zbuffer = { .data = {0} }? + iov[i].iov_base = zbuffer.data; + iov[i].iov_len = block_sz; How can it be OK to use caller supplied block_sz, when sizeof(zbuffer.data) is statically determined? What is the point of this argument? - dir_data->lasterrno = errno; + /* If errno isn't set, assume problem is no disk space */ + dir_data->lasterrno = errno ? errno : ENOSPC; This coding pattern is used in places where we blame short writes on lack of disk space without bothering to retry. But the code used in this patch already handles short writes: it always retries, until eventually, if you really are out of disk space, you should get an actual ENOSPC from the operating system. So I don't think the guess-it-must-be-ENOSPC technique applies here.
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sun, Aug 7, 2022 at 7:43 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sun, Aug 7, 2022 at 1:12 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Sat, Aug 6, 2022 at 12:11 PM Michael Paquier <michael@paquier.xyz> wrote: > > Yeah. pg_pwritev_with_retry can also be part of common/file_utils.c/.h > > so that everyone can use it. > > > > > > Thoughts? > > > > > > Makes sense to me for the WAL segment pre-padding initialization, as > > > we still want to point to the beginning of the segment after we are > > > done with the pre-padding, and the code has an extra lseek(). > > > > Thanks. I attached the v1 patch, please review it. > > Hi Bharath, > > +1 for moving pg_pwritev_with_retry() to file_utils.c, but I think the > commit message should say that is happening. Maybe the move should > even be in a separate patch (IMHO it's nice to separate mechanical > change patches from new logic/work patches). Agree. I separated out the changes. > +/* > + * A convenience wrapper for pg_pwritev_with_retry() that zero-fills the given > + * file of size total_sz in batches of size block_sz. > + */ > +ssize_t > +pg_pwritev_with_retry_and_init(int fd, int total_sz, int block_sz) > > Hmm, why not give it a proper name that says it writes zeroes? Done. > Size arguments around syscall-like facilities are usually size_t. > > + memset(zbuffer.data, 0, block_sz); > > I believe the modern fashion as of a couple of weeks ago is to tell > the compiler to zero-initialise. Since it's a union you'd need > designated initialiser syntax, something like zbuffer = { .data = {0} > }? Hm, but we have many places still using memset(). If we were to change these syntaxes, IMO, it must be done separately. > + iov[i].iov_base = zbuffer.data; > + iov[i].iov_len = block_sz; > > How can it be OK to use caller supplied block_sz, when > sizeof(zbuffer.data) is statically determined? What is the point of > this argument? Yes, removed block_sz function parameter. > - dir_data->lasterrno = errno; > + /* If errno isn't set, assume problem is no disk space */ > + dir_data->lasterrno = errno ? errno : ENOSPC; > > This coding pattern is used in places where we blame short writes on > lack of disk space without bothering to retry. But the code used in > this patch already handles short writes: it always retries, until > eventually, if you really are out of disk space, you should get an > actual ENOSPC from the operating system. So I don't think the > guess-it-must-be-ENOSPC technique applies here. Done. Thanks for reviewing. PSA v2 patch-set. Also,I added a CF entry https://commitfest.postgresql.org/39/3803/ to give the patches a chance to get tested. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Sun, Aug 07, 2022 at 10:41:49AM +0530, Bharath Rupireddy wrote: > Agree. I separated out the changes. + +/* + * A convenience wrapper for pwritev() that retries on partial write. If an + * error is returned, it is unspecified how much has been written. + */ +ssize_t +pg_pwritev_with_retry(int fd, const struct iovec *iov, int iovcnt, off_t offset) If moving this routine, this could use a more explicit description, especially on errno, for example, that could be consumed by the caller on failure to know what's happening. >> +/* >> + * A convenience wrapper for pg_pwritev_with_retry() that zero-fills the given >> + * file of size total_sz in batches of size block_sz. >> + */ >> +ssize_t >> +pg_pwritev_with_retry_and_init(int fd, int total_sz, int block_sz) >> >> Hmm, why not give it a proper name that says it writes zeroes? > > Done. FWIW, when it comes to that we have a couple of routines that just use '0' to mean such a thing, aka palloc0(). I find 0002 confusing, as it introduces in fe_utils.c a new wrapper (pg_pwritev_with_retry_and_write_zeros) on what's already a wrapper (pg_pwritev_with_retry) for pwrite(). A second thing is that pg_pwritev_with_retry_and_write_zeros() is designed to work on WAL segments initialization and it uses XLOG_BLCKSZ and PGAlignedXLogBlock for the job, but there is nothing in its name that tells us so. This makes me question whether file_utils.c is a good location for this second thing. Could a new file be a better location? We have a xlogutils.c in the backend, and a name similar to that in src/common/ would be one possibility. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Sun, Aug 7, 2022 at 7:56 PM Michael Paquier <michael@paquier.xyz> wrote: > FWIW, when it comes to that we have a couple of routines that just use > '0' to mean such a thing, aka palloc0(). I find 0002 confusing, as it > introduces in fe_utils.c a new wrapper > (pg_pwritev_with_retry_and_write_zeros) on what's already a wrapper > (pg_pwritev_with_retry) for pwrite(). What about pg_write_initial_zeros(fd, size)? How it writes zeros (ie using vector I/O, and retrying) seems to be an internal detail, no? "initial" to make it clearer that it's at offset 0, or maybe pg_pwrite_zeros(fd, size, offset). > A second thing is that pg_pwritev_with_retry_and_write_zeros() is > designed to work on WAL segments initialization and it uses > XLOG_BLCKSZ and PGAlignedXLogBlock for the job, but there is nothing > in its name that tells us so. This makes me question whether > file_utils.c is a good location for this second thing. Could a new > file be a better location? We have a xlogutils.c in the backend, and > a name similar to that in src/common/ would be one possibility. Yeah, I think it should probably be disconnected from XLOG_BLCKSZ, or maybe it's OK to use BLCKSZ with a comment to say that it's a bit arbitrary, or maybe it's better to define a new zero buffer of some arbitrary size just in this code if that is too strange. We could experiment with different size buffers to see how it performs, bearing in mind that every time we double it you halve the number of system calls, but also bearing in mind that at some point it's too much for the stack. I can tell you that the way that code works today was not really written with performance in mind (unlike, say, the code reverted from 9.4 that tried to do this with posix_fallocate()), it was just finding an excuse to call pwritev(), to exercise new fallback code being committed for use by later AIO stuff (more patches coming soon). The retry support was added because it seemed plausible that some system out there would start to do short writes as we cranked up the sizes for some implementation reason other than ENOSPC, so we should make a reusable retry routine. I think this should also handle the remainder after processing whole blocks, just for completeness. If I call the code as presented with size 8193, I think this code will only write 8192 bytes. I think if this ever needs to work on O_DIRECT files there would be an alignment constraint on the buffer and size, but I don't think we have to worry about that for now.
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sun, Aug 7, 2022 at 3:19 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > A second thing is that pg_pwritev_with_retry_and_write_zeros() is > > designed to work on WAL segments initialization and it uses > > XLOG_BLCKSZ and PGAlignedXLogBlock for the job, but there is nothing > > in its name that tells us so. This makes me question whether > > file_utils.c is a good location for this second thing. Could a new > > file be a better location? We have a xlogutils.c in the backend, and > > a name similar to that in src/common/ would be one possibility. > > Yeah, I think it should probably be disconnected from XLOG_BLCKSZ, or > maybe it's OK to use BLCKSZ with a comment to say that it's a bit > arbitrary, or maybe it's better to define a new zero buffer of some > arbitrary size just in this code if that is too strange. We could > experiment with different size buffers to see how it performs, bearing > in mind that every time we double it you halve the number of system > calls, but also bearing in mind that at some point it's too much for > the stack. I can tell you that the way that code works today was not > really written with performance in mind (unlike, say, the code > reverted from 9.4 that tried to do this with posix_fallocate()), it > was just finding an excuse to call pwritev(), to exercise new fallback > code being committed for use by later AIO stuff (more patches coming > soon). The retry support was added because it seemed plausible that > some system out there would start to do short writes as we cranked up > the sizes for some implementation reason other than ENOSPC, so we > should make a reusable retry routine. Yes, doubling the zerobuffer size to say 2 * XLOG_BLCKSZ or 2 * BLCKSZ reduces the system calls to half (right now, pg_pwritev_with_retry() gets called 64 times per 16MB WAL file, it writes in the batches of 32 blocks per call). Is there a ready-to-use tool or script or specific settings for pgbench (pgbench command line options or GUC settings) that I can play with to measure the performance? > I think this should also handle the remainder after processing whole > blocks, just for completeness. If I call the code as presented with size > 8193, I think this code will only write 8192 bytes. Hm, I will fix it. > I think if this ever needs to work on O_DIRECT files there would be an > alignment constraint on the buffer and size, but I don't think we have > to worry about that for now. We can add a comment about the above limitation, if required. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sun, Aug 7, 2022 at 9:22 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Sun, Aug 7, 2022 at 3:19 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > > A second thing is that pg_pwritev_with_retry_and_write_zeros() is > > > designed to work on WAL segments initialization and it uses > > > XLOG_BLCKSZ and PGAlignedXLogBlock for the job, but there is nothing > > > in its name that tells us so. This makes me question whether > > > file_utils.c is a good location for this second thing. Could a new > > > file be a better location? We have a xlogutils.c in the backend, and > > > a name similar to that in src/common/ would be one possibility. > > > > Yeah, I think it should probably be disconnected from XLOG_BLCKSZ, or > > maybe it's OK to use BLCKSZ with a comment to say that it's a bit > > arbitrary, or maybe it's better to define a new zero buffer of some > > arbitrary size just in this code if that is too strange. We could > > experiment with different size buffers to see how it performs, bearing > > in mind that every time we double it you halve the number of system > > calls, but also bearing in mind that at some point it's too much for > > the stack. I can tell you that the way that code works today was not > > really written with performance in mind (unlike, say, the code > > reverted from 9.4 that tried to do this with posix_fallocate()), it > > was just finding an excuse to call pwritev(), to exercise new fallback > > code being committed for use by later AIO stuff (more patches coming > > soon). The retry support was added because it seemed plausible that > > some system out there would start to do short writes as we cranked up > > the sizes for some implementation reason other than ENOSPC, so we > > should make a reusable retry routine. > > Yes, doubling the zerobuffer size to say 2 * XLOG_BLCKSZ or 2 * BLCKSZ > reduces the system calls to half (right now, pg_pwritev_with_retry() > gets called 64 times per 16MB WAL file, it writes in the batches of 32 > blocks per call). > > Is there a ready-to-use tool or script or specific settings for > pgbench (pgbench command line options or GUC settings) that I can play > with to measure the performance? I played with a simple insert use-case [1] that generates ~380 WAL files, with different block sizes. To my surprise, I have not seen any improvement with larger block sizes. I may be doing something wrong here, suggestions on to test and see the benefits are welcome. > > I think this should also handle the remainder after processing whole > > blocks, just for completeness. If I call the code as presented with size > > 8193, I think this code will only write 8192 bytes. > > Hm, I will fix it. Fixed. I'm attaching v5 patch-set. I've addressed review comments received so far and fixed a compiler warning that CF bot complained about. Please review it further. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Mon, Aug 8, 2022 at 6:10 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > I played with a simple insert use-case [1] that generates ~380 WAL > files, with different block sizes. To my surprise, I have not seen any > improvement with larger block sizes. I may be doing something wrong > here, suggestions on to test and see the benefits are welcome. > > > > I think this should also handle the remainder after processing whole > > > blocks, just for completeness. If I call the code as presented with size > > > 8193, I think this code will only write 8192 bytes. > > > > Hm, I will fix it. > > Fixed. > > I'm attaching v5 patch-set. I've addressed review comments received so > far and fixed a compiler warning that CF bot complained about. > > Please review it further. I tried to vary the zero buffer size to see if there's any huge benefit for the WAL-generating queries. Unfortunately, I didn't see any benefit on my dev system (16 vcore, 512GB SSD, 32GB RAM) . The use case I've tried is at [1] and the results are at [2]. Having said that, the use of pg_pwritev_with_retry() in walmethods.c will definitely reduce number of system calls - on HEAD the dir_open_for_write() makes pad_to_size/XLOG_BLCKSZ i.e. 16MB/8KB = 2,048 write() calls and with patch it makes only 64 pg_pwritev_with_retry() calls with XLOG_BLCKSZ zero buffer size. The proposed patches will provide straight 32x reduction in system calls (for pg_receivewal and pg_basebackup) apart from the safety against partial writes. [1] /* built source code with release flags */ ./configure --with-zlib --enable-depend --prefix=$PWD/inst/ --with-openssl --with-readline --with-perl --with-libxml CFLAGS='-O2' > install.log && make -j 8 install > install.log 2>&1 & \q ./pg_ctl -D data -l logfile stop rm -rf data /* ensured that nothing exists in OS page cache */ free -m sudo su sync; echo 3 > /proc/sys/vm/drop_caches exit free -m ./initdb -D data ./pg_ctl -D data -l logfile start ./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "64GB";' ./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";' ./psql -d postgres -c 'ALTER SYSTEM SET work_mem = "16MB";' ./psql -d postgres -c 'ALTER SYSTEM SET checkpoint_timeout = "1d";' ./pg_ctl -D data -l logfile restart ./psql -d postgres -c 'create table foo(bar int);' ./psql -d postgres \timing insert into foo select * from generate_series(1, 100000000); /* this query generates about 385 WAL files, no checkpoint hence no recycle of old WAL files, all new WAL files */ [2] HEAD Time: 84249.535 ms (01:24.250) HEAD with wal_init_zero off Time: 75086.300 ms (01:15.086) #define PWRITEV_BLCKSZ XLOG_BLCKSZ Time: 85254.302 ms (01:25.254) #define PWRITEV_BLCKSZ (4 * XLOG_BLCKSZ) Time: 83542.885 ms (01:23.543) #define PWRITEV_BLCKSZ (16 * XLOG_BLCKSZ) Time: 84035.770 ms (01:24.036) #define PWRITEV_BLCKSZ (64 * XLOG_BLCKSZ) Time: 84749.021 ms (01:24.749) #define PWRITEV_BLCKSZ (256 * XLOG_BLCKSZ) Time: 84273.466 ms (01:24.273) #define PWRITEV_BLCKSZ (512 * XLOG_BLCKSZ) Time: 84233.576 ms (01:24.234) -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Nathan Bossart
Date:
On Mon, Aug 08, 2022 at 06:10:23PM +0530, Bharath Rupireddy wrote: > I'm attaching v5 patch-set. I've addressed review comments received so > far and fixed a compiler warning that CF bot complained about. > > Please review it further. 0001 looks reasonable to me. + errno = 0; + rc = pg_pwritev_zeros(fd, pad_to_size); Do we need to reset errno? pg_pwritev_zeros() claims to set errno appropriately. +/* + * PWRITEV_BLCKSZ is same as XLOG_BLCKSZ for now, however it may change if + * writing more bytes per pg_pwritev_with_retry() call is proven to be more + * performant. + */ +#define PWRITEV_BLCKSZ XLOG_BLCKSZ This seems like something we should sort out now instead of leaving as future work. Given your recent note, I think we should just use XLOG_BLCKSZ and PGAlignedXLogBlock and add a comment about the performance findings with different buffer sizes. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Nathan Bossart
Date:
On Tue, Sep 20, 2022 at 04:00:26PM -0700, Nathan Bossart wrote: > On Mon, Aug 08, 2022 at 06:10:23PM +0530, Bharath Rupireddy wrote: >> I'm attaching v5 patch-set. I've addressed review comments received so >> far and fixed a compiler warning that CF bot complained about. >> >> Please review it further. > > 0001 looks reasonable to me. > > + errno = 0; > + rc = pg_pwritev_zeros(fd, pad_to_size); > > Do we need to reset errno? pg_pwritev_zeros() claims to set errno > appropriately. > > +/* > + * PWRITEV_BLCKSZ is same as XLOG_BLCKSZ for now, however it may change if > + * writing more bytes per pg_pwritev_with_retry() call is proven to be more > + * performant. > + */ > +#define PWRITEV_BLCKSZ XLOG_BLCKSZ > > This seems like something we should sort out now instead of leaving as > future work. Given your recent note, I think we should just use > XLOG_BLCKSZ and PGAlignedXLogBlock and add a comment about the performance > findings with different buffer sizes. I also noticed that the latest patch set no longer applies, so I've marked the commitfest entry as waiting-on-author. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Wed, Sep 21, 2022 at 4:30 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > 0001 looks reasonable to me. Thanks for reviewing. > + errno = 0; > + rc = pg_pwritev_zeros(fd, pad_to_size); > > Do we need to reset errno? pg_pwritev_zeros() claims to set errno > appropriately. Right, pg_pwritev_zeros(), (rather pg_pwritev_with_retry() ensures that pwritev() or pwrite()) sets the correct errno, please see Thomas's comments [1] as well. Removed it. > +/* > + * PWRITEV_BLCKSZ is same as XLOG_BLCKSZ for now, however it may change if > + * writing more bytes per pg_pwritev_with_retry() call is proven to be more > + * performant. > + */ > +#define PWRITEV_BLCKSZ XLOG_BLCKSZ > > This seems like something we should sort out now instead of leaving as > future work. Given your recent note, I think we should just use > XLOG_BLCKSZ and PGAlignedXLogBlock and add a comment about the performance > findings with different buffer sizes. Agreed. Removed the new structure and added a comment. Another change that I had to do was to allow lseek() even after pwrite() (via pg_pwritev_zeros()) on Windows in walmethods.c. Without this, the regression tests start to fail on Windows. And I figured out that it's not an issue with this patch, it looks like an issue with pwrite() implementation in win32pwrite.c, see the failure here [2], I plan to start a separate thread to discuss this. Please review the attached v4 patch set further. [1] https://www.postgresql.org/message-id/CA%2BhUKGJKwUrpP0igOFAv5khj3dbHvfWqLzFeo7vtNzDYXv_YZw%40mail.gmail.com [2] https://github.com/BRupireddy/postgres/tree/use_pwrite_without_lseek_on_windiws -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Nathan Bossart
Date:
+ PGAlignedXLogBlock zbuffer; + + memset(zbuffer.data, 0, XLOG_BLCKSZ); This seems excessive for only writing a single byte. +#ifdef WIN32 + /* + * XXX: It looks like on Windows, we need an explicit lseek() call here + * despite using pwrite() implementation from win32pwrite.c. Otherwise + * an error occurs. + */ I think this comment is too vague. Can we describe the error in more detail? Or better yet, can we fix it as a prerequisite to this patch set? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Sat, Sep 24, 2022 at 8:24 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > +#ifdef WIN32 > + /* > + * XXX: It looks like on Windows, we need an explicit lseek() call here > + * despite using pwrite() implementation from win32pwrite.c. Otherwise > + * an error occurs. > + */ > > I think this comment is too vague. Can we describe the error in more > detail? Or better yet, can we fix it as a prerequisite to this patch set? Although WriteFile() with a synchronous file handle and an explicit offset doesn't use the current file position, it appears that it still changes it. :-( You'd think from the documentation[1] that that isn't the case, because it says: "Considerations for working with synchronous file handles: * If lpOverlapped is NULL, the write operation starts at the current file position and WriteFile does not return until the operation is complete, and the system updates the file pointer before WriteFile returns. * If lpOverlapped is not NULL, the write operation starts at the offset that is specified in the OVERLAPPED structure and WriteFile does not return until the write operation is complete. The system updates the OVERLAPPED Internal and InternalHigh fields before WriteFile returns." So it's explicitly saying the file pointer is updated in the first bullet point and not the second, but src/port/win32pwrite.c is doing the second thing. Yet the following assertion added to Bharath's code fails[2]: +++ b/src/bin/pg_basebackup/walmethods.c @@ -221,6 +221,10 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname, if (pad_to_size && wwmethod->compression_algorithm == PG_COMPRESSION_NONE) { ssize_t rc; + off_t before_offset; + off_t after_offset; + + before_offset = lseek(fd, 0, SEEK_CUR); rc = pg_pwritev_zeros(fd, pad_to_size); @@ -231,6 +235,9 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname, return NULL; } + after_offset = lseek(fd, 0, SEEK_CUR); + Assert(before_offset == after_offset); + [1] https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile#synchronization-and-file-position [2] https://api.cirrus-ci.com/v1/artifact/task/6201051266154496/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sat, Sep 24, 2022 at 9:44 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > Although WriteFile() with a synchronous file handle and an explicit > offset doesn't use the current file position, it appears that it still > changes it. :-( > > You'd think from the documentation[1] that that isn't the case, because it says: > > "Considerations for working with synchronous file handles: > > * If lpOverlapped is NULL, the write operation starts at the current > file position and WriteFile does not return until the operation is > complete, and the system updates the file pointer before WriteFile > returns. > > * If lpOverlapped is not NULL, the write operation starts at the > offset that is specified in the OVERLAPPED structure and WriteFile > does not return until the write operation is complete. The system > updates the OVERLAPPED Internal and InternalHigh fields before > WriteFile returns." > > So it's explicitly saying the file pointer is updated in the first > bullet point and not the second, but src/port/win32pwrite.c is doing > the second thing. The WriteFile() and pwrite() implementation in win32pwrite.c are changing the file pointer on Windows, see the following and [4] for more details. # Running: pg_basebackup --no-sync -cfast -D C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/backup2 --no-manifest --waldir C:\cirrus\build\testrun\pg_basebackup\010_pg_basebackup\data\tmp_test_sV4r/xlog2 pg_basebackup: offset_before 0 and offset_after 16777216 aren't the same Assertion failed: offset_before == offset_after, file ../src/bin/pg_basebackup/walmethods.c, line 254 Irrespective of what Windows does with file pointers in WriteFile(), should we add lseek(SEEK_SET) in our own pwrite()'s implementation, something like [5]? This is rather hackish without fully knowing what Windows does internally in WriteFile(), but this does fix inherent issues that our pwrite() callers (there are quite a number of places that use pwrite() and presumes file pointer doesn't change on Windows) may have on Windows. See the regression tests passing [6] with the fix [5]. > Yet the following assertion added to Bharath's code > fails[2]: That was a very quick patch though, here's another version adding offset checks and an assertion [3], see the assertion failures here [4]. I also think that we need to discuss this issue separately. Thoughts? > [1] https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile#synchronization-and-file-position > [2] https://api.cirrus-ci.com/v1/artifact/task/6201051266154496/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup [3] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v2 [4] - https://api.cirrus-ci.com/v1/artifact/task/5294264635621376/testrun/build/testrun/pg_basebackup/010_pg_basebackup/log/regress_log_010_pg_basebackup [5] diff --git a/src/port/win32pwrite.c b/src/port/win32pwrite.c index 7f2e62e8a7..542b548279 100644 --- a/src/port/win32pwrite.c +++ b/src/port/win32pwrite.c @@ -37,5 +37,16 @@ pwrite(int fd, const void *buf, size_t size, off_t offset) return -1; } + /* + * On Windows, it is found that WriteFile() changes the file pointer and we + * want pwrite() to not change. Hence, we explicitly reset the file pointer + * to beginning of the file. + */ + if (lseek(fd, 0, SEEK_SET) != 0) + { + _dosmaperr(GetLastError()); + return -1; + } + return result; } [6] - https://github.com/BRupireddy/postgres/tree/add_pwrite_and_offset_checks_in_walmethods_v3 -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Nathan Bossart
Date:
On Mon, Sep 26, 2022 at 08:33:53PM +0530, Bharath Rupireddy wrote: > Irrespective of what Windows does with file pointers in WriteFile(), > should we add lseek(SEEK_SET) in our own pwrite()'s implementation, > something like [5]? This is rather hackish without fully knowing what > Windows does internally in WriteFile(), but this does fix inherent > issues that our pwrite() callers (there are quite a number of places > that use pwrite() and presumes file pointer doesn't change on Windows) > may have on Windows. See the regression tests passing [6] with the fix > [5]. I think so. I don't see why we would rather have each caller ensure pwrite() behaves as documented. > + /* > + * On Windows, it is found that WriteFile() changes the file > pointer and we > + * want pwrite() to not change. Hence, we explicitly reset the > file pointer > + * to beginning of the file. > + */ > + if (lseek(fd, 0, SEEK_SET) != 0) > + { > + _dosmaperr(GetLastError()); > + return -1; > + } > + > return result; > } Why reset to the beginning of the file? Shouldn't we reset it to what it was before the call to pwrite()? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Tue, Sep 27, 2022 at 10:27 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Mon, Sep 26, 2022 at 08:33:53PM +0530, Bharath Rupireddy wrote: > > Irrespective of what Windows does with file pointers in WriteFile(), > > should we add lseek(SEEK_SET) in our own pwrite()'s implementation, > > something like [5]? This is rather hackish without fully knowing what > > Windows does internally in WriteFile(), but this does fix inherent > > issues that our pwrite() callers (there are quite a number of places > > that use pwrite() and presumes file pointer doesn't change on Windows) > > may have on Windows. See the regression tests passing [6] with the fix > > [5]. > > I think so. I don't see why we would rather have each caller ensure > pwrite() behaves as documented. I don't think so, that's an extra kernel call. I think I'll just have to revert part of my recent change that removed the pg_ prefix from those function names in our code, and restore the comment that warns you about the portability hazard (I thought it went away with HP-UX 10, where we were literally calling lseek() before every write()). The majority of users of these functions don't intermix them with calls to read()/write(), so they don't care about the file position, so I think it's just something we'll have to continue to be mindful of in the places that do. Unless, that is, I can find a way to stop it from doing that... I've added this to my Windows to-do list. I am going to have a round of Windows hacking quite soon.
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Nathan Bossart
Date:
On Tue, Sep 27, 2022 at 10:37:38AM +1300, Thomas Munro wrote: > I don't think so, that's an extra kernel call. I think I'll just have > to revert part of my recent change that removed the pg_ prefix from > those function names in our code, and restore the comment that warns > you about the portability hazard (I thought it went away with HP-UX > 10, where we were literally calling lseek() before every write()). > The majority of users of these functions don't intermix them with > calls to read()/write(), so they don't care about the file position, > so I think it's just something we'll have to continue to be mindful of > in the places that do. Ah, you're right, it's probably best to avoid the extra system call for the majority of callers that don't care about the file position. I retract my previous message. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Tue, Sep 27, 2022 at 3:08 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > I don't think so, that's an extra kernel call. I think I'll just have > to revert part of my recent change that removed the pg_ prefix from > those function names in our code, and restore the comment that warns > you about the portability hazard (I thought it went away with HP-UX > 10, where we were literally calling lseek() before every write()). > The majority of users of these functions don't intermix them with > calls to read()/write(), so they don't care about the file position, > so I think it's just something we'll have to continue to be mindful of > in the places that do. Yes, all of the existing pwrite() callers don't care about the file position, but the new callers such as the actual idea and patch proposed here in this thread cares. Is this the commit cf112c122060568aa06efe4e6e6fb9b2dd4f1090 part of which [1] you're planning to revert? If so, will we have the pg_{pwrite, pread} back? IIUC, we don't have lseek(SEEK_SET) in pg_{pwrite, pread} right? It is the callers responsibility to set the file position correctly if they wish to, isn't it? Oftentimes, developers miss the notes in the function comments and use these functions expecting them to not change file position which works well on non-Windows platforms but fails on Windows. This makes me think that we can have pwrite(), pread() introduced by cf112c122060568aa06efe4e6e6fb9b2dd4f1090 as-is and re-introduce pg_{pwrite, pread} with pwrite()/pread()+lseek(SEEK_SET) in win32pwrite.c and win32pread.c. These functions reduce the caller's efforts and reduce the duplicate code. If okay, I'm happy to lend my hands on this patch. Thoughts? > Unless, that is, I can find a way to stop it from doing that... I've > added this to my Windows to-do list. I am going to have a round of > Windows hacking quite soon. Thanks! I think it's time for me to start a new thread just for this to get more attention and opinion from other hackers and not to sidetrack the original idea and patch proposed in this thread. [1] https://github.com/postgres/postgres/blob/REL_14_STABLE/src/port/pwrite.c#L11 -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Tue, Sep 27, 2022 at 6:43 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Tue, Sep 27, 2022 at 3:08 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > I don't think so, that's an extra kernel call. I think I'll just have > > to revert part of my recent change that removed the pg_ prefix from > > those function names in our code, and restore the comment that warns > > you about the portability hazard (I thought it went away with HP-UX > > 10, where we were literally calling lseek() before every write()). > > The majority of users of these functions don't intermix them with > > calls to read()/write(), so they don't care about the file position, > > so I think it's just something we'll have to continue to be mindful of > > in the places that do. > > Yes, all of the existing pwrite() callers don't care about the file > position, but the new callers such as the actual idea and patch > proposed here in this thread cares. > > Is this the commit cf112c122060568aa06efe4e6e6fb9b2dd4f1090 part of > which [1] you're planning to revert? Yeah, just the renaming parts of that. The lseek()-based emulation is definitely not coming back. Something like the attached.
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Something like the attached. Isn't it also better to add notes in win32pread.c and win32pwrite.c about the callers doing lseek(SEEK_SET) if they wish to and Windows implementations changing file position? We can also add a TODO item about replacing pg_ versions with pread and friends someday when Windows fixes the issue? Having it in the commit and include/port.h is good, but the comments in win32pread.c and win32pwrite.c make life easier IMO. Otherwise, the patch LGTM. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Wed, Sep 28, 2022 at 1:03 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:> > On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Something like the attached. > > Isn't it also better to add notes in win32pread.c and win32pwrite.c > about the callers doing lseek(SEEK_SET) if they wish to and Windows > implementations changing file position? We can also add a TODO item > about replacing pg_ versions with pread and friends someday when > Windows fixes the issue? Having it in the commit and include/port.h is > good, but the comments in win32pread.c and win32pwrite.c make life > easier IMO. Otherwise, the patch LGTM. Thanks, will do. FWIW I doubt the OS itself will change released behaviour like that, but I did complain about the straight-up misleading documentation and they agreed and fixed it[1]. After some looking around, the only way I could find to avoid this behaviour is to switch to async handles, which do behave as documented in this respect, but then you can't use plain read/write at all unless you write replacement wrappers for those too, and AFAICT that adds more system calls (start write, wait for write to finish) and complexity/weirdness without any real payoff so it seems like the wrong direction, at least without more research that I'm not pursuing currently. (FWIW in AIO porting experiments a while back we used async handles to get IOs running concurrently with CPU work and possibly other IOs so it was actually worth doing, but that was only for smgr and the main wal writing code where there's no intermixed plain read/write calls as you have here, so the problem doesn't even come up.) [1] https://github.com/MicrosoftDocs/sdk-api/pull/1309
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Wed, Sep 28, 2022 at 12:32 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Sep 28, 2022 at 1:03 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote:> > > On Tue, Sep 27, 2022 at 3:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > Something like the attached. > > > > Isn't it also better to add notes in win32pread.c and win32pwrite.c > > about the callers doing lseek(SEEK_SET) if they wish to and Windows > > implementations changing file position? We can also add a TODO item > > about replacing pg_ versions with pread and friends someday when > > Windows fixes the issue? Having it in the commit and include/port.h is > > good, but the comments in win32pread.c and win32pwrite.c make life > > easier IMO. Otherwise, the patch LGTM. > > Thanks, will do. I'm looking forward to getting it in. > FWIW I doubt the OS itself will change released > behaviour like that, but I did complain about the straight-up > misleading documentation and they agreed and fixed it[1]. > > [1] https://github.com/MicrosoftDocs/sdk-api/pull/1309 Great! Is there any plan to request for change in WriteFile() to not alter file position? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Wed, Sep 28, 2022 at 5:43 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Wed, Sep 28, 2022 at 12:32 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > FWIW I doubt the OS itself will change released > > behaviour like that, but I did complain about the straight-up > > misleading documentation and they agreed and fixed it[1]. > > > > [1] https://github.com/MicrosoftDocs/sdk-api/pull/1309 > > Great! Is there any plan to request for change in WriteFile() to not > alter file position? Not from me. I stick to open source problems. Reporting bugs in documentation is legitimate self defence, though.
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sat, Sep 24, 2022 at 1:54 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > + PGAlignedXLogBlock zbuffer; > + > + memset(zbuffer.data, 0, XLOG_BLCKSZ); > > This seems excessive for only writing a single byte. Yes, I removed it now, instead doing pg_pwrite(fd, "\0", 1, wal_segment_size - 1). > +#ifdef WIN32 > + /* > + * XXX: It looks like on Windows, we need an explicit lseek() call here > + * despite using pwrite() implementation from win32pwrite.c. Otherwise > + * an error occurs. > + */ > > I think this comment is too vague. Can we describe the error in more > detail? Or better yet, can we fix it as a prerequisite to this patch set? The commit b6d8a60aba322678585ebe11dab072a37ac32905 brings back pg_pwrite() and its friends. This puts the responsibility of doing lseek(SEEK_SET) on the callers if they wish to. Please see the v5 patch set and review it further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Nathan Bossart
Date:
On Thu, Sep 29, 2022 at 11:32:32AM +0530, Bharath Rupireddy wrote: > On Sat, Sep 24, 2022 at 1:54 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> >> + PGAlignedXLogBlock zbuffer; >> + >> + memset(zbuffer.data, 0, XLOG_BLCKSZ); >> >> This seems excessive for only writing a single byte. > > Yes, I removed it now, instead doing pg_pwrite(fd, "\0", 1, > wal_segment_size - 1). I don't think removing the use of PGAlignedXLogBlock here introduces any sort of alignment risk, so this should be alright. +#ifdef WIN32 + /* + * WriteFile() on Windows changes the current file position, hence we + * need an explicit lseek() here. See pg_pwrite() implementation in + * win32pwrite.c for more details. + */ Should we really surround this with a WIN32 check, or should we just unconditionally lseek() here? I understand that this helps avoid an extra system call on many platforms, but in theory another platform introduced in the future could have the same problem, and this seems like something that could easily be missed. Presumably we could do something fancier to indicate pg_pwrite()'s behavior in this regard, but I don't know if that sort of complexity is really worth it in order to save an lseek(). + iov[0].iov_base = zbuffer.data; This seems superfluous, but I don't think it's hurting anything. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Fri, Sep 30, 2022 at 6:27 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Thu, Sep 29, 2022 at 11:32:32AM +0530, Bharath Rupireddy wrote: > > On Sat, Sep 24, 2022 at 1:54 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > >> > >> + PGAlignedXLogBlock zbuffer; > >> + > >> + memset(zbuffer.data, 0, XLOG_BLCKSZ); > >> > >> This seems excessive for only writing a single byte. > > > > Yes, I removed it now, instead doing pg_pwrite(fd, "\0", 1, > > wal_segment_size - 1). > > I don't think removing the use of PGAlignedXLogBlock here introduces any > sort of alignment risk, so this should be alright. > > +#ifdef WIN32 > + /* > + * WriteFile() on Windows changes the current file position, hence we > + * need an explicit lseek() here. See pg_pwrite() implementation in > + * win32pwrite.c for more details. > + */ > > Should we really surround this with a WIN32 check, or should we just > unconditionally lseek() here? I understand that this helps avoid an extra > system call on many platforms, but in theory another platform introduced in > the future could have the same problem, and this seems like something that > could easily be missed. Presumably we could do something fancier to > indicate pg_pwrite()'s behavior in this regard, but I don't know if that > sort of complexity is really worth it in order to save an lseek(). +1 for just doing it always, with a one-liner comment like "pg_pwritev*() might move the file position". No reason to spam the source tree with more explanations of the exact reason. If someone ever comes up with another case where p- and non-p- I/O functions are intermixed and it's really worth saving a system call (don't get me wrong, we call lseek() an obscene amount elsewhere and I'd like to fix that, but this case isn't hot?) then I like your idea of a macro to tell you whether you need to. Earlier I wondered why we'd want to include "pg_pwritev" in the name of this zero-filling function (pwritev being an internal implementation detail), but now it seems like maybe a good idea because it highlights the file position portability problem by being a member of that family of similarly-named functions.
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Fri, Sep 30, 2022 at 6:46 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > +1 for just doing it always, with a one-liner comment like > "pg_pwritev*() might move the file position". No reason to spam the > source tree with more explanations of the exact reason. +1 for resetting the file position in a platform-independent manner. But, a description there won't hurt IMO and it saves time for the hackers who spend time there and think why it's that way. > If someone > ever comes up with another case where p- and non-p- I/O functions are > intermixed and it's really worth saving a system call (don't get me > wrong, we call lseek() an obscene amount elsewhere and I'd like to fix > that, but this case isn't hot?) then I like your idea of a macro to > tell you whether you need to. I don't think we go that route as the code isn't a hot path and an extra system call wouldn't hurt performance much, a comment there should work. > Earlier I wondered why we'd want to include "pg_pwritev" in the name > of this zero-filling function (pwritev being an internal > implementation detail), but now it seems like maybe a good idea > because it highlights the file position portability problem by being a > member of that family of similarly-named functions. Hm. On Thu, Sep 29, 2022 at 10:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > + iov[0].iov_base = zbuffer.data; > > This seems superfluous, but I don't think it's hurting anything. Yes, I removed it. Adding a comment, something like [1], would make it more verbose, hence I've not added. I'm attaching the v6 patch set, please review it further. [1] /* * Use the first vector buffer to write the remaining size. Note that * zero buffer was already pointed to it above, hence just specifying * the size is enough here. */ -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Nathan Bossart
Date:
On Fri, Sep 30, 2022 at 08:09:04AM +0530, Bharath Rupireddy wrote: > I'm attaching the v6 patch set, please review it further. Looks reasonable to me. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Thu, Sep 29, 2022 at 08:09:56PM -0700, Nathan Bossart wrote: > Looks reasonable to me. 0001, to move pg_pwritev_with_retry() to a new home, seems fine, so applied. Regarding 0002, using pg_pwrite_zeros() as a routine name, as suggested by Thomas, sounds good to me. However, I am not really a fan of its dependency with PGAlignedXLogBlock, because it should be able to work with any buffers of any sizes, as long as the input buffer is aligned, shouldn't it? For example, what about PGAlignedBlock? So, should we make this more extensible? My guess would be the addition of the block size and the block pointer to the arguments of pg_pwrite_zeros(), in combination with a check to make sure that the input buffer is MAXALIGN()'d (with an Assert() rather than just an elog/pg_log_error?). -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Thu, Oct 27, 2022 at 11:24 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Sep 29, 2022 at 08:09:56PM -0700, Nathan Bossart wrote: > > Looks reasonable to me. > > 0001, to move pg_pwritev_with_retry() to a new home, seems fine, so > applied. Thanks. > Regarding 0002, using pg_pwrite_zeros() as a routine name, as > suggested by Thomas, sounds good to me. Changed. > However, I am not really a > fan of its dependency with PGAlignedXLogBlock, because it should be > able to work with any buffers of any sizes, as long as the input > buffer is aligned, shouldn't it? For example, what about > PGAlignedBlock? So, should we make this more extensible? My guess > would be the addition of the block size and the block pointer to the > arguments of pg_pwrite_zeros(), in combination with a check to make > sure that the input buffer is MAXALIGN()'d (with an Assert() rather > than just an elog/pg_log_error?). +1 to pass in the aligned buffer, its size and an assertion on the buffer size. Please see the attached v7 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, Interestingly, I also needed something like pg_pwrite_zeros() today. Exposed via smgr, for more efficient relation extensions. On 2022-10-27 14:54:00 +0900, Michael Paquier wrote: > Regarding 0002, using pg_pwrite_zeros() as a routine name, as > suggested by Thomas, sounds good to me. However, I am not really a > fan of its dependency with PGAlignedXLogBlock, because it should be > able to work with any buffers of any sizes, as long as the input > buffer is aligned, shouldn't it? For example, what about > PGAlignedBlock? So, should we make this more extensible? My guess > would be the addition of the block size and the block pointer to the > arguments of pg_pwrite_zeros(), in combination with a check to make > sure that the input buffer is MAXALIGN()'d (with an Assert() rather > than just an elog/pg_log_error?). I don't like passing in the buffer. That leads to code like in Bharat's latest version, where we now zero that buffer on every invocation of pg_pwrite_zeros() - not at all cheap. And every caller has to have provisions to provide that buffer. The block sizes don't need to match, do they? As long as the block is properly aligned, we can change the iov_len of the final iov to match whatever the size is being passed in, no? Why don't we define a static PGAlignedBlock source_of_zeroes; in file_utils.c, and use that in pg_pwrite_zeros(), being careful to set the iov_len arguments correctly? Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Thu, Oct 27, 2022 at 03:58:25PM -0700, Andres Freund wrote: > The block sizes don't need to match, do they? As long as the block is properly > aligned, we can change the iov_len of the final iov to match whatever the size > is being passed in, no? Hmm. Based on what Bharath has written upthread, it does not seem to matter if the size of the aligned block changes, either: https://www.postgresql.org/message-id/CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com I am honestly not sure whether it is a good idea to make file_utils.c depend on one of the compile-time page sizes in this routine, be it the page size of the WAL page size, as pg_write_zeros() would be used for some rather low-level operations. But we could as well just use a locally-defined structure with a buffer at 4kB or 8kB and call it a day? -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Fri, Oct 28, 2022 at 7:39 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Oct 27, 2022 at 03:58:25PM -0700, Andres Freund wrote:
> > The block sizes don't need to match, do they? As long as the block is properly
> > aligned, we can change the iov_len of the final iov to match whatever the size
> > is being passed in, no?
>
> Hmm. Based on what Bharath has written upthread, it does not seem to
> matter if the size of the aligned block changes, either:
> https://www.postgresql.org/message-id/CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com
>
> I am honestly not sure whether it is a good idea to make file_utils.c
> depend on one of the compile-time page sizes in this routine, be it
> the page size of the WAL page size, as pg_write_zeros() would be used
> for some rather low-level operations. But we could as well just use a
> locally-defined structure with a buffer at 4kB or 8kB and call it a
> day?
+1. Please see the attached v8 patch.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>
> On Thu, Oct 27, 2022 at 03:58:25PM -0700, Andres Freund wrote:
> > The block sizes don't need to match, do they? As long as the block is properly
> > aligned, we can change the iov_len of the final iov to match whatever the size
> > is being passed in, no?
>
> Hmm. Based on what Bharath has written upthread, it does not seem to
> matter if the size of the aligned block changes, either:
> https://www.postgresql.org/message-id/CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com
>
> I am honestly not sure whether it is a good idea to make file_utils.c
> depend on one of the compile-time page sizes in this routine, be it
> the page size of the WAL page size, as pg_write_zeros() would be used
> for some rather low-level operations. But we could as well just use a
> locally-defined structure with a buffer at 4kB or 8kB and call it a
> day?
+1. Please see the attached v8 patch.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Fri, Oct 28, 2022 at 11:38:51AM +0530, Bharath Rupireddy wrote: > +1. Please see the attached v8 patch. + char data[PG_WRITE_BLCKSZ]; + double force_align_d; + int64 force_align_i64; +} PGAlignedWriteBlock; I have not checked in details, but that should do the job. Andres, Thomas, Nathan, perhaps you have a different view on the matter? -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2022-10-28 11:09:38 +0900, Michael Paquier wrote: > On Thu, Oct 27, 2022 at 03:58:25PM -0700, Andres Freund wrote: > > The block sizes don't need to match, do they? As long as the block is properly > > aligned, we can change the iov_len of the final iov to match whatever the size > > is being passed in, no? > > Hmm. Based on what Bharath has written upthread, it does not seem to > matter if the size of the aligned block changes, either: > https://www.postgresql.org/message-id/CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com > > I am honestly not sure whether it is a good idea to make file_utils.c > depend on one of the compile-time page sizes in this routine, be it > the page size of the WAL page size, as pg_write_zeros() would be used > for some rather low-level operations. But we could as well just use a > locally-defined structure with a buffer at 4kB or 8kB and call it a > day? Shrug. I don't think we gain much by having yet another PGAlignedXYZBlock. Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sat, Oct 29, 2022 at 3:37 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-10-28 11:09:38 +0900, Michael Paquier wrote: > > On Thu, Oct 27, 2022 at 03:58:25PM -0700, Andres Freund wrote: > > > The block sizes don't need to match, do they? As long as the block is properly > > > aligned, we can change the iov_len of the final iov to match whatever the size > > > is being passed in, no? > > > > Hmm. Based on what Bharath has written upthread, it does not seem to > > matter if the size of the aligned block changes, either: > > https://www.postgresql.org/message-id/CALj2ACUccjR7KbKqWOsQmqH1ZGEDyJ7hH5Ef+DOhcv7+kOnjCQ@mail.gmail.com > > > > I am honestly not sure whether it is a good idea to make file_utils.c > > depend on one of the compile-time page sizes in this routine, be it > > the page size of the WAL page size, as pg_write_zeros() would be used > > for some rather low-level operations. But we could as well just use a > > locally-defined structure with a buffer at 4kB or 8kB and call it a > > day? > > Shrug. I don't think we gain much by having yet another PGAlignedXYZBlock. Hm. I tend to agree with Andres here, i.e. using PGAlignedBlock is sufficient. It seems like we are using PGAlignedBlock for heap, index, history file, fsm, visibility map file pages as well. Please see the attached v9 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Alvaro Herrera
Date:
On 2022-Oct-27, Michael Paquier wrote: > On Thu, Sep 29, 2022 at 08:09:56PM -0700, Nathan Bossart wrote: > > Looks reasonable to me. > > 0001, to move pg_pwritev_with_retry() to a new home, seems fine, so > applied. Maybe something a bit useless, but while perusing the commits I noticed a forward struct declaration was moved from one file to another; this is claimed to avoid including port/pg_iovec.h in common/file_utils.h. We do that kind of thing in a few places, but in this particular case it seems a bit of a pointless exercise, since pg_iovec.h doesn't include anything else and it's a quite simple header. So I'm kinda proposing that we only do the forward struct initialization dance when it really saves on things -- in particular, when it helps avoid or reduce massive indirect header inclusion. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Sun, Oct 30, 2022 at 03:44:32PM +0100, Alvaro Herrera wrote: > So I'm kinda proposing that we only do the forward struct initialization > dance when it really saves on things -- in particular, when it helps > avoid or reduce massive indirect header inclusion. Sure. > extern ssize_t pg_pwritev_with_retry(int fd, > - const struct iovec *iov, > + const iovec *iov, > int iovcnt, > off_t offset); However this still needs to be defined as a struct, no? -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Mon, Oct 31, 2022 at 5:01 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sun, Oct 30, 2022 at 03:44:32PM +0100, Alvaro Herrera wrote: > > So I'm kinda proposing that we only do the forward struct initialization > > dance when it really saves on things -- in particular, when it helps > > avoid or reduce massive indirect header inclusion. > > Sure. I don't think including pg_iovec.h in file_utils.h is a good idea. I agree that pg_iovec.h is fairly a small header file but file_utils.h is included in 21 .c files, as of today and the file_utils.h footprint might increase in future. Therefore, I'd vote for forward struct initialization as it is on HEAD today. > > extern ssize_t pg_pwritev_with_retry(int fd, > > - const struct iovec *iov, > > + const iovec *iov, > > int iovcnt, > > off_t offset); > > However this still needs to be defined as a struct, no? Yes, we need a struct there because we haven't typedef'ed struct iovec. Also, the patch forgets to remove "port/pg_iovec.h" from file_utils.c -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Mon, Oct 31, 2022 at 11:50 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Oct 31, 2022 at 5:01 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Sun, Oct 30, 2022 at 03:44:32PM +0100, Alvaro Herrera wrote: > > > So I'm kinda proposing that we only do the forward struct initialization > > > dance when it really saves on things -- in particular, when it helps > > > avoid or reduce massive indirect header inclusion. > > > > Sure. > > I don't think including pg_iovec.h in file_utils.h is a good idea. I > agree that pg_iovec.h is fairly a small header file but file_utils.h > is included in 21 .c files, as of today and the file_utils.h footprint > might increase in future. Therefore, I'd vote for forward struct > initialization as it is on HEAD today. I'm attaching the v9 patch from upthread here again for further review and to make CF bot happy. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Tue, Nov 01, 2022 at 08:32:48AM +0530, Bharath Rupireddy wrote: > I'm attaching the v9 patch from upthread here again for further review > and to make CF bot happy. So, I have looked at that, and at the end concluded that Andres' suggestion to use PGAlignedBlock in pg_write_zeros() will serve better in the long run. Thomas has mentioned upthread that some of the comments don't need to be that long, so I have tweaked these to be minimal, and updated a few more areas. Note that this has been split into two commits: one to introduce the new routine in file_utils.c and a second for the switch in walmethods.c. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
John Naylor
Date:
On Tue, Nov 8, 2022 at 11:08 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> So, I have looked at that, and at the end concluded that Andres'
> suggestion to use PGAlignedBlock in pg_write_zeros() will serve better
> in the long run. Thomas has mentioned upthread that some of the
> comments don't need to be that long, so I have tweaked these to be
> minimal, and updated a few more areas. Note that this has been split
> into two commits: one to introduce the new routine in file_utils.c and
> a second for the switch in walmethods.c.
Was there supposed to be an attachment here?
--
John Naylor
EDB: http://www.enterprisedb.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Fri, Nov 11, 2022 at 11:14 AM John Naylor <john.naylor@enterprisedb.com> wrote: > > On Tue, Nov 8, 2022 at 11:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > So, I have looked at that, and at the end concluded that Andres' > > suggestion to use PGAlignedBlock in pg_write_zeros() will serve better > > in the long run. Thomas has mentioned upthread that some of the > > comments don't need to be that long, so I have tweaked these to be > > minimal, and updated a few more areas. Note that this has been split > > into two commits: one to introduce the new routine in file_utils.c and > > a second for the switch in walmethods.c. > > Was there supposed to be an attachment here? Nope. The patches have already been committed - 3bdbdf5d06f2179d4c17926d77ff734ea9e7d525 and 28cc2976a9cf0ed661dbc55f49f669192cce1c89. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Fri, Nov 11, 2022 at 11:53:08AM +0530, Bharath Rupireddy wrote: > On Fri, Nov 11, 2022 at 11:14 AM John Naylor <john.naylor@enterprisedb.com> wrote: >> Was there supposed to be an attachment here? > > Nope. The patches have already been committed - > 3bdbdf5d06f2179d4c17926d77ff734ea9e7d525 and > 28cc2976a9cf0ed661dbc55f49f669192cce1c89. The committed patches are pretty much the same as the last version sent on this thread, except that the changes have been split across the files they locally impact, with a few simplifications tweaks to the comments. Hencem I did not see any need to send a new version for this case. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
John Naylor
Date:
On Fri, Nov 11, 2022 at 2:12 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Nov 11, 2022 at 11:53:08AM +0530, Bharath Rupireddy wrote:
> > On Fri, Nov 11, 2022 at 11:14 AM John Naylor <john.naylor@enterprisedb.com> wrote:
> >> Was there supposed to be an attachment here?
> >
> > Nope. The patches have already been committed -
> > 3bdbdf5d06f2179d4c17926d77ff734ea9e7d525 and
> > 28cc2976a9cf0ed661dbc55f49f669192cce1c89.
>
> The committed patches are pretty much the same as the last version
> sent on this thread, except that the changes have been split across
> the files they locally impact, with a few simplifications tweaks to
> the comments. Hencem I did not see any need to send a new version for
> this case.
Ah, I missed that -- sorry for the noise.
--
John Naylor
EDB: http://www.enterprisedb.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2022-11-01 08:32:48 +0530, Bharath Rupireddy wrote: > +/* > + * pg_pwrite_zeros > + * > + * Writes zeros to a given file. Input parameters are "fd" (file descriptor of > + * the file), "size" (size of the file in bytes). > + * > + * On failure, a negative value is returned and errno is set appropriately so > + * that the caller can use it accordingly. > + */ > +ssize_t > +pg_pwrite_zeros(int fd, size_t size) > +{ > + PGAlignedBlock zbuffer; > + size_t zbuffer_sz; > + struct iovec iov[PG_IOV_MAX]; > + int blocks; > + size_t remaining_size = 0; > + int i; > + ssize_t written; > + ssize_t total_written = 0; > + > + zbuffer_sz = sizeof(zbuffer.data); > + > + /* Zero-fill the buffer. */ > + memset(zbuffer.data, 0, zbuffer_sz); I previously commented on this - why are we memseting a buffer on every call to this? That's not at all free. Something like static const PGAlignedBlock zerobuf = {0}; would do the trick. You do need to cast the const away, to assign to iov_base, but that's not too ugly. Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sun, Feb 12, 2023 at 4:14 AM Andres Freund <andres@anarazel.de> wrote: > > > +ssize_t > > +pg_pwrite_zeros(int fd, size_t size) > > +{ > > + PGAlignedBlock zbuffer; > > + size_t zbuffer_sz; > > + struct iovec iov[PG_IOV_MAX]; > > + int blocks; > > + size_t remaining_size = 0; > > + int i; > > + ssize_t written; > > + ssize_t total_written = 0; > > + > > + zbuffer_sz = sizeof(zbuffer.data); > > + > > + /* Zero-fill the buffer. */ > > + memset(zbuffer.data, 0, zbuffer_sz); > > I previously commented on this - why are we memseting a buffer on every call > to this? That's not at all free. > > Something like > static const PGAlignedBlock zerobuf = {0}; > would do the trick. You do need to cast the const away, to assign to > iov_base, but that's not too ugly. Thanks for looking at it. We know that we don't change the zbuffer in the function, so can we avoid static const and have just a static variable, like the attached v1-0001-Use-static-variable-to-avoid-memset-calls-in-pg_p.patch? Do you see any problem with it? FWIW, it comes out like the attached v1-0001-Use-static-const-variable-to-avoid-memset-calls-i.patch with static const. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-02-12 19:59:00 +0530, Bharath Rupireddy wrote: > Thanks for looking at it. We know that we don't change the zbuffer in > the function, so can we avoid static const and have just a static > variable, like the attached > v1-0001-Use-static-variable-to-avoid-memset-calls-in-pg_p.patch? Do > you see any problem with it? Making it static const is better, because it allows the memory for the variable to be put in a readonly section. > /* Prepare to write out a lot of copies of our zero buffer at once. */ > for (i = 0; i < lengthof(iov); ++i) > { > - iov[i].iov_base = zbuffer.data; > + iov[i].iov_base = (void *) (unconstify(PGAlignedBlock *, &zbuffer)->data); > iov[i].iov_len = zbuffer_sz; > } Another thing: I think we should either avoid iterating over all the IOVs if we don't need them, or, even better, initialize the array as a constant, once. Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Sun, Feb 12, 2023 at 09:31:36AM -0800, Andres Freund wrote: > Another thing: I think we should either avoid iterating over all the IOVs if > we don't need them, or, even better, initialize the array as a constant, once. Where you imply that we'd still use memset() once on iov[PG_IOV_MAX], right? -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Sun, Feb 12, 2023 at 11:01 PM Andres Freund <andres@anarazel.de> wrote: > > On 2023-02-12 19:59:00 +0530, Bharath Rupireddy wrote: > > Thanks for looking at it. We know that we don't change the zbuffer in > > the function, so can we avoid static const and have just a static > > variable, like the attached > > v1-0001-Use-static-variable-to-avoid-memset-calls-in-pg_p.patch? Do > > you see any problem with it? > > Making it static const is better, because it allows the memory for the > variable to be put in a readonly section. Okay. > > /* Prepare to write out a lot of copies of our zero buffer at once. */ > > for (i = 0; i < lengthof(iov); ++i) > > { > > - iov[i].iov_base = zbuffer.data; > > + iov[i].iov_base = (void *) (unconstify(PGAlignedBlock *, &zbuffer)->data); > > iov[i].iov_len = zbuffer_sz; > > } > > Another thing: I think we should either avoid iterating over all the IOVs if > we don't need them, or, even better, initialize the array as a constant, once. How about like the attached patch? It makes the iovec static variable and points the zero buffer only once/for the first time to iovec. This avoids for-loop on every call. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Kyotaro Horiguchi
Date:
At Mon, 13 Feb 2023 10:15:03 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Sun, Feb 12, 2023 at 11:01 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2023-02-12 19:59:00 +0530, Bharath Rupireddy wrote: > > > /* Prepare to write out a lot of copies of our zero buffer at once. */ > > > for (i = 0; i < lengthof(iov); ++i) > > > { > > > - iov[i].iov_base = zbuffer.data; > > > + iov[i].iov_base = (void *) (unconstify(PGAlignedBlock *, &zbuffer)->data); > > > iov[i].iov_len = zbuffer_sz; > > > } > > > > Another thing: I think we should either avoid iterating over all the IOVs if > > we don't need them, or, even better, initialize the array as a constant, once. FWIW, I tried to use the "{[start .. end] = {}}" trick (GNU extension? [*1]) for constant array initialization, but individual members don't accept assigning a const value, thus I did deconstify as the follows. > static const struct iovec iov[PG_IOV_MAX] = > {[0 ... PG_IOV_MAX - 1] = > { > .iov_base = (void *)&zbuffer.data, > .iov_len = BLCKSZ > } > }; I didn't checked the actual mapping, but if I tried an assignment "iov[0].iov_base = NULL", it failed as "assignment of member ‘iov_base’ in read-only object", so is it successfully placed in a read-only segment? Later code assigns iov[0].iov_len thus we need to provide a separate iov non-const variable, or can we use pwrite instead there? (I didn't find pg_pwrite_with_retry(), though) > How about like the attached patch? It makes the iovec static variable > and points the zero buffer only once/for the first time to iovec. This > avoids for-loop on every call. As the patch itself, it seems forgetting to reset iov[0].iov_len after writing a partial block. retards. *1: https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
On 2023-02-13 18:33:34 +0900, Kyotaro Horiguchi wrote: > At Mon, 13 Feb 2023 10:15:03 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > On Sun, Feb 12, 2023 at 11:01 PM Andres Freund <andres@anarazel.de> wrote: > > > > > > On 2023-02-12 19:59:00 +0530, Bharath Rupireddy wrote: > > > > /* Prepare to write out a lot of copies of our zero buffer at once. */ > > > > for (i = 0; i < lengthof(iov); ++i) > > > > { > > > > - iov[i].iov_base = zbuffer.data; > > > > + iov[i].iov_base = (void *) (unconstify(PGAlignedBlock *, &zbuffer)->data); > > > > iov[i].iov_len = zbuffer_sz; > > > > } > > > > > > Another thing: I think we should either avoid iterating over all the IOVs if > > > we don't need them, or, even better, initialize the array as a constant, once. > > FWIW, I tried to use the "{[start .. end] = {}}" trick (GNU extension? > [*1]) for constant array initialization, but individual members don't > accept assigning a const value, thus I did deconstify as the follows. > > > static const struct iovec iov[PG_IOV_MAX] = > > {[0 ... PG_IOV_MAX - 1] = > > { > > .iov_base = (void *)&zbuffer.data, > > .iov_len = BLCKSZ > > } > > }; > > I didn't checked the actual mapping, but if I tried an assignment > "iov[0].iov_base = NULL", it failed as "assignment of member > ‘iov_base’ in read-only object", so is it successfully placed in a > read-only segment? > > Later code assigns iov[0].iov_len thus we need to provide a separate > iov non-const variable, or can we use pwrite instead there? (I didn't > find pg_pwrite_with_retry(), though) Given that we need to do that, and given that we already need to loop to handle writes that are longer than PG_IOV_MAX * BLCKSZ, it's probably not worth avoiding iov initialization. But I think it's worth limiting the initialization to blocks. I'd also try to combine the first pg_writev_* with the second one. Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-02-12 09:31:36 -0800, Andres Freund wrote: > Another thing: I think we should either avoid iterating over all the IOVs if > we don't need them, or, even better, initialize the array as a constant, once. I just tried to use pg_pwrite_zeros - and couldn't because it doesn't have an offset parameter. Huh, what lead to the function being so constrained? - Andres
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Mon, Feb 13, 2023 at 05:10:56PM -0800, Andres Freund wrote: > I just tried to use pg_pwrite_zeros - and couldn't because it doesn't have an > offset parameter. Huh, what lead to the function being so constrained? Its current set of uses cases, where we only use it now to initialize with zeros with WAL segments. If you have a case that plans to use that stuff with an offset, no problem with me. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Mon, Feb 13, 2023 at 11:09 PM Andres Freund <andres@anarazel.de> wrote: > > > Later code assigns iov[0].iov_len thus we need to provide a separate > > iov non-const variable, or can we use pwrite instead there? (I didn't > > find pg_pwrite_with_retry(), though) > > Given that we need to do that, and given that we already need to loop to > handle writes that are longer than PG_IOV_MAX * BLCKSZ, it's probably not > worth avoiding iov initialization. > > But I think it's worth limiting the initialization to blocks. We can still optimize away the for loop by using a single iovec for remaining size, like the attached v2 patch. > I'd also try to combine the first pg_writev_* with the second one. Done, PSA v2 patch. On Tue, Feb 14, 2023 at 6:40 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-02-12 09:31:36 -0800, Andres Freund wrote: > > Another thing: I think we should either avoid iterating over all the IOVs if > > we don't need them, or, even better, initialize the array as a constant, once. > > I just tried to use pg_pwrite_zeros - and couldn't because it doesn't have an > offset parameter. Huh, what lead to the function being so constrained? Done, PSA v2 patch. We could do few more things, but honestly I feel they're unnecessary: 1) An assert-only code that checks if the asked file contents are zeroed at the end of pg_pwrite_zeros (to be more defensive, but reading 16MB files and checking if it's zero-filled will surely slowdown the Assert builds). 2) A small test module passing in a file with the size to write isn't multiple of block size, meaning, the code we have in the function to write last remaining bytes (less than BLCKSZ) gets covered which isn't covered right now - https://coverage.postgresql.org/src/common/file_utils.c.gcov.html. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-02-14 16:06:24 +0900, Michael Paquier wrote: > On Mon, Feb 13, 2023 at 05:10:56PM -0800, Andres Freund wrote: > > I just tried to use pg_pwrite_zeros - and couldn't because it doesn't have an > > offset parameter. Huh, what lead to the function being so constrained? > > Its current set of uses cases, where we only use it now to initialize > with zeros with WAL segments. If you have a case that plans to use > that stuff with an offset, no problem with me. Then it really shouldn't have been named pg_pwrite_zeros(). The point of the p{write,read}{,v} family of functions is to be able to specify the offset to read/write at. I assume the p is for position, but I'm not sure. Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-02-14 18:00:00 +0530, Bharath Rupireddy wrote: > On Mon, Feb 13, 2023 at 11:09 PM Andres Freund <andres@anarazel.de> wrote: > > > > > Later code assigns iov[0].iov_len thus we need to provide a separate > > > iov non-const variable, or can we use pwrite instead there? (I didn't > > > find pg_pwrite_with_retry(), though) > > > > Given that we need to do that, and given that we already need to loop to > > handle writes that are longer than PG_IOV_MAX * BLCKSZ, it's probably not > > worth avoiding iov initialization. > > > > But I think it's worth limiting the initialization to blocks. > > We can still optimize away the for loop by using a single iovec for > remaining size, like the attached v2 patch. > > > I'd also try to combine the first pg_writev_* with the second one. > > Done, PSA v2 patch. This feels way too complicated to me. How about something more like the attached? > 2) A small test module passing in a file with the size to write isn't > multiple of block size, meaning, the code we have in the function to > write last remaining bytes (less than BLCKSZ) gets covered which isn't > covered right now - FWIW, I tested this locally by just specifying a smaller size than BLCKSZ for the write size. Greetings, Andres Freund
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Tue, Feb 14, 2023 at 04:46:07PM -0800, Andres Freund wrote: > Then it really shouldn't have been named pg_pwrite_zeros(). The point of the > p{write,read}{,v} family of functions is to be able to specify the offset to > read/write at. I assume the p is for position, but I'm not sure. 'p' could stand for POSIX, though both read() and pread() are in it. Anyway, it looks that your guess may be right: https://stackoverflow.com/questions/17877556/what-does-p-stand-for-in-function-names-pwrite-and-pread Even there, people don't seem completely sure. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Kyotaro Horiguchi
Date:
At Tue, 14 Feb 2023 16:55:25 -0800, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2023-02-14 18:00:00 +0530, Bharath Rupireddy wrote: > > Done, PSA v2 patch. > > This feels way too complicated to me. How about something more like the > attached? I like this one, but the parameters offset and size are in a different order from pwrite(fd, buf, count, offset). I perfer the arrangement suggested by Bharath. And isn't it better to use Min(remaining_size, BLCKSZ) instead of a bare if statement? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi On 2023-02-15 10:28:37 +0900, Kyotaro Horiguchi wrote: > At Tue, 14 Feb 2023 16:55:25 -0800, Andres Freund <andres@anarazel.de> wrote in > > Hi, > > > > On 2023-02-14 18:00:00 +0530, Bharath Rupireddy wrote: > > > Done, PSA v2 patch. > > > > This feels way too complicated to me. How about something more like the > > attached? > > I like this one, but the parameters offset and size are in a different > order from pwrite(fd, buf, count, offset). I perfer the arrangement > suggested by Bharath. Yes, it probably is better. Not sure why I went with that order. > And isn't it better to use Min(remaining_size, BLCKSZ) instead of a bare if > statement? I really can't make myself care about which version is better :) Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Wed, Feb 15, 2023 at 6:25 AM Andres Freund <andres@anarazel.de> wrote: > > > Done, PSA v2 patch. > > This feels way too complicated to me. How about something more like the > attached? Thanks. I kind of did cost analysis of v2 and v3: Input: zero-fill a file of size 256*8K. v2 patch: iovec initialization with zerobuf for loop - 1 time zero-fill 32 blocks at once - 8 times stack memory - sizeof(PGAlignedBlock) + sizeof(struct iovec) * PG_IOV_MAX v3 patch: iovec initialization with zerobuf for loop - 8 times (7 times more than v2 patch) zero-fill 32 blocks at once - 8 times (no change from v2 patch) stack memory - sizeof(PGAlignedBlock) + sizeof(struct iovec) * PG_IOV_MAX (no change from v2 patch) The v3 patch reduces initialization of iovec array elements which is a clear win when pg_pwrite_zeros is called for sizes less than BLCKSZ many times (I assume this is what is needed for the relation extension lock improvements feature). However, it increases the number of iovec initialization with zerobuf for the cases when pg_pwrite_zeros is called for sizes far greater than BLCKSZ (for instance, WAL file initialization). FWIW, I attached v4 patch, a simplified version of the v2 - it initializes all the iovec array elements if the total blocks to be written crosses lengthof(iovec array), otherwise it initializes only the needed blocks. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Wed, Feb 15, 2023 at 01:00:00PM +0530, Bharath Rupireddy wrote: > The v3 patch reduces initialization of iovec array elements which is a > clear win when pg_pwrite_zeros is called for sizes less than BLCKSZ > many times (I assume this is what is needed for the relation extension > lock improvements feature). However, it increases the number of iovec > initialization with zerobuf for the cases when pg_pwrite_zeros is > called for sizes far greater than BLCKSZ (for instance, WAL file > initialization). It seems to me that v3 would do extra initializations only if pg_pwritev_with_retry() does *not* retry its writes, but that's not the case as it retries on a partial write as per its name. The number of iov buffers is stricly capped by remaining_size. FWIW, I find v3 proposed more elegant. > FWIW, I attached v4 patch, a simplified version of the v2 - it > initializes all the iovec array elements if the total blocks to be > written crosses lengthof(iovec array), otherwise it initializes only > the needed blocks. + static size_t zbuf_sz = BLCKSZ; In v4, what's the advantage of marking that as static? It could actually be dangerous if this is carelessly updated. Well, that's not the case, still.. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-02-16 16:58:23 +0900, Michael Paquier wrote: > On Wed, Feb 15, 2023 at 01:00:00PM +0530, Bharath Rupireddy wrote: > > The v3 patch reduces initialization of iovec array elements which is a > > clear win when pg_pwrite_zeros is called for sizes less than BLCKSZ > > many times (I assume this is what is needed for the relation extension > > lock improvements feature). However, it increases the number of iovec > > initialization with zerobuf for the cases when pg_pwrite_zeros is > > called for sizes far greater than BLCKSZ (for instance, WAL file > > initialization). In those cases the cost of initializing the IOV doesn't matter, relative to the other costs. The important point is to not initialize a lot of elements if they're not even needed. Because we need to overwrite the trailing iov element, it doesn't seem worth to try to "pre-initialize" iov. Referencing a static variable is more expensive than accessing an on-stack variable. Having a first-call check is more expensive than not having it. Thus making the iov and zbuf_sz static isn't helpful. Particularly the latter seems like a bad idea, because it's a compiler constant. > It seems to me that v3 would do extra initializations only if > pg_pwritev_with_retry() does *not* retry its writes, but that's not > the case as it retries on a partial write as per its name. The number > of iov buffers is stricly capped by remaining_size. I don't really understand this bit? Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Thu, Feb 16, 2023 at 11:00:20AM -0800, Andres Freund wrote: > I don't really understand this bit? As of this message, I saw this quote: https://www.postgresql.org/message-id/fCALj2ACXEBwY_bM3kmZEkYpcXsM+yGitpYHi4FdT6MSk6YRtKTQ@mail.gmail.com "However, it increases the number of iovec initialization with zerobuf for the cases when pg_pwrite_zeros is called for sizes far greater than BLCKSZ (for instance, WAL file initialization)." But it looks like I misunderstood what this quote meant compared to what v3 does. It is true that v3 sets iov_len and iov_base more than needed when writing sizes larger than BLCKSZ. Seems like you think that it is not really going to matter much to track which iovecs have been already initialized during the first loop on pg_pwritev_with_retry() to keep the code shorter? -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-02-17 16:19:46 +0900, Michael Paquier wrote: > But it looks like I misunderstood what this quote meant compared to > what v3 does. It is true that v3 sets iov_len and iov_base more than > needed when writing sizes larger than BLCKSZ. I don't think it does for writes larger than BLCKSZ, it just does more for writes larger than PG_IKOV_MAX * BLCKSZ. But in those cases CPU time is going to be spent elsewhere. > Seems like you think that it is not really going to matter much to track > which iovecs have been already initialized during the first loop on > pg_pwritev_with_retry() to keep the code shorter? Yes. I'd bet that, in the unlikely case you're going to see any difference at all, unconditionally initializing is going to win. Right now we memset() 8KB, and iterate over 32 IOVs, unconditionally, on every call. Even if we could do some further optimizations of what I did in the patch, you can initialize needed IOVs repeatedly a *lot* of times, before it shows up... I'm inclined to go with my version, with the argument order swapped to Bharath's order. Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Fri, Feb 17, 2023 at 09:31:14AM -0800, Andres Freund wrote: > On 2023-02-17 16:19:46 +0900, Michael Paquier wrote: >> But it looks like I misunderstood what this quote meant compared to >> what v3 does. It is true that v3 sets iov_len and iov_base more than >> needed when writing sizes larger than BLCKSZ. > > I don't think it does for writes larger than BLCKSZ, it just does more for > writes larger than PG_IKOV_MAX * BLCKSZ. But in those cases CPU time is going > to be spent elsewhere. Yep. >> Seems like you think that it is not really going to matter much to track >> which iovecs have been already initialized during the first loop on >> pg_pwritev_with_retry() to keep the code shorter? > > Yes. I'd bet that, in the unlikely case you're going to see any difference at > all, unconditionally initializing is going to win. > > Right now we memset() 8KB, and iterate over 32 IOVs, unconditionally, on every > call. Even if we could do some further optimizations of what I did in the > patch, you can initialize needed IOVs repeatedly a *lot* of times, before it > shows up... > > I'm inclined to go with my version, with the argument order swapped to > Bharath's order. Okay. That's fine by me. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Bharath Rupireddy
Date:
On Mon, Feb 20, 2023 at 11:03 AM Michael Paquier <michael@paquier.xyz> wrote: > > > I'm inclined to go with my version, with the argument order swapped to > > Bharath's order. > > Okay. That's fine by me. I ran some tests on my dev system [1] and I don't see much difference between v3 and v4. So, +1 for v3 patch (+ argument order swap) from Andres to keep the code simple and elegant. [1] HEAD: 16MB (12.231 ms), 8190 Bytes (0.199 ms), 8192 Bytes (0.176 ms), 1GB (603.668 ms), 10GB (21184.936 ms (00:21.185)) v3 patch: 16MB (12.632 ms), 8190 Bytes (0.183 ms), 8192 Bytes (0.166 ms), 1GB (610.428 ms), 10GB (22647.308 ms (00:22.647)) v4 patch: 16MB (12.044 ms), 8190 Bytes (0.167 ms), 8192 Bytes (0.139 ms), 1GB (603.848 ms), 10GB (21225.331 ms (00:21.225)) -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Mon, Feb 20, 2023 at 01:54:00PM +0530, Bharath Rupireddy wrote: > I ran some tests on my dev system [1] and I don't see much difference > between v3 and v4. So, +1 for v3 patch (+ argument order swap) from > Andres to keep the code simple and elegant. This thread has stalled for a couple of weeks, so I have gone back to it. Testing on a tmpfs I am not seeing a difference if performance for any of the approaches discussed. At the end, as I am the one behind the introduction of pg_pwrite_zeros(), I have applied v3 after switches the size and offset parameters to be the same way as in v4. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-03-06 13:29:50 +0900, Michael Paquier wrote: > On Mon, Feb 20, 2023 at 01:54:00PM +0530, Bharath Rupireddy wrote: > > I ran some tests on my dev system [1] and I don't see much difference > > between v3 and v4. So, +1 for v3 patch (+ argument order swap) from > > Andres to keep the code simple and elegant. > > This thread has stalled for a couple of weeks, so I have gone back to > it. Testing on a tmpfs I am not seeing a difference if performance > for any of the approaches discussed. At the end, as I am the one > behind the introduction of pg_pwrite_zeros(), I have applied v3 after > switches the size and offset parameters to be the same way as in v4. Thanks. Greetings, Andres Freund
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Mon, Mar 6, 2023 at 5:30 PM Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Feb 20, 2023 at 01:54:00PM +0530, Bharath Rupireddy wrote: > > I ran some tests on my dev system [1] and I don't see much difference > > between v3 and v4. So, +1 for v3 patch (+ argument order swap) from > > Andres to keep the code simple and elegant. > > This thread has stalled for a couple of weeks, so I have gone back to > it. Testing on a tmpfs I am not seeing a difference if performance > for any of the approaches discussed. At the end, as I am the one > behind the introduction of pg_pwrite_zeros(), I have applied v3 after > switches the size and offset parameters to be the same way as in v4. Apparently ye olde GCC 4.7 on "lapwing" doesn't like the way you initialised that struct. I guess it wants {{0}} instead of {0}. Apparently old GCC was wrong about that warning[1], but that system doesn't have the back-patched fixes? Not sure. [1] https://stackoverflow.com/questions/63355760/how-standard-is-the-0-initializer-in-c89
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Tue, Mar 7, 2023 at 3:42 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Apparently ye olde GCC 4.7 on "lapwing" doesn't like the way you > initialised that struct. I guess it wants {{0}} instead of {0}. > Apparently old GCC was wrong about that warning[1], but that system > doesn't have the back-patched fixes? Not sure. Oh, you already pushed a fix. But now I'm wondering if it's useful to have old buggy compilers set to run with -Werror.
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Tue, Mar 07, 2023 at 03:44:46PM +1300, Thomas Munro wrote: > On Tue, Mar 7, 2023 at 3:42 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> Apparently ye olde GCC 4.7 on "lapwing" doesn't like the way you >> initialised that struct. I guess it wants {{0}} instead of {0}. >> Apparently old GCC was wrong about that warning[1], but that system >> doesn't have the back-patched fixes? Not sure. 6392f2a was one such case. > Oh, you already pushed a fix. But now I'm wondering if it's useful to > have old buggy compilers set to run with -Werror. Yes, as far as I can see when investigating the issue, this is an old bug of gcc when detecting where the initialization needs to be applied. And at the same time the fix is deadly simple, so the current statu-quo does not sound that bad to me. Note that lapwing is one of the only animals testing 32b builds, and it has saved from quite few bugs over the years. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Michael Paquier
Date:
On Mon, Mar 06, 2023 at 02:29:50PM -0800, Andres Freund wrote: > Thanks. Sure, no problem. If there is anything else needed for this thread, feel free to ping me here. -- Michael
Attachment
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Tue, Mar 7, 2023 at 5:32 PM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Mar 07, 2023 at 03:44:46PM +1300, Thomas Munro wrote: > > Oh, you already pushed a fix. But now I'm wondering if it's useful to > > have old buggy compilers set to run with -Werror. > > Yes, as far as I can see when investigating the issue, this is an old > bug of gcc when detecting where the initialization needs to be > applied. And at the same time the fix is deadly simple, so the > current statu-quo does not sound that bad to me. Note that lapwing is > one of the only animals testing 32b builds, and it has saved from > quite few bugs over the years. Yeah, but I'm just wondering, why not run a current release on it[1]? Debian is one of the few distributions still supporting 32 bit kernels, and it's good to test rare things, but AFAIK the primary reason we finish up with EOL'd OSes in the 'farm is because they have been forgotten (the secondary reason is because they couldn't be upgraded because the OS dropped the [micro]architecture). Unlike vintage SPARC, actual users might plausibly be running a current release on a 32 bit Intel system, I guess (maybe on a Quark microcontroller?)? BTW CI also tests 32 bit with -m32 on Debian, but with a 64 bit kernel, which probably doesn't change much at the level we care about, so maybe this doesn't matter much... just sharing an observation that we're wasting time thinking about an OS release that gave up the ghost in 2016, because it is running with -Werror. *shrug* [1] https://wiki.debian.org/DebianReleases
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Julien Rouhaud
Date:
On Tue, Mar 07, 2023 at 07:14:51PM +1300, Thomas Munro wrote: > On Tue, Mar 7, 2023 at 5:32 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Mar 07, 2023 at 03:44:46PM +1300, Thomas Munro wrote: > > > Oh, you already pushed a fix. But now I'm wondering if it's useful to > > > have old buggy compilers set to run with -Werror. > > > > Yes, as far as I can see when investigating the issue, this is an old > > bug of gcc when detecting where the initialization needs to be > > applied. And at the same time the fix is deadly simple, so the > > current statu-quo does not sound that bad to me. Note that lapwing is > > one of the only animals testing 32b builds, and it has saved from > > quite few bugs over the years. > > Yeah, but I'm just wondering, why not run a current release on it[1]? > Debian is one of the few distributions still supporting 32 bit > kernels, and it's good to test rare things, but AFAIK the primary > reason we finish up with EOL'd OSes in the 'farm is because they have > been forgotten (the secondary reason is because they couldn't be > upgraded because the OS dropped the [micro]architecture). I registered lapwing as a 32b Debian 7 so I thought it would be expected to keep it as-is rather than upgrading to all newer major Debian versions, especially since there were newer debian animal registered (no 32b though AFAICS). I'm not opposed to upgrading it but I think there's still value in having somewhat old packages versions being tested, especially since there isn't much 32b coverage of those. I would be happy to register a newer 32b version, or even sid, if needed but the -m32 part on the CI makes me think there isn't much value doing that now. Now about the -Werror: > BTW CI also tests 32 bit with -m32 on Debian, but with a 64 bit > kernel, which probably doesn't change much at the level we care about, > so maybe this doesn't matter much... just sharing an observation that > we're wasting time thinking about an OS release that gave up the ghost > in 2016, because it is running with -Werror. *shrug* I think this is the first time that a problem raised by -Werror on that old animal is actually a false positive, while there were many times it reported useful stuff. Now this has been up for years before we got better CI tooling, especially with -m32 support, so there might not be any value to have it anymore. As I mentioned at [1] I don't mind removing it or just work on upgrading any dependency (or removing known buggy compiler flags) to keep it without being annoying. In any case I'm usually quite fast at reacting to any problem/complaint on that animal, so you don't have to worry about the buildfarm being red too long if it came to that. [1] https://www.postgresql.org/message-id/20220921155025.wdixzbrt2uzbi6vz%40jrouhaud
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Thomas Munro
Date:
On Tue, Mar 7, 2023 at 7:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > I registered lapwing as a 32b Debian 7 so I thought it would be expected to > keep it as-is rather than upgrading to all newer major Debian versions, > especially since there were newer debian animal registered (no 32b though > AFAICS). Animals do get upgraded: see the "w. e. f." ("with effect from") line in https://buildfarm.postgresql.org/cgi-bin/show_members.pl which comes from people running something like ./update_personality.pl --os-version "11" so that it shows up on the website. > I'm not opposed to upgrading it but I think there's still value in > having somewhat old packages versions being tested, especially since there > isn't much 32b coverage of those. I would be happy to register a newer 32b > version, or even sid, if needed but the -m32 part on the CI makes me think > there isn't much value doing that now. Totally up to you as an animal zoo keeper but in my humble opinion the interesting range of Debian releases currently is 11-13, or maybe 10 if you really want to test the LTS/old-stable release (and CI is testing 11). > I think this is the first time that a problem raised by -Werror on that old > animal is actually a false positive, while there were many times it reported > useful stuff. Now this has been up for years before we got better CI tooling, > especially with -m32 support, so there might not be any value to have it > anymore. As I mentioned at [1] I don't mind removing it or just work on > upgrading any dependency (or removing known buggy compiler flags) to keep it > without being annoying. In any case I'm usually quite fast at reacting to any > problem/complaint on that animal, so you don't have to worry about the > buildfarm being red too long if it came to that. Yeah, it's given us lots of useful data, thanks. Personally I would upgrade it so it keeps telling us useful things but I feel like I've said enough about that so I'll shut up now :-) Re: being red too long... yeah that reminds me, I really need to fix seawasp ASAP...
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Alvaro Herrera
Date:
On 2023-Mar-07, Julien Rouhaud wrote: > I registered lapwing as a 32b Debian 7 so I thought it would be expected to > keep it as-is rather than upgrading to all newer major Debian versions, > especially since there were newer debian animal registered (no 32b though > AFAICS). I'm not opposed to upgrading it but I think there's still value in > having somewhat old packages versions being tested, especially since there > isn't much 32b coverage of those. I would be happy to register a newer 32b > version, or even sid, if needed but the -m32 part on the CI makes me think > there isn't much value doing that now. I think a pure 32bit animal running contemporary Debian would be better than just ditching the animal completely, as would appear to be the alternative, precisely because we have no other 32bit machine running x86 Linux. Maybe you can have *two* animals on the same machine: one running the old Debian without -Werror, and the one with new Debian and that flag kept. I think CI is not a replacement for the buildfarm. It helps catche some problems earlier, but we shouldn't think that we no longer need some buildfarm animals because CI runs those configs. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
From
Andres Freund
Date:
Hi, On 2023-03-07 15:44:46 +1300, Thomas Munro wrote: > On Tue, Mar 7, 2023 at 3:42 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Apparently ye olde GCC 4.7 on "lapwing" doesn't like the way you > > initialised that struct. I guess it wants {{0}} instead of {0}. > > Apparently old GCC was wrong about that warning[1], but that system > > doesn't have the back-patched fixes? Not sure. > > Oh, you already pushed a fix. But now I'm wondering if it's useful to > have old buggy compilers set to run with -Werror. I think it's actively harmful to do so. Avoiding warnings on a > 10 year old compiler a) is a waste of time b) unnecessarily requires making our code uglier. Greetings, Andres Freund