Thread: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
Hi, Currently the server is erroring out when unable to remove/parse a logical rewrite file in CheckPointLogicalRewriteHeap wasting the amount of work the checkpoint has done and preventing the checkpoint from finishing. This is unlike CheckPointSnapBuild does for snapshot files i.e. it just emits a message at LOG level and continues if it is unable to parse or remove the file. Attaching a small patch applying the same idea to the mapping files. Thoughts? Regards, Bharath Rupireddy.
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
"Bossart, Nathan"
Date:
On 12/31/21, 4:44 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > Currently the server is erroring out when unable to remove/parse a > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > amount of work the checkpoint has done and preventing the checkpoint > from finishing. This is unlike CheckPointSnapBuild does for snapshot > files i.e. it just emits a message at LOG level and continues if it is > unable to parse or remove the file. Attaching a small patch applying > the same idea to the mapping files. This seems reasonable to me. AFAICT moving on to other files after an error shouldn't cause any problems. In fact, it's probably beneficial to try to clean up as much as possible so that the files do not continue to build up. The only feedback I have for the patch is that I don't think the new comments are necessary. Nathan
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Thu, Jan 13, 2022 at 3:47 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > On 12/31/21, 4:44 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > > Currently the server is erroring out when unable to remove/parse a > > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > > amount of work the checkpoint has done and preventing the checkpoint > > from finishing. This is unlike CheckPointSnapBuild does for snapshot > > files i.e. it just emits a message at LOG level and continues if it is > > unable to parse or remove the file. Attaching a small patch applying > > the same idea to the mapping files. > > This seems reasonable to me. AFAICT moving on to other files after an > error shouldn't cause any problems. In fact, it's probably beneficial > to try to clean up as much as possible so that the files do not > continue to build up. Thanks for the review Nathan! > The only feedback I have for the patch is that I don't think the new > comments are necessary. I borrowed the comments as-is from the CheckPointSnapBuild introduced by the commit b89e15105. IMO, let the comments be there as they explain why we are not emitting ERRORs, however I will leave it to the committer to decide on that. Regards, Bharath Rupireddy.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
"Bossart, Nathan"
Date:
On 1/12/22, 10:03 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote: > On Thu, Jan 13, 2022 at 3:47 AM Bossart, Nathan <bossartn@amazon.com> wrote: >> The only feedback I have for the patch is that I don't think the new >> comments are necessary. > > I borrowed the comments as-is from the CheckPointSnapBuild introduced > by the commit b89e15105. IMO, let the comments be there as they > explain why we are not emitting ERRORs, however I will leave it to the > committer to decide on that. Huh, somehow I missed that when I looked at it yesterday. I'm going to bump this one to ready-for-committer, then. Nathan
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2021-12-31 18:12:37 +0530, Bharath Rupireddy wrote: > Currently the server is erroring out when unable to remove/parse a > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > amount of work the checkpoint has done and preventing the checkpoint > from finishing. This seems like it'd make failures to remove the files practically invisible. Which'd have it's own set of problems? What motivated proposing this change? Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Fri, Jan 14, 2022 at 1:08 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-12-31 18:12:37 +0530, Bharath Rupireddy wrote: > > Currently the server is erroring out when unable to remove/parse a > > logical rewrite file in CheckPointLogicalRewriteHeap wasting the > > amount of work the checkpoint has done and preventing the checkpoint > > from finishing. > > This seems like it'd make failures to remove the files practically > invisible. Which'd have it's own set of problems? > > What motivated proposing this change? We had an issue where there were many mapping files generated during the crash recovery and end-of-recovery checkpoint was taking a lot of time. We had to manually intervene and delete some of the mapping files (although it may not sound sensible) to make end-of-recovery checkpoint faster. Because of the race condition between manual deletion and checkpoint deletion, the unlink error occurred which crashed the server and the server entered the recovery again wasting the entire earlier recovery work. In summary, with the changes (emitting LOG-only messages for unlink failures and continuing with the other files) proposed for CheckPointLogicalRewriteHeap in this thread and the existing code in CheckPointSnapBuild, I'm sure it will help not waste the recovery that's has been done in case unlink fails for any reasons. Regards, Bharath Rupireddy.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Julien Rouhaud
Date:
Hi, On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote: > > We had an issue where there were many mapping files generated during > the crash recovery and end-of-recovery checkpoint was taking a lot of > time. We had to manually intervene and delete some of the mapping > files (although it may not sound sensible) to make end-of-recovery > checkpoint faster. Because of the race condition between manual > deletion and checkpoint deletion, the unlink error occurred which > crashed the server and the server entered the recovery again wasting > the entire earlier recovery work. Maybe I'm missing something but wouldn't https://commitfest.postgresql.org/36/3448/ better solve the problem?
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Sat, Jan 15, 2022 at 02:04:12PM +0530, Bharath Rupireddy wrote: > > > > We had an issue where there were many mapping files generated during > > the crash recovery and end-of-recovery checkpoint was taking a lot of > > time. We had to manually intervene and delete some of the mapping > > files (although it may not sound sensible) to make end-of-recovery > > checkpoint faster. Because of the race condition between manual > > deletion and checkpoint deletion, the unlink error occurred which > > crashed the server and the server entered the recovery again wasting > > the entire earlier recovery work. > > Maybe I'm missing something but wouldn't > https://commitfest.postgresql.org/36/3448/ better solve the problem? The error can cause the new background process proposed there in that thread to restart, which is again costly. Since we have LOG-only and continue behavior in CheckPointSnapBuild already, having the same behavior for CheckPointLogicalRewriteHeap helps a lot. Regards, Bharath Rupireddy.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote: >> Maybe I'm missing something but wouldn't >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > The error can cause the new background process proposed there in that > thread to restart, which is again costly. Since we have LOG-only and > continue behavior in CheckPointSnapBuild already, having the same > behavior for CheckPointLogicalRewriteHeap helps a lot. [ stares at CheckPointLogicalRewriteHeap for awhile ... ] This code has got more problems than that. It took me awhile to absorb it, but we don't actually care about the contents of any of those files; all of the information is encoded in the file *names*. (This strikes me as probably not a very efficient design, compared to putting the same data into a single text file; but for now I'll assume we're not up for a complete rewrite.) That being the case, I wonder what it is we expect fsync'ing the surviving files to do exactly. We should be fsync'ing the directory not the files themselves, no? Other things that seem poorly thought out: * Why is the check for "map-" prefix after, rather than before, the lstat? * Why is it okay to ignore lstat failure? Seems like we might as well not even have the lstat. * The sscanf on the file name would not notice trailing junk, such as an editor backup marker. Is that okay? As far as the patch itself goes, I agree that failure to unlink is noncritical, because such a file would have no further effect and we can just ignore it. I think I also agree that failure of the sscanf is noncritical, because the implication of that is that the file name doesn't conform to our expectations, which means it's basically just like the check that causes us to ignore names not starting with "map-". (Actually, isn't the separate check for "map-" useless, given that sscanf will make the equivalent check?) I started out wondering why the patch didn't also change the loop's other ERROR conditions to LOG. But we do want to ERROR if we're unable to sync transient state down to disk, and that is what the other steps (think they) are doing. It might be worth a comment to point that out though, before someone decides that if these errors are just LOG level then the others can be too. Anyway, I think possibly we can drop the bottom half of the loop (the part trying to fsync non-removed files) in favor of fsync'ing the directory afterwards. Thoughts? regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Tom Lane
Date:
I wrote: > Anyway, I think possibly we can drop the bottom half of the loop > (the part trying to fsync non-removed files) in favor of fsync'ing > the directory afterwards. Thoughts? Oh, scratch that --- *this* loop doesn't care about the file contents, but other code does. However, don't we need a directory fsync too? Or is that handled someplace else? regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2022-01-19 13:34:21 -0500, Tom Lane wrote: > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > > On Sat, Jan 15, 2022 at 2:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > >> Maybe I'm missing something but wouldn't > >> https://commitfest.postgresql.org/36/3448/ better solve the problem? > > > The error can cause the new background process proposed there in that > > thread to restart, which is again costly. Since we have LOG-only and > > continue behavior in CheckPointSnapBuild already, having the same > > behavior for CheckPointLogicalRewriteHeap helps a lot. > > [ stares at CheckPointLogicalRewriteHeap for awhile ... ] > > This code has got more problems than that. It took me awhile to > absorb it, but we don't actually care about the contents of any of > those files; all of the information is encoded in the file *names*. I'm not following - we *do* need the contents of the files? They're applied as-needed in ApplyLogicalMappingFile(). > (This strikes me as probably not a very efficient design, compared > to putting the same data into a single text file; but for now I'll > assume we're not up for a complete rewrite.) That being the case, > I wonder what it is we expect fsync'ing the surviving files to do > exactly. We should be fsync'ing the directory not the files > themselves, no? Fsyncing the directory doesn't guarantee anything about the contents of files. But, you're right, we need an fsync of the directory too. > Other things that seem poorly thought out: > > * Why is the check for "map-" prefix after, rather than before, > the lstat? It doesn't seem to matter much - there shouldn't be a meaningful amount of other files in there. > * Why is it okay to ignore lstat failure? Seems like we might > as well not even have the lstat. Yea, that seems odd, not sure why that ended up this way. I guess the aim might have been to tolerate random files we don't have permissions for or such? > * The sscanf on the file name would not notice trailing junk, > such as an editor backup marker. Is that okay? I don't really see a problem with it - there shouldn't be other files matching the pattern - but it couldn't hurt to check the pattern matches exhaustively. > As far as the patch itself goes, I agree that failure to unlink > is noncritical, because such a file would have no further effect > and we can just ignore it. I don't agree. We iterate through the directory regularly on systems with catalog changes + logical decoding. An ever increasing list of gunk will make that more and more expensive. And I haven't heard a meaningful reason why we would have map-* files that we can't remove. Ignoring failures like this just makes problems much harder to debug and they tend to bite harder for it. > I think I also agree that failure of the sscanf is noncritical, because the > implication of that is that the file name doesn't conform to our > expectations, which means it's basically just like the check that causes us > to ignore names not starting with "map-". (Actually, isn't the separate > check for "map-" useless, given that sscanf will make the equivalent check?) Well, this way only files starting with "map-" are expected to conform to a strict format, the rest is ignored? > Anyway, I think possibly we can drop the bottom half of the loop > (the part trying to fsync non-removed files) in favor of fsync'ing > the directory afterwards. Thoughts? I don't think that'd be correct. In short: We should add a directory fsync, I'm fine with improving the error checking, but the rest seems like a net-negative with no convincing reasoning. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
"Bossart, Nathan"
Date:
On 1/19/22, 11:08 AM, "Andres Freund" <andres@anarazel.de> wrote: > On 2022-01-19 13:34:21 -0500, Tom Lane wrote: >> As far as the patch itself goes, I agree that failure to unlink >> is noncritical, because such a file would have no further effect >> and we can just ignore it. > > I don't agree. We iterate through the directory regularly on systems with > catalog changes + logical decoding. An ever increasing list of gunk will make > that more and more expensive. And I haven't heard a meaningful reason why we > would have map-* files that we can't remove. I think the other side of this is that we don't want checkpointing to continually fail because of a noncritical failure. That could also lead to problems down the road. > Ignoring failures like this just makes problems much harder to debug and they > tend to bite harder for it. If such noncritical failures happened regularly, the server logs will likely become filled with messages about it. Perhaps users may not notice for a while, but I don't think the proposed patch would make debugging excessively difficult. Nathan
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Tom Lane
Date:
"Bossart, Nathan" <bossartn@amazon.com> writes: > I think the other side of this is that we don't want checkpointing to > continually fail because of a noncritical failure. That could also > lead to problems down the road. Yeah, a persistent failure to complete checkpoints is very nasty. Your disk will soon fill with unrecyclable WAL. I don't see how that's better than a somewhat hypothetical performance issue. regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
"Bossart, Nathan"
Date:
I took the liberty of adjusting Bharath's patch based on the latest feedback. On 1/19/22, 10:35 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > We should be fsync'ing the directory not the files > themselves, no? I added a directory sync at the end of CheckPointLogicalRewriteHeap(), which IIUC is enough. > * Why is the check for "map-" prefix after, rather than before, > the lstat? I swapped these checks. I stopped short of moving the sscanf() before the lstat(), though, as 1) I don't think it will help very much and 2) it seemed weird to start emitting "could not parse filename" logs for non-regular files we presently skip silently. > * Why is it okay to ignore lstat failure? Seems like we might > as well not even have the lstat. I added error checking for lstat(). > * The sscanf on the file name would not notice trailing junk, > such as an editor backup marker. Is that okay? I think this is okay. The absolute worst that would happen would be that the extra file would be deleted. This might eventually become a problem if files with the same prefix format were created by the server. However, CheckPointSnapBuild() already has this problem with temporary files, and it claims not to need any extra handling: * temporary filenames from SnapBuildSerialize() include the LSN and * everything but are postfixed by .$pid.tmp. We can just remove them * the same as other files because there can be none that are * currently being written that are older than cutoff. > (Actually, isn't the > separate check for "map-" useless, given that sscanf will make > the equivalent check?) The only benefit I see from the extra "map-" check is that it'll avoid "could not parse filename" logs for files that clearly aren't related to the task at hand. I don't know if this is expected during normal operation at the moment. I've left the "map-" check for now. > I started out wondering why the patch didn't also change the loop's > other ERROR conditions to LOG. But we do want to ERROR if we're > unable to sync transient state down to disk, and that is what > the other steps (think they) are doing. It might be worth a > comment to point that out though, before someone decides that > if these errors are just LOG level then the others can be too. I added such a comment. I also updated CheckPointSnapBuild() and UpdateLogicalMappings() with similar adjustments. Nathan
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2022-01-20 19:15:08 +0000, Bossart, Nathan wrote: > > * Why is it okay to ignore lstat failure? Seems like we might > > as well not even have the lstat. > > I added error checking for lstat(). It seems odd to change a bunch of things to not be errors anymore, but then add new sources of errors. If we try to deal with concurrent deletions or permission issues - otherwise what's the point of making unlink() not an error anymore - why do we expect to be able to lstat()? > I also updated CheckPointSnapBuild() and UpdateLogicalMappings() with > similar adjustments. FWIW, I still think the ERROR->LOG changes are bad idea. The whole thing of "oh, let's just ignore stuff that we don't expect and soldier on" has bitten us over and over again. It makes us less robust, not more robust. It's also just about impossible to monitor for problems that emit LOG. I'd be more on board accepting some selective errors. E.g. not erroring on ENOENT, but continuing to error on others (most likely ENOACCESS). I think we should *not* change the > + /* > + * We just log a message if a file doesn't fit the pattern, it's > + * probably some editor's lock/state file or similar... > + */ An editor's lock file that starts with map- would presumably be the whole filename plus an additional file-ending. But this check won't catch those. > + * Unlike failures to unlink() old logical rewrite files, we must > + * ERROR if we're unable to sync transient state down to disk. This > + * allows replay to assume that everything written out before > + * checkpoint start is persisted. > */ Hm, not quite happy with the second bit. Checkpointed state being durable isn't about replay directly, it's about the basic property of a checkpoint being fulfilled? > pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_CHECKPOINT_SYNC); > if (pg_fsync(fd) != 0) > @@ -1282,4 +1305,7 @@ CheckPointLogicalRewriteHeap(void) > } > } > FreeDir(mappings_dir); > + > + /* persist directory entries to disk */ > + fsync_fname("pg_logical/mappings", true); > } This is probably worth backpatching, if you split it out I'll do so. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
"Bossart, Nathan"
Date:
Thanks for your feedback. On 1/20/22, 11:47 AM, "Andres Freund" <andres@anarazel.de> wrote: > It seems odd to change a bunch of things to not be errors anymore, but then > add new sources of errors. If we try to deal with concurrent deletions or > permission issues - otherwise what's the point of making unlink() not an error > anymore - why do we expect to be able to lstat()? My reasoning for making lstat() an ERROR was that there's a chance we need to fsync() the file, and if we can't fsync() a file for whatever reason, we definitely want to ERROR. I suppose we could conditionally ERROR based on the file name, but I don't know if that's really worth the complexity. > I'd be more on board accepting some selective errors. E.g. not erroring on > ENOENT, but continuing to error on others (most likely ENOACCESS). I think we > should *not* change the I think this approach would work for the use-case Bharath mentioned upthread. In any case, if deleting a file fails because the file was already deleted, there's no point in ERROR-ing. I think filtering errors is a bit trickier for lstat(). If we would've fsync'd the file but lstat() gives us ENOENT, we may have a problem. (However, there's also a good chance we wouldn't notice such problems if the race didn't occur.) I'll play around with it. >> + /* >> + * We just log a message if a file doesn't fit the pattern, it's >> + * probably some editor's lock/state file or similar... >> + */ > > An editor's lock file that starts with map- would presumably be the whole > filename plus an additional file-ending. But this check won't catch those. Right, it will either fsync() or unlink() those. I'll work on the comment. Or do you think it's worth validating that there are no trailing characters? I looked into that a bit earlier, and the code felt excessive to me, but I don't have a strong opinion here. >> + * Unlike failures to unlink() old logical rewrite files, we must >> + * ERROR if we're unable to sync transient state down to disk. This >> + * allows replay to assume that everything written out before >> + * checkpoint start is persisted. >> */ > > Hm, not quite happy with the second bit. Checkpointed state being durable > isn't about replay directly, it's about the basic property of a checkpoint > being fulfilled? I'll work on this. FWIW I modeled this based on the comment above the function. Do you think that should be adjusted as well? >> + /* persist directory entries to disk */ >> + fsync_fname("pg_logical/mappings", true); > > This is probably worth backpatching, if you split it out I'll do so. Sure thing, will do shortly. Nathan
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
"Bossart, Nathan"
Date:
On 1/20/22, 12:24 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > On 1/20/22, 11:47 AM, "Andres Freund" <andres@anarazel.de> wrote: >>> + /* persist directory entries to disk */ >>> + fsync_fname("pg_logical/mappings", true); >> >> This is probably worth backpatching, if you split it out I'll do so. > > Sure thing, will do shortly. Here's this part. Nathan
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
On 2022-01-20 20:41:16 +0000, Bossart, Nathan wrote: > Here's this part. And pushed to all branches. Thanks.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Fri, Jan 21, 2022 at 11:49:56AM -0800, Andres Freund wrote: > On 2022-01-20 20:41:16 +0000, Bossart, Nathan wrote: >> Here's this part. > > And pushed to all branches. Thanks. Thanks! I spent some time thinking about the right way to proceed here, and I came up with the attached patches. The first patch just adds error checking for various lstat() calls in the replication code. If lstat() fails, then it probably doesn't make sense to try to continue processing the file. The second patch changes some nearby calls to ereport() to ERROR. If these failures are truly unexpected, and we don't intend to support use-cases like concurrent manual deletion, then failing might be the right way to go. I think it's a shame that such failures could cause checkpointing to continually fail, but that topic is already being discussed elsewhere [0]. [0] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com/
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Thu, Jan 27, 2022 at 6:31 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > I spent some time thinking about the right way to proceed here, and I came > up with the attached patches. The first patch just adds error checking for > various lstat() calls in the replication code. If lstat() fails, then it > probably doesn't make sense to try to continue processing the file. > > The second patch changes some nearby calls to ereport() to ERROR. If these > failures are truly unexpected, and we don't intend to support use-cases > like concurrent manual deletion, then failing might be the right way to go. > I think it's a shame that such failures could cause checkpointing to > continually fail, but that topic is already being discussed elsewhere [0]. > > [0] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com After an off-list discussion with Andreas, proposing here a patch that basically replaces ReadDir call with ReadDirExtended and gets rid of lstat entirely. With this chance, the checkpoint will only care about the snapshot and mapping files and not fail if it finds other files in the directories. Removing lstat enables us to make things faster as we avoid a bunch of extra system calls - one lstat call per each mapping or snapshot file. This patch also includes, converting "could not parse filename" and "could not remove file" errors to LOG messages in CheckPointLogicalRewriteHeap. This will enable checkpoint not to waste the amount of work that it has done in case it is unable to parse/remove the file, for whatever reasons. Regards, Bharath Rupireddy.
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote: > After an off-list discussion with Andreas, proposing here a patch that > basically replaces ReadDir call with ReadDirExtended and gets rid of > lstat entirely. With this chance, the checkpoint will only care about > the snapshot and mapping files and not fail if it finds other files in > the directories. Removing lstat enables us to make things faster as we > avoid a bunch of extra system calls - one lstat call per each mapping > or snapshot file. I think removing the lstat() is probably reasonable. We currently aren't doing proper error checking, and the chances of a non-regular file matching the prefix are likely pretty low. In the worst case, we'll LOG or ERROR when unlinking or fsyncing fails. However, I'm not sure about the change to ReadDirExtended(). That might be okay for CheckPointSnapBuild(), which is just trying to remove old files, but CheckPointLogicalRewriteHeap() is responsible for ensuring that files are flushed to disk for the checkpoint. If we stop reading the directory after an error and let the checkpoint continue, isn't it possible that some mappings files won't be persisted to disk? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote: > > After an off-list discussion with Andreas, proposing here a patch that > > basically replaces ReadDir call with ReadDirExtended and gets rid of > > lstat entirely. With this chance, the checkpoint will only care about > > the snapshot and mapping files and not fail if it finds other files in > > the directories. Removing lstat enables us to make things faster as we > > avoid a bunch of extra system calls - one lstat call per each mapping > > or snapshot file. > > I think removing the lstat() is probably reasonable. We currently aren't > doing proper error checking, and the chances of a non-regular file matching > the prefix are likely pretty low. In the worst case, we'll LOG or ERROR > when unlinking or fsyncing fails. > > However, I'm not sure about the change to ReadDirExtended(). That might be > okay for CheckPointSnapBuild(), which is just trying to remove old files, > but CheckPointLogicalRewriteHeap() is responsible for ensuring that files > are flushed to disk for the checkpoint. If we stop reading the directory > after an error and let the checkpoint continue, isn't it possible that some > mappings files won't be persisted to disk? Unless I mis-read your above statement, with LOG level in ReadDirExtended, I don't think we stop reading the files in CheckPointLogicalRewriteHeap. Am I missing something here? Since, we also continue in CheckPointLogicalRewriteHeap if we can't parse/delete some files with the change of "could not parse filename"/"could not remove file" messages to LOG level I'm attaching v6, just changed elog(LOG, to ereport(LOG in CheckPointLogicalRewriteHeap, other things remain the same. Regards, Bharath Rupireddy.
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote: > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> However, I'm not sure about the change to ReadDirExtended(). That might be >> okay for CheckPointSnapBuild(), which is just trying to remove old files, >> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files >> are flushed to disk for the checkpoint. If we stop reading the directory >> after an error and let the checkpoint continue, isn't it possible that some >> mappings files won't be persisted to disk? > > Unless I mis-read your above statement, with LOG level in > ReadDirExtended, I don't think we stop reading the files in > CheckPointLogicalRewriteHeap. Am I missing something here? ReadDirExtended() has the following comment: * If elevel < ERROR, returns NULL after any error. With the normal coding * pattern, this will result in falling out of the loop immediately as * though the directory contained no (more) entries. If there is a problem reading the directory, we will LOG and then exit the loop. If we didn't scan through all the entries in the directory, there is a chance that we didn't fsync() all the files that need it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote: > > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > >> However, I'm not sure about the change to ReadDirExtended(). That might be > >> okay for CheckPointSnapBuild(), which is just trying to remove old files, > >> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files > >> are flushed to disk for the checkpoint. If we stop reading the directory > >> after an error and let the checkpoint continue, isn't it possible that some > >> mappings files won't be persisted to disk? > > > > Unless I mis-read your above statement, with LOG level in > > ReadDirExtended, I don't think we stop reading the files in > > CheckPointLogicalRewriteHeap. Am I missing something here? > > ReadDirExtended() has the following comment: > > * If elevel < ERROR, returns NULL after any error. With the normal coding > * pattern, this will result in falling out of the loop immediately as > * though the directory contained no (more) entries. > > If there is a problem reading the directory, we will LOG and then exit the > loop. If we didn't scan through all the entries in the directory, there is > a chance that we didn't fsync() all the files that need it. Thanks. I get it. For syncing map files, we don't want to tolerate any errors, whereas removal of the old map files (lesser than cutoff LSN) can be tolerated in CheckPointLogicalRewriteHeap. Here's the v7 version using ReadDir for CheckPointLogicalRewriteHeap. Regards, Bharath Rupireddy.
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Thu, Feb 03, 2022 at 09:45:08AM +0530, Bharath Rupireddy wrote: > On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> If there is a problem reading the directory, we will LOG and then exit the >> loop. If we didn't scan through all the entries in the directory, there is >> a chance that we didn't fsync() all the files that need it. > > Thanks. I get it. For syncing map files, we don't want to tolerate any > errors, whereas removal of the old map files (lesser than cutoff LSN) > can be tolerated in CheckPointLogicalRewriteHeap. LGTM. Andres noted upthread [0] that the comment above sscanf() about skipping editors' lock files might not be accurate. I don't think it's a huge problem if sscanf() matches those files, but perhaps we can improve the comment. [0] https://postgr.es/m/20220120194618.hmfd4kxkng2cgryh%40alap3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Fri, Feb 4, 2022 at 5:33 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > Thanks. I get it. For syncing map files, we don't want to tolerate any > > errors, whereas removal of the old map files (lesser than cutoff LSN) > > can be tolerated in CheckPointLogicalRewriteHeap. > > LGTM. Andres noted upthread [0] that the comment above sscanf() about > skipping editors' lock files might not be accurate. I don't think it's a > huge problem if sscanf() matches those files, but perhaps we can improve > the comment. > > [0] https://postgr.es/m/20220120194618.hmfd4kxkng2cgryh%40alap3.anarazel.de Andres comment from [0]: > An editor's lock file that starts with map- would presumably be the whole > filename plus an additional file-ending. But this check won't catch those. Agreed. sscanf checks can't detect the files named "whole filename plus an additional file-ending". I just checked with vi editor lock state file .0-14ED3B8.snap.swp [1], the log generated is [2]. I'm not sure exactly which editor would create a lockfile like "whole filename plus an additional file-ending". In any case, let's remove the editor's lock/state file from those comments and have just only "We just log a message if a file doesn't fit the pattern". Attached v8 patch with that change. [1] -rw------- 1 bharath bharath 12288 Feb 10 15:48 .0-14ED3B8.snap.swp -rw------- 1 bharath bharath 128 Feb 10 15:48 0-14ED518.snap -rw------- 1 bharath bharath 128 Feb 10 15:49 0-14ED518.snap.lockfile -rw------- 1 bharath bharath 128 Feb 10 15:49 0-14ED550.snap -rw------- 1 bharath bharath 128 Feb 10 15:49 0-14ED600.snap [2] 2022-02-10 15:48:47.938 UTC [1121678] LOG: could not parse file name "pg_logical/snapshots/.0-14ED3B8.snap.swp" Regards, Bharath Rupireddy.
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Thu, Feb 10, 2022 at 09:30:45PM +0530, Bharath Rupireddy wrote: > In any case, let's remove the editor's lock/state file from those > comments and have just only "We just log a message if a file doesn't > fit the pattern". Attached v8 patch with that change. I've moved this one to ready-for-committer. I was under the impression that Andres was firmly against this approach, but you did mention there was an off-list discussion. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2022-02-10 21:30:45 +0530, Bharath Rupireddy wrote: > From 4801ff2c3b1e7bc7076205b676d4e3bc4a4ed308 Mon Sep 17 00:00:00 2001 > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > Date: Thu, 10 Feb 2022 15:58:58 +0000 > Subject: [PATCH v8] Replace ReadDir with ReadDirExtended > > Replace ReadDir with ReadDirExtended (in CheckPointSnapBuild) and > get rid of lstat entirely. I think this might be based on a slight misunderstanding / bad phrasing on my part. We can use get_dirent_type() to optimize away the lstat on most platforms, ReadDirExtended itself doesn't do that automatically. I was trying to reference removing lstat calls by using get_dirent_type() in more places... > We still use ReadDir in CheckPointLogicalRewriteHeap > because unable to read directory would result a NULL from > ReadDirExtended and we may miss to fsync the remaining map files, > so here let's error out with ReadDir. Then why is this skipping the lstat? > Also, convert "could not parse filename" and "could not remove file" > errors to LOG messages in CheckPointLogicalRewriteHeap. This will > enable checkpoint not to waste the amount of work that it had done. I still doubt this is a good idea. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Tue, Feb 15, 2022 at 09:09:52AM -0800, Andres Freund wrote: > On 2022-02-10 21:30:45 +0530, Bharath Rupireddy wrote: >> Replace ReadDir with ReadDirExtended (in CheckPointSnapBuild) and >> get rid of lstat entirely. > > I think this might be based on a slight misunderstanding / bad phrasing on my > part. We can use get_dirent_type() to optimize away the lstat on most > platforms, ReadDirExtended itself doesn't do that automatically. I was trying > to reference removing lstat calls by using get_dirent_type() in more places... > > >> We still use ReadDir in CheckPointLogicalRewriteHeap >> because unable to read directory would result a NULL from >> ReadDirExtended and we may miss to fsync the remaining map files, >> so here let's error out with ReadDir. > > Then why is this skipping the lstat? > > >> Also, convert "could not parse filename" and "could not remove file" >> errors to LOG messages in CheckPointLogicalRewriteHeap. This will >> enable checkpoint not to waste the amount of work that it had done. > > I still doubt this is a good idea. IIUC you are advocating for something more like the attached patches. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2022-02-15 09:57:53 -0800, Nathan Bossart wrote: > IIUC you are advocating for something more like the attached patches. At least for patch 1 yes. Don't have the cycles just now to look at the others. I generally think it'd be a good exercise to go through an use get_dirent_type() in nearly all ReadDir() style loops - it's a nice efficiency win in general, and IIRC a particularly big one on windows. It'd probably be good to add a reference to get_dirent_type() to ReadDir[Extended]()'s docs. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote: > I generally think it'd be a good exercise to go through an use > get_dirent_type() in nearly all ReadDir() style loops - it's a nice efficiency > win in general, and IIRC a particularly big one on windows. > > It'd probably be good to add a reference to get_dirent_type() to > ReadDir[Extended]()'s docs. Agreed. I might give this a try. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Tue, Feb 15, 2022 at 10:37:58AM -0800, Nathan Bossart wrote: > On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote: >> I generally think it'd be a good exercise to go through an use >> get_dirent_type() in nearly all ReadDir() style loops - it's a nice efficiency >> win in general, and IIRC a particularly big one on windows. >> >> It'd probably be good to add a reference to get_dirent_type() to >> ReadDir[Extended]()'s docs. > > Agreed. I might give this a try. Alright, here is a new patch set where I've tried to replace as many stat()/lstat() calls as possible with get_dirent_type(). 0002 and 0003 are the same as v9. I noticed a few remaining stat()/lstat() calls that don't appear to be doing proper error checking, but I haven't had a chance to try fixing those yet. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Thomas Munro
Date:
On Wed, Feb 16, 2022 at 12:11 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Tue, Feb 15, 2022 at 10:37:58AM -0800, Nathan Bossart wrote: > > On Tue, Feb 15, 2022 at 10:10:34AM -0800, Andres Freund wrote: > >> I generally think it'd be a good exercise to go through an use > >> get_dirent_type() in nearly all ReadDir() style loops - it's a nice efficiency > >> win in general, and IIRC a particularly big one on windows. > >> > >> It'd probably be good to add a reference to get_dirent_type() to > >> ReadDir[Extended]()'s docs. > > > > Agreed. I might give this a try. > > Alright, here is a new patch set where I've tried to replace as many > stat()/lstat() calls as possible with get_dirent_type(). 0002 and 0003 are > the same as v9. I noticed a few remaining stat()/lstat() calls that don't > appear to be doing proper error checking, but I haven't had a chance to try > fixing those yet. 0001: These get_dirent_type() conversions are nice. LGTM. 0002: /* we're only handling directories here, skip if it's not ours */ - if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode)) + if (lstat(path, &statbuf) != 0) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not stat file \"%s\": %m", path))); + else if (!S_ISDIR(statbuf.st_mode)) return; Why is this a good place to silently ignore non-directories? StartupReorderBuffer() is already in charge of skipping random detritus found in the directory, so would it be better to do "if (get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then drop the lstat() stanza from ReorderBufferCleanupSeralizedTXNs() completely? Then perhaps its ReadDirExtended() shoud be using ERROR instead of INFO, so that missing/non-dir/b0rked directories raise an error. I don't understand why it's reporting readdir() errors at INFO but unlink() errors at ERROR, and as far as I can see the other paths that reach this code shouldn't be sending in paths to non-directories here unless something is seriously busted and that's ERROR-worthy. 0003: I haven't studied this cure vs disease thing... no opinion today.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
Thanks for taking a look! On Thu, Mar 24, 2022 at 01:17:01PM +1300, Thomas Munro wrote: > /* we're only handling directories here, skip if it's not ours */ > - if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode)) > + if (lstat(path, &statbuf) != 0) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not stat file \"%s\": %m", path))); > + else if (!S_ISDIR(statbuf.st_mode)) > return; > > Why is this a good place to silently ignore non-directories? > StartupReorderBuffer() is already in charge of skipping random > detritus found in the directory, so would it be better to do "if > (get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then > drop the lstat() stanza from ReorderBufferCleanupSeralizedTXNs() > completely? Then perhaps its ReadDirExtended() shoud be using ERROR > instead of INFO, so that missing/non-dir/b0rked directories raise an > error. My guess is that this was done because ReorderBufferCleanupSerializedTXNs() is also called from ReorderBufferAllocate() and ReorderBufferFree(). However, it is odd that we just silently return if the slot path isn't a directory in those cases. I think we could use get_dirent_type() in StartupReorderBuffer() as you suggested, and then we could let ReadDir() ERROR for non-directories for the other callers of ReorderBufferCleanupSerializedTXNs(). WDYT? > I don't understand why it's reporting readdir() errors at INFO > but unlink() errors at ERROR, and as far as I can see the other paths > that reach this code shouldn't be sending in paths to non-directories > here unless something is seriously busted and that's ERROR-worthy. I agree. I'll switch it to ReadDir() in the next revision so that we ERROR instead of INFO. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Tue, Mar 29, 2022 at 03:48:32PM -0700, Nathan Bossart wrote: > On Thu, Mar 24, 2022 at 01:17:01PM +1300, Thomas Munro wrote: >> /* we're only handling directories here, skip if it's not ours */ >> - if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode)) >> + if (lstat(path, &statbuf) != 0) >> + ereport(ERROR, >> + (errcode_for_file_access(), >> + errmsg("could not stat file \"%s\": %m", path))); >> + else if (!S_ISDIR(statbuf.st_mode)) >> return; >> >> Why is this a good place to silently ignore non-directories? >> StartupReorderBuffer() is already in charge of skipping random >> detritus found in the directory, so would it be better to do "if >> (get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then >> drop the lstat() stanza from ReorderBufferCleanupSeralizedTXNs() >> completely? Then perhaps its ReadDirExtended() shoud be using ERROR >> instead of INFO, so that missing/non-dir/b0rked directories raise an >> error. > > My guess is that this was done because ReorderBufferCleanupSerializedTXNs() > is also called from ReorderBufferAllocate() and ReorderBufferFree(). > However, it is odd that we just silently return if the slot path isn't a > directory in those cases. I think we could use get_dirent_type() in > StartupReorderBuffer() as you suggested, and then we could let ReadDir() > ERROR for non-directories for the other callers of > ReorderBufferCleanupSerializedTXNs(). WDYT? > >> I don't understand why it's reporting readdir() errors at INFO >> but unlink() errors at ERROR, and as far as I can see the other paths >> that reach this code shouldn't be sending in paths to non-directories >> here unless something is seriously busted and that's ERROR-worthy. > > I agree. I'll switch it to ReadDir() in the next revision so that we ERROR > instead of INFO. Here is an updated patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Wed, Mar 30, 2022 at 09:21:30AM -0700, Nathan Bossart wrote: > Here is an updated patch set. rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Sat, Apr 9, 2022 at 1:49 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Wed, Mar 30, 2022 at 09:21:30AM -0700, Nathan Bossart wrote: > > Here is an updated patch set. > > rebased Thanks. 0001 - there are many places where lstat/stat is being used - don't we need to replace all or most of them with get_dirent_type? 0002 - I'm not quite happy with this patch, with the change, checkpoint errors out, if it can't remove just a file - the comments there says it all. Is there any strong reason for this change? - /* - * It's not particularly harmful, though strange, if we can't - * remove the file here. Don't prevent the checkpoint from - * completing, that'd be a cure worse than the disease. - */ if (unlink(path) < 0) - { - ereport(LOG, + ereport(ERROR, (errcode_for_file_access(), errmsg("could not remove file \"%s\": %m", path))); - continue; - } Regards, Bharath Rupireddy.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Fri, Jul 08, 2022 at 09:39:10PM +0530, Bharath Rupireddy wrote: > 0001 - there are many places where lstat/stat is being used - don't we > need to replace all or most of them with get_dirent_type? It's been a while since I wrote this one, but I believe my intent was to replace as many [l]stat() calls in ReadDir()-style loops as possible with get_dirent_type(). Are there any that I've missed? > 0002 - I'm not quite happy with this patch, with the change, > checkpoint errors out, if it can't remove just a file - the comments > there says it all. Is there any strong reason for this change? Andres noted several concerns upthread. In short, ignoring unexpected errors makes them harder to debug and likely masks bugs. FWIW I agree that it is unfortunate that a relatively non-critical error here leads to checkpoint failures, which can cause much worse problems down the road. I think this is one of the reasons for moving tasks like this out of the checkpointer, as I'm trying to do with the proposed custodian process [0]. [0] https://commitfest.postgresql.org/38/3448/ -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2022-04-08 13:18:57 -0700, Nathan Bossart wrote: > @@ -1035,32 +1036,9 @@ ParseConfigDirectory(const char *includedir, > > join_path_components(filename, directory, de->d_name); > canonicalize_path(filename); > - if (stat(filename, &st) == 0) > + de_type = get_dirent_type(filename, de, true, elevel); > + if (de_type == PGFILETYPE_ERROR) > { > - if (!S_ISDIR(st.st_mode)) > - { > - /* Add file to array, increasing its size in blocks of 32 */ > - if (num_filenames >= size_filenames) > - { > - size_filenames += 32; > - filenames = (char **) repalloc(filenames, > - size_filenames * sizeof(char *)); > - } > - filenames[num_filenames] = pstrdup(filename); > - num_filenames++; > - } > - } > - else > - { > - /* > - * stat does not care about permissions, so the most likely reason > - * a file can't be accessed now is if it was removed between the > - * directory listing and now. > - */ > - ereport(elevel, > - (errcode_for_file_access(), > - errmsg("could not stat file \"%s\": %m", > - filename))); > record_config_file_error(psprintf("could not stat file \"%s\"", > filename), > calling_file, calling_lineno, > @@ -1068,6 +1046,18 @@ ParseConfigDirectory(const char *includedir, > status = false; > goto cleanup; > } > + else if (de_type != PGFILETYPE_DIR) > + { > + /* Add file to array, increasing its size in blocks of 32 */ > + if (num_filenames >= size_filenames) > + { > + size_filenames += 32; > + filenames = (char **) repalloc(filenames, > + size_filenames * sizeof(char *)); > + } > + filenames[num_filenames] = pstrdup(filename); > + num_filenames++; > + } > } > > if (num_filenames > 0) Seems like the diff would be easier to read if it didn't move code around as much? Looks pretty reasonable, I'd be happy to commit it, I think. Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Mon, Jul 11, 2022 at 01:25:33PM -0700, Andres Freund wrote: > On 2022-04-08 13:18:57 -0700, Nathan Bossart wrote: >> @@ -1035,32 +1036,9 @@ ParseConfigDirectory(const char *includedir, >> >> join_path_components(filename, directory, de->d_name); >> canonicalize_path(filename); >> - if (stat(filename, &st) == 0) >> + de_type = get_dirent_type(filename, de, true, elevel); >> + if (de_type == PGFILETYPE_ERROR) >> { >> - if (!S_ISDIR(st.st_mode)) >> - { >> - /* Add file to array, increasing its size in blocks of 32 */ >> - if (num_filenames >= size_filenames) >> - { >> - size_filenames += 32; >> - filenames = (char **) repalloc(filenames, >> - size_filenames * sizeof(char *)); >> - } >> - filenames[num_filenames] = pstrdup(filename); >> - num_filenames++; >> - } >> - } >> - else >> - { >> - /* >> - * stat does not care about permissions, so the most likely reason >> - * a file can't be accessed now is if it was removed between the >> - * directory listing and now. >> - */ >> - ereport(elevel, >> - (errcode_for_file_access(), >> - errmsg("could not stat file \"%s\": %m", >> - filename))); >> record_config_file_error(psprintf("could not stat file \"%s\"", >> filename), >> calling_file, calling_lineno, >> @@ -1068,6 +1046,18 @@ ParseConfigDirectory(const char *includedir, >> status = false; >> goto cleanup; >> } >> + else if (de_type != PGFILETYPE_DIR) >> + { >> + /* Add file to array, increasing its size in blocks of 32 */ >> + if (num_filenames >= size_filenames) >> + { >> + size_filenames += 32; >> + filenames = (char **) repalloc(filenames, >> + size_filenames * sizeof(char *)); >> + } >> + filenames[num_filenames] = pstrdup(filename); >> + num_filenames++; >> + } >> } >> >> if (num_filenames > 0) > > Seems like the diff would be easier to read if it didn't move code around as > much? I opted to reorganize in order save an indent and simplify the conditions a bit. The alternative is something like this: if (de_type != PGFILETYPE_ERROR) { if (de_type != PGTFILETYPE_DIR) { ... } } else { ... } I don't feel strongly one way or another, and I'll change it if you think it's worth it to simplify the diff. > Looks pretty reasonable, I'd be happy to commit it, I think. Appreciate it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Fri, Jul 8, 2022 at 10:44 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > 0002 - I'm not quite happy with this patch, with the change, > > checkpoint errors out, if it can't remove just a file - the comments > > there says it all. Is there any strong reason for this change? > > Andres noted several concerns upthread. In short, ignoring unexpected > errors makes them harder to debug and likely masks bugs. > > FWIW I agree that it is unfortunate that a relatively non-critical error > here leads to checkpoint failures, which can cause much worse problems down > the road. I think this is one of the reasons for moving tasks like this > out of the checkpointer, as I'm trying to do with the proposed custodian > process [0]. > > [0] https://commitfest.postgresql.org/38/3448/ IMHO, we can keep it as-is and if required can be changed as part of the patch set [0], as this change without [0] can cause a checkpoint to fail. Furthermore, I would like it if we convert "could not parse filename" and "could not remove file" ERRORs of CheckPointLogicalRewriteHeap to LOGs until [0] gets in - others may have different opinions though. Just wondering - do we ever have a problem if we can't remove the snapshot or mapping file? May be unrelated, RemoveTempXlogFiles doesn't even bother to check if the temp wal file is removed. Regards, Bharath Rupireddy.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Mon, Jul 18, 2022 at 04:53:18PM +0530, Bharath Rupireddy wrote: > Just wondering - do we ever have a problem if we can't remove the > snapshot or mapping file? Besides running out of disk space, there appears to be a transaction ID wraparound risk with the mappings files. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Thomas Munro
Date:
Working on get_dirent_type() reminded me of this thread. I was going to commit the first of these patches, but then I noticed Andres said he was planning to, so I'll wait another day. Here they are, with commit messages but otherwise unchanged from Nathan's v12 except for a slight comment tweak: - /* we're only handling directories here, skip if it's not ours */ + /* we're only handling directories here, skip if it's not one */ The only hunk I'm having second thoughts about is the following, which makes unexpected stray files break checkpoints: - * We just log a message if a file doesn't fit the pattern, it's - * probably some editors lock/state file or similar... */ if (sscanf(snap_de->d_name, "%X-%X.snap", &hi, &lo) != 2) - { - ereport(LOG, + ereport(ERROR, (errmsg("could not parse file name \"%s\"", path))); Bharath mentioned other places that loop over stat(), but I think those are places that want stuff we don't already have, like st_size.
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes: > The only hunk I'm having second thoughts about is the following, which > makes unexpected stray files break checkpoints: Sounds like a pretty bad idea. What's the upside? regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Tom Lane
Date:
I wrote: > Thomas Munro <thomas.munro@gmail.com> writes: >> The only hunk I'm having second thoughts about is the following, which >> makes unexpected stray files break checkpoints: > Sounds like a pretty bad idea. What's the upside? Actually, having now read the patch, I don't think there is any part of 0002 that is a good idea. It's blithely removing the comments that explain why the existing coding is the way it is, and not providing a shred of justification for making checkpoints more brittle. I have not tried to analyze the error-handling properties of 0001, but if it's being equally cavalier then it shouldn't be committed either. Most of this behavior is the result of decades of hard-won experience; discarding it because it doesn't fit conveniently into some refactoring plan isn't smart. regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Thomas Munro
Date:
On Tue, Aug 9, 2022 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Actually, having now read the patch, I don't think there is any > part of 0002 that is a good idea. It's blithely removing the > comments that explain why the existing coding is the way it is, > and not providing a shred of justification for making checkpoints > more brittle. 0002 also contradicts the original $SUBJECT and goal of this thread, which is possibly why it was kept separate. I was only thinking of committing 0001 myself, which is the one I'd reviewed an earlier version of. > I have not tried to analyze the error-handling properties of 0001, > but if it's being equally cavalier then it shouldn't be committed > either. Most of this behavior is the result of decades of hard-won > experience; discarding it because it doesn't fit conveniently > into some refactoring plan isn't smart. 0001 does introduce new errors, as mentioned in the commit message, in the form of elevel ERROR passed into get_dirent_type(), which might be thrown if your OS has no d_type and lstat() fails (also if you asked to follow symlinks, but in those cases errors were already thrown). But in those cases, it seems at least a little fishy that we ignored errors from the existing lstat(). I wondered if that was because they expected that any failure meant ENOENT and they wanted to tolerate that, but that does not seem to be the case, so I considered the error to be an improvement.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes: > On Tue, Aug 9, 2022 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I have not tried to analyze the error-handling properties of 0001, >> but if it's being equally cavalier then it shouldn't be committed >> either. > 0001 does introduce new errors, as mentioned in the commit message, in > the form of elevel ERROR passed into get_dirent_type(), which might be > thrown if your OS has no d_type and lstat() fails (also if you asked > to follow symlinks, but in those cases errors were already thrown). > But in those cases, it seems at least a little fishy that we ignored > errors from the existing lstat(). Hmmm ... I'll grant that ignoring lstat errors altogether isn't great. But should the replacement behavior be elog-LOG-and-press-on, or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any hurry to believe that the latter is more appropriate without some analysis of what the callers are doing. The bottom line here is that I'm distrustful of behavioral changes introduced to simplify refactoring rather than to solve a live problem. regards, tom lane
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Tue, Aug 9, 2022 at 9:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > On Tue, Aug 9, 2022 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I have not tried to analyze the error-handling properties of 0001, > >> but if it's being equally cavalier then it shouldn't be committed > >> either. > > > 0001 does introduce new errors, as mentioned in the commit message, in > > the form of elevel ERROR passed into get_dirent_type(), which might be > > thrown if your OS has no d_type and lstat() fails (also if you asked > > to follow symlinks, but in those cases errors were already thrown). > > But in those cases, it seems at least a little fishy that we ignored > > errors from the existing lstat(). > > Hmmm ... I'll grant that ignoring lstat errors altogether isn't great. > But should the replacement behavior be elog-LOG-and-press-on, > or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any > hurry to believe that the latter is more appropriate without some > analysis of what the callers are doing. > > The bottom line here is that I'm distrustful of behavioral changes > introduced to simplify refactoring rather than to solve a live > problem. +1. I agree with Tom not to change elog-LOG to elog-ERROR and fail the checkpoint operation. Because the checkpoint is more important than why a single snapshot file (out thousands or even million files) isn't removed at that moment. Also, I originally proposed to change elog-ERROR to elog-LOG in CheckPointLogicalRewriteHeap for unlink() failures for the same reason. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
On 2022-08-09 15:10:41 +1200, Thomas Munro wrote: > I was going to commit the first of these patches, but then I noticed Andres > said he was planning to, so I'll wait another day. I'd be happy for you to take this on...
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Tue, Aug 09, 2022 at 09:29:16AM +0530, Bharath Rupireddy wrote: > On Tue, Aug 9, 2022 at 9:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hmmm ... I'll grant that ignoring lstat errors altogether isn't great. >> But should the replacement behavior be elog-LOG-and-press-on, >> or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any >> hurry to believe that the latter is more appropriate without some >> analysis of what the callers are doing. >> >> The bottom line here is that I'm distrustful of behavioral changes >> introduced to simplify refactoring rather than to solve a live >> problem. > > +1. I agree with Tom not to change elog-LOG to elog-ERROR and fail the > checkpoint operation. Because the checkpoint is more important than > why a single snapshot file (out thousands or even million files) isn't > removed at that moment. Also, I originally proposed to change > elog-ERROR to elog-LOG in CheckPointLogicalRewriteHeap for unlink() > failures for the same reason. This was my initial instinct as well, but this thread has received contradictory feedback during the months since. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Thomas Munro
Date:
On Tue, Aug 9, 2022 at 4:28 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Tue, Aug 09, 2022 at 09:29:16AM +0530, Bharath Rupireddy wrote: > > On Tue, Aug 9, 2022 at 9:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Hmmm ... I'll grant that ignoring lstat errors altogether isn't great. > >> But should the replacement behavior be elog-LOG-and-press-on, > >> or elog-ERROR-and-fail-the-surrounding-operation? I'm not in any > >> hurry to believe that the latter is more appropriate without some > >> analysis of what the callers are doing. > >> > >> The bottom line here is that I'm distrustful of behavioral changes > >> introduced to simplify refactoring rather than to solve a live > >> problem. > > > > +1. I agree with Tom not to change elog-LOG to elog-ERROR and fail the > > checkpoint operation. Because the checkpoint is more important than The changes were not from elog-LOG to elog-ERROR, they were changes from silently eating errors, but I take your (plural) general point. > > why a single snapshot file (out thousands or even million files) isn't > > removed at that moment. Also, I originally proposed to change > > elog-ERROR to elog-LOG in CheckPointLogicalRewriteHeap for unlink() > > failures for the same reason. > > This was my initial instinct as well, but this thread has received > contradictory feedback during the months since. OK, so there aren't many places in 0001 that error out where we previously did not. For the one in CheckPointLogicalRewriteHeap(), if it fails, you don't just want to skip -- it might be in the range that we need to fsync(). So what if we just deleted that get_dirent_type() != PGFILETYPE_REG check altogether? Depending on the name range check, we'd either attempt to unlink() and LOG if that fails, or we'd attempt to open()-or-ERROR and then fsync()-or-PANIC. What functionality have we lost without the DT_REG check? Well, now if someone ill-advisedly puts an important character device, socket, symlink (ie any kind of non-DT_REG) into that directory with a name conforming to the unlinkable range, we'll try to unlink it and possibly succeed, where previously we'd have skipped, and if it's in the checkpointable range, we'd try to fsync() it and likely fail, which is good because this is a supposed to be a checkpoint. That's already what happens if lstat() fails in master. We fall back to treating it like a DT_REG anyway: if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode)) continue; For the one in CheckSnapBuild(), it's similar but unlink only. For the one in StartupReplicationSlots(), you could possibly take the same line: if it's not a directory, we'll try an rmdir() it anyway and then emit a WARNING if that fails. Again, that's exactly what master would do if that lstat() failed. In other words, if we're going to LOG and press on, maybe we should just press on anyway. Would the error message be any less informative? Would the risk of unlinking a character device that you accidentally stored in /clusters/pg16-main/pgdata/pg_logical/mappings/map-1234-5678 be a problem for anyone? Separately from that topic, it would be nice to have a comment in each place where we tolerate unlink() failure to explain why that is harmless except for wasted disk.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Tue, Aug 09, 2022 at 05:26:39PM +1200, Thomas Munro wrote: > The changes were not from elog-LOG to elog-ERROR, they were changes > from silently eating errors, but I take your (plural) general point. I think this was in reference to 0002. My comment was, at least. > OK, so there aren't many places in 0001 that error out where we > previously did not. > > For the one in CheckPointLogicalRewriteHeap(), if it fails, you don't > just want to skip -- it might be in the range that we need to fsync(). > So what if we just deleted that get_dirent_type() != PGFILETYPE_REG > check altogether? Depending on the name range check, we'd either > attempt to unlink() and LOG if that fails, or we'd attempt to > open()-or-ERROR and then fsync()-or-PANIC. It looks like CheckPointLogicalRewriteHeap() presently ERRORs when unlink() fails. However, CheckPointSnapBuild() only LOGs when unlink() fails. Since mappings files contain transaction IDs, persistent unlink() failures might lead to wraparound, but I don't think failing checkpoints will help improve matters. Bharath's original patch changed CheckPointLogicalRewriteHeap() to LOG on sscanf() and unlink() failures. IIUC things would work the way you suggest if that was applied. > What functionality have we > lost without the DT_REG check? Well, now if someone ill-advisedly > puts an important character device, socket, symlink (ie any kind of > non-DT_REG) into that directory with a name conforming to the > unlinkable range, we'll try to unlink it and possibly succeed, where > previously we'd have skipped, and if it's in the checkpointable range, > we'd try to fsync() it and likely fail, which is good because this is > a supposed to be a checkpoint. I don't know that I agree that the last part is good. Presumably we don't want checkpointing to fail forever because there's a random non-regular file that can't be fsync'd. But it's not clear why such files would exist in the first place. Presumably this isn't the sort of thing that Postgres should be expected to support. > In other words, if we're going to LOG and press on, maybe we should > just press on anyway. Would the error message be any less > informative? Would the risk of unlinking a character device that you > accidentally stored in > /clusters/pg16-main/pgdata/pg_logical/mappings/map-1234-5678 be a > problem for anyone? Probably not. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Tue, Aug 9, 2022 at 10:57 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > OK, so there aren't many places in 0001 that error out where we > previously did not. Well, that's not true I believe. The 0001 patch introduces get_dirent_type() with elevel ERROR which means that lstat failures are now reported as ERRORs with ereport(ERROR, as opposed to continuing on the HEAD. - if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode)) + if (get_dirent_type(path, mapping_de, false, ERROR) != PGFILETYPE_REG) continue; - if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode)) + if (get_dirent_type(path, snap_de, false, ERROR) != PGFILETYPE_REG) { so on. The main idea of using get_dirent_type() instead of lstat or stat is to benefit from avoiding system calls on platforms where the directory entry type is stored in dirent structure, but not to cause errors for lstat or stat system calls failures where we previously didn't. I think 0001 must be just replacing lstat or stat with get_dirent_type() with elevel ERROR if lstat or stat failure is treated as ERROR previously, otherwise with elevel LOG. I modified it as attached. Once this patch gets committed, then we can continue the discussion as to whether we make elog-LOG to elog-ERROR or vice-versa. I'm tempted to use get_dirent_type() in RemoveXlogFile() but that requires us to pass struct dirent * from RemoveOldXlogFiles() (as attached 0002 for now), If done, this avoids one lstat() system call per WAL file. IMO, that's okay to pass an extra function parameter to RemoveXlogFile() as it is a static function and there can be many (thousands of) WAL files at times, so saving system calls (lstat() * number of WAL files) is definitely an advantage. Thoughts? -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Thomas Munro
Date:
On Wed, Aug 10, 2022 at 7:15 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > The main idea of using get_dirent_type() instead of lstat or stat is > to benefit from avoiding system calls on platforms where the directory > entry type is stored in dirent structure, but not to cause errors for > lstat or stat system calls failures where we previously didn't. Yeah, I get it... I'm OK with separating mechanical changes like lstat() -> get_dirent_type() from policy changes on error handling. I initially thought the ERROR was surely better than what we're doing now (making extra system calls to avoid unlinking unexpected things, for which our only strategy on failure is to try to unlink the thing anyway), but it's better to discuss that separately. > I > think 0001 must be just replacing lstat or stat with get_dirent_type() > with elevel ERROR if lstat or stat failure is treated as ERROR > previously, otherwise with elevel LOG. I modified it as attached. Once > this patch gets committed, then we can continue the discussion as to > whether we make elog-LOG to elog-ERROR or vice-versa. Agreed, will look at this version tomorrow. > I'm tempted to use get_dirent_type() in RemoveXlogFile() but that > requires us to pass struct dirent * from RemoveOldXlogFiles() (as > attached 0002 for now), If done, this avoids one lstat() system call > per WAL file. IMO, that's okay to pass an extra function parameter to > RemoveXlogFile() as it is a static function and there can be many > (thousands of) WAL files at times, so saving system calls (lstat() * > number of WAL files) is definitely an advantage. - lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) && + get_dirent_type(path, xlde, false, LOG) != PGFILETYPE_REG && I think you wanted ==, but maybe it'd be nicer to pass in a "recyclable" boolean to RemoveXlogFile()? If you're looking for more, rmtree() looks like another.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Wed, Aug 10, 2022 at 2:11 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Aug 10, 2022 at 7:15 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > The main idea of using get_dirent_type() instead of lstat or stat is > > to benefit from avoiding system calls on platforms where the directory > > entry type is stored in dirent structure, but not to cause errors for > > lstat or stat system calls failures where we previously didn't. > > Yeah, I get it... I'm OK with separating mechanical changes like > lstat() -> get_dirent_type() from policy changes on error handling. I > initially thought the ERROR was surely better than what we're doing > now (making extra system calls to avoid unlinking unexpected things, > for which our only strategy on failure is to try to unlink the thing > anyway), but it's better to discuss that separately. > > > I > > think 0001 must be just replacing lstat or stat with get_dirent_type() > > with elevel ERROR if lstat or stat failure is treated as ERROR > > previously, otherwise with elevel LOG. I modified it as attached. Once > > this patch gets committed, then we can continue the discussion as to > > whether we make elog-LOG to elog-ERROR or vice-versa. > > Agreed, will look at this version tomorrow. Thanks. > > I'm tempted to use get_dirent_type() in RemoveXlogFile() but that > > requires us to pass struct dirent * from RemoveOldXlogFiles() (as > > attached 0002 for now), If done, this avoids one lstat() system call > > per WAL file. IMO, that's okay to pass an extra function parameter to > > RemoveXlogFile() as it is a static function and there can be many > > (thousands of) WAL files at times, so saving system calls (lstat() * > > number of WAL files) is definitely an advantage. > > - lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) && > + get_dirent_type(path, xlde, false, LOG) != PGFILETYPE_REG && > > I think you wanted ==, but maybe it'd be nicer to pass in a > "recyclable" boolean to RemoveXlogFile()? Ah, my bad, I corrected it now. If you mean to do "bool recyclable = (get_dirent_type(path, xlde, false, LOG) == PGFILETYPE_REG)"" and pass it as parameter to RemoveXlogFile(), then we have to do get_dirent_type calls for every WAL file even when any of (wal_recycle && *endlogSegNo <= recycleSegNo && XLogCtl->InstallXLogFileSegmentActive) is false which is unnecessary and it's better to pass dirent structure as a parameter. > If you're looking for more, rmtree() looks like another. Are you suggesting to not use pgfnames() in rmtree() and do opendir()-readdir()-get_dirent_type() directly? If yes, I don't think it's a great idea because rmtree() has recursive calls and holding opendir()-readdir() for all rmtree() recursive calls without closedir() may eat up dynamic memory if there are deeper level directories. I would like to leave it that way. Do you have any other ideas in mind? Please find the v15 patch, I merged the RemoveXlogFile() changes into 0001. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Wed, Aug 10, 2022 at 03:28:25PM +0530, Bharath Rupireddy wrote: > snprintf(path, sizeof(path), "pg_logical/mappings/%s", mapping_de->d_name); > - if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode)) > + if (get_dirent_type(path, mapping_de, false, LOG) != PGFILETYPE_REG) > continue; Previously, failure to lstat() wouldn't lead to skipping the entry. With this patch, a failure to determine the file type will cause the entry to be skipped. This might be okay in some places (e.g., CheckPointSnapBuild()) but not in others. For example, in CheckPointLogicalRewriteHeap(), this could cause us to skip fsync-ing a file due to a get_dirent_type() failure, which seems bad. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Wed, Aug 17, 2022 at 2:02 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Wed, Aug 10, 2022 at 03:28:25PM +0530, Bharath Rupireddy wrote: > > snprintf(path, sizeof(path), "pg_logical/mappings/%s", mapping_de->d_name); > > - if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode)) > > + if (get_dirent_type(path, mapping_de, false, LOG) != PGFILETYPE_REG) > > continue; > > Previously, failure to lstat() wouldn't lead to skipping the entry. With > this patch, a failure to determine the file type will cause the entry to be > skipped. This might be okay in some places (e.g., CheckPointSnapBuild()) > but not in others. For example, in CheckPointLogicalRewriteHeap(), this > could cause us to skip fsync-ing a file due to a get_dirent_type() failure, > which seems bad. Hm. I corrected it in the v16 patch, please review. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Wed, Aug 17, 2022 at 12:39:26PM +0530, Bharath Rupireddy wrote: > + /* > + * We're only handling directories here, skip if it's not ours. Also, skip > + * if the caller has already performed this check. > + */ > + if (!slot_dir_checked && > + lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode)) > return; There was previous discussion about removing this stanza from ReorderBufferCleanupSeralizedTXNs() completely [0]. If that is still possible, I think I would choose that over having callers specify whether to do the directory check. > + /* we're only handling directories here, skip if it's not one */ > + sprintf(path, "pg_replslot/%s", logical_de->d_name); > + if (get_dirent_type(path, logical_de, false, LOG) != PGFILETYPE_DIR) > + continue; IIUC an error in get_dirent_type() could cause slots to be skipped here, which is a behavior change. [0] https://postgr.es/m/20220329224832.GA560657%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Tue, Aug 23, 2022 at 11:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Wed, Aug 17, 2022 at 12:39:26PM +0530, Bharath Rupireddy wrote: > > + /* > > + * We're only handling directories here, skip if it's not ours. Also, skip > > + * if the caller has already performed this check. > > + */ > > + if (!slot_dir_checked && > > + lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode)) > > return; > > There was previous discussion about removing this stanza from > ReorderBufferCleanupSeralizedTXNs() completely [0]. If that is still > possible, I think I would choose that over having callers specify whether > to do the directory check. > > [0] https://postgr.es/m/20220329224832.GA560657%40nathanxps13 From [0]: > My guess is that this was done because ReorderBufferCleanupSerializedTXNs() > is also called from ReorderBufferAllocate() and ReorderBufferFree(). > However, it is odd that we just silently return if the slot path isn't a > directory in those cases. I think we could use get_dirent_type() in > StartupReorderBuffer() as you suggested, and then we could let ReadDir() > ERROR for non-directories for the other callers of > ReorderBufferCleanupSerializedTXNs(). WDYT? Firstly, removing lstat() completely from ReorderBufferCleanupSerializedTXNs() is a behavioural change. ReorderBufferCleanupSerializedTXNs() returning silently to callers ReorderBufferAllocate() and ReorderBufferFree(), when the slot path isn't a directory completely makes sense to me because the callers are trying to clean something and if it doesn't exist that's okay, they can go ahead. Also, the usage of the ReorderBufferAllocate() and ReorderBufferFree() is pretty wide and I don't want to risk the new behaviour. > > + /* we're only handling directories here, skip if it's not one */ > > + sprintf(path, "pg_replslot/%s", logical_de->d_name); > > + if (get_dirent_type(path, logical_de, false, LOG) != PGFILETYPE_DIR) > > + continue; > > IIUC an error in get_dirent_type() could cause slots to be skipped here, > which is a behavior change. That behaviour hasn't changed, no? Currently, if lstat() fails in ReorderBufferCleanupSerializedTXNs() it returns to StartupReorderBuffer()'s while loop which is in a way continuing with the other slots, this patch does nothing new. So, no new patch, please find the latest v16 patch at [1]. [1] https://www.postgresql.org/message-id/CALj2ACXZPMYXrPZ%2BCUE0_5DKDnxyqDGRftSgPPx--PWhWB3RtQ%40mail.gmail.com -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Michael Paquier
Date:
On Thu, Aug 25, 2022 at 10:48:08AM +0530, Bharath Rupireddy wrote: > On Tue, Aug 23, 2022 at 11:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> IIUC an error in get_dirent_type() could cause slots to be skipped here, >> which is a behavior change. > > That behaviour hasn't changed, no? Currently, if lstat() fails in > ReorderBufferCleanupSerializedTXNs() it returns to > StartupReorderBuffer()'s while loop which is in a way continuing with > the other slots, this patch does nothing new. Are you sure? FWIW, the changes in reorderbuffer.c for ReorderBufferCleanupSerializedTXNs() reduce the code readability, in my opinion, so that's one less argument in favor of this change. The gain in ParseConfigDirectory() is kind of cool. pg_tzenumerate_next(), copydir(), RemoveXlogFile() StartupReplicationSlots(), CheckPointLogicalRewriteHeap() and RemovePgTempFilesInDir() seem fine, as well. At least these avoid extra lstat() calls when the file type is unknown, which would be only a limited number of users where some of the three DT_* are missing (right?). -- Michael
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Bharath Rupireddy
Date:
On Thu, Aug 25, 2022 at 5:51 PM Michael Paquier <michael@paquier.xyz> wrote: > > FWIW, the changes in reorderbuffer.c for > ReorderBufferCleanupSerializedTXNs() reduce the code readability, in > my opinion, so that's one less argument in favor of this change. Agreed. I reverted the changes. > The gain in ParseConfigDirectory() is kind of cool. > pg_tzenumerate_next(), copydir(), RemoveXlogFile() > StartupReplicationSlots(), CheckPointLogicalRewriteHeap() and > RemovePgTempFilesInDir() seem fine, as well. At least these avoid > extra lstat() calls when the file type is unknown, which would be only > a limited number of users where some of the three DT_* are missing > (right?). Yes, the idea is to avoid lstat() system calls when possible. PSA v17 patch with reorderbuffer.c changes reverted. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Sat, Aug 27, 2022 at 02:06:32PM +0530, Bharath Rupireddy wrote: > PSA v17 patch with reorderbuffer.c changes reverted. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Michael Paquier
Date:
On Sun, Aug 28, 2022 at 02:52:43PM -0700, Nathan Bossart wrote: > On Sat, Aug 27, 2022 at 02:06:32PM +0530, Bharath Rupireddy wrote: >> PSA v17 patch with reorderbuffer.c changes reverted. > > LGTM Yeah, what you have here looks pretty fine to me, so this is IMO committable. Let's wait a bit to see if there are any objections from the others, in case. -- Michael
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote: > @@ -50,7 +51,7 @@ copydir(char *fromdir, char *todir, bool recurse) > > while ((xlde = ReadDir(xldir, fromdir)) != NULL) > { > - struct stat fst; > + PGFileType xlde_type; > > /* If we got a cancel signal during the copy of the directory, quit */ > CHECK_FOR_INTERRUPTS(); > @@ -62,18 +63,15 @@ copydir(char *fromdir, char *todir, bool recurse) > snprintf(fromfile, sizeof(fromfile), "%s/%s", fromdir, xlde->d_name); > snprintf(tofile, sizeof(tofile), "%s/%s", todir, xlde->d_name); > > - if (lstat(fromfile, &fst) < 0) > - ereport(ERROR, > - (errcode_for_file_access(), > - errmsg("could not stat file \"%s\": %m", fromfile))); > + xlde_type = get_dirent_type(fromfile, xlde, false, ERROR); > > - if (S_ISDIR(fst.st_mode)) > + if (xlde_type == PGFILETYPE_DIR) > { > /* recurse to handle subdirectories */ > if (recurse) > copydir(fromfile, tofile, true); > } > - else if (S_ISREG(fst.st_mode)) > + else if (xlde_type == PGFILETYPE_REG) > copy_file(fromfile, tofile); > } > FreeDir(xldir); It continues to make no sense to me to add behaviour changes around error-handling as part of a conversion to get_dirent_type(). I don't at all understand why e.g. the above change to make copydir() silently skip over files it can't stat is ok? Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Wed, Aug 31, 2022 at 12:14:33PM -0700, Andres Freund wrote: > On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote: >> - if (lstat(fromfile, &fst) < 0) >> - ereport(ERROR, >> - (errcode_for_file_access(), >> - errmsg("could not stat file \"%s\": %m", fromfile))); >> + xlde_type = get_dirent_type(fromfile, xlde, false, ERROR); >> >> - if (S_ISDIR(fst.st_mode)) >> + if (xlde_type == PGFILETYPE_DIR) >> { >> /* recurse to handle subdirectories */ >> if (recurse) >> copydir(fromfile, tofile, true); >> } >> - else if (S_ISREG(fst.st_mode)) >> + else if (xlde_type == PGFILETYPE_REG) >> copy_file(fromfile, tofile); >> } >> FreeDir(xldir); > > It continues to make no sense to me to add behaviour changes around > error-handling as part of a conversion to get_dirent_type(). I don't at all > understand why e.g. the above change to make copydir() silently skip over > files it can't stat is ok? In this example, the call to get_dirent_type() should ERROR if the call to lstat() fails (the "elevel" argument is set to ERROR). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Andres Freund
Date:
Hi, On 2022-08-31 12:24:28 -0700, Nathan Bossart wrote: > On Wed, Aug 31, 2022 at 12:14:33PM -0700, Andres Freund wrote: > > On 2022-08-27 14:06:32 +0530, Bharath Rupireddy wrote: > > It continues to make no sense to me to add behaviour changes around > > error-handling as part of a conversion to get_dirent_type(). I don't at all > > understand why e.g. the above change to make copydir() silently skip over > > files it can't stat is ok? > > In this example, the call to get_dirent_type() should ERROR if the call to > lstat() fails (the "elevel" argument is set to ERROR). Oh, oops. Skimmed code too quickly... Greetings, Andres Freund
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Michael Paquier
Date:
On Wed, Aug 31, 2022 at 04:10:59PM +0900, Michael Paquier wrote: > Yeah, what you have here looks pretty fine to me, so this is IMO > committable. Let's wait a bit to see if there are any objections from > the others, in case. I had a few hours and I've spent them looking at what you had here in details, and there were a few things I have tweaked before applying the patch. First, elevel was set to LOG for three calls of get_dirent_type() in code paths where we want to skip entries. This would have become very noisy, so I've switched two of them to DEBUG1 and the third one to DEBUG2 for consistency with the surroundings. A second thing was RemoveXlogFile(), where we passed down a dirent entry *and* its d_name. With this design, it would be easy to mess up things and pass down a file name that does not match with its dirent entry, so I have finished by replacing "segname" by the dirent structure. A last thing was about the two extra comment blocks in fd.c, and I could not convince myself that these were helpful hints so I have removed both of them. -- Michael
Attachment
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Nathan Bossart
Date:
On Fri, Sep 02, 2022 at 05:09:14PM +0900, Michael Paquier wrote: > I had a few hours and I've spent them looking at what you had here in > details, and there were a few things I have tweaked before applying > the patch. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Ibrar Ahmed
Date:
On Sat, Sep 3, 2022 at 3:09 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Sep 02, 2022 at 05:09:14PM +0900, Michael Paquier wrote:
> I had a few hours and I've spent them looking at what you had here in
> details, and there were a few things I have tweaked before applying
> the patch.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi,
Your patch require a rebase. Please provide the latest rebase patch.
=== applying patch ./v17-0001-Make-more-use-of-get_dirent_type-in-place-of-sta.patch patching file src/backend/access/heap/rewriteheap.c Hunk #1 FAILED at 113. Hunk #2 FAILED at 1213. Hunk #3 FAILED at 1221. 3 out of 3 hunks FAILED -- saving rejects to file src/backend/access/heap/rewriteheap.c.rej
Ibrar Ahmed
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
From
Michael Paquier
Date:
On Sat, Sep 03, 2022 at 10:47:32AM +0500, Ibrar Ahmed wrote: > Your patch require a rebase. Please provide the latest rebase patch. I was looking for a CF entry yesterday when I looked at this patch, missing https://commitfest.postgresql.org/39/3496/. This has been applied as of bfb9dfd. -- Michael