Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows - Mailing list pgsql-bugs
From | Michael Paquier |
---|---|
Subject | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows |
Date | |
Msg-id | ZjHMcwiltqYXsKYb@paquier.xyz Whole thread Raw |
In response to | Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-bugs |
On Tue, Apr 23, 2024 at 07:48:12PM +1200, Thomas Munro wrote: > Here is a new attempt to see what it might take to put > RelationTruncate() into a critical section. Problems encountered: > even if you've called mdnblocks() beforehand, dressed up as > smgrpreparetruncate(), if the highest segment is exactly full then the > later mdnblocks() again probes whether the next segment number exists > on disk, which involves GetRelationPath(), which allocates. So I > finished up having to write a GetRelationPathInPlace() function, and > to decide whether it's OK to use a MAXPGPATH-sized array on the stack > for this. I also had to teach _fdvec_resize() not to reallocate when > downsizing, to avoid the critical section assertion. It seems like > quite a lot to back-patch... but also awful to leave this trickle of > data corruption reports unaddressed. The big re-engineering ideas[1] > would be absolutely unbackpatchable, but I hope we can work on > something like that for 18... For now I'd agree with something like that, even for a backpatch. No idea what a good "cheap" solution would look like, though. Anything considered these past years for this issue was utterly expensive. > However, it didn't guarantee that the sync request for the > truncation got processed before the XLOG_CHECKPOINT_ONLINE record > was written. By setting XLOG_CHKPT_START, we guarantee that if > an XLOG_SMGR_TRUNCATE record is written to WAL before the redo > pointer of some concurrent checkpoint, the sync request queued by > that operation must be processed by that checkpoint (rather than > being left for the following one). s/XLOG_CHKPT_START/DELAY_CHKPT_START/ in the commit message of 0001. The point you are making about d8725104 seems like a good argument to do a backpatch down to v14, as the possible interrupt make this setof issues much easier to reach compared to the Windows-only-weird-syscall failures happening in the middle of the truncation that people have been complaining about for N years now. I am seeing less reports of these on WIN32 these days with files switched suddenly to read-only, or is that just my imagination? The pieces around GetRelationPathInPlace() and MAX_EXTENSION_SIZE could be a refactoring piece worth their own, split into its own patch before introducing the core part of the fix with the critical section. That would be much cleaner, IMO. It seems to me that an assertion at the end of GetRelationPathInPlace() about the computed size would be adapted based on MAXPGPATH. The interface of _mdfd_segpath() is quite confusing, actually. Why return the same pointer as the input argument rather than void? I was re-reading your comment about last October, and the fact that we may stuck replay, and finishing with semi-corrupted undetectible corruptions is worse than that, still I'm going to agree that failing harder will serve much better at this stage: https://www.postgresql.org/message-id/CA%2BhUKGJy9iCBfkjUyV8ZuRwd5CAGxZV1STywe%2B0S%2B9YKH1zF8w%40mail.gmail.com That may impact availability. Still that's cheaper than any other solution discussed. The one involving the WAL-logging of dirty pages that would get truncated shortly after comes first into mind.. PS: On HEAD, I'd like to see one or more tests for these failure scenarios in the long-term and how replay is able to react after panicking in the critical section. Now, injection points won't work inside a critical section as the initial function loading needs to do a palloc() for the library path. We've discussed about having a PRELOAD macro that could be called outside the critical path, before running a INJECTION_POINT. Just mentioning. -- Michael
Attachment
pgsql-bugs by date: