Re: Race between KeepFileRestoredFromArchive() and restartpoint - Mailing list pgsql-hackers
From | David Steele |
---|---|
Subject | Re: Race between KeepFileRestoredFromArchive() and restartpoint |
Date | |
Msg-id | da613671-86b1-8731-8cc0-d5bc1590d6f8@pgmasters.net Whole thread Raw |
In response to | Re: Race between KeepFileRestoredFromArchive() and restartpoint (Noah Misch <noah@leadboat.com>) |
Responses |
Re: Race between KeepFileRestoredFromArchive() and restartpoint
|
List | pgsql-hackers |
On 8/2/22 10:37, Noah Misch wrote: > On Tue, Aug 02, 2022 at 10:14:22AM -0400, David Steele wrote: >> On 7/31/22 02:17, Noah Misch wrote: >>> On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote: >>>> On 6/19/21 16:39, Noah Misch wrote: >>>>> On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote: >>>>>> Recycling and preallocation are wasteful during archive recovery, because >>>>>> KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to >>>>>> fix the race by adding an XLogCtl flag indicating which regime currently owns >>>>>> the right to add long-term pg_wal directory entries. In the archive recovery >>>>>> regime, the checkpointer will not preallocate and will unlink old segments >>>>>> instead of recycling them (like wal_recycle=off). XLogFileInit() will fail. >>>>> >>>>> Here's the implementation. Patches 1-4 suffice to stop the user-visible >>>>> ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem >>>>> writes, and it provides some future-proofing. >>>>> >>>>> I was tempted to (but did not) just remove preallocation. Creating one file >>>>> per checkpoint seems tiny relative to the max_wal_size=1GB default, so I >>>>> expect it's hard to isolate any benefit. Under the old checkpoint_segments=3 >>>>> default, a preallocated segment covered a respectable third of the next >>>>> checkpoint. Before commit 63653f7 (2002), preallocation created more files. >>>> >>>> This also seems like it would fix the link issues we are seeing in [1]. >>>> >>>> I wonder if that would make it worth a back patch? >>> >>> Perhaps. It's sad to have multiple people deep-diving into something fixed on >>> HEAD. On the other hand, I'm not eager to spend risk-of-backpatch points on >>> this. One alternative would be adding an errhint like "This is known to >>> happen occasionally during archive recovery, where it is harmless." That has >>> an unpolished look, but it's low-risk and may avoid deep-dive efforts. >> >> I think in this case a HINT might be sufficient to at least keep people from >> wasting time tracking down a problem that has already been fixed. >> >> However, there is another issue [1] that might argue for a back patch if >> this patch (as I believe) would fix the issue. > >> [1] https://www.postgresql.org/message-id/CAHJZqBDxWfcd53jm0bFttuqpK3jV2YKWx%3D4W7KxNB4zzt%2B%2BqFg%40mail.gmail.com > > That makes sense. Each iteration of the restartpoint recycle loop has a 1/N > chance of failing. Recovery adds >N files between restartpoints. Hence, the > WAL directory grows without bound. Is that roughly the theory in mind? Yes, though you have formulated it better than I had in my mind. Let's see if Don can confirm that he is seeing the "could not link file" messages. Regards, -David
pgsql-hackers by date: