Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Date: 2020-01-05 00:33:55
Msg-id: 20200105003355.n2atv7thx7dqiqbt@development
In response to: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding (Noah Misch <noah@leadboat.com>)
Responses: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
List: pgsql-hackers
On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
>On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
>> The main consequence is the false alarm.
>
>That conclusion was incorrect. On further study, I was able to reproduce
>data loss via either of two weaknesses in the "apparent wraparound" test:
>
>1. The result of the test is valid only until we release the SLRU
>   ControlLock, which we do before SlruScanDirCbDeleteCutoff() uses the
>   cutoff to evaluate segments for deletion. Once we release that lock,
>   latest_page_number can advance. This creates a TOCTOU race condition,
>   allowing excess deletion:
>
>   [local] test=# table trunc_clog_concurrency ;
>   ERROR:  could not access status of transaction 2149484247
>   DETAIL:  Could not open file "pg_xact/0801": No such file or directory.
>
>2. By the time the "apparent wraparound" test fires, we've already
>   WAL-logged the truncation. clog_redo() suppresses the "apparent
>   wraparound" test, then deletes too much. Startup then fails:
>
>   881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
>   881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
>   881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1
>
>Fixes are available:
>
>a. Fix the rounding in SimpleLruTruncate(). (The patch I posted upthread
>   is wrong; I will correct it in a separate message.)
>
>b. Arrange so only one backend runs vac_truncate_clog() at a time. Other
>   than AsyncCtl, every SLRU truncation appears in vac_truncate_clog(),
>   in a checkpoint, or in the startup process. Hence, also arrange for
>   only one backend to call SimpleLruTruncate(AsyncCtl) at a time.
>
>c. Test "apparent wraparound" before writing WAL, and don't WAL-log
>   truncations that "apparent wraparound" forces us to skip.
>
>d. Hold the ControlLock for the entirety of SimpleLruTruncate(). This
>   removes the TOCTOU race condition, but TransactionIdDidCommit() and
>   other key operations would be waiting behind filesystem I/O.
>
>e. Have the SLRU track a "low cutoff" for an ongoing truncation.
>   Initially, the low cutoff is the page furthest in the past relative
>   to cutoffPage (the "high cutoff"). If SimpleLruZeroPage() wishes to
>   use a page in the truncation range, it would acquire an LWLock and
>   increment the low cutoff. Before unlinking any segment,
>   SlruScanDirCbDeleteCutoff() would take the same LWLock and recheck
>   the segment against the latest low cutoff.
>
>With both (a) and (b), the only way I'd know to reach the "apparent
>wraparound" message is to restart in single-user mode and burn XIDs to
>the point of bona fide wraparound. Hence, I propose to back-patch (a)
>and (b), and I propose (c) for HEAD only. I don't want (d), which
>threatens performance too much. I would rather not have (e), because I
>expect it's more complex than (b) and fixes strictly less than (b) fixes.
>
>Can you see a way to improve on that plan? Can you see other bugs of
>this nature that this plan does not fix?
>

Seems reasonable, although I wonder how much more expensive just doing
(d) would be. It seems by far the least complex solution, and it moves
"just" the SlruScanDirectory() call before the lock release. It's true it
adds I/O requests while holding the lock, OTOH it's just unlink() without
fsync(), and I'd expect the number of files to be relatively low. Plus we
already do SimpleLruWaitIO() and SlruInternalWritePage() in the loop.
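To make the race in (1) and the trade-off of (d) concrete, here is a
minimal standalone sketch. It is plain pthreads, not the actual slru.c
code; cutoff_is_safe(), truncate_racy() and truncate_locked() are made-up
stand-ins for the real test and locking:

/*
 * Standalone sketch of the TOCTOU hazard in (1) and of fix (d).
 * Plain pthreads, not the actual slru.c code; cutoff_is_safe(),
 * truncate_racy() and truncate_locked() are made-up stand-ins.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t control_lock = PTHREAD_MUTEX_INITIALIZER;

/* in the real system, concurrent page allocations advance this */
static long latest_page_number = 100;

/* stand-in for the "apparent wraparound" sanity test */
static bool
cutoff_is_safe(long cutoff_page)
{
    return cutoff_page <= latest_page_number;
}

/* current shape: check under the lock, unlink after releasing it */
static void
truncate_racy(long cutoff_page)
{
    pthread_mutex_lock(&control_lock);
    bool        ok = cutoff_is_safe(cutoff_page);

    pthread_mutex_unlock(&control_lock);

    /*
     * A concurrent thread may change latest_page_number here, so "ok"
     * no longer describes the state the unlink below acts on.
     */
    if (ok)
        printf("unlinking segments below page %ld (possibly stale test)\n",
               cutoff_page);
}

/* option (d): check and act under one lock acquisition */
static void
truncate_locked(long cutoff_page)
{
    pthread_mutex_lock(&control_lock);
    if (cutoff_is_safe(cutoff_page))
        printf("unlinking segments below page %ld (test still valid)\n",
               cutoff_page);
    pthread_mutex_unlock(&control_lock);
}

int
main(void)
{
    truncate_racy(90);
    truncate_locked(90);
    return 0;
}

The point is purely structural: whatever the predicate is, its verdict
is only meaningful while the lock protecting latest_page_number is held,
which is exactly what (d) buys, at the cost of doing the unlinks under
that lock.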
BTW isn't it an issue that SlruInternalDeleteSegment() does not do any
fsync calls after unlinking the segments? If the system crashes/reboots
before the unlinks become persistent (i.e. some of the segments
reappear), won't that cause a problem?
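For clarity, this is the kind of sequence I mean. An illustrative
standalone sketch only; unlink_durably() is a made-up name, not an
existing PostgreSQL function:

/*
 * Illustrative standalone sketch, not actual PostgreSQL code;
 * unlink_durably() is a made-up name. On POSIX filesystems, an
 * unlink() is only crash-safe once the parent directory is fsync'd.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
unlink_durably(const char *path, const char *parent_dir)
{
    if (unlink(path) < 0)
        return -1;

    /* persist the removal of the directory entry itself */
    int         fd = open(parent_dir, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fsync(fd) < 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

int
main(void)
{
    if (unlink_durably("pg_xact/0801", "pg_xact") < 0)
        perror("unlink_durably");
    return 0;
}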
It's a bit unfortunate that a patch for a data corruption / loss issue
(even if a low-probability one) has fallen through multiple commitfests.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services