RE: archive status ".ready" files may be created too early - Mailing list pgsql-hackers
From | matsumura.ryo@fujitsu.com |
---|---|
Subject | RE: archive status ".ready" files may be created too early |
Date | |
Msg-id | OSAPR01MB502711B9C35BC2BE07943E91E88F0@OSAPR01MB5027.jpnprd01.prod.outlook.com |
In response to | Re: archive status ".ready" files may be created too early ("Bossart, Nathan" <bossartn@amazon.com>) |
Responses | Re: archive status ".ready" files may be created too early |
List | pgsql-hackers |
2020-03-26 18:50:24 Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> The v3 patch is a proof-of-concept patch that moves the ready-for-
> archive logic to the WAL writer process. We mark files as ready-for-
> archive when the WAL flush pointer has advanced beyond a known WAL
> record boundary.

I like such a simple resolution, but I cannot agree with it.

1. This patch gives wal_writer_delay two meanings. For example, a user who sets the parameter to a larger value gets archived files later.

2. Even if we create a new parameter, neither we nor users can determine the best value.

3. PostgreSQL guarantees that if a database cluster is stopped smartly, the cluster has flushed and archived all WAL records, as follows.

[xlog.c]
 * If archiving is enabled, rotate the last XLOG file so that all the
 * remaining records are archived (postmaster wakes up the archiver
 * process one more time at the end of shutdown). The checkpoint
 * record will go to the next XLOG file and won't be archived (yet).

Therefore, the idea may need end-of-shutdown synchronization between WalWriter and the archiver (pgarch). I cannot agree with that, because shutdown processing is inherently complex and the synchronization makes it more complicated. Your idea gives up timeliness of the notification in exchange for simplicity, but I think the synchronization may ruin that merit.

4. I found that the patch misses a chance to notify. We have to be more careful. In the following case, WalWriter notifies only after a little less than 3 times wal_writer_delay in the worst case. Depending on the value of wal_writer_delay, that may not be acceptable. If we create a new parameter, we cannot explain it to users.

Premise:
- Seg1 has already been notified.
- FlushedPtr is 0/2D00000 (= all WAL records are flushed).

-----
Step 1. Backend-A updates InsertPtr to 0/2E00000, but does not copy its WAL record to the buffer.

Step 2. (sleep) WalWriter memorizes InsertPtr 0/2E00000 in a local variable (LocalInsertPtr) and sleeps because FlushedPtr has not passed InsertPtr.

Step 3. Backend-A copies its WAL record to the buffer.

Step 4. Backend-B updates InsertPtr to 0/3100000, copies its record to the buffer, commits (flushes it by itself), and updates FlushedPtr to 0/3100000.

Step 5. WalWriter detects that FlushedPtr (0/3100000) has passed LocalInsertPtr (0/2E00000), but WalWriter cannot notify Seg2 though it should be notified. This is because WalWriter does not know which record crosses the segment boundary. In the worst case, Seg2 is then notified only after two more sleeps, spent checking again whether FlushedPtr has passed InsertPtr.

Step 6. (sleep) WalWriter sleeps.

Step 7. Backend-C inserts a WAL record, flushes it, and updates the pointers as follows:
        InsertPtr  --> 0/3200000
        FlushedPtr --> 0/3200000

Step 8. Backend-D updates InsertPtr to 0/3300000, but does not copy its record to the buffer.

Step 9. (sleep) WalWriter memorizes InsertPtr 0/3300000 in LocalInsertPtr and sleeps because FlushedPtr is still 0/3200000.

Step 10. Backend-D copies its record.

Step 11. Someone (Backend-X or WalWriter) flushes and updates FlushedPtr to 0/3300000.

Step 12. WalWriter detects that FlushedPtr (0/3300000) has reached LocalInsertPtr (0/3300000) and notifies Seg2.
-----

I'm preparing a patch in which the backend that inserts a WAL record crossing a segment boundary leaves its EndRecPtr, and whoever flushes that record checks the EndRecPtr and notifies.

Regards
Ryo Matsumura
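
The timeline in Steps 1 to 12 can be replayed with a small Python simulation. This is only a toy model built from the description above, not code from the v3 patch or from PostgreSQL. The class and method names (PollingWalWriter, EndRecPtrTracker, wake, insert, flush) are hypothetical, and the segment numbering assumes the default 16 MB WAL segment size, so that Seg2 covers 0/2000000 to 0/2FFFFFF. The second class sketches one possible reading of the EndRecPtr idea mentioned at the end of the message.

```python
SEG_SIZE = 0x1000000  # assumption: 16 MB, PostgreSQL's default WAL segment size


def segment_of(ptr):
    """Segment number containing WAL position ptr (Seg2 holds 0/2000000..0/2FFFFFF)."""
    return ptr // SEG_SIZE


class PollingWalWriter:
    """Toy model of the WalWriter polling described in Steps 1-12 (hypothetical names)."""

    def __init__(self, last_notified_seg):
        self.last_notified_seg = last_notified_seg
        self.local_insert_ptr = None  # LocalInsertPtr: a known record boundary

    def wake(self, insert_ptr, flushed_ptr):
        """One wake-up between sleeps; returns the segments notified on this wake-up."""
        if self.local_insert_ptr is None:
            self.local_insert_ptr = insert_ptr  # memorize, as in Steps 2 and 9
        if flushed_ptr < self.local_insert_ptr:
            return []  # flush has not reached the memorized boundary: sleep
        # Only segments that end at or before the memorized boundary are known safe.
        safe_up_to = segment_of(self.local_insert_ptr) - 1
        notified = list(range(self.last_notified_seg + 1, safe_up_to + 1))
        self.last_notified_seg = max(self.last_notified_seg, safe_up_to)
        self.local_insert_ptr = None  # a fresh boundary must be memorized next time
        return notified


class EndRecPtrTracker:
    """Toy model of the proposal: inserters publish EndRecPtr, flushers notify."""

    def __init__(self):
        self.pending = {}  # segment number -> EndRecPtr of the record crossing its end

    def insert(self, start_ptr, end_ptr):
        # A record spanning start_ptr..end_ptr completes every segment it leaves.
        for seg in range(segment_of(start_ptr), segment_of(end_ptr)):
            self.pending[seg] = end_ptr

    def flush(self, flushed_ptr):
        # Whoever flushes checks the published EndRecPtr values and notifies.
        ready = sorted(s for s, end in self.pending.items() if end <= flushed_ptr)
        for seg in ready:
            del self.pending[seg]
        return ready


# Replay the four WalWriter wake-ups (Steps 2, 5, 9, 12) as (InsertPtr, FlushedPtr).
writer = PollingWalWriter(last_notified_seg=1)
wakeups = [(0x2E00000, 0x2D00000), (0x3100000, 0x3100000),
           (0x3300000, 0x3200000), (0x3300000, 0x3300000)]
polling_result = [writer.wake(ins, flu) for ins, flu in wakeups]

# Replay Step 4 under the proposal: Backend-B's record spans 0/2E00000..0/3100000.
tracker = EndRecPtrTracker()
tracker.insert(0x2E00000, 0x3100000)
proposal_result = tracker.flush(0x3100000)
```

In this model polling_result is [[], [], [], [2]], so Seg2 is reported only on the fourth wake-up, while proposal_result is [2], meaning the flush in Step 4 already reports it. Locking and shared-memory details that a real patch would need are left out.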