Re: WAL fsync scheduling - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: WAL fsync scheduling |
Date | |
Msg-id | 200101241424.JAA15599@candle.pha.pa.us |
In response to | Re: WAL fsync scheduling (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
Added to TODO.detail and TODO list.

> [ Charset ISO-8859-1 unsupported, converting... ]
> > > There are two parts to transaction commit. The first is writing all
> > > dirty buffers or log changes to the kernel, and second is fsync of the
> >   ^^^^^^^^^^^^
> > Backend doesn't write any dirty buffer to the kernel at commit time.
>
> Yes, I suspected that.
>
> >
> > > log file.
> >
> > The first part is writing commit record into WAL buffers in shmem.
> > This is what XLogInsert does. After that XLogFlush is called to ensure
> > that entire commit record is on disk. XLogFlush does *both* write() and
> > fsync() (single slock is used for both writing and fsyncing) if it needs to
> > do it at all.
>
> Yes, I realize there are new steps in WAL.
>
> >
> > > I suggest having a per-backend shared memory byte that has the following
> > > values:
> > >
> > >     START_LOG_WRITE
> > >     WAIT_ON_FSYNC
> > >     NOT_IN_COMMIT
> > >     backend_number_doing_fsync
> > >
> > > I suggest that when each backend starts a commit, it sets its byte to
> > > START_LOG_WRITE.
> >   ^^^^^^^^^^^^^^^^^^^^^^^
> > Isn't START_COMMIT more meaningful?
>
> Yes.
>
> >
> > > When it gets ready to fsync, it checks all backends.
> >   ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > What do you mean by this? The moment just after XLogInsert?
>
> Just before it calls fsync().
>
> >
> > > If all are NOT_IN_COMMIT, it does fsync and continues.
> >
> > 1st edition:
> > > If one or more are in START_LOG_WRITE, it waits until no one is in
> > > START_LOG_WRITE. It then checks all WAIT_ON_FSYNC, and if it is the
> > > lowest backend in WAIT_ON_FSYNC, marks all others with its backend
> > > number, and does fsync. It then clears all backends with its number to
> > > NOT_IN_COMMIT. Other backend will see they are not the lowest
> > > WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
> > > so they can then continue, knowing their data was synced.
> >
> > 2nd edition:
> > > I have another idea. If a backend gets to the point that it needs
> > > fsync, and there is another backend in START_LOG_WRITE, it can go to an
> > > interruptible sleep, knowing another backend will perform the fsync and
> > > wake it up. Therefore, there is no busy-wait or timed sleep.
> > >
> > > Of course, a backend must set its status to WAIT_ON_FSYNC to avoid a
> > > race condition.
> >
> > The 2nd edition is much better. But I'm not sure do we really need in
> > these per-backend bytes in shmem. Why not just have some counters?
> > We can use a semaphore to wake-up all waiters at once.
>
> Yes, that is much better and clearer. My idea was just to say, "if no
> one is entering commit phase, do the commit. If someone else is coming,
> sleep and wait for them to do the fsync and wake me up with a signal."
>
> >
> > > This allows a single backend not to sleep, and allows multiple backends
> > > to bunch up only when they are all about to commit.
> > >
> > > The reason backend numbers are written is so other backends entering the
> > > commit code will not interfere with the backends performing fsync.
> >
> > Being waked-up backend can check what's written/fsynced by calling XLogFlush.
>
> Seems that may not be needed anymore with a counter. The only issue is
> that other backends may enter commit while fsync() is happening. The
> process that did the fsync must be sure to wake up only the backends
> that were waiting for it, and not other backends that may also be
> doing fsync as a group while the first fsync was happening. I leave
> those details to people more experienced. :-)
>
> I am just glad people liked my idea.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
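
The "counters plus a wake-up" variant discussed in this message can be modelled in a few lines. What follows is only a toy simulation in Python, not PostgreSQL code: the class GroupCommit, its counters (writing, inserted, flushed), and the run_demo helper are all hypothetical names made up for illustration, a condition variable stands in for the semaphore, and a short sleep stands in for write() plus fsync(). A backend that reaches the fsync point sleeps if someone else is still entering commit or is already flushing; otherwise it flushes on behalf of everyone inserted so far. Comparing each backend's own position against the flushed counter is one possible answer to the "wake up only the backends that were waiting" concern: a backend that arrived during an fsync simply sees it is not yet covered and flushes or waits again.

```python
import threading
import time


class GroupCommit:
    """Toy model of the counter-based scheme; all names are illustrative."""

    def __init__(self, fsync):
        self._fsync = fsync          # stand-in for write() + fsync() of the log
        self._cond = threading.Condition()
        self.writing = 0             # backends between START_COMMIT and the fsync point
        self.inserted = 0            # commit records inserted so far (a fake LSN)
        self.flushed = 0             # highest fake LSN known to be on disk
        self.fsync_calls = 0
        self._flusher_active = False

    def commit(self):
        with self._cond:
            self.writing += 1        # START_COMMIT
        # A real backend would copy its commit record into WAL buffers here.
        with self._cond:
            self.inserted += 1
            my_lsn = self.inserted
            self.writing -= 1        # now effectively WAIT_ON_FSYNC
            self._cond.notify_all()  # sleepers re-check: maybe nobody is coming now
            while self.flushed < my_lsn:
                if self.writing > 0 or self._flusher_active:
                    # Someone else is coming or already flushing: sleep, no busy-wait.
                    self._cond.wait()
                    continue
                # Nobody else is entering commit: flush for the whole group.
                self._flusher_active = True
                target = self.inserted
                self._cond.release()         # do the slow call without the lock
                try:
                    self._fsync()
                finally:
                    self._cond.acquire()
                    self._flusher_active = False
                    self._cond.notify_all()
                self.flushed = max(self.flushed, target)
                self.fsync_calls += 1


def run_demo(n_backends=8):
    """Run n_backends concurrent commits and return the shared state."""
    gc = GroupCommit(fsync=lambda: time.sleep(0.01))
    threads = [threading.Thread(target=gc.commit) for _ in range(n_backends)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gc


if __name__ == "__main__":
    state = run_demo()
    print(state.inserted, "commits,", state.fsync_calls, "fsync calls")
```

How many commits share one fsync depends on thread timing, so the printed count varies from run to run. Note that in this sketch a steady stream of backends entering commit could keep the waiters asleep indefinitely; a real implementation would need some bound on that wait.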