WAL fsync scheduling - Mailing list pgsql-hackers

From Bruce Momjian
Subject WAL fsync scheduling
Date
Msg-id 200011180459.XAA19799@candle.pha.pa.us
In response to Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)  (Tom Samplonius <tom@sdf.com>)
Responses Re: WAL fsync scheduling  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> > sleep(3) should conform to POSIX specification, if anyone has the
> > reference they can check it to see what the effect of sleep(0)
> > should be.
> 
>   Yes, but Posix also specifies sched_yield() which rather explicitly
> allows a process to yield its timeslice.  No idea how well that is
> supported.

OK, I have a new idea.

There are two parts to transaction commit.  The first is writing all
dirty buffers or log changes to the kernel, and the second is the fsync
of the log file.
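
To make the two parts concrete, here is a minimal sketch using the plain
POSIX write() and fsync() calls.  The function name write_then_fsync is
made up for illustration and is not anything in the backend; the point is
only that the write hands data to the kernel, while the fsync is the
separate step that forces it to disk.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Part 1: hand the log bytes to the kernel with write().
 * Part 2: force them to stable storage with fsync().
 * Returns 0 on success, -1 on error (errno is left set).
 */
static int
write_then_fsync(int fd, const char *buf, size_t len)
{
    while (len > 0)
    {
        ssize_t n = write(fd, buf, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        buf += n;               /* short write: advance and loop */
        len -= (size_t) n;
    }

    /* Only after this returns is the data known to be synced. */
    return fsync(fd);
}
```

The scheduling question is entirely about the second call: the write is
cheap, and the fsync is the part worth sharing between backends.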

I suggest having a per-backend shared memory byte that has the following
values:
    START_LOG_WRITE
    WAIT_ON_FSYNC
    NOT_IN_COMMIT
    backend_number_doing_fsync

I suggest that when each backend starts a commit, it sets its byte to
START_LOG_WRITE.  When it gets ready to fsync, it sets its byte to
WAIT_ON_FSYNC and checks all backends.  If all the others are
NOT_IN_COMMIT, it does the fsync and continues.

If one or more are in START_LOG_WRITE, it waits until no one is in
START_LOG_WRITE.  It then checks all backends in WAIT_ON_FSYNC, and if it
is the lowest-numbered of them, marks all the others with its backend
number and does the fsync.  It then resets every byte carrying its number
to NOT_IN_COMMIT.  The other backends will see they are not the lowest
WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT
so they can then continue, knowing their data was synced.

This allows a single backend not to sleep, and allows multiple backends
to bunch up only when they are all about to commit.

The reason backend numbers are written is so other backends entering the
commit code will not interfere with the backends performing fsync.
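
The decision each backend makes can be sketched roughly as below.  This is
a single-process model of the logic only, not a proposed patch: the names
next_action, claim_waiters and release_claimed, the MAX_BACKENDS limit and
the byte encodings are all made up for illustration.  It also assumes two
things not spelled out above: that a backend sets its own byte to
WAIT_ON_FSYNC before checking the others, and that the backend doing the
fsync writes its number into its own byte as well.  Real shared-memory code
would need locking or atomic access, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BACKENDS    32      /* hypothetical limit */

/* Byte values; anything below MAX_BACKENDS is the number of the
 * backend whose fsync covers this backend. */
#define NOT_IN_COMMIT   0xFF
#define START_LOG_WRITE 0xFE
#define WAIT_ON_FSYNC   0xFD

typedef enum
{
    ACT_FSYNC_NOW,          /* nobody else is committing */
    ACT_WAIT_FOR_WRITERS,   /* someone is still in START_LOG_WRITE */
    ACT_LEAD_FSYNC,         /* lowest waiter: fsync for the group */
    ACT_WAIT_FOR_LEADER     /* another backend will fsync for us */
} commit_action;

/* Called by backend "me" once its own byte is WAIT_ON_FSYNC. */
static commit_action
next_action(const uint8_t *state, size_t n, size_t me)
{
    int     others_idle = 1;
    size_t  lowest_waiter = n;
    size_t  i;

    if (state[me] < MAX_BACKENDS)
        return ACT_WAIT_FOR_LEADER;     /* already claimed by a leader */

    for (i = 0; i < n; i++)
    {
        if (i != me && state[i] != NOT_IN_COMMIT)
            others_idle = 0;
        if (i != me && state[i] == START_LOG_WRITE)
            return ACT_WAIT_FOR_WRITERS;
        if (state[i] == WAIT_ON_FSYNC && i < lowest_waiter)
            lowest_waiter = i;
    }
    if (others_idle)
        return ACT_FSYNC_NOW;
    return (lowest_waiter == me) ? ACT_LEAD_FSYNC : ACT_WAIT_FOR_LEADER;
}

/* Leader marks every other waiter (and, by assumption, itself) with
 * its number.  Returns how many other backends were claimed. */
static size_t
claim_waiters(uint8_t *state, size_t n, size_t me)
{
    size_t  claimed = 0;
    size_t  i;

    for (i = 0; i < n; i++)
        if (i != me && state[i] == WAIT_ON_FSYNC)
        {
            state[i] = (uint8_t) me;
            claimed++;
        }
    state[me] = (uint8_t) me;
    return claimed;
}

/* After the fsync: reset every byte carrying the leader's number. */
static void
release_claimed(uint8_t *state, size_t n, size_t me)
{
    size_t  i;

    for (i = 0; i < n; i++)
        if (state[i] == (uint8_t) me)
            state[i] = NOT_IN_COMMIT;
}
```

A backend that arrives while backend 0 is in its fsync sees only bytes
holding 0, not WAIT_ON_FSYNC, so it neither waits on backend 0 nor gets
reset by it.  That is what the backend numbers buy.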

Comments?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

