> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Can we put the backends to sleep waiting for a lock, and have them wake
> > up later?
>
> Locks don't have timeouts. There is no existing mechanism that will
> serve this purpose; we'll have to create a new one.
That is what I suspected.
Having thought about it, we currently have a few options:

1) let every backend fsync() on its own
2) try to delay backends so they all fsync() at the same time
3) delay the fsync() until after the commit
Options 2 and 3 attempt to bunch up fsyncs. Option 2 has backends waiting
to fsync() on the expectation that some other backend may commit soon.
Option 3 may turn out to be the best solution. No matter how smart we
make the code, we will never know for sure whether someone is about to
commit and whether it is worth waiting.
My idea would be to let committing backends return "COMMIT" to the user,
and set a need_fsync flag that is guaranteed to cause an fsync within X
milliseconds. This way, if other backends commit in the next X
milliseconds, they can all share one fsync().
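To make the idea concrete, here is a minimal sketch of that mechanism in
Python (the class name, the flush interval, and the workload are all my
own illustration, not anything in our tree): commit() writes the record,
sets need_fsync, and returns immediately; a background flusher guarantees
an fsync() within the interval, so every commit that lands in the same
window is covered by a single fsync().

```python
import os
import threading
import time

class GroupCommitLog:
    """Toy WAL writer: commit() returns before the data is on the
    platter; a background thread fsyncs within flush_interval seconds,
    so concurrent commits share one fsync()."""

    def __init__(self, path, flush_interval=0.020):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.flush_interval = flush_interval
        self.lock = threading.Lock()
        self.need_fsync = False
        self.fsync_count = 0      # real fsync() calls issued
        self.commit_count = 0     # commits acknowledged to users
        self.running = True
        self.flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self.flusher.start()

    def commit(self, record: bytes):
        with self.lock:
            os.write(self.fd, record)
            self.need_fsync = True   # data may still be in the OS cache
            self.commit_count += 1
        # "COMMIT" is returned to the user here, before any fsync()

    def _flush_loop(self):
        while self.running:
            time.sleep(self.flush_interval)
            with self.lock:
                if self.need_fsync:
                    os.fsync(self.fd)   # one fsync covers every
                    self.need_fsync = False  # commit in the window
                    self.fsync_count += 1

    def close(self):
        self.running = False
        self.flusher.join()
        with self.lock:
            if self.need_fsync:
                os.fsync(self.fd)
                self.fsync_count += 1
        os.close(self.fd)
```

With a 20 ms interval, a burst of 50 commits typically ends up sharing
one or two fsync() calls instead of issuing 50.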
Now, I know many will complain that we are returning COMMIT while the
data is not yet on the platter. But consider: we only lose data on an
OS crash or hardware failure. Do people who commit something, and then
have the machine crash 2 milliseconds after the commit, really expect
the data to be on disk when they restart? Maybe they do, but the
benefit of grouped fsync()s seems large enough that many will say they
would rather have this option.
This was my point long ago that we could offer sub-second reliability
with no-fsync performance if we just had some process running that wrote
dirty pages and fsynced every 20 milliseconds.
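As a back-of-envelope illustration of that tradeoff (the 20 ms interval
matches the number above; the 2000-commits-per-second workload is a
hypothetical figure I picked for the arithmetic):

```python
# A process that flushes and fsyncs every 20 ms caps the fsync rate and
# bounds the post-crash data-loss window, independent of commit rate.
FLUSH_INTERVAL_MS = 20           # assumed flusher period
commits_per_second = 2000        # hypothetical workload

fsyncs_naive = commits_per_second            # one fsync() per commit
fsyncs_grouped = 1000 // FLUSH_INTERVAL_MS   # at most one per interval
worst_case_loss_window_ms = FLUSH_INTERVAL_MS

print(fsyncs_naive, fsyncs_grouped, worst_case_loss_window_ms)
# -> 2000 50 20
```

That is "sub-second reliability with no-fsync performance": at most one
interval of commits is at risk, while the fsync rate stays constant no
matter how fast commits arrive.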
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026