"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> "ITAGAKI Takahiro" <itagaki.takahiro@lab.ntt.co.jp> wrote
>> AbsorbFsyncRequests will be called during the fsync loop in my patch,
>> so new files might be added to pendingOpsTable and they will be removed
>> from the table *before* writing the pages belonging to them.
>> So I changed it to copy the contents of pendingOpsTable to a local
>> variables and iterate on the vars later.
I think this fear is incorrect. At the time ForwardFsyncRequest is
called, the backend must *already* have done whatever write it is
concerned about fsync'ing (note that ForwardFsyncRequest may choose
to do the fsync itself). Therefore it is OK if the bgwriter does that
fsync immediately upon receipt of the request. There is no constraint
saying that we ever need to delay execution of an fsync request.
> I see - it is the AbsorbFsyncRequests() added in mdsync() loop and you want
> to avoid unecessary fsyncs. But the remove-recover method you use has a
> caveat: if any hash_search(HASH_ENTER) failed when you try to reinsert them
> into the pendingOpsTable, you have to raise the error to PANIC since we
> can't get back the missing fds any more.
Yes, the patch is wrong as-is because it may lose uncompleted fsyncs.
But I think that we could just add the AbsorbFsyncRequests call in the
fsync loop and not worry about trying to avoid doing extra fsyncs.
Another possibility is to make the copied list as in the patch, but
HASH_REMOVE an entry only after doing the fsync successfully --- as long
as you don't AbsorbFsyncRequests between doing the fsync and removing
the entry, you aren't risking missing a necessary fsync. I'm
unconvinced that this is worth the trouble, however.
regards, tom lane