Hello Andres,
> 0001: Make SetHintBit() a bit more aggressive, afaics that fixes all the
> potential regressions of 0002
> 0002: Fix the overaggressive flushing by the wal writer, by only
> flushing every wal_writer_delay ms or wal_writer_flush_after
> bytes.
I've looked at these patches, especially the whole bunch of explanations
and comments, which is a good source for understanding what is going on in
the WAL writer, a part of pg I'm not familiar with.
While reading the explanations in patch 0002, I had the following comments:
AFAICS, there are several levels of actions when writing things in pg (see
the small sketch after this list):
0: the thing is written in some internal buffer
1: the buffer is advised to be passed to the OS (hint bits?)
2: the buffer is actually passed to the OS (write, flush)
3: the OS is advised to send the written data to the io subsystem (sync_file_range with SYNC_FILE_RANGE_WRITE)
4: the OS is required to send the written data to the disk (fsync, sync_file_range with SYNC_FILE_RANGE_WAIT_AFTER)
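To check my understanding of levels 2 to 4, here is a minimal standalone
sketch of the corresponding system calls (my own illustration, not code
from the patches; Linux-only because of sync_file_range; file name and
payload are made up):

    #define _GNU_SOURCE              /* for sync_file_range() */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "some wal-like payload\n";
        int fd = open("demo.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* level 2: hand the data over to the OS page cache */
        if (write(fd, buf, strlen(buf)) < 0)
            perror("write");

        /* level 3: advise the OS to start writing it out, without waiting */
        if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) < 0)
            perror("sync_file_range");

        /* level 4: require the data to reach stable storage before
           returning; sync_file_range(..., SYNC_FILE_RANGE_WAIT_AFTER)
           would also wait, but unlike fsync it flushes neither the file
           metadata nor the disk write cache */
        if (fsync(fd) < 0)
            perror("fsync");

        return close(fd);
    }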
It is not clear when reading the text which level is discussed. In
particular, I'm not sure whether "flush" refers to level 2, which would be
misleading. When reading the description, I'm rather under the impression
that it is about level 4, but then if actual fsyncs were performed only
every 200 ms, the tps would be very low: a session committing serially
would have to wait for the next fsync each time, capping it at roughly
1/0.2 = 5 commits per second...
After more consideration, my final understanding is that this behavior
only occurs with "asynchronous commit", i.e. a situation where COMMIT does
not wait for the data to be really fsynced, but the fsync is to occur
within some bounded delay so that it will not lag too far behind: some
kind of compromise for performance, where recent commits can be lost.
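If I read the patch correctly, that compromise corresponds to a
configuration along these lines (parameter names as described in the
patch, values purely illustrative):

    # postgresql.conf -- asynchronous commit, illustrative values
    synchronous_commit = off        # COMMIT returns before the WAL is fsynced
    wal_writer_delay = 200ms        # wal writer flushes at most every so often...
    wal_writer_flush_after = 1MB    # ... or earlier, once this much WAL is
                                    #     pending (the new knob from patch 0002)

With such a setup, a crash could lose the transactions committed during
the last few wal_writer_delay intervals, which matches the "commits can be
lost" compromise above.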
Now all this is somewhat alien to me, because the whole point of
committing is to get the data to disk, and I would not consider a database
to be safe if commit does not imply fsync; but I understand that people
may have to compromise for performance.
Is my understanding right?
--
Fabien.