Re: checkpointer continuous flushing - V16 - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing - V16
Date
Msg-id alpine.DEB.2.10.1602180914430.19826@sto
In response to Re: checkpointer continuous flushing - V16  (Andres Freund <andres@anarazel.de>)
Responses Re: checkpointer continuous flushing - V16
List pgsql-hackers
Hello Andres,

> 0001: Make SetHintBit() a bit more aggressive, afaics that fixes all the
>      potential regressions of 0002
> 0002: Fix the overaggressive flushing by the wal writer, by only
>      flushing every wal_writer_delay ms or wal_writer_flush_after
>      bytes.

I've looked at these patches, especially the whole bunch of explanations 
and comments, which are a good source for understanding what is going on in 
the WAL writer, a part of pg I'm not familiar with.

When reading the patch 0002 explanations, I had the following comments:

AFAICS, there are several levels of actions when writing things in pg:
 0: the thing is written in some internal buffer
 1: the buffer is advised to be passed to the OS (hint bits?)
 2: the buffer is actually passed to the OS (write, flush)
 3: the OS is advised to send the written data to the io subsystem
    (sync_file_range with SYNC_FILE_RANGE_WRITE)
 4: the OS is required to send the written data to the disk
    (fsync, sync_file_range with SYNC_FILE_RANGE_WAIT_AFTER)
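
To make the levels concrete, here is a minimal Python sketch, not PostgreSQL
code. The function name write_through_levels and the 64 kB buffer size are
made up for illustration. It pushes a small payload through levels 0, 2 and 4
and records the file size the OS reports after each step. Level 1 is left out,
and so is level 3, because sync_file_range is Linux-specific and, as far as I
know, not exposed by Python's os module.

```python
import os
import tempfile


def write_through_levels(path: str, payload: bytes) -> dict:
    """Write payload, returning the file size the OS reports after each level.

    Assumes len(payload) is smaller than the 64 kB user-space buffer, so the
    data stays in the process until flush() is called.
    """
    seen = {}
    f = open(path, "wb", buffering=65536)
    try:
        # Level 0: the data sits in an internal (user-space) buffer only.
        f.write(payload)
        seen["after_level_0"] = os.path.getsize(path)

        # Level 2: flush() hands the buffer to the OS with write(2).
        # The data is now in the OS page cache, not necessarily on disk.
        f.flush()
        seen["after_level_2"] = os.path.getsize(path)

        # Level 3 (sync_file_range with SYNC_FILE_RANGE_WRITE) is skipped here.

        # Level 4: fsync() returns only once the OS reports the data as
        # written to the storage device.
        os.fsync(f.fileno())
        seen["after_level_4"] = os.path.getsize(path)
    finally:
        f.close()
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_through_levels(os.path.join(tmp, "demo.bin"), b"x" * 100))
```

The file size cannot distinguish level 2 from level 4: both report the full
length, and only a crash would show the difference. That is why it matters
which level the word "flush" refers to.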

It is not clear when reading the text which level is being discussed. In 
particular, I'm not sure that "flush" refers to level 2, which is 
misleading. When reading the description, I'm rather under the impression 
that it is about level 4, but then if actual fsyncs were performed every 200 
ms the tps would be very low...

After more consideration, my final understanding is that this behavior 
only occurs with "asynchronous commit", i.e. a situation where COMMIT does 
not wait for the data to be really fsynced, but the fsync is to occur within 
some delay so that it will not be too far away: a kind of compromise for 
performance in which commits can be lost.
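
The understanding above can be sketched as a toy model. This is an
illustration under stated assumptions, not the WAL writer's actual logic: the
class ToyWal, its field names and the 1000-byte threshold are hypothetical,
and time is passed in explicitly instead of being read from a clock. The only
rule taken from the patch description is "flush every wal_writer_delay ms or
wal_writer_flush_after bytes".

```python
from dataclasses import dataclass


@dataclass
class ToyWal:
    flush_delay_ms: int = 200       # stands in for wal_writer_delay
    flush_after_bytes: int = 1000   # stands in for wal_writer_flush_after
    written: int = 0                # bytes passed to the OS (level 2)
    flushed: int = 0                # bytes fsynced (level 4)
    last_flush_ms: int = 0
    fsync_calls: int = 0

    def _fsync(self, now_ms: int) -> None:
        # Everything written so far becomes durable.
        self.flushed = self.written
        self.last_flush_ms = now_ms
        self.fsync_calls += 1

    def commit_sync(self, nbytes: int, now_ms: int) -> None:
        """Synchronous commit: return only after the record is fsynced."""
        self.written += nbytes
        self._fsync(now_ms)

    def commit_async(self, nbytes: int, now_ms: int) -> None:
        """Asynchronous commit: return right after the write, no fsync."""
        self.written += nbytes

    def writer_tick(self, now_ms: int) -> bool:
        """Background writer: fsync if the delay elapsed or enough is pending."""
        pending = self.written - self.flushed
        if pending == 0:
            return False
        delay_elapsed = now_ms - self.last_flush_ms >= self.flush_delay_ms
        if delay_elapsed or pending >= self.flush_after_bytes:
            self._fsync(now_ms)
            return True
        return False

    def at_risk(self) -> int:
        """Bytes of committed data that a crash right now would lose."""
        return self.written - self.flushed


if __name__ == "__main__":
    wal = ToyWal()
    wal.commit_async(100, now_ms=10)
    wal.writer_tick(now_ms=20)
    print("at risk before the delay elapses:", wal.at_risk())
    wal.writer_tick(now_ms=200)
    print("at risk after the delay elapses:", wal.at_risk())
```

In this model an asynchronous commit leaves its bytes "at risk" until the
next qualifying tick, which is the window in which a commit can be lost,
while a synchronous commit pays for one fsync per call.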

Now all this is somewhat alien to me because the whole point of committing 
is having the data on disk, and I would not consider a database to be safe 
if commit does not imply fsync, but I understand that people may have to 
compromise for performance.

Is my understanding right?

-- 
Fabien.


