Re: Partitioned checkpointing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Partitioned checkpointing
Msg-id alpine.DEB.2.10.1509261416590.8351@sto
In response to Re: Partitioned checkpointing  (Takashi Horikawa <t-horikawa@aj.jp.nec.com>)
List pgsql-hackers
Hello,

These are interesting runs.

> In a situation in which small values are set in dirty_bytes and 
> dirty_background_bytes, a buffer is likely stored in the HD immediately 
> after the buffer is written in the kernel by the checkpointer. Thus, I 
> tried a quick hack to make the checkpointer invoke write system call to 
> write a dirty buffer immediately followed by invoking store operation 
> for a buffer implemented with sync_file_range() system call. # For 
> reference, I attach the patch. As shown in file_sync_range.JPG, this 
> strategy considered to have been effective.

Indeed. This approach is part of the current patch:
    https://commitfest.postgresql.org/6/260/

Basically, you call sync_file_range on each block. You tested on a
high-end system, probably with a lot of battery-backed (BBU) disk cache,
which I guess allows the disk to reorder writes and so benefit from
sequential write performance.
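
For readers following along, the per-block strategy discussed above can be
sketched roughly as below. This is only an illustration, not the code from
either patch: the helper name write_block_and_flush and the BLCKSZ value of
8192 are assumptions, and the fdatasync() branch is just a coarse stand-in
for platforms where sync_file_range() is not available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes sync_file_range() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed block size for this sketch */

/*
 * Hypothetical helper: write one block into the kernel page cache, then
 * ask the kernel to start writing that byte range out to disk.
 * Returns 0 on success, -1 on error.
 */
int
write_block_and_flush(int fd, unsigned int blkno, const char *buf)
{
    off_t offset = (off_t) blkno * BLCKSZ;

    assert(buf != NULL);

    if (pwrite(fd, buf, BLCKSZ, offset) != BLCKSZ)
        return -1;

#ifdef SYNC_FILE_RANGE_WRITE
    /* Linux: initiate writeback of this range without waiting for it. */
    return sync_file_range(fd, offset, BLCKSZ, SYNC_FILE_RANGE_WRITE);
#else
    /* Stand-in when sync_file_range() is unavailable: flush the file's data. */
    return fdatasync(fd);
#endif
}
```

Calling this once per dirty block gives one sync_file_range call per
write, which is the behavior discussed in the rest of this message.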

> In conclusion, as long as pgbench execution against linux concerns, 
> using sync_file_range() is a promising solution.

I found that calling sync_file_range for every block could degrade
performance a bit under some conditions, at least on my low-end systems
(just a [raid] disk, no significant disk cache in front of it), so the
above patch aggregates neighboring writes so as to issue fewer
sync_file_range calls.
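
The aggregation idea can be illustrated with a small sketch. It is not
taken from the patch: the BlockRange type, the function names and the
BLCKSZ value are hypothetical, and the input is assumed to be already
sorted in ascending block order within one file. Runs of consecutive
blocks are merged, and one flush request is issued per run.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes sync_file_range() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed block size for this sketch */

/* Hypothetical type: a run of `count` consecutive blocks from `start`. */
typedef struct BlockRange
{
    unsigned int start;
    unsigned int count;
} BlockRange;

/*
 * Merge an ascending list of block numbers into runs of neighbors.
 * `out` must have room for nblocks entries. Returns the number of runs.
 */
size_t
coalesce_blocks(const unsigned int *blocks, size_t nblocks, BlockRange *out)
{
    size_t nranges = 0;

    assert(nblocks == 0 || (blocks != NULL && out != NULL));

    for (size_t i = 0; i < nblocks; i++)
    {
        if (nranges > 0)
        {
            BlockRange *last = &out[nranges - 1];

            if (blocks[i] == last->start + last->count)
            {
                last->count++;          /* neighbor: extend current run */
                continue;
            }
            if (blocks[i] < last->start + last->count)
                continue;               /* duplicate of the run's last block */
        }
        out[nranges].start = blocks[i]; /* gap: start a new run */
        out[nranges].count = 1;
        nranges++;
    }
    return nranges;
}

/* Issue one flush request per run. Returns 0 on success, -1 on error. */
int
flush_ranges(int fd, const BlockRange *ranges, size_t nranges)
{
#ifdef SYNC_FILE_RANGE_WRITE
    for (size_t i = 0; i < nranges; i++)
    {
        off_t offset = (off_t) ranges[i].start * BLCKSZ;
        off_t nbytes = (off_t) ranges[i].count * BLCKSZ;

        if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    }
    return 0;
#else
    /* Stand-in when sync_file_range() is unavailable: one whole-file flush. */
    (void) ranges;
    return nranges > 0 ? fdatasync(fd) : 0;
#endif
}
```

For example, blocks 3, 4, 5, 9, 10 and 42 collapse into three runs, so
three flush calls are issued instead of six.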

> That is, the checkpointer invokes sync_file_range() to store a buffer 
> immediately after it writes the buffer in the kernel.

Yep. It is interesting that sync_file_range alone improves stability a lot
on your high-end system, whereas sorting is mandatory on low-end
systems.

My interpretation, already stated above, is that the hardware does the 
sorting on the cached data at the disk level in your system.
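
As an illustration of what sorting in software could look like, here is a
minimal sketch. It assumes that "sorting" means ordering the pending
writes by file and then by block number before they are issued; the
DirtyBlock type and its fields are hypothetical and not taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical description of one pending write. */
typedef struct DirtyBlock
{
    unsigned int file_id;       /* which file the block belongs to */
    unsigned int blkno;         /* block number within that file */
} DirtyBlock;

/* Order by file first, then by block number within the file. */
int
dirty_block_cmp(const void *a, const void *b)
{
    const DirtyBlock *x = a;
    const DirtyBlock *y = b;

    if (x->file_id != y->file_id)
        return x->file_id < y->file_id ? -1 : 1;
    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return 0;
}

/* Sort pending writes in place so each file is written in block order. */
void
sort_dirty_blocks(DirtyBlock *blocks, size_t n)
{
    assert(n == 0 || blocks != NULL);
    qsort(blocks, n, sizeof(DirtyBlock), dirty_block_cmp);
}
```

With the writes in this order, neighboring blocks of the same file end up
adjacent, which is what a disk without a large cache needs in order to
see mostly sequential writes.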

-- 
Fabien.


