Thread: Re: Sorting writes during checkpoint
Simon Riggs <simon@2ndquadrant.com> wrote:

> No action on this seen since last commitfest, but I think we should do
> something with it, rather than just ignore it.

I plan to test it on RAID-5 disks, where sequential writes are much faster than random writes. I'll send the results as evidence.

I also have a related idea about sorting writes. Smoothed checkpoints in 8.3 spread out the write() calls, but issue the fsync() calls all at once. With sorted writes, we could instead call fsync() segment by segment, right after writing the dirty pages contained in each segment. That could improve worst-case response time during checkpoints.

> Note that if we do this for checkpoint we should also do this for
> FlushRelationBuffers(), used during heap_sync(), for exactly the same
> reasons.

Ah, I overlooked FlushRelationBuffers(). It is worth sorting, too.

> Would suggest calling it bulk_io_hook() or similar.

I think we need to reconsider the bufmgr - smgr - md layering, not just add an I/O elevator hook. If we are going to spread out fsync(), bufmgr needs to know where the file segments switch, which unhappily breaks the boundary between bufmgr and md in the current architecture. In addition, the current smgr layer is useless as an extension point because it cannot be extended dynamically and cannot handle multiple md-layer modules. I would rather merge the current smgr and part of bufmgr into a new smgr and add an smgr_hook() than add a bulk_io_hook().

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
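[To make the segment-by-segment fsync idea concrete: a minimal illustrative sketch, not PostgreSQL code. It assumes 1 GB segment files of 8 kB pages (131072 blocks per segment, RELSEG_SIZE in the real source) and sorts dirty block numbers so each segment gets one sequential write pass followed by a single fsync. The function name plan_checkpoint is made up for the example.]

```python
# Sketch of sorted checkpoint writes with per-segment fsync.
# PostgreSQL splits relations into 1 GB segment files; with 8 kB pages
# that is 131072 blocks per segment.
SEGMENT_BLOCKS = 131072

def plan_checkpoint(dirty_blocks):
    """Group sorted dirty block numbers by segment.

    Each (segment, blocks) pair would become one sequential write pass
    followed by a single fsync() of that segment file, instead of
    fsync'ing every segment at the end of the checkpoint.
    """
    plan = {}
    for blk in sorted(dirty_blocks):
        plan.setdefault(blk // SEGMENT_BLOCKS, []).append(blk)
    return sorted(plan.items())

# Random-order dirty pages spanning two segments:
dirty = [200000, 5, 131072, 9, 131073, 42]
for seg, blocks in plan_checkpoint(dirty):
    print(f"segment {seg}: write {blocks}, then fsync once")
```

The point of the grouping is that each fsync covers only the pages just written to that one segment file, rather than an entire checkpoint's worth of dirty data at once.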
On Mon, 7 Jul 2008, ITAGAKI Takahiro wrote:

> I plan to test it on RAID-5 disks, where sequential writes are much
> faster than random writes. I'll send the results as evidence.

If you're running more tests here, please turn on log_checkpoints and collect the logs while the test is running. I'm really curious whether there's any significant difference in what that reports in the sorted case vs. the regular one.

> Smoothed checkpoints in 8.3 spread out the write() calls, but issue
> the fsync() calls all at once. With sorted writes, we could instead
> call fsync() segment by segment, right after writing the dirty pages
> contained in each segment. That could improve worst-case response
> time during checkpoints.

Further decreasing the amount of data that is fsync'd at any point in time might be a bigger improvement than the sorting itself (so far I haven't seen anything really significant from the sort alone, but I'm still testing). One thing I didn't see any comments from you on is how, or whether, the sorted-writes patch lowers worst-case latency. That's the area I'd hope an improved fsync protocol would help most with, rather than TPS, which might even go backwards because the writes won't be as bunched and will therefore involve more seeking.

It's easy enough to analyze the data coming from "pgbench -l" to figure that out; here's an example shell snippet that shows just the worst latencies:

    pgbench -l -N <db> &
    p=$!
    wait $p
    mv pgbench_log.${p} pgbench.log
    cat pgbench.log | cut -f 3 -d " " | sort -n | tail

Actually graphing the latencies can be even more instructive; I have some examples of that on my web page you may have seen before.

> In addition, the current smgr layer is useless as an extension point
> because it cannot be extended dynamically and cannot handle multiple
> md-layer modules. I would rather merge the current smgr and part of
> bufmgr into a new smgr and add an smgr_hook() than add a bulk_io_hook().
I don't have a firm enough opinion on the code to comment on this specific suggestion, but I will say that I've found the amount of layering in this area makes it difficult to understand just what's going on sometimes (especially when new to it). A lot of that abstraction felt rather pass-through to me, and anything that collapses it a bit would help streamline the code instrumentation going on with things like DTrace.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
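[The shell pipeline above only prints the raw tail of the latencies. A small Python sketch of the same analysis, hedged: it assumes the 8.3-era "pgbench -l" line format, where the third space-separated field is the per-transaction latency in microseconds, and the function name latency_summary is made up for the example.]

```python
# Summarize worst-case latency from a "pgbench -l" transaction log.
# Assumed line format: client_id xact_no latency_us file_no epoch usec
def latency_summary(lines, worst=3):
    lats = sorted(int(line.split()[2]) for line in lines if line.strip())
    n = len(lats)
    return {
        "median_us": lats[n // 2],
        "p90_us": lats[int(n * 0.9)],
        "worst_us": lats[-worst:],  # the tail where checkpoint stalls show up
    }

# Synthetic sample with one checkpoint-like stall:
sample = [
    "0 1 1500 0 1215000000 0",
    "0 2 1800 0 1215000001 0",
    "1 1 2100 0 1215000002 0",
    "0 3 950000 0 1215000003 0",
    "1 2 1700 0 1215000004 0",
]
print(latency_summary(sample))
```

Comparing the median against the worst few values makes the checkpoint spikes obvious even when average TPS barely moves, which is exactly the worst-case-latency question raised above.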