Re: Sorting writes during checkpoint - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Sorting writes during checkpoint
Date
Msg-id Pine.GSO.4.64.0807092139390.8953@westnet.com
In response to Re: Sorting writes during checkpoint  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
List pgsql-hackers
On Mon, 7 Jul 2008, ITAGAKI Takahiro wrote:

> I will have a plan to test it on RAID-5 disks, where sequential writing
> are much better than random writing. I'll send the result as an evidence.

If you're running more tests here, please turn on log_checkpoints and 
collect the logs while the test is running.  I'm really curious if there's 
any significant difference in what that reports here in the sorted case 
vs. the regular one.
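
For anyone collecting those logs, here is a rough sketch of pulling out just the timing portion of each checkpoint report. It assumes log_checkpoints = on has been set in postgresql.conf and the server reloaded, that the server log is a plain text file, and that the message ends with 8.3-style "write=... s, sync=... s, total=... s" fields. The function name checkpoint_timings is made up for illustration.

```shell
# Assumption: postgresql.conf contains "log_checkpoints = on" and the
# server has been reloaded (for example with "pg_ctl reload").
# Assumption: "checkpoint complete" lines end with 8.3-style
# "write=N s, sync=N s, total=N s" fields.

# Print the write/sync/total timings from each "checkpoint complete" line
# of the server log file given as the first argument.
checkpoint_timings() {
    grep 'checkpoint complete' "$1" |
        sed -n 's/.*\(write=[0-9.]* s\), \(sync=[0-9.]* s\), \(total=[0-9.]* s\).*/\1 \2 \3/p'
}
```

Running checkpoint_timings against the log from a sorted run and again against an unsorted run gives two short lists that can be compared side by side; the sync figure is the one most relevant to the fsync discussion.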

> Smoothed checkpoint in 8.3 spreads write(), but calls fsync() at once. 
> With sorted writes, we can call fsync() segment-by-segment for each 
> writes of dirty pages contained in the segment. It could improve worst 
> response time during checkpoints.

Further decreasing the amount of data that is fsync'd at any point in time 
might be a bigger improvement than just the sorting itself is doing (so 
far I haven't seen anything really significant just from the sort but am 
still testing).

One thing I didn't see any comments from you on is how/if the sorted 
writes patch lowers worst-case latency.  That's the area I'd hope an 
improved fsync protocol would help most with, rather than TPS, which might 
even go backwards because writes won't be as bunched and therefore will 
have more seeking.  It's easy enough to analyze the data coming from 
"pgbench -l" to figure that out; example shell snippet that shows just the 
worst ones:

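pgbench -l -N <db> &    # run in the background so $! holds its PID
p=$!
wait $p
mv pgbench_log.${p} pgbench.log    # the log is named pgbench_log.<PID>
cut -f 3 -d " " pgbench.log | sort -n | tail    # ten highest latencies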
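
Going one step past the raw tail, a minimal sketch of a summary that is easier to compare between runs. It assumes the third space-separated field of each log line is the transaction latency in microseconds, as in the snippet above, and uses a simple nearest-rank percentile. The function name latency_summary is hypothetical.

```shell
# Summarize per-transaction latency from a pgbench log file ($1).
# Assumption: field 3 of each line is the latency in microseconds.
latency_summary() {
    cut -d ' ' -f 3 "$1" | sort -n | awk '
        { v[NR] = $1 }                           # values arrive sorted ascending
        END {
            if (NR == 0) exit 1                  # no transactions logged
            p90 = v[int((NR * 90 + 99) / 100)]   # nearest-rank 90th percentile
            p99 = v[int((NR * 99 + 99) / 100)]   # nearest-rank 99th percentile
            printf "count=%d p90=%d p99=%d max=%d\n", NR, p90, p99, v[NR]
        }'
}
```

Calling latency_summary pgbench.log after each run prints one line, so the sorted and unsorted cases can be lined up directly; the p99 and max columns are where a change in worst-case behavior should show up.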

Actually graphing the latencies can be even more instructive; I have some 
examples of that on my web page you may have seen before.

> In addition, the current smgr layer is completely useless because
> it cannot be extended dynamically and cannot handle multiple md-layer
> modules. I would rather merge current smgr and part of bufmgr into
> a new smgr and add smgr_hook() than bulk_io_hook().

I don't really have a firm opinion here about the code to comment on this 
specific suggestion, but I will say that I've found the amount of layering 
in this area makes it difficult to understand just what's going on 
sometimes (especially when new to it).  A lot of that abstraction felt a 
bit pass-through to me, and anything that would collapse that a bit would 
be helpful for streamlining the code instrumentation being done with tools 
like dtrace.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

