Re: Sorted writes in checkpoint - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Sorted writes in checkpoint |
Date | |
Msg-id | 200803112005.m2BK51325629@momjian.us |
In response to | Sorted writes in checkpoint (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>) |
List | pgsql-hackers |
Added to TODO:

* Consider sorting writes during checkpoint

  http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> Greg Smith <gsmith@gregsmith.com> wrote:
>
> > On Mon, 11 Jun 2007, ITAGAKI Takahiro wrote:
> > > If the kernel can treat sequential writes better than random writes, is
> > > it worth sorting dirty buffers in block order per file at the start of
> > > checkpoints?
>
> I wrote and tested the attached sorted-writes patch based on Heikki's
> ldc-justwrites-1.patch. There was an obvious performance win on the OLTP
> workload.
>
>  tests                     | pgbench | DBT-2 response time (avg/90%/max)
> ---------------------------+---------+-----------------------------------
>  LDC only                  | 181 tps | 1.12 / 4.38 / 12.13 s
>  + BM_CHECKPOINT_NEEDED(*) | 187 tps | 0.83 / 2.68 /  9.26 s
>  + Sorted writes           | 224 tps | 0.36 / 0.80 /  8.11 s
>
> (*) Don't write buffers that were dirtied after starting the checkpoint.
>
> machine : 2GB-ram, SCSI*4 RAID-5
> pgbench : -s400 -t40000 -c10 (about 5GB of database)
> DBT-2   : 60WH (about 6GB of database)
>
>
> > I think it has the potential to improve things.  There are three obvious
> > and one subtle argument against it I can think of:
> >
> > 1) Extra complexity for something that may not help.  This would need some
> > good, robust benchmarking improvements to justify its use.
>
> Exactly. I think we need a discussion board for I/O performance issues.
> Can I use Developers Wiki for this purpose? Since performance graphs and
> result tables are important for the discussion, it might be better
> than mailing lists, which are text-based.
>
>
> > 2) Block number ordering may not reflect actual order on disk.  While
> > true, it's got to be better correlated with it than writing at random.
> > 3) The OS disk elevator should be dealing with this issue, particularly
> > because it may really know the actual disk ordering.
>
> Yes, both are true. However, I think there is pretty high correlation
> in those orderings. In addition, we should use filesystem to assure
> those orderings correspond to each other. For example, pre-allocation
> of files might help us, as has often been discussed.
>
>
> > Here's the subtle thing:  by writing in the same order the LRU scan occurs
> > in, you are writing dirty buffers in the optimal fashion to eliminate
> > client backend writes during BufferAlloc.  This makes the checkpoint a
> > really effective LRU clearing mechanism.  Writing in block order will
> > change that.
>
> The issue will probably go away after we have LDC, because it writes LRU
> buffers during checkpoints.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
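
For readers following the thread, the idea quoted above (sorting dirty buffers in block order per file at the start of a checkpoint) can be illustrated with a small C sketch. This is not the attached patch and does not use PostgreSQL's real buffer manager structures; `DirtyBuf`, `cmp_dirty_buf`, and `sort_checkpoint_writes` are hypothetical names invented for illustration. It only shows the ordering step: collect the dirty buffers, sort them by (file, block), then issue the writes in that order.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified description of one dirty buffer.
 * PostgreSQL's actual buffer tag is different; this is an assumption
 * made only to keep the sketch self-contained.
 */
typedef struct DirtyBuf
{
    uint32_t file_id;   /* hypothetical identifier of the relation file */
    uint32_t block_no;  /* block number within that file */
    int      buf_id;    /* index of the buffer in the (imaginary) pool */
} DirtyBuf;

/* Order by file first, then by block number within the file. */
static int
cmp_dirty_buf(const void *a, const void *b)
{
    const DirtyBuf *x = (const DirtyBuf *) a;
    const DirtyBuf *y = (const DirtyBuf *) b;

    if (x->file_id != y->file_id)
        return (x->file_id < y->file_id) ? -1 : 1;
    if (x->block_no != y->block_no)
        return (x->block_no < y->block_no) ? -1 : 1;
    return 0;
}

/*
 * Sort the list of dirty buffers gathered at checkpoint start so that
 * a later write loop visits each file's blocks in ascending order.
 */
static void
sort_checkpoint_writes(DirtyBuf *bufs, size_t n)
{
    qsort(bufs, n, sizeof(DirtyBuf), cmp_dirty_buf);
}
```

A checkpoint loop built on this would walk the sorted array and write `buf_id` entries in order. Whether block order matches on-disk order still depends on the filesystem, as points 2) and 3) in the quoted message note.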