Re: Improvement of checkpoint IO scheduler for stable transaction responses - Mailing list pgsql-hackers

From	KONDO Mitsumasa
Subject	Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date	July 5, 2013 07:46:42
Msg-id	51D67ADA.4020205@lab.ntt.co.jp
In response to	Re: Improvement of checkpoint IO scheduler for stable transaction responses ("Joshua D. Drake" <jd@commandprompt.com>)
Responses	Re: Improvement of checkpoint IO scheduler for stable transaction responses
List	pgsql-hackers

(2013/07/05 0:35), Joshua D. Drake wrote:
> On 07/04/2013 06:05 AM, Andres Freund wrote:
>>>> Presumably the smaller segsize is better because we don't
>>>> completely stall the system by submitting up to 1GB of io at once. So,
>>>> if we were to do it in 32MB chunks and then do a final fsync()
>>>> afterwards we might get most of the benefits.
>>> Yes, I try to test this setting './configure --with-segsize=0.03125' tonight.
>>> I will send you this test result tomorrow.
>>
>
> I did testing on this a few years ago, I tried with 2MB segments over 16MB
> thinking similarly to you. It failed miserably, performance completely tanked.
Just as you say, the test result was miserable... Too small a segsize is bad for
performance. It might be improved by using separate directories, but too many FDs
with open() and close() seem to be bad. However, I think that this implementation
has the potential to improve IO performance, so we need to try testing it with
some other methods.
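
For illustration only, the idea quoted above (submit IO in 32MB chunks, then do a final fsync()) could be sketched as follows. This is not the patch under discussion and not PostgreSQL code; the function name `write_with_chunked_flush` and its parameters are hypothetical, and it uses a plain fsync() per chunk as the simplest stand-in for whatever flush mechanism a real patch would choose.

```python
import os


def write_with_chunked_flush(path, data, chunk_bytes=32 * 1024 * 1024):
    """Write data to path, calling fsync() after every chunk_bytes written.

    Returns the number of fsync() calls issued, including the final one.
    Hypothetical helper for illustration; 32MB mirrors the quoted suggestion.
    """
    fsync_calls = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        for start in range(0, len(view), chunk_bytes):
            chunk = view[start:start + chunk_bytes]
            # os.write() may write fewer bytes than requested, so loop.
            while len(chunk) > 0:
                written = os.write(fd, chunk)
                chunk = chunk[written:]
            # Flush now so that at most chunk_bytes of dirty data is
            # submitted to the device in one go.
            os.fsync(fd)
            fsync_calls += 1
        # Final fsync() after all chunks, as in the quoted suggestion.
        os.fsync(fd)
        fsync_calls += 1
    finally:
        os.close(fd)
    return fsync_calls
```

The point of the sketch is only that the amount of IO submitted per flush is bounded by the chunk size while the data stays in a single file, so no extra file descriptors are needed.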

* Performance result in DBT-2 (WH340)
                                | NOTPM    90%tile    Average  Maximum
--------------------------------+-----------------------------------
original_0.7 (baseline)         | 3474.62  18.348328  5.739    36.977713
fsync + write                   | 3586.85  14.459486  4.960    27.266958
fsync + write + segsize=0.25    | 3661.17  8.28816    4.117    17.23191
fsync + write + segsize=0.03125 | 3309.99  10.851245  6.759    19.500598
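
To make the NOTPM differences easier to compare, here is a small sketch that computes each configuration's change relative to the baseline. The NOTPM values are copied from the results above; the dictionary and function names are made up for this example.

```python
# NOTPM values copied from the DBT-2 (WH340) results above.
NOTPM = {
    "original_0.7 (baseline)": 3474.62,
    "fsync + write": 3586.85,
    "fsync + write + segsize=0.25": 3661.17,
    "fsync + write + segsize=0.03125": 3309.99,
}


def notpm_change_percent(results, baseline_key):
    """Return the percentage change of each entry relative to the baseline."""
    base = results[baseline_key]
    return {
        name: (value - base) / base * 100.0
        for name, value in results.items()
    }


if __name__ == "__main__":
    changes = notpm_change_percent(NOTPM, "original_0.7 (baseline)")
    for name, pct in changes.items():
        print(f"{name:32s} {pct:+.2f}%")
```

Only the segsize=0.03125 run falls below the baseline in NOTPM, even though its 90%tile and maximum values are still lower than the baseline's.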


(2013/07/04 22:05), Andres Freund wrote:
> 1) it breaks pg_upgrade. Which means many of the bigger users won't be
>     able to migrate to this and most packagers would carry the old
>     segsize around forever.
>     Even if we could get pg_upgrade to split files accordingly link mode
>     would still be broken.
 
I think that pg_upgrade is one of the contrib modules, not part of the main
implementation of Postgres. So a contrib module should not stand in the way of
improving the main implementation. pg_upgrade users might share this opinion.
> 2) It drastically increases the amount of file handles necessary and by
>     extension increases the amount of open/close calls. Those aren't all
>     that cheap. And it increases metadata traffic since mtime/atime are
>     kept for more files. Also, file creation is rather expensive since it
>     requires metadata transaction on the filesystem level.
My test result seems to show this problem. But my test did not use separate
directories under base/. I'm not sure which way is best. If you have time to
create a patch, please send it to us, and I will test it with DBT-2.
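
As a rough illustration of the file-handle concern, the sketch below counts how many segment files one relation needs at each segsize tested above. The 100GB relation size is an assumed example value, not a figure from the DBT-2 test, and `segment_count` is a hypothetical helper. The segsize values are in GB, as --with-segsize expects.

```python
import math


def segment_count(relation_gb, segsize_gb):
    """Number of segment files needed to hold a relation of relation_gb."""
    return math.ceil(relation_gb / segsize_gb)


if __name__ == "__main__":
    RELATION_GB = 100  # assumed relation size, for illustration only
    for segsize_gb in (1.0, 0.25, 0.03125):
        files = segment_count(RELATION_GB, segsize_gb)
        print(f"segsize={segsize_gb} GB ({segsize_gb * 1024:.0f} MB): "
              f"{files} files")
```

Going from the default 1GB segsize to 0.03125 (32MB) multiplies the number of files per relation by 32, which is where the extra open()/close() calls and metadata updates come from.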

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
