(2013/07/05 0:35), Joshua D. Drake wrote:
> On 07/04/2013 06:05 AM, Andres Freund wrote:
>>>> Presumably the smaller segsize is better because we don't
>>>> completely stall the system by submitting up to 1GB of io at once. So,
>>>> if we were to do it in 32MB chunks and then do a final fsync()
>>>> afterwards we might get most of the benefits.
>>> Yes, I try to test this setting './configure --with-segsize=0.03125' tonight.
>>> I will send you this test result tomorrow.
>>
>
> I did testing on this a few years ago, I tried with 2MB segments over 16MB
> thinking similarly to you. It failed miserably, performance completely tanked.
Just as you say, the test result was miserable... Too small a segsize is bad for
performance. It might be improved by using separate directories, but too many FDs
with open() and close() seem to be bad. However, I think that this implementation
has the potential to improve IO performance, so we need to try testing with
some other methods.
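As a rough illustration of why the number of files grows so quickly, here is a small sketch. The 100 GB relation size is only an assumed example and is not taken from my test; the segsize values are the ones discussed in this thread (in GB, as given to --with-segsize).

```python
GB = 1024 ** 3

def segment_count(relation_bytes: int, segsize_gb: float) -> int:
    """Number of segment files needed to hold relation_bytes."""
    seg_bytes = int(segsize_gb * GB)
    # Integer ceiling division: a partly filled last segment still needs a file.
    return -(-relation_bytes // seg_bytes)

# Hypothetical 100 GB relation (assumption for illustration only).
relation = 100 * GB
counts = {seg: segment_count(relation, seg) for seg in (1.0, 0.25, 0.03125)}

for seg, n in counts.items():
    print(f"segsize={seg} GB -> {n} segment files")
```

Under this assumption, going from 1 GB to 32 MB segments multiplies the file count by 32, and each of those files needs its own open()/close() and metadata.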
* Performance result in DBT-2 (WH340)
                                | NOTPM    90%tile    Average  Maximum
--------------------------------+-----------------------------------
original_0.7 (baseline)         | 3474.62  18.348328  5.739    36.977713
fsync + write                   | 3586.85  14.459486  4.960    27.266958
fsync + write + segsize=0.25    | 3661.17  8.28816    4.117    17.23191
fsync + write + segsize=0.03125 | 3309.99  10.851245  6.759    19.500598
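To make the comparison with the baseline easier to read, the NOTPM column can be turned into a relative change. This is only a small sketch using the numbers from the table above; the function name is just for illustration.

```python
# NOTPM values copied from the DBT-2 (WH340) results in this mail.
baseline = 3474.62  # original_0.7
results = {
    "fsync + write": 3586.85,
    "fsync + write + segsize=0.25": 3661.17,
    "fsync + write + segsize=0.03125": 3309.99,
}

def relative_change(notpm: float, base: float) -> float:
    """Percentage change of notpm relative to base."""
    return (notpm - base) / base * 100.0

changes = {name: relative_change(v, baseline) for name, v in results.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

So segsize=0.25 gives roughly 5% more NOTPM than the baseline, while segsize=0.03125 loses roughly 5%.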
(2013/07/04 22:05), Andres Freund wrote:
> 1) it breaks pg_upgrade. Which means many of the bigger users won't be
> able to migrate to this and most packagers would carry the old
> segsize around forever.
> Even if we could get pg_upgrade to split files accordingly link mode
> would still be broken.
I think that pg_upgrade is one of the contrib modules, not part of the main
implementation of Postgres. So a contrib module should not stand in the way of
improving the main implementation. pg_upgrade users might share the same opinion.
> 2) It drastically increases the amount of file handles necessary and by
> extension increases the amount of open/close calls. Those aren't all
> that cheap. And it increases metadata traffic since mtime/atime are
> kept for more files. Also, file creation is rather expensive since it
> requires metadata transaction on the filesystem level.
My test result seems to show this problem. However, my test did not use separate
directories under base/. I'm not sure which way is best. If you have time to
create a patch, please send it to us, and I will test it with DBT-2.
Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center