Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: checkpointer continuous flushing
Msg-id: alpine.DEB.2.10.1509050740290.429@sto
In response to: Re: checkpointer continuous flushing (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hello Amit,

>> Woops.... 14 GB/s and 1.2 GB/s?! Is this a *hard* disk??
>
> Yes, there is no SSD in system. I have confirmed the same. There are
> RAID spinning drives.

Ok... I guess that there is some kind of cache explaining these great tps
figures, probably on the RAID controller. What does "lspci" say? Does
"hdparm" suggest that the write cache is enabled? It would be fine if the
I/O system has a BBU, but that could also hide some of the patch
benefits...

A tentative explanation for the similar figures with and without sorting
could be that, depending on the controller cache size (maybe 1 GB or more)
and firmware, the I/O system reorders disk writes so that they are
basically sequential, and the fact that pg sorts them beforehand has
little or no impact. This may also be helped by the fact that buffers are
not really in random order to begin with, as the warmup phase does an
initial "select stuff from table". There could be other possible factors
such as file system details, "WAFL" hacks... the tricks are endless:-)

Checking for the right explanation would involve removing the
unconditional select warmup in favor of a long and random warmup, probably
trying a database much larger than the cache, and/or disabling the write
cache, and reading the hardware documentation in detail... But this is
also a lot of bother and time. Maybe the simplest approach would be to
disable the write cache for the test. Is that possible?

>> Woops, 1.6 GB/s write... same questions, "rotating plates"??
>
> One thing to notice is that if I don't remove the output file
> (output.img) the speed is much slower, see the below output. I think
> this means in our case we will get ~320 MB/s

I would say that the OS was doing something here, and 320 MB/s looks more
like the actual sequential write performance of an HDD RAID system.
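The checks above could be scripted roughly as follows. This is only a sketch: /dev/sda is a placeholder for the actual device, hdparm needs root (and may not see through a RAID controller), and conv=fdatasync is used so dd reports a rate that includes flushing to disk rather than just filling the page cache:

```shell
# Show the RAID controller, if any, on the PCI bus.
lspci | grep -i raid || true

# Report the drive write-cache state; requires root, and /dev/sda is a
# placeholder for the real device behind the RAID controller.
hdparm -W /dev/sda 2>/dev/null || true
# hdparm -W 0 /dev/sda    # disable the write cache for the test

# Measure sequential write throughput. conv=fdatasync forces the data to
# stable storage before dd prints its rate, and removing output.img
# between runs avoids measuring an overwrite of an existing file (the
# effect Amit observed above).
rm -f output.img
dd if=/dev/zero of=output.img bs=1M count=64 conv=fdatasync
rm -f output.img
```

Comparing the dd figure with and without the write cache enabled should show quickly whether the controller cache explains the 1.6 GB/s number.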
>> If these are SSD, or if there is some SSD cache on top of the HDD, I
>> would not expect the patch to do much, because SSD random I/O writes
>> are pretty comparable to sequential I/O writes.
>>
>> I would be curious whether flushing helps, though.
>
> Yes, me too. I think we should try to reach on consensus for exact
> scenarios and configuration where this patch('es) can give benefit or we
> want to verify if there is any regression as I have access to this m/c
> for a very-very limited time. This m/c might get formatted soon for
> some other purpose.

Yep, it would be great if you have time for a flush test before it
disappears... I think it is advisable to disable the write cache, as it
may also hide the impact of flushing.

>> So whether the database fits in 8 GB shared buffer during the 2 hours
>> of the pgbench run is an open question.
>
> With this kind of configuration, I have noticed that more than 80%
> of updates are HOT updates, not much bloat, so I think it won't
> cross 8GB limit, but still I can keep it to 32GB if you have any doubts.

The problem with performance tests is that you want to test one thing,
but many factors intervene, and you may end up testing something else,
such as lock contention or the process scheduler or whatever, rather than
what you were trying to put in evidence. So I would suggest staying on
the safe side and using the larger value.

-- 
Fabien.
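P.S. The HOT-update ratio mentioned above can be read from pg_stat_user_tables, which exposes n_tup_upd and n_tup_hot_upd per table. A hypothetical one-liner, assuming a reachable server and the standard pgbench table names:

```shell
# Fraction of updates that were HOT on pgbench_accounts; a hot_pct close
# to 100 means little bloat, so the database should stay near its
# initial size during the run.
QUERY="SELECT relname, n_tup_upd, n_tup_hot_upd,
              round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1)
                AS hot_pct
       FROM pg_stat_user_tables
       WHERE relname = 'pgbench_accounts'"

# Only runs when psql is available; harmless otherwise. The database
# name "pgbench" is an assumption.
command -v psql >/dev/null && psql -d pgbench -Atc "$QUERY" || true
```

Running this at the end of the 2-hour run would confirm (or not) the >80% figure on that machine.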