postgresql latency & bgwriter not doing its job - Mailing list pgsql-hackers
From | Fabien COELHO
---|---
Subject | postgresql latency & bgwriter not doing its job
Date |
Msg-id | alpine.DEB.2.10.1408251900211.11151@sto
Responses | Re: postgresql latency & bgwriter not doing its job (4 responses)
List | pgsql-hackers
Hello pgdevs,

I've been playing with pg for some time now to try to reduce the maximum latency of simple requests, so as to have a responsive server under small to medium load.

On an old computer with a software RAID5 HDD attached, a pgbench simple-update script run for some time (scale 100, fillfactor 95)

  pgbench -M prepared -N -c 2 -T 500 -P 1 ...

gives 300 tps. However, this performance is really +1000 tps for a few seconds followed by 16 seconds at about 0 tps for the checkpoint-induced IO storm. The server is totally unresponsive 75% of the time. That's bandwidth optimization for you. Hmmm... why not.

Now, given this setup, if pgbench is throttled at 50 tps (1/6 of the above max):

  pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ...

the same thing more or less happens, in a delayed fashion... You get 50 tps for some time, followed by stretches of 15 seconds at 0 tps for the checkpoint when the segments are full... The server is unresponsive about 10% of the time (one in ten transactions is late by more than 200 ms). This is not satisfactory: pg should be able to handle that load easily.

The culprit I found is "bgwriter", which is basically doing nothing to prevent the coming checkpoint IO storm, even though there would be ample time to write the accumulating dirty pages so that the checkpoint would find a clean field and pass in a blink. Indeed, at the end of the 500-second throttled test, "pg_stat_bgwriter" says:

  buffers_checkpoint = 19046
  buffers_clean = 2995

which suggests that bgwriter took it upon itself to write only 6 pages per second, where at least 50 would have been needed for the load, and could have been handled by the hardware without any problem.

I have not found any means to force bgwriter to issue writes when it can. (Well, I have: create a process which sends "CHECKPOINT" every 0.2 seconds... it works more or less, but this is not my point :-)

The bgwriter control parameters allow setting the maximum number of pages (bgwriter_lru_maxpages) written per round (bgwriter_delay), and a multiplier (bgwriter_lru_multiplier) which drives a heuristic estimating how many pages will soon be needed, so as to make them available. This may be nice in some settings, but it is not adapted to the write-oriented OLTP load tested with pgbench. The problem is that with an update load on a database that fits in memory there is not much demand for "new" pages, even though pages are being dirtied (about 50 per second), so the heuristic decides not to write much. The net result of all this cleverness is that when the checkpoint arrives, several thousand pages have to be written and the server is offline for some time.

ISTM that bgwriter lacks at least some "min pages" setting: it would be induced to write at least this many pages per round if it can. That would be a start. A better feature would be for it to adapt itself so as to take advantage of the available IOPS, depending on the load induced by other tasks (vacuum, queries...), in a preventive manner, so as to avoid delaying what can be done right now under a small load, and thus avoid later IO storms. This would suggest a setting stating the expected IOPS capability of the underlying hardware, just as some settings already state the expected available memory.

Note that this preventive approach could also improve the "bandwidth" measure: currently, when pgbench is running at maximum speed before the checkpoint storm, nothing is written to disk but WAL, although some dirty pages could probably be written as well. When the checkpoint arrives, fewer pages would need to be written, so the storm would be shorter.
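For reference, here is a minimal sketch of the test, the statistics query, and the crude "CHECKPOINT every 0.2 seconds" workaround mentioned above (the database name "bench" is only for illustration):

  # initialize: scale 100, fillfactor 95
  pgbench -i -s 100 -F 95 bench

  # throttled simple-update run: 50 tps, 500 seconds, per-second progress
  pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 bench

  # what bgwriter actually did during the run
  psql -c "SELECT buffers_checkpoint, buffers_clean FROM pg_stat_bgwriter" bench

  # crude workaround: force frequent checkpoints so each one has little to write
  # (works more or less, but it is not a real fix)
  while true; do psql -c "CHECKPOINT" bench; sleep 0.2; done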
Any thoughts on this latency issue? Am I missing something?

-- Fabien.