Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: checkpointer continuous flushing |
Date | |
Msg-id | alpine.DEB.2.10.1506240628160.3535@sto |
In response to | Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses | Re: checkpointer continuous flushing |
List | pgsql-hackers |
>> Besides, causing additional cacheline bouncing during the
>> sorting process is a bad idea.
>
> Hmmm. The impact would be to multiply the memory required by 3 or 4 (buf_id,
> relation, forknum, offset), instead of just buf_id, and I understood that
> memory was a concern.
>
> Moreover, once the sort process get the lines which contain the sorting data
> from the buffer descriptor in its cache, I think that it should be pretty
> much okay. Incidentally, they would probably have been brought to cache by
> the scan to collect them. Also, I do not think that the sorting time for
> 128000 buffers, and possible cache misses, was a big issue, but I do not have
> a measure to defend that. I could try to collect some data about that.

I've collected some data by adding a "sort time" measure, with
checkpoint_sort_size=10000000 so that sorting is done in one chunk, and run
some large checkpoints:

  LOG: checkpoint complete: wrote 41091 buffers (6.3%);
    0 transaction log file(s) added, 0 removed, 0 recycled;
    sort=0.024s, write=0.488 s, sync=8.790 s, total=9.837 s;
    sync files=41, longest=8.717 s, average=0.214 s;
    distance=404972kB, estimate=404972 kB

  LOG: checkpoint complete: wrote 212124 buffers (32.4%);
    0 transaction log file(s) added, 0 removed, 0 recycled;
    sort=0.078s, write=128.885 s, sync=1.269 s, total=131.646 s;
    sync files=43, longest=1.155 s, average=0.029 s;
    distance=2102950kB, estimate=2102950 kB

  LOG: checkpoint complete: wrote 384427 buffers (36.7%);
    0 transaction log file(s) added, 0 removed, 1 recycled;
    sort=0.120s, write=83.995 s, sync=13.944 s, total=98.035 s;
    sync files=9, longest=13.724 s, average=1.549 s;
    distance=3783305kB, estimate=3783305 kB

  LOG: checkpoint complete: wrote 809211 buffers (77.2%);
    0 transaction log file(s) added, 0 removed, 1 recycled;
    sort=0.358s, write=138.146 s, sync=14.943 s, total=153.124 s;
    sync files=13, longest=14.871 s, average=1.149 s;
    distance=8075338kB, estimate=8075338 kB

Summary of these checkpoints:

  #buffers   size    sort (s)
     41091   328MB   0.024
    212124   1.7GB   0.078
    384427   2.9GB   0.120
    809211   6.2GB   0.358

Sort times are pretty negligible compared to the whole checkpoint time,
and under 0.1 s/GB of buffers sorted.

On a 512 GB server with shared_buffers=128GB (25%), this suggests a worst
case checkpoint sort of a few seconds, and then you have a hundred GB to
write anyway. If we project a 1 TB checkpoint for the next decade, sorting
would take under a minute... But then you have 1 TB of data to dump.

As a comparison point, I've done the large checkpoint with the default
sort size of 131072:

  LOG: checkpoint complete: wrote 809211 buffers (77.2%);
    0 transaction log file(s) added, 0 removed, 1 recycled;
    sort=0.251s, write=152.377 s, sync=15.062 s, total=167.453 s;
    sync files=13, longest=14.974 s, average=1.158 s;
    distance=8075338kB, estimate=8075338 kB

The 0.251 s sort time is to be compared to 0.358 s for the single-chunk
sort. Well, n.log(n) is not too bad, as expected.

These figures suggest that sorting time and the associated cache misses are
not a significant issue and thus are not worth bothering much about, and
also that a simple boolean option would probably be quite acceptable instead
of the chunk approach.

Attached is an updated version of the patch which turns the sort option
into a boolean, and also includes the sort time in the checkpoint log.

There is still an open question about whether the sorting buffer allocation
is lost on some signals and should be reallocated in such an event.

--
Fabien.
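For anyone wanting to re-check the "under 0.1 s/GB" figure, here is a rough sketch of the arithmetic in Python. The buffer counts and sort times are copied from the four single-chunk log lines; the 8 kB block size is an assumption (the PostgreSQL default), sizes are in binary GiB, and the function names are only illustrative. The projection is linear from the largest checkpoint and ignores the log(n) factor, so it is only an order-of-magnitude guide.

```python
# (buffers written, sort time in seconds), copied from the checkpoint logs.
CHECKPOINTS = [
    (41091, 0.024),
    (212124, 0.078),
    (384427, 0.120),
    (809211, 0.358),
]

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size, in bytes
GIB = 1024 ** 3


def buffers_to_gib(n_buffers, block_size=BLOCK_SIZE):
    """Size in GiB of n_buffers pages of block_size bytes each."""
    return n_buffers * block_size / GIB


def sort_rate(n_buffers, sort_seconds):
    """Observed sort cost in seconds per GiB of buffers sorted."""
    return sort_seconds / buffers_to_gib(n_buffers)


def projected_sort_seconds(size_gib, rate):
    """Linear projection; ignores the log(n) factor of the sort."""
    return size_gib * rate


rates = [sort_rate(n, t) for n, t in CHECKPOINTS]

for (n, t), r in zip(CHECKPOINTS, rates):
    print(f"{n:>7} buffers  {buffers_to_gib(n):5.2f} GiB  "
          f"sort={t:.3f}s  {r:.3f} s/GiB")

# Project from the largest checkpoint, the closest in scale to the targets.
largest_rate = rates[-1]
print(f"128 GiB: about {projected_sort_seconds(128, largest_rate):.1f} s")
print(f"1 TiB:   about {projected_sort_seconds(1024, largest_rate):.1f} s")
```

The per-GiB rate varies between runs, so the projected values depend on which checkpoint is taken as the reference.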