Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Msg-id alpine.DEB.2.10.1506240628160.3535@sto
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
>> Besides, causing additional cacheline bouncing during the
>> sorting process is a bad idea.
>
> Hmmm. The impact would be to multiply the memory required by 3 or 4 (buf_id, 
> relation, forknum, offset), instead of just buf_id, and I understood that 
> memory was a concern.
>
> Moreover, once the sort process gets the lines which contain the sorting data 
> from the buffer descriptor into its cache, I think that it should be pretty 
> much okay. Incidentally, they would probably have been brought into the cache 
> by the scan that collects them. Also, I do not think that the sorting time for 
> 128000 buffers, and possible cache misses, was a big issue, but I do not have 
> a measure to defend that. I could try to collect some data about that.
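To make the memory-versus-indirection tradeoff above concrete, here is a minimal Python sketch, assuming a hypothetical entry layout with the four fields named in the quote (buf_id, relation, forknum, offset) and a hypothetical list standing in for the shared buffer descriptors. Option A sorts buffer ids only and reads keys through the descriptor array on every comparison; option B first copies each key next to its id and sorts that private array. Python cannot show cache behaviour, so this only illustrates the per-entry memory ratio and that both options yield the same order.

```python
import ctypes
import random
from functools import cmp_to_key

# Hypothetical sort entry holding the key next to the buffer id. Field names
# follow the thread (buf_id, relation, forknum, offset); this is not the
# layout of any real PostgreSQL struct.
class SortItem(ctypes.Structure):
    _fields_ = [
        ("buf_id", ctypes.c_int32),
        ("relation", ctypes.c_uint32),
        ("forknum", ctypes.c_int32),
        ("offset", ctypes.c_uint32),
    ]

id_only = ctypes.sizeof(ctypes.c_int32)   # option A: ids only
with_key = ctypes.sizeof(SortItem)        # option B: id + copied key
print(f"per entry: {id_only} vs {with_key} bytes ({with_key // id_only}x)")

# Hypothetical "shared" descriptor array: buf_id -> (relation, forknum, offset).
random.seed(42)
descriptors = [(random.randrange(40), 0, random.randrange(10_000))
               for _ in range(10_000)]
dirty = random.sample(range(len(descriptors)), 5_000)

def cmp_via_descriptors(a, b):
    # Option A: each comparison reads both keys through the descriptor array,
    # which is the access pattern behind the cacheline concern.
    ka, kb = descriptors[a], descriptors[b]
    if ka != kb:
        return -1 if ka < kb else 1
    return (a > b) - (a < b)  # tie-break on buf_id for a total order

order_a = sorted(dirty, key=cmp_to_key(cmp_via_descriptors))

# Option B: copy each key once into a private list, then sort only that.
items = sorted((descriptors[b], b) for b in dirty)
order_b = [b for _, b in items]

print("same order:", order_a == order_b)
```

With four 32-bit fields the copied-key entry is 16 bytes against 4 for a bare id, i.e. the "multiply by 4" upper end of the estimate quoted above.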

I've collected some data by adding a "sort time" measure, with 
checkpoint_sort_size=10000000 so that sorting is done in one chunk, and 
ran some large checkpoints:

LOG:  checkpoint complete: wrote 41091 buffers (6.3%);  0 transaction log file(s) added, 0 removed, 0 recycled;
sort=0.024s, write=0.488 s, sync=8.790 s, total=9.837 s;  sync files=41, longest=8.717 s, average=0.214 s;
distance=404972kB, estimate=404972 kB
 

LOG:  checkpoint complete: wrote 212124 buffers (32.4%);  0 transaction log file(s) added, 0 removed, 0 recycled;
sort=0.078s, write=128.885 s, sync=1.269 s, total=131.646 s;  sync files=43, longest=1.155 s, average=0.029 s;
distance=2102950kB, estimate=2102950 kB
 

LOG:  checkpoint complete: wrote 384427 buffers (36.7%);  0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.120s, write=83.995 s, sync=13.944 s, total=98.035 s;  sync files=9, longest=13.724 s, average=1.549 s;
distance=3783305kB, estimate=3783305 kB
 

LOG:  checkpoint complete: wrote 809211 buffers (77.2%);  0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.358s, write=138.146 s, sync=14.943 s, total=153.124 s;  sync files=13, longest=14.871 s, average=1.149 s;
distance=8075338kB, estimate=8075338 kB
 

Summary of these checkpoints:

  #buffers   size    sort (s)
     41091   328MB   0.024
    212124   1.7GB   0.078
    384427   2.9GB   0.120
    809211   6.2GB   0.358

Sort times are negligible compared to the whole checkpoint time, 
and stay under 0.1 s per GB of buffers sorted.
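Both claims can be checked directly against the four log lines above. This small Python sketch simply recomputes, for each checkpoint, the sort time per GB of buffers and the sort time as a fraction of the total checkpoint time; the numbers are copied from the logs and the summary, nothing else is assumed.

```python
# (buffers, size in GB as summarized, sort seconds, total checkpoint seconds),
# copied from the four checkpoint log lines above.
checkpoints = [
    (41091, 0.328, 0.024, 9.837),
    (212124, 1.7, 0.078, 131.646),
    (384427, 2.9, 0.120, 98.035),
    (809211, 6.2, 0.358, 153.124),
]

rates = []
shares = []
for buffers, size_gb, sort_s, total_s in checkpoints:
    rate = sort_s / size_gb    # seconds of sorting per GB of buffers
    share = sort_s / total_s   # fraction of the whole checkpoint spent sorting
    rates.append(rate)
    shares.append(share)
    print(f"{buffers:>7} buffers: {rate:.3f} s/GB, "
          f"{100 * share:.2f}% of checkpoint")
```

In every case sorting takes well under 1% of the checkpoint.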

On a 512 GB server with shared_buffers=128GB (25%), this suggests that 
worst-case checkpoint sorting would take a few seconds, and then you have 
a hundred GB to write anyway. Projecting to a 1 TB checkpoint in the next 
decade, sorting would take under a minute... But then you have 1 TB of 
data to dump.
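The projection above can be reproduced with a back-of-the-envelope calculation. This Python sketch assumes a purely linear extrapolation from the largest measured checkpoint (809211 buffers, 6.2 GB, sorted in 0.358 s); the choice of that data point as the reference rate is an assumption of the sketch.

```python
# Linear extrapolation from the largest measured checkpoint above
# (809211 buffers, 6.2 GB, sorted in 0.358 s). Assumes sort time grows
# linearly with the amount of buffers sorted.
rate_s_per_gb = 0.358 / 6.2

projections = {
    label: rate_s_per_gb * gb
    for label, gb in [("128 GB shared_buffers", 128), ("1 TB", 1024)]
}
for label, seconds in projections.items():
    print(f"{label}: ~{seconds:.0f} s of sorting")
```

Since comparison sorting grows as n log n rather than linearly, the real figure for arrays this large would be somewhat higher than this estimate.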

As a comparison point, I ran the large checkpoint again with the default 
sort size of 131072:

LOG:  checkpoint complete: wrote 809211 buffers (77.2%);  0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.251s, write=152.377 s, sync=15.062 s, total=167.453 s;  sync files=13, longest=14.974 s, average=1.158 s;
distance=8075338kB, estimate=8075338 kB
 

The 0.251 s sort time is to be compared with 0.358 s. Well, n.log(n) is 
not too bad, as expected.
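As a rough check of the n.log(n) remark, this Python sketch compares an idealized cost model for sorting all 809211 buffers at once against sorting them in independent chunks of 131072. It assumes that the chunked mode sorts each chunk separately (as the "one chunk" setting above implies) and counts only n.log2(n) comparisons, ignoring constant factors and cache effects.

```python
import math

def nlogn(n):
    """Idealized comparison-sort cost for n items."""
    return n * math.log2(n) if n > 1 else 0.0

def chunked_cost(n, chunk):
    """Cost of sorting n items as independent chunks of at most `chunk` items."""
    full, rest = divmod(n, chunk)
    return full * nlogn(chunk) + nlogn(rest)

n = 809211
one_chunk = nlogn(n)
chunked = chunked_cost(n, 131072)
model_ratio = one_chunk / chunked
measured_ratio = 0.358 / 0.251
print(f"model ratio {model_ratio:.2f}, measured {measured_ratio:.2f}")
```

The model predicts the single sort to be roughly 16% more expensive; the measured ratio is higher, which this model does not account for since it ignores constants such as memory access costs.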


These figures suggest that sorting time and the associated cache misses 
are not a significant issue and not worth worrying much about, and also 
that a simple boolean option would probably be quite acceptable instead 
of the chunk approach.

Attached is an updated version of the patch, which turns the sort option 
into a boolean and also includes the sort time in the checkpoint log.

There is still an open question about whether the sorting buffer 
allocation may be lost on some signals and should be reallocated in such 
an event.

-- 
Fabien.
