Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Date
Msg-id alpine.DEB.2.10.1510190834250.30391@sto
Whole thread Raw
In response to Re: checkpointer continuous flushing  (Andres Freund <andres@anarazel.de>)
Responses Re: checkpointer continuous flushing  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

>> Here is a v13, which is just a rebase after 1aba62ec.
>
> I'm working on this patch, to get it into a state I think it'd be
> commitable.

I'll review it carefully. Also, if you can include some performance 
feature it would help, even if I'll do some more runs.

> In my performance testing it showed that calling PerformFileFlush() only 
> at segment boundaries and in CheckpointWriteDelay() can lead to rather 
> spikey IO - not that surprisingly. The sync in CheckpointWriteDelay() is 
> problematic because it only is triggered while on schedule, and not when 
> behind.

When behind, the PerformFileFlush should be called on segment boundaries.
The idea was not to go to sleep without flushing, and to do it as little 
as possible.

> My testing seems to show that just adding a limit of 32 buffers to
> FileAsynchronousFlush() leads to markedly better results.

Hmmm. 32 buffers means 256 KB, which is quite small. Not sure what a good 
"limit" would be. It could depend whether pages are close or not.

> I wonder if mmap() && msync(MS_ASYNC) isn't a better replacement for
> sync_file_range(SYNC_FILE_RANGE_WRITE) than posix_fadvise(DONTNEED). It
> might even be possible to later approximate that on windows using
> FlushViewOfFile().

I'm not sure that mmap/msync can be used for this purpose, because there 
is no real control it seems about where the file is mmapped.

> As far as I can see the while (nb_spaces != 0)/NextBufferToWrite() logic 
> doesn't work correctly if tablespaces aren't actually sorted. I'm 
> actually inclined to fix this by simply removing the flag to 
> enable/disable sorting.

I do no think that there is a significant downside to having sort always 
on, but showing it requires to be able to test, so to have a guc. The 
point of the guc is to demonstrate that the feature is harmless:-)

> Having defined(HAVE_SYNC_FILE_RANGE) || defined(HAVE_POSIX_FADVISE) in
> so many places looks ugly, I want to push that to the underlying
> functions. If we add a different flushing approach we shouldn't have to
> touch several places that don't actually really care.

I agree that it is pretty ugly, but I do not think that you can remove 
them all. You need at least one for checking the guc and one for enabling 
the feature. Maybe their number could be reduced if the functions are 
switched to do-nothing stubs which are called nevertheless, but I was not 
keen on letting unused code when there is no sync_file_range nor 
posix_fadvise.

> I've replaced the NextBufferToWrite() logic with a binaryheap.h heap -
> seems to work well, with a bit less code actually.

Hmmm. I'll check. I'm still unconvinced that using a tree for a 2-3 
element set in most case is an improvement.

> I'll post this after some more cleanup & testing.

I'll have a look when it is ready.

> I've also noticed that sleeping logic in CheckpointWriteDelay() isn't
> particularly good. In high throughput workloads the 100ms sleep is too
> long, leading to bursty IO behaviour. If 1k+ buffers a written out a
> second 100ms is a rather long sleep. For another that we only sleep
> 100ms when the write rate is low makes the checkpoint finish rather
> quickly - on a slow disk (say microsd) that can cause unneccesary
> slowdowns for concurrent activity.  ISTM we should calculate the sleep
> time in a better way.

I also noted this point, but I'm not sure how to have a better approach, 
so I let it as it is. I tried 50 ms & 200 ms on some runs, without 
significant effect on performance for the test I ran then. The point of 
having not too small a value is that it provide some significant work to 
the IO subsystem without overflowing it. On average it does not matter. 
I'm unsure how it would interact with flushing. So I decided not to do 
anything about it. Maybe it should be a guc, but I would not know how to 
choose it.

> The SIGHUP behaviour is also weird.  Anyway, this probably belongs on a 
> new thread.

Probably. I did not try to look at that.

-- 
Fabien.



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [PATCH] SQL function to report log message
Next
From: Fabien COELHO
Date:
Subject: Re: Checkpoint throttling issues