Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From: Fabien COELHO
Subject: Re: checkpointer continuous flushing
Date:
Msg-id: alpine.DEB.2.10.1601072154420.24114@sto
In response to: Re: checkpointer continuous flushing (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hello Andres,

>> Hmmm. What I understood is that the workloads that have some performance
>> regressions (regressions that I have *not* seen in the many tests I ran) are
>> not due to checkpointer IOs, but rather occur in settings where most of the
>> writes are done by backends or bgwriter.
>
> As far as I can see you've not run many tests where the hot/warm data
> set is larger than memory (the full machine's memory, not
> shared_buffers).

Indeed, I think I ran some, but not many with such characteristics.

> That quite drastically alters the performance characteristics here, 
> because you suddenly have lots of synchronous read IO thrown into the 
> mix.

If I understand this point correctly...

I would expect the overall performance to be abysmal in such a situation,
because you get only intermixed *random* reads and writes: as you point out,
the synchronous *random* reads are very slow, and on the write side the
checkpointer's IOs are mostly random as well, because there is not much to
aggregate into sequential writes.

Now, why would that degrade performance significantly? To my mind it should
just render the sorting/flushing less and less effective, so that performance
would go back to the previous (unpatched) levels...

Or maybe it is only the flushing itself which degrades performance, as you
point out, because then you have some synchronous (synced) writes in addition
to the reads, as opposed to just the reads without the patch.

If this is indeed the issue, then the solution to avoid the regression is
*not* to flush, so that the OS IO scheduler is less constrained in its job
and can be slightly more effective (well, we are talking about abysmal random
IO disk performance here, so "effective" only means somewhere between
slightly more and slightly less very, very bad).

Maybe a trick could be not to aggregate and flush when buffers in the same
file are too far apart anyway, for instance based on some threshold? This can
be implemented locally, when deciding whether to merge buffer flushes and
whether to flush at all, so it would fit the current code quite simply.
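
To make that concrete, here is a rough sketch of the kind of local decision I
have in mind. All names (PendingFlush, FLUSH_GAP_THRESHOLD, add_block_to_flush)
are invented for this mail, the threshold value is just a placeholder, and this
is not the patch code:

/*
 * Rough sketch only: names and threshold are made up for illustration.
 * Blocks are assumed to arrive in sorted order, and the caller
 * initialises pf->fd to -1 before the first call.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>

typedef struct PendingFlush
{
    int     fd;         /* file of the accumulated range, -1 if none */
    off_t   start;      /* first byte of the range */
    off_t   end;        /* one past the last byte */
} PendingFlush;

/* do not merge blocks that are further apart than this many bytes */
#define FLUSH_GAP_THRESHOLD (16 * 8192)

static void
add_block_to_flush(PendingFlush *pf, int fd, off_t blk_start, off_t blk_len)
{
    if (pf->fd == fd && blk_start - pf->end <= FLUSH_GAP_THRESHOLD)
    {
        /* close enough: simply extend the pending range */
        pf->end = blk_start + blk_len;
        return;
    }

    if (pf->fd >= 0)
    {
        /*
         * The next block is far away, or in another file: issue the
         * accumulated range now (or, for very scattered ranges, one
         * could decide to skip the hint and leave those writes to the
         * OS scheduler).
         */
        (void) sync_file_range(pf->fd, pf->start, pf->end - pf->start,
                               SYNC_FILE_RANGE_WRITE);
    }

    /* start a new range with the current block */
    pf->fd = fd;
    pf->start = blk_start;
    pf->end = blk_start + blk_len;
}

The point is only that the decision stays local to the place where flush
ranges are merged, so nothing else would need restructuring.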

Now, my understanding of the sync_file_range call is that it is advice to
flush the given range, but that it is still asynchronous in nature, so how
badly it impacts performance depends on the OS IO scheduler.
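
To be precise about what I mean by "advice", here is a minimal illustration
of the two ways the call can be used on Linux; the function names are
invented for this mail, this is not the patch code:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>

static int
flush_range_hint(int fd, off_t offset, off_t nbytes)
{
    /*
     * SYNC_FILE_RANGE_WRITE only *starts* write-out of the dirty pages
     * in the range and normally returns without waiting for them to
     * reach the disk, so it is essentially a hint: the kernel IO
     * scheduler still decides when the actual writes happen.
     */
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

static int
flush_range_wait(int fd, off_t offset, off_t nbytes)
{
    /*
     * Adding the WAIT_BEFORE/WAIT_AFTER bits makes the call block until
     * the pages are written, which is much closer to a synchronous
     * write. Note that neither form flushes file metadata, so this is
     * not a replacement for the final fsync of the checkpoint.
     */
    return sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}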

Also, I would like to check whether, under the "regressed performance" (in
tps terms) that you observed, pg is more or less responsive. It could be that
the average performance is better, but that pg is offline for longer on
fsync. In that case I would consider it better to have a lower tps in such
cases *if* pg responsiveness is significantly improved.

Would you have these measures for the regression runs you observed?

> Whether it's bgwriter or not I've not fully been able to establish, but
> it's a working theory.

OK, that is something to check, so as to confirm or refute it.

Given the above discussion, I think my suggestion may be wrong: since the tps
is low because of the random read/write accesses, not many buffers are
modified (so the bgwriter/backends won't need to make space), and the
checkpointer does not have much to write (good), *but* all of it is random
(bad).

>> I do not see the point of rewriting the checkpointer for them, although
>> obviously I agree that something has to be done also for the other
>> processes.
>
> Rewriting the checkpointer and fixing the flush interface in a more
> generic way aren't the same thing at all.

Hmmm, probably I misunderstood something in the discussion. It started with
an implementation strategy but drifted into discussing a performance
regression. I agree that these are two different subjects.

-- 
Fabien.


