Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: checkpointer continuous flushing
Date
Msg-id alpine.DEB.2.10.1511121851510.20444@sto
In response to Re: checkpointer continuous flushing  (Andres Freund <andres@anarazel.de>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Hello,

>> Basically yes, I'm suggesting a mutex in the VFD struct.
>
> I can't see that being ok. I mean what would that thing even do? VFD
> isn't shared between processes, and if we get a smgr flush we have to
> apply it, or risk breaking other things.

Something is probably eluding my comprehension :-)

My basic assumption is that the fopen & fd are per process, so we just have 
to deal with the ones in the checkpointer process. Would it not be enough 
that the checkpointer does not close the file while it is flushing to it?
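
To make the idea concrete, here is a minimal sketch of what I mean, in Python 
rather than C for brevity. The class and method names are hypothetical and 
are not PostgreSQL's VFD API. It only assumes a per-process file entry that 
remembers a close request arriving mid-flush and applies it once the flush 
is done, so the request is deferred rather than dropped.

```python
class ProcessLocalFile:
    """Hypothetical per-process file entry (not PostgreSQL's VFD API)."""

    def __init__(self, name):
        self.name = name
        self.is_open = True
        self.flush_in_progress = False
        self.close_requested = False
        self.flushed_blocks = []

    def begin_flush(self):
        if not self.is_open:
            raise ValueError("cannot flush a closed file")
        self.flush_in_progress = True

    def flush_block(self, block):
        if not (self.is_open and self.flush_in_progress):
            raise RuntimeError("flush_block called outside an active flush")
        self.flushed_blocks.append(block)

    def end_flush(self):
        self.flush_in_progress = False
        if self.close_requested:
            # Apply the close that was deferred while the flush was running.
            self.is_open = False
            self.close_requested = False

    def request_close(self):
        """Close now if idle; otherwise defer. Returns True if closed now."""
        if self.flush_in_progress:
            self.close_requested = True
            return False
        self.is_open = False
        return True
```

Since everything lives in one process, a plain flag is enough in this sketch; 
whether that is acceptable for the smgr flush requests is precisely the open 
question.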

>>> * my laptop, 16 GB Ram, 840 EVO 1TB as storage. With 2GB
>>> shared_buffers. Tried checkpoint timeouts from 60 to 300s.
>>
>> Hmmm. This is quite short.
>
> Indeed. I'd never do that in a production scenario myself. But
> nonetheless it showcases a problem.

I would say that it renders sorting ineffective, because all the 
rewriting is done by bgwriter or workers. That does not fully explain 
why the throughput would be worse than before, though: I would expect it 
to be as bad as before...

>>> Well, you can't easily sort bgwriter/backend writes stemming from cache
>>> replacement. Unless your access patterns are entirely sequential the
>>> data in shared buffers will be laid out in a nearly entirely random
>>> order.  We could try sorting the data, but with any reasonable window,
>>> for many workloads the likelihood of actually achieving much with that
>>> seems low.
>>
>> Maybe the sorting could be shared with others so that everybody uses the
>> same order?
>>
>> That would suggest having one global sorting of buffers, maybe maintained
>> by the checkpointer, which could be used by all processes that need to scan
>> the buffers (in file order), instead of scanning them in memory order.
>
> Uh. Cache replacement is based on an approximated LRU, you can't just
> remove that without serious regressions.

I understand that, but there is a balance to find. Generating random I/Os 
is very bad for performance, so the decision process must combine LRU/LFU 
heuristics with some consideration of write order as well.
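
As a rough illustration of such a combination (a sketch only, not what the 
buffer manager does): pick a batch of the coldest dirty buffers using the 
usual recency criterion, then issue the writes for that batch in file order. 
The `Buffer` tuple, the `last_used` counter and `pick_and_order_writes` are 
assumptions made up for this example.

```python
from typing import NamedTuple


class Buffer(NamedTuple):
    file_id: int
    block: int
    last_used: int  # hypothetical recency counter; smaller means colder


def pick_and_order_writes(dirty, batch_size):
    """Choose the coldest dirty buffers, then order them by file position."""
    # Step 1: the replacement heuristic decides WHICH buffers get written.
    coldest = sorted(dirty, key=lambda b: b.last_used)[:batch_size]
    # Step 2: file order decides in WHAT ORDER they are written.
    return sorted(coldest, key=lambda b: (b.file_id, b.block))
```

The replacement decision is left untouched here; only the order of the 
resulting writes changes. How large a batch must be for the ordering to 
matter is exactly the "reasonable window" concern raised above.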

>>>> Hmmm. The shorter the timeout, the more likely the sorting NOT to be
>>>> effective
>>>
>>> You mean, as evidenced by the results, or is that what you'd actually
>>> expect?
>>
>> What I would expect...
>
> I don't see why then? If you very quickly write lots of data the OS
> will continuously flush dirty data to the disk, in which case sorting is
> rather important?

What I have in mind is: the shorter the timeout, the fewer neighboring 
buffers will be touched, so the fewer nice sequential writes will be found 
by sorting them, so the smaller the positive impact on performance...
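
A toy simulation of this intuition, assuming blocks are dirtied uniformly at 
random (a strong simplification, as real workloads have locality): it 
measures the fraction of dirty blocks whose next block in the file is dirty 
too, i.e. the writes that sorting can turn into sequential ones. The function 
name and the figures used are made up for illustration.

```python
import random


def sequential_fraction(total_blocks, dirty_count, seed=0):
    """Fraction of dirty blocks whose immediate successor is also dirty."""
    rng = random.Random(seed)
    dirty = set(rng.sample(range(total_blocks), dirty_count))
    with_successor = sum(1 for b in dirty if b + 1 in dirty)
    return with_successor / dirty_count


# Fewer dirty blocks stand in for a short timeout, more for a long one.
short_timeout = sequential_fraction(100_000, 1_000)
long_timeout = sequential_fraction(100_000, 50_000)
```

Under this assumption the fraction is roughly the share of blocks that are 
dirty, so it shrinks as fewer blocks are touched between two checkpoints.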

-- 
Fabien.


