Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: checkpointer continuous flushing
Date
Msg-id CAA4eK1J-1OF-dqM1SgJsSpDAqAuHu_5VTARY9gWq4Ht3F=2Cew@mail.gmail.com
Whole thread Raw
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Tue, Jun 2, 2015 at 6:45 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Amit,
>
>> [...]
>>>
>>> The objective is to help avoid PG stalling when fsyncing on checkpoints,
>>> and in general to get better latency-bound performance.
>>
>>
>> Won't this lead to more-unsorted writes (random I/O) as the
>> FlushBuffer requests (by checkpointer or bgwriter) are not sorted as
>> per files or order of blocks on disk?
>
>
> Yep, probably. Under "moderate load" this is not an issue. The io-scheduler and other hd firmware will probably reorder writes anyway. Also, if several data are updated together, probably they are likely to be already neighbours in memory as well as on disk.
>
>> I remember sometime back there was some discusion regarding
>> sorting writes during checkpoint, one idea could be try to
>> check this idea along with that patch.  I just saw that Andres has
>> also given same suggestion which indicates that it is important
>> to see both the things together.
>
>
> I would rather separate them, unless this is a blocker. This version seems already quite effective and very light. ISTM that adding a sort phase would mean reworking significantly how the checkpointer processes pages.
>

I agree with you that if we have to add a sort phase, there is additional
work and that work could be significant depending on the design we
choose, however without that, this patch can have impact on many kind
of workloads, even in your mail in one of the tests
("pgbench -M prepared -N -T 100 -j 2 -c 4 -P 1" over 32 runs (4 clients))
it has shown 20% degradation which is quite significant and test also
seems to be representative of the workload which many users in real-world
will use.

Now one can say that for such workloads turn the new knob to off, but
in reality it could be difficult to predict if the load is always moderate.
I think users might be able to predict that at table level, but inspite of that
I don't think having any such knob can give us ticket to flush the buffers
in random order.

>> Also here another related point is that I think currently even fsync
>> requests are not in order of the files as they are stored on disk so
>> that also might cause random I/O?
>
>
> I think that currently the fsync is on the file handler, so what happens depends on how fsync is implemented by the system.
>

That can also lead to random I/O if the fsync for different files is not in
order as they are actually stored on disk.


>> Yet another idea could be to allow BGWriter to also fsync the dirty
>> buffers,
>
>
> ISTM That it is done with this patch with "bgwriter_flush_to_disk=on".
>

I think patch just issues an async operation not the actual flush.  Why
I have suggested so is that in your tests when the checkpoint_timeout
is small it seems there is a good gain in performance that means if
keep on flushing dirty buffers at regular intervals, the system's performance
is good and BGWriter is the process where that can be done conveniently
apart from checkpoint,  one might think that if same can be achieved by using
shorter checkpoint_timeout interval, then why to do this incremental flushes
by bgwriter, but in reality I think checkpoint is responsible for other things
as well other than dirty buffers, so we can't leave everything till checkpoint
happens.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file
Next
From: Fabien COELHO
Date:
Subject: Re: checkpointer continuous flushing