Home > mailing lists

Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: checkpointer continuous flushing
Date	June 3, 2015 07:46:16
Msg-id	CAA4eK1J-1OF-dqM1SgJsSpDAqAuHu_5VTARY9gWq4Ht3F=2Cew@mail.gmail.com Whole thread Raw
In response to	Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses	Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>)
List	pgsql-hackers

Tree view

On Tue, Jun 2, 2015 at 6:45 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Amit,
>
>> [...]
>>>
>>> The objective is to help avoid PG stalling when fsyncing on checkpoints,
>>> and in general to get better latency-bound performance.
>>
>>
>> Won't this lead to more-unsorted writes (random I/O) as the
>> FlushBuffer requests (by checkpointer or bgwriter) are not sorted as
>> per files or order of blocks on disk?
>
>
> Yep, probably. Under "moderate load" this is not an issue. The io-scheduler and other hd firmware will probably reorder writes anyway. Also, if several data are updated together, probably they are likely to be already neighbours in memory as well as on disk.
>
>> I remember sometime back there was some discusion regarding
>> sorting writes during checkpoint, one idea could be try to
>> check this idea along with that patch. I just saw that Andres has
>> also given same suggestion which indicates that it is important
>> to see both the things together.
>
>
> I would rather separate them, unless this is a blocker. This version seems already quite effective and very light. ISTM that adding a sort phase would mean reworking significantly how the checkpointer processes pages.
>

I agree with you that if we have to add a sort phase, there is additional

work and that work could be significant depending on the design we

choose, however without that, this patch can have impact on many kind

of workloads, even in your mail in one of the tests

("pgbench -M prepared -N -T 100 -j 2 -c 4 -P 1" over 32 runs (4 clients))

it has shown 20% degradation which is quite significant and test also

seems to be representative of the workload which many users in real-world

will use.

Now one can say that for such workloads turn the new knob to off, but

in reality it could be difficult to predict if the load is always moderate.

I think users might be able to predict that at table level, but inspite of that

I don't think having any such knob can give us ticket to flush the buffers

in random order.

>> Also here another related point is that I think currently even fsync
>> requests are not in order of the files as they are stored on disk so
>> that also might cause random I/O?
>
>
> I think that currently the fsync is on the file handler, so what happens depends on how fsync is implemented by the system.
>

That can also lead to random I/O if the fsync for different files is not in

order as they are actually stored on disk.

>> Yet another idea could be to allow BGWriter to also fsync the dirty
>> buffers,
>
>
> ISTM That it is done with this patch with "bgwriter_flush_to_disk=on".
>

I think patch just issues an async operation not the actual flush. Why

I have suggested so is that in your tests when the checkpoint_timeout

is small it seems there is a good gain in performance that means if

keep on flushing dirty buffers at regular intervals, the system's performance

is good and BGWriter is the process where that can be done conveniently

apart from checkpoint, one might think that if same can be achieved by using

shorter checkpoint_timeout interval, then why to do this incremental flushes

by bgwriter, but in reality I think checkpoint is responsible for other things

as well other than dirty buffers, so we can't leave everything till checkpoint

happens.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Amit Kapila
Date: 03 June 2015, 06:55:19
Subject: Re: Re: [COMMITTERS] pgsql: Map basebackup tablespaces using a tablespace_map file

From: Fabien COELHO
Date: 03 June 2015, 08:38:49
Subject: Re: checkpointer continuous flushing

Re: checkpointer continuous flushing - Mailing list pgsql-hackers

Previous

Next