Re: checkpoint writeback via sync_file_range - Mailing list pgsql-hackers

From Andres Freund
Subject Re: checkpoint writeback via sync_file_range
Date
Msg-id 201201111351.38738.andres@anarazel.de
In response to Re: checkpoint writeback via sync_file_range  (Florian Weimer <fweimer@bfk.de>)
List pgsql-hackers
On Wednesday, January 11, 2012 10:33:47 AM Florian Weimer wrote:
> * Greg Smith:
> > One idea I was thinking about here was building a little hash table
> > inside of the fsync absorb code, tracking how many absorb operations
> > have happened for whatever the most popular relation files are.  The
> > idea is that we might say "use sync_file_range every time <N> calls
> > for a relation have come in", just to keep from ever accumulating too
> > many writes to any one file before trying to nudge some of it out of
> > there. The bat that keeps hitting me in the head here is that right
> > now, a single fsync might have a full 1GB of writes to flush out,
> > perhaps because it extended a table and then wrote more than that to
> > it.  And in everything but a SSD or giant SAN cache situation, 1GB of
> > I/O is just too much to fsync at a time without the OS choking a
> > little on it.
> 
> Isn't this pretty much like tuning vm.dirty_bytes?  We generally set it
> to pretty low values, and that seems to help smooth out the checkpoints.
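
For reference, the vm.dirty_* tuning Florian mentions is a pair of sysctls; the values below are purely illustrative, not recommendations:

```shell
# Cap the amount of dirty page cache so the kernel starts background
# writeback early and never builds up a huge backlog to flush at
# fsync time.  Values are illustrative only; tune for your hardware.
sysctl -w vm.dirty_background_bytes=67108864   # start writeback at 64 MB
sysctl -w vm.dirty_bytes=536870912             # hard limit at 512 MB
```

Setting the *_bytes variants automatically disables the corresponding vm.dirty_*ratio settings.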
If done correctly (in a much more invasive way), you could issue sync_file_range
calls only for the areas of the file where checkpointing needs to happen, and
leave out e.g. hint-bit-only changes. That could help reduce the cost of
checkpoints.

Andres

