Re: sync_file_range() - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: sync_file_range()
Msg-id: 23179.1150767330@sss.pgh.pa.us
In response to: Re: sync_file_range() (Greg Stark <gsstark@mit.edu>)
Responses: Re: sync_file_range() (Simon Riggs <simon@2ndquadrant.com>)
List: pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> Come to think of it I wonder whether there's anything to be gained by using
> smaller files for tables. Instead of 1G files maybe 256M files or something
> like that to reduce the hit of fsyncing a file.

Actually, probably not.  The weak part of our current approach is that we
tell the kernel "sync this file", then "sync that file", etc., in a more
or less random order.  This leads to a probably non-optimal sequence of
disk accesses to complete a checkpoint.  What we would really like is a
way to tell the kernel "sync all these files, and let me know when
you're done" --- then the kernel and hardware have some shot at
scheduling all the writes in an intelligent fashion.
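
For illustration, a minimal sketch (not actual PostgreSQL source; the
function and its arguments are invented for the example) of the
one-file-at-a-time pattern just described, where each fsync() must
finish before the next file's writeback even starts:

    #include <fcntl.h>
    #include <unistd.h>

    /* Sync each segment file in turn.  The kernel never sees more than
     * one file's worth of dirty pages at a time, so it has no chance
     * to reorder or merge I/O across files.  Error handling elided. */
    static void
    checkpoint_sync_naive(const char **paths, int npaths)
    {
        for (int i = 0; i < npaths; i++)
        {
            int fd = open(paths[i], O_RDONLY);

            if (fd < 0)
                continue;
            fsync(fd);          /* blocks until this one file is on disk */
            close(fd);
        }
    }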

sync_file_range() is not that exactly, but since it lets you request
syncing and then go back and wait for the syncs later, we could get the
desired effect with two passes over the file list.  (If the file list
is longer than our allowed number of open files, though, the extra
opens/closes could hurt.)
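
As a rough sketch of that two-pass idea, assuming Linux's
sync_file_range() as documented in its man page (offset = 0, nbytes = 0
means "the whole file"; the surrounding function is hypothetical):

    #define _GNU_SOURCE
    #include <fcntl.h>

    static void
    checkpoint_sync_two_pass(int *fds, int nfds)
    {
        /* Pass 1: start asynchronous writeback on every file without
         * waiting, so the kernel ends up holding the whole dirty
         * working set and can schedule the writes intelligently. */
        for (int i = 0; i < nfds; i++)
            sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);

        /* Pass 2: go back and wait.  WAIT_BEFORE | WRITE | WAIT_AFTER
         * also picks up pages dirtied since pass 1.  Unlike fsync(),
         * this flushes neither file metadata nor the drive's cache. */
        for (int i = 0; i < nfds; i++)
            sync_file_range(fds[i], 0, 0,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER);
    }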

Smaller files would make the I/O scheduling problem worse, not better.
Indeed, I've been wondering lately if we shouldn't resurrect
LET_OS_MANAGE_FILESIZE and make that the default on systems with
largefile support.  If nothing else it would cut down on open/close
overhead on very large relations.
        regards, tom lane

