Thread: pg_fallocate
Hi,
I'd like to add the fallocate() system call to improve sequential read/write performance. fallocate() is different from posix_fallocate(), which uses a zero-fill algorithm to reserve contiguous disk space. fallocate() reserves contiguous disk space with much less overhead than posix_fallocate().
It will be needed for sorted checkpoints and a faster VACUUM command in the near future.
For more detailed information, please see the Linux manual.
I'm going sightseeing in Dublin with Ishii-san now :-)
Regards,
--
Mitsumasa KONDO
NTT Open Source Software
Attachment
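For readers unfamiliar with the call, the following is a minimal sketch (not part of the attached patch) of what the proposal relies on: on Linux, fallocate() with FALLOC_FL_KEEP_SIZE reserves blocks for a file without changing its apparent size. The function name reserve_keep_size_demo and the /tmp path are hypothetical, chosen only for illustration, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE				/* exposes fallocate() on glibc */
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>		/* FALLOC_FL_KEEP_SIZE */
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: create a temporary file, ask the kernel to
 * reserve nbytes for it with FALLOC_FL_KEEP_SIZE, and report the apparent
 * file size afterwards through *size_after.
 *
 * Returns 0 on success, or -1 with errno set (for example EOPNOTSUPP when
 * the filesystem does not support fallocate()).
 */
int
reserve_keep_size_demo(off_t nbytes, off_t *size_after)
{
	char		path[] = "/tmp/falloc_demoXXXXXX";
	struct stat st;
	int			fd = mkstemp(path);
	int			rc;
	int			saved_errno;

	if (fd < 0)
		return -1;

	/* Reserve space; KEEP_SIZE leaves st_size untouched. */
	rc = fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, nbytes);
	if (rc == 0)
		rc = fstat(fd, &st);
	saved_errno = errno;

	/* Clean up before reporting, preserving the original errno. */
	close(fd);
	unlink(path);

	if (rc != 0)
	{
		errno = saved_errno;
		return -1;
	}

	*size_after = st.st_size;
	return 0;
}
```

On a filesystem that supports the call, the apparent size stays at 0 bytes even though space has been reserved; no zero-filled data is written by the caller.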
On Thu, Oct 31, 2013 at 9:16 AM, Mitsumasa KONDO <kondo.mitsumasa@gmail.com> wrote:
> I'l like to add fallocate() system call to improve sequential read/write
> peformance. fallocate() system call is different from posix_fallocate() that
> is zero-fille algorithm to reserve continues disk space. fallocate() is
> almost less overhead alogotithm to reserve continues disk space than
> posix_fallocate().
>
> It will be needed by sorted checkpoint and more faster vacuum command in
> near the future.
>
>
> If you get more detail information, please see linux manual.
>
> I go sight seeing in Dublin with Ishii-san now:-)

Our last attempts to improve performance in this area died in a fire when it turned out that code that should have been an improvement fell down over inexplicable ext4 behavior. I think, therefore, that extensive benchmarking of this or any other proposed approach is absolutely essential.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/31/13, 9:16 AM, Mitsumasa KONDO wrote:
> I'l like to add fallocate() system call to improve sequential read/write
> peformance. fallocate() system call is different from posix_fallocate()
> that is zero-fille algorithm to reserve continues disk space.
> fallocate() is almost less overhead alogotithm to reserve continues disk
> space than posix_fallocate().

Your patch seems to be missing a bit that defines HAVE_FALLOCATE, probably something in configure.in.
On Thu, Oct 31, 2013 at 01:16:44PM +0000, Mitsumasa KONDO wrote:
> --- a/src/backend/storage/file/fd.c
> +++ b/src/backend/storage/file/fd.c
> @@ -383,6 +383,21 @@ pg_flush_data(int fd, off_t offset, off_t amount)
>  	return 0;
>  }
>
> +/*
> + * pg_fallocate --- advise OS that the data pre-allocate continus file segments
> + * in physical disk.
> + *
> + * Not all platforms have fallocate. Some platforms only have posix_fallocate,
> + * but it ped zero fill to get pre-allocate file segmnets. It is not good
> + * peformance when extend new segmnets, so we don't use posix_fallocate.
> + */
> +int
> +pg_fallocate(File file, int flags, off_t offset, off_t nbytes)
> +{
> +#if defined(HAVE_FALLOCATE)
> +	return fallocate(VfdCache[file].fd, flags, offset, nbytes);
> +#endif
> +}

You should set errno to ENOSYS and return -1 if HAVE_FALLOCATE isn't defined.

> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -24,6 +24,7 @@
>  #include <unistd.h>
>  #include <fcntl.h>
>  #include <sys/file.h>
> +#include <linux/falloc.h>

This would have to be wrapped in #ifdef HAVE_FALLOCATE or HAVE_LINUX_FALLOC_H; if you want to create a wrapper around fallocate() you should add PG defines for the flags, too. Otherwise it's probably easier to just call fallocate() directly inside an #ifdef block as you did in xlog.c.

> @@ -510,6 +511,10 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
>  	 * if bufmgr.c had to dump another buffer of the same file to make room
>  	 * for the new page's buffer.
>  	 */
> +
> +	if(forknum == 1)
> +		pg_fallocate(v->mdfd_vfd, FALLOC_FL_KEEP_SIZE, 0, RELSEG_SIZE);
> +

The return value should be checked; if it's -1 and errno is anything other than ENOSYS or EOPNOTSUPP, the disk space allocation failed and you must return an error.

/ Oskari
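
The two review points above (an ENOSYS fallback when HAVE_FALLOCATE is undefined, and checking the return value at the call site) could be sketched roughly as follows. This is an illustration only, not the patch author's code: sketch_fallocate and sketch_preallocate_ok are hypothetical names, and the wrapper takes a raw file descriptor instead of a PostgreSQL File so that it compiles outside the backend. HAVE_FALLOCATE is assumed to come from a configure check, as suggested earlier in the thread.

```c
#define _GNU_SOURCE				/* exposes fallocate() in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for pg_fallocate(). When HAVE_FALLOCATE is not
 * defined, fail in a well-defined way instead of falling off the end of
 * the function.
 */
int
sketch_fallocate(int fd, int flags, off_t offset, off_t nbytes)
{
#ifdef HAVE_FALLOCATE
	return fallocate(fd, flags, offset, nbytes);
#else
	(void) fd;
	(void) flags;
	(void) offset;
	(void) nbytes;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical call-site check. Returns true when preallocation succeeded
 * or is merely unavailable (ENOSYS, EOPNOTSUPP), so the caller can carry
 * on without it. Returns false when the allocation failed for a real
 * reason such as ENOSPC, in which case the caller should report an error.
 */
bool
sketch_preallocate_ok(int fd, int flags, off_t offset, off_t nbytes)
{
	if (sketch_fallocate(fd, flags, offset, nbytes) == 0)
		return true;

	return errno == ENOSYS || errno == EOPNOTSUPP;
}
```

In the backend, the false branch would presumably turn into an ereport(ERROR, ...) in mdextend(), but that part is left out here because it depends on how the patch is restructured.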