Re: Maximum transaction rate - Mailing list pgsql-general

From Marco Colombo
Subject Re: Maximum transaction rate
Msg-id 49C04766.1060503@esiway.net
In response to Re: Maximum transaction rate  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
Responses Re: Maximum transaction rate  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
List pgsql-general
Ron Mayer wrote:
> Greg Smith wrote:
>> There are some known limitations to Linux fsync that I remain somewhat
>> concerned about, independently of LVM, like "ext3 fsync() only does a
>> journal commit when the inode has changed" (see
>> http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/990504 ).  The
>> way files are preallocated, the PostgreSQL WAL is supposed to function
>> just fine even if you're using fdatasync after WAL writes, which also
>> wouldn't touch the journal (last time I checked fdatasync was
>> implemented as a full fsync on Linux).  Since the new ext4 is more
>
> Indeed it does.
>
> I wonder if there should be an optional fsync mode
> in postgres that turns fsync() into
>     fchmod (fd, 0644); fchmod (fd, 0664);
> to work around this issue.

Question is... why do you care if the journal is not flushed on fsync?
Only the file data blocks need to be, if the inode is unchanged.
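
To make that concrete, here is a minimal sketch, assuming a file that is preallocated once and then only overwritten in place (which is how Greg describes the WAL above). The helper names preallocate() and overwrite_and_sync() are hypothetical, made up for this illustration, and are not anything from PostgreSQL. After the one-time fsync() the size and block allocation no longer change, so each later sync only has to get the data blocks down.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write zeros up to `size` and fsync() once, so the
** file size and block allocation are settled before any in-place writes. */
static int preallocate(int fd, off_t size) {
  char zeros[4096];
  off_t off = 0;
  memset(zeros, 0, sizeof zeros);
  while (off < size) {
    size_t n = sizeof zeros;
    if ((off_t)n > size - off)
      n = (size_t)(size - off);
    ssize_t w = pwrite(fd, zeros, n, off);
    if (w <= 0)
      return -1;
    off += w;
  }
  return fsync(fd);
}

/* Hypothetical helper: overwrite existing bytes, then flush the data.
** The file size does not change here, so fdatasync() is enough. */
static int overwrite_and_sync(int fd, const void *buf, size_t len, off_t off) {
  if (pwrite(fd, buf, len, off) != (ssize_t)len)
    return -1;
  return fdatasync(fd);
}
```

Whether the drive itself honours the flush is a separate question from whether the journal gets committed.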

> For example this program below will show one write
> per disk revolution if you leave the fchmod() in there,
> and run many times faster (i.e. lying) if you remove it.
> This with ext3 on a standard IDE drive with the write
> cache enabled, and no LVM or anything between them.
>
> ==========================================================
> /*
> ** based on http://article.gmane.org/gmane.linux.file-systems/21373
> ** http://thread.gmane.org/gmane.linux.kernel/646040
> */
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc,char *argv[]) {
>   if (argc<2) {
>     printf("usage: fs <filename>\n");
>     exit(1);
>   }
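>   int fd = open (argv[1], O_RDWR | O_CREAT | O_TRUNC, 0666);
>   if (fd < 0) {
>     perror("open");
>     exit(1);
>   }
>   int i;
>   for (i=0;i<100;i++) {
>     char byte = 0;  /* initialized so we don't write an indeterminate value */
>     pwrite (fd, &byte, 1, 0);
>     fchmod (fd, 0644); fchmod (fd, 0664);  /* dirty the inode before the fsync */
>     fsync (fd);
>   }
>   close (fd);
>   return 0;
> }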
> ==========================================================
>

I ran the program above, w/o the fchmod()s.

$ time ./test2 testfile

real    0m0.056s
user    0m0.001s
sys     0m0.008s

This is with ext3+LVM+raid1+sata disks with hdparm -W1 (drive write cache on).
With -W0 (drive write cache off) I get:

$ time ./test2 testfile

real    0m1.014s
user    0m0.000s
sys     0m0.008s

Big difference. The fsync() there does its job.

The same program runs with a x3 slowdown with the fchmod()s, but that's
expected: it's doing twice the writes, and in different places.

.TM.
