Re: 9.4 regression - Mailing list pgsql-hackers

From Andres Freund
Subject Re: 9.4 regression
Date
Msg-id 20130819194910.GF26775@awork2.anarazel.de
In response to Re: 9.4 regression  (Jon Nelson <jnelson+pgsql@jamponi.net>)
Responses Re: 9.4 regression  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 2013-08-19 14:40:07 -0500, Jon Nelson wrote:
> On Fri, Aug 16, 2013 at 3:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Thu, Aug 15, 2013 at 12:08:57PM -0500, Jon Nelson wrote:
> >> > Where are we on this issue?
> >>
> >> I've been able to replicate it pretty easily with PostgreSQL and
> >> continue to look into it. I've contacted Theodore Ts'o and have gotten
> >> some useful information, however I'm unable to replicate the behavior
> >> with the test program (even one that's been modified). What I've
> >> learned is:
> >>
> >> - XLogWrite appears to take approx. 2.5 times longer when writing to a
> >> file allocated with posix_fallocate, but only the first time the file
> >> contents are overwritten. This is partially explained by how ext4
> >> handles extents and uninitialized data, but 2.5x is MUCH more
> >> expensive than anticipated or expected here.
> >> - Writing zeroes to a file allocated with posix_fallocate (essentially
> >> adding a posix_fallocate step before the usual write-zeroes-in-a-loop
> >> approach) not only doesn't seem to hurt performance, it seems to help
> >> or at least have parity, *and* the space is guaranteed to exist on
> >> disk. At the very least that seems useful.
> >
> > Is it time to revert this patch until we know more?
> 
> While I'm not qualified to say, my inclination is to say yes. It can
> always be added back later. The only caveat there would be that -
> perhaps - a small modification of the patch would be warranted.
> Specifically, with posix_fallocate, I saw no undesirable behavior
> when the (newly allocated) file was manually zeroed anyway. The only
> advantages (that I can see) to doing it this way versus not using
> posix_fallocate at all is (a) a potential reduction in the number of
> extents

I vote for adapting the patch to additionally zero out the file via
write(). In your tests that seemed to perform at least as well as the
old method... It also has the advantage that we can use it a little bit
more as a testbed for possibly using it for heap extensions one day.
We're pretty early in the cycle, so I am not worried about this too much...
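
A minimal sketch of that approach in C, not the actual patch: preallocate
the file with posix_fallocate() and then overwrite it with zeroes via
write(), so the extents are already initialized before the first real
write. The helper name prealloc_and_zero, the ZERO_BLOCK buffer size and
the error handling are assumptions made for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_BLOCK 8192			/* assumed write size, chosen for illustration */

/*
 * Hypothetical helper: create "path", reserve "size" bytes with
 * posix_fallocate(), then zero the whole file with write().
 * Returns 0 on success, -1 on failure.
 */
static int
prealloc_and_zero(const char *path, off_t size)
{
	char		zbuf[ZERO_BLOCK];
	off_t		written = 0;
	int			fd;

	fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	/* posix_fallocate() returns an error number; it does not set errno */
	if (posix_fallocate(fd, 0, size) != 0)
		goto fail;

	/* Zero the file so the allocated extents are marked as initialized */
	memset(zbuf, 0, sizeof(zbuf));
	while (written < size)
	{
		off_t		left = size - written;
		size_t		chunk = left < (off_t) sizeof(zbuf) ? (size_t) left : sizeof(zbuf);
		ssize_t		n = write(fd, zbuf, chunk);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted, retry the same chunk */
			goto fail;
		}
		written += n;
	}

	if (fsync(fd) != 0)
		goto fail;

	return close(fd) == 0 ? 0 : -1;

fail:
	close(fd);
	unlink(path);				/* do not leave a partial file behind */
	return -1;
}
```

A caller would pass the segment path and size, for example
prealloc_and_zero("segment.tmp", 16 * 1024 * 1024), and treat -1 as
"could not create the file". Whether this ordering actually avoids the
first-overwrite penalty on ext4 is what the tests quoted above suggest;
the sketch itself does not demonstrate it.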

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


