Re: 9.4 regression - Mailing list pgsql-hackers

From Jon Nelson
Subject Re: 9.4 regression
Date
Msg-id CAKuK5J0cG43gy2xOD5=GOhdBKsJB-0XzwOPZfYYZeSG3pWnHOA@mail.gmail.com
In response to Re: 9.4 regression  (Bruce Momjian <bruce@momjian.us>)
Responses Re: 9.4 regression  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Fri, Aug 16, 2013 at 3:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Aug 15, 2013 at 12:08:57PM -0500, Jon Nelson wrote:
>> > Where are we on this issue?
>>
>> I've been able to replicate it pretty easily with PostgreSQL and
>> continue to look into it. I've contacted Theodore Ts'o and have gotten
>> some useful information, however I'm unable to replicate the behavior
>> with the test program (even one that's been modified). What I've
>> learned is:
>>
>> - XLogWrite appears to take approx. 2.5 times longer when writing to a
>> file allocated with posix_fallocate, but only the first time the file
>> contents are overwritten. This is partially explained by how ext4
>> handles extents and uninitialized data, but 2.5x is MUCH more
>> expensive than anticipated or expected here.
>> - Writing zeroes to a file allocated with posix_fallocate (essentially
>> adding a posix_fallocate step before the usual write-zeroes-in-a-loop
>> approach) not only doesn't seem to hurt performance, it seems to help
>> or at least have parity, *and* the space is guaranteed to exist on
>> disk. At the very least that seems useful.
>
> Is it time to revert this patch until we know more?

While I'm not qualified to say, my inclination is to say yes. It can
always be added back later. The only caveat there would be that -
perhaps - a small modification of the patch would be warranted.
Specifically, with posix_fallocate, I saw no undesirable behavior
when the (newly allocated) file was manually zeroed anyway. The only
advantages (that I can see) to doing it this way versus not using
posix_fallocate at all are (a) a potential reduction in the number of
extents and (b) a guarantee that the space is on disk if
posix_fallocate succeeds. My reading of the patch is that even if
posix_fallocate fails due to out of space conditions, we will still
try to create the file by writing out zeroes, so perhaps the
out-of-disk-space scenario isn't all that useful anyway.
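
To make the "posix_fallocate, then zero anyway" variant concrete, here
is a minimal sketch in C. It is not the patch's code: the function name
prealloc_and_zero and the 8 kB buffer are made up for illustration, and
treating ENOSPC from posix_fallocate as a hard failure (rather than
falling back to the zero-writing loop) is an assumption about how point
(b) could be made meaningful.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes posix_fallocate() and fsync() */
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from PostgreSQL: reserve `size` bytes
 * for `fd` with posix_fallocate(), then overwrite the whole range with
 * zeroes anyway. Returns 0 on success, -1 on failure with errno set.
 */
int
prealloc_and_zero(int fd, off_t size)
{
    char    zbuf[8192];
    off_t   written = 0;
    int     rc;

    if (size <= 0)
    {
        errno = EINVAL;
        return -1;
    }

    /* posix_fallocate() returns an error number; it does not set errno. */
    rc = posix_fallocate(fd, 0, size);
    if (rc == ENOSPC)
    {
        /* Assumption: out of space is fatal, so success implies reservation. */
        errno = rc;
        return -1;
    }
    /* Any other failure (e.g. unsupported filesystem): just zero-fill below. */

    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;

    memset(zbuf, 0, sizeof(zbuf));
    while (written < size)
    {
        size_t  chunk = sizeof(zbuf);
        ssize_t n;

        if ((off_t) chunk > size - written)
            chunk = (size_t) (size - written);

        n = write(fd, zbuf, chunk);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        if (n == 0)
        {
            errno = ENOSPC;     /* no progress; treat as out of space */
            return -1;
        }
        written += n;
    }
    assert(written == size);

    /* Flush the zeroes so later overwrites hit initialized blocks. */
    return fsync(fd);
}
```

A caller would pass a freshly created file descriptor and the desired
segment size, and treat a return of -1 as "do not use this file".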

I'm awaiting more information from Theodore Ts'o, but I don't expect
things to materially change in the near future.


-- 
Jon


