Re: fallocate / posix_fallocate for new WAL file creation (etc...) - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Msg-id 20130529143607.GD6434@tamriel.snowman.net
In response to Re: fallocate / posix_fallocate for new WAL file creation (etc...)  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: fallocate / posix_fallocate for new WAL file creation (etc...)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
* Peter Eisentraut (peter_e@gmx.net) wrote:
> On 5/28/13 11:36 AM, Greg Smith wrote:
> > Outside of the run for performance testing, I think it would be good at
> > this point to validate that there is really a 16MB file full of zeroes
> > resulting from these operations.  I am not really concerned that
> > posix_fallocate might be slower in some cases; that seems unlikely.  I
> > am concerned that it might result in a file that isn't structurally the
> > same as the 16MB of zero writes implementation used now.
>
> I see nothing in the posix_fallocate() man pages that says that the
> allocated space is filled with any kind of data or zeroes.  It will
> likely be garbage data, but that should be fine for a new WAL file.

I *really* hope that the Linux kernel folks, and others, are smart enough
to realize that they can't just re-use random blocks from an I/O device
without cleaning them first.  That would be one massive security hole.  I
expect posix_fallocate() works more like sparse files, except that it
also counts the space as being 'taken', but it doesn't go out and pull
blocks to use until you actually go to write to it.
At which point, perhaps there's an optimization that says "if the first
thing done with this is writing, then just write out whatever data is
requested and then fill the rest of the block out with zeros", and a
similar read operation which says "if we haven't formally assigned a
block for this, just return zeros".  Hopefully it's smart enough to
avoid writing out all zeros and then turning around and writing out
whatever data is given, though since it'd all be in memory, perhaps
that wouldn't be too bad and might be simpler to implement.
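
Rather than guess, the expectation above (preallocated space reads back
as zeros and is counted as taken) is easy enough to check.  Here is a
rough sketch, in Python rather than C purely for brevity;
os.posix_fallocate() is a thin wrapper around the libc call and is only
available on Unix.  The helper name preallocate_and_inspect and the
constant WAL_SEGMENT_SIZE are made up for illustration and are not from
the patch, and the 512-byte unit for st_blocks is the Linux convention,
assumed here.

```python
import os
import tempfile

# 16MB, the WAL segment size discussed in this thread.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def preallocate_and_inspect(directory, size=WAL_SEGMENT_SIZE):
    """Preallocate `size` bytes with posix_fallocate() and report what we see.

    Hypothetical helper for illustration only. Returns the file size, the
    space the filesystem says is allocated, and whether every byte read
    back is zero.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Ask the filesystem to reserve the whole range up front.
        os.posix_fallocate(fd, 0, size)

        st = os.fstat(fd)

        # Read the file back in 1MB chunks and compare against zeros.
        zero_chunk = bytes(1024 * 1024)
        os.lseek(fd, 0, os.SEEK_SET)
        remaining = size
        all_zero = True
        while remaining > 0:
            chunk = os.read(fd, min(len(zero_chunk), remaining))
            if not chunk or chunk != zero_chunk[:len(chunk)]:
                # Either the file is shorter than expected or it has
                # non-zero content.
                all_zero = False
                break
            remaining -= len(chunk)

        return {
            "size": st.st_size,
            # st_blocks is in 512-byte units on Linux (assumption elsewhere).
            "allocated_bytes": st.st_blocks * 512,
            "all_zero": all_zero,
        }
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        print(preallocate_and_inspect(tmpdir))
```

Running the same check against a file that was only extended with
ftruncate() should show the difference in allocated_bytes between a
sparse file and a preallocated one, though the exact numbers will
depend on the filesystem.
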

Thanks,
    Stephen
