I like the idea too, but I think there are some major problems to
solve. In particular I think we need a better solution than sparse
files for handling blocks that grow.
The main problem with using sparse files is that currently Postgres is
careful to allocate blocks early so it can fail cleanly if there's not
enough space. With your sparse-file approach Postgres might only find
out there's no space after it has already committed a transaction, and
bgwriter has no good course of action to take if it discovers there's
nowhere to put the data it has in shared buffers.
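
For what it's worth, here's a minimal sketch of what I mean by
allocating up front. It's not the actual smgr/md.c code, just an
illustration using posix_fallocate() so an out-of-space error surfaces
at extension time instead of when bgwriter flushes:

/*
 * Hypothetical sketch (not the real extension code): extend a relation
 * segment by one block and force the filesystem to allocate real
 * storage for it, so ENOSPC is reported here rather than at writeback.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

static int
extend_with_real_allocation(int fd, off_t current_size)
{
    /*
     * posix_fallocate() asks the filesystem to reserve the blocks now.
     * A sparse write (seek past EOF, fill in later) would defer the
     * allocation, and the eventual failure would land on bgwriter.
     */
    int rc = posix_fallocate(fd, current_size, BLCKSZ);

    if (rc != 0)
    {
        /* Out of space (or unsupported): fail the extension up front. */
        fprintf(stderr, "could not extend file: %s\n", strerror(rc));
        return -1;
    }
    return 0;
}

The point is just that whatever scheme we use has to be able to fail at
the point where the block is logically created, while we can still
abort the transaction.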
But even if you solve that, I don't think it's a good long-term
solution. We don't know how the OS handles block allocation for this
kind of file. I'm actually moderately surprised the filesystem isn't
reserving the skipped blocks on the assumption that you'll fill them in
eventually. And even if it does handle it the way you expect, what
happens when a block does grow? The filesystem will have to allocate it
far out of the way, and we have no way to repair that discontinuity
later.
Also, the way you've preallocated blocks effectively caps the maximum
compression at 2x. That seems to be leaving a lot of money on the
table.
To handle read-write tables I think we would need to directly
implement the kind of indirection layer that you're currently getting
from the filesystem's block layer. That would let you allocate enough
blocks to hold the data uncompressed and then free them up once you're
sure the data is compressible.
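
Something like the following is the sort of indirection I have in
mind. It's purely a sketch with made-up names (BlockMap,
alloc_full_slot, etc. don't exist anywhere): a per-relation map from
logical block numbers to physical slots, where a new block gets a
full-size slot up front and is moved to a smaller slot once it's known
to compress:

/*
 * Hypothetical sketch of a per-relation indirection layer: logical
 * block numbers map to physical slots in the compressed file.
 */
#include <stdint.h>

#define INVALID_SLOT UINT32_MAX

typedef struct BlockMap
{
    uint32_t  nblocks;  /* logical blocks in the relation */
    uint32_t *slot;     /* slot[i] = physical slot holding block i */
} BlockMap;

/*
 * Reserve a full-size physical slot for a new logical block up front,
 * so running out of space fails here, at extension time.
 */
static uint32_t
blockmap_extend(BlockMap *map, uint32_t logical_blkno,
                uint32_t (*alloc_full_slot)(void))
{
    uint32_t slot = alloc_full_slot();

    if (slot != INVALID_SLOT)
        map->slot[logical_blkno] = slot;
    return slot;
}

/*
 * Once the block's contents are known to compress, move it into a
 * smaller slot and release the full-size one.
 */
static void
blockmap_shrink(BlockMap *map, uint32_t logical_blkno,
                uint32_t compressed_slot,
                void (*free_slot)(uint32_t))
{
    uint32_t old = map->slot[logical_blkno];

    map->slot[logical_blkno] = compressed_slot;
    free_slot(old);
}

With something like that the compression ratio isn't tied to how the
blocks were preallocated, and a block that stops compressing can always
be moved back to a full-size slot.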
One possibility is to handle only read-only tables. That would make
things a *lot* simpler. But it sure would be inconvenient if it's only
useful on large static tables yet requires you to rewrite the whole
table -- just what you don't want to do with large static tables -- to
get the benefit.