On 28/10/12 07:41, Heikki Linnakangas wrote:
> On 27.10.2012 16:43, Tom Lane wrote:
>> Jan Wieck <JanWieck@Yahoo.com> writes:
>>> The reason why we need full_page_writes is that we need to guard against
>>> torn pages or partial writes. So what if smgr would manage a mapping
>>> between logical page numbers and their physical location in the
>>> relation?
>>
>>> At the point where we today require a full page write into WAL, we
>>> would mark the buffer as "needs relocation". The smgr would then write
>>> this page into another physical location whenever it is time to write it
>>> (via the background writer, hopefully). After that page is flushed, it
>>> would update the page location pointer, or whatever we want to call it.
>>> A physical page location freed this way can be reused once the location
>>> pointer has been flushed to disk. The ordering of writes is critical:
>>> first the page at the new location, second the pointer to the current
>>> location. Doing so would make write(2) appear atomic to us, which is
>>> exactly what we need for crash recovery.
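
To make the proposed ordering concrete, here is a minimal in-memory sketch of it. Everything in it is hypothetical (the class name, the method names, the "crash" flag); it is not how smgr works today, and real flushing (fsync) is only hinted at in comments. It only illustrates why writing the new page first and the pointer second leaves a readable page at every point in time.

```python
class RelocatingStore:
    """Toy model: logical pages map to physical slots via a pointer table."""

    def __init__(self, n_logical, n_physical):
        self.slots = [b""] * n_physical                  # physical page locations
        self.pointer = list(range(n_logical))            # logical -> physical
        self.free = list(range(n_logical, n_physical))   # unused physical slots

    def read(self, logical):
        return self.slots[self.pointer[logical]]

    def write_relocated(self, logical, data, crash_before_pointer_update=False):
        new_slot = self.free.pop()
        # Step 1: write the page at its NEW location (and flush it).
        self.slots[new_slot] = data
        if crash_before_pointer_update:
            # Simulated crash: the pointer was never changed, so the old page
            # is still the current one and the new slot is simply unused.
            self.free.append(new_slot)
            return
        # Step 2: update the location pointer (and flush it).
        old_slot = self.pointer[logical]
        self.pointer[logical] = new_slot
        # Step 3: only now may the old location be reused.
        self.free.append(old_slot)


store = RelocatingStore(n_logical=2, n_physical=4)
store.write_relocated(0, b"v1")
store.write_relocated(0, b"v2", crash_before_pointer_update=True)
print(store.read(0))  # still the old, intact version
```

Note that the sketch treats the pointer update itself as atomic, which is the very assumption questioned further down the thread.
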
>
> Hmm, aka copy-on-write.
>
>> I think you're just moving the atomic-write problem from the data pages
>> to wherever you keep these pointers.
>
> If the pointers are stored as simple 4-byte integers, you could
> probably assume that writes to them are atomic and won't be torn.
>
> There are a lot of practical problems in adding another level of
> indirection to every page access, though. It'll surely add some
> overhead to every access, even if the data never changes. And it's not
> at all clear to me that it would perform better than full_page_writes.
> You're writing and flushing out roughly the same amount of data AFAICS.
>
> What exactly is the problem with full_page_writes that we're trying to
> solve?
>
> - Heikki
>
>
Would a 4-byte pointer be adequate for a 64-bit machine with well over
4GB used by Postgres?
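
For reference, a rough back-of-the-envelope sketch of the arithmetic. It assumes the pointer holds a page number within the relation (as in Jan's description) rather than a byte address, and that pages use the default 8 kB block size. The function name is made up for illustration.

```python
BLOCK_SIZE = 8192    # assumed: PostgreSQL's default block size, in bytes
POINTER_BITS = 32    # a 4-byte pointer


def addressable_bytes(pointer_bits, block_size):
    """Bytes reachable if each pointer value names one whole page."""
    return (2 ** pointer_bits) * block_size


total = addressable_bytes(POINTER_BITS, BLOCK_SIZE)
print(total // 2 ** 40, "TiB")  # prints: 32 TiB
```

Under those assumptions the limit would be per relation, not per machine; if the pointer were a byte offset instead, 4 bytes would only cover 4 GiB.
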
Cheers,
Gavin