On 27.10.2012 16:43, Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> The reason why we need full_page_writes is that we need to guard against
>> torn pages or partial writes. So what if smgr would manage a mapping
>> between logical page numbers and their physical location in the relation?
>
>> At the moment where we today require a full page write into WAL, we
>> would mark the buffer as "needs relocation". The smgr would then write
>> this page into another physical location whenever it is time to write it
>> (via the background writer, hopefully). After that page is flushed, it
>> would update the page location pointer, or whatever we want to call it.
>> A thus free'd physical page location can be reused, once the location
>> pointer has been flushed to disk. This is a critical ordering of writes.
>> First the page at the new location, second the pointer to the current
>> location. Doing so would make write(2) appear atomic to us, which is
>> exactly what we need for crash recovery.

Hmm, aka copy-on-write.
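
To make the ordering concrete, here is a toy in-memory sketch of the scheme
as I read it. It is not PostgreSQL code; RelocatingStore, page_map and the
crash flag are made-up names, and "flush" is only implied by the step order.

```python
class RelocatingStore:
    """Toy in-memory model; not PostgreSQL's smgr, all names hypothetical."""

    def __init__(self, nslots):
        self.slots = [None] * nslots      # physical page locations
        self.page_map = {}                # logical page number -> physical slot
        self.free = list(range(nslots))   # slots that may be overwritten

    def read(self, logical):
        return self.slots[self.page_map[logical]]

    def write(self, logical, data, crash_after_page_write=False):
        new_slot = self.free.pop(0)
        self.slots[new_slot] = data          # 1. write + flush page at new location
        if crash_after_page_write:
            return                           # simulated crash: pointer never updated
        old_slot = self.page_map.get(logical)
        self.page_map[logical] = new_slot    # 2. write + flush the location pointer
        if old_slot is not None:
            self.free.append(old_slot)       # 3. only now is the old slot reusable

    def recover(self):
        # After a crash, any slot the map does not reference is free again.
        used = set(self.page_map.values())
        self.free = [s for s in range(len(self.slots)) if s not in used]


store = RelocatingStore(nslots=4)
store.write(0, b"version 1")
store.write(0, b"version 2 (possibly torn)", crash_after_page_write=True)
store.recover()
assert store.read(0) == b"version 1"   # the old page is still intact and reachable
```

The point of the sketch is only that a crash between steps 1 and 2 leaves the
old copy of the page reachable, so a torn write at the new location is harmless.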

> I think you're just moving the atomic-write problem from the data pages
> to wherever you keep these pointers.

If the pointers are stored as simple 4-byte integers, you could probably
assume that writes to them are atomic and won't be torn.
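
A quick check of the arithmetic behind that, assuming the device writes each
512-byte sector atomically (that is an assumption, not a guarantee) and that
the pointers are stored 4-byte aligned. The names and the 8 kB block size are
just for illustration.

```python
SECTOR_SIZE = 512    # assumed atomic write unit of the device
POINTER_SIZE = 4     # one 4-byte location pointer


def straddles_sector(offset, size=POINTER_SIZE, sector=SECTOR_SIZE):
    """True if the byte range [offset, offset + size) crosses a sector boundary."""
    return offset // sector != (offset + size - 1) // sector


# Every 4-byte-aligned pointer in an 8 kB block of pointers stays in one sector.
aligned_ok = not any(straddles_sector(off) for off in range(0, 8192, POINTER_SIZE))

# An unaligned pointer, by contrast, can span two sectors and could be torn.
unaligned_example = straddles_sector(510)
```

So an aligned pointer can only be torn if the sector write itself is torn.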

There are a lot of practical problems in adding another level of
indirection to every page access, though. It'll surely add some overhead
to every access, even if the data never changes. And it's not at all
clear to me that it would perform better than full_page_writes. You're
writing and flushing out roughly the same amount of data AFAICS.

What exactly is the problem with full_page_writes that we're trying to
solve?

- Heikki