Re: Logical to physical page mapping - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Logical to physical page mapping
Msg-id 508C2AEF.1040004@vmware.com
In response to Re: Logical to physical page mapping  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Logical to physical page mapping  (Claudio Freire <klaussfreire@gmail.com>)
Re: Logical to physical page mapping  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Re: Logical to physical page mapping  (Greg Stark <stark@mit.edu>)
Re: Logical to physical page mapping  (Jan Wieck <JanWieck@Yahoo.com>)
List pgsql-hackers
On 27.10.2012 16:43, Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
>> The reason why we need full_page_writes is that we need to guard against
>> torn pages or partial writes. So what if smgr would manage a mapping
>> between logical page numbers and their physical location in the relation?
>
>> At the moment where we today require a full page write into WAL, we
>> would mark the buffer as "needs relocation". The smgr would then write
>> this page into another physical location whenever it is time to write it
>> (via the background writer, hopefully). After that page is flushed, it
>> would update the page location pointer, or whatever we want to call it.
>> A physical page location freed this way can be reused once the location
>> pointer has been flushed to disk. This is a critical ordering of writes.
>> First the page at the new location, second the pointer to the current
>> location. Doing so would make write(2) appear atomic to us, which is
>> exactly what we need for crash recovery.

Hmm, aka copy-on-write.
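
To make the write ordering in Jan's proposal concrete, here is a toy in-memory model in C. It is only a sketch under stated assumptions: the names (ToyRelation, relocate_page, the crash points) are hypothetical, a torn write is simulated by copying half a page, and the places where real code would fsync are marked with comments rather than performed. What it shows is that a crash at either simulated point leaves the logical page readable as the complete old image, and a completed relocation leaves it readable as the complete new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 16 /* toy page size; PostgreSQL's default BLCKSZ is 8192 */
#define NPAGES 4     /* logical pages in the toy relation */
#define NSLOTS 8     /* physical slots; the spare ones allow relocation */

/* Hypothetical relation: physical slots plus a logical->physical map. */
typedef struct {
    char slot[NSLOTS][PAGE_SIZE];
    uint32_t map[NPAGES];   /* durable: logical page -> physical slot */
    bool slot_free[NSLOTS]; /* in-memory; would be rebuilt from map after a crash */
} ToyRelation;

typedef enum { CRASH_NONE, CRASH_TORN_PAGE_WRITE, CRASH_BEFORE_MAP_UPDATE } CrashPoint;

static void toy_init(ToyRelation *rel)
{
    memset(rel, 0, sizeof *rel);
    for (uint32_t i = 0; i < NPAGES; i++)
        rel->map[i] = i; /* slot i initially holds logical page i */
    for (uint32_t i = 0; i < NSLOTS; i++)
        rel->slot_free[i] = (i >= NPAGES);
}

static const char *read_page(const ToyRelation *rel, uint32_t logical)
{
    return rel->slot[rel->map[logical]];
}

/* Write a new image of a logical page copy-on-write style.
 * Returns false if the simulated crash point was hit. */
static bool relocate_page(ToyRelation *rel, uint32_t logical,
                          const char *data, CrashPoint crash)
{
    uint32_t old_slot = rel->map[logical];
    int new_slot = -1;

    for (int i = 0; i < NSLOTS; i++)
        if (rel->slot_free[i]) { new_slot = i; break; }
    assert(new_slot >= 0); /* toy: no real free-space management */
    rel->slot_free[new_slot] = false;

    /* Step 1: write the image to the new slot, then fsync (not modeled).
     * A torn write damages only the new slot, which nothing points to yet. */
    if (crash == CRASH_TORN_PAGE_WRITE) {
        memcpy(rel->slot[new_slot], data, PAGE_SIZE / 2);
        return false;
    }
    memcpy(rel->slot[new_slot], data, PAGE_SIZE);
    if (crash == CRASH_BEFORE_MAP_UPDATE)
        return false; /* new image is orphaned; the old one is still live */

    /* Step 2: switch the 4-byte pointer (assumed atomic), then fsync the map. */
    rel->map[logical] = (uint32_t)new_slot;

    /* Step 3: only once the pointer is durable may the old slot be reused. */
    rel->slot_free[old_slot] = true;
    return true;
}
```

The ordering is the one Jan describes: the new image must be durable before the pointer switches, and the old slot must not be reused until the pointer switch is durable; otherwise a crash could leave the pointer aimed at a half-written slot.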

> I think you're just moving the atomic-write problem from the data pages
> to wherever you keep these pointers.

If the pointers are stored as simple 4-byte integers, you could probably 
assume that they're written atomically and won't be torn.
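
One way to see why 4-byte pointers could plausibly be treated as atomic: if the map is an array of aligned uint32_t entries, no entry ever straddles a sector boundary, so a write that tears only at sector granularity leaves each entry either entirely old or entirely new. The C sketch below checks that arithmetic. The 512-byte sector, the 8192-byte map page and the map layout are all assumptions for illustration, and not every device promises sector-atomic writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_PAGE_SIZE 8192u /* assumed: map pages use the default BLCKSZ */
#define SECTOR_SIZE 512u    /* assumed: the device never tears a write inside a sector */
#define ENTRY_SIZE 4u       /* one uint32_t physical location per logical block */
#define ENTRIES_PER_MAP_PAGE (MAP_PAGE_SIZE / ENTRY_SIZE)

static_assert(sizeof(uint32_t) == ENTRY_SIZE, "entries are 4-byte integers");

/* Location of a logical block's pointer in a hypothetical map file.
 * Map pages start on sector boundaries, since 8192 is a multiple of 512. */
typedef struct {
    uint32_t map_page;
    uint32_t byte_offset;
} EntryLocation;

static EntryLocation entry_location(uint32_t logical_block)
{
    EntryLocation loc;
    loc.map_page = logical_block / ENTRIES_PER_MAP_PAGE;
    loc.byte_offset = (logical_block % ENTRIES_PER_MAP_PAGE) * ENTRY_SIZE;
    return loc;
}

/* True if the entry's first and last bytes fall in the same sector,
 * so a write torn at sector boundaries leaves it either old or new. */
static bool entry_within_one_sector(uint32_t byte_offset)
{
    return byte_offset / SECTOR_SIZE ==
           (byte_offset + ENTRY_SIZE - 1) / SECTOR_SIZE;
}
```

Every aligned entry passes the check; only a deliberately misaligned offset such as 510 would straddle two sectors.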

There are a lot of practical problems with adding another level of 
indirection to every page access, though. It'll surely add some overhead 
to every access, even if the data never changes. And it's not at all 
clear to me that it would perform better than full_page_writes. You're 
writing and flushing out roughly the same amount of data AFAICS.
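
The per-access overhead of the extra indirection can be illustrated with a hypothetical read path in C. The function names, the 2048-entries-per-map-page layout and the tiny direct-mapped map cache are assumptions for illustration, not anything in smgr. Even when the relevant map page is cached, each access pays a lookup; on a miss, it pays an extra read before the data page can be fetched, whether or not the data has ever changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES_PER_MAP_PAGE 2048u /* 8192-byte map page / 4-byte entries */
#define CACHED_MAP_PAGES 4u        /* hypothetical, deliberately tiny map cache */

static_assert(CACHED_MAP_PAGES > 0, "need at least one cached map page");

typedef struct {
    uint32_t cached_page[CACHED_MAP_PAGES]; /* map page held by each cache slot */
    bool valid[CACHED_MAP_PAGES];
    unsigned map_reads;  /* I/Os caused only by the indirection */
    unsigned data_reads; /* I/Os we would do anyway */
} ReadStats;

/* Hypothetical read path: every access translates the logical block
 * through the map before the data page itself can be read. */
static void read_logical_block(ReadStats *st, uint32_t logical_block)
{
    uint32_t map_page = logical_block / ENTRIES_PER_MAP_PAGE;
    uint32_t slot = map_page % CACHED_MAP_PAGES; /* direct-mapped cache */

    if (!st->valid[slot] || st->cached_page[slot] != map_page) {
        st->map_reads++; /* miss: fetch the map page first */
        st->cached_page[slot] = map_page;
        st->valid[slot] = true;
    }
    /* hit or miss, the physical location is now known; read the data page */
    st->data_reads++;
}
```

In this model a sequential scan amortizes map reads well (one per 2048 blocks), while random access across a relation larger than the cached map pages cover misses far more often.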

What exactly is the problem with full_page_writes that we're trying to 
solve?

- Heikki


