Logical to physical page mapping - Mailing list pgsql-hackers

From Jan Wieck
Subject Logical to physical page mapping
Msg-id 508B6ABE.2030801@Yahoo.com
Responses Re: Logical to physical page mapping  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Logical to physical page mapping  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I just had this thought a few minutes ago, discussed it briefly with 
RhodiumToad on #postgresql and wanted to put it out here for discussion. 
Feel free to rip it apart. It probably is a bit "al-dente" at this point 
and needs more cooking.

The reason we need full_page_writes is that we need to guard against 
torn pages or partial writes. So what if smgr managed a mapping between 
logical page numbers and their physical location in the relation?

At the point where we require a full page write into WAL today, we 
would instead mark the buffer as "needs relocation". The smgr would then 
write this page into another physical location whenever it is time to 
write it (via the background writer, hopefully). After that page is 
flushed, it would update the page location pointer, or whatever we want 
to call it. The physical page location freed this way can be reused once 
the location pointer has been flushed to disk. The ordering of these 
writes is critical: first the page at the new location, second the 
pointer to the current location. Doing so would make write(2) appear 
atomic to us, which is exactly what we need for crash recovery.
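
A minimal sketch of that write ordering, as a toy model and not a 
proposal for the actual smgr code. Everything named here 
(RelocatingStore, write_relocated, the separate "map" file, the 8-byte 
pointer entries) is hypothetical, and the sketch assumes that a write of 
one small pointer entry cannot itself be torn.

```python
import os
import struct
import tempfile

PAGE_SIZE = 8192  # PostgreSQL's default block size


class RelocatingStore:
    """Toy model: a data file plus a separate logical->physical map file."""

    def __init__(self, directory, nslots):
        flags = os.O_RDWR | os.O_CREAT
        self.data_fd = os.open(os.path.join(directory, "data"), flags, 0o600)
        self.map_fd = os.open(os.path.join(directory, "map"), flags, 0o600)
        self.page_map = {}                     # logical page -> physical slot
        self.free_slots = list(range(nslots))  # slots that may be overwritten

    def write_relocated(self, logical, payload):
        """Write a "needs relocation" page without touching its old copy."""
        assert len(payload) == PAGE_SIZE
        old_slot = self.page_map.get(logical)
        new_slot = self.free_slots.pop(0)

        # 1. Write the page at its NEW location and flush it.
        os.pwrite(self.data_fd, payload, new_slot * PAGE_SIZE)
        os.fsync(self.data_fd)

        # 2. Only then update the location pointer and flush that.
        os.pwrite(self.map_fd, struct.pack("<q", new_slot), logical * 8)
        os.fsync(self.map_fd)
        self.page_map[logical] = new_slot

        # 3. The old slot becomes reusable only after the pointer is durable.
        if old_slot is not None:
            self.free_slots.append(old_slot)
        return new_slot

    def read(self, logical):
        slot = self.page_map[logical]
        return os.pread(self.data_fd, PAGE_SIZE, slot * PAGE_SIZE)
```

If a crash hits between steps 1 and 2, the pointer still names the old, 
intact copy; a torn write can only ever land in a slot nothing points to.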

In addition to that, vacuum would now be able to tell smgr "hey, this 
page is completely empty". Instead of doing the second "empty page for 
truncate" scan, smgr could slowly migrate pages on first touch after a 
checkpoint towards the head of the file, into these empty pages. This 
way it would free pages at the end and now smgr is completely at liberty 
to truncate them off whenever it sees fit. No extra scan required, just a 
little more bookkeeping. This would not only be the case for heap pages, 
but for empty index pages as well. Shrinking/truncating indexes is 
something we are completely unable to do today. Whenever the buffer 
manager is asked for such a page that doesn't exist physically any more, 
it would just initialize an empty one of that kind (heap/index) in a 
buffer and mark it "needs relocation". It would get recreated physically 
on eviction/checkpoint without freeing any previously occupied space.
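
The bookkeeping for this could look roughly like the following 
in-memory sketch. The class and method names (PageFile, mark_empty, 
migrate_on_touch, truncate) are made up for illustration, EMPTY_PAGE is 
a stand-in for a freshly initialized heap or index page, and the sketch 
leaves out the flush ordering described earlier in this message.

```python
EMPTY_PAGE = b""  # stand-in for a freshly initialized heap/index page


class PageFile:
    """In-memory stand-in for a relation file with a logical->physical map."""

    def __init__(self, pages):
        self.slots = list(pages)  # physical slots; None means free
        self.page_map = {i: i for i in range(len(pages))}  # logical -> slot
        self.needs_relocation = set()

    def mark_empty(self, logical):
        """Vacuum reports a completely empty page: release its slot."""
        slot = self.page_map.pop(logical)
        self.slots[slot] = None

    def migrate_on_touch(self, logical):
        """Move a page into the lowest free slot nearer the head, if any."""
        slot = self.page_map[logical]
        free = [i for i, p in enumerate(self.slots) if p is None and i < slot]
        if free:
            target = free[0]
            self.slots[target] = self.slots[slot]
            self.slots[slot] = None
            self.page_map[logical] = target

    def truncate(self):
        """Cut free slots off the end of the file; return how many."""
        cut = 0
        while self.slots and self.slots[-1] is None:
            self.slots.pop()
            cut += 1
        return cut

    def read(self, logical):
        """Return the page; an unmapped page comes back freshly initialized."""
        if logical not in self.page_map:
            self.needs_relocation.add(logical)
            return EMPTY_PAGE
        return self.slots[self.page_map[logical]]
```

With four pages, marking logical page 1 empty and then touching page 3 
moves page 3 into slot 1, after which truncate() can cut the last slot 
without any scan of the file.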


Comments?
Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


