Re: Rewriting Free Space Map - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Rewriting Free Space Map
Msg-id 47DEC5E2.6090103@enterprisedb.com
In response to Re: Rewriting Free Space Map  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Rewriting Free Space Map  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-hackers
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> You're cavalierly waving away a whole boatload of problems that will
>>> arise as soon as you start trying to make the index AMs play along
>>> with this :-(.  
> 
>> It doesn't seem very hard.
> 
> The problem is that the index AMs are no longer in control of what goes
> where within their indexes, which has always been their prerogative to
> determine.  The fact that you think you can kluge btree to still work
> doesn't mean that it will work for other AMs.

Well, it does work with all the existing AMs AFAICS. I do agree with the 
general point; it'd certainly be cleaner, more modular and more flexible 
if the AMs didn't need to know about the existence of the maps.

>>> The idea that's becoming attractive to me while contemplating the
>>> multiple-maps problem is that we should adopt something similar to
>>> the old Mac OS idea of multiple "forks" in a relation.
> 
>> Hmm. You also need to teach at least xlog.c and xlogutils.c about the 
>> map forks, for full page images and the invalid page tracking.
> 
> Well, you'd have to teach them something anyway, for any incarnation
> of maps that they might need to update.

Umm, the WAL code doesn't care where the pages it operates on came from. 
Sure, we'll need rmgr-specific code that knows what to do with the maps, 
but the full page image code would work without changes with the 
multiple RelFileNode approach.

The essential change with the map fork idea is that a RelFileNode no 
longer uniquely identifies a file on disk (ignoring the segmentation 
which is handled in smgr for now). Anything that operates on 
RelFileNodes, without any higher level information of what it is, needs 
to be modified to use RelFileNode+forkid instead. That includes at least 
the buffer manager, smgr, and the full page image code in xlog.c.

It's probably a pretty mechanical change, even though it affects a lot 
of code. We'd probably want to have a new struct, let's call it 
PhysFileId for now, for RelFileNode+forkid, and basically replace all 
occurrences of RelFileNode with PhysFileId in smgr, bufmgr and xlog code.
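
To make the RelFileNode+forkid idea concrete, here is a minimal sketch of what such a struct might look like. Only the names RelFileNode and PhysFileId come from the discussion above; the ForkId enum, its values, and the two helper functions are assumptions made up for illustration, not existing backend code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Three-field identifier in the spirit of the backend's RelFileNode. */
typedef struct RelFileNode
{
    Oid spcNode;   /* tablespace */
    Oid dbNode;    /* database */
    Oid relNode;   /* relation */
} RelFileNode;

/* Hypothetical fork numbers; the actual set of forks is undecided. */
typedef enum ForkId
{
    FORK_MAIN = 0, /* main data portion */
    FORK_FSM = 1   /* free space map (assumed example of an auxiliary map) */
} ForkId;

/* RelFileNode + forkid: identifies one physical file, ignoring segments. */
typedef struct PhysFileId
{
    RelFileNode rnode;
    ForkId      forkid;
} PhysFileId;

/* Hypothetical helper: build a PhysFileId from its two parts. */
PhysFileId
MakePhysFileId(RelFileNode rnode, ForkId forkid)
{
    PhysFileId id;

    id.rnode = rnode;
    id.forkid = forkid;
    return id;
}

/* Hypothetical helper: two ids match only if relation AND fork match. */
bool
PhysFileIdEquals(PhysFileId a, PhysFileId b)
{
    return a.rnode.spcNode == b.rnode.spcNode &&
           a.rnode.dbNode == b.rnode.dbNode &&
           a.rnode.relNode == b.rnode.relNode &&
           a.forkid == b.forkid;
}
```

With something like this, the main fork and a map fork of the same relation share a RelFileNode but compare as different files, which is exactly why code keyed on RelFileNode alone would need to change.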

>> I also wonder what the performance impact of extending BufferTag is.
> 
> That's a fair objection, and obviously something we'd need to check.
> But I don't recall seeing hash_any so high on any profile that I think
> it'd be a big problem.

I do remember seeing hash_any in some oprofile runs. But that's fairly 
easy to test: we don't need to actually implement any of the stuff, 
other than add a field to BufferTag, and run pgbench.
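
As a rough illustration of what "add a field to BufferTag" means for the hash key, the sketch below compares a tag with and without a fork field. The struct layouts are simplified assumptions (all fields taken to be 4 bytes), BufferTagWithFork is a made-up name, and the hash is a plain FNV-1a stand-in, not hash_any; it only shows that the hashed key grows and that the fork number takes part in the hash.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;

typedef struct RelFileNode
{
    Oid spcNode;
    Oid dbNode;
    Oid relNode;
} RelFileNode;

/* Simplified tag without a fork field: four 4-byte fields. */
typedef struct BufferTag
{
    RelFileNode rnode;
    BlockNumber blockNum;
} BufferTag;

/* Hypothetical extended tag: one extra 4-byte field to hash and compare. */
typedef struct BufferTagWithFork
{
    RelFileNode rnode;
    uint32_t    forkid;
    BlockNumber blockNum;
} BufferTagWithFork;

/*
 * Stand-in byte hash (32-bit FNV-1a). This is NOT the backend's hash_any;
 * it is used here only so the example is self-contained.
 */
uint32_t
hash_bytes_fnv1a(const void *key, size_t len)
{
    const unsigned char *p = (const unsigned char *) key;
    uint32_t h = 2166136261u;
    size_t   i;

    for (i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash the whole extended tag, fork number included. */
uint32_t
hash_tag_with_fork(const BufferTagWithFork *tag)
{
    return hash_bytes_fnv1a(tag, sizeof(*tag));
}
```

The real question is how much the extra bytes cost in the buffer lookup path, and that can only be answered by the pgbench run, not by a sketch like this.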

>> My original thought was to have a separate RelFileNode for each of the 
>> maps. That would require no smgr or xlog changes, and not very many 
>> changes in the buffer manager, though I guess you'd need more catalog 
>> changes. You had doubts about that on the previous thread 
>> (http://archives.postgresql.org/pgsql-hackers/2007-11/msg00204.php), but 
>> the "map forks" idea certainly seems much more invasive than that.
> 
> The main problems with that are (a) the need to expose every type of map
> in pg_class and (b) the need to pass all those relfilenode numbers down
> to pretty low levels of the system. 

(a) is certainly a valid point. Regarding (b), I don't think the low 
level stuff (I assume you mean smgr, bufmgr, bgwriter, xlog by that) 
would need to be passed any additional relfilenode numbers. Or rather, 
they already work with relfilenodes, and they don't need to know whether 
the relfilenode is for an index, a heap, or an FSM attached to something 
else. The relfilenodes would be in RelationData, and we already have 
that around whenever we do anything that needs to differentiate between 
those.
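
A small sketch of that argument, under loudly stated assumptions: RelationSketch, its rd_fsm_node field, and format_file_id are hypothetical names invented here, and the path format is a simplification rather than the backend's real on-disk layout. The point is only that the low-level routine receives a plain RelFileNode and cannot tell, and need not know, whether it belongs to a heap or to a map.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;

typedef struct RelFileNode
{
    Oid spcNode;
    Oid dbNode;
    Oid relNode;
} RelFileNode;

/*
 * Hypothetical stand-in for RelationData: the higher-level struct carries
 * one relfilenode for the main data and a separate one for each map.
 */
typedef struct RelationSketch
{
    RelFileNode rd_node;      /* main data */
    RelFileNode rd_fsm_node;  /* assumed: the FSM's own relfilenode */
} RelationSketch;

/*
 * Hypothetical low-level routine. It only ever sees a RelFileNode, so it
 * treats heap, index, and map files identically. The "spc/db/rel" format
 * is a simplification for illustration.
 */
int
format_file_id(char *buf, size_t buflen, RelFileNode rnode)
{
    return snprintf(buf, buflen, "%u/%u/%u",
                    (unsigned) rnode.spcNode,
                    (unsigned) rnode.dbNode,
                    (unsigned) rnode.relNode);
}
```

Higher-level code would pick rel->rd_node or rel->rd_fsm_node and pass it down; nothing below that call changes.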

Another consideration is which approach is easiest to debug. The "map 
fork" approach seems better on that front, as you can immediately see 
from the PhysFileId if a page is coming from an auxiliary map or the 
main data portion. That might turn out to be handy in the buffer manager 
or bgwriter as well; they don't currently have any knowledge of what a 
page contains.

> The nice thing about the fork idea
> is that you don't need any added info to uniquely identify what relation
> you're working on.  The fork numbers would be hard-wired into whatever
> code needed to know about particular forks.  (Of course, these same
> advantages apply to using special space in an existing file.  I'm
> just suggesting that we can keep these advantages without buying into
> the restrictions that special space would have.)

I don't see that advantage. All the higher-level code that cares which 
relation you're working on already has a Relation around. All the 
lower-level stuff doesn't care.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

