Re: heap metapages - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: heap metapages |
Date | |
Msg-id | CA+TgmoaTaW9+OD2V8caMQ21rKdSAVYfFDk8mOXj-wnfNjOAfOQ@mail.gmail.com |
In response to | Re: heap metapages (Merlin Moncure <mmoncure@gmail.com>) |
Responses | Re: heap metapages |
List | pgsql-hackers |
On Mon, May 21, 2012 at 2:22 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, May 21, 2012 at 12:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> At dinner on Friday night at PGCon, the end of the table that included
>> Tom Lane, Stephen Frost, and myself got to talking about the idea of
>> including some kind of metapage in every relation, including heap
>> relations. At least some index relations already have something like
>> this (cf _bt_initmetapage, _hash_metapinit). I believe that adding
>> this for all relations, including heaps, would allow us to make
>> improvements in several areas.
>
> The first thing that jumps to mind is: why can't the metapage be
> extended to span multiple pages if necessary? I've often wondered why
> the visibility map isn't stored within the heap itself...

Well, the idea of a metapage, almost by definition, is that it stores a small amount of information whose size is pretty much fixed and which can reasonably be expected to always fit in one page. If you're trying to store data that can grow larger than that (or even come close to filling the page), you need a different system. I anticipate that the relation metadata we need to store will fit into a 512-byte sector with significant room to spare, leaving the rest of the block for whatever we'd like to use it for (e.g. bits of the FSM or VM). If at some point in the future we need relation-level metadata that can grow beyond a handful of bytes, we can either put it in its own fork or store one or more block pointers in the metapage indicating the blocks where that information is stored; but right now I'm not seeing the need for anything that fancy.

Now, that having been said, I don't think there's any particular reason why we couldn't multiplex all the relation forks onto a single physical file if we were so inclined.
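To make the multiplexing idea concrete, here is a minimal Python sketch of one possible single-file layout. Everything in it is an assumption for illustration, not PostgreSQL's on-disk format: physical block 0 holds the metapage, and each run of `DATA_PER_GROUP` heap blocks is preceded by one "map" block holding FSM/VM data for that run. The constant and the functions `heap_to_physical`, `map_block_for`, and `map_overhead` are hypothetical names.

```python
# Hypothetical single-file layout (illustration only, not PostgreSQL's format):
#   block 0                      -> metapage
#   then repeating groups of:    1 map block (FSM/VM) + DATA_PER_GROUP heap blocks
DATA_PER_GROUP = 1000  # hypothetical: number of heap blocks covered by one map block


def _check(heap_blk: int) -> None:
    if heap_blk < 0:
        raise ValueError("block numbers are non-negative")


def heap_to_physical(heap_blk: int) -> int:
    """Physical block in the single file that holds logical heap block heap_blk."""
    _check(heap_blk)
    group, offset = divmod(heap_blk, DATA_PER_GROUP)
    # Skip the metapage, every earlier group (map block + data blocks),
    # and this group's own map block, then add the offset within the group.
    return 1 + group * (1 + DATA_PER_GROUP) + 1 + offset


def map_block_for(heap_blk: int) -> int:
    """Physical block of the map (FSM/VM) page covering heap block heap_blk."""
    _check(heap_blk)
    group = heap_blk // DATA_PER_GROUP
    return 1 + group * (1 + DATA_PER_GROUP)


def map_overhead(n_heap_blocks: int) -> float:
    """Fraction of the file's blocks that are not heap data (metapage + map blocks)."""
    groups = -(-n_heap_blocks // DATA_PER_GROUP)  # ceiling division
    total = 1 + groups + n_heap_blocks
    return (1 + groups) / total
```

With one map block per 1000 heap blocks, a 1000-block heap occupies 1002 physical blocks, so a sequential scan reads roughly 0.2% non-heap pages. The cost the message points to is not visible in this arithmetic: the map pages end up scattered through the file instead of sitting next to each other.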
The FSM and VM are small enough that interleaving them with the actual data probably wouldn't slow down seq scans materially. On the other hand, I am not sure we'd gain much by it in general. I see the value of doing it for small relations: it saves inodes, potentially a great many inodes if you're on a system that uses schemas to implement multi-tenancy. But it's not clear to me that it's worthwhile in general. Keeping all the FSM data in its own fork may allow the OS to lay those pages out physically closer to each other on disk, whereas interleaving them with the data blocks would probably give up that advantage, and it's not clear to me what we'd be getting in exchange.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company