Re: WIP: relation metapages - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: WIP: relation metapages |
Date | |
Msg-id | CA+TgmoazFS8kvpYBCgLpcUSSC+4cC-Ezv9mbjea2ne7RHvL1SA@mail.gmail.com Whole thread Raw |
In response to | Re: WIP: relation metapages (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
List | pgsql-hackers |
On Thu, Jun 14, 2012 at 5:34 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 14.06.2012 18:01, Robert Haas wrote: >> What I'm really looking for at this stage of the game is feedback on >> the design decisions I made. The intention here is that it should be >> possible to read old-format heaps and indexes transparently, but that >> when we create or rewrite a relation, we add a new-style metapage. > > That dodges the problem of pg_upgrade, but eventually, we will want to use > the metapage for something important, and it can't be optional at that point > anymore. So we will eventually need to deal with pg_upgrade somehow. Well, the code as I've written deals with pg_upgrade just fine: you can move your old relation files over and they still work. What's missing at present is an efficient way to convert them to the new format. If you don't mind the inefficiency, you can use VACUUM FULL or CLUSTER, and you're there, but obviously we'll want something better before we start relying on this for anything too critical. For indexes, it should be pretty trivial to reduce the requirement from "rewrite while holding AccessExclusiveLock" to "brief AccessExclusiveLock". Everything except GiST already has a metapage, so you just need to rewrite that page in the new format, which is a SMOP. GiST requires moving the existing metapage out to a free page (or a new page) and writing a metapage pointing to it into block 0, which is again pretty simple. Heaps are a bit harder. We could adopt your idea of storing a block number in the metablock; any index TIDs that point to block 0 will be interpreted as pointing to that block instead. Then we can basically just relocate block to any free block, or a new one, as with GiST. Alternatively, we can read the tuples in block 0 and reinsert them; vacuum; and then repurpose block 0. Taking it even further, we could try to do better than "brief AccessExclusiveLock". That might be possible too, especially for indexes, but I'm not sure that it's necessary, and I haven't thought through the details. Even if we just get it down to "brief AccessExclusiveLock", I think we might also be able to improve the experience by making autovacuum do conversions automatically. So, if you pg_upgrade, the first autovacuum worker that spins through the database will ConditionalLockAcquire an AccessExclusiveLock on each relation in turn and try to do the conversion. If it can't get the lock it'll keep retrying until it succeeds. Odds are good that for most people this would make the addition of the metapage completely transparent. On the other hand, if metapages exist mostly to support operations like making an unlogged table logged or visca versa, that's not really necessary: we can add the metapage when someone performs a DDL operation that requires it. There is a lot of optimization possible on top of the basic mechanism, but how much of it makes sense to do, and which parts, depends on exactly which of the many things we could do with this we end up deciding to actually do. I'm trying to start with the basics, which means getting the basic infrastructure in place and working. > It would be nice to have the oid of the access method in the metapage (or > some other way to identify it). Yeah, I was thinking about that. I think a magic number might be preferable to an OID, and we actually already have that as the first 4-bytes of the access method metadata for all index types except GIN. I'm thinking about trying to fix up GIN so that it adds one as well; the trick is to do it in a way that is backward-compatible, which I have an idea how to do but haven't tackled yet. We can add a magic number for heaps as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: