Thread: WIP: relation metapages
Here's a WIP patch implementing metapages for all relations, somewhat along lines previously discussed: http://archives.postgresql.org/pgsql-hackers/2012-05/msg00860.php It turns out that doing this for indexes was pretty easy and didn't obviously break anything; doing it for heaps was harder and broke a lot of stuff. If you apply the patch as attached here, you'll find that we fail a whole bunch of regression tests, mostly due to plan changes. It seems that having N+1 pages in the heap changes the optimal way to do... everything. Of course, the extra page need not be included in seq-scans, so you'd think this was mostly a matter of adjusting the costing functions to reduce the number of pages by 1 for costing purposes. However, so far I haven't been able to hack the costing to make the plan changes go away, though, which may be a sign that I've broken something else. I can't seem to make Merge Append work at all, which is maybe a better sign that I've broken something. If you want to see the patch pass regression tests, hack heap_create_storage not to emit a metapage for heaps and all the regression test failures disappear. What I'm really looking for at this stage of the game is feedback on the design decisions I made. The intention here is that it should be possible to read old-format heaps and indexes transparently, but that when we create or rewrite a relation, we add a new-style metapage. For all index types except gist, this is really just a format change for the metapage that already existed: the new data that gets stored for all relation types is added at the beginning of the page, just following the page header, and then the AM-specific stuff is moved further down the page. For GiST, it means adding a metapage that wasn't there before, but that went smoothly too. For some AMs, I had to rejigger the WAL-logging a little; review of those changes would be good. The basic idea is that we don't want to have to try to reconstruct what the metapage should have been during recovery (indeed, we can't) so we just log an image of the page instead. For heaps, I refactored things so that heap_create() is no longer used for indexes. Instead, index_create() calls RelationBuildLocalRelation directly. This required moving a little bit of logic from heap_create() into RelationBuildLocalRelation(), but it seems like it may fit better there anyway. That means that heap_create() can now assume that it's creating a heap and not an index. This refactoring might be worth pulling out of the patch and committing separately, since I think the result is actually simpler and cleaner than what we're doing now; but it's a minor point in any case. I put the new metapage code in src/backend/access/common/metapage.c, but I don't have a lot of confidence that that's the appropriate location for it. Suggestions are appreciated. I am pretty sure that clustering a relation will cause it to end up with the wrong relation ID in its metapage afterwards. Since nothing relies on that information at this point, this shouldn't break anything, but it needs to be fixed eventually. I think the thing I'm most worried about is the plan changes that result from adding heap metapages. Suggestions on what to do about that from a costing perspective would be particularly appreciated. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 14.06.2012 18:01, Robert Haas wrote: > What I'm really looking for at this stage of the game is feedback on > the design decisions I made. The intention here is that it should be > possible to read old-format heaps and indexes transparently, but that > when we create or rewrite a relation, we add a new-style metapage. That dodges the problem of pg_upgrade, but eventually, we will want to use the metapage for something important, and it can't be optional at that point anymore. So we will eventually need to deal with pg_upgrade somehow. > I put the new metapage code in src/backend/access/common/metapage.c, > but I don't have a lot of confidence that that's the appropriate > location for it. Suggestions are appreciated. That seems good to me. It would be nice to have the oid of the access method in the metapage (or some other way to identify it). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jun 14, 2012 at 5:34 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 14.06.2012 18:01, Robert Haas wrote: >> What I'm really looking for at this stage of the game is feedback on >> the design decisions I made. The intention here is that it should be >> possible to read old-format heaps and indexes transparently, but that >> when we create or rewrite a relation, we add a new-style metapage. > > That dodges the problem of pg_upgrade, but eventually, we will want to use > the metapage for something important, and it can't be optional at that point > anymore. So we will eventually need to deal with pg_upgrade somehow. Well, the code as I've written deals with pg_upgrade just fine: you can move your old relation files over and they still work. What's missing at present is an efficient way to convert them to the new format. If you don't mind the inefficiency, you can use VACUUM FULL or CLUSTER, and you're there, but obviously we'll want something better before we start relying on this for anything too critical. For indexes, it should be pretty trivial to reduce the requirement from "rewrite while holding AccessExclusiveLock" to "brief AccessExclusiveLock". Everything except GiST already has a metapage, so you just need to rewrite that page in the new format, which is a SMOP. GiST requires moving the existing metapage out to a free page (or a new page) and writing a metapage pointing to it into block 0, which is again pretty simple. Heaps are a bit harder. We could adopt your idea of storing a block number in the metablock; any index TIDs that point to block 0 will be interpreted as pointing to that block instead. Then we can basically just relocate block to any free block, or a new one, as with GiST. Alternatively, we can read the tuples in block 0 and reinsert them; vacuum; and then repurpose block 0. Taking it even further, we could try to do better than "brief AccessExclusiveLock". That might be possible too, especially for indexes, but I'm not sure that it's necessary, and I haven't thought through the details. Even if we just get it down to "brief AccessExclusiveLock", I think we might also be able to improve the experience by making autovacuum do conversions automatically. So, if you pg_upgrade, the first autovacuum worker that spins through the database will ConditionalLockAcquire an AccessExclusiveLock on each relation in turn and try to do the conversion. If it can't get the lock it'll keep retrying until it succeeds. Odds are good that for most people this would make the addition of the metapage completely transparent. On the other hand, if metapages exist mostly to support operations like making an unlogged table logged or visca versa, that's not really necessary: we can add the metapage when someone performs a DDL operation that requires it. There is a lot of optimization possible on top of the basic mechanism, but how much of it makes sense to do, and which parts, depends on exactly which of the many things we could do with this we end up deciding to actually do. I'm trying to start with the basics, which means getting the basic infrastructure in place and working. > It would be nice to have the oid of the access method in the metapage (or > some other way to identify it). Yeah, I was thinking about that. I think a magic number might be preferable to an OID, and we actually already have that as the first 4-bytes of the access method metadata for all index types except GIN. I'm thinking about trying to fix up GIN so that it adds one as well; the trick is to do it in a way that is backward-compatible, which I have an idea how to do but haven't tackled yet. We can add a magic number for heaps as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, I tried to perform a submission review of your relation metapages patch, but it does not apply cleanly to the current master (fa188b5). I've attached the rejects file for your review. Regards, Albert
Attachment
Hello, I tried to perform a submission review of your relation metapages patch, but it does not apply cleanly to the current master (fa188b5). I've attached the rejects file for your review. Regards, Albert