WIP: relation metapages - Mailing list pgsql-hackers

From Robert Haas
Subject WIP: relation metapages
Msg-id CA+Tgmoa13Ou22KU5bYT6hwArqH=cXRNEph4qyOfaQ6qqM4JfbQ@mail.gmail.com
List pgsql-hackers
Here's a WIP patch implementing metapages for all relations, somewhat
along lines previously discussed:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00860.php

It turns out that doing this for indexes was pretty easy and didn't
obviously break anything; doing it for heaps was harder and broke a
lot of stuff.  If you apply the patch as attached here, you'll find
that we fail a whole bunch of regression tests, mostly due to plan
changes.  It seems that having N+1 pages in the heap changes the
optimal way to do... everything.  Of course, the extra page need not
be included in seq-scans, so you'd think this was mostly a matter of
adjusting the costing functions to reduce the number of pages by 1 for
costing purposes.  So far, though, I haven't been able to hack the
costing to make the plan changes go away, which may be a sign that
I've broken something else.  I can't get Merge Append to work at
all, which is probably a stronger sign that I've broken something.
If you want to see the patch pass regression tests, hack
heap_create_storage not to emit a metapage for heaps and all the
regression test failures disappear.
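
As a rough illustration of the costing adjustment described above (this
is not code from the patch), the idea is to charge a sequential scan for
the data pages only.  The helper name and the has_metapage flag are
hypothetical; the real planner code is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Block counts are 32-bit unsigned in this sketch. */
typedef uint32_t BlockNumber;

/*
 * Hypothetical helper: number of heap pages a sequential scan would
 * actually read, given the physical size of the relation.
 */
static BlockNumber
scannable_heap_pages(BlockNumber physical_pages, bool has_metapage)
{
    /* Old-format or empty heaps have no metapage to skip. */
    if (!has_metapage || physical_pages == 0)
        return physical_pages;

    /* New-style heaps: one page is the metapage, not tuple data. */
    return physical_pages - 1;
}

/* Disk cost of a full scan: per-page cost times pages actually read. */
static double
seqscan_disk_cost(BlockNumber physical_pages, bool has_metapage,
                  double seq_page_cost)
{
    return seq_page_cost *
        (double) scannable_heap_pages(physical_pages, has_metapage);
}
```

With this, a 10-block heap that has a metapage is costed as 9 pages,
which matches what the same data would have cost in the old format.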

What I'm really looking for at this stage of the game is feedback on
the design decisions I made.  The intention here is that it should be
possible to read old-format heaps and indexes transparently, but that
when we create or rewrite a relation, we add a new-style metapage.
For all index types except gist, this is really just a format change
for the metapage that already existed: the new data that gets stored
for all relation types is added at the beginning of the page, just
following the page header, and then the AM-specific stuff is moved
further down the page.  For GiST, it means adding a metapage that
wasn't there before, but that went smoothly too.  For some AMs, I had
to rejigger the WAL-logging a little; review of those changes would be
good.  The basic idea is that we don't want to have to try to
reconstruct what the metapage should have been during recovery
(indeed, we can't) so we just log an image of the page instead.
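
The page layout described above can be sketched as follows.  This is
only an illustration under stated assumptions: the struct, its magic
and version fields, and the SKETCH_* names are invented here, and the
24-byte header size, 8-byte alignment, and 8192-byte page size are
assumed defaults rather than values taken from the patch.  Only the
relation ID field is mentioned in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ       8192    /* assumed page size */
#define SKETCH_PAGE_HDR_SZ  24      /* assumed standard page header size */
#define SKETCH_ALIGN(x)     (((size_t) (x) + 7) & ~(size_t) 7)

/* Hypothetical common metadata stored for every relation type. */
typedef struct SketchRelMeta
{
    uint32_t    magic;      /* assumption: identifies a new-style metapage */
    uint32_t    version;    /* assumption: format version */
    uint32_t    relid;      /* relation ID, as mentioned in this message */
} SketchRelMeta;

/* Common metadata starts right after the page header. */
static size_t
sketch_common_meta_offset(void)
{
    return SKETCH_ALIGN(SKETCH_PAGE_HDR_SZ);
}

/* AM-specific metadata is pushed further down, past the common part. */
static size_t
sketch_am_meta_offset(void)
{
    size_t      off = SKETCH_ALIGN(sketch_common_meta_offset() +
                                   sizeof(SketchRelMeta));

    /* The AM-specific area must still begin inside the page. */
    assert(off < SKETCH_BLCKSZ);
    return off;
}
```

Under these assumptions, an index AM that used to read its metadata
from just after the page header would read it from
sketch_am_meta_offset() in a new-style metapage, while old-format
metapages keep their original offset.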

For heaps, I refactored things so that heap_create() is no longer used
for indexes.  Instead, index_create() calls RelationBuildLocalRelation()
directly.  This required moving a little bit of logic from
heap_create() into RelationBuildLocalRelation(), but it seems like it
may fit better there anyway.  That means that heap_create() can now
assume that it's creating a heap and not an index.  This refactoring
might be worth pulling out of the patch and committing separately,
since I think the result is actually simpler and cleaner than what
we're doing now; but it's a minor point in any case.

I put the new metapage code in src/backend/access/common/metapage.c,
but I don't have a lot of confidence that that's the appropriate
location for it.  Suggestions are appreciated.

I am pretty sure that clustering a relation will cause it to end up
with the wrong relation ID in its metapage afterwards.  Since nothing
relies on that information at this point, this shouldn't break
anything, but it needs to be fixed eventually.

I think the thing I'm most worried about is the plan changes that
result from adding heap metapages.  Suggestions on what to do about
that from a costing perspective would be particularly appreciated.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
