Re: Relation forks & FSM rewrite patches - Mailing list pgsql-patches

From Tom Lane
Subject Re: Relation forks & FSM rewrite patches
Date
Msg-id 22640.1215887277@sss.pgh.pa.us
Whole thread Raw
In response to Relation forks & FSM rewrite patches  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Responses Re: Relation forks & FSM rewrite patches
List pgsql-patches
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Here's an updated version of the "relation forks" patch, and an
> incremental FSM rewrite patch on top of that. The relation forks patch
> is ready for review. The FSM implementation is more work-in-progress
> still, but I'd like to get some review on that as well, with the goal of
> doing more performance testing and committing it after the commit fest.

I looked through the "relation forks" patch, and think it's close to
committable, but I have a few minor gripes.

> The one part that I'm not totally satisfied in the relation forks patch
> is the smgrcreate() function. The question problem is: which piece of
> code decides which forks to create for a relation, and when to create
> them? I settled on the concept that all forks that a relation will need
> are created at once, in one smgrcreate() call. There's no function to
> create additional forks later on.

I think that's an extremely bad idea.  You need look no further than
Zdenek's hopes of in-place-upgrade to see a need for adding a fork
to an existing relation; but even without that I can see possibly
wanting to not create a fork right away.  I think smgrcreate() should
just create one fork as it's told (ie, same API you gave mdcreate).

> Currently, heap_create() decides which forks to create. That's fine at
> the moment, but will become problematic in the future, as it's
> indexam-specific which forks an index requires. That decision should
> really be done in the indexam. One possibility would be to only create
> the main fork in heap_create(), and let indexam create any additional
> forks it needs in ambuild function.

Problem goes away if you redo smgrcreate API as above.

Other nits:

relpath() is now in danger of buffer overrun: you need to increase
the palloc() sizes.  Seems like a #define for the max number of digits
in a ForkNumber wouldn't be out of place (though I sure hope it never
gets past 1 ...).

Also, I strongly suggest *not* appending _0 to the name of the main fork's
file.  This'd break on-disk compatibility and people's expectations,
for no particular reason.

Don't like the name NUM_FORKS; it seems to suggest that's the actual
number of forks in existence.  MAX_NUM_FORKS would be better.

I think that setting InvalidForkNumber to -1 is unportable: there is
nothing compelling the compiler to represent enum ForkNumber as a signed
type, so the places where you assume a ForkNumber variable can hold
InvalidForkNumber all look like trouble.  One solution is to include
InvalidForkNumber in the enum:

    enum ForkNumber {
        InvalidForkNumber = -1,
        MAIN_FORKNUM = 0
    };

This would be a bit annoying if someone wrote a switch statement listing
different possible fork numbers, as the compiler would complain if
there's no case for InvalidForkNumber; but I can't see a reason for code
like that to be common.

Another possibility is to set InvalidForkNumber = 0, MAIN_FORKNUM = 1;
which is still a bit of a type cheat if InvalidForkNumber isn't listed
in the enum, but one that's very unlikely to cause trouble.  A bigger
problem is it would mess up a lot of your loops.

BTW, don't put a comma at end of enum ForkNumber, I think some compilers
will barf on that.

Last, don't forget to update the sgml docs chapter on database physical
storage.

            regards, tom lane

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: VACUUM Improvements - WIP Patch
Next
From: Jeff Davis
Date:
Subject: Re: variadic function support