Re: autogenerating headers & bki stuff - Mailing list pgsql-hackers

From Robert Haas
Subject Re: autogenerating headers & bki stuff
Date
Msg-id 603c8f070907251317s20c929e3td4e574d38d6a9364@mail.gmail.com
Whole thread Raw
In response to Re: autogenerating headers & bki stuff  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: autogenerating headers & bki stuff
Re: autogenerating headers & bki stuff
List pgsql-hackers
On Sat, Jul 25, 2009 at 10:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
>>> I think a less invasive change would be to include anum.h into all the
>>> catalog/pg_*.h headers, so that the external interface stays the same.
>
>> Gah.  I wish a toplevel make would build "contrib".
>
>> Anyway, yeah, we could do that.  The downsides to that approach are:
>
> I didn't realize this change was intending to throw all the Anum_
> constants into a single header file.  I am strongly against that
> on namespace pollution grounds,

I don't really understand this objection.  The reason why namespace
pollution is bad is because there's a risk that someone might be using
one of the names used for some other purpose, but the chances that
someone who is using a Anum_pg_* or Natts_pg_* constant also needs a
similarly named constant for some purpose other than referencing the
PostgreSQL catalogs seems so as nearly zero as makes no difference.

The hypothetical scenario in which this is a problem goes something
like this: someone is counting on the fact that if they include
"catalog/pg_foo.h", then Anum_pg_foo_* and Natts_pg_foo will be
defined appropriately for reference to PostgreSQL backend catalogs,
but they are also counting on the fact that Anum_pg_bar_* and
Natts_pg_bar (for some value of bar that collides with a system
catalog name) are not defined and that they can use those constants
for their own internal purposes.  When they port their code to PG 8.5,
they are forced into changing the naming of those constants, because
it's no longer possible to just get the pg_foo constants without the
pg_bar constants.  If anyone is really doing this, I submit that it's
a horribly bad idea and they ought to stop right away whether this
patch gets committed or not.

> quite aside from the massive #include
> restructuring it'd require.

This is all done in the patch (with the exception of a handful of
loose ends that Peter found in his review) and I don't think it's all
that massive.

> And then there's the fact that any change
> in such a file would force rebuild of just about the entire backend.

It requires a rebuild of 56 of 547 files '*.c' files in src/backend,
which is to say 10.2% of the backend.  Also, the system is set up in
such a way that the timestamp on catalog/anum.h changes only when its
contents actually change, and dependencies are not rebuilt otherwise.
So basically it'll happen when someone adds an attribute to, or
removes one from, a system catalog: the fact that the .h file was
updated in some other way is not sufficient.

> I do not see any virtue in autogenerating the Anum_ constants anyway.
> Yeah, manually updating them is a bit of a pain, but it's only a tiny
> part of the work that's normally involved in changing a system catalog.

Well, I'd like to work on fixing some of the other problems too, but
this seems like a good place to start.  Currently, if there are two
uncommitted patches that make changes to the system catalog, whichever
is committed first is 100% guaranteed to conflict with each other, and
the resolution is typically painful.  Of course, fixing the Anum and
Natts declarations does not come close to fixing this problem: for
catalogs that are initialized with any data at bootstrap time, the
DATA() lines are a much bigger issue, but fixing that is going to
require a bigger hammer than can be put in place with one patch.  I do
think this is a pretty good foundation on which to build, though.

> In any case, practically all of the benefit involved could be gotten
> by just not having to mess with the numerical values of the individual
> constants.  Which we could do by setting them up as enums instead of
> macros, along the lines of
> http://archives.postgresql.org/pgsql-committers/2008-05/msg00080.php

I'd certainly be willing to concede that some of the benefit could be
gotten that way, but I'm not sure I agree with "practically all".  The
benefits of this patch as I see them are: (1) to reduce the number of
places where a catalog change creates a merge conflict, and (2) to
eliminate the possibility of human error in setting up the Anum and
Natts declarations.  The fact that I found a case where this had been
done inconsistently in pg_listener (and no one noticed for 10 years)
provides that this is not an entirely hypothetical possibility even
for committed code, and I've definitely screwed it up a few times in
my own tree, too.  Replacing the declarations with enums would make
the merge conflicts involve fewer lines and maybe slightly simplify
the manual updating process, but it won't completely solve either
problem.

...Robert


pgsql-hackers by date:

Previous
From: Sam Mason
Date:
Subject: Re: SE-PostgreSQL Specifications
Next
From: Sam Mason
Date:
Subject: Re: SE-PostgreSQL Specifications