Thread: autogenerating headers & bki stuff

autogenerating headers & bki stuff

From
Robert Haas
Date:
As some of you have probably gathered from a couple of recent patches
that I've posted, I've been taking a look at the scripts we use to
generate various derived files based on the catalog headers, and I
think there's room for improvement.  First, as discussed on previous
threads, it seems needlessly complex to have both perl and
shell-script implementations of genbki and Gen_fmgrtab.   Second,
there's a fair amount of duplicated data that could be auto-generated,
but currently isn't.

The attached patch merges all of the logic currently in genbki.sh and
Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl.  It
then extends that logic to generate all of the Anum_* and Natts_*
constants, as well as the Schema_pg_* declarations for the bootstrap
tables.  (I don't see a clean, easy way to generate Schema_pg_index,
since there are no DATA() lines for that table in pg_attribute.h; but
the others are all straightforward.)  It also adds a fair amount of
error checking to what we currently have in place.

In order to avoid create a build-time dependency on Perl for
non-Windows platforms, this patch makes all of the things generated by
gen_catalog.pl into distprep targets.  This should be OK, since none
of them depend on architecture or configuration.

I have not made any attempt to fix the MSVC build to work with this,
but there should be some stuff that can be simplified there as well
(in particular, the Perl reimplementation of genbki).

Love it?  Hate it?  Comments appreciated.

Thanks,

...Robert

Attachment

Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Mon, Jun 29, 2009 at 11:59 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> As some of you have probably gathered from a couple of recent patches
> that I've posted, I've been taking a look at the scripts we use to
> generate various derived files based on the catalog headers, and I
> think there's room for improvement.  First, as discussed on previous
> threads, it seems needlessly complex to have both perl and
> shell-script implementations of genbki and Gen_fmgrtab.   Second,
> there's a fair amount of duplicated data that could be auto-generated,
> but currently isn't.
>
> The attached patch merges all of the logic currently in genbki.sh and
> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl.  It
> then extends that logic to generate all of the Anum_* and Natts_*
> constants, as well as the Schema_pg_* declarations for the bootstrap
> tables.  (I don't see a clean, easy way to generate Schema_pg_index,
> since there are no DATA() lines for that table in pg_attribute.h; but
> the others are all straightforward.)  It also adds a fair amount of
> error checking to what we currently have in place.
>
> In order to avoid create a build-time dependency on Perl for
> non-Windows platforms, this patch makes all of the things generated by
> gen_catalog.pl into distprep targets.  This should be OK, since none
> of them depend on architecture or configuration.
>
> I have not made any attempt to fix the MSVC build to work with this,
> but there should be some stuff that can be simplified there as well
> (in particular, the Perl reimplementation of genbki).

Updated to CVS HEAD.  Peter's commit of my related patch "pg_listener
attribute number #defines" introduced a minor conflict.

...Robert

Attachment

Re: autogenerating headers & bki stuff

From
Peter Eisentraut
Date:
On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
> The attached patch merges all of the logic currently in genbki.sh and
> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl.  It
> then extends that logic to generate all of the Anum_* and Natts_*
> constants, as well as the Schema_pg_* declarations for the bootstrap
> tables.

I see a potential problem with the introduction of the catalog/anum.h header, 
to hold the Anum_... defines.  This looks like it could be a significant break 
in the backend API, as evidenced by the fact that plperl and dblink no longer 
compile with this change.

I think a less invasive change would be to include anum.h into all the 
catalog/pg_*.h headers, so that the external interface stays the same.


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
>> The attached patch merges all of the logic currently in genbki.sh and
>> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl.  It
>> then extends that logic to generate all of the Anum_* and Natts_*
>> constants, as well as the Schema_pg_* declarations for the bootstrap
>> tables.
>
> I see a potential problem with the introduction of the catalog/anum.h header,
> to hold the Anum_... defines.  This looks like it could be a significant break
> in the backend API, as evidenced by the fact that plperl and dblink no longer
> compile with this change.
>
> I think a less invasive change would be to include anum.h into all the
> catalog/pg_*.h headers, so that the external interface stays the same.

Gah.  I wish a toplevel make would build "contrib".

Anyway, yeah, we could do that.  The downsides to that approach are:

1. Changing a catalog definition in a way that actually affects the
contents of anum.h will force more things to be recompiled (note that
there are guards against useless rebuilds of anum.h), and

2. If we do that, we'll probably be stuck with it forever, and it
seems like a bit of a hack.

...Robert


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
>> I think a less invasive change would be to include anum.h into all the
>> catalog/pg_*.h headers, so that the external interface stays the same.

> Gah.  I wish a toplevel make would build "contrib".

> Anyway, yeah, we could do that.  The downsides to that approach are:

I didn't realize this change was intending to throw all the Anum_
constants into a single header file.  I am strongly against that
on namespace pollution grounds, quite aside from the massive #include
restructuring it'd require.  And then there's the fact that any change
in such a file would force rebuild of just about the entire backend.

I do not see any virtue in autogenerating the Anum_ constants anyway.
Yeah, manually updating them is a bit of a pain, but it's only a tiny
part of the work that's normally involved in changing a system catalog.
In any case, practically all of the benefit involved could be gotten
by just not having to mess with the numerical values of the individual
constants.  Which we could do by setting them up as enums instead of
macros, along the lines of
http://archives.postgresql.org/pgsql-committers/2008-05/msg00080.php
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Sat, Jul 25, 2009 at 10:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
>>> I think a less invasive change would be to include anum.h into all the
>>> catalog/pg_*.h headers, so that the external interface stays the same.
>
>> Gah.  I wish a toplevel make would build "contrib".
>
>> Anyway, yeah, we could do that.  The downsides to that approach are:
>
> I didn't realize this change was intending to throw all the Anum_
> constants into a single header file.  I am strongly against that
> on namespace pollution grounds,

I don't really understand this objection.  The reason why namespace
pollution is bad is because there's a risk that someone might be using
one of the names used for some other purpose, but the chances that
someone who is using a Anum_pg_* or Natts_pg_* constant also needs a
similarly named constant for some purpose other than referencing the
PostgreSQL catalogs seems so as nearly zero as makes no difference.

The hypothetical scenario in which this is a problem goes something
like this: someone is counting on the fact that if they include
"catalog/pg_foo.h", then Anum_pg_foo_* and Natts_pg_foo will be
defined appropriately for reference to PostgreSQL backend catalogs,
but they are also counting on the fact that Anum_pg_bar_* and
Natts_pg_bar (for some value of bar that collides with a system
catalog name) are not defined and that they can use those constants
for their own internal purposes.  When they port their code to PG 8.5,
they are forced into changing the naming of those constants, because
it's no longer possible to just get the pg_foo constants without the
pg_bar constants.  If anyone is really doing this, I submit that it's
a horribly bad idea and they ought to stop right away whether this
patch gets committed or not.

> quite aside from the massive #include
> restructuring it'd require.

This is all done in the patch (with the exception of a handful of
loose ends that Peter found in his review) and I don't think it's all
that massive.

> And then there's the fact that any change
> in such a file would force rebuild of just about the entire backend.

It requires a rebuild of 56 of 547 files '*.c' files in src/backend,
which is to say 10.2% of the backend.  Also, the system is set up in
such a way that the timestamp on catalog/anum.h changes only when its
contents actually change, and dependencies are not rebuilt otherwise.
So basically it'll happen when someone adds an attribute to, or
removes one from, a system catalog: the fact that the .h file was
updated in some other way is not sufficient.

> I do not see any virtue in autogenerating the Anum_ constants anyway.
> Yeah, manually updating them is a bit of a pain, but it's only a tiny
> part of the work that's normally involved in changing a system catalog.

Well, I'd like to work on fixing some of the other problems too, but
this seems like a good place to start.  Currently, if there are two
uncommitted patches that make changes to the system catalog, whichever
is committed first is 100% guaranteed to conflict with each other, and
the resolution is typically painful.  Of course, fixing the Anum and
Natts declarations does not come close to fixing this problem: for
catalogs that are initialized with any data at bootstrap time, the
DATA() lines are a much bigger issue, but fixing that is going to
require a bigger hammer than can be put in place with one patch.  I do
think this is a pretty good foundation on which to build, though.

> In any case, practically all of the benefit involved could be gotten
> by just not having to mess with the numerical values of the individual
> constants.  Which we could do by setting them up as enums instead of
> macros, along the lines of
> http://archives.postgresql.org/pgsql-committers/2008-05/msg00080.php

I'd certainly be willing to concede that some of the benefit could be
gotten that way, but I'm not sure I agree with "practically all".  The
benefits of this patch as I see them are: (1) to reduce the number of
places where a catalog change creates a merge conflict, and (2) to
eliminate the possibility of human error in setting up the Anum and
Natts declarations.  The fact that I found a case where this had been
done inconsistently in pg_listener (and no one noticed for 10 years)
provides that this is not an entirely hypothetical possibility even
for committed code, and I've definitely screwed it up a few times in
my own tree, too.  Replacing the declarations with enums would make
the merge conflicts involve fewer lines and maybe slightly simplify
the manual updating process, but it won't completely solve either
problem.

...Robert


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jul 25, 2009 at 10:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I didn't realize this change was intending to throw all the Anum_
>> constants into a single header file. �I am strongly against that
>> on namespace pollution grounds,

> I don't really understand this objection.

It's for the same reasons we don't put all of include/catalog/ into one
giant header file, or all of include/ for that matter.  It's bad for
modularity, it's bad for compilation time, it's bad for rebuild time
if you're using --enable-depend.

> The reason why namespace
> pollution is bad is because there's a risk that someone might be using
> one of the names used for some other purpose,

Uh, no, that's actually pretty much irrelevant for our purposes.  As a
general rule, any two PG header files should be non-conflicting since
some .c file might need to include both.  So we'd have to get rid of
conflicts anyhow.  That does not make compartmentalization useless.
As a for-instance, exposing names that a given .c file doesn't really
need opens the door to typos that the compiler won't catch for you
(ie, accidentally using the wrong Anum_ constant, in this context).

> [ other straw-man argumentation snipped ]

None of this impresses me at all.  We should not throw a pile of
unrelated declarations into one header just to simplify the life
of an automatic script.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Greg Stark
Date:
On Sat, Jul 25, 2009 at 9:17 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> Of course, fixing the Anum and
> Natts declarations does not come close to fixing this problem: for
> catalogs that are initialized with any data at bootstrap time, the
> DATA() lines are a much bigger issue, but fixing that is going to
> require a bigger hammer than can be put in place with one patch.  I do
> think this is a pretty good foundation on which to build, though.


I think addressing that would actually be fairly simple in theory.
Move a lot of those DATA lines to SQL initdb scripts. Virtually all of
pg_proc, pg_operator, pg_opclass, pg_opfamily, pg_cast, etc can be
initialized using SQL. Hardly any of the records in there are needed
for bootstrapping.

That would reduce the pain of editing this files *enormously*. The
worst part of adding new operators is making sure all the opclass
entries line up properly. And when there's an OID conflict and they
all have to be renumbered and the opclasses fixed up that's when
they're a real headache.



--
greg
http://mit.edu/~gsstark/resume.pdf


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Move a lot of those DATA lines to SQL initdb scripts. Virtually all of
> pg_proc, pg_operator, pg_opclass, pg_opfamily, pg_cast, etc can be
> initialized using SQL. Hardly any of the records in there are needed
> for bootstrapping.

It's easy to make that claim, much less easy to actually do it.

The other issue is that there will be some fraction of the entries that
unavoidably *are* needed before you can use SQL to insert the rest.
What will we do with those?  Having two different representations for
essentially the same kind of data isn't much fun.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
[ dept. of second thoughts ]

I wrote:
> It's easy to make that claim, much less easy to actually do it.

Reflecting on this a bit more ... it seems to me that it's not the right
thing to set the goal as "try to get rid of as many DATA statements as
possible".  The right way to think about this is to classify the stuff
we have got in DATA statements, and consider how to reduce the pain
associated with the harder-to-maintain categories.

In particular, I think Greg correctly identified the main pain point
as being the catalog entries associated with operator classes (ie,
pg_opclass, pg_opfamily, pg_amop, pg_amproc entries).  So what I'm
thinking is we should consider how to migrate *all* of those entries
into CREATE OPERATOR CLASS/FAMILY commands.  And I think that that might
be doable.  Those entries are not needed by the system until we create
catalog indexes.  So maybe we could split the current bootstrap phase
into three phases:* create core catalogs and load DATA commands, using bki* create operator classes, using sql script*
createindexes, using bki* proceed on as before
 

I'm not nearly as excited about migrating all or even most of, say,
the pg_proc DATA lines into SQL.  That simply isn't going to buy very
much in maintainability --- a patch that wants to add a new property
to all the functions is going to conflict just as much with another
patch doing the same.  And it is going to cost us in places like
how do we generate the fmgr lookup table.

Thoughts?
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I'm not nearly as excited about migrating all or even most of, say,
> the pg_proc DATA lines into SQL.  That simply isn't going to buy very
> much in maintainability --- a patch that wants to add a new property
> to all the functions is going to conflict just as much with another
> patch doing the same.  And it is going to cost us in places like
> how do we generate the fmgr lookup table.
>
> Thoughts?

I think it would actually buy you quite a bit to migrate them to SQL,
because in SQL, default properties can generally be omitted, which
means that a patch which adds a new property to pg_proc that takes the
same value for every row doesn't actually need to touch the SQL at
all.  I suspect that's a pretty common case, too: SE-PostgreSQL
modifies a whole bunch of system catalogs to add a security label
attribute, and ALTER TABLE ... ALTER COLUMN ... SET DISTINCT adds a
column to pg_attribute that defaults to 0.

I can hear you objecting that there's no possible way we can use SQL
to construct pg_attribute, and that's certainly true.  But I have
another idea.  What we could do is generate the BKI but using some
more sophisticated method than just writing it all out longhand in the
header files and copying it over into the bki file.  The pg_attribute
entries for the bootstrap tables, for example, are mostly inferrable
from the PG_CATALOG() declarations (I think storage class and maybe
one other property might be problematic).  And certainly you could
design a more human readable format for the pg_proc entries, maybe
something like:

DATA_PG_PROC(function-name, function-arg1-type-name
function-arg2-type-name,
function-return-type-name,language,definition)

To convert this into BKI, you make an initial pass through pg_type.h
and collect the OIDs of all the type names.  Then you zip through
pg_proc.h and now you have enough information to map all the type
names into OIDs and generate the BKI.  I'm waving my hands a little
bit here but I really don't think this is too hard, coding-wise, and
it seems like it would make it a LOT easier to edit this file...

...Robert


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I'm not nearly as excited about migrating all or even most of, say,
>> the pg_proc DATA lines into SQL.

> I think it would actually buy you quite a bit to migrate them to SQL,
> because in SQL, default properties can generally be omitted, which
> means that a patch which adds a new property to pg_proc that takes the
> same value for every row doesn't actually need to touch the SQL at
> all.

[ shrug... ]  If you think default values would buy something in
maintainability, we could revise the BKI notation to support them,
with a lot less work and risk than what you're proposing.  Perhaps
something like

DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... );

DATA( oid=1242 proname=boolin pronargs=2 ... );
DATA( oid=1243 proname=boolout pronargs=2 ... );

with the convention that any field not specified in either the
DATA macro or the current defaults would go to NULL, except OID
which would retain its current special treatment.  (Hmm, I wonder
if we'd even still need the _null_ hack anymore?)

I remain unexcited about inventing contraptions that solve limited
special cases.  It's just not that hard to maintain those cases
the way we're doing now, and every added processing step introduces
its own comprehension and maintenance overheads.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>   
>> On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>     
>>> I'm not nearly as excited about migrating all or even most of, say,
>>> the pg_proc DATA lines into SQL.
>>>       
>
>   
>> I think it would actually buy you quite a bit to migrate them to SQL,
>> because in SQL, default properties can generally be omitted, which
>> means that a patch which adds a new property to pg_proc that takes the
>> same value for every row doesn't actually need to touch the SQL at
>> all.
>>     
>
> [ shrug... ]  If you think default values would buy something in
> maintainability, we could revise the BKI notation to support them,
> with a lot less work and risk than what you're proposing.  Perhaps
> something like
>
> DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... );
>
> DATA( oid=1242 proname=boolin pronargs=2 ... );
> DATA( oid=1243 proname=boolout pronargs=2 ... );
>
> with the convention that any field not specified in either the
> DATA macro or the current defaults would go to NULL, except OID
> which would retain its current special treatment.  (Hmm, I wonder
> if we'd even still need the _null_ hack anymore?)
>   

I kinda like this. It will make it easier not only to make catalog 
changes but to add entries to thinks like pg_proc (which is surely the 
biggest piece of the headache).

> I remain unexcited about inventing contraptions that solve limited
> special cases.  It's just not that hard to maintain those cases
> the way we're doing now, and every added processing step introduces
> its own comprehension and maintenance overheads.
>
>             
>   
Agreed.

cheers

andrew


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
I wrote:
> ... So maybe we could split the current bootstrap phase
> into three phases:
>     * create core catalogs and load DATA commands, using bki
>     * create operator classes, using sql script
>     * create indexes, using bki
>     * proceed on as before

I experimented with that a little bit and found it doesn't seem to be
tremendously easy.  A non-bootstrap-mode backend will PANIC immediately
on startup if it doesn't find the critical system indexes, so the second
step has issues.  Also, there is no provision for resuming bootstrap
mode in an already-existing database, so the third step doesn't work
either.  We could hack up solutions to those things, but it's not clear
that it's worth it.  What seems more profitable is just to allow CREATE
OPERATOR CLASS/FAMILY to be executed while still in bootstrap mode.
There will still be some obstacles to be surmounted, no doubt (in
particular persuading these commands to run without system indexes
present) ... but we'd have to surmount those anyway.

In the spirit of not inventing single-purpose solutions, I suggest
that the BKI representation for this might be something like

BKI_EXECUTE( any old SQL command );

where the bootstrap.c code just passes the given string to the main SQL
parser, and whether it works or not is dependent on whether the
specified command has been bootstrap-mode-proofed.  For the moment we'd
only bother to fix CREATE OPERATOR CLASS/FAMILY to work that way, but
the door would be open for other things if it seemed worthwhile.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Greg Stark
Date:
On Sun, Jul 26, 2009 at 5:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>
> In the spirit of not inventing single-purpose solutions, I suggest
> that the BKI representation for this might be something like
>
> BKI_EXECUTE( any old SQL command );
>
> where the bootstrap.c code just passes the given string to the main SQL
> parser, and whether it works or not is dependent on whether the
> specified command has been bootstrap-mode-proofed.  For the moment we'd
> only bother to fix CREATE OPERATOR CLASS/FAMILY to work that way, but
> the door would be open for other things if it seemed worthwhile.


I have nothing against a BKI_EXECUTE() like you propose.

But my instinct is still to go the other way. Of determining which
parts are actually necessary for bootstrapping and which parts really
aren't. I think it's valuable to have those two classes separated so
we understand when we're introducing new dependencies and when we're
varying from the well-trodden standard approaches.

It would also be valuable if we ever want to move some of these things
out to contrib modules or move other modules into the core. We might
even envision having optional components which the user could have the
optoin to decide at  at initdb-time whether to include them.

AFAICT the only opclasses that need to be in the bootstrap set are
int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops.

--
greg
http://mit.edu/~gsstark/resume.pdf


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> AFAICT the only opclasses that need to be in the bootstrap set are
> int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops.

Maybe so, but the first two are part of the integer_ops family.  If
we have to continue implementing all of that through DATA statements
then we haven't done much towards making things more maintainable
or less fragile.  I think we need to try to get *all* of the operator
classes out of the hand-maintained-DATA-entries collection.

The argument about optional stuff doesn't impress me.  I would think
that something that's going to be optionally loaded doesn't need to be
considered during bootstrap mode at all.  We can just have initdb run
some SQL scripts (or not) during its post-bootstrap phase.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Sun, Jul 26, 2009 at 11:31 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> I'm not nearly as excited about migrating all or even most of, say,
>>> the pg_proc DATA lines into SQL.
>
>> I think it would actually buy you quite a bit to migrate them to SQL,
>> because in SQL, default properties can generally be omitted, which
>> means that a patch which adds a new property to pg_proc that takes the
>> same value for every row doesn't actually need to touch the SQL at
>> all.
>
> [ shrug... ]  If you think default values would buy something in
> maintainability, we could revise the BKI notation to support them,
> with a lot less work and risk than what you're proposing.

Really?  I thought about that too, but concluded that it would be
easier to verify that a change to the BKI-generation stuff was correct
(by just diffing the generated files).  I don't know how to verify
that two versions of initdb do the same thing - I assume the databases
won't be byte-for-byte identical.  But that was my only concern about
it: I like the idea of expanding what can be done in BKI mode, if we
can figure out how to do it.

> Perhaps
> something like
>
> DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... );
>
> DATA( oid=1242 proname=boolin pronargs=2 ... );
> DATA( oid=1243 proname=boolout pronargs=2 ... );
>
> with the convention that any field not specified in either the
> DATA macro or the current defaults would go to NULL, except OID
> which would retain its current special treatment.  (Hmm, I wonder
> if we'd even still need the _null_ hack anymore?)
>
> I remain unexcited about inventing contraptions that solve limited
> special cases.  It's just not that hard to maintain those cases
> the way we're doing now, and every added processing step introduces
> its own comprehension and maintenance overheads.

If you think that the current system is anywhere close to ideal, I
give up.  To do so much as add a single line to pg_proc requires all
sort of useless manual work, like translating type names to OIDs, and
making sure that pronargs contains the correct value when the same
information is already encapsulated in both proargtypes and
proargmodes.

Introducing defaults for DATA() would bring some benefits because it
would mostly avoid the need to change every row in the file when
adding a new column.  But a preprocessing script can do much more
sophisticated transformations, like computing a value for a column, or
looking up type names in another file and translating them into OIDs.
It's not even hard; it's probably a 100-line patch on top of what I
already submitted.

...Robert


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Sun, Jul 26, 2009 at 1:58 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> AFAICT the only opclasses that need to be in the bootstrap set are
>> int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops.
>
> Maybe so, but the first two are part of the integer_ops family.  If
> we have to continue implementing all of that through DATA statements
> then we haven't done much towards making things more maintainable
> or less fragile.  I think we need to try to get *all* of the operator
> classes out of the hand-maintained-DATA-entries collection.

Is this mostly a forward-reference problem?

...Robert


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Introducing defaults for DATA() would bring some benefits because it
> would mostly avoid the need to change every row in the file when
> adding a new column.  But a preprocessing script can do much more
> sophisticated transformations, like computing a value for a column, or
> looking up type names in another file and translating them into OIDs.

Hmm.  A preprocessing script that produces DATA commands might in fact
be a reasonable proposal, but it was not what I understood you to be
suggesting before.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jul 26, 2009 at 1:58 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I think we need to try to get *all* of the operator
>> classes out of the hand-maintained-DATA-entries collection.

> Is this mostly a forward-reference problem?

No, I don't see that as particularly the issue.  What I'm concerned
about is the prospect of different parts of the same opfamily being
represented in different notations --- that sounds pretty error-prone
to me.  Greg is arguing that special-casing some minimum subset of the
opclasses is a good idea, but I disagree.  I think if we can make the
idea work at all, we can migrate *all* the built-in opclasses into the
higher-level notation, and that's how I want to approach it.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Alvaro Herrera
Date:
Tom Lane escribió:

> I experimented with that a little bit and found it doesn't seem to be
> tremendously easy.  A non-bootstrap-mode backend will PANIC immediately
> on startup if it doesn't find the critical system indexes, so the second
> step has issues.  Also, there is no provision for resuming bootstrap
> mode in an already-existing database, so the third step doesn't work
> either.

FWIW we hacked up a sort-of-bootstrap mode in Mammoth Replicator to be
able to create our own catalogs and stuff.  It's not particularly
hard nor large:

 bootstrap.c |   31 ++++++!!!!!!!!!!!!!!!!!!!!!!!!!
 1 file changed, 6 insertions(+), 25 modifications(!)


(This is BSD code so feel free to use it if you find it useful)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment

Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Sun, Jul 26, 2009 at 8:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Introducing defaults for DATA() would bring some benefits because it
>> would mostly avoid the need to change every row in the file when
>> adding a new column.  But a preprocessing script can do much more
>> sophisticated transformations, like computing a value for a column, or
>> looking up type names in another file and translating them into OIDs.
>
> Hmm.  A preprocessing script that produces DATA commands might in fact
> be a reasonable proposal, but it was not what I understood you to be
> suggesting before.

OK, sorry if I was unclear.  I'm not sure exactly what you mean by
producing DATA() commands; I think the output should be BKI directly.
One of the things this patch does that I think is good (however flawed
it may be otherwise) is unifies all of the stuff that needs to parse
the DATA() statements into a single script.  I think this is something
we should pursue, because I think it will simplify the introduction of
any other notation we want to consider in this area (regardless of
whether it's DATA_DEFAULTS or EXEC_BKI or what have you).

Maybe I should rip out all the anum.h stuff (sniff, I'm sad, I liked
that design...) and resubmit.

...Robert


Re: autogenerating headers & bki stuff

From
Peter Eisentraut
Date:
On Sunday 26 July 2009 01:40:09 Tom Lane wrote:
> And it is going to cost us in places like
> how do we generate the fmgr lookup table.

We rgrep the source tree for PG_FUNCTION_ARGS, extract the function name, and 
put them in a list.


Re: autogenerating headers & bki stuff

From
Peter Eisentraut
Date:
On Sunday 26 July 2009 20:58:30 Tom Lane wrote:
> The argument about optional stuff doesn't impress me.  I would think
> that something that's going to be optionally loaded doesn't need to be
> considered during bootstrap mode at all.  We can just have initdb run
> some SQL scripts (or not) during its post-bootstrap phase.

Isn't that exactly what some people are proposing?  Remove the nonessential 
things from the DATA() lines and put them into SQL scripts?


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Mon, Jul 27, 2009 at 4:17 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Sunday 26 July 2009 01:40:09 Tom Lane wrote:
>> And it is going to cost us in places like
>> how do we generate the fmgr lookup table.
>
> We rgrep the source tree for PG_FUNCTION_ARGS, extract the function name, and
> put them in a list.

Probably parsing the SQL would be a better idea.  Otherwise, the
outputs would depend on every .c file in the entire source tree.

...Robert


Re: autogenerating headers & bki stuff

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Sunday 26 July 2009 20:58:30 Tom Lane wrote:
>> The argument about optional stuff doesn't impress me.  I would think
>> that something that's going to be optionally loaded doesn't need to be
>> considered during bootstrap mode at all.  We can just have initdb run
>> some SQL scripts (or not) during its post-bootstrap phase.

> Isn't that exactly what some people are proposing?  Remove the nonessential 
> things from the DATA() lines and put them into SQL scripts?

I think what "some people are proposing" is to rip inessential stuff
(say, the geometry types) out of core completely.  Which I'd not be
against.  The discussion at the moment is about whether to have
first-class and second-class citizens within the core set of functions;
and that I'm against.  I think two notations implemented by two
different toolchains will be confusing and not help maintenance that much.
        regards, tom lane


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Sun, Jul 26, 2009 at 11:05 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> Maybe I should rip out all the anum.h stuff (sniff, I'm sad, I liked
> that design...) and resubmit.

Hearing no comments on this either way, here is a stripped down version.

...Robert

Attachment

Re: autogenerating headers & bki stuff

From
Peter Eisentraut
Date:
On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
> The attached patch merges all of the logic currently in genbki.sh and
> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl

I can't really convince myself to like this change.  I think there is some 
merit that these scripts are separate.  I'm not sure what the combined script 
buys us except that it is a lot bigger and does everything at once instead of 
in two or three steps.

That together with the big makefile moving around makes me think that this 
would cause more confusion and not much advantage.

Btw., is this stamp-h business really necessary?  I guess you copied that from 
pg_config.h, but the reason there is that everything depends on pg_config.h, 
and rerunning configure would normally cause everything to be rebuilt.  The 
files we are looking at here should only change when something is edited.


Re: autogenerating headers & bki stuff

From
Alvaro Herrera
Date:
Peter Eisentraut escribió:
> On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
> > The attached patch merges all of the logic currently in genbki.sh and
> > Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl
> 
> I can't really convince myself to like this change.  I think there is some 
> merit that these scripts are separate.  I'm not sure what the combined script 
> buys us except that it is a lot bigger and does everything at once instead of 
> in two or three steps.

Maybe we can move forward in little steps.  For example it would be
excellent to have the schemapg.h file autogenerated instead of having to
edit two copies of those definitions.

I'm not sure that it buys us a lot to have all those things in a single
script.  Why not have a script to generate schemapg.h, another one to
generate the bki stuff, another one to generate the fmgrtab header?
They don't seem to share a lot of code anyway (and even if they do,
surely we can add a .pm module containing common code).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: autogenerating headers & bki stuff

From
Robert Haas
Date:
On Tue, Aug 11, 2009 at 4:52 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
>> The attached patch merges all of the logic currently in genbki.sh and
>> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl
>
> I can't really convince myself to like this change.  I think there is some
> merit that these scripts are separate.  I'm not sure what the combined script
> buys us except that it is a lot bigger and does everything at once instead of
> in two or three steps.
>
> That together with the big makefile moving around makes me think that this
> would cause more confusion and not much advantage.
>
> Btw., is this stamp-h business really necessary?  I guess you copied that from
> pg_config.h, but the reason there is that everything depends on pg_config.h,
> and rerunning configure would normally cause everything to be rebuilt.  The
> files we are looking at here should only change when something is edited.

It definitely has less appeal without the anum.h stuff.

I think there is some benefit in having a single script because it
means that if we want to add additional syntax in the future (like
BKI_EXECUTE or DATA_DEFAULTS or some kind of more human-readable
notation for functions or opclasses, all of which were discussed
upthread) there is only one place to change.  But is that sufficient
reason to commit it at this point, given that we don't have a
fully-fleshed out design for any of those things?  Not sure.  Building
schemapg.h automatically seems definitely nice to me.

The stamp-h stuff is quite important for minimizing unnecessary
rebuilds.  Without that, any change to any include/catalog/*.h file
will trigger a massive rebuild.  With it, it only triggers a rebuild
if one of the outputs actually changes, and only for those portions
for which the output actually changed.

...Robert