Thread: autogenerating headers & bki stuff
As some of you have probably gathered from a couple of recent patches that I've posted, I've been taking a look at the scripts we use to generate various derived files based on the catalog headers, and I think there's room for improvement. First, as discussed on previous threads, it seems needlessly complex to have both perl and shell-script implementations of genbki and Gen_fmgrtab. Second, there's a fair amount of duplicated data that could be auto-generated, but currently isn't.

The attached patch merges all of the logic currently in genbki.sh and Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl. It then extends that logic to generate all of the Anum_* and Natts_* constants, as well as the Schema_pg_* declarations for the bootstrap tables. (I don't see a clean, easy way to generate Schema_pg_index, since there are no DATA() lines for that table in pg_attribute.h; but the others are all straightforward.) It also adds a fair amount of error checking to what we currently have in place.

In order to avoid creating a build-time dependency on Perl for non-Windows platforms, this patch makes all of the things generated by gen_catalog.pl into distprep targets. This should be OK, since none of them depend on architecture or configuration.

I have not made any attempt to fix the MSVC build to work with this, but there should be some stuff that can be simplified there as well (in particular, the Perl reimplementation of genbki).

Love it? Hate it? Comments appreciated. Thanks, ...Robert
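The constant-generation step described above can be sketched quite compactly. The real gen_catalog.pl is Perl and handles far more cases; the Python below, and the miniature header it parses, are purely an illustration of the idea of deriving Natts_* and Anum_* from the CATALOG() struct body:

```python
import re

# Hypothetical miniature of a catalog header; the real pg_class.h
# declares many more columns and uses BKI macros this sketch ignores.
HEADER = """
CATALOG(pg_class,1259)
{
    NameData    relname;
    Oid         relnamespace;
    Oid         reltype;
} FormData_pg_class;
"""

def gen_constants(header_text):
    # Grab the catalog name and the brace-enclosed column list.
    m = re.search(r'CATALOG\((\w+),\d+\)[^{]*\{(.*?)\}', header_text, re.S)
    catname, body = m.group(1), m.group(2)
    cols = re.findall(r'^\s*\w+\s+(\w+);', body, re.M)
    lines = [f"#define Natts_{catname} {len(cols)}"]
    for i, col in enumerate(cols, start=1):
        lines.append(f"#define Anum_{catname}_{col} {i}")
    return "\n".join(lines)

print(gen_constants(HEADER))
```

Because the attribute numbers are assigned mechanically from declaration order, the pg_listener-style inconsistency mentioned later in this thread cannot arise.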
Attachment
On Mon, Jun 29, 2009 at 11:59 PM, Robert Haas<robertmhaas@gmail.com> wrote: > As some of you have probably gathered from a couple of recent patches > that I've posted, I've been taking a look at the scripts we use to > generate various derived files based on the catalog headers, and I > think there's room for improvement. First, as discussed on previous > threads, it seems needlessly complex to have both perl and > shell-script implementations of genbki and Gen_fmgrtab. Second, > there's a fair amount of duplicated data that could be auto-generated, > but currently isn't. > > The attached patch merges all of the logic currently in genbki.sh and > Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl. It > then extends that logic to generate all of the Anum_* and Natts_* > constants, as well as the Schema_pg_* declarations for the bootstrap > tables. (I don't see a clean, easy way to generate Schema_pg_index, > since there are no DATA() lines for that table in pg_attribute.h; but > the others are all straightforward.) It also adds a fair amount of > error checking to what we currently have in place. > > In order to avoid create a build-time dependency on Perl for > non-Windows platforms, this patch makes all of the things generated by > gen_catalog.pl into distprep targets. This should be OK, since none > of them depend on architecture or configuration. > > I have not made any attempt to fix the MSVC build to work with this, > but there should be some stuff that can be simplified there as well > (in particular, the Perl reimplementation of genbki). Updated to CVS HEAD. Peter's commit of my related patch "pg_listener attribute number #defines" introduced a minor conflict. ...Robert
Attachment
On Tuesday 30 June 2009 06:59:51 Robert Haas wrote: > The attached patch merges all of the logic currently in genbki.sh and > Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl. It > then extends that logic to generate all of the Anum_* and Natts_* > constants, as well as the Schema_pg_* declarations for the bootstrap > tables. I see a potential problem with the introduction of the catalog/anum.h header, to hold the Anum_... defines. This looks like it could be a significant break in the backend API, as evidenced by the fact that plperl and dblink no longer compile with this change. I think a less invasive change would be to include anum.h into all the catalog/pg_*.h headers, so that the external interface stays the same.
On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentraut<peter_e@gmx.net> wrote: > On Tuesday 30 June 2009 06:59:51 Robert Haas wrote: >> The attached patch merges all of the logic currently in genbki.sh and >> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl. It >> then extends that logic to generate all of the Anum_* and Natts_* >> constants, as well as the Schema_pg_* declarations for the bootstrap >> tables. > > I see a potential problem with the introduction of the catalog/anum.h header, > to hold the Anum_... defines. This looks like it could be a significant break > in the backend API, as evidenced by the fact that plperl and dblink no longer > compile with this change. > > I think a less invasive change would be to include anum.h into all the > catalog/pg_*.h headers, so that the external interface stays the same. Gah. I wish a toplevel make would build "contrib". Anyway, yeah, we could do that. The downsides to that approach are: 1. Changing a catalog definition in a way that actually affects the contents of anum.h will force more things to be recompiled (note that there are guards against useless rebuilds of anum.h), and 2. If we do that, we'll probably be stuck with it forever, and it seems like a bit of a hack. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentraut<peter_e@gmx.net> wrote: >> I think a less invasive change would be to include anum.h into all the >> catalog/pg_*.h headers, so that the external interface stays the same. > Gah. I wish a toplevel make would build "contrib". > Anyway, yeah, we could do that. The downsides to that approach are: I didn't realize this change was intending to throw all the Anum_ constants into a single header file. I am strongly against that on namespace pollution grounds, quite aside from the massive #include restructuring it'd require. And then there's the fact that any change in such a file would force rebuild of just about the entire backend. I do not see any virtue in autogenerating the Anum_ constants anyway. Yeah, manually updating them is a bit of a pain, but it's only a tiny part of the work that's normally involved in changing a system catalog. In any case, practically all of the benefit involved could be gotten by just not having to mess with the numerical values of the individual constants. Which we could do by setting them up as enums instead of macros, along the lines of http://archives.postgresql.org/pgsql-committers/2008-05/msg00080.php regards, tom lane
On Sat, Jul 25, 2009 at 10:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Jul 25, 2009 at 3:21 AM, Peter Eisentraut<peter_e@gmx.net> wrote: >>> I think a less invasive change would be to include anum.h into all the >>> catalog/pg_*.h headers, so that the external interface stays the same. > >> Gah. I wish a toplevel make would build "contrib". > >> Anyway, yeah, we could do that. The downsides to that approach are: > > I didn't realize this change was intending to throw all the Anum_ > constants into a single header file. I am strongly against that > on namespace pollution grounds, I don't really understand this objection. The reason why namespace pollution is bad is because there's a risk that someone might be using one of the names used for some other purpose, but the chances that someone who is using an Anum_pg_* or Natts_pg_* constant also needs a similarly named constant for some purpose other than referencing the PostgreSQL catalogs seems as near to zero as makes no difference. The hypothetical scenario in which this is a problem goes something like this: someone is counting on the fact that if they include "catalog/pg_foo.h", then Anum_pg_foo_* and Natts_pg_foo will be defined appropriately for reference to PostgreSQL backend catalogs, but they are also counting on the fact that Anum_pg_bar_* and Natts_pg_bar (for some value of bar that collides with a system catalog name) are not defined and that they can use those constants for their own internal purposes. When they port their code to PG 8.5, they are forced into changing the naming of those constants, because it's no longer possible to just get the pg_foo constants without the pg_bar constants. If anyone is really doing this, I submit that it's a horribly bad idea and they ought to stop right away whether this patch gets committed or not. > quite aside from the massive #include > restructuring it'd require.
This is all done in the patch (with the exception of a handful of loose ends that Peter found in his review) and I don't think it's all that massive. > And then there's the fact that any change > in such a file would force rebuild of just about the entire backend. It requires a rebuild of 56 of the 547 *.c files in src/backend, which is to say 10.2% of the backend. Also, the system is set up in such a way that the timestamp on catalog/anum.h changes only when its contents actually change, and dependencies are not rebuilt otherwise. So basically it'll happen when someone adds an attribute to, or removes one from, a system catalog: the fact that the .h file was updated in some other way is not sufficient. > I do not see any virtue in autogenerating the Anum_ constants anyway. > Yeah, manually updating them is a bit of a pain, but it's only a tiny > part of the work that's normally involved in changing a system catalog. Well, I'd like to work on fixing some of the other problems too, but this seems like a good place to start. Currently, if there are two uncommitted patches that make changes to the same system catalog, the one committed second is 100% guaranteed to conflict with the first, and the resolution is typically painful. Of course, fixing the Anum and Natts declarations does not come close to fixing this problem: for catalogs that are initialized with any data at bootstrap time, the DATA() lines are a much bigger issue, but fixing that is going to require a bigger hammer than can be put in place with one patch. I do think this is a pretty good foundation on which to build, though. > In any case, practically all of the benefit involved could be gotten > by just not having to mess with the numerical values of the individual > constants.
> Which we could do by setting them up as enums instead of > macros, along the lines of > http://archives.postgresql.org/pgsql-committers/2008-05/msg00080.php I'd certainly be willing to concede that some of the benefit could be gotten that way, but I'm not sure I agree with "practically all". The benefits of this patch as I see them are: (1) to reduce the number of places where a catalog change creates a merge conflict, and (2) to eliminate the possibility of human error in setting up the Anum and Natts declarations. The fact that I found a case where this had been done inconsistently in pg_listener (and no one noticed for 10 years) proves that this is not an entirely hypothetical possibility even for committed code, and I've definitely screwed it up a few times in my own tree, too. Replacing the declarations with enums would make the merge conflicts involve fewer lines and maybe slightly simplify the manual updating process, but it won't completely solve either problem. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jul 25, 2009 at 10:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> I didn't realize this change was intending to throw all the Anum_ >> constants into a single header file. I am strongly against that >> on namespace pollution grounds, > I don't really understand this objection. It's for the same reasons we don't put all of include/catalog/ into one giant header file, or all of include/ for that matter. It's bad for modularity, it's bad for compilation time, it's bad for rebuild time if you're using --enable-depend. > The reason why namespace > pollution is bad is because there's a risk that someone might be using > one of the names used for some other purpose, Uh, no, that's actually pretty much irrelevant for our purposes. As a general rule, any two PG header files should be non-conflicting since some .c file might need to include both. So we'd have to get rid of conflicts anyhow. That does not make compartmentalization useless. As a for-instance, exposing names that a given .c file doesn't really need opens the door to typos that the compiler won't catch for you (ie, accidentally using the wrong Anum_ constant, in this context). > [ other straw-man argumentation snipped ] None of this impresses me at all. We should not throw a pile of unrelated declarations into one header just to simplify the life of an automatic script. regards, tom lane
On Sat, Jul 25, 2009 at 9:17 PM, Robert Haas<robertmhaas@gmail.com> wrote: > Of course, fixing the Anum and > Natts declarations does not come close to fixing this problem: for > catalogs that are initialized with any data at bootstrap time, the > DATA() lines are a much bigger issue, but fixing that is going to > require a bigger hammer than can be put in place with one patch. I do > think this is a pretty good foundation on which to build, though. I think addressing that would actually be fairly simple in theory. Move a lot of those DATA lines to SQL initdb scripts. Virtually all of pg_proc, pg_operator, pg_opclass, pg_opfamily, pg_cast, etc can be initialized using SQL. Hardly any of the records in there are needed for bootstrapping. That would reduce the pain of editing these files *enormously*. The worst part of adding new operators is making sure all the opclass entries line up properly. And when there's an OID conflict and they all have to be renumbered and the opclasses fixed up, that's when they're a real headache. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > Move a lot of those DATA lines to SQL initdb scripts. Virtually all of > pg_proc, pg_operator, pg_opclass, pg_opfamily, pg_cast, etc can be > initialized using SQL. Hardly any of the records in there are needed > for bootstrapping. It's easy to make that claim, much less easy to actually do it. The other issue is that there will be some fraction of the entries that unavoidably *are* needed before you can use SQL to insert the rest. What will we do with those? Having two different representations for essentially the same kind of data isn't much fun. regards, tom lane
[ dept. of second thoughts ] I wrote: > It's easy to make that claim, much less easy to actually do it. Reflecting on this a bit more ... it seems to me that it's not the right thing to set the goal as "try to get rid of as many DATA statements as possible". The right way to think about this is to classify the stuff we have got in DATA statements, and consider how to reduce the pain associated with the harder-to-maintain categories. In particular, I think Greg correctly identified the main pain point as being the catalog entries associated with operator classes (ie, pg_opclass, pg_opfamily, pg_amop, pg_amproc entries). So what I'm thinking is we should consider how to migrate *all* of those entries into CREATE OPERATOR CLASS/FAMILY commands. And I think that that might be doable. Those entries are not needed by the system until we create catalog indexes. So maybe we could split the current bootstrap phase into three phases:

* create core catalogs and load DATA commands, using bki
* create operator classes, using sql script
* create indexes, using bki
* proceed on as before

I'm not nearly as excited about migrating all or even most of, say, the pg_proc DATA lines into SQL. That simply isn't going to buy very much in maintainability --- a patch that wants to add a new property to all the functions is going to conflict just as much with another patch doing the same. And it is going to cost us in places like how do we generate the fmgr lookup table. Thoughts? regards, tom lane
On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > I'm not nearly as excited about migrating all or even most of, say, > the pg_proc DATA lines into SQL. That simply isn't going to buy very > much in maintainability --- a patch that wants to add a new property > to all the functions is going to conflict just as much with another > patch doing the same. And it is going to cost us in places like > how do we generate the fmgr lookup table. > > Thoughts? I think it would actually buy you quite a bit to migrate them to SQL, because in SQL, default properties can generally be omitted, which means that a patch which adds a new property to pg_proc that takes the same value for every row doesn't actually need to touch the SQL at all. I suspect that's a pretty common case, too: SE-PostgreSQL modifies a whole bunch of system catalogs to add a security label attribute, and ALTER TABLE ... ALTER COLUMN ... SET DISTINCT adds a column to pg_attribute that defaults to 0. I can hear you objecting that there's no possible way we can use SQL to construct pg_attribute, and that's certainly true. But I have another idea. What we could do is generate the BKI but using some more sophisticated method than just writing it all out longhand in the header files and copying it over into the bki file. The pg_attribute entries for the bootstrap tables, for example, are mostly inferrable from the PG_CATALOG() declarations (I think storage class and maybe one other property might be problematic). And certainly you could design a more human readable format for the pg_proc entries, maybe something like: DATA_PG_PROC(function-name, function-arg1-type-name function-arg2-type-name, function-return-type-name,language,definition) To convert this into BKI, you make an initial pass through pg_type.h and collect the OIDs of all the type names. Then you zip through pg_proc.h and now you have enough information to map all the type names into OIDs and generate the BKI. 
I'm waving my hands a little bit here but I really don't think this is too hard, coding-wise, and it seems like it would make it a LOT easier to edit this file... ...Robert
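The two-pass scheme Robert hand-waves at is easy to demonstrate. The DATA_PG_PROC() notation and the miniature pg_type fragment below are invented for the example (the real headers use raw OIDs throughout); the point is just that a first pass over pg_type.h yields a name-to-OID map that a second pass can apply:

```python
import re

# Invented, simplified stand-in for a few pg_type.h DATA lines.
PG_TYPE_DATA = """
DATA(insert OID = 16 ( bool ... ));
DATA(insert OID = 23 ( int4 ... ));
DATA(insert OID = 2275 ( cstring ... ));
"""

# Pass 1: collect type name -> OID.
type_oids = {}
for oid, name in re.findall(r'OID\s*=\s*(\d+)\s*\(\s*(\w+)', PG_TYPE_DATA):
    type_oids[name] = int(oid)

# Pass 2: translate a human-readable proc entry into OIDs.
def resolve(argtypes, rettype):
    return [type_oids[t] for t in argtypes], type_oids[rettype]

args, ret = resolve(["cstring"], "bool")   # e.g. boolin
print(args, ret)   # → [2275] 16
```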
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> I'm not nearly as excited about migrating all or even most of, say, >> the pg_proc DATA lines into SQL. > I think it would actually buy you quite a bit to migrate them to SQL, > because in SQL, default properties can generally be omitted, which > means that a patch which adds a new property to pg_proc that takes the > same value for every row doesn't actually need to touch the SQL at > all. [ shrug... ] If you think default values would buy something in maintainability, we could revise the BKI notation to support them, with a lot less work and risk than what you're proposing. Perhaps something like DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... ); DATA( oid=1242 proname=boolin pronargs=2 ... ); DATA( oid=1243 proname=boolout pronargs=2 ... ); with the convention that any field not specified in either the DATA macro or the current defaults would go to NULL, except OID which would retain its current special treatment. (Hmm, I wonder if we'd even still need the _null_ hack anymore?) I remain unexcited about inventing contraptions that solve limited special cases. It's just not that hard to maintain those cases the way we're doing now, and every added processing step introduces its own comprehension and maintenance overheads. regards, tom lane
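The expansion rule Tom sketches — any field absent from both the entry and the current defaults goes to NULL — can be prototyped in a few lines. The field names follow his example; the parsing and column list are deliberately simplistic assumptions:

```python
import re

# A cut-down pg_proc column list for illustration only.
COLUMNS = ["oid", "proname", "pronamespace", "proowner", "prolang", "pronargs"]

def parse_fields(s):
    # key=value pairs, whitespace-separated.
    return dict(re.findall(r'(\w+)=(\S+)', s))

# DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 );
defaults = parse_fields("pronamespace=PGNSP proowner=PGUID prolang=12")

def expand(entry):
    # Entry fields override defaults; anything still missing is NULL.
    fields = {**defaults, **parse_fields(entry)}
    return [fields.get(col, "_null_") for col in COLUMNS]

row = expand("oid=1242 proname=boolin pronargs=1")
print(row)   # → ['1242', 'boolin', 'PGNSP', 'PGUID', '12', '1']
```

As the parenthetical at the end of Tom's mail suggests, the explicit _null_ marker becomes a fallback rather than something contributors type by hand.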
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > >> On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> >>> I'm not nearly as excited about migrating all or even most of, say, >>> the pg_proc DATA lines into SQL. >>> > > >> I think it would actually buy you quite a bit to migrate them to SQL, >> because in SQL, default properties can generally be omitted, which >> means that a patch which adds a new property to pg_proc that takes the >> same value for every row doesn't actually need to touch the SQL at >> all. >> > > [ shrug... ] If you think default values would buy something in > maintainability, we could revise the BKI notation to support them, > with a lot less work and risk than what you're proposing. Perhaps > something like > > DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... ); > > DATA( oid=1242 proname=boolin pronargs=2 ... ); > DATA( oid=1243 proname=boolout pronargs=2 ... ); > > with the convention that any field not specified in either the > DATA macro or the current defaults would go to NULL, except OID > which would retain its current special treatment. (Hmm, I wonder > if we'd even still need the _null_ hack anymore?) > I kinda like this. It will make it easier not only to make catalog changes but to add entries to things like pg_proc (which is surely the biggest piece of the headache). > I remain unexcited about inventing contraptions that solve limited > special cases. It's just not that hard to maintain those cases > the way we're doing now, and every added processing step introduces > its own comprehension and maintenance overheads. > > > Agreed. cheers andrew
I wrote: > ... So maybe we could split the current bootstrap phase > into three phases: > * create core catalogs and load DATA commands, using bki > * create operator classes, using sql script > * create indexes, using bki > * proceed on as before I experimented with that a little bit and found it doesn't seem to be tremendously easy. A non-bootstrap-mode backend will PANIC immediately on startup if it doesn't find the critical system indexes, so the second step has issues. Also, there is no provision for resuming bootstrap mode in an already-existing database, so the third step doesn't work either. We could hack up solutions to those things, but it's not clear that it's worth it. What seems more profitable is just to allow CREATE OPERATOR CLASS/FAMILY to be executed while still in bootstrap mode. There will still be some obstacles to be surmounted, no doubt (in particular persuading these commands to run without system indexes present) ... but we'd have to surmount those anyway. In the spirit of not inventing single-purpose solutions, I suggest that the BKI representation for this might be something like BKI_EXECUTE( any old SQL command ); where the bootstrap.c code just passes the given string to the main SQL parser, and whether it works or not is dependent on whether the specified command has been bootstrap-mode-proofed. For the moment we'd only bother to fix CREATE OPERATOR CLASS/FAMILY to work that way, but the door would be open for other things if it seemed worthwhile. regards, tom lane
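The dispatch Tom describes amounts to one extra branch in the bki line reader: ordinary directives go to the bootstrap interpreter, BKI_EXECUTE() payloads go to the regular SQL parser. A toy model, with both handlers as stand-ins (bootstrap.c would of course do this in C):

```python
def run_bootstrap_command(line):
    # Stand-in for the existing bki command interpreter.
    return ("bki", line)

def run_sql(stmt):
    # Stand-in for handing the string to the main SQL parser; whether it
    # works depends on the command being bootstrap-mode-proofed.
    return ("sql", stmt)

def process_bki_line(line):
    line = line.strip()
    if line.startswith("BKI_EXECUTE(") and line.endswith(");"):
        return run_sql(line[len("BKI_EXECUTE("):-2].strip())
    return run_bootstrap_command(line)

print(process_bki_line('BKI_EXECUTE( CREATE OPERATOR FAMILY integer_ops USING btree );'))
print(process_bki_line('open pg_proc'))
```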
On Sun, Jul 26, 2009 at 5:48 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > > In the spirit of not inventing single-purpose solutions, I suggest > that the BKI representation for this might be something like > > BKI_EXECUTE( any old SQL command ); > > where the bootstrap.c code just passes the given string to the main SQL > parser, and whether it works or not is dependent on whether the > specified command has been bootstrap-mode-proofed. For the moment we'd > only bother to fix CREATE OPERATOR CLASS/FAMILY to work that way, but > the door would be open for other things if it seemed worthwhile. I have nothing against a BKI_EXECUTE() like you propose. But my instinct is still to go the other way: toward determining which parts are actually necessary for bootstrapping and which parts really aren't. I think it's valuable to have those two classes separated so we understand when we're introducing new dependencies and when we're varying from the well-trodden standard approaches. It would also be valuable if we ever want to move some of these things out to contrib modules or move other modules into the core. We might even envision having optional components which the user could decide at initdb-time whether to include. AFAICT the only opclasses that need to be in the bootstrap set are int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > AFAICT the only opclasses that need to be in the bootstrap set are > int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops. Maybe so, but the first two are part of the integer_ops family. If we have to continue implementing all of that through DATA statements then we haven't done much towards making things more maintainable or less fragile. I think we need to try to get *all* of the operator classes out of the hand-maintained-DATA-entries collection. The argument about optional stuff doesn't impress me. I would think that something that's going to be optionally loaded doesn't need to be considered during bootstrap mode at all. We can just have initdb run some SQL scripts (or not) during its post-bootstrap phase. regards, tom lane
On Sun, Jul 26, 2009 at 11:31 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sat, Jul 25, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> I'm not nearly as excited about migrating all or even most of, say, >>> the pg_proc DATA lines into SQL. > >> I think it would actually buy you quite a bit to migrate them to SQL, >> because in SQL, default properties can generally be omitted, which >> means that a patch which adds a new property to pg_proc that takes the >> same value for every row doesn't actually need to touch the SQL at >> all. > > [ shrug... ] If you think default values would buy something in > maintainability, we could revise the BKI notation to support them, > with a lot less work and risk than what you're proposing. Really? I thought about that too, but concluded that it would be easier to verify that a change to the BKI-generation stuff was correct (by just diffing the generated files). I don't know how to verify that two versions of initdb do the same thing - I assume the databases won't be byte-for-byte identical. But that was my only concern about it: I like the idea of expanding what can be done in BKI mode, if we can figure out how to do it. > Perhaps > something like > > DATA_DEFAULTS( pronamespace=PGNSP proowner=PGUID prolang=12 ... ); > > DATA( oid=1242 proname=boolin pronargs=2 ... ); > DATA( oid=1243 proname=boolout pronargs=2 ... ); > > with the convention that any field not specified in either the > DATA macro or the current defaults would go to NULL, except OID > which would retain its current special treatment. (Hmm, I wonder > if we'd even still need the _null_ hack anymore?) > > I remain unexcited about inventing contraptions that solve limited > special cases. It's just not that hard to maintain those cases > the way we're doing now, and every added processing step introduces > its own comprehension and maintenance overheads. 
If you think that the current system is anywhere close to ideal, I give up. To do so much as add a single line to pg_proc requires all sorts of useless manual work, like translating type names to OIDs, and making sure that pronargs contains the correct value when the same information is already encapsulated in both proargtypes and proargmodes. Introducing defaults for DATA() would bring some benefits because it would mostly avoid the need to change every row in the file when adding a new column. But a preprocessing script can do much more sophisticated transformations, like computing a value for a column, or looking up type names in another file and translating them into OIDs. It's not even hard; it's probably a 100-line patch on top of what I already submitted. ...Robert
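Deriving a column is the simplest of the transformations Robert names. A sketch, with the entry format invented for illustration: pronargs is computed from proargtypes rather than maintained by hand, so the two can never disagree:

```python
def derive_pronargs(entry):
    # Return a copy of the entry with pronargs computed from the
    # whitespace-separated proargtypes list.
    entry = dict(entry)
    entry["pronargs"] = len(entry["proargtypes"].split())
    return entry

e = derive_pronargs({"proname": "textcat", "proargtypes": "25 25"})
print(e["pronargs"])   # → 2
```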
On Sun, Jul 26, 2009 at 1:58 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> AFAICT the only opclasses that need to be in the bootstrap set are >> int2_ops, int4_ops, name_ops, oid_ops, and oidvector_ops. > > Maybe so, but the first two are part of the integer_ops family. If > we have to continue implementing all of that through DATA statements > then we haven't done much towards making things more maintainable > or less fragile. I think we need to try to get *all* of the operator > classes out of the hand-maintained-DATA-entries collection. Is this mostly a forward-reference problem? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > Introducing defaults for DATA() would bring some benefits because it > would mostly avoid the need to change every row in the file when > adding a new column. But a preprocessing script can do much more > sophisticated transformations, like computing a value for a column, or > looking up type names in another file and translating them into OIDs. Hmm. A preprocessing script that produces DATA commands might in fact be a reasonable proposal, but it was not what I understood you to be suggesting before. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jul 26, 2009 at 1:58 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> I think we need to try to get *all* of the operator >> classes out of the hand-maintained-DATA-entries collection. > Is this mostly a forward-reference problem? No, I don't see that as particularly the issue. What I'm concerned about is the prospect of different parts of the same opfamily being represented in different notations --- that sounds pretty error-prone to me. Greg is arguing that special-casing some minimum subset of the opclasses is a good idea, but I disagree. I think if we can make the idea work at all, we can migrate *all* the built-in opclasses into the higher-level notation, and that's how I want to approach it. regards, tom lane
Tom Lane wrote: > I experimented with that a little bit and found it doesn't seem to be > tremendously easy. A non-bootstrap-mode backend will PANIC immediately > on startup if it doesn't find the critical system indexes, so the second > step has issues. Also, there is no provision for resuming bootstrap > mode in an already-existing database, so the third step doesn't work > either. FWIW we hacked up a sort-of-bootstrap mode in Mammoth Replicator to be able to create our own catalogs and stuff. It's not particularly hard nor large: bootstrap.c | 31 ++++++!!!!!!!!!!!!!!!!!!!!!!!!! 1 file changed, 6 insertions(+), 25 modifications(!) (This is BSD code so feel free to use it if you find it useful) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Attachment
On Sun, Jul 26, 2009 at 8:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Introducing defaults for DATA() would bring some benefits because it >> would mostly avoid the need to change every row in the file when >> adding a new column. But a preprocessing script can do much more >> sophisticated transformations, like computing a value for a column, or >> looking up type names in another file and translating them into OIDs. > > Hmm. A preprocessing script that produces DATA commands might in fact > be a reasonable proposal, but it was not what I understood you to be > suggesting before. OK, sorry if I was unclear. I'm not sure exactly what you mean by producing DATA() commands; I think the output should be BKI directly. One of the things this patch does that I think is good (however flawed it may be otherwise) is unifies all of the stuff that needs to parse the DATA() statements into a single script. I think this is something we should pursue, because I think it will simplify the introduction of any other notation we want to consider in this area (regardless of whether it's DATA_DEFAULTS or EXEC_BKI or what have you). Maybe I should rip out all the anum.h stuff (sniff, I'm sad, I liked that design...) and resubmit. ...Robert
On Sunday 26 July 2009 01:40:09 Tom Lane wrote: > And it is going to cost us in places like > how do we generate the fmgr lookup table. We rgrep the source tree for PG_FUNCTION_ARGS, extract the function name, and put them in a list.
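A minimal version of what Peter describes — walk the tree and collect every function defined with the PG_FUNCTION_ARGS signature (this sketch assumes the common `Datum` / name-on-next-line layout of the PostgreSQL sources):

```python
import os
import re

# Matches "Datum <name>(PG_FUNCTION_ARGS)" whether or not the return
# type and the function name are on the same line.
SIG = re.compile(r'^\s*Datum\s+(\w+)\s*\(\s*PG_FUNCTION_ARGS\s*\)', re.M)

def builtin_functions(root):
    names = set()
    for dirpath, _, files in os.walk(root):
        for f in files:
            if f.endswith(".c"):
                with open(os.path.join(dirpath, f), errors="ignore") as fh:
                    names.update(SIG.findall(fh.read()))
    return sorted(names)
```

Robert's objection in the next message applies directly: built this way, the fmgr table depends on every .c file in the tree, whereas deriving it from the catalog data keeps the dependencies narrow.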
On Sunday 26 July 2009 20:58:30 Tom Lane wrote: > The argument about optional stuff doesn't impress me. I would think > that something that's going to be optionally loaded doesn't need to be > considered during bootstrap mode at all. We can just have initdb run > some SQL scripts (or not) during its post-bootstrap phase. Isn't that exactly what some people are proposing? Remove the nonessential things from the DATA() lines and put them into SQL scripts?
On Mon, Jul 27, 2009 at 4:17 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Sunday 26 July 2009 01:40:09 Tom Lane wrote:
>> And it is going to cost us in places like
>> how do we generate the fmgr lookup table.
>
> We rgrep the source tree for PG_FUNCTION_ARGS, extract the function name, and
> put them in a list.

Probably parsing the SQL would be a better idea.  Otherwise, the
outputs would depend on every .c file in the entire source tree.

...Robert
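[For concreteness, the grep-based approach Peter describes could be sketched as follows (Python for illustration only; the scanner itself is hypothetical, and Robert's objection stands — every .c file becomes an input of the generated table):]

```python
import re

# Sketch of the scan-the-sources idea: find V1 function definitions,
# which in PostgreSQL sources conventionally look like
#   Datum
#   boolin(PG_FUNCTION_ARGS)
# The sample text below is made up for the demo.
FMGR_RE = re.compile(r'\bDatum\s+(\w+)\s*\(\s*PG_FUNCTION_ARGS\s*\)')

def fmgr_functions(source_text):
    """Return the sorted, de-duplicated list of V1 function names found."""
    return sorted(set(FMGR_RE.findall(source_text)))

sample = """
Datum
boolin(PG_FUNCTION_ARGS)
{
}

Datum
boolout(PG_FUNCTION_ARGS)
{
}
"""
names = fmgr_functions(sample)
```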
Peter Eisentraut <peter_e@gmx.net> writes:
> On Sunday 26 July 2009 20:58:30 Tom Lane wrote:
>> The argument about optional stuff doesn't impress me.  I would think
>> that something that's going to be optionally loaded doesn't need to be
>> considered during bootstrap mode at all.  We can just have initdb run
>> some SQL scripts (or not) during its post-bootstrap phase.

> Isn't that exactly what some people are proposing?  Remove the nonessential
> things from the DATA() lines and put them into SQL scripts?

I think what "some people are proposing" is to rip inessential stuff
(say, the geometry types) out of core completely.  Which I'd not be
against.  The discussion at the moment is about whether to have
first-class and second-class citizens within the core set of
functions; and that I'm against.  I think two notations implemented by
two different toolchains will be confusing and not help maintenance
that much.

			regards, tom lane
On Sun, Jul 26, 2009 at 11:05 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> Maybe I should rip out all the anum.h stuff (sniff, I'm sad, I liked
> that design...) and resubmit.

Hearing no comments on this either way, here is a stripped-down version.

...Robert
Attachment
On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
> The attached patch merges all of the logic currently in genbki.sh and
> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl

I can't really convince myself to like this change.  I think there is
some merit in keeping these scripts separate.  I'm not sure what the
combined script buys us except that it is a lot bigger and does
everything at once instead of in two or three steps.

That, together with the big makefile reshuffling, makes me think that
this would cause more confusion than advantage.

Btw., is this stamp-h business really necessary?  I guess you copied
that from pg_config.h, but the reason it exists there is that
everything depends on pg_config.h, and rerunning configure would
normally cause everything to be rebuilt.  The files we are looking at
here should only change when something is edited.
Peter Eisentraut escribió:
> On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
> > The attached patch merges all of the logic currently in genbki.sh and
> > Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl
>
> I can't really convince myself to like this change.  I think there is some
> merit that these scripts are separate.  I'm not sure what the combined script
> buys us except that it is a lot bigger and does everything at once instead of
> in two or three steps.

Maybe we can move forward in little steps.  For example it would be
excellent to have the schemapg.h file autogenerated instead of having
to edit two copies of those definitions.

I'm not sure that it buys us a lot to have all those things in a
single script.  Why not have a script to generate schemapg.h, another
one to generate the bki stuff, and another one to generate the fmgrtab
header?  They don't seem to share a lot of code anyway (and even if
they do, surely we can add a .pm module containing common code).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Aug 11, 2009 at 4:52 PM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 30 June 2009 06:59:51 Robert Haas wrote:
>> The attached patch merges all of the logic currently in genbki.sh and
>> Gen_fmgrtab.{sh,pl} into a single script called gen_catalog.pl
>
> I can't really convince myself to like this change.  I think there is some
> merit that these scripts are separate.  I'm not sure what the combined script
> buys us except that it is a lot bigger and does everything at once instead of
> in two or three steps.
>
> That together with the big makefile moving around makes me think that this
> would cause more confusion and not much advantage.
>
> Btw., is this stamp-h business really necessary?  I guess you copied that from
> pg_config.h, but the reason there is that everything depends on pg_config.h,
> and rerunning configure would normally cause everything to be rebuilt.  The
> files we are looking at here should only change when something is edited.

It definitely has less appeal without the anum.h stuff.  I think there
is some benefit in having a single script, because it means that if we
want to add additional syntax in the future (like BKI_EXECUTE or
DATA_DEFAULTS or some kind of more human-readable notation for
functions or opclasses, all of which were discussed upthread) there is
only one place to change.  But is that sufficient reason to commit it
at this point, given that we don't have a fully fleshed-out design for
any of those things?  Not sure.  Building schemapg.h automatically
definitely seems nice to me.

The stamp-h stuff is quite important for minimizing unnecessary
rebuilds.  Without it, any change to any include/catalog/*.h file
triggers a massive rebuild.  With it, a rebuild happens only if one of
the outputs actually changes, and only for those portions whose output
actually changed.

...Robert
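[The rebuild-minimization discipline Robert describes can be sketched as follows (in Python for illustration; the real rules are make + shell, and the file names here are just examples): regenerate into a temporary file and install it only if the contents differ, so the installed file's timestamp — and everything that depends on it — is untouched when a header edit doesn't actually change that output.]

```python
import filecmp
import os
import shutil
import tempfile

def install_if_changed(tmp_path, dest_path):
    """Move tmp_path to dest_path only if contents differ; return True if installed."""
    if os.path.exists(dest_path) and filecmp.cmp(tmp_path, dest_path, shallow=False):
        os.remove(tmp_path)            # unchanged: keep dest's old timestamp
        return False
    shutil.move(tmp_path, dest_path)   # changed (or new): install it
    return True

# Demo: the second "regeneration" produces identical output, so the
# installed file is left alone and downstream rebuilds are avoided.
workdir = tempfile.mkdtemp()
dest = os.path.join(workdir, "schemapg.h")

def regenerate():
    tmp = os.path.join(workdir, "schemapg.h.tmp")
    with open(tmp, "w") as f:
        f.write("/* derived from catalog headers */\n")
    return install_if_changed(tmp, dest)

first_run = regenerate()
second_run = regenerate()
```

[In the actual makefiles, a stamp file would then be touched after every run so make knows the generation step happened, even when no output file was rewritten.]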