Thread: Experimental patch: generating BKI revisited
Hello everyone,

I was quite intrigued by a discussion that happened this past summer regarding generation of bootstrap files such as postgres.bki, and the associated pain points of maintaining the DATA() statements in catalog headers. It occurred to me that the current system is backwards: instead of generating the bootstrap files from hard-coded strings contained in various header files, it would be a cleaner design to generate both from a human-readable, high-level description of the system catalogs.

1. You wouldn't need hand-maintained pg_foo.h files. The struct declarations, Natts/Anum macros, and such are predictable enough to be machine-generated. Ditto for all 6 Schema_pg_foo declarations needed by relcache.c.

2. You wouldn't even need to specify the contents of pg_attribute -- the bootstrap data for that catalog are completely determined by the column names/types of the bootstrap tables and some type information in pg_type.

3. With a human-readable format, you could have placeholder strings representing oid values.

4. Since looking up catalog oids would be easy, we could get rid of postgres.description and postgres.shdescription by putting those statements into postgres.bki as well, which would eliminate the need for the setup_description() function in initdb.c with its hard-coded SQL.

The patch linked to below implements items 1-3 of what I've just described. Unfortunately, as will become apparent, it introduces a dependency on a Perl module which is inappropriate given Postgres' portability requirements. So consider this a research project, not an item for the patch queue. My hope is that it could still be the basis for a practical solution. If this interests you, keep reading...

FILE GENERATION

There are 4 scripts, which output distprep targets:

gen_bki.pl    -- generates postgres.bki and schemapg.h
gen_header.pl -- generates the pg_foo.h headers
gen_descr.pl  -- generates postgres.description and postgres.shdescription
gen_fmgr.pl   -- generates fmgroids.h and fmgrtab.c

CATALOG DATA FORMAT

The catalog data is represented in the YAML format. The features that led to this decision are:

1. Ease of reading/editing.
2. Anchors/aliases -- enable human-readable oid values.
3. Merge key -- enables default values.

A simple example using pg_tablespace will illustrate:

pg_tablespace:
  relation_oid: 1213
  relation_define: TableSpaceRelationId
  shared_relation: True
  columns:
   - spcname: name          # tablespace name
   - spcowner: oid          # owner of tablespace
   - spclocation: text      # physical location (VAR LENGTH)
   - spcacl: aclitem[]      # access permissions (VAR LENGTH)
  column_defaults: &pg_tablespace_defaults
    spcowner: *PGUID
    spclocation: '""'
    spcacl: _null_
  data:
   - spcname: pg_default
     oid: &DEFAULTTABLESPACE_OID 1663
     define: DEFAULTTABLESPACE_OID
     <<: *pg_tablespace_defaults
   - spcname: pg_global
     oid: 1664
     define: GLOBALTABLESPACE_OID
     <<: *pg_tablespace_defaults

When the YAML parser loads this into the Perl data structures used by the scripts, they look similar to this when output through Data::Dumper:

$catalogs->{pg_tablespace} = {
  'shared_relation' => 'True',
  'relation_oid'    => 1213,
  'relation_define' => 'TableSpaceRelationId',
  'columns' => [
    { 'spcname'     => 'name' },
    { 'spcowner'    => 'oid' },
    { 'spclocation' => 'text' },
    { 'spcacl'      => 'aclitem[]' }
  ],
  'data' => [
    { 'spcname'     => 'pg_default',
      'spclocation' => '""',
      'oid'         => 1663,
      'spcacl'      => '_null_',
      'define'      => 'DEFAULTTABLESPACE_OID',
      'spcowner'    => 10 },
    { 'spcname'     => 'pg_global',
      'spclocation' => '""',
      'oid'         => 1664,
      'spcacl'      => '_null_',
      'define'      => 'GLOBALTABLESPACE_OID',
      'spcowner'    => 10 }
  ],
  'column_defaults' => {
    'spclocation' => '""',
    'spcacl'      => '_null_',
    'spcowner'    => 10
  }
};

Note that the alias *PGUID is expanded to 10, since there was (not shown here) a corresponding anchor in pg_authid -- "oid: &PGUID 10".
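The combined effect of the merge key and alias expansion can be mimicked in plain code. The sketch below is in Python purely for illustration (the patch itself does this via YAML::XS in Perl): each sparse data entry is filled out with the column defaults, with the entry's own values taking precedence, and the *PGUID alias is replaced by its anchored value.

```python
# Illustration only: what "<<: *pg_tablespace_defaults" plus alias
# expansion produce for the pg_tablespace data above (patch uses Perl).
PGUID = 10  # value of the "oid: &PGUID 10" anchor in pg_authid

column_defaults = {
    "spcowner": PGUID,       # the *PGUID alias, already resolved
    "spclocation": '""',
    "spcacl": "_null_",
}

data = [
    {"spcname": "pg_default", "oid": 1663, "define": "DEFAULTTABLESPACE_OID"},
    {"spcname": "pg_global",  "oid": 1664, "define": "GLOBALTABLESPACE_OID"},
]

# Defaults fill in missing keys; an entry's own values win on conflict,
# matching YAML merge-key semantics.
full_rows = [{**column_defaults, **entry} for entry in data]
```

After the merge, each row carries the full spcowner/spclocation/spcacl values shown in the Data::Dumper output above.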
Similarly, in any subsequent catalog that refers to oid 1663 you can instead write *DEFAULTTABLESPACE_OID. Note also that each data entry is merged with the column_defaults hash.

A portion of a more complex example will hopefully motivate this method of data organization. These are the first 5 entries of the current representation of pg_amop:

/* default operators int2 */
DATA(insert ( 1976 21 21 1  95 403 ));
DATA(insert ( 1976 21 21 2 522 403 ));
DATA(insert ( 1976 21 21 3  94 403 ));
DATA(insert ( 1976 21 21 4 524 403 ));
DATA(insert ( 1976 21 21 5 520 403 ));

The YAML representation leaves out half of the columns, to be computed at compile time by gen_bki.pl, since they are intentional denormalizations. The remaining columns are self-documenting because of human-readable oids:

- amopfamily: *integer_btree_fam
  amopstrategy: 1
  amopopr: *int2lt_op
- amopfamily: *integer_btree_fam
  amopstrategy: 2
  amopopr: *int2le_op
- amopfamily: *integer_btree_fam
  amopstrategy: 3
  amopopr: *int2eq_op
- amopfamily: *integer_btree_fam
  amopstrategy: 4
  amopopr: *int2ge_op
- amopfamily: *integer_btree_fam
  amopstrategy: 5
  amopopr: *int2gt_op

I think this approach is more readable and less error-prone.

DEPENDENCIES AND THE REAL WORLD

Parsing YAML into Perl data structures requires the YAML::XS module, which in turn requires Perl 5.10. Since the generated files are distprep targets, this would only affect those who want to build from the repo. Since this is still an unacceptable dependency, it might be worthwhile to use the new infrastructure with a simpler data format that can be parsed with straight Perl or C.

NEW WARTS

Some entries in catalog.yaml are only there to put macros into the generated header. These are indicated by data entries that contain only a "define:" value and a "nobki: True" value.

If a catalog header used to contain things like function prototypes, enums, and #include's, these have been put into 9 new pg_foo_fn.h files which are #include'd into the generated pg_foo.h file.
This is indicated by the presence of an "include:" value. The number of new files could be reduced by consolidation, but I didn't do that, so that it would be obvious where the definitions come from.

The old mechanism is retained for the declare index and declare toast statements. That is, they are still retrieved from indexing.h and toasting.h using regular expressions.

CAVEATS FOR THE CURIOUS

1. I haven't changed the configure script to test for YAML::XS.

2. I've run make -j2 successfully, but I'm not positive that my changes are 100% correct for parallel make.

3. I don't have ready access to a Windows box with the necessary development environment, so MSVC is certainly broken.

4. Since there are whitespace inconsistencies in the current headers, you need to run this command on the current postgres.bki for it to diff cleanly with mine:

   sed -i 's/_)$/_ )/' src/backend/catalog/postgres.bki

INFO

The project is located at http://git.postgresql.org/gitweb?p=users/jnaylor/bki.git;a=summary

 .gitignore | 141
 src/Makefile | 2
 src/Makefile.global.in | 18
 src/backend/Makefile | 46
 src/backend/catalog/Catalog.pm | 76
 src/backend/catalog/Makefile | 85
 src/backend/catalog/README | 166
 src/backend/catalog/catalog.yaml | 28871
 src/backend/catalog/gen_bki.pl | 493
 src/backend/catalog/gen_descr.pl | 72
 src/backend/catalog/gen_header.pl | 215
 src/backend/catalog/genbki.sh | 438
 src/backend/catalog/unused_oids.pl | 114
 src/backend/utils/Gen_fmgrtab.pl | 194
 src/backend/utils/Gen_fmgrtab.sh | 253
 src/backend/utils/Makefile | 14
 src/backend/utils/cache/relcache.c | 1
 src/backend/utils/gen_fmgr.pl | 189
 src/include/Makefile | 2
 src/include/catalog/duplicate_oids | 27
 src/include/catalog/genbki.h | 40
 src/include/catalog/indexing.h | 7
 src/include/catalog/pg_aggregate.h | 239
 src/include/catalog/pg_aggregate_fn.h | 32
 src/include/catalog/pg_am.h | 125
 src/include/catalog/pg_amop.h | 681
 src/include/catalog/pg_amproc.h | 331
 src/include/catalog/pg_attrdef.h | 56
 src/include/catalog/pg_attribute.h | 516
 src/include/catalog/pg_auth_members.h | 56
 src/include/catalog/pg_authid.h | 97
 src/include/catalog/pg_cast.h | 355
 src/include/catalog/pg_cast_fn.h | 44
 src/include/catalog/pg_class.h | 144
 src/include/catalog/pg_constraint.h | 231
 src/include/catalog/pg_constraint_fn.h | 84
 src/include/catalog/pg_conversion.h | 77
 src/include/catalog/pg_database.h | 77
 src/include/catalog/pg_db_role_setting.h | 67
 src/include/catalog/pg_db_role_setting_fn.h | 29
 src/include/catalog/pg_default_acl.h | 75
 src/include/catalog/pg_depend.h | 90
 src/include/catalog/pg_description.h | 84
 src/include/catalog/pg_enum.h | 66
 src/include/catalog/pg_enum_fn.h | 25
 src/include/catalog/pg_foreign_data_wrapper.h | 62
 src/include/catalog/pg_foreign_server.h | 65
 src/include/catalog/pg_index.h | 91
 src/include/catalog/pg_inherits.h | 59
 src/include/catalog/pg_language.h | 79
 src/include/catalog/pg_largeobject.h | 58
 src/include/catalog/pg_largeobject_fn.h | 21
 src/include/catalog/pg_listener.h | 59
 src/include/catalog/pg_namespace.h | 82
 src/include/catalog/pg_namespace_fn.h | 22
 src/include/catalog/pg_opclass.h | 215
 src/include/catalog/pg_operator.h | 961
 src/include/catalog/pg_operator_fn.h | 31
 src/include/catalog/pg_opfamily.h | 141
 src/include/catalog/pg_pltemplate.h | 77
 src/include/catalog/pg_proc.h | 4749
 src/include/catalog/pg_rewrite.h | 69
 src/include/catalog/pg_shdepend.h | 90
 src/include/catalog/pg_shdescription.h | 75
 src/include/catalog/pg_statistic.h | 259
 src/include/catalog/pg_tablespace.h | 63
 src/include/catalog/pg_trigger.h | 113
 src/include/catalog/pg_trigger_fn.h | 42
 src/include/catalog/pg_ts_config.h | 60
 src/include/catalog/pg_ts_config_map.h | 78
 src/include/catalog/pg_ts_dict.h | 63
 src/include/catalog/pg_ts_parser.h | 67
 src/include/catalog/pg_ts_template.h | 67
 src/include/catalog/pg_type.h | 659
 src/include/catalog/pg_user_mapping.h | 59
 src/include/catalog/toasting.h | 4
 src/include/catalog/unused_oids | 57
 77 files changed, 30720 insertions(+), 12922 deletions(-)

Some code snippets and conventions were borrowed from Robert Haas' earlier efforts.

Feedback is appreciated.

John
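Point 1 of the proposal above -- that the Natts/Anum boilerplate in the pg_foo.h headers is predictable enough to machine-generate -- can be sketched as follows. This is an illustration in Python, not the patch's gen_header.pl (which is Perl), and the real output formatting may differ:

```python
def gen_natts_anum(catname, columns):
    """Emit the Natts/Anum #defines for one catalog from its ordered
    (column name, type) list.  A sketch, not gen_header.pl's actual logic."""
    lines = ["#define Natts_%s %d" % (catname, len(columns))]
    # Attribute numbers are simply 1-based column positions.
    for attnum, (colname, _coltype) in enumerate(columns, start=1):
        lines.append("#define Anum_%s_%s %d" % (catname, colname, attnum))
    return "\n".join(lines)

# The pg_tablespace columns from the YAML example above:
pg_tablespace = [
    ("spcname", "name"),
    ("spcowner", "oid"),
    ("spclocation", "text"),
    ("spcacl", "aclitem[]"),
]
header_text = gen_natts_anum("pg_tablespace", pg_tablespace)
```

Because the column list is the single source of truth, adding a column means editing one place in the description file; the macros (and the struct declaration, by a similar transformation) follow automatically.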
John Naylor escribió:
> Hello everyone,
>
> I was quite intrigued by a discussion that happened this past summer
> regarding generation of bootstrap files such as postgres.bki, and the
> associated pain points of maintaining the DATA() statements in catalog headers.
> It occurred to me that the current system is backwards: Instead of
> generating the
> bootstrap files from hard-coded strings contained in various header files, it
> seems it would be a cleaner design to generate both from a human-readable
> high-level description of the system catalogs.

I had a look at this some time ago and I must admit that I find it pretty interesting. The technology choices make it obviously impossible to merge -- not only the particular Perl modules used, but the mere fact that Perl is used (and that such a recent version is required). But you're already aware of all this, so I'm not going to say more.

As far as the data file is concerned, I think having it all in a single file is a loser. I'd go for a file per catalog. Also, I don't like the fact that the column descriptions are lost because of being in a YAML comment. I think it'd be better if the generated pg_foo.h files had them.

One thing I loved about this is that it's trivial to add a column to pg_proc without having to edit almost every single line of the damn monster file.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Nov 13, 2009 at 8:16 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I had a look at this some time ago and I must admit that I find it
> pretty interesting. The technology choices make it
> obviously impossible to merge -- not only the particular Perl modules
> used, but the mere fact that Perl is used (and that such a recent
> version is required). But you're already aware of all this so I'm not
> going to say more.

Really? Perl is a show-stopper? That seems harsh. Perl is pretty ubiquitous - in fact, given that we now support Windows, arguably more so than awk.

...Robert
On Fri, Nov 13, 2009 at 3:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Nov 13, 2009 at 8:16 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> I had a look at this some time ago and I must admit that I find it
>> pretty interesting. The technology choices make it
>> obviously impossible to merge -- not only the particular Perl modules
>> used, but the mere fact that Perl is used (and that such a recent
>> version is required). But you're already aware of all this so I'm not
>> going to say more.
>
> Really? Perl is a show-stopper? That seems harsh. Perl is pretty
> ubiquitous - in fact, given that we now support Windows, arguably more
> so than awk.

Building in VC++ on Windows already requires Perl. And if you're building in mingw, you've probably already got it, or can get it pretty easily.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
On Fri, Nov 13, 2009 at 11:00 AM, Dave Page <dpage@pgadmin.org> wrote:
> On Fri, Nov 13, 2009 at 3:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Nov 13, 2009 at 8:16 AM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>>> I had a look at this some time ago and I must admit that I find it
>>> pretty interesting. The technology choices make it
>>> obviously impossible to merge -- not only the particular Perl modules
>>> used, but the mere fact that Perl is used (and that such a recent
>>> version is required). But you're already aware of all this so I'm not
>>> going to say more.
>>
>> Really? Perl is a show-stopper? That seems harsh. Perl is pretty
>> ubiquitous - in fact, given that we now support Windows, arguably more
>> so than awk.
>
> Building in VC++ on Windows already requires Perl. And if you're
> building in mingw, you've probably already got it, or can get it
> pretty easily.

Yep, it's only on UNIX-ish systems where Perl isn't necessarily required, and realistically I think it is probably present on nearly all of those, too. We can even include the autogenerated files in make distprep so that it's only required for development, not to compile a release tarball.

...Robert
On Fri, Nov 13, 2009 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Yep, it's only on UNIX-ish systems where Perl isn't necessarily
> required, and realistically I think it is probably present on nearly
> all of those, too.

Exactly.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
Dave Page <dpage@pgadmin.org> writes:
> On Fri, Nov 13, 2009 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yep, it's only on UNIX-ish systems where Perl isn't necessarily
>> required, and realistically I think it is probably present on nearly
>> all of those, too.

> Exactly.

Yeah. Although the project policy is that we don't require Perl to build on Unix, there was a bug in the makefiles that made it effectively required, and nobody noticed for several years. I don't think it would be a hard sell to change that policy if we got a significant benefit out of it. (Depending on non-core Perl modules is a totally different thing though.)

			regards, tom lane
Tom Lane escribió:
> Dave Page <dpage@pgadmin.org> writes:
> > On Fri, Nov 13, 2009 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> Yep, it's only on UNIX-ish systems where Perl isn't necessarily
> >> required, and realistically I think it is probably present on nearly
> >> all of those, too.
>
> > Exactly.
>
> Yeah. Although the project policy is that we don't require Perl to
> build on Unix, there was a bug in the makefiles that made it effectively
> required, and nobody noticed for several years. I don't think it would
> be a hard sell to change that policy if we got a significant benefit out
> of it. (Depending on non-core Perl modules is a totally different thing
> though.)

Well, this is a pretty fortunate turn of events. I had two paragraphs in my original email that I edited out ("... so I'm not going to say more") on how to work around the lack of Perl. If we're all OK now on requiring some basic Perl installation, then all the better. I certainly have no trouble with it.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> Yeah. Although the project policy is that we don't require Perl to
>> build on Unix, there was a bug in the makefiles that made it effectively
>> required, and nobody noticed for several years. I don't think it would
>> be a hard sell to change that policy if we got a significant benefit out
>> of it. (Depending on non-core Perl modules is a totally different thing
>> though.)

> Well, this is a pretty fortunate turn of events. I had two paragraphs
> in my original email that I edited out ("... so I'm not going to say
> more") on how to work around the lack of Perl. If we're all OK now on
> requiring some basic Perl installation then all the better. I certainly
> have no trouble with it.

Although actually, we could still keep that policy if Perl is needed to build .bki files --- we just have to build those files in distprep and ship them as part of tarballs. It's already the case that you need Perl to build from a CVS pull; it's only tarball users who don't need it.

			regards, tom lane
On Fri, Nov 13, 2009 at 2:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Tom Lane escribió:
>>> Yeah. Although the project policy is that we don't require Perl to
>>> build on Unix, there was a bug in the makefiles that made it effectively
>>> required, and nobody noticed for several years. I don't think it would
>>> be a hard sell to change that policy if we got a significant benefit out
>>> of it. (Depending on non-core Perl modules is a totally different thing
>>> though.)
>
>> Well, this is a pretty fortunate turn of events. I had two paragraphs
>> in my original email that I edited out ("... so I'm not going to say
>> more") on how to work around the lack of Perl. If we're all OK now on
>> requiring some basic Perl installation then all the better. I certainly
>> have no trouble with it.
>
> Although actually, we could still keep that policy if Perl is needed to
> build .bki files --- we just have to build those files in distprep and
> ship them as part of tarballs. It's already the case that you need Perl
> to build from a CVS pull, it's only tarball users who don't need it.

I just said the same thing a few hours ago...

...Robert
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Tom Lane escribió:
>>> Yeah. Although the project policy is that we don't require Perl to
>>> build on Unix, there was a bug in the makefiles that made it effectively
>>> required, and nobody noticed for several years. I don't think it would
>>> be a hard sell to change that policy if we got a significant benefit out
>>> of it. (Depending on non-core Perl modules is a totally different thing
>>> though.)
>
>> Well, this is a pretty fortunate turn of events. I had two paragraphs
>> in my original email that I edited out ("... so I'm not going to say
>> more") on how to work around the lack of Perl. If we're all OK now on
>> requiring some basic Perl installation then all the better. I certainly
>> have no trouble with it.
>
> Although actually, we could still keep that policy if Perl is needed to
> build .bki files --- we just have to build those files in distprep and
> ship them as part of tarballs. It's already the case that you need Perl
> to build from a CVS pull, it's only tarball users who don't need it.

That's all true. But maybe it's time to look again at this anyway. Perl is pretty darn ubiquitous these days. And awk-fu is getting rarer and rarer.

cheers

andrew
On Fri, Nov 13, 2009 at 5:16 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I had a look at this some time ago and I must admit that I find it
> pretty interesting. The technology choices make it
> obviously impossible to merge -- not only the particular Perl modules
> used, but the mere fact that Perl is used (and that such a recent
> version is required). But you're already aware of all this so I'm not
> going to say more.

Thanks for taking a look!

One thing I didn't mention is that the scripts that use Catalog.pm were designed to be 5.6-compatible. A different catalog description format that didn't require Perl modules could be swapped in; as long as Catalog.pm exported the same data structures, the logic wouldn't need to change. The concern there would be the cost of developing a format and a parser to go with it. If it didn't make developers' lives significantly easier, it wouldn't be worth it.

I haven't tried, but I think that the current system could be tweaked to generate pg_attribute and schemapg.h without needing any declarations in pg_attribute.h. I'll look into it.

> As far as the data file is concerned, I think having it all in a single
> file is a loser. I'd go for a file per catalog.

YAML aliases can only refer to anchors that precede them in the same document. It does make the file frustratingly large, but human-readable oids make up for it, IMO.

> Also, I don't like the
> fact that the column descriptions are lost because of being in a
> YAML comment. I think it'd be better if the generated pg_foo.h files
> had them.

I considered that, but I wasn't sure the added complexity was worth it.

> One thing I loved about this is that it's trivial to add a column to
> pg_proc and that this not mean that I have to edit almost every single
> line of the damn monster file.

One could conceivably write a script against Catalog.pm that generated DATA() statements to copy and paste into a header file. It might be a useful tool, if only for error checking.
For that to be successful, the YAML file would need to be kept up to date, incurring duplicate maintenance.

John
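The DATA()-generating tool floated above might look something like this sketch. It is written in Python for brevity (the real tool would be Perl built on Catalog.pm), and the quoting rule is a deliberate simplification of what the BKI format actually requires:

```python
def data_statement(columns, defaults, entry):
    """Render one DATA(insert ...) line from a sparse catalog entry plus
    column defaults.  `columns` is the ordered list of column names.
    Hypothetical sketch, not part of the patch."""
    merged = {**defaults, **entry}
    values = []
    for col in columns:
        v = str(merged[col])
        # Simplified quoting: wrap values containing spaces in double quotes.
        values.append('"%s"' % v if " " in v else v)
    if "oid" in entry:
        return "DATA(insert OID = %d ( %s ));" % (entry["oid"], " ".join(values))
    return "DATA(insert ( %s ));" % " ".join(values)

# The pg_tablespace example from upthread, with *PGUID already resolved:
cols = ["spcname", "spcowner", "spclocation", "spcacl"]
defaults = {"spcowner": 10, "spclocation": '""', "spcacl": "_null_"}
row = {"spcname": "pg_default", "oid": 1663}
line = data_statement(cols, defaults, row)
# → 'DATA(insert OID = 1663 ( pg_default 10 "" _null_ ));'
```

Round-tripping the generated lines against the existing headers would double as the error check mentioned above.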