Experimental patch: generating BKI revisited - Mailing list pgsql-hackers

From John Naylor
Subject Experimental patch: generating BKI revisited
Date
Msg-id 4d191a530911041228v621286a7q6a98d9ab8a2ed734@mail.gmail.com
Whole thread Raw
Responses Re: Experimental patch: generating BKI revisited
List pgsql-hackers
Hello everyone,

I was quite intrigued by a discussion that happened this past summer
regarding generation of bootstrap files such as postgres.bki, and the
associated pain points of maintaining the DATA() statements in catalog headers.
It occurred to me that the current system is backwards: Instead of
generating the
bootstrap files from hard-coded strings contained in various header files, it
seems it would be a cleaner design to generate both from a human-readable
high-level description of the system catalogs.

1. You wouldn't need hand-maintained pg_foo.h files. The struct declarations,
Natts/Annums, and such are predictable enough to be machine-generated. Ditto
for all 6 Schema_pg_foo declarations needed by relcache.c.
2. You wouldn't even need to specify the contents of pg_attribute -- the
bootstrap data for that are completely determined by the column names/types of
the bootstrap tables and some type information in pg_type.
3. With a human-readable format, you could have placeholder strings
representing oid values.
4. Since looking up catalog oids would be easy, we could get rid of
postgres.description, postgres.shdescription by putting those statements into
postgres.bki as well, which would eliminate the need for the
setup_description() function in initdb.c with its hard-coded SQL.

The patch linked to below implements 1-3 of what I've just described.
Unfortunately, as will become apparent, it introduces a dependency on a Perl
module which is inappropriate for Postgres' requirements of portability. So
consider this a research project, not an item for the patch queue. My hope is
that it could still be the basis for a practical solution. If this interests
you, keep reading...


FILE GENERATION

There are 4 scripts, which output distprep targets:

gen_bki.pl -- generates postgres.bki and schemapg.h
gen_header.pl -- generates the pg_foo.h headers.
gen_descr.pl -- generates postgres.description and postgres.shdescription
gen_fmgr.pl -- generates fmgroids.h and fmgrtab.c


CATALOG DATA FORMAT

The catalog data is represented in the YAML format. The features that led to
this decision are:

1. Ease of reading/editing.
2. Anchors/aliases -- enable human-readable oid values.
3. Merge key -- enables default values.

A simple example using pg_tablespace will illustrate:

pg_tablespace:   relation_oid: 1213   relation_define: TableSpaceRelationId   shared_relation: True   columns:       -
spcname:name       # tablespace name       - spcowner: oid       # owner of tablespace       - spclocation: text   #
physicallocation (VAR LENGTH)       - spcacl: aclitem[]   # access permissions (VAR LENGTH)   column_defaults:
&pg_tablespace_defaults      spcowner: *PGUID       spclocation: '""'       spcacl: _null_   data:       - spcname:
pg_default        oid: &DEFAULTTABLESPACE_OID 1663         define: DEFAULTTABLESPACE_OID         <<:
*pg_tablespace_defaults      - spcname: pg_global         oid: 1664         define: GLOBALTABLESPACE_OID         <<:
*pg_tablespace_defaults

When the YAML parser loads this into the Perl data structures used by the
scripts, they look similar to this when output through Data::Dumper:

$catalogs->{pg_tablespace} = {         'shared_relation' => 'True',         'relation_oid' => 1213,
'relation_define'=> 'TableSpaceRelationId',         'columns' => [                      { 'spcname' => 'name' },
             { 'spcowner' => 'oid' },                      { 'spclocation' => 'text' },                      { 'spcacl'
=>'aclitem[]' }                    ],         'data' => [                   {                     'spcname' =>
'pg_default',                    'spclocation' => '""',                     'oid' => 1663,                     'spcacl'
=>'_null_',                     'define' => 'DEFAULTTABLESPACE_OID',                     'spcowner' => 10
   },                   {                     'spcname' => 'pg_global',                     'spclocation' => '""',
              'oid' => 1664,                     'spcacl' => '_null_',                     'define' =>
'GLOBALTABLESPACE_OID',                    'spcowner' => 10                   }                 ],
'column_defaults'=> {                              'spclocation' => '""',                              'spcacl' =>
'_null_',                             'spcowner' => 10                            }
 
};

Note that the alias *PGUID is expanded to 10 since there was (not shown here) a
corresponding anchor in pg_authid -- "oid: &PGUID 10". Similarly, in any
subsequent catolog that refers to oid 1663 you can instead write
*DEFAULTTABLESPACE_OID. Note also that each data entry
is merged with the column_defaults hash.

A portion of a more complex example will hopefully motivate this method of data
organization:

This is the first 5 entries of the current representation of pg_amop:

/* default operators int2 */
DATA(insert (   1976   21 21 1  95  403 ));
DATA(insert (   1976   21 21 2  522 403 ));
DATA(insert (   1976   21 21 3  94  403 ));
DATA(insert (   1976   21 21 4  524 403 ));
DATA(insert (   1976   21 21 5  520 403 ));

The YAML representation leaves out half of the columns, to be computed at
compile time by gen_bki.pl, since they are intentional denormalizations. The
remaining columns are self-documenting because of human-readable oids:

- amopfamily: *integer_btree_fam amopstrategy: 1 amopopr: *int2lt_op
- amopfamily: *integer_btree_fam amopstrategy: 2 amopopr: *int2le_op
- amopfamily: *integer_btree_fam amopstrategy: 3 amopopr: *int2eq_op
- amopfamily: *integer_btree_fam amopstrategy: 4 amopopr: *int2ge_op
- amopfamily: *integer_btree_fam amopstrategy: 5 amopopr: *int2gt_op

I think this approach is more readable and less error-prone.


DEPENDENCIES AND THE REAL WORLD

Parsing YAML into Perl data structures requires the YAML::XS module, which in
turn requires Perl 5.10. Since the generated files are distprep targets, this
would only apply to those who want to build from the repo. Since this
is still an
unacceptable dependency, it might be worth it to use the new infrastructure
with a simpler data format that can be parsed with straight Perl or C.


NEW WARTS

Some entries in catalog.yaml are only there to put macros into the generated
header. These are indicated by data entries that contain only a
"define:" value and a "nobki: True" value. If a catalog header used
to contain things like function prototypes, enums, and #include's, these have
been put into 9 new pg_foo_fn.h files which are #include'd into the generated
pg_foo.h file. This is indicated by the presence of an "include:" value. The
number of new files could be reduced by consolidation, but I didn't do that so
that it would be obvious where the definitions come from.

The old mechanism is retained for the declare index and declare toast
statements. That is, they are still retrieved from indexing.h and toasting.h.
using regular expressions.


CAVEATS FOR THE CURIOUS

1. I haven't changed the configure script to test for YAML::XS.
2. I've run make -j2 successfully, but I'm not positive that my changes are
100% correct for parallel make.
3. I don't have ready access to a Windows box with the necessary development
environment, so MSVC is certainly broken.
4. Since there are whitepace inconsistencies in the current headers, you need
this command on the current postgres.bki to diff cleanly with mine:       sed -i 's/_)$/_ )/'
src/backend/catalog/postgres.bki


INFO

The project is located at

http://git.postgresql.org/gitweb?p=users/jnaylor/bki.git;a=summary

.gitignore                                    |  141 +src/Makefile                                  |    2
+-src/Makefile.global.in                       |   18 +src/backend/Makefile                          |   46
+-src/backend/catalog/Catalog.pm               |   76 +src/backend/catalog/Makefile                  |   85
+-src/backend/catalog/README                   |  166 +-src/backend/catalog/catalog.yaml              |28871
+++++++++++++++++++++++++src/backend/catalog/gen_bki.pl               |  493 +src/backend/catalog/gen_descr.pl
   |   72 +src/backend/catalog/gen_header.pl             |  215 +src/backend/catalog/genbki.sh                 |  438
-src/backend/catalog/unused_oids.pl           |  114 +src/backend/utils/Gen_fmgrtab.pl              |  194
-src/backend/utils/Gen_fmgrtab.sh             |  253 -src/backend/utils/Makefile                    |   14
+-src/backend/utils/cache/relcache.c           |    1 +src/backend/utils/gen_fmgr.pl                 |  189
+src/include/Makefile                         |    2 +-src/include/catalog/duplicate_oids            |   27
-src/include/catalog/genbki.h                 |   40 -src/include/catalog/indexing.h                |    7
+-src/include/catalog/pg_aggregate.h           |  239 -src/include/catalog/pg_aggregate_fn.h         |   32
+src/include/catalog/pg_am.h                  |  125 -src/include/catalog/pg_amop.h                 |  681
-src/include/catalog/pg_amproc.h              |  331 -src/include/catalog/pg_attrdef.h              |   56
-src/include/catalog/pg_attribute.h           |  516 -src/include/catalog/pg_auth_members.h         |   56
-src/include/catalog/pg_authid.h              |   97 -src/include/catalog/pg_cast.h                 |  355
-src/include/catalog/pg_cast_fn.h             |   44 +src/include/catalog/pg_class.h                |  144
-src/include/catalog/pg_constraint.h          |  231 -src/include/catalog/pg_constraint_fn.h        |   84
+src/include/catalog/pg_conversion.h          |   77 -src/include/catalog/pg_database.h             |   77
-src/include/catalog/pg_db_role_setting.h     |   67 -src/include/catalog/pg_db_role_setting_fn.h   |   29
+src/include/catalog/pg_default_acl.h         |   75 -src/include/catalog/pg_depend.h               |   90
-src/include/catalog/pg_description.h         |   84 -src/include/catalog/pg_enum.h                 |   66
-src/include/catalog/pg_enum_fn.h             |   25 +src/include/catalog/pg_foreign_data_wrapper.h |   62
-src/include/catalog/pg_foreign_server.h      |   65 -src/include/catalog/pg_index.h                |   91
-src/include/catalog/pg_inherits.h            |   59 -src/include/catalog/pg_language.h             |   79
-src/include/catalog/pg_largeobject.h         |   58 -src/include/catalog/pg_largeobject_fn.h       |   21
+src/include/catalog/pg_listener.h            |   59 -src/include/catalog/pg_namespace.h            |   82
-src/include/catalog/pg_namespace_fn.h        |   22 +src/include/catalog/pg_opclass.h              |  215
-src/include/catalog/pg_operator.h            |  961 -src/include/catalog/pg_operator_fn.h          |   31
+src/include/catalog/pg_opfamily.h            |  141 -src/include/catalog/pg_pltemplate.h           |   77
-src/include/catalog/pg_proc.h                | 4749 ----src/include/catalog/pg_rewrite.h              |   69
-src/include/catalog/pg_shdepend.h            |   90 -src/include/catalog/pg_shdescription.h        |   75
-src/include/catalog/pg_statistic.h           |  259 -src/include/catalog/pg_tablespace.h           |   63
-src/include/catalog/pg_trigger.h             |  113 -src/include/catalog/pg_trigger_fn.h           |   42
+src/include/catalog/pg_ts_config.h           |   60 -src/include/catalog/pg_ts_config_map.h        |   78
-src/include/catalog/pg_ts_dict.h             |   63 -src/include/catalog/pg_ts_parser.h            |   67
-src/include/catalog/pg_ts_template.h         |   67 -src/include/catalog/pg_type.h                 |  659
-src/include/catalog/pg_user_mapping.h        |   59 -src/include/catalog/toasting.h                |    4
+-src/include/catalog/unused_oids              |   57 -77 files changed, 30720 insertions(+), 12922 deletions(-)
 


Some code snippets and conventions were borrowed from Robert Haas' earlier
efforts. Feedback is appreciated.

John


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
Next
From: Alvaro Herrera
Date:
Subject: Re: WIP: pushing parser hooks through SPI and plancache