Thread: Experimental patch: generating BKI revisited
Hello everyone,

I was quite intrigued by a discussion that happened this past summer regarding generation of bootstrap files such as postgres.bki, and the associated pain points of maintaining the DATA() statements in catalog headers. It occurred to me that the current system is backwards: instead of generating the bootstrap files from hard-coded strings contained in various header files, it would be a cleaner design to generate both from a human-readable, high-level description of the system catalogs.

1. You wouldn't need hand-maintained pg_foo.h files. The struct declarations, Natts/Anum macros, and such are predictable enough to be machine-generated. Ditto for all 6 Schema_pg_foo declarations needed by relcache.c.

2. You wouldn't even need to specify the contents of pg_attribute -- the bootstrap data for that catalog are completely determined by the column names/types of the bootstrap tables and some type information in pg_type.

3. With a human-readable format, you could have placeholder strings representing oid values.

4. Since looking up catalog oids would be easy, we could get rid of postgres.description and postgres.shdescription by putting those statements into postgres.bki as well, which would eliminate the need for the setup_description() function in initdb.c with its hard-coded SQL.

The patch linked to below implements items 1-3 of what I've just described. Unfortunately, as will become apparent, it introduces a dependency on a Perl module which is inappropriate given Postgres' portability requirements. So consider this a research project, not an item for the patch queue. My hope is that it could still be the basis for a practical solution. If this interests you, keep reading...

FILE GENERATION

There are 4 scripts, which output distprep targets:

gen_bki.pl    -- generates postgres.bki and schemapg.h
gen_header.pl -- generates the pg_foo.h headers
gen_descr.pl  -- generates postgres.description and postgres.shdescription
gen_fmgr.pl   -- generates fmgroids.h and fmgrtab.c

CATALOG DATA FORMAT

The catalog data is represented in the YAML format. The features that led to this decision are:

1. Ease of reading/editing.
2. Anchors/aliases -- enable human-readable oid values.
3. Merge key -- enables default values.

A simple example using pg_tablespace will illustrate:

pg_tablespace:
  relation_oid: 1213
  relation_define: TableSpaceRelationId
  shared_relation: True
  columns:
   - spcname: name          # tablespace name
   - spcowner: oid          # owner of tablespace
   - spclocation: text      # physical location (VAR LENGTH)
   - spcacl: aclitem[]      # access permissions (VAR LENGTH)
  column_defaults: &pg_tablespace_defaults
    spcowner: *PGUID
    spclocation: '""'
    spcacl: _null_
  data:
   - spcname: pg_default
     oid: &DEFAULTTABLESPACE_OID 1663
     define: DEFAULTTABLESPACE_OID
     <<: *pg_tablespace_defaults
   - spcname: pg_global
     oid: 1664
     define: GLOBALTABLESPACE_OID
     <<: *pg_tablespace_defaults

When the YAML parser loads this into the Perl data structures used by the scripts, they look similar to this when output through Data::Dumper:

$catalogs->{pg_tablespace} = {
  'shared_relation' => 'True',
  'relation_oid'    => 1213,
  'relation_define' => 'TableSpaceRelationId',
  'columns' => [
    { 'spcname'     => 'name' },
    { 'spcowner'    => 'oid' },
    { 'spclocation' => 'text' },
    { 'spcacl'      => 'aclitem[]' }
  ],
  'data' => [
    { 'spcname'     => 'pg_default',
      'spclocation' => '""',
      'oid'         => 1663,
      'spcacl'      => '_null_',
      'define'      => 'DEFAULTTABLESPACE_OID',
      'spcowner'    => 10 },
    { 'spcname'     => 'pg_global',
      'spclocation' => '""',
      'oid'         => 1664,
      'spcacl'      => '_null_',
      'define'      => 'GLOBALTABLESPACE_OID',
      'spcowner'    => 10 }
  ],
  'column_defaults' => {
    'spclocation' => '""',
    'spcacl'      => '_null_',
    'spcowner'    => 10
  }
};

Note that the alias *PGUID is expanded to 10, since there was (not shown here) a corresponding anchor in pg_authid -- "oid: &PGUID 10".
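The combined effect of the merge key and alias expansion can be mimicked in plain code. The sketch below is in Python purely for illustration (the patch itself does this via YAML::XS in Perl): each sparse data entry is filled out with the column defaults, with the entry's own values taking precedence, and the *PGUID alias is replaced by its anchored value.

```python
# Illustration only: what "<<: *pg_tablespace_defaults" plus alias
# expansion produce for the pg_tablespace data above (patch uses Perl).
PGUID = 10  # value of the "oid: &PGUID 10" anchor in pg_authid

column_defaults = {
    "spcowner": PGUID,       # the *PGUID alias, already resolved
    "spclocation": '""',
    "spcacl": "_null_",
}

data = [
    {"spcname": "pg_default", "oid": 1663, "define": "DEFAULTTABLESPACE_OID"},
    {"spcname": "pg_global",  "oid": 1664, "define": "GLOBALTABLESPACE_OID"},
]

# Defaults fill in missing keys; an entry's own values win on conflict,
# matching YAML merge-key semantics.
full_rows = [{**column_defaults, **entry} for entry in data]
```

After the merge, each row carries the full spcowner/spclocation/spcacl values shown in the Data::Dumper output above.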
Similarly, in any subsequent catalog that refers to oid 1663 you can instead write *DEFAULTTABLESPACE_OID. Note also that each data entry is merged with the column_defaults hash.

A portion of a more complex example will hopefully motivate this method of data organization. These are the first 5 entries of the current representation of pg_amop:

/* default operators int2 */
DATA(insert ( 1976 21 21 1  95 403 ));
DATA(insert ( 1976 21 21 2 522 403 ));
DATA(insert ( 1976 21 21 3  94 403 ));
DATA(insert ( 1976 21 21 4 524 403 ));
DATA(insert ( 1976 21 21 5 520 403 ));

The YAML representation leaves out half of the columns, to be computed at compile time by gen_bki.pl, since they are intentional denormalizations. The remaining columns are self-documenting because of human-readable oids:

- amopfamily: *integer_btree_fam
  amopstrategy: 1
  amopopr: *int2lt_op
- amopfamily: *integer_btree_fam
  amopstrategy: 2
  amopopr: *int2le_op
- amopfamily: *integer_btree_fam
  amopstrategy: 3
  amopopr: *int2eq_op
- amopfamily: *integer_btree_fam
  amopstrategy: 4
  amopopr: *int2ge_op
- amopfamily: *integer_btree_fam
  amopstrategy: 5
  amopopr: *int2gt_op

I think this approach is more readable and less error-prone.

DEPENDENCIES AND THE REAL WORLD

Parsing YAML into Perl data structures requires the YAML::XS module, which in turn requires Perl 5.10. Since the generated files are distprep targets, this would only affect those who want to build from the repo. Since this is still an unacceptable dependency, it might be worthwhile to use the new infrastructure with a simpler data format that can be parsed with straight Perl or C.

NEW WARTS

Some entries in catalog.yaml are only there to put macros into the generated header. These are indicated by data entries that contain only a "define:" value and a "nobki: True" value.

If a catalog header used to contain things like function prototypes, enums, and #include's, these have been put into 9 new pg_foo_fn.h files which are #include'd into the generated pg_foo.h file.
This is indicated by the presence of an "include:" value. The number of new files could be reduced by consolidation, but I didn't do that, so that it would be obvious where the definitions come from.

The old mechanism is retained for the declare index and declare toast statements. That is, they are still retrieved from indexing.h and toasting.h using regular expressions.

CAVEATS FOR THE CURIOUS

1. I haven't changed the configure script to test for YAML::XS.

2. I've run make -j2 successfully, but I'm not positive that my changes are 100% correct for parallel make.

3. I don't have ready access to a Windows box with the necessary development environment, so MSVC is certainly broken.

4. Since there are whitespace inconsistencies in the current headers, you need to run this command on the current postgres.bki for it to diff cleanly with mine:

   sed -i 's/_)$/_ )/' src/backend/catalog/postgres.bki

INFO

The project is located at http://git.postgresql.org/gitweb?p=users/jnaylor/bki.git;a=summary

 .gitignore | 141
 src/Makefile | 2
 src/Makefile.global.in | 18
 src/backend/Makefile | 46
 src/backend/catalog/Catalog.pm | 76
 src/backend/catalog/Makefile | 85
 src/backend/catalog/README | 166
 src/backend/catalog/catalog.yaml | 28871
 src/backend/catalog/gen_bki.pl | 493
 src/backend/catalog/gen_descr.pl | 72
 src/backend/catalog/gen_header.pl | 215
 src/backend/catalog/genbki.sh | 438
 src/backend/catalog/unused_oids.pl | 114
 src/backend/utils/Gen_fmgrtab.pl | 194
 src/backend/utils/Gen_fmgrtab.sh | 253
 src/backend/utils/Makefile | 14
 src/backend/utils/cache/relcache.c | 1
 src/backend/utils/gen_fmgr.pl | 189
 src/include/Makefile | 2
 src/include/catalog/duplicate_oids | 27
 src/include/catalog/genbki.h | 40
 src/include/catalog/indexing.h | 7
 src/include/catalog/pg_aggregate.h | 239
 src/include/catalog/pg_aggregate_fn.h | 32
 src/include/catalog/pg_am.h | 125
 src/include/catalog/pg_amop.h | 681
 src/include/catalog/pg_amproc.h | 331
 src/include/catalog/pg_attrdef.h | 56
 src/include/catalog/pg_attribute.h | 516
 src/include/catalog/pg_auth_members.h | 56
 src/include/catalog/pg_authid.h | 97
 src/include/catalog/pg_cast.h | 355
 src/include/catalog/pg_cast_fn.h | 44
 src/include/catalog/pg_class.h | 144
 src/include/catalog/pg_constraint.h | 231
 src/include/catalog/pg_constraint_fn.h | 84
 src/include/catalog/pg_conversion.h | 77
 src/include/catalog/pg_database.h | 77
 src/include/catalog/pg_db_role_setting.h | 67
 src/include/catalog/pg_db_role_setting_fn.h | 29
 src/include/catalog/pg_default_acl.h | 75
 src/include/catalog/pg_depend.h | 90
 src/include/catalog/pg_description.h | 84
 src/include/catalog/pg_enum.h | 66
 src/include/catalog/pg_enum_fn.h | 25
 src/include/catalog/pg_foreign_data_wrapper.h | 62
 src/include/catalog/pg_foreign_server.h | 65
 src/include/catalog/pg_index.h | 91
 src/include/catalog/pg_inherits.h | 59
 src/include/catalog/pg_language.h | 79
 src/include/catalog/pg_largeobject.h | 58
 src/include/catalog/pg_largeobject_fn.h | 21
 src/include/catalog/pg_listener.h | 59
 src/include/catalog/pg_namespace.h | 82
 src/include/catalog/pg_namespace_fn.h | 22
 src/include/catalog/pg_opclass.h | 215
 src/include/catalog/pg_operator.h | 961
 src/include/catalog/pg_operator_fn.h | 31
 src/include/catalog/pg_opfamily.h | 141
 src/include/catalog/pg_pltemplate.h | 77
 src/include/catalog/pg_proc.h | 4749
 src/include/catalog/pg_rewrite.h | 69
 src/include/catalog/pg_shdepend.h | 90
 src/include/catalog/pg_shdescription.h | 75
 src/include/catalog/pg_statistic.h | 259
 src/include/catalog/pg_tablespace.h | 63
 src/include/catalog/pg_trigger.h | 113
 src/include/catalog/pg_trigger_fn.h | 42
 src/include/catalog/pg_ts_config.h | 60
 src/include/catalog/pg_ts_config_map.h | 78
 src/include/catalog/pg_ts_dict.h | 63
 src/include/catalog/pg_ts_parser.h | 67
 src/include/catalog/pg_ts_template.h | 67
 src/include/catalog/pg_type.h | 659
 src/include/catalog/pg_user_mapping.h | 59
 src/include/catalog/toasting.h | 4
 src/include/catalog/unused_oids | 57
 77 files changed, 30720 insertions(+), 12922 deletions(-)

Some code snippets and conventions were borrowed from Robert Haas' earlier efforts.

Feedback is appreciated.

John
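Point 1 of the proposal above -- that the Natts/Anum boilerplate in the pg_foo.h headers is predictable enough to machine-generate -- can be sketched as follows. This is an illustration in Python, not the patch's gen_header.pl (which is Perl), and the real output formatting may differ:

```python
def gen_natts_anum(catname, columns):
    """Emit the Natts/Anum #defines for one catalog from its ordered
    (column name, type) list.  A sketch, not gen_header.pl's actual logic."""
    lines = ["#define Natts_%s %d" % (catname, len(columns))]
    # Attribute numbers are simply 1-based column positions.
    for attnum, (colname, _coltype) in enumerate(columns, start=1):
        lines.append("#define Anum_%s_%s %d" % (catname, colname, attnum))
    return "\n".join(lines)

# The pg_tablespace columns from the YAML example above:
pg_tablespace = [
    ("spcname", "name"),
    ("spcowner", "oid"),
    ("spclocation", "text"),
    ("spcacl", "aclitem[]"),
]
header_text = gen_natts_anum("pg_tablespace", pg_tablespace)
```

Because the column list is the single source of truth, adding a column means editing one place in the description file; the macros (and the struct declaration, by a similar transformation) follow automatically.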
John Naylor escribió:
> Hello everyone,
>
> I was quite intrigued by a discussion that happened this past summer
> regarding generation of bootstrap files such as postgres.bki, and the
> associated pain points of maintaining the DATA() statements in catalog headers.
> It occurred to me that the current system is backwards: Instead of
> generating the
> bootstrap files from hard-coded strings contained in various header files, it
> seems it would be a cleaner design to generate both from a human-readable
> high-level description of the system catalogs.

I had a look at this some time ago and I must admit that I find it pretty interesting. The technology choices make it obviously impossible to merge -- not only the particular Perl modules used, but the mere fact that Perl is used (and that such a recent version is required). But you're already aware of all this, so I'm not going to say more.

As far as the data file is concerned, I think having it all in a single file is a loser. I'd go for a file per catalog. Also, I don't like the fact that the column descriptions are lost because of being in a YAML comment. I think it'd be better if the generated pg_foo.h files had them.

One thing I loved about this is that it's trivial to add a column to pg_proc without having to edit almost every single line of the damn monster file.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Nov 13, 2009 at 8:16 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I had a look at this some time ago and I must admit that I find it
> pretty interesting. The technology choices make it
> obviously impossible to merge -- not only the particular Perl modules
> used, but the mere fact that Perl is used (and that such a recent
> version is required). But you're already aware of all this so I'm not
> going to say more.

Really? Perl is a show-stopper? That seems harsh. Perl is pretty ubiquitous - in fact, given that we now support Windows, arguably more so than awk.

...Robert
On Fri, Nov 13, 2009 at 3:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Nov 13, 2009 at 8:16 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> I had a look at this some time ago and I must admit that I find it
>> pretty interesting. The technology choices make it
>> obviously impossible to merge -- not only the particular Perl modules
>> used, but the mere fact that Perl is used (and that such a recent
>> version is required). But you're already aware of all this so I'm not
>> going to say more.
>
> Really? Perl is a show-stopper? That seems harsh. Perl is pretty
> ubiquitous - in fact, given that we now support Windows, arguably more
> so than awk.

Building in VC++ on Windows already requires Perl. And if you're building in mingw, you've probably already got it, or can get it pretty easily.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
On Fri, Nov 13, 2009 at 11:00 AM, Dave Page <dpage@pgadmin.org> wrote:
> On Fri, Nov 13, 2009 at 3:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Nov 13, 2009 at 8:16 AM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>>> I had a look at this some time ago and I must admit that I find it
>>> pretty interesting. The technology choices make it
>>> obviously impossible to merge -- not only the particular Perl modules
>>> used, but the mere fact that Perl is used (and that such a recent
>>> version is required). But you're already aware of all this so I'm not
>>> going to say more.
>>
>> Really? Perl is a show-stopper? That seems harsh. Perl is pretty
>> ubiquitous - in fact, given that we now support Windows, arguably more
>> so than awk.
>
> Building in VC++ on Windows already requires Perl. And if you're
> building in mingw, you've probably already got it, or can get it
> pretty easily.

Yep, it's only on UNIX-ish systems where Perl isn't necessarily required, and realistically I think it is probably present on nearly all of those, too. We can even include the autogenerated files in make distprep so that it's only required for development, not to compile a release tarball.

...Robert
On Fri, Nov 13, 2009 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Yep, it's only on UNIX-ish systems where Perl isn't necessarily
> required, and realistically I think it is probably present on nearly
> all of those, too.

Exactly.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
Dave Page <dpage@pgadmin.org> writes:
> On Fri, Nov 13, 2009 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yep, it's only on UNIX-ish systems where Perl isn't necessarily
>> required, and realistically I think it is probably present on nearly
>> all of those, too.

> Exactly.

Yeah. Although the project policy is that we don't require Perl to build on Unix, there was a bug in the makefiles that made it effectively required, and nobody noticed for several years. I don't think it would be a hard sell to change that policy if we got a significant benefit out of it. (Depending on non-core Perl modules is a totally different thing though.)

			regards, tom lane
Tom Lane escribió:
> Dave Page <dpage@pgadmin.org> writes:
> > On Fri, Nov 13, 2009 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> Yep, it's only on UNIX-ish systems where Perl isn't necessarily
> >> required, and realistically I think it is probably present on nearly
> >> all of those, too.
>
> > Exactly.
>
> Yeah. Although the project policy is that we don't require Perl to
> build on Unix, there was a bug in the makefiles that made it effectively
> required, and nobody noticed for several years. I don't think it would
> be a hard sell to change that policy if we got a significant benefit out
> of it. (Depending on non-core Perl modules is a totally different thing
> though.)

Well, this is a pretty fortunate turn of events. I had two paragraphs in my original email that I edited out ("... so I'm not going to say more") on how to work around the lack of Perl. If we're all OK now on requiring some basic Perl installation, then all the better. I certainly have no trouble with it.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> Yeah. Although the project policy is that we don't require Perl to
>> build on Unix, there was a bug in the makefiles that made it effectively
>> required, and nobody noticed for several years. I don't think it would
>> be a hard sell to change that policy if we got a significant benefit out
>> of it. (Depending on non-core Perl modules is a totally different thing
>> though.)

> Well, this is a pretty fortunate turn of events. I had two paragraphs
> in my original email that I edited out ("... so I'm not going to say
> more") on how to work around the lack of Perl. If we're all OK now on
> requiring some basic Perl installation then all the better. I certainly
> have no trouble with it.

Although actually, we could still keep that policy if Perl is needed to build .bki files --- we just have to build those files in distprep and ship them as part of tarballs. It's already the case that you need Perl to build from a CVS pull; it's only tarball users who don't need it.

			regards, tom lane
On Fri, Nov 13, 2009 at 2:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Tom Lane escribió:
>>> Yeah. Although the project policy is that we don't require Perl to
>>> build on Unix, there was a bug in the makefiles that made it effectively
>>> required, and nobody noticed for several years. I don't think it would
>>> be a hard sell to change that policy if we got a significant benefit out
>>> of it. (Depending on non-core Perl modules is a totally different thing
>>> though.)
>
>> Well, this is a pretty fortunate turn of events. I had two paragraphs
>> in my original email that I edited out ("... so I'm not going to say
>> more") on how to work around the lack of Perl. If we're all OK now on
>> requiring some basic Perl installation then all the better. I certainly
>> have no trouble with it.
>
> Although actually, we could still keep that policy if Perl is needed to
> build .bki files --- we just have to build those files in distprep and
> ship them as part of tarballs. It's already the case that you need Perl
> to build from a CVS pull, it's only tarball users who don't need it.

I just said the same thing a few hours ago...

...Robert
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Tom Lane escribió:
>>> Yeah. Although the project policy is that we don't require Perl to
>>> build on Unix, there was a bug in the makefiles that made it effectively
>>> required, and nobody noticed for several years. I don't think it would
>>> be a hard sell to change that policy if we got a significant benefit out
>>> of it. (Depending on non-core Perl modules is a totally different thing
>>> though.)
>
>> Well, this is a pretty fortunate turn of events. I had two paragraphs
>> in my original email that I edited out ("... so I'm not going to say
>> more") on how to work around the lack of Perl. If we're all OK now on
>> requiring some basic Perl installation then all the better. I certainly
>> have no trouble with it.
>
> Although actually, we could still keep that policy if Perl is needed to
> build .bki files --- we just have to build those files in distprep and
> ship them as part of tarballs. It's already the case that you need Perl
> to build from a CVS pull, it's only tarball users who don't need it.

That's all true. But maybe it's time to look again at this anyway. Perl is pretty darn ubiquitous these days. And awk-fu is getting rarer and rarer.

cheers

andrew
On Fri, Nov 13, 2009 at 5:16 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I had a look at this some time ago and I must admit that I find it
> pretty interesting. The technology choices make it
> obviously impossible to merge -- not only the particular Perl modules
> used, but the mere fact that Perl is used (and that such a recent
> version is required). But you're already aware of all this so I'm not
> going to say more.

Thanks for taking a look!

One thing I didn't mention is that the scripts that use Catalog.pm were designed to be 5.6-compatible. A different catalog description format that didn't require Perl modules could be swapped in; as long as Catalog.pm exported the same data structures, the logic wouldn't need to change. The concern there would be the cost of developing a format and a parser to go with it. If it didn't make developers' lives significantly easier, it wouldn't be worth it.

I haven't tried, but I think that the current system could be tweaked to generate pg_attribute and schemapg.h without needing any declarations in pg_attribute.h. I'll look into it.

> As far as the data file is concerned, I think having it all in a single
> file is a loser. I'd go for a file per catalog.

YAML aliases can only refer to anchors that precede them in the same document. It does make the file frustratingly large, but human-readable oids make up for it, IMO.

> Also, I don't like the
> fact that the column descriptions are lost because of being in a
> YAML comment. I think it'd be better if the generated pg_foo.h files
> had them.

I considered that, but I wasn't sure the added complexity was worth it.

> One thing I loved about this is that it's trivial to add a column to
> pg_proc and that this not mean that I have to edit almost every single
> line of the damn monster file.

One could conceivably write a script against Catalog.pm that generated DATA() statements to copy and paste into a header file. It might be a useful tool, if only for error checking.
For that to be successful, the YAML file would need to be kept up to date, incurring duplicate maintenance.

John
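The DATA()-generating tool floated above might look something like this sketch. It is written in Python for brevity (the real tool would be Perl built on Catalog.pm), and the quoting rule is a deliberate simplification of what the BKI format actually requires:

```python
def data_statement(columns, defaults, entry):
    """Render one DATA(insert ...) line from a sparse catalog entry plus
    column defaults.  `columns` is the ordered list of column names.
    Hypothetical sketch, not part of the patch."""
    merged = {**defaults, **entry}
    values = []
    for col in columns:
        v = str(merged[col])
        # Simplified quoting: wrap values containing spaces in double quotes.
        values.append('"%s"' % v if " " in v else v)
    if "oid" in entry:
        return "DATA(insert OID = %d ( %s ));" % (entry["oid"], " ".join(values))
    return "DATA(insert ( %s ));" % " ".join(values)

# The pg_tablespace example from upthread, with *PGUID already resolved:
cols = ["spcname", "spcowner", "spclocation", "spcacl"]
defaults = {"spcowner": 10, "spclocation": '""', "spcacl": "_null_"}
row = {"spcname": "pg_default", "oid": 1663}
line = data_statement(cols, defaults, row)
# → 'DATA(insert OID = 1663 ( pg_default 10 "" _null_ ));'
```

Round-tripping the generated lines against the existing headers would double as the error check mentioned above.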