Bootstrap DATA is a pita - Mailing list pgsql-hackers

From Andres Freund
Subject Bootstrap DATA is a pita
Date
Msg-id 20150220234142.GH12653@awork2.anarazel.de
Whole thread Raw
Responses Re: Bootstrap DATA is a pita  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Hi,

I've been for a long while been rather annoyed about how cumbersome it
is to add catalog rows using the bootstrap format. Especially pg_proc.h,
pg_operator.h, pg_amop.h, pg_amproc.h and some more are really unwieldy.

I think this needs to be improved. And while I'm not going to start
working on it tonight, I do plan to work on it if we can agree on a
design that I think is worth implementing.

The things that bug me most are:

1) When adding new rows it's rather hard to kno which columns are which,  and you have to specify a lot you really
don'tcare about. Especially  in pg_proc that's rather annoying.
 

2) Having to assign oids for many things that don't actually need is  bothersome and greatly increases the likelihood
ofconflicts. There's  some rows for which we need fixed oids (pg_type ones for example),  but e.g. for the majority of
pg_procit's unnecessary.
 

3) Adding a new column to a system catalog, especially pg_proc.h,  basically requires writing a complex regex or
programto modify the  header.
 

Therefore I propose that we add another format to generate the .bki
insert lines.

What I think we should do is to add pg_<catalog>.data files that contain
the actual data that are automatically parsed by Catalog.pm. Those
contain the rows in some to-be-decided format. I was considering using
json, but it turns out only perl 5.14 started shipping JSON::PP as part
of the standard library. So I guess it's best we just make it a big perl
array + hashes.

To address 1) we just need to make each row a hash and allow leaving out
columns that have some default value.

2) is a bit more complex. Generally many rows don't need a fixed oid at
all and many others primarily need it to handle object descriptions. The
latter seems best best solved by not making it dependant on the oid
anymore.

3) Seems primarily solved by not requiring default values to be
specified anymore. Also it should be much easier to add new values
automatically to a parseable format.

I think we'll need to generate oid #defines for some catalog contents,
but that seems solveable.

Maybe something rougly like:

# pg_type.data
CatalogData(   'pg_type',   [    { oid => 2249, data => {typname => 'cstring', typlen => -2, typbyval => 1, fake =>
'...'},oiddefine => 'CSTRINGOID'    }   ]
 
);

# pg_proc.data
CatalogData(   'pg_proc',   [    { oid => 1242, data => {proname => 'boolin', proretttype => 16, proargtypes => [2275],
provolatile=> 'i'}, description => 'I/O',    },    { data => {proname => 'mode_final', proretttype => 2283, proargtypes
=>[2281, 2283]}, description => 'aggregate final function',    }   ]
 
);

There'd need to be some logic to assign default values for columns, and
maybe even simple logic e.g. to determine arguments like pronargs based
on proargtypes.

This is far from fully though through, but I think something very
roughly along these lines could be a remarkable improvement in the ease
of adding new catalog contents.

Comments?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Idea: GSoC - Query Rewrite with Materialized Views
Next
From: Peter Geoghegan
Date:
Subject: Re: INSERT ... ON CONFLICT UPDATE and logical decoding