Re: Bootstrap DATA is a pita - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Bootstrap DATA is a pita
Date
Msg-id 3A8EFC37-E483-4FA6-A996-881EC29CBA63@gmail.com
In response to Re: Bootstrap DATA is a pita  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bootstrap DATA is a pita  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> On Dec 11, 2015, at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <hornschnorter@gmail.com> writes:
>>> On Dec 11, 2015, at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Huh?  Those files are the definition of that mapping, no?  Isn't what
>>> you're proposing circular?
>
>> No, there are far more references to Oids than there are definitions of them.
>
> Well, you're still not being very clear, but I *think* what you're
> proposing is to put a lot more smarts into the script that converts
> the master source files into .bki format.  That is, we might have
> "=(int8,int4)" in an entry in the master source file for pg_amop, but
> the script would look up that entry using the source data for pg_type
> and pg_operator, and then emit a simple numeric OID into the .bki file.
> (Presumably, it would know to do this because we'd redefine the
> pg_amop.amopopr column as of regoperator type not plain OID.)
>
> Yeah, that could work, though I'd be a bit concerned about the complexity
> and speed of the script.  Still, one doesn't usually rebuild postgres.bki
> many times a day, so speed might not be a big problem.

I am proposing that each of the catalog headers that currently has DATA
lines instead have a COPY-loadable file that contains the same information.
So, for pg_type.h, there would be a pg_type.dat file.  All the DATA lines
would be pulled out of pg_type.h and a corresponding tab-delimited row
would be written to pg_type.dat.  Henceforth, if you cloned the git repository,
you'd find no DATA lines in pg_type.h, but would find a pg_type.dat file
in the src/include/catalog directory.  Likewise for the other header files.

There would be some script, SQL or Perl or whatever, that would convert
these .dat files into the .bki file.

Now, if we know that pg_type.dat will be processed before pg_proc.dat,
we can replace all the Oids representing datatypes in pg_proc.dat with the
names for those types, given that we already have a name <=> oid
mapping for types.

Likewise, if we know that pg_proc.dat will be processed before pg_operator.dat,
we can specify both functions and datatypes by name rather than by Oid
in that file, making it much easier to read.  By the time pg_operator.dat is
read, pg_type.dat and pg_proc.dat will already have been read and processed,
so there shouldn't be ambiguity.
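
To make the ordering idea concrete, here is a minimal sketch in Python of
what such a script might do.  Everything in it is an assumption made for
illustration: the column layout of the .dat files, the helper names
(read_dat, build_name_map, resolve), and the sample rows and OID values.
It only shows that once pg_type.dat has been read, names in pg_proc.dat
can be swapped for numeric Oids.

```python
import csv
import io

# Hypothetical tab-delimited file contents.  The column layout and the OID
# values are illustrative assumptions, not the real catalog definitions.
PG_TYPE_DAT = "oid\ttypname\n16\tbool\n21\tint2\n"
PG_PROC_DAT = (
    "oid\tproname\tprorettype\tproargtypes\n"
    "64\tint2lt\tbool\tint2 int2\n"
)


def read_dat(text):
    """Parse a tab-delimited .dat file with a header row into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def build_name_map(rows, name_col):
    """Build a name -> oid mapping, refusing duplicate names."""
    name_to_oid = {}
    for row in rows:
        name = row[name_col]
        if name in name_to_oid:
            raise ValueError(f"ambiguous name: {name}")
        name_to_oid[name] = int(row["oid"])
    return name_to_oid


def resolve(name, name_to_oid, kind):
    """Look up a name that must already have been defined."""
    if name not in name_to_oid:
        raise KeyError(f"unknown {kind}: {name}")
    return name_to_oid[name]


# pg_type.dat is processed first, so type names are known by the time
# pg_proc.dat is read.
types = build_name_map(read_dat(PG_TYPE_DAT), "typname")

resolved_procs = []
for row in read_dat(PG_PROC_DAT):
    resolved_procs.append({
        "oid": int(row["oid"]),
        "proname": row["proname"],
        "prorettype": resolve(row["prorettype"], types, "type"),
        "proargtypes": [resolve(t, types, "type")
                        for t in row["proargtypes"].split()],
    })

print(resolved_procs)
```

One caveat on the sketch: function names can be overloaded, so a mapping
for pg_proc would have to be keyed on more than the bare name (for
instance name plus argument types) rather than reuse build_name_map as is.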

By the time pg_amop.dat is processed, the operators, procs, datatypes,
opfamilies and so forth would already be known.  The example I gave
upthread would be easy to parse:

amopfamily   amoplefttype  amoprighttype  amopstrategy  amoppurpose  amopopr  amopmethod  amopsortfamily
integer_ops  int2          int2           1             search       "<"      btree       0
integer_ops  int2          int2           2             search       "<="     btree       0
integer_ops  int2          int2           3             search       "="      btree       0
integer_ops  int2          int2           4             search       ">="     btree       0
integer_ops  int2          int2           5             search       ">"      btree       0
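
As a rough illustration of how little parsing that takes, here is a
Python sketch that turns one such row into numeric values.  The lookup
tables stand in for what would already have been loaded from the earlier
.dat files; their contents, the OID values, and the function name
resolve_amop_line are assumptions for the example.  Operators are keyed
on (name, left type, right type), since the operator name alone is not
unique.

```python
import shlex

# Stand-ins for mappings built from previously processed .dat files.
# All OID values here are illustrative assumptions.
TYPE_OIDS = {"int2": 21}
AM_OIDS = {"btree": 403}
OPFAMILY_OIDS = {("btree", "integer_ops"): 1976}
OPERATOR_OIDS = {
    ("<", "int2", "int2"): 95,
    ("<=", "int2", "int2"): 522,
    ("=", "int2", "int2"): 94,
    (">=", "int2", "int2"): 524,
    (">", "int2", "int2"): 520,
}
# Readable purpose names mapped to single-character codes.
PURPOSE_CODES = {"search": "s", "order": "o"}

COLUMNS = [
    "amopfamily", "amoplefttype", "amoprighttype", "amopstrategy",
    "amoppurpose", "amopopr", "amopmethod", "amopsortfamily",
]


def resolve_amop_line(line):
    """Convert one human-readable pg_amop row into numeric values."""
    # shlex.split strips the double quotes around operator names.
    fields = shlex.split(line)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    left, right = row["amoplefttype"], row["amoprighttype"]
    return (
        OPFAMILY_OIDS[(row["amopmethod"], row["amopfamily"])],
        TYPE_OIDS[left],
        TYPE_OIDS[right],
        int(row["amopstrategy"]),
        PURPOSE_CODES[row["amoppurpose"]],
        OPERATOR_OIDS[(row["amopopr"], left, right)],
        AM_OIDS[row["amopmethod"]],
        int(row["amopsortfamily"]),
    )


resolved = resolve_amop_line('integer_ops int2 int2 1 search "<" btree 0')
print(resolved)
```

An unknown type or operator name raises KeyError in this sketch, which is
roughly the behavior one would want from the real script: fail loudly at
build time rather than emit a bad Oid.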

And if I came along and defined a new datatype, int384, I could add rows to
this file much more easily, as:

amopfamily   amoplefttype  amoprighttype  amopstrategy  amoppurpose  amopopr  amopmethod  amopsortfamily
integer_ops  int384        int384         1             search       "<"      btree       0
integer_ops  int384        int384         2             search       "<="     btree       0
integer_ops  int384        int384         3             search       "="      btree       0
integer_ops  int384        int384         4             search       ">="     btree       0
integer_ops  int384        int384         5             search       ">"      btree       0

I don't see how this creates all that much complication, and I clearly see
how it makes files like pg_operator.{h,dat} and pg_amop.{h,dat} easier to read.


mark



