Thread: Bootstrap DATA is a pita

Bootstrap DATA is a pita

From
Andres Freund
Date:
Hi,

I've for a long while been rather annoyed about how cumbersome it
is to add catalog rows using the bootstrap format. Especially pg_proc.h,
pg_operator.h, pg_amop.h, pg_amproc.h and some more are really unwieldy.

I think this needs to be improved. And while I'm not going to start
working on it tonight, I do plan to work on it if we can agree on a
design that I think is worth implementing.

The things that bug me most are:

1) When adding new rows it's rather hard to know which columns are
which, and you have to specify a lot you really don't care about.
Especially in pg_proc that's rather annoying.

2) Having to assign oids for many things that don't actually need one
is bothersome and greatly increases the likelihood of conflicts. There
are some rows for which we need fixed oids (pg_type ones for example),
but e.g. for the majority of pg_proc it's unnecessary.

3) Adding a new column to a system catalog, especially pg_proc.h,
basically requires writing a complex regex or program to modify the
header.

Therefore I propose that we add another format to generate the .bki
insert lines.

What I think we should do is to add pg_<catalog>.data files that contain
the actual data that are automatically parsed by Catalog.pm. Those
contain the rows in some to-be-decided format. I was considering using
json, but it turns out only perl 5.14 started shipping JSON::PP as part
of the standard library. So I guess it's best we just make it a big perl
array + hashes.

To address 1) we just need to make each row a hash and allow leaving out
columns that have some default value.

2) is a bit more complex. Generally many rows don't need a fixed oid at
all and many others primarily need it to handle object descriptions. The
latter seems best solved by not making it dependent on the oid
anymore.

3) Seems primarily solved by not requiring default values to be
specified anymore. Also it should be much easier to add new values
automatically to a parseable format.

I think we'll need to generate oid #defines for some catalog contents,
but that seems solvable.

Maybe something roughly like:

# pg_type.data
CatalogData(
    'pg_type',
    [
     {
      oid => 2249,
      data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => '...'},
      oiddefine => 'CSTRINGOID'
     }
    ]
);

# pg_proc.data
CatalogData(
    'pg_proc',
    [
     {
      oid => 1242,
      data => {proname => 'boolin', prorettype => 16, proargtypes => [2275],
               provolatile => 'i'},
      description => 'I/O',
     },
     {
      data => {proname => 'mode_final', prorettype => 2283,
               proargtypes => [2281, 2283]},
      description => 'aggregate final function',
     }
    ]
);

There'd need to be some logic to assign default values for columns, and
maybe even simple logic e.g. to determine arguments like pronargs based
on proargtypes.
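
That default-and-derived-values logic might look something like the
following sketch. The column names are real pg_proc columns, but the
default values and the helper name are made up for illustration:

```perl
use strict;
use warnings;

# Sketch only: merge per-column defaults into a sparse row and derive
# pronargs from proargtypes. These default values are assumptions, not
# the real catalog defaults.
my %pg_proc_defaults = (
    provolatile  => 'i',
    proisstrict  => 't',
    pronamespace => 'PGNSP',
);

sub fill_row {
    my ($row) = @_;
    my %full = (%pg_proc_defaults, %$row);    # row values override defaults
    # A derived column: the argument count follows from the type list.
    $full{pronargs} = scalar @{ $full{proargtypes} // [] };
    return \%full;
}

my $filled = fill_row({ proname => 'boolin', proargtypes => [2275] });
print "$filled->{proname}: $filled->{pronargs} arg(s)\n";
```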

This is far from fully thought through, but I think something very
roughly along these lines could be a remarkable improvement in the ease
of adding new catalog contents.

Comments?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Josh Berkus
Date:
On 02/20/2015 03:41 PM, Andres Freund wrote:
> What I think we should do is to add pg_<catalog>.data files that contain
> the actual data that are automatically parsed by Catalog.pm. Those
> contain the rows in some to-be-decided format. I was considering using
> json, but it turns out only perl 5.14 started shipping JSON::PP as part
> of the standard library. So I guess it's best we just make it a big perl
> array + hashes.

What about YAML?  That might have been added somewhat earlier.

Or what about just doing CSV?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Bootstrap DATA is a pita

From
Peter Eisentraut
Date:
On 2/20/15 8:46 PM, Josh Berkus wrote:
> What about YAML?  That might have been added somewhat earlier.

YAML isn't included in Perl, but there is

Module::Build::YAML - Provides just enough YAML support so that
Module::Build works even if YAML.pm is not installed

which might work.

> Or what about just doing CSV?

I don't think that would actually address the problems.  It would just
be the same format as now with different delimiters.




Re: Bootstrap DATA is a pita

From
Peter Eisentraut
Date:
I violently support this proposal.

> Maybe something roughly like:
> 
> # pg_type.data
> CatalogData(
>     'pg_type',
>     [
>      {
>      oid => 2249,
>      data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => '...'},
>      oiddefine => 'CSTRINGOID'
>      }
>     ]
> );

One concern I have with this is that in my experience different tools
and editors have vastly different ideas on how to format these kinds of
nested structures.  I'd try out YAML, or even a homemade fake YAML over
this.




Re: Bootstrap DATA is a pita

From
Petr Jelinek
Date:
On 21/02/15 04:22, Peter Eisentraut wrote:
> I violently support this proposal.
>
>> Maybe something roughly like:
>>
>> # pg_type.data
>> CatalogData(
>>      'pg_type',
>>      [
>>       {
>>      oid => 2249,
>>      data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => '...'},
>>      oiddefine => 'CSTRINGOID'
>>       }
>>      ]
>> );
>
> One concern I have with this is that in my experience different tools
> and editors have vastly different ideas on how to format these kinds of
> nested structures.  I'd try out YAML, or even a homemade fake YAML over
> this.
>

+1 for the idea and +1 for YAML(-like) syntax.


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
> On 2/20/15 8:46 PM, Josh Berkus wrote:
> > What about YAML?  That might have been added somewhat earlier.
>
> YAML isn't included in Perl, but there is
>
> Module::Build::YAML - Provides just enough YAML support so that
> Module::Build works even if YAML.pm is not installed

I'm afraid not:

sub Load {
    shift if ($_[0] eq __PACKAGE__ || ref($_[0]) eq __PACKAGE__);
    die "not yet implemented";
}

> > Or what about just doing CSV?
>
> I don't think that would actually address the problems.  It would just
> be the same format as now with different delimiters.

Yea, we need hierarchies and named keys.

> One concern I have with this is that in my experience different tools
> and editors have vastly different ideas on how to format these kinds of
> nested structures.  I'd try out YAML, or even a homemade fake YAML over
> this.

Yes, that's a good point. I have zero desire to open-code a format
though, I think that's a bad idea. We could say we just include
YAML::Tiny, that's what it's made for.

To allow for changing things programmatically without noise I was
wondering whether we shouldn't just load/dump the file at some point of
the build process. Then we're sure the indentation is correct and it can
be changed programmatically without requiring manual fixup of comments.
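
A load/dump pass of that sort could be sketched with core Data::Dumper;
the settings and the sample row below are assumptions, not an agreed
format:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sketch: read the data file as a Perl structure and write
# it back with fixed Data::Dumper settings, so manual and programmatic
# edits converge on one canonical layout.
sub canonical_dump {
    my ($data) = @_;
    local $Data::Dumper::Sortkeys = 1;    # deterministic key order
    local $Data::Dumper::Indent   = 1;    # fixed indentation style
    local $Data::Dumper::Terse    = 1;    # emit just the structure
    return Dumper($data);
}

my $sample = [ { oid => 2249, data => { typname => 'cstring', typlen => -2 } } ];
my $text   = canonical_dump($sample);
print $text;
```

Since the dump is itself valid Perl, eval-ing it and dumping again
yields byte-identical text, which is what makes the round trip safe.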

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Andrew Dunstan
Date:
On 02/21/2015 05:04 AM, Andres Freund wrote:

> Yes, that's a good point. I have zero desire to open-code a format
> though, I think that's a bad idea. We could say we just include
> Yaml::Tiny, that's what it's made for.
>


Personally, I think I would prefer that we use JSON (and yes, there's a 
JSON::Tiny module, which definitely lives up to its name).

For one thing, we've made a feature of supporting JSON, so arguably we 
should eat the same dog food.

I also dislike YAML's line oriented format. I'd like to be able to add a 
pg_proc entry in a handful of lines instead of 29 or more (pg_proc has 
27 attributes, but some of them are arrays, and there's an oid and in 
most cases a description to add as well). We could reduce that number by 
defaulting some of the attributes (pronamespace, proowner and prolang, 
for example) and possibly inferring others (pronargs?). Even so it's 
going to take up lots of lines of vertical screen real estate. A JSON 
format could be more vertically compact. The price for that is that JSON 
strings have to be quoted, which I know lots of people hate.

cheers

andrew



Re: Bootstrap DATA is a pita

From
Andrew Dunstan
Date:
On 02/21/2015 09:39 AM, Andrew Dunstan wrote:
>
> On 02/21/2015 05:04 AM, Andres Freund wrote:
>
>> Yes, that's a good point. I have zero desire to open-code a format
>> though, I think that's a bad idea. We could say we just include
>> Yaml::Tiny, that's what it's made for.
>>
>
>
> Personally, I think I would prefer that we use JSON (and yes, there's 
> a JSON::Tiny module, which definitely lives up to its name).
>
> For one thing, we've made a feature of supporting JSON, so arguably we 
> should eat the same dog food.
>
> I also dislike YAML's line oriented format. I'd like to be able to add 
> a pg_proc entry in a handful of lines instead of 29 or more (pg_proc 
> has 27 attributes, but some of them are arrays, and there's an oid and 
> in most cases a description to add as well). We could reduce that 
> number by defaulting some of the attributes (pronamespace, proowner 
> and prolang, for example) and possibly inferring others (pronargs?). 
> Even so it's going to take up lots of lines of vertical screen real 
> estate. A JSON format could be more vertically compact. The price for 
> that is that JSON strings have to be quoted, which I know lots of 
> people hate.
>
>

Followup:

The YAML spec does support explicit flows like JSON, which would 
overcome my objections above, but unfortunately these are not supported 
by YAML::Tiny.


cheers

andrew




Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
>> On 2/20/15 8:46 PM, Josh Berkus wrote:
>>> Or what about just doing CSV?

>> I don't think that would actually address the problems.  It would just
>> be the same format as now with different delimiters.

> Yea, we need hierarchies and named keys.

Yeah.  One thought though is that I don't think we need the "data" layer
in your proposal; that is, I'd flatten the representation to something
more like
    {
        oid => 2249,
        oiddefine => 'CSTRINGOID',
        typname => 'cstring',
        typlen => -2,
        typbyval => 1,
        ...
    }

This will be easier to edit, either manually or programmatically I think.
The code that turns it into a .bki file will need to know the exact set
of columns in each system catalog, but it would have had to know that
anyway I believe, if you're expecting it to insert default values.

Ideally the column defaults could come from BKI_ macros in the catalog/*.h
files; it would be good if we could keep those files as the One Source of
Truth for catalog schema info, even as we split out the initial data.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-02-21 11:34:09 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
> >> On 2/20/15 8:46 PM, Josh Berkus wrote:
> >>> Or what about just doing CSV?
> 
> >> I don't think that would actually address the problems.  It would just
> >> be the same format as now with different delimiters.
> 
> > Yea, we need hierarchies and named keys.
> 
> Yeah.  One thought though is that I don't think we need the "data" layer
> in your proposal; that is, I'd flatten the representation to something
> more like
> 
>      {
>      oid => 2249,
>      oiddefine => 'CSTRINGOID',
>      typname => 'cstring',
>          typlen => -2,
>          typbyval => 1,
>          ...
>      }

I don't really like that - then stuff like oid, description, comment (?)
have to not conflict with any catalog columns. I think it's easier to
have them separate.

> This will be easier to edit, either manually or programmatically I think.
> The code that turns it into a .bki file will need to know the exact set
> of columns in each system catalog, but it would have had to know that
> anyway I believe, if you're expecting it to insert default values.

There'll need to be some awareness of columns, sure. But I think
programmatically editing the values will still be simpler if you don't
need to discern whether a key is a column or some genbki specific value.

> Ideally the column defaults could come from BKI_ macros in the catalog/*.h
> files; it would be good if we could keep those files as the One Source of
> Truth for catalog schema info, even as we split out the initial data.

Hm, yea.

One thing I was considering was to do the regtype and regproc lookups
directly in the tool. That'd have two advantages: 1) it'd make it
possible to refer to typenames in pg_proc, 2) It'd be much faster. Right
now most of initdb's time is doing syscache lookups during bootstrap,
because it can't use indexes... A simple hash lookup during bki
generation could lead to quite measurable savings during lookup.

We could then even rip the bootstrap code out of regtypein/regprocin...
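
The generation-time lookup could be sketched like this; the hash
contents and function name are illustrative assumptions, not the actual
patch:

```perl
use strict;
use warnings;

# Sketch: during .bki generation, resolve regproc values through a hash
# built once from the pg_proc data, instead of bootstrap-time syscache
# scans. Entries here are illustrative only.
my %proc_oid_by_name = (
    boolin  => 1242,
    boolout => 1243,
);

sub resolve_regproc {
    my ($value) = @_;
    return $value if $value =~ /^\d+$/;    # already a numeric oid
    return $proc_oid_by_name{$value}
        // die "unknown function name: $value\n";
}

print resolve_regproc('boolin'), "\n";
```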

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 02/21/2015 09:39 AM, Andrew Dunstan wrote:
>> Personally, I think I would prefer that we use JSON (and yes, there's 
>> a JSON::Tiny module, which definitely lives up to its name).
>> For one thing, we've made a feature of supporting JSON, so arguably we 
>> should eat the same dog food.

We've also made a feature of supporting XML, and a lot earlier, so that
argument seems pretty weak.

My only real requirement on the format choice is that it should absolutely
not require any Perl module that's not in a bog-standard installation.
I've gotten the buildfarm code running on several ancient machines now and
in most cases getting the module dependencies dealt with was pure hell.
No non-core modules for a basic build please.  I don't care whether they
are "tiny".
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Andrew Dunstan
Date:
On 02/21/2015 11:43 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 02/21/2015 09:39 AM, Andrew Dunstan wrote:
>>> Personally, I think I would prefer that we use JSON (and yes, there's
>>> a JSON::Tiny module, which definitely lives up to its name).
>>> For one thing, we've made a feature of supporting JSON, so arguably we
>>> should eat the same dog food.
> We've also made a feature of supporting XML, and a lot earlier, so that
> argument seems pretty weak.


Fair enough

>
> My only real requirement on the format choice is that it should absolutely
> not require any Perl module that's not in a bog-standard installation.
> I've gotten the buildfarm code running on several ancient machines now and
> in most cases getting the module dependencies dealt with was pure hell.
> No non-core modules for a basic build please.  I don't care whether they
> are "tiny".
>
>             

The point about using the "tiny" modules is that they are so small and 
self-contained they can either be reasonably shipped with our code or 
embedded directly in the script that uses them, so no extra build 
dependency would be created.

However, I rather like your suggestion of this:

>       {
>      oid => 2249,
>      oiddefine => 'CSTRINGOID',
>      typname => 'cstring',
>           typlen => -2,
>           typbyval => 1,
>           ...
>       }


which is pure perl syntax and wouldn't need any extra module, and has 
the advantage over JSON that key names won't need to be quoted, making 
it more readable.


cheers

andrew



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On February 21, 2015 7:20:04 PM CET, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>On 02/21/2015 11:43 AM, Tom Lane wrote:
>
>>       {
>>      oid => 2249,
>>      oiddefine => 'CSTRINGOID',
>>      typname => 'cstring',
>>           typlen => -2,
>>           typbyval => 1,
>>           ...
>>       }
>
>
>which is pure perl syntax and wouldn't need any extra module, and has 
>the advantage over JSON that key names won't need to be quoted, making 
>it more readable.

Yea, my original post suggested using actual perl hashes to avoid
problems with the availability of libraries. So far I've not really
heard a convincing alternative. Peter's problem with formatting seems to
be most easily solved by rewriting the file automatically...

Andres


-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-02-21 17:43:09 +0100, Andres Freund wrote:
> One thing I was considering was to do the regtype and regproc lookups
> directly in the tool. That'd have two advantages: 1) it'd make it
> possible to refer to typenames in pg_proc, 2) It'd be much faster. Right
> now most of initdb's time is doing syscache lookups during bootstrap,
> because it can't use indexes... A simple hash lookup during bki
> generation could lead to quite measurable savings during lookup.

I've *very* quickly hacked this up. Doing this for all regproc columns
gives a consistent speedup in an assert-enabled build from ~0m3.589s to
~0m2.544s. My guess is that the relative speedup in optimized mode would
actually be even bigger as now most of the time is spent in
AtEOXact_CatCache.

Given that pg_proc is unlikely to get any smaller and that the current
code is essentially O(lookups * #pg_proc), this alone seems to be worth
a good bit.

The same trick should also allow us to simply refer to type names in
pg_proc et al. If we had a way to denote a column being of type
relnamespace/relauthid we could replace
    $row->{bki_values} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
    $row->{bki_values} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
as well.

The changes in pg_proc.h are just to demonstrate that using names
instead of oids works.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Bootstrap DATA is a pita

From
Greg Stark
Date:
On Sat, Feb 21, 2015 at 11:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> The changes in pg_proc.h are just to demonstrate that using names
> instead of oids works.

Fwiw I always thought it was strange how much of our bootstrap was
done in a large static text file. Very little of it is actually needed
for bootstrapping and we could get by with a very small set followed
by a bootstrap script written in standard SQL, not unlike how the
system views are created. It's much easier to type CREATE OPERATOR and
CREATE OPERATOR CLASS with all the symbolic names instead of having to
fill in the table.

-- 
greg



Re: Bootstrap DATA is a pita

From
Robert Haas
Date:
On Sat, Feb 21, 2015 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
>>> On 2/20/15 8:46 PM, Josh Berkus wrote:
>>>> Or what about just doing CSV?
>
>>> I don't think that would actually address the problems.  It would just
>>> be the same format as now with different delimiters.
>
>> Yea, we need hierarchies and named keys.
>
> Yeah.  One thought though is that I don't think we need the "data" layer
> in your proposal; that is, I'd flatten the representation to something
> more like
>
>      {
>          oid => 2249,
>          oiddefine => 'CSTRINGOID',
>          typname => 'cstring',
>          typlen => -2,
>          typbyval => 1,
>          ...
>      }

Even this promises to vastly increase the number of lines in the file,
and make it harder to compare entries by grepping out some common
substring.  I agree that the current format is a pain in the tail, but
pg_proc.h is >5k lines already.  I don't want it to be 100k lines
instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-03-03 21:49:21 -0500, Robert Haas wrote:
> On Sat, Feb 21, 2015 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> >> On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
> >>> On 2/20/15 8:46 PM, Josh Berkus wrote:
> >>>> Or what about just doing CSV?
> >
> >>> I don't think that would actually address the problems.  It would just
> >>> be the same format as now with different delimiters.
> >
> >> Yea, we need hierarchies and named keys.
> >
> > Yeah.  One thought though is that I don't think we need the "data" layer
> > in your proposal; that is, I'd flatten the representation to something
> > more like
> >
> >      {
> >          oid => 2249,
> >          oiddefine => 'CSTRINGOID',
> >          typname => 'cstring',
> >          typlen => -2,
> >          typbyval => 1,
> >          ...
> >      }
> 
> Even this promises to vastly increase the number of lines in the file,
> and make it harder to compare entries by grepping out some common
> substring.  I agree that the current format is a pain in the tail, but
> pg_proc.h is >5k lines already.  I don't want it to be 100k lines
> instead.

Do you have a better suggestion? Sure it'll be a long file, but it still
seems vastly superior to what we have now.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Robert Haas
Date:
>> Even this promises to vastly increase the number of lines in the file,
>> and make it harder to compare entries by grepping out some common
>> substring.  I agree that the current format is a pain in the tail, but
>> pg_proc.h is >5k lines already.  I don't want it to be 100k lines
>> instead.
>
> Do you have a better suggestion? Sure it'll be a long file, but it still
> seems vastly superior to what we have now.

Not really.  What had occurred to me is to try to improve the format
of the DATA lines (e.g. by allowing names to be used instead of OIDs)
but that wouldn't allow defaulted fields to be omitted, which is
certainly a big win.  I wonder whether some home-grown single-line
format might be better than using a pre-existing format, but I'm not
too sure it would.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-03-04 08:47:44 -0500, Robert Haas wrote:
> >> Even this promises to vastly increase the number of lines in the file,
> >> and make it harder to compare entries by grepping out some common
> >> substring.  I agree that the current format is a pain in the tail, but
> >> pg_proc.h is >5k lines already.  I don't want it to be 100k lines
> >> instead.
> >
> > Do you have a better suggestion? Sure it'll be a long file, but it still
> > seems vastly superiour to what we have now.
> 
> Not really.  What had occurred to me is to try to improve the format
> of the DATA lines (e.g. by allowing names to be used instead of OIDs)

That's a separate patch so far, so if we decide to only want that we can
do it.

> but that wouldn't allow defaulted fields to be omitted, which is
> certainly a big win.  I wonder whether some home-grown single-line
> format might be better than using a pre-existing format, but I'm not
> too sure it would.

I can't see readability of anything being good unless the column names
are there - we just have too many columns in some of the tables. I think
having more lines is an acceptable price to pay. We can easily start to
split the files at some point if we want, that'd just be a couple lines
of code.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Peter Eisentraut
Date:
On 3/3/15 9:49 PM, Robert Haas wrote:
>> Yeah.  One thought though is that I don't think we need the "data" layer
>> in your proposal; that is, I'd flatten the representation to something
>> more like
>>
>>      {
>>          oid => 2249,
>>          oiddefine => 'CSTRINGOID',
>>          typname => 'cstring',
>>          typlen => -2,
>>          typbyval => 1,
>>          ...
>>      }
> 
> Even this promises to vastly increase the number of lines in the file,

I think lines are cheap.  Columns are much harder to deal with.

> and make it harder to compare entries by grepping out some common
> substring.

Could you give an example of the sort of thing you wish to do?




Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On 3/3/15 9:49 PM, Robert Haas wrote:
>> Even this promises to vastly increase the number of lines in the file,

> I think lines are cheap.  Columns are much harder to deal with.

Yeah.  pg_proc.h is already impossible to work with in a standard
80-column window.  I don't want to find that the lines mostly wrap even
when I expand my editor window to full screen width, but that is certainly
what will happen if we adopt column labelling *and* insist that entries
remain all on one line.  (As a data point, the maximum usable Emacs window
width on my Mac laptop seems to be about 230 characters.)

It's possible that gaining the ability to depend on per-column defaults
would reduce the typical number of fields so much that pg_proc.h entries
would still fit on a line of 100-some characters ... but I'd want to see
proof before assuming that.  And pg_proc isn't even our widest catalog.
Some of the ones that are wider, like pg_am, don't seem like there would
be any scope whatsoever for saving space with per-column defaults.

So while I can see the attraction of trying to fit things on one line,
I doubt it's gonna work very well.  I'd rather go over to a
one-value-per-line format and live with lots of lines.

>> and make it harder to compare entries by grepping out some common
>> substring.

> Could you give an example of the sort of thing you wish to do?

On that angle, I'm dubious that a format that allows omission of fields is
going to be easy for editing scripts to modify, no matter what the layout
convention is.  I've found it relatively easy to write sed or even Emacs
macros to add new column values to old-school pg_proc.h ... but in this
brave new world, I'm going to be really hoping that the column default
works for 99.9% of pg_proc entries when we add a new pg_proc column,
because slipping a value into a desired position is gonna be hard for
a script when you don't know whether the adjacent existing fields are
present or not.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Robert Haas
Date:
On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> and make it harder to compare entries by grepping out some common
>> substring.
>
> Could you give an example of the sort of thing you wish to do?

e.g. grep for a function name and check that all the matches have the
same volatility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bootstrap DATA is a pita

From
Robert Haas
Date:
On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> and make it harder to compare entries by grepping out some common
>>> substring.
>
>> Could you give an example of the sort of thing you wish to do?
>
> On that angle, I'm dubious that a format that allows omission of fields is
> going to be easy for editing scripts to modify, no matter what the layout
> convention is.  I've found it relatively easy to write sed or even Emacs
> macros to add new column values to old-school pg_proc.h ... but in this
> brave new world, I'm going to be really hoping that the column default
> works for 99.9% of pg_proc entries when we add a new pg_proc column,
> because slipping a value into a desired position is gonna be hard for
> a script when you don't know whether the adjacent existing fields are
> present or not.

I wonder if we should have a tool in our repository to help people
edit the file.  So instead of going in there yourself and changing
things by hand, or writing your own script, you can do:

updatepgproc.pl --oid 5678 provolatile=v

or

updatepgpproc.pl --name='.*xact.*' prowhatever=someval

Regardless of what format we end up with, that seems like it would
make things easier.
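
A sketch of such a helper's core, in the hash-per-row format discussed
upthread (updatepgproc.pl is a proposal, not an existing tool; matching
rules and structure below are assumptions):

```perl
use strict;
use warnings;

# Hypothetical: apply a column=value assignment to rows matched by oid
# or by a regex on proname, mirroring the command-line examples above.
sub update_rows {
    my ($rows, $match, $assign) = @_;
    my ($col, $val) = split /=/, $assign, 2;
    my $updated = 0;
    for my $row (@$rows) {
        my $hit = (defined $match->{oid} && defined $row->{oid}
                       && $row->{oid} == $match->{oid})
               || (defined $match->{name}
                       && $row->{data}{proname} =~ /$match->{name}/);
        next unless $hit;
        $row->{data}{$col} = $val;    # no positional columns to worry about
        $updated++;
    }
    return $updated;
}

my $rows = [
    { oid => 5678, data => { proname => 'foo_xact', provolatile => 'i' } },
    { data => { proname => 'bar' } },
];
update_rows($rows, { oid => 5678 }, 'provolatile=v');
print "$rows->[0]{data}{provolatile}\n";
```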

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> Could you give an example of the sort of thing you wish to do?

> e.g. grep for a function name and check that all the matches have the
> same volatility.

Well, grep is not going to work too well anymore, but extracting a
specific field from an entry is going to be beyond the competence of
simple grep/sed tools anyway if we allow column default substitutions.

I think a fairer question is "can you do that in a one-liner Perl script",
which seems like it might be achievable given an appropriate choice of
data markup language.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Andrew Dunstan
Date:
On 03/04/2015 09:42 AM, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On 3/3/15 9:49 PM, Robert Haas wrote:
>>> Even this promises to vastly increase the number of lines in the file,
>> I think lines are cheap.  Columns are much harder to deal with.
> Yeah.  pg_proc.h is already impossible to work with in a standard
> 80-column window.  I don't want to find that the lines mostly wrap even
> when I expand my editor window to full screen width, but that is certainly
> what will happen if we adopt column labelling *and* insist that entries
> remain all on one line.  (As a data point, the maximum usable Emacs window
> width on my Mac laptop seems to be about 230 characters.)
>
> It's possible that gaining the ability to depend on per-column defaults
> would reduce the typical number of fields so much that pg_proc.h entries
> would still fit on a line of 100-some characters ... but I'd want to see
> proof before assuming that.  And pg_proc isn't even our widest catalog.
> Some of the ones that are wider, like pg_am, don't seem like there would
> be any scope whatsoever for saving space with per-column defaults.
>
> So while I can see the attraction of trying to fit things on one line,
> I doubt it's gonna work very well.  I'd rather go over to a
> one-value-per-line format and live with lots of lines.
>
>

Is it necessarily an all or nothing deal?

Taking a previous example, we could have something like:
    {
        oid => 2249, oiddefine => 'CSTRINGOID', typname => 'cstring',
        typlen => -2, typbyval => 1,
        ...
    }

which would allow us to fit within a reasonable edit window (for my 
normal window and font that's around 180 characters) and still reduce 
the number of lines.

I'm not wedded to it, but it's a thought.

cheers

andrew



Re: Bootstrap DATA is a pita

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> and make it harder to compare entries by grepping out some common
> >>> substring.
> >
> >> Could you give an example of the sort of thing you wish to do?
> >
> > On that angle, I'm dubious that a format that allows omission of fields is
> > going to be easy for editing scripts to modify, no matter what the layout
> > convention is.  I've found it relatively easy to write sed or even Emacs
> > macros to add new column values to old-school pg_proc.h ... but in this
> > brave new world, I'm going to be really hoping that the column default
> > works for 99.9% of pg_proc entries when we add a new pg_proc column,
> > because slipping a value into a desired position is gonna be hard for
> > a script when you don't know whether the adjacent existing fields are
> > present or not.
>
> I wonder if we should have a tool in our repository to help people
> edit the file.  So instead of going in there yourself and changing
> things by hand, or writing your own script, you can do:
>
> updatepgproc.pl --oid 5678 provolatile=v
>
> or
>
> updatepgpproc.pl --name='.*xact.*' prowhatever=someval
>
> Regardless of what format we end up with, that seems like it would
> make things easier.

Alright, I'll bite on this- we have this really neat tool for editing
data in bulk, or individual values, or pulling out data to look at based
on particular values or even functions...  It's called PostgreSQL.

What if we had an easy way to export an existing table into whatever
format we decide to use for initdb to use?  For that matter, what if
that file was simple to import into PG?

What about having a way to load all the catalog tables from their git
repo files into a "pg_dev" schema?  Maybe even include a make target or
initdb option which does that?  (the point here being to provide a way
to modify the tables and compare the results to the existing tables
without breaking the instance one is using for this)

I have to admit that I've never tried to do that with the existing
format, but seems like an interesting idea to consider.  I further
wonder if it'd be possible to generate the table structures too..
Thanks!
    Stephen

Re: Bootstrap DATA is a pita

From
Andrew Dunstan
Date:
On 03/04/2015 09:51 AM, Robert Haas wrote:
> On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> and make it harder to compare entries by grepping out some common
>>> substring.
>> Could you give an example of the sort of thing you wish to do?
> e.g. grep for a function name and check that all the matches have the
> same volatility.
>

I think grep will be the wrong tool for this format, but if we're 
settling on a perl format, a few perl one-liners should be able to work 
pretty well. It might be worth shipping a small perl module that 
provided some functions, or a script doing common tasks (or both).

cheers

andrew



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-03-04 09:55:01 -0500, Robert Haas wrote:
> On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder if we should have a tool in our repository to help people
> edit the file.  So instead of going in there yourself and changing
> things by hand, or writing your own script, you can do:
> 
> updatepgproc.pl --oid 5678 provolatile=v
> 
> or
> 
> updatepgpproc.pl --name='.*xact.*' prowhatever=someval
> 
> Regardless of what format we end up with, that seems like it would
> make things easier.

The code I've started to work on basically allows loading the data that
Catalog.pm provides (in an extended format), editing it in memory, and
then serializing it again. So such a tool could relatively easily be
added if somebody wants to do so.  I sure hope, though, that the need for
it will become drastically lower with the new format.
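[Editor's note: updatepgproc.pl does not exist; as a hedged sketch of its core, a perl one-liner against an assumed one-entry-per-line layout could look like this. File format and field names are illustrative.]

```shell
# Hypothetical data file in an assumed one-entry-per-line format.
cat > pg_proc.data <<'EOF'
{ oid => 5678, proname => 'foo', provolatile => 'i' },
{ oid => 5679, proname => 'bar', provolatile => 'i' },
EOF

# Roughly "updatepgproc.pl --oid 5678 provolatile=v" as a one-liner:
perl -i -pe "s/provolatile => 'i'/provolatile => 'v'/ if /oid => 5678\b/" pg_proc.data

grep "oid => 5678" pg_proc.data   # now shows provolatile => 'v'
```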

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Robert Haas
Date:
On Wed, Mar 4, 2015 at 10:04 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Is it necessarily an all or nothing deal?
>
> Taking a previous example, we could have something like:
>
>     {
>          oid => 2249,  oiddefine => 'CSTRINGOID',  typname => 'cstring',
>          typlen => -2, typbyval => 1,
>          ...
>     }
>
> which would allow us to fit within a reasonable edit window (for my normal
> window and font that's around 180 characters) and still reduce the number of
> lines.
>
> I'm not wedded to it, but it's a thought.

Another advantage of this is that it would probably make git less
likely to fumble a rebase.  If there are lots of places in the file
where we have the same 10 lines in a row with occasional variations,
rebasing a patch could easily pick the the wrong place to reapply the
hunk.  I would personally consider a substantial increase in the rate
of such occurrences as being a cure far, far worse than the disease.
If you keep the entry for each function on just a couple of lines the
chances of this happening are greatly reduced, because you're much less
likely to get a false match to surrounding context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Another advantage of this is that it would probably make git less
> likely to fumble a rebase.  If there are lots of places in the file
> where we have the same 10 lines in a row with occasional variations,
> rebasing a patch could easily pick the the wrong place to reapply the
> hunk.

That is a really, really good point.

I had been thinking it was a disadvantage of Andrew's proposal that
line breaks would tend to fall in inconsistent places from one entry
to another ... but from this perspective, maybe that's not such a
bad thing.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:
> 
> On 03/04/2015 09:51 AM, Robert Haas wrote:
> >On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> >>>and make it harder to compare entries by grepping out some common
> >>>substring.
> >>Could you give an example of the sort of thing you wish to do?
> >e.g. grep for a function name and check that all the matches have the
> >same volatility.
> 
> I think grep will be the wrong tool for this format, but if we're settling
> on a perl format, a few perl one-liners should be able to work pretty well.
> It might be worth shipping a small perl module that provided some functions,
> or a script doing common tasks (or both).

I was going to say the same thing.  We need to make sure that the output
format of those oneliners is consistent, though -- it wouldn't be nice
if adding one column with nondefault value to a dozen of entries changes
the formatting of other entries.  For example, perhaps declare that the
order of entries is alphabetical or it matches something declared at the
start of the file.

From that POV, I don't like the idea of having multiple columns for a
single entry in a single line; adding more columns means that eventually
we're going to split lines that have become too long in a different
place, which would reformat the whole file; not very nice.  But maybe
this doesn't matter if we decree that changing the column split is a
manual chore rather than automatic, because then it can be done in a
separate mechanical commit after the extra column is added.

BTW one solution to the merge problem is to have unique separators for
each entry.  For instance, instead of

}        -- this is the end of the previous entry
,
{
        oid = 2233,
        proname = array_append,

we could have

} # array_prepend 2232
,
{ # array_append 2233
        oid = 2233,
        proname = array_append,

where the funcname-oid comment is there to avoid busted merges.  The
automatic editing tools make sure that those markers are always present.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bootstrap DATA is a pita

From
Robert Haas
Date:
On Wed, Mar 4, 2015 at 2:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Andrew Dunstan wrote:
>> On 03/04/2015 09:51 AM, Robert Haas wrote:
>> >On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> >>>and make it harder to compare entries by grepping out some common
>> >>>substring.
>> >>Could you give an example of the sort of thing you wish to do?
>> >e.g. grep for a function name and check that all the matches have the
>> >same volatility.
>>
>> I think grep will be the wrong tool for this format, but if we're settling
>> on a perl format, a few perl one-liners should be able to work pretty well.
>> It might be worth shipping a small perl module that provided some functions,
>> or a script doing common tasks (or both).
>
> I was going to say the same thing.  We need to make sure that the output
> format of those oneliners is consistent, though -- it wouldn't be nice
> if adding one column with nondefault value to a dozen of entries changes
> the formatting of other entries.  For example, perhaps declare that the
> order of entries is alphabetical or it matches something declared at the
> start of the file.
>
> From that POV, I don't like the idea of having multiple columns for a
> single entry in a single line; adding more columns means that eventually
> we're going to split lines that have become too long in a different
> place, which would reformat the whole file; not very nice.  But maybe
> this doesn't matter if we decree that changing the column split is a
> manual chore rather than automatic, because then it can be done in a
> separate mechanical commit after the extra column is added.
>
> BTW one solution to the merge problem is to have unique separators for
> each entry.  For instance, instead of
>
> }               -- this is the end of the previous entry
> ,
> {
>         oid = 2233,
>         proname = array_append,
>
> we could have
> } # array_prepend 2232
> ,
> { # array_append 2233
>         oid = 2233,
>         proname = array_append,
>
> where the funcname-oid comment is there to avoid busted merges.  The
> automatic editing tools make sure that those markers are always present.

Speaking from entirely too much experience, that's not nearly enough.
git only needs 3 lines of context to apply a hunk with no qualms at
all, and it'll shade that to just 1 or 2 with little fanfare.  If your
pg_proc entries are each 20 lines long, this sort of thing will
provide little protection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bootstrap DATA is a pita

From
Peter Eisentraut
Date:
On 3/4/15 9:51 AM, Robert Haas wrote:
> On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> and make it harder to compare entries by grepping out some common
>>> substring.
>>
>> Could you give an example of the sort of thing you wish to do?
> 
> e.g. grep for a function name and check that all the matches have the
> same volatility.

You could still do that with grep -A or something like that.  I think
it would be easier than it is now.




Re: Bootstrap DATA is a pita

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Wed, Mar 4, 2015 at 2:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> > BTW one solution to the merge problem is to have unique separators for
> > each entry.  For instance, instead of

> Speaking from entirely too much experience, that's not nearly enough.
> git only needs 3 lines of context to apply a hunk with no qualms at
> all, and it'll shade that to just 1 or 2 with little fanfare.  If your
> pg_proc entries are each 20 lines long, this sort of thing will
> provide little protection.

Yeah, you're right.  This is going to be a problem, and we need some
solution for it.  I'm out of ideas, other than of course getting each
entry to be at most two lines long, which nobody seems to like (for good
reasons).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bootstrap DATA is a pita

From
Jim Nasby
Date:
On 3/4/15 9:07 AM, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> and make it harder to compare entries by grepping out some common
>>>>> substring.
>>>
>>>> Could you give an example of the sort of thing you wish to do?
>>>
>>> On that angle, I'm dubious that a format that allows omission of fields is
>>> going to be easy for editing scripts to modify, no matter what the layout
>>> convention is.  I've found it relatively easy to write sed or even Emacs
>>> macros to add new column values to old-school pg_proc.h ... but in this
>>> brave new world, I'm going to be really hoping that the column default
>>> works for 99.9% of pg_proc entries when we add a new pg_proc column,
>>> because slipping a value into a desired position is gonna be hard for
>>> a script when you don't know whether the adjacent existing fields are
>>> present or not.
>>
>> I wonder if we should have a tool in our repository to help people
>> edit the file.  So instead of going in there yourself and changing
>> things by hand, or writing your own script, you can do:
>>
>> updatepgproc.pl --oid 5678 provolatile=v
>>
>> or
>>
>> updatepgpproc.pl --name='.*xact.*' prowhatever=someval
>>
>> Regardless of what format we end up with, that seems like it would
>> make things easier.
>
> Alright, I'll bite on this- we have this really neat tool for editing
> data in bulk, or individual values, or pulling out data to look at based
> on particular values or even functions...  It's called PostgreSQL.
>
> What if we had an easy way to export an existing table into whatever
> format we decide to use for initdb to use?  For that matter, what if
> that file was simple to import into PG?
>
> What about having a way to load all the catalog tables from their git
> repo files into a "pg_dev" schema?  Maybe even include a make target or
> initdb option which does that?  (the point here being to provide a way
> to modify the tables and compare the results to the existing tables
> without breaking the instance one is using for this)
>
> I have to admit that I've never tried to do that with the existing
> format, but seems like an interesting idea to consider.  I further
> wonder if it'd be possible to generate the table structures too..

Semi-related... if we put some special handling in some places for 
bootstrap mode, couldn't most catalog objects be created using SQL, once 
we got pg_class, pg_attribute and pg_type created? That would 
theoretically allow us to drive much more of initdb with plain SQL 
(possibly created via pg_dump).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
> Semi-related... if we put some special handling in some places for bootstrap
> mode, couldn't most catalog objects be created using SQL, once we got
> pg_class, pg_attributes and pg_type created? That would theoretically allow
> us to drive much more of initdb with plain SQL (possibly created via
> pg_dump).

Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.

That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Andrew Dunstan
Date:
On 03/07/2015 05:46 PM, Andres Freund wrote:
> On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
>> Semi-related... if we put some special handling in some places for bootstrap
>> mode, couldn't most catalog objects be created using SQL, once we got
>> pg_class, pg_attributes and pg_type created? That would theoretically allow
>> us to drive much more of initdb with plain SQL (possibly created via
>> pg_dump).
> Several people have now made that suggestion, but I *seriously* doubt
> that we actually want to go there. The overhead of executing SQL
> commands in comparison to the bki stuff is really rather
> noticeable. Doing the majority of the large number of insertions via SQL
> will make initdb noticeably slower. And it's already annoyingly
> slow. Besides make install it's probably the thing I wait most for
> during development.


My reaction exactly. We should not make users pay a price for 
developers' convenience.

>
> That's besides the fact that SQL commands aren't actually that
> comfortably editable in bulk.

Indeed.

cheers

andrew



Re: Bootstrap DATA is a pita

From
Stephen Frost
Date:
* Andrew Dunstan (andrew@dunslane.net) wrote:
> On 03/07/2015 05:46 PM, Andres Freund wrote:
> >On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
> >>Semi-related... if we put some special handling in some places for bootstrap
> >>mode, couldn't most catalog objects be created using SQL, once we got
> >>pg_class, pg_attributes and pg_type created? That would theoretically allow
> >>us to drive much more of initdb with plain SQL (possibly created via
> >>pg_dump).
> >Several people have now made that suggestion, but I *seriously* doubt
> >that we actually want to go there. The overhead of executing SQL
> >commands in comparison to the bki stuff is really rather
> >noticeable. Doing the majority of the large number of insertions via SQL
> >will make initdb noticeably slower. And it's already annoyingly
> >slow. Besides make install it's probably the thing I wait most for
> >during development.
>
> My reaction exactly. We should not make users pay a price for
> developers' convenience.

Just to clarify, since Jim was responding to my comment, my thought was
*not* to use SQL commands inside initdb, but rather to use PG to create
the source files that we have today in our tree, which wouldn't slow
down initdb at all.

> >That's besides the fact that SQL commands aren't actually that
> >comfortably editable in bulk.
>
> Indeed.

No, they aren't, but having the data in a table in PG, with a way to
easily export to the format needed by BKI, would make bulk updates much
easier..
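[Editor's note: as an illustration of the export half of this idea, something along these lines could emit bootstrap-style lines from a live server. The DATA syntax below is deliberately abbreviated; a real exporter would cover every pg_proc column and whatever quoting the chosen format requires.]

```sql
-- Illustration only: emit DATA-style lines for a couple of columns.
COPY (
    SELECT format('DATA(insert OID = %s ( %s %s ));',
                  oid, proname, provolatile)
    FROM pg_proc
    ORDER BY oid
) TO '/tmp/pg_proc_data.txt';
```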
Thanks!
    Stephen

Re: Bootstrap DATA is a pita

From
Jim Nasby
Date:
On 3/7/15 6:02 PM, Stephen Frost wrote:
> * Andrew Dunstan (andrew@dunslane.net) wrote:
>> On 03/07/2015 05:46 PM, Andres Freund wrote:
>>> On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
>>>> Semi-related... if we put some special handling in some places for bootstrap
>>>> mode, couldn't most catalog objects be created using SQL, once we got
>>>> pg_class, pg_attributes and pg_type created? That would theoretically allow
>>>> us to drive much more of initdb with plain SQL (possibly created via
>>>> pg_dump).
>>> Several people have now made that suggestion, but I *seriously* doubt
>>> that we actually want to go there. The overhead of executing SQL
>>> commands in comparison to the bki stuff is really rather
>>> noticeable. Doing the majority of the large number of insertions via SQL
>>> will make initdb noticeably slower. And it's already annoyingly
>>> slow. Besides make install it's probably the thing I wait most for
>>> during development.
>>
>> My reaction exactly. We should not make users pay a price for
>> developers' convenience.

How often does a normal user actually initdb? I don't think it's that 
incredibly common. Added time to our development cycle certainly is a 
concern though.

> Just to clarify, since Jim was responding to my comment, my thought was
> *not* to use SQL commands inside initdb, but rather to use PG to create
> the source files that we have today in our tree, which wouldn't slow
> down initdb at all.

Yeah, I was thinking SQL would make it even easier, but perhaps not. 
Since the other options here seem to have hit a dead end, though, it 
seems your load-it-into-tables idea is what we've got left...

>>> That's besides the fact that SQL commands aren't actually that
>>> comfortably editable in bulk.
>>
>> Indeed.
>
> No, they aren't, but having the data in a table in PG, with a way to
> easily export to the format needed by BKI, would make bulk updates much
> easier..

My thought was that pg_dump would be useful here, so instead of hand 
editing you'd just make changes in a live database and then dump it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-03-04 10:25:58 -0500, Robert Haas wrote:
> Another advantage of this is that it would probably make git less
> likely to fumble a rebase.  If there are lots of places in the file
> where we have the same 10 lines in a row with occasional variations,
> rebasing a patch could easily pick the the wrong place to reapply the
> hunk.  I would personally consider a substantial increase in the rate
> of such occurrences as being a cure far, far worse than the disease.
> If you keep the entry for each function on just a couple of lines the
> chances of this happening are greatly reduced, because you're much
> less likely to get a false match to surrounding context.

I'm not particularly worried about this. Especially with attribute
defaults it seems unlikely that you often have the same three
surrounding lines in both directions in a similar region of the file.

And even if it turns out to actually be bothersome, you can help
yourself by passing -U 5/setting diff.context = 5 or something like
that.
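[Editor's note: for reference, both knobs mentioned here are stock git.]

```shell
cd "$(mktemp -d)" && git init -q .

# One-off: show five lines of context instead of the default three.
git diff -U5

# Repository-wide default (git >= 1.8.1 reads diff.context):
git config diff.context 5
git config diff.context    # prints 5
```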

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-03-04 10:25:58 -0500, Robert Haas wrote:
>> Another advantage of this is that it would probably make git less
>> likely to fumble a rebase.  If there are lots of places in the file
>> where we have the same 10 lines in a row with occasional variations,
>> rebasing a patch could easily pick the the wrong place to reapply the
>> hunk.  I would personally consider a substantial increase in the rate
>> of such occurrences as being a cure far, far worse than the disease.
>> If you keep the entry for each function on just a couple of lines the
>> chances of this happening are greatly reduced, because you're much
>> less likely to get a false match to surrounding context.

> I'm not particularly worried about this. Especially with attribute
> defaults it seems unlikely that you often have the same three
> surrounding lines in both directions in a similar region of the file.

Really?  A lot depends on the details of how we choose to lay out these
files, but you could easily blow all your safety margin on lines
containing just braces, for instance.

I'll reserve judgment on this till I see the proposed new catalog data
files, but I absolutely reject any contention that it's not something
to worry about.

> And even if it turns out to actually be bothersome, you can help
> yourself by passing -U 5/setting diff.context = 5 or something like
> that.

Um.  Good luck with getting every patch submitter to do that.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Andrew Dunstan (andrew@dunslane.net) wrote:
>> On 03/07/2015 05:46 PM, Andres Freund wrote:
>>> On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
>>>> Semi-related... if we put some special handling in some places for bootstrap
>>>> mode, couldn't most catalog objects be created using SQL, once we got
>>>> pg_class, pg_attributes and pg_type created?

>>> Several people have now made that suggestion, but I *seriously* doubt
>>> that we actually want to go there. The overhead of executing SQL
>>> commands in comparison to the bki stuff is really rather
>>> noticeable. Doing the majority of the large number of insertions via SQL
>>> will make initdb noticeably slower. And it's already annoyingly
>>> slow. Besides make install it's probably the thing I wait most for
>>> during development.

>> My reaction exactly. We should not make users pay a price for
>> developers' convenience.

Another reason not to do this is that it would require a significant (in
my judgment) amount of crapification of a lot of code with bootstrap-mode
special cases.  Neither the parser, the planner, nor the executor could
function in bootstrap mode without a lot of lobotomization.  Far better
to confine all that ugliness to bootstrap.c.

> Just to clarify, since Jim was responding to my comment, my thought was
> *not* to use SQL commands inside initdb, but rather to use PG to create
> the source files that we have today in our tree, which wouldn't slow
> down initdb at all.

That, on the other hand, might be a sane suggestion.  I'm not sure
though.  It feels more like "use the hammer you have at hand" than
necessarily being a good fit.  In particular, keeping the raw data in
some tables doesn't seem like an environment that would naturally
distinguish between hard-coded and defaultable values.  For instance,
how would you distinguish hard-coded OIDs from ones that could be
assigned at initdb's whim?
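[Editor's note: purely as an illustration of one possible answer to that question; nothing here is a proposal from the thread, and the table and column names are invented.]

```sql
-- Invented sketch: a dev-side copy of pg_proc where a flag records
-- whether the OID is hand-assigned (pinned) or free for initdb to pick.
CREATE TABLE dev_pg_proc (
    oid        oid,                            -- NULL: assign at build time
    oid_pinned boolean NOT NULL DEFAULT false, -- true: must keep this OID
    proname    name NOT NULL
    -- ... remaining pg_proc columns ...
);
```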
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:

> > And even if it turns out to actually be bothersome, you can help
> > yourself by passing -U 5/setting diff.context = 5 or something like
> > that.
> 
> Um.  Good luck with getting every patch submitter to do that.

Can we do it centrally somehow?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bootstrap DATA is a pita

From
Andrew Dunstan
Date:
On 03/08/2015 10:11 PM, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> And even if it turns out to actually be bothersome, you can help
>>> yourself by passing -U 5/setting diff.context = 5 or something like
>>> that.
>> Um.  Good luck with getting every patch submitter to do that.
> Can we do it centrally somehow?
>


I don't believe there is provision for setting diff.context on a 
per-file basis.

cheers

andrew



Re: Bootstrap DATA is a pita

From
Robert Haas
Date:
On Sun, Mar 8, 2015 at 12:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-03-04 10:25:58 -0500, Robert Haas wrote:
>> Another advantage of this is that it would probably make git less
>> likely to fumble a rebase.  If there are lots of places in the file
>> where we have the same 10 lines in a row with occasional variations,
>> rebasing a patch could easily pick the the wrong place to reapply the
>> hunk.  I would personally consider a substantial increase in the rate
>> of such occurrences as being a cure far, far worse than the disease.
>> If you keep the entry for each function on just a couple of lines the
>> chances of this happening are greatly reduced, because you're much
>> less likely to get a false match to surrounding context.
>
> I'm not particularly worried about this. Especially with attribute
> defaults it seems unlikely that you often have the same three
> surrounding lines in both directions in a similar region of the file.

That's woefully optimistic, and you don't need to have 3 lines.  1 or
2 will do fine.

> And even if it turns out to actually be bothersome, you can help
> yourself by passing -U 5/setting diff.context = 5 or something like
> that.

I don't believe that for a minute.  When you have your own private
branch and you do 'git rebase master', how's that going to help?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-03-07 18:09:36 -0600, Jim Nasby wrote:
> How often does a normal user actually initdb? I don't think it's that
> incredibly common. Added time to our development cycle certainly is a
> concern though.

There's many shops that run initdb as part of their test/CI systems.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Bootstrap DATA is a pita

From
Caleb Welton
Date:
Hello Hackers,
  
  Reviving an old thread on simplifying the bootstrap process.

  I'm a developer from the GPDB / HAWQ side of the world, where we did some work a while back to enable catalog definition via SQL files, and we have found it valuable from a dev perspective.  The mechanism currently in those products is a bit convoluted: SQL is processed in perl to create the existing DATA statements, which are then processed as they are today in Postgres.  I wouldn't suggest this route, but having worked with both the DATA mechanism and the SQL-based one, I've certainly found SQL to be a more convenient way of interacting with the catalog.

  I'd propose:
     - Keep enough of the existing bootstrap mechanism functional to get a small tidy core, essentially you need enough of pg_type, pg_proc, pg_class, pg_attribute to support the 25 types used by catalog tables and most everything else can be moved into SQL processing like how system_views.sql is handled today.

  The above was largely proposed back in March and rejected based on concerns that
 
  1. initdb would be slower.
  2. It would introduce too much special purpose bootstrap cruft into the code.
  3. Editing SQL commands is not comfortable in bulk

On 1.

I have a prototype that handles about 1000 functions (all the functions in pg_proc.h that are not used by other catalog tables, e.g. pg_type, pg_language, pg_range, pg_aggregate, window functions, pg_ts_parser, etc).  

All of initdb can be processed in 1.53s. This compares to 1.37s with the current bootstrap approach.  So yes, this is slower, but not 'noticeably slower' - I certainly didn't notice the 0.16s until I saw the concern and then timed it.

On 2.

So far the amount of cruft has been:
  - Enabling adding functions with specific OIDs when creating functions.
    1 line changes in pg_aggregate.c, proclang.c, typecmds.c
    about dozen lines of code in functioncmds.c
    3 lines changed in pg_proc.c
  - Update the fmgr_internal_validator for builtin functions while the catalog is mutable
    3 lines changed in pg_proc.c
  - Update how the builtin function cache is built
    Some significant work in fmgr.c that honestly still needs cleanup before it would be ready to propose as a patch that would be worthy of committing.
  - Update how builtin functions are resolved outside of bootstrap
    Minor updates to dynloader for lookup of symbols within the current executable, so far I've only done darwin.c for my prototype, this would need to be extended to the other ports.
  - Initialization of the builtin cache
    2 line change in postinit.c
  - Addition of a stage in initdb to process the sql directives similar in scope to the processing of system_views.sql.

No changes needed in the parser, planner, etc.  My assessment is that this worry is not a major concern in practice with the right implementation.

On 3.

Having worked with both SQL and bki DATA directives I have personally found the convenience of SQL outweighs the pain.  In many cases changes, such as adding a new column to pg_proc, have minimal impact on the SQL representation and what changes are needed are often simple to implement.  E.g. accounting for COST only needs to be done for the functions that need something other than the default value.  This however is somewhat subjective.

On the Pros side:

  a. Debugging bootstrap is extremely painful; debugging once initdb has gotten to 'postgres --single' is way easier.

  b. It is easier to introduce minor issues with DATA directives than it is when using the SQL processing applied to all other user objects.
  
   Example: currently in Postgres all builtin functions default to COST 1, and all SQL functions default to COST 100. However, the following SQL functions included in bootstrap are inexplicably initialized with a COST of 1:  
   age(timestamp with time zone)
   age(timestamp without time zone)
   bit_length(bytea)
   bit_length(text)
   bit_length(bit)
   date_part(text, abstime)
   date_part(text, reltime)
   date_part(text, date)
   ... and 26 other examples

  c. SQL files are significantly less of a PITA (subjective opinion, but I can say this from a perspective of experience working with both DATA directives and SQL driven catalog definition).

If people have interest I can share my patch so far if that helps address concerns, but if there is not interest then I'll probably leave my prototype where it is rather than investing more effort in the proof of concept.

Thanks,
  Caleb


On Sat, Mar 7, 2015 at 5:20 PM, <pgsql-hackers-owner@postgresql.org> wrote:
Date: Sat, 7 Mar 2015 23:46:54 +0100
From: Andres Freund <andres@2ndquadrant.com>
To: Jim Nasby <Jim.Nasby@BlueTreble.com>
Cc: Stephen Frost <sfrost@snowman.net>, Robert Haas <robertmhaas@gmail.com>,
        Tom Lane <tgl@sss.pgh.pa.us>, Peter Eisentraut <peter_e@gmx.net>,
        Josh Berkus <josh@agliodbs.com>,
        "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Subject: Re: Bootstrap DATA is a pita
Message-ID: <20150307224654.GC12213@awork2.anarazel.de>

On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
> Semi-related... if we put some special handling in some places for bootstrap
> mode, couldn't most catalog objects be created using SQL, once we got
> pg_class, pg_attributes and pg_type created? That would theoretically allow
> us to drive much more of initdb with plain SQL (possibly created via
> pg_dump).

Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.

That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Bootstrap DATA is a pita

From
Caleb Welton
Date:
I'm happy working these ideas forward if there is interest.  

Basic design proposal is:
  - keep a minimal amount of bootstrap to avoid intrusive changes to core components
  - Add capabilities of creating objects with specific OIDs via DDL during initdb
  - Update the caching/resolution mechanism for builtin functions to be more dynamic.
  - Move as much of bootstrap as possible into SQL files and create catalog via DDL

Feedback appreciated.

I can provide a sample patch if there is interest, about ~500 lines of combined diff for the needed infrastructure to support the above, not including the modifications to pg_proc.h that would follow.

Thanks,
  Caleb

On Thu, Dec 10, 2015 at 11:47 AM, Caleb Welton wrote:

Hello Hackers,

  Reviving an old thread on simplifying the bootstrap process.

  I'm a developer from the GPDB / HAWQ side of the world where we did some
work a while back to enable catalog definition via SQL files and we have
found it valuable from a dev perspective.  The mechanism currently in those
products is a bit.. convoluted where SQL is processed in perl to create the
existing DATA statements, which are then processed as they are today in
Postgres... I wouldn't suggest this route, but having worked with both the
DATA mechanism and the SQL based one I've certainly found SQL to be a more
convenient way of interacting with the catalog.

  I'd propose:
     - Keep enough of the existing bootstrap mechanism functional to get a
small tidy core; essentially you need enough of pg_type, pg_proc, pg_class,
and pg_attribute to support the 25 types used by catalog tables, and most
everything else can be moved into SQL processing like how system_views.sql
is handled today.

  The above was largely proposed back in March and rejected based on
concerns that

  1. initdb would be slower.
  2. It would introduce too much special purpose bootstrap cruft into the
code.
  3. Editing SQL commands is not comfortable in bulk

On 1.

I have a prototype that handles about 1000 functions (all the functions in
pg_proc.h that are not used by other catalog tables, e.g. pg_type,
pg_language, pg_range, pg_aggregate, window functions, pg_ts_parser, etc).

All of initdb can be processed in 1.53s. This compares to 1.37s with the
current bootstrap approach.  So yes, this is slower, but not 'noticeably
slower' - I certainly didn't notice the 0.16s until I saw the concern and
then timed it.

On 2.

So far the amount of cruft has been:
  - Enabling creation of functions with specific OIDs.
    1-line changes in pg_aggregate.c, proclang.c, typecmds.c
    about a dozen lines of code in functioncmds.c
    3 lines changed in pg_proc.c
  - Update the fmgr_internal_validator for builtin functions while the
catalog is mutable
    3 lines changed in pg_proc.c
  - Update how the builtin function cache is built
    Some significant work in fmgr.c that honestly still needs cleanup
before it would be ready to propose as a patch that would be worthy of
committing.
  - Update how builtin functions are resolved outside of bootstrap
    Minor updates to dynloader for lookup of symbols within the current
executable, so far I've only done darwin.c for my prototype, this would
need to be extended to the other ports.
  - Initialization of the builtin cache
    2 line change in postinit.c
  - Addition of a stage in initdb to process the sql directives similar in
scope to the processing of system_views.sql.

No changes needed in the parser, planner, etc.  My assessment is that this
worry is not a major concern in practice with the right implementation.

On 3.

Having worked with both SQL and bki DATA directives I have personally found
the convenience of SQL outweighs the pain.  In many cases changes, such as
adding a new column to pg_proc, have minimal impact on the SQL
representation and what changes are needed are often simple to implement.
E.g. accounting for COST only needs to be done for the functions that need
something other than the default value.  This however is somewhat
subjective.

On the Pros side:

  a. Debugging bootstrap is extremely painful, debugging once initdb has
gotten to 'postgres --single' is way easier.

  b. It is easier to introduce minor issues with DATA directives than it is
when using the SQL processing applied to all other user objects.

   Example: currently in Postgres all builtin functions default to COST 1,
and all SQL functions default to COST 100. However, the following SQL
functions included in bootstrap are inexplicably initialized with a COST of
1:
   age(timestamp with time zone)
   age(timestamp without time zone)
   bit_length(bytea)
   bit_length(text)
   bit_length(bit)
   date_part(text, abstime)
   date_part(text, reltime)
   date_part(text, date)
   ... and 26 other examples

  c. SQL files are significantly less of a PITA (subjective opinion, but I
can say this from a perspective of experience working with both DATA
directives and SQL driven catalog definition).

If people have interest I can share my patch so far if that helps address
concerns, but if there is not interest then I'll probably leave my
prototype where it is rather than investing more effort in the proof of
concept.

Thanks,
  Caleb

Re: Bootstrap DATA is a pita

From
Alvaro Herrera
Date:
Caleb Welton wrote:
> I'm happy working these ideas forward if there is interest.
> 
> Basic design proposal is:
>   - keep a minimal amount of bootstrap to avoid intrusive changes to core
> components
>   - Add capabilities of creating objects with specific OIDs via DDL during
> initdb
>   - Update the caching/resolution mechanism for builtin functions to be
> more dynamic.
>   - Move as much of bootstrap as possible into SQL files and create catalog
> via DDL

I think the point we got stuck last time at was deciding on a good
format for the data coming from the DATA lines.  One of the objections
raised for formats such as JSON is that it's trivial for "git merge" (or
similar tools) to make a mistake because object-end/object-start lines
are all identical.  And as for the SQL-format version, the objection was
that it's hard to modify the lines en-masse when modifying the catalog
definition (new column, etc).  Ideally we would like a format that can
be bulk-edited without too much trouble.

A SQL file would presumably not have the merge issue, but mass-editing
would be a pain.

Crazy idea: we could just have a CSV file which can be loaded into a
table for mass changes using regular DDL commands, then dumped back from
there into the file.  We already know how to do these things, using
\copy etc.  Since CSV uses one line per entry, there would be no merge
problems either (or rather: all merge problems would become conflicts,
which is what we want.)
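In real use this round-trip would be `\copy` into a scratch table, regular UPDATE statements, and `\copy` back out; a toy Python sketch of the same workflow (illustrative column names and defaults, not actual catalog tooling) shows why one line per entry keeps bulk edits and merges manageable:

```python
import csv
import io

# Toy version of the proposed round-trip: read the catalog data file, make a
# bulk edit (here: add a hypothetical new column with one default value), and
# write it back, one line per entry so merges conflict instead of mis-merging.
original = "oid,proname,procost\n765,lo_export,1\n764,lo_import,1\n"

rows = list(csv.DictReader(io.StringIO(original)))
for r in rows:
    r["proleakproof"] = "f"          # mass change: new column, one default

out = io.StringIO()
w = csv.DictWriter(out,
                   fieldnames=["oid", "proname", "procost", "proleakproof"],
                   lineterminator="\n")
w.writeheader()
w.writerows(rows)
print(out.getvalue())
```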

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bootstrap DATA is a pita

From
Caleb Welton
Date:
Makes sense.

During my own prototyping what I did was generate the sql statements via sql querying the existing catalog.  Way easier
than hand writing 1000+ function definitions and not difficult to modify for future changes.  As affirmed, it was
very easy to adapt my existing sql to account for some of the newer features in master.

The biggest challenge was establishing a sort order that ensures both a unique ordering and that the dependencies
needed for SQL functions have been processed before trying to define them.  Which affects about 4/1000 functions based
on a natural oid ordering.
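That ordering constraint can be sketched as follows (a hypothetical toy, not the actual prototype code): sort by OID for determinism, but hoist any function a SQL-language function depends on ahead of it.

```python
def ordered(oids, deps):
    """Emit OIDs in natural order, dependencies of SQL functions first."""
    emitted, out = set(), []

    def emit(oid):
        if oid in emitted:
            return
        for dep in sorted(deps.get(oid, ())):
            emit(dep)            # define prerequisites before dependents
        emitted.add(oid)
        out.append(oid)

    for oid in sorted(oids):     # natural OID ordering as the base order
        emit(oid)
    return out

# Toy instance: a SQL function with OID 200 calls the function with OID 300,
# so 300 must be created first even though it sorts later.
print(ordered([100, 200, 300], {200: [300]}))   # [100, 300, 200]
```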

> On Dec 11, 2015, at 11:43 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Caleb Welton wrote:
>> I'm happy working these ideas forward if there is interest.
>>
>> Basic design proposal is:
>>  - keep a minimal amount of bootstrap to avoid intrusive changes to core
>> components
>>  - Add capabilities of creating objects with specific OIDs via DDL during
>> initdb
>>  - Update the caching/resolution mechanism for builtin functions to be
>> more dynamic.
>>  - Move as much of bootstrap as possible into SQL files and create catalog
>> via DDL
>
> I think the point we got stuck last time at was deciding on a good
> format for the data coming from the DATA lines.  One of the objections
> raised for formats such as JSON is that it's trivial for "git merge" (or
> similar tools) to make a mistake because object-end/object-start lines
> are all identical.  And as for the SQL-format version, the objection was
> that it's hard to modify the lines en-masse when modifying the catalog
> definition (new column, etc).  Ideally we would like a format that can
> be bulk-edited without too much trouble.
>
> A SQL file would presumably not have the merge issue, but mass-editing
> would be a pain.
>
> Crazy idea: we could just have a CSV file which can be loaded into a
> table for mass changes using regular DDL commands, then dumped back from
> there into the file.  We already know how to do these things, using
> \copy etc.  Since CSV uses one line per entry, there would be no merge
> problems either (or rather: all merge problems would become conflicts,
> which is what we want.)
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Crazy idea: we could just have a CSV file which can be loaded into a
> table for mass changes using regular DDL commands, then dumped back from
> there into the file.  We already know how to do these things, using
> \copy etc.  Since CSV uses one line per entry, there would be no merge
> problems either (or rather: all merge problems would become conflicts,
> which is what we want.)

That's an interesting proposal.  It would mean that the catalog files
stay at more or less their current semantic level (direct representations
of bootstrap catalog contents), but it does sound like a more attractive
way to perform complex edits than writing Emacs macros ;-).

You could actually do that the hard way right now, with a bit of script
to convert between DATA lines and CSV format.  But if we anticipate that
becoming the standard approach, it would definitely make sense to migrate
the master copies into CSV or traditional COPY format, and teach BKI mode
to read that (or, perhaps, leave bootstrap.c alone and modify the code
that produces the .bki file).

This is somewhat orthogonal to the question of whether we want to do
things like converting noncritical operator-class definitions into
regular CREATE OPERATOR CLASS syntax.  There's almost certainly going
to be some hard core of catalog entries that aren't amenable to that,
and will still need to be loaded from data files of some sort.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Mark Dilger
Date:
> On Dec 11, 2015, at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Crazy idea: we could just have a CSV file which can be loaded into a
>> table for mass changes using regular DDL commands, then dumped back from
>> there into the file.  We already know how to do these things, using
>> \copy etc.  Since CSV uses one line per entry, there would be no merge
>> problems either (or rather: all merge problems would become conflicts,
>> which is what we want.)
>
> That's an interesting proposal.  It would mean that the catalog files
> stay at more or less their current semantic level (direct representations
> of bootstrap catalog contents), but it does sound like a more attractive
> way to perform complex edits than writing Emacs macros ;-).

I would be happy to work on this, if there is much chance of the community
accepting a patch.  Do you think replacing the numeric Oids for functions,
operators, opclasses and such in the source files with their names would
be ok, with the SQL converting those to Oids in the output?  My eyes have
gotten tired more than once trying to read head files in src/include/catalog
looking for mistakes in what largely amounts to a big table of numbers.

For example, in pg_amop.h:

/* default operators int2 */
DATA(insert (   1976   21 21 1 s    95  403 0 ));
DATA(insert (   1976   21 21 2 s    522 403 0 ));
DATA(insert (   1976   21 21 3 s    94  403 0 ));
DATA(insert (   1976   21 21 4 s    524 403 0 ));
DATA(insert (   1976   21 21 5 s    520 403 0 ));

Would become something like:

amopfamily	amoplefttype	amoprighttype	amopstrategy	amoppurpose	amopopr	amopmethod	amopsortfamily
integer_ops	int2	int2	1	search	"<"	btree	0
integer_ops	int2	int2	2	search	"<="	btree	0
integer_ops	int2	int2	3	search	"="	btree	0
integer_ops	int2	int2	4	search	">="	btree	0
integer_ops	int2	int2	5	search	">"	btree	0

Note that I prefer to use tabs and a header line, as the tab stop can be set to
line them up nicely, and the header line allows you to see which column is
which, and what it is for.  CSV is always harder for me to use that way, though
maybe that is just a matter of which editor I use (vim).

And yes, I'd need to allow the HEADER option for copying tab-delimited
files, since it is currently only allowed for CSV, I believe.
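To illustrate what the header line buys, a small Python sketch reading such a tab-delimited fragment (real pg_amop column names, but a toy two-row sample, not actual catalog tooling):

```python
import csv
import io

# Toy pg_amop fragment in the tab-delimited, headered format proposed here,
# abridged to two of the int2 btree strategies.
dat = (
    "amopfamily\tamoplefttype\tamoprighttype\tamopstrategy\t"
    "amoppurpose\tamopopr\tamopmethod\tamopsortfamily\n"
    "integer_ops\tint2\tint2\t1\tsearch\t<\tbtree\t0\n"
    "integer_ops\tint2\tint2\t3\tsearch\t=\tbtree\t0\n"
)

rows = list(csv.DictReader(io.StringIO(dat), delimiter="\t"))
for r in rows:
    # The header line makes every column self-describing:
    print(r["amopfamily"], r["amopstrategy"], r["amopopr"])
```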

mark







Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
>> On Dec 11, 2015, at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That's an interesting proposal.  It would mean that the catalog files
>> stay at more or less their current semantic level (direct representations
>> of bootstrap catalog contents), but it does sound like a more attractive
>> way to perform complex edits than writing Emacs macros ;-).

> I would be happy to work on this, if there is much chance of the community
> accepting a patch.  Do you think replacing the numeric Oids for functions,
> operators, opclasses and such in the source files with their names would
> be ok, with the SQL converting those to Oids in the output?

Huh?  Those files are the definition of that mapping, no?  Isn't what
you're proposing circular?
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Caleb Welton
Date:
I took a look at a few of the most recent bulk edit cases for pg_proc.h:

There were two this year:
* The addition of proparallel  [1]
* The addition of protransform [2]

And prior to that the most recent seems to be from 2012:
* The addition of proleakproof [3]

Quick TL;DR: the changes needed for these are simple to express when generating SQL for CREATE FUNCTION statements.

Attached is the SQL that would generate function definitions prior to proleakproof and the diffs that would be required after adding support for proleakproof, protransform and proparallel. 

Each of the diffs indicates the changes that would be needed after the new column is added, the question of how to populate default values for the new columns is beyond the scope that can easily be expressed in general terms and depends entirely on what the nature of the new column is.

Note: Currently I have focused on the 'pure' functions, e.g. not the drivers of type serialization, language validation, operators, or other object types.  I would want to deal with each of those while handling the conversion for each of those object types in turn.  Additional modifications would likely be needed for other types of functions.



On Fri, Dec 11, 2015 at 12:55 PM, Caleb Welton <cwelton@pivotal.io> wrote:
Makes sense.

During my own prototyping what I did was generate the sql statements via sql querying the existing catalog.  Way easier than hand writing 1000+ function definitions and not difficult to modify for future changes.  As affirmed, it was very easy to adapt my existing sql to account for some of the newer features in master.

The biggest challenge was establishing a sort order that ensures both a unique ordering and that the dependencies needed for SQL functions have been processed before trying to define them.  Which affects about 4/1000 functions based on a natural oid ordering.

> On Dec 11, 2015, at 11:43 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Caleb Welton wrote:
>> I'm happy working these ideas forward if there is interest.
>>
>> Basic design proposal is:
>>  - keep a minimal amount of bootstrap to avoid intrusive changes to core
>> components
>>  - Add capabilities of creating objects with specific OIDs via DDL during
>> initdb
>>  - Update the caching/resolution mechanism for builtin functions to be
>> more dynamic.
>>  - Move as much of bootstrap as possible into SQL files and create catalog
>> via DDL
>
> I think the point we got stuck last time at was deciding on a good
> format for the data coming from the DATA lines.  One of the objections
> raised for formats such as JSON is that it's trivial for "git merge" (or
> similar tools) to make a mistake because object-end/object-start lines
> are all identical.  And as for the SQL-format version, the objection was
> that it's hard to modify the lines en-masse when modifying the catalog
> definition (new column, etc).  Ideally we would like a format that can
> be bulk-edited without too much trouble.
>
> A SQL file would presumably not have the merge issue, but mass-editing
> would be a pain.
>
> Crazy idea: we could just have a CSV file which can be loaded into a
> table for mass changes using regular DDL commands, then dumped back from
> there into the file.  We already know how to do these things, using
> \copy etc.  Since CSV uses one line per entry, there would be no merge
> problems either (or rather: all merge problems would become conflicts,
> which is what we want.)
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Bootstrap DATA is a pita

From
Mark Dilger
Date:
> On Dec 11, 2015, at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <hornschnorter@gmail.com> writes:
>>> On Dec 11, 2015, at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That's an interesting proposal.  It would mean that the catalog files
>>> stay at more or less their current semantic level (direct representations
>>> of bootstrap catalog contents), but it does sound like a more attractive
>>> way to perform complex edits than writing Emacs macros ;-).
>
>> I would be happy to work on this, if there is much chance of the community
>> accepting a patch.  Do you think replacing the numeric Oids for functions,
>> operators, opclasses and such in the source files with their names would
>> be ok, with the SQL converting those to Oids in the output?
>
> Huh?  Those files are the definition of that mapping, no?  Isn't what
> you're proposing circular?

No, there are far more references to Oids than there are definitions of them.

For example, the line in pg_operator.h:

DATA(insert OID =  15 ( "="        PGNSP PGUID b t t    23  20  16 416  36 int48eq eqsel eqjoinsel ));

defines 15 as the oid for the equals operator for (int8,int4) returning bool, but the
fact that 23 is the Oid for int4, 20 is the Oid for int8, and 16 is the Oid for bool
is already defined elsewhere (in pg_type.h) and need not be duplicated here.

I'm just proposing that we don't keep specifying things by number everywhere.
Once you've established the Oid for something (operator, type, function) you
should use the name everywhere else.

mark


Re: Bootstrap DATA is a pita

From
Caleb Welton
Date:
The current semantic level is pretty low, somewhat cumbersome, and requires filling in values that most of the time the system already knows how to default.

Compare:

CREATE FUNCTION lo_export(oid, text) RETURNS integer LANGUAGE internal STRICT AS 'lo_export' WITH (OID=765);  

DATA(insert OID = 765 (  lo_export   PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 23 "26 25" _null_ _null_ _null_ _null_ _null_ lo_export _null_ _null_ _null_ ));


In the first one someone has indicated:
   1. a function name,
   2. two parameter type names
   3. a return type
   4. a language
   5. null handling
   6. a symbol
   7. an oid

In the second case 30 separate items have been indicated, and yet both of them will generate identical end results within the catalog.

The former is more immune to even needing modification in the event that the catalog structure changes.
  - adding proleakproof?  No change needed, default value is correct
  - adding protransform?  No change needed, not relevant 
  - adding proparallel?  No change needed, default value is correct
  - adding procost? No change needed, default value is correct
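That default-driven behavior can be sketched as follows; the column names are real pg_proc columns, but the defaults table and the make_row helper are illustrative, not PostgreSQL's actual mechanism.

```python
# Illustrative defaults for a handful of pg_proc columns; a real generator
# would carry one such table per catalog.
PG_PROC_DEFAULTS = {
    "procost": "1",
    "proleakproof": "f",
    "proparallel": "u",
    "proisstrict": "t",
    "provolatile": "v",
}

def make_row(**overrides):
    """Build a full row from defaults plus only the values that differ."""
    row = dict(PG_PROC_DEFAULTS)
    row.update(overrides)
    return row

# The lo_export example: seven things specified, everything else defaulted.
lo_export = make_row(oid="765", proname="lo_export", prorettype="int4",
                     proargtypes="oid text", prolang="internal",
                     prosrc="lo_export")
print(lo_export["procost"])   # defaulted, never written in the definition
```

Adding a new column then means adding one entry to the defaults table; existing definitions need no edits unless they want a non-default value.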




On Fri, Dec 11, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Crazy idea: we could just have a CSV file which can be loaded into a
> table for mass changes using regular DDL commands, then dumped back from
> there into the file.  We already know how to do these things, using
> \copy etc.  Since CSV uses one line per entry, there would be no merge
> problems either (or rather: all merge problems would become conflicts,
> which is what we want.)

That's an interesting proposal.  It would mean that the catalog files
stay at more or less their current semantic level (direct representations
of bootstrap catalog contents), but it does sound like a more attractive
way to perform complex edits than writing Emacs macros ;-).

You could actually do that the hard way right now, with a bit of script
to convert between DATA lines and CSV format.  But if we anticipate that
becoming the standard approach, it would definitely make sense to migrate
the master copies into CSV or traditional COPY format, and teach BKI mode
to read that (or, perhaps, leave bootstrap.c alone and modify the code
that produces the .bki file).

This is somewhat orthogonal to the question of whether we want to do
things like converting noncritical operator-class definitions into
regular CREATE OPERATOR CLASS syntax.  There's almost certainly going
to be some hard core of catalog entries that aren't amenable to that,
and will still need to be loaded from data files of some sort.

                        regards, tom lane

Re: Bootstrap DATA is a pita

From
Caleb Welton
Date:
Yes, that alone without any other changes would be a marked improvement and could be implemented in many places, pg_operator is a good example.

... but there is some circularity especially with respect to type definitions and the functions that define those types.  If you changed the definition of prorettype into a regtype then bootstrap would try to lookup the type before the pg_type entry exists and throw a fit.  That's handled in SQL via shell types.  If we wanted bootstrap to be able to handle this then we'd have to make two passes of pg_type, the first to create the shells and the second to handle populating the serialization functions.  

Unfortunately, types and functions tend to be the more volatile areas of the catalog, so this circularity is particularly vexing.

On Fri, Dec 11, 2015 at 2:53 PM, Mark Dilger <hornschnorter@gmail.com> wrote:

> On Dec 11, 2015, at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <hornschnorter@gmail.com> writes:
>>> On Dec 11, 2015, at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That's an interesting proposal.  It would mean that the catalog files
>>> stay at more or less their current semantic level (direct representations
>>> of bootstrap catalog contents), but it does sound like a more attractive
>>> way to perform complex edits than writing Emacs macros ;-).
>
>> I would be happy to work on this, if there is much chance of the community
>> accepting a patch.  Do you think replacing the numeric Oids for functions,
>> operators, opclasses and such in the source files with their names would
>> be ok, with the SQL converting those to Oids in the output?
>
> Huh?  Those files are the definition of that mapping, no?  Isn't what
> you're proposing circular?

No, there are far more references to Oids than there are definitions of them.

For example, the line in pg_operator.h:

DATA(insert OID =  15 ( "="        PGNSP PGUID b t t    23  20  16 416  36 int48eq eqsel eqjoinsel ));

defines 15 as the oid for the equals operator for (int8,int4) returning bool, but the
fact that 23 is the Oid for int4, 20 is the Oid for int8, and 16 is the Oid for bool
is already defined elsewhere (in pg_type.h) and need not be duplicated here.

I'm just proposing that we don't keep specifying things by number everywhere.
Once you've established the Oid for something (operator, type, function) you
should use the name everywhere else.

mark

Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
>> On Dec 11, 2015, at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Huh?  Those files are the definition of that mapping, no?  Isn't what
>> you're proposing circular?

> No, there are far more references to Oids than there are definitions of them.

Well, you're still not being very clear, but I *think* what you're
proposing is to put a lot more smarts into the script that converts
the master source files into .bki format.  That is, we might have
"=(int8,int4)" in an entry in the master source file for pg_amop, but
the script would look up that entry using the source data for pg_type
and pg_operator, and then emit a simple numeric OID into the .bki file.
(Presumably, it would know to do this because we'd redefine the
pg_amop.amopopr column as of regoperator type not plain OID.)

Yeah, that could work, though I'd be a bit concerned about the complexity
and speed of the script.  Still, one doesn't usually rebuild postgres.bki
many times a day, so speed might not be a big problem.

This seems more or less orthogonal to the question of whether to get rid
of the DATA() lines in favor of a COPY-friendly data format.  I'd suggest
treating those as separate patches.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Caleb Welton <cwelton@pivotal.io> writes:
> ... but there is some circularity especially with respect to type
> definitions and the functions that define those types.  If you changed the
> definition of prorettype into a regtype then bootstrap would try to lookup
> the type before the pg_type entry exists and throw a fit.  That's handled
> in SQL via shell types.  If we wanted bootstrap to be able to handle this
> then we'd have to make two passes of pg_type, the first to create the
> shells and the second to handle populating the serialization functions.

I think what Mark is proposing is to do the lookups while preparing the
.bki file, which would eliminate the circularity ... at the cost of having
to, essentially, reimplement regprocedure_in and friends in Perl.

If we push hard on doing the other thing that you're proposing, which is
to take as much as possible out of the pure bootstrap-data phase, then
maybe it wouldn't be worth the work to do that.  Not sure.

On the other hand, I'm not very much in love with the thought of having
two different notations for "core" and "not so core" built-in function
creation.  There's something to be said for keeping all the data in one
format.  If we push on making the .bki creation script smarter, then in
addition to the name lookup facilities Mark envisions, we could have
things like default column values.  That would take us a long way toward
the same ease-of-use as full SQL definitions.  We'd still be lacking
some error checks that the SQL commands could perform; but we've
traditionally used sanity checks in the regression tests to do
cross-checking that covers more or less those same bases.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Mark Dilger
Date:
> On Dec 11, 2015, at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <hornschnorter@gmail.com> writes:
>>> On Dec 11, 2015, at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Huh?  Those files are the definition of that mapping, no?  Isn't what
>>> you're proposing circular?
>
>> No, there are far more references to Oids than there are definitions of them.
>
> Well, you're still not being very clear, but I *think* what you're
> proposing is to put a lot more smarts into the script that converts
> the master source files into .bki format.  That is, we might have
> "=(int8,int4)" in an entry in the master source file for pg_amop, but
> the script would look up that entry using the source data for pg_type
> and pg_operator, and then emit a simple numeric OID into the .bki file.
> (Presumably, it would know to do this because we'd redefine the
> pg_amop.amopopr column as of regoperator type not plain OID.)
>
> Yeah, that could work, though I'd be a bit concerned about the complexity
> and speed of the script.  Still, one doesn't usually rebuild postgres.bki
> many times a day, so speed might not be a big problem.

I am proposing that each of the catalog headers that currently has DATA
lines instead have a COPY loadable file that contains the same information.
So, for pg_type.h, there would be a pg_type.dat file.  All the DATA lines
would be pulled out of pg_type.h and a corresponding tab delimited row
would be written to pg_type.dat.  Henceforth, if you cloned the git repository,
you'd find no DATA lines in pg_type.h, but would find a pg_type.dat file
in the src/include/catalog directory.  Likewise for the other header files.

There would be some script, SQL or perl or whatever, that would convert
these .dat files into the .bki file.

Now, if we know that pg_type.dat will be processed before pg_proc.dat,
we can replace all the Oids representing datatypes in pg_proc.dat with the
names for those types, given that we already have a name <=> oid
mapping for types.

Likewise, if we know that pg_proc.dat will be processed before pg_operator.dat,
we can specify both functions and datatypes by name rather than by Oid
in that file, making it much easier to read.  By the time pg_operator.dat is
read, pg_type.dat and pg_proc.dat will already have been read and processed,
so there shouldn't be ambiguity.

By the time pg_amop.dat is processed, the operators, procs, datatypes,
opfamilies and so forth would already be known.  The example I gave
upthread would be easy to parse:

amopfamily   amoplefttype  amoprighttype  amopstrategy  amoppurpose  amopopr  amopmethod  amopsortfamily
integer_ops  int2          int2           1             search       "<"      btree       0
integer_ops  int2          int2           2             search       "<="     btree       0
integer_ops  int2          int2           3             search       "="      btree       0
integer_ops  int2          int2           4             search       ">="     btree       0
integer_ops  int2          int2           5             search       ">"      btree       0

And if I came along and defined a new datatype, int384, I could add rows to
this file much more easily, as:

amopfamily   amoplefttype  amoprighttype  amopstrategy  amoppurpose  amopopr  amopmethod  amopsortfamily
integer_ops  int384        int384         1             search       "<"      btree       0
integer_ops  int384        int384         2             search       "<="     btree       0
integer_ops  int384        int384         3             search       "="      btree       0
integer_ops  int384        int384         4             search       ">="     btree       0
integer_ops  int384        int384         5             search       ">"      btree       0

I don't see how this creates all that much complication, and I clearly see
how it makes files like pg_operator.{h,dat} and pg_amop.{h,dat} easier to read.


mark




Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-12-11 18:12:16 -0500, Tom Lane wrote:
> I think what Mark is proposing is to do the lookups while preparing the
> .bki file, which would eliminate the circularity ... at the cost of having
> to, essentially, reimplement regprocedure_in and friends in Perl.

FWIW, I did that when this came up last. Rather interesting, because it
leads to quite noticeable speedups - currently initdb spends a
significant amount of its time doing regproc lookups. Especially
interesting because at that stage we're largely not using indexes yet, IIRC.


Andres



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
> Now, if we know that pg_type.dat will be processed before pg_proc.dat,
> we can replace all the Oids representing datatypes in pg_proc.dat with the
> names for those types, given that we already have a name <=> oid
> mapping for types.

I don't think this is quite as simple as you paint it.  How can you
process pg_type.dat first, when it contains pg_proc references?  Doing
pg_proc first is no better, because it contains pg_type references.

I believe it's soluble, but it's going to take something more like
loading up all the data at once and then doing lookups as we write
out the .bki entries for each catalog.  Fortunately, the volume of
bootstrap data is small enough that that won't be a problem on any
machine capable of running modern Postgres ...
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-12-11 19:26:38 -0500, Tom Lane wrote:
> I believe it's soluble, but it's going to take something more like
> loading up all the data at once and then doing lookups as we write
> out the .bki entries for each catalog.  Fortunately, the volume of
> bootstrap data is small enough that that won't be a problem on any
> machine capable of running modern Postgres ...

I think that's exactly the right approach. Just building a few perl
hashes worked well enough, in my prototype of that.

If additionally a few more plain oid fields are converted into reg*
types, the source data fields are easier to understand and the catalogs
get much nicer to query...
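The load-everything-first approach sidesteps the pg_type/pg_proc circularity, since no reference is resolved until every catalog's name-to-oid hash exists. A toy sketch (Python, with invented catalog contents; the real prototype used Perl hashes):

```python
# All .dat contents loaded up front, before any resolution happens.
# Note the circular references: the type row names a proc, and the
# proc row names a type.
catalogs = {
    "pg_type": [{"oid": 23, "typname": "int4", "typinput": "int4in"}],
    "pg_proc": [{"oid": 42, "proname": "int4in", "prorettype": "int4"}],
}

# Build name -> oid maps for every catalog first ...
proc_oids = {r["proname"]: r["oid"] for r in catalogs["pg_proc"]}
type_oids = {r["typname"]: r["oid"] for r in catalogs["pg_type"]}

# ... and only then resolve references while emitting, so the
# circularity is harmless.
resolved = {
    "pg_type": [dict(r, typinput=proc_oids[r["typinput"]])
                for r in catalogs["pg_type"]],
    "pg_proc": [dict(r, prorettype=type_oids[r["prorettype"]])
                for r in catalogs["pg_proc"]],
}

print(resolved["pg_type"][0]["typinput"])    # numeric proc oid
print(resolved["pg_proc"][0]["prorettype"])  # numeric type oid
```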



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2015-12-11 19:26:38 -0500, Tom Lane wrote:
>> I believe it's soluble, but it's going to take something more like
>> loading up all the data at once and then doing lookups as we write
>> out the .bki entries for each catalog.  Fortunately, the volume of
>> bootstrap data is small enough that that won't be a problem on any
>> machine capable of running modern Postgres ...

> I think that's exactly the right approach. Just building a few perl
> hashes worked well enough, in my prototype of that.

Right.  I would draw Mark's attention to src/backend/catalog/Catalog.pm
and the things that use that.  Presumably all that would have to be
rewritten, but the existing code would be a useful starting point
perhaps.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Mark Dilger
Date:

On Dec 11, 2015, at 2:54 PM, Caleb Welton <cwelton@pivotal.io> wrote:

The current semantic level is pretty low level and somewhat cumbersome, and it requires filling in values that, most of the time, the system already has a pretty good idea how to default.

Compare:

CREATE FUNCTION lo_export(oid, text) RETURNS integer LANGUAGE internal STRICT AS 'lo_export' WITH (OID=765);  

DATA(insert OID = 765 (  lo_export   PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 23 "26 25" _null_ _null_ _null_ _null_ _null_ lo_export _null_ _null_ _null_ ));

I would like to hear more about this idea.  Are you proposing that we use something
like the above CREATE FUNCTION format to express what is currently being expressed
with DATA statements?  That is an interesting idea, though I don't know what exactly
that would look like.  If you want to forward this idea, I'd be eager to hear your thoughts.
If not, I'll try to make progress with my idea of tab delimited files and such (or really,
Alvaro's idea of csv files that I only slightly corrupted).

mark

Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
Mark Dilger <hornschnorter@gmail.com> writes:
>> On Dec 11, 2015, at 2:54 PM, Caleb Welton <cwelton@pivotal.io> wrote:
>> Compare:
>> CREATE FUNCTION lo_export(oid, text) RETURNS integer LANGUAGE internal STRICT AS 'lo_export' WITH (OID=765);  
>> 
>> DATA(insert OID = 765 (  lo_export           PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 23 "26 25" _null_ _null_ _null_ _null_ _null_ lo_export _null_ _null_ _null_ ));

> I would like to hear more about this idea.  Are you proposing that we use something
> like the above CREATE FUNCTION format to express what is currently being expressed
> with DATA statements?

Yes, that sort of idea has been kicked around some already, see the
archives.

> That is an interesting idea, though I don't know what exactly
> that would look like.  If you want to forward this idea, I'd be eager to hear your thoughts.
> If not, I'll try to make progress with my idea of tab delimited files and such (or really,
> Alvaro's idea of csv files that I only slightly corrupted).

Personally I would like to see both approaches explored.  Installing as
much as we can via SQL commands is attractive for a number of reasons;
but there is going to be an irreducible minimum amount of stuff that
has to be inserted by something close to the current bootstrapping
process.  (And I'm not convinced that that "minimum amount" is going
to be very small...)  So it's not impossible that we'd end up accepting
*both* types of patches, one to do more in the post-bootstrap SQL world
and one to make the bootstrap data notation less cumbersome.  In any
case it would be useful to push both approaches forward some more before
we make any decisions between them.

BTW, there's another thing I'd like to see improved in this area, which is
a problem already but will get a lot worse if we push more work into the
post-bootstrap phase of initdb.  That is that the post-bootstrap phase is
both inefficient and impossible to debug.  If you've ever had a failure
there, you'll have seen that the backend spits out an entire SQL script
and says there's an error in it somewhere; that's because it gets the
whole per-stage script as one submission.  (Try introducing a syntax error
somewhere in information_schema.sql, and you'll see what I mean.)
Breaking the stage scripts down further would help, but that is
unattractive because each one requires a fresh backend startup/shutdown,
including a full checkpoint.  I'd like to see things rejiggered so that
there's only one post-bootstrap standalone backend session that performs
all the steps, but initdb feeds it just one SQL command at a time so that
errors are better localized.  That should both speed up initdb noticeably
and make debugging easier.
        regards, tom lane



Re: Bootstrap DATA is a pita

From
Tom Lane
Date:
I wrote:
> BTW, there's another thing I'd like to see improved in this area, which is
> a problem already but will get a lot worse if we push more work into the
> post-bootstrap phase of initdb.  That is that the post-bootstrap phase is
> both inefficient and impossible to debug.  If you've ever had a failure
> there, you'll have seen that the backend spits out an entire SQL script
> and says there's an error in it somewhere; that's because it gets the
> whole per-stage script as one submission.  (Try introducing a syntax error
> somewhere in information_schema.sql, and you'll see what I mean.)
> Breaking the stage scripts down further would help, but that is
> unattractive because each one requires a fresh backend startup/shutdown,
> including a full checkpoint.  I'd like to see things rejiggered so that
> there's only one post-bootstrap standalone backend session that performs
> all the steps, but initdb feeds it just one SQL command at a time so that
> errors are better localized.  That should both speed up initdb noticeably
> and make debugging easier.

I thought this sounded like a nice lazy-Saturday project, so I started
poking at it, and attached is a WIP patch.  The core issue that has to
be dealt with is that standalone-backend mode currently has just two
rules for deciding when to stop collecting input and execute the command
buffer, and they both suck:

1. By default, execute after every newline.  (Actually, you can quote
a newline with a backslash, but we don't use that ability anywhere.)

2. With -j, slurp the entire input until EOF, and execute it as one
giant multicommand string.

We're doing #2 to handle information_schema.sql and the other large
SQL scripts that initdb runs, which is why the response to an error in
those scripts is so yucky.

After some experimentation, I came up with the idea of executing any
time that a semicolon followed by two newlines is seen.  This nicely
breaks up input like information_schema.sql.  There are probably some
residual places where more than one command is executed in a single
string, but we could fix that with some more newlines.  Obviously,
this rule is capable of being fooled if you have a newline followed by
a blank line in a comment or quoted literal --- but it turns out that
no such case exists anywhere in initdb's data.
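For illustration, the proposed boundary rule can be sketched line-by-line (in Python; this mirrors the intent of the C change in the patch below, not the actual server code):

```python
def split_commands(script):
    """Split a SQL script into commands at points where a line ending
    in ';' is followed by a blank line, i.e. ';' + two newlines."""
    commands = []
    buf = []
    for line in script.split("\n"):
        # A blank line right after a ';'-terminated line ends a command.
        if buf and buf[-1].endswith(";") and line == "":
            commands.append("\n".join(buf))
            buf = []
            continue
        buf.append(line)
    if any(l.strip() for l in buf):
        commands.append("\n".join(buf))
    return commands

for cmd in split_commands("DELETE FROM pg_depend;\n\nVACUUM pg_depend;\n"):
    print(cmd)
```

As noted above, a blank line inside a comment or quoted literal after a ';' would fool this rule, but no such case occurs in initdb's input scripts.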

I'm not particularly wedded to this rule.  In principle we could go so
far as to import psql's code that parses commands and figures out which
semicolons are command terminators --- but that is a pretty large chunk
of code, and I think it'd really be overkill considering that initdb
deals only with fixed input scripts.  But if anyone has another simple
rule for breaking SQL into commands, we can certainly discuss
alternatives.

Anyway, the attached patch tweaks postgres.c to follow that rule instead
of slurp-to-EOF when -j is given.  I doubt that being non-backwards-
compatible is a problem here; in fact, I'm tempted to rip out the -j
switch altogether and just have standalone mode always parse input the
same way.  Does anyone know of people using standalone mode other than
for initdb?

The other part of the patch modifies initdb to do all its post-bootstrap
steps using a single standalone backend session.  I had to remove the
code that currently prints out progress markers for individual phases
of that processing, so that now you get output that looks like

creating directory /home/postgres/testversion/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /home/postgres/testversion/data/base/1 ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Since initdb is just printing to the backend in an open-loop fashion,
it doesn't really know whether each command succeeds; in fact it is
usually pretty far ahead of the backend, until it does pclose() which
waits for the subprocess to exit.  So we can't readily keep the old
progress markers.  I don't think we'll miss them though.  The whole
"post-bootstrap initialization" step only takes a second or so on my
main dev machine, so breaking down the progress more finely isn't that
useful anymore.

I had to change ;\n to ;\n\n in a few places in initdb's internal
scripts, to ensure that VACUUM commands were executed by themselves
(otherwise you get "VACUUM can't run in a transaction block" type
failures).  I wasn't very thorough about that though, pending a
decision on exactly what the new command-boundary rule will be.

The upshot of these changes is that initdb runs about 10% faster overall
(more in -N mode), which is a useful savings.  Also, the response to a
syntax error in information_schema.sql now looks like this:

creating template1 database in /home/postgres/testversion/data/base/1 ... ok
performing post-bootstrap initialization ... FATAL:  column "routzine_schema" does not exist at character 243
HINT:  Perhaps you meant to reference the column "routine_privileges.routine_schema".
STATEMENT:
    /*
     * 5.42
     * ROLE_ROUTINE_GRANTS view
     */

    CREATE VIEW role_routine_grants AS
        SELECT grantor,
               grantee,
               specific_catalog,
               specific_schema,
               specific_name,
               routine_catalog,
               routzine_schema,
               routine_name,
               privilege_type,
               is_grantable
        FROM routine_privileges
        WHERE grantor IN (SELECT role_name FROM enabled_roles)
              OR grantee IN (SELECT role_name FROM enabled_roles);

child process exited with exit code 1
initdb: removing data directory "/home/postgres/testversion/data"

which is a *huge* improvement over what you got before.

What remains to be done before this'd be committable is updating
the documentation around the -j switch (or removing it altogether),
and going through initdb and its input scripts to ensure there's
a double newline after every nontrivial SQL command.  I don't see
much point in finishing that detail work until we have consensus on
whether to use this or another command-boundary rule.

Thoughts?

            regards, tom lane

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1dc2eb0..1cb0cd9 100644
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** static int
*** 219,226 ****
  InteractiveBackend(StringInfo inBuf)
  {
      int            c;                /* character read from getc() */
-     bool        end = false;    /* end-of-input flag */
-     bool        backslashSeen = false;    /* have we seen a \ ? */

      /*
       * display a prompt and obtain input from the user
--- 219,224 ----
*************** InteractiveBackend(StringInfo inBuf)
*** 230,284 ****

      resetStringInfo(inBuf);

!     if (UseNewLine)
      {
!         /*
!          * if we are using \n as a delimiter, then read characters until the
!          * \n.
!          */
!         while ((c = interactive_getc()) != EOF)
          {
!             if (c == '\n')
              {
!                 if (backslashSeen)
                  {
                      /* discard backslash from inBuf */
                      inBuf->data[--inBuf->len] = '\0';
-                     backslashSeen = false;
                      continue;
                  }
                  else
                  {
!                     /* keep the newline character */
                      appendStringInfoChar(inBuf, '\n');
                      break;
                  }
              }
-             else if (c == '\\')
-                 backslashSeen = true;
              else
!                 backslashSeen = false;
!
!             appendStringInfoChar(inBuf, (char) c);
          }

!         if (c == EOF)
!             end = true;
!     }
!     else
!     {
!         /*
!          * otherwise read characters until EOF.
!          */
!         while ((c = interactive_getc()) != EOF)
!             appendStringInfoChar(inBuf, (char) c);
!
!         /* No input before EOF signal means time to quit. */
!         if (inBuf->len == 0)
!             end = true;
      }

!     if (end)
          return EOF;

      /*
--- 228,282 ----

      resetStringInfo(inBuf);

!     /*
!      * Read characters until EOF or the appropriate delimiter is seen.
!      */
!     while ((c = interactive_getc()) != EOF)
      {
!         if (c == '\n')
          {
!             if (UseNewLine)
              {
!                 /*
!                  * In newline mode, newline is delimiter unless preceded by
!                  * backslash.
!                  */
!                 if (inBuf->len > 0 &&
!                     inBuf->data[inBuf->len - 1] == '\\')
                  {
                      /* discard backslash from inBuf */
                      inBuf->data[--inBuf->len] = '\0';
                      continue;
                  }
                  else
                  {
!                     /* keep the newline character, but end the command */
                      appendStringInfoChar(inBuf, '\n');
                      break;
                  }
              }
              else
!             {
!                 /*
!                  * In -j mode, semicolon followed by two newlines ends the
!                  * command; otherwise treat newline as regular character.
!                  */
!                 if (inBuf->len > 1 &&
!                     inBuf->data[inBuf->len - 1] == '\n' &&
!                     inBuf->data[inBuf->len - 2] == ';')
!                 {
!                     /* might as well drop the second newline */
!                     break;
!                 }
!             }
          }

!         /* Not newline, or newline treated as regular character */
!         appendStringInfoChar(inBuf, (char) c);
      }

!     /* No input before EOF signal means time to quit. */
!     if (c == EOF && inBuf->len == 0)
          return EOF;

      /*
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index feeff9e..b056639 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*************** static void set_null_conf(void);
*** 244,264 ****
  static void test_config_settings(void);
  static void setup_config(void);
  static void bootstrap_template1(void);
! static void setup_auth(void);
! static void get_set_pwd(void);
! static void setup_depend(void);
! static void setup_sysviews(void);
! static void setup_description(void);
! static void setup_collation(void);
! static void setup_conversion(void);
! static void setup_dictionary(void);
! static void setup_privileges(void);
  static void set_info_version(void);
! static void setup_schema(void);
! static void load_plpgsql(void);
! static void vacuum_db(void);
! static void make_template0(void);
! static void make_postgres(void);
  static void fsync_pgdata(void);
  static void trapsig(int signum);
  static void check_ok(void);
--- 244,264 ----
  static void test_config_settings(void);
  static void setup_config(void);
  static void bootstrap_template1(void);
! static void setup_auth(FILE *cmdfd);
! static void get_set_pwd(FILE *cmdfd);
! static void setup_depend(FILE *cmdfd);
! static void setup_sysviews(FILE *cmdfd);
! static void setup_description(FILE *cmdfd);
! static void setup_collation(FILE *cmdfd);
! static void setup_conversion(FILE *cmdfd);
! static void setup_dictionary(FILE *cmdfd);
! static void setup_privileges(FILE *cmdfd);
  static void set_info_version(void);
! static void setup_schema(FILE *cmdfd);
! static void load_plpgsql(FILE *cmdfd);
! static void vacuum_db(FILE *cmdfd);
! static void make_template0(FILE *cmdfd);
! static void make_postgres(FILE *cmdfd);
  static void fsync_pgdata(void);
  static void trapsig(int signum);
  static void check_ok(void);
*************** bootstrap_template1(void)
*** 1545,1553 ****
   * set up the shadow password table
   */
  static void
! setup_auth(void)
  {
-     PG_CMD_DECL;
      const char **line;
      static const char *pg_authid_setup[] = {
          /*
--- 1545,1552 ----
   * set up the shadow password table
   */
  static void
! setup_auth(FILE *cmdfd)
  {
      const char **line;
      static const char *pg_authid_setup[] = {
          /*
*************** setup_auth(void)
*** 1558,1589 ****
          NULL
      };

-     fputs(_("initializing pg_authid ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = pg_authid_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * get the superuser password if required, and call postgres to set it
   */
  static void
! get_set_pwd(void)
  {
-     PG_CMD_DECL;
-
      char       *pwd1,
                 *pwd2;

--- 1557,1572 ----
          NULL
      };

      for (line = pg_authid_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
  }

  /*
   * get the superuser password if required, and call postgres to set it
   */
  static void
! get_set_pwd(FILE *cmdfd)
  {
      char       *pwd1,
                 *pwd2;

*************** get_set_pwd(void)
*** 1640,1673 ****
          pwd1 = pg_strdup(pwdbuf);

      }
-     printf(_("setting password ... "));
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;

      PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n",
                     username, escape_quotes(pwd1));

-     /* MM: pwd1 is no longer needed, freeing it */
      free(pwd1);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * set up pg_depend
   */
  static void
! setup_depend(void)
  {
-     PG_CMD_DECL;
      const char **line;
      static const char *pg_depend_setup[] = {
          /*
--- 1623,1641 ----
          pwd1 = pg_strdup(pwdbuf);

      }

      PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n",
                     username, escape_quotes(pwd1));

      free(pwd1);
  }

  /*
   * set up pg_depend
   */
  static void
! setup_depend(FILE *cmdfd)
  {
      const char **line;
      static const char *pg_depend_setup[] = {
          /*
*************** setup_depend(void)
*** 1684,1693 ****
           * First delete any already-made entries; PINs override all else, and
           * must be the only entries for their objects.
           */
!         "DELETE FROM pg_depend;\n",
!         "VACUUM pg_depend;\n",
!         "DELETE FROM pg_shdepend;\n",
!         "VACUUM pg_shdepend;\n",

          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
          " FROM pg_class;\n",
--- 1652,1661 ----
           * First delete any already-made entries; PINs override all else, and
           * must be the only entries for their objects.
           */
!         "DELETE FROM pg_depend;\n\n",
!         "VACUUM pg_depend;\n\n",
!         "DELETE FROM pg_shdepend;\n\n",
!         "VACUUM pg_shdepend;\n\n",

          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
          " FROM pg_class;\n",
*************** setup_depend(void)
*** 1740,1819 ****
          NULL
      };

-     fputs(_("initializing dependencies ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = pg_depend_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * set up system views
   */
  static void
! setup_sysviews(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **sysviews_setup;

-     fputs(_("creating system views ... "), stdout);
-     fflush(stdout);
-
      sysviews_setup = readfile(system_views_file);

-     /*
-      * We use -j here to avoid backslashing stuff in system_views.sql
-      */
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s -j template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = sysviews_setup; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
          free(*line);
      }

-     PG_CMD_CLOSE;
-
      free(sysviews_setup);
-
-     check_ok();
  }

  /*
   * load description data
   */
  static void
! setup_description(void)
  {
-     PG_CMD_DECL;
-
-     fputs(_("loading system objects' descriptions ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_description ( "
                  "    objoid oid, "
                  "    classname name, "
--- 1708,1743 ----
          NULL
      };

      for (line = pg_depend_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
  }

  /*
   * set up system views
   */
  static void
! setup_sysviews(FILE *cmdfd)
  {
      char      **line;
      char      **sysviews_setup;

      sysviews_setup = readfile(system_views_file);

      for (line = sysviews_setup; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
          free(*line);
      }

      free(sysviews_setup);
  }

  /*
   * load description data
   */
  static void
! setup_description(FILE *cmdfd)
  {
      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_description ( "
                  "    objoid oid, "
                  "    classname name, "
*************** setup_description(void)
*** 1853,1862 ****
                  "  WHERE opdesc NOT LIKE 'deprecated%' AND "
                  "  NOT EXISTS (SELECT 1 FROM pg_description "
            "    WHERE objoid = p_oid AND classoid = 'pg_proc'::regclass);\n");
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  #ifdef HAVE_LOCALE_T
--- 1777,1782 ----
*************** normalize_locale_name(char *new, const c
*** 1899,1905 ****
   * populate pg_collation
   */
  static void
! setup_collation(void)
  {
  #if defined(HAVE_LOCALE_T) && !defined(WIN32)
      int            i;
--- 1819,1825 ----
   * populate pg_collation
   */
  static void
! setup_collation(FILE *cmdfd)
  {
  #if defined(HAVE_LOCALE_T) && !defined(WIN32)
      int            i;
*************** setup_collation(void)
*** 1907,1930 ****
      char        localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
      int            count = 0;

-     PG_CMD_DECL;
- #endif
-
-     fputs(_("creating collations ... "), stdout);
-     fflush(stdout);
-
- #if defined(HAVE_LOCALE_T) && !defined(WIN32)
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
      locale_a_handle = popen_check("locale -a", "r");
      if (!locale_a_handle)
          return;                    /* complaint already printed */

-     PG_CMD_OPEN;
-
      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_collation ( "
                  "    collname name, "
                  "    locale name, "
--- 1827,1836 ----
*************** setup_collation(void)
*** 2032,2048 ****
         "  ORDER BY collname, encoding, (collname = locale) DESC, locale;\n");

      pclose(locale_a_handle);
-     PG_CMD_CLOSE;

-     check_ok();
      if (count == 0 && !debug)
      {
          printf(_("No usable system locales were found.\n"));
          printf(_("Use the option \"--debug\" to see details.\n"));
      }
- #else                            /* not HAVE_LOCALE_T && not WIN32 */
-     printf(_("not supported on this platform\n"));
-     fflush(stdout);
  #endif   /* not HAVE_LOCALE_T  && not WIN32 */
  }

--- 1938,1949 ----
*************** setup_collation(void)
*** 2050,2071 ****
   * load conversion functions
   */
  static void
! setup_conversion(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **conv_lines;

-     fputs(_("creating conversions ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      conv_lines = readfile(conversion_file);
      for (line = conv_lines; *line != NULL; line++)
      {
--- 1951,1961 ----
   * load conversion functions
   */
  static void
! setup_conversion(FILE *cmdfd)
  {
      char      **line;
      char      **conv_lines;

      conv_lines = readfile(conversion_file);
      for (line = conv_lines; *line != NULL; line++)
      {
*************** setup_conversion(void)
*** 2075,2109 ****
      }

      free(conv_lines);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * load extra dictionaries (Snowball stemmers)
   */
  static void
! setup_dictionary(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **conv_lines;

-     fputs(_("creating dictionaries ... "), stdout);
-     fflush(stdout);
-
-     /*
-      * We use -j here to avoid backslashing stuff
-      */
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s -j template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      conv_lines = readfile(dictionary_file);
      for (line = conv_lines; *line != NULL; line++)
      {
--- 1965,1981 ----
      }

      free(conv_lines);
  }

  /*
   * load extra dictionaries (Snowball stemmers)
   */
  static void
! setup_dictionary(FILE *cmdfd)
  {
      char      **line;
      char      **conv_lines;

      conv_lines = readfile(dictionary_file);
      for (line = conv_lines; *line != NULL; line++)
      {
*************** setup_dictionary(void)
*** 2112,2121 ****
      }

      free(conv_lines);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
--- 1984,1989 ----
*************** setup_dictionary(void)
*** 2130,2138 ****
   * set (NOT NULL).
   */
  static void
! setup_privileges(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **priv_lines;
      static char *privileges_setup[] = {
--- 1998,2005 ----
   * set (NOT NULL).
   */
  static void
! setup_privileges(FILE *cmdfd)
  {
      char      **line;
      char      **priv_lines;
      static char *privileges_setup[] = {
*************** setup_privileges(void)
*** 2145,2168 ****
          NULL
      };

-     fputs(_("setting privileges on built-in objects ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      priv_lines = replace_token(privileges_setup, "$POSTGRES_SUPERUSERNAME",
                                 escape_quotes(username));
      for (line = priv_lines; *line != NULL; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
--- 2012,2021 ----
*************** set_info_version(void)
*** 2197,2223 ****
   * load info schema and populate from features file
   */
  static void
! setup_schema(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **lines;

-     fputs(_("creating information schema ... "), stdout);
-     fflush(stdout);
-
      lines = readfile(info_schema_file);

-     /*
-      * We use -j here to avoid backslashing stuff in information_schema.sql
-      */
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s -j template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = lines; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
--- 2050,2062 ----
   * load info schema and populate from features file
   */
  static void
! setup_schema(FILE *cmdfd)
  {
      char      **line;
      char      **lines;

      lines = readfile(info_schema_file);

      for (line = lines; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
*************** setup_schema(void)
*** 2226,2240 ****

      free(lines);

-     PG_CMD_CLOSE;
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      PG_CMD_PRINTF1("UPDATE information_schema.sql_implementation_info "
                     "  SET character_value = '%s' "
                     "  WHERE implementation_info_name = 'DBMS VERSION';\n",
--- 2065,2070 ----
*************** setup_schema(void)
*** 2245,2390 ****
                     "  sub_feature_name, is_supported, comments) "
                     " FROM E'%s';\n",
                     escape_quotes(features_file));
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * load PL/pgsql server-side language
   */
  static void
! load_plpgsql(void)
  {
-     PG_CMD_DECL;
-
-     fputs(_("loading PL/pgSQL server-side language ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      PG_CMD_PUTS("CREATE EXTENSION plpgsql;\n");
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * clean everything up in template1
   */
  static void
! vacuum_db(void)
  {
-     PG_CMD_DECL;
-
-     fputs(_("vacuuming database template1 ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      /* Run analyze before VACUUM so the statistics are frozen. */
!     PG_CMD_PUTS("ANALYZE;\nVACUUM FREEZE;\n");
!
!     PG_CMD_CLOSE;
!
!     check_ok();
  }

  /*
   * copy template1 to template0
   */
  static void
! make_template0(void)
  {
-     PG_CMD_DECL;
      const char **line;
      static const char *template0_setup[] = {
!         "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;\n",

          /*
           * We use the OID of template0 to determine lastsysoid
           */
          "UPDATE pg_database SET datlastsysoid = "
          "    (SELECT oid FROM pg_database "
!         "    WHERE datname = 'template0');\n",

          /*
           * Explicitly revoke public create-schema and create-temp-table
           * privileges in template1 and template0; else the latter would be on
           * by default
           */
!         "REVOKE CREATE,TEMPORARY ON DATABASE template1 FROM public;\n",
!         "REVOKE CREATE,TEMPORARY ON DATABASE template0 FROM public;\n",

!         "COMMENT ON DATABASE template0 IS 'unmodifiable empty database';\n",

          /*
           * Finally vacuum to clean up dead rows in pg_database
           */
!         "VACUUM FULL pg_database;\n",
          NULL
      };

-     fputs(_("copying template1 to template0 ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = template0_setup; *line; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * copy template1 to postgres
   */
  static void
! make_postgres(void)
  {
-     PG_CMD_DECL;
      const char **line;
      static const char *postgres_setup[] = {
!         "CREATE DATABASE postgres;\n",
!         "COMMENT ON DATABASE postgres IS 'default administrative connection database';\n",
          NULL
      };

-     fputs(_("copying template1 to postgres ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = postgres_setup; *line; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
--- 2075,2154 ----
                     "  sub_feature_name, is_supported, comments) "
                     " FROM E'%s';\n",
                     escape_quotes(features_file));
  }

  /*
   * load PL/pgsql server-side language
   */
  static void
! load_plpgsql(FILE *cmdfd)
  {
      PG_CMD_PUTS("CREATE EXTENSION plpgsql;\n");
  }

  /*
   * clean everything up in template1
   */
  static void
! vacuum_db(FILE *cmdfd)
  {
      /* Run analyze before VACUUM so the statistics are frozen. */
!     PG_CMD_PUTS("ANALYZE;\n\nVACUUM FREEZE;\n\n");
  }

  /*
   * copy template1 to template0
   */
  static void
! make_template0(FILE *cmdfd)
  {
      const char **line;
      static const char *template0_setup[] = {
!         "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;\n\n",

          /*
           * We use the OID of template0 to determine lastsysoid
           */
          "UPDATE pg_database SET datlastsysoid = "
          "    (SELECT oid FROM pg_database "
!         "    WHERE datname = 'template0');\n\n",

          /*
           * Explicitly revoke public create-schema and create-temp-table
           * privileges in template1 and template0; else the latter would be on
           * by default
           */
!         "REVOKE CREATE,TEMPORARY ON DATABASE template1 FROM public;\n\n",
!         "REVOKE CREATE,TEMPORARY ON DATABASE template0 FROM public;\n\n",

!         "COMMENT ON DATABASE template0 IS 'unmodifiable empty database';\n\n",

          /*
           * Finally vacuum to clean up dead rows in pg_database
           */
!         "VACUUM FULL pg_database;\n\n",
          NULL
      };

      for (line = template0_setup; *line; line++)
          PG_CMD_PUTS(*line);
  }

  /*
   * copy template1 to postgres
   */
  static void
! make_postgres(FILE *cmdfd)
  {
      const char **line;
      static const char *postgres_setup[] = {
!         "CREATE DATABASE postgres;\n\n",
!         "COMMENT ON DATABASE postgres IS 'default administrative connection database';\n\n",
          NULL
      };

      for (line = postgres_setup; *line; line++)
          PG_CMD_PUTS(*line);
  }

  /*
*************** warn_on_mount_point(int error)
*** 3303,3308 ****
--- 3067,3073 ----
  void
  initialize_data_directory(void)
  {
+     PG_CMD_DECL;
      int            i;

      setup_signals();
*************** initialize_data_directory(void)
*** 3343,3377 ****
       */
      write_version_file("base/1");

!     /* Create the stuff we don't need to use bootstrap mode for */

!     setup_auth();
      if (pwprompt || pwfilename)
!         get_set_pwd();

!     setup_depend();

!     setup_sysviews();

!     setup_description();

!     setup_collation();

!     setup_conversion();

!     setup_dictionary();

!     setup_privileges();

!     setup_schema();

!     load_plpgsql();

!     vacuum_db();

!     make_template0();

!     make_postgres();
  }


--- 3108,3158 ----
       */
      write_version_file("base/1");

!     /*
!      * Create the stuff we don't need to use bootstrap mode for, using a
!      * backend running in simple standalone mode.
!      */
!     fputs(_("performing post-bootstrap initialization ... "), stdout);
!     fflush(stdout);

!     snprintf(cmd, sizeof(cmd),
!              "\"%s\" %s -j template1 >%s",
!              backend_exec, backend_options,
!              DEVNULL);
!
!     PG_CMD_OPEN;
!
!     setup_auth(cmdfd);
      if (pwprompt || pwfilename)
!         get_set_pwd(cmdfd);

!     setup_depend(cmdfd);

!     setup_sysviews(cmdfd);

!     setup_description(cmdfd);

!     setup_collation(cmdfd);

!     setup_conversion(cmdfd);

!     setup_dictionary(cmdfd);

!     setup_privileges(cmdfd);

!     setup_schema(cmdfd);

!     load_plpgsql(cmdfd);

!     vacuum_db(cmdfd);

!     make_template0(cmdfd);

!     make_postgres(cmdfd);
!
!     PG_CMD_CLOSE;
!
!     check_ok();
  }
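
The net shape of the change above — open one pipe to a single standalone backend and hand the FILE * to every post-bootstrap step — can be sketched roughly as follows. This is an illustrative sketch only; the step names and runner are hypothetical, not the real initdb internals:

```c
#include <stdio.h>
#include <string.h>

/* Each step takes the already-open command stream instead of opening
 * (and later checkpointing/shutting down) a backend of its own. */
typedef void (*setup_step) (FILE *cmdfd);

void
load_plpgsql_step(FILE *cmdfd)
{
    fputs("CREATE EXTENSION plpgsql;\n\n", cmdfd);
}

void
vacuum_step(FILE *cmdfd)
{
    fputs("ANALYZE;\n\nVACUUM FREEZE;\n\n", cmdfd);
}

/* Run every step against one command stream; the caller opens the pipe
 * once and closes it once, so there is a single backend start/stop. */
void
run_post_bootstrap(FILE *cmdfd)
{
    static const setup_step steps[] = {load_plpgsql_step, vacuum_step, NULL};
    int         i;

    for (i = 0; steps[i] != NULL; i++)
        steps[i](cmdfd);
}
```

With this shape, adding a step is just appending a function pointer; none of the per-step open/close boilerplate from the left side of the diff is needed.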



Re: Bootstrap DATA is a pita

From
Andres Freund
Date:
On 2015-12-12 13:28:28 -0500, Tom Lane wrote:
> BTW, there's another thing I'd like to see improved in this area, which is
> a problem already but will get a lot worse if we push more work into the
> post-bootstrap phase of initdb.  That is that the post-bootstrap phase is
> both inefficient and impossible to debug.  If you've ever had a failure
> there, you'll have seen that the backend spits out an entire SQL script
> and says there's an error in it somewhere; that's because it gets the
> whole per-stage script as one submission.

Seen that more than once :(

> Breaking the stage scripts down further would help, but that is
> unattractive because each one requires a fresh backend startup/shutdown,
> including a full checkpoint.  I'd like to see things rejiggered so that
> there's only one post-bootstrap standalone backend session that performs
> all the steps, but initdb feeds it just one SQL command at a time so that
> errors are better localized.  That should both speed up initdb noticeably
> and make debugging easier.

One way to do that would be to not use the single user mode for that
stage. Afair, at that point we could actually just start the cluster
"normally", with the socket pointing somewhere locally (ugh, some path
length issues afoot, maybe allow relative directories? And, uh,
windows), and use psql to do the splitting and everything for us.

Your approach has probably some significant performance benefits,
because it essentially does pipelining. So while aesthetically
attractive, I'm afraid my proposal would lead to worse performance. So
it's probably actually DOA :(

Andres



On 2015-12-12 17:31:49 -0500, Tom Lane wrote:
> I thought this sounded like a nice lazy-Saturday project, so I started
> poking at it, and attached is a WIP patch.

Not bad, not bad at all.


> After some experimentation, I came up with the idea of executing any
> time that a semicolon followed by two newlines is seen.  ...
> but it turns out that no such case exists anywhere in initdb's data.

Not pretty, but hardly any worse than the current situation.


> I'm not particularly wedded to this rule.  In principle we could go so
> far as to import psql's code that parses commands and figures out which
> semicolons are command terminators --- but that is a pretty large chunk
> of code, and I think it'd really be overkill considering that initdb
> deals only with fixed input scripts.

Having that code somewhere abstracted wouldn't be bad though, extension
scripts have a somewhat similar problem.


> Anyway, the attached patch tweaks postgres.c to follow that rule instead
> of slurp-to-EOF when -j is given.  I doubt that being non-backwards-
> compatible is a problem here; in fact, I'm tempted to rip out the -j
> switch altogether and just have standalone mode always parse input the
> same way.

No objection here.


> Does anyone know of people using standalone mode other than
> for initdb?

Unfortunately yes. There's docker instances around that configure users
and everything using it.


> The other part of the patch modifies initdb to do all its post-bootstrap
> steps using a single standalone backend session.  I had to remove the
> code that currently prints out progress markers for individual phases
> of that processing, so that now you get output that looks like

That's cool too. Besides processing the .bki files, and there largely
reg*_in, the many restarts are the most expensive parts of initdb.


Greetings,

Andres Freund



On 12/12/2015 02:31 PM, Tom Lane wrote:
> I'm not particularly wedded to this rule.  In principle we could go so
> far as to import psql's code that parses commands and figures out which
> semicolons are command terminators --- but that is a pretty large chunk
> of code, and I think it'd really be overkill considering that initdb
> deals only with fixed input scripts.  But if anyone has another simple
> rule for breaking SQL into commands, we can certainly discuss
> alternatives.

Possibly inadequate, but I wrote a get_one_query() function to grab one
statement at a time from a possibly multi-statement string and it isn't
all that many lines of code:
 https://github.com/jconway/pgsynck/blob/master/pgsynck.c

> Anyway, the attached patch tweaks postgres.c to follow that rule instead
> of slurp-to-EOF when -j is given.  I doubt that being non-backwards-
> compatible is a problem here; in fact, I'm tempted to rip out the -j
> switch altogether and just have standalone mode always parse input the
> same way.  Does anyone know of people using standalone mode other than
> for initdb?

sepgsql uses it for installation, but it does not appear to use -j.
I'm not sure why it is required, but at some point I'd like to dig into that.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Andres Freund <andres@anarazel.de> writes:
> On 2015-12-12 17:31:49 -0500, Tom Lane wrote:
>> Does anyone know of people using standalone mode other than
>> for initdb?

> Unfortunately yes. There's docker instances around that configure users
> and everything using it.

Hm, that means that we *do* have to worry about backwards compatibility.
We might be able to get away with changing the behavior of -j mode anyway,
though, since this proposal mostly only changes when execution happens
and not what is valid input for -j mode.  (It would probably break some
apps if we took away the switch, since right now, you do not need a
semicolon to terminate commands in the regular standalone mode.)
Failing that, we could define a new switch, I guess.

> That's cool too. Besides processing the .bki files, and there largely
> reg*_in, the many restarts are the most expensive parts of initdb.

Right.  The proposal we were discussing upthread would move all the reg*
lookups into creation of the .bki file, basically, which would improve
that part of things quite a bit.  (BTW, if we are concerned about initdb
speed, that might be a reason not to be too eager to shift processing
from bootstrap to non-bootstrap mode.  Other than the reg* issue,
bootstrap is certainly a far faster way to put rows into the catalogs
than individual SQL commands could be.)
        regards, tom lane



Joe Conway <mail@joeconway.com> writes:
> On 12/12/2015 02:31 PM, Tom Lane wrote:
>> I'm not particularly wedded to this rule.  In principle we could go so
>> far as to import psql's code that parses commands and figures out which
>> semicolons are command terminators --- but that is a pretty large chunk
>> of code, and I think it'd really be overkill considering that initdb
>> deals only with fixed input scripts.  But if anyone has another simple
>> rule for breaking SQL into commands, we can certainly discuss
>> alternatives.

> Possibly inadequate, but I wrote a get_one_query() function to grab one
> statement at a time from a possibly multi-statement string and it isn't
> all that many lines of code:
>   https://github.com/jconway/pgsynck/blob/master/pgsynck.c

Hmm.  Doesn't look like that handles semicolons embedded in CREATE RULE;
for that you'd have to track parenthesis nesting as well.  (It's arguable
that we won't ever need that case during initdb, but I'd just as soon not
wire in such an assumption.)  In general, though, I'd rather not try to
teach InteractiveBackend() such a large amount about SQL syntax.

With a rule like "break at ;\n\n" it's possible to ensure that command
breaks occur only where wanted, though in corner cases you might have to
format your input oddly.  (For instance, if you needed that in a SQL
literal, you might resort to E';\n\n' or use the standard's rules about
concatenated string literals.)  If you get it wrong the consequences
aren't too disastrous: you'll get an unterminated-input syntax error,
or in the other direction multiple commands will get run together for
execution, which most of the time isn't a big issue.
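
A minimal sketch of that break rule (a hypothetical helper, not the patch's actual code):

```c
#include <string.h>

/*
 * Sketch of the proposed -j rule: a command ends at the first semicolon
 * that is immediately followed by two newlines.  Copy the next complete
 * command into "out" and return a pointer past its terminator, or NULL
 * if no full command is buffered yet.
 */
const char *
read_one_command(const char *buf, char *out, size_t outsize)
{
    const char *end = strstr(buf, ";\n\n");

    if (end == NULL)
        return NULL;            /* keep reading input */
    end++;                      /* include the semicolon */
    if ((size_t) (end - buf) >= outsize)
        return NULL;            /* command too long for caller's buffer */
    memcpy(out, buf, end - buf);
    out[end - buf] = '\0';
    return end + 2;             /* skip the two newlines */
}
```

A literal containing the raw bytes ";\n\n" would fool this, which is why the suggestion above is to spell it E';\n\n' in the input files: the escaped form never contains the raw byte sequence.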

>> Does anyone know of people using standalone mode other than
>> for initdb?

> sepgsql uses it for installation, but it does not appear to use -j
> I'm not sure why it is required but at some point I'd like to dig into that.

It might be easier than starting a full postmaster and having to figure
out a secure place for the socket etc.  I'm prepared to back off the
proposal about changing the default behavior of standalone mode; that
leaves us with a choice between changing -j's behavior and inventing
a new switch.
        regards, tom lane



Andres Freund <andres@anarazel.de> writes:
> That's cool too. Besides processing the .bki files, and there largely
> reg*_in, the many restarts are the most expensive parts of initdb.

BTW, in case anyone is doubting it, I did a little bit of "perf" tracing
and confirmed Andres' comment here: more than 50% of the runtime of the
bootstrap phase is eaten by the pg_proc seqscans performed by regprocin.
There's nothing else amounting to more than a few percent, so basically
nothing else in bootstrap is worth optimizing before we fix that.
        regards, tom lane



> On Dec 12, 2015, at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Joe Conway <mail@joeconway.com> writes:
>> On 12/12/2015 02:31 PM, Tom Lane wrote:
>>> I'm not particularly wedded to this rule.  In principle we could go so
>>> far as to import psql's code that parses commands and figures out which
>>> semicolons are command terminators --- but that is a pretty large chunk
>>> of code, and I think it'd really be overkill considering that initdb
>>> deals only with fixed input scripts.  But if anyone has another simple
>>> rule for breaking SQL into commands, we can certainly discuss
>>> alternatives.
>
>> Possibly inadequate, but I wrote a get_one_query() function to grab one
>> statement at a time from a possibly multi-statement string and it isn't
>> all that many lines of code:
>>  https://github.com/jconway/pgsynck/blob/master/pgsynck.c
>
> Hmm.  Doesn't look like that handles semicolons embedded in CREATE RULE;
> for that you'd have to track parenthesis nesting as well.  (It's arguable
> that we won't ever need that case during initdb, but I'd just as soon not
> wire in such an assumption.)  In general, though, I'd rather not try to
> teach InteractiveBackend() such a large amount about SQL syntax.

I use CREATE RULE within startup files in the fork that I maintain.  I have
lots of them, totaling perhaps 50k lines of rule code.  I don't think any of that
code would have a problem with the double-newline separation you propose,
which seems a more elegant solution to me.  Admittedly, the double-newline
separation would need to be documented at the top of each sql file, otherwise
it would be quite surprising to those unfamiliar with it.

You mentioned upthread that introducing a syntax error in one of these files
results in a not-so-helpful error message that dumps the contents of the
entire file.  I can confirm that happens, and it is hardly useful.

mark




Mark Dilger <hornschnorter@gmail.com> writes:
>> On Dec 12, 2015, at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... In general, though, I'd rather not try to
>> teach InteractiveBackend() such a large amount about SQL syntax.

> I use CREATE RULE within startup files in the fork that I maintain.  I have
> lots of them, totaling perhaps 50k lines of rule code.  I don't think any of that
> code would have a problem with the double-newline separation you propose,
> which seems a more elegant solution to me.

Yeah?  Just for proof-of-concept, could you run your startup files with
the postgres.c patch as proposed, and see whether you get any failures?

> Admittedly, the double-newline
> separation would need to be documented at the top of each sql file, otherwise
> it would be quite surprising to those unfamiliar with it.

Agreed, that wouldn't be a bad thing.

I thought of a positive argument not to do the "fully right" thing by
means of implementing the exactly-right command boundary rules.  Suppose
that you mess up in information_schema.sql or another large input file
by introducing an extra left parenthesis in some query.  What would happen
if InteractiveBackend() were cognizant of the paren-matching rule is that
it would slurp everything till the end-of-file and then produce a syntax
error message quoting all that text; not much better than what happens
today.  With a command break rule like semi-newline-newline, there'll be
a limited horizon as to how much text gets swallowed before you get the
error message.

Note that this is different from the situation with a fully interactive
input processor like psql: if you're typing the same thing in psql,
you'll realize as soon as it doesn't execute the command when-expected
that something is wrong.  You won't type another thousand lines of input
before looking closely at what you typed already.

I'm still not quite sold on semi-newline-newline as being the best
possible command boundary rule here; but I do think that "fully correct"
boundary rules are less attractive than they might sound.
        regards, tom lane



On 13 December 2015 at 06:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
 
I'm not particularly wedded to this rule.  In principle we could go so
far as to import psql's code that parses commands and figures out which
semicolons are command terminators --- but that is a pretty large chunk
of code, and I think it'd really be overkill considering that initdb
deals only with fixed input scripts.  

Shouldn't that be a bison/flex job anyway, rather than hand-coded? Or a simple(ish) state machine?

Dollar-quoted strings are probably the only quite ugly bit due to their arbitrary delimiters. So I thought I'd sketch out how it'd look as a state machine. At which point I remembered that we allow $ in identifiers too. So the state machine would have to bother with unquoted identifiers. Of course $ marks parameters, so it has to keep track of if it's reading a parameter. At which point you have half an SQL parser.

This strikes me as a really good reason for making it re-usable, because it's horrid to write code that handles statement splitting in the PostgreSQL dialect.

Optional handling of psql \commands would be required, but that'd make it easier for PgAdmin to support psql backslash commands, so there'd be a win there too.

I figured I'd sketch it out for kicks. Comment: yuck.

States would be at least:

SQL_TEXT
SQL_UNQUOTED_IDENTIFIER_OR_KEYWORD
NUMBER
QUOTED_IDENTIFIER
QUOTED_IDENTIFIER_QUOTE
SQL_TEXT_DOLLAR
DOLLAR_STRING_START_DELIM
DOLLAR_STRING
DOLLAR_STRING_DOLLAR
DOLLAR_STRING_END_DELIM
STANDARD_STRING
STANDARD_STRING_QUOTE
SQL_TEXT_E
ESCAPE_STRING
ESCAPE_STRING_ESCAPE

Transitions

SQL_TEXT => { SQL_TEXT, SQL_UNQUOTED_IDENTIFIER_OR_KEYWORD, NUMBER, QUOTED_IDENTIFIER, SQL_TEXT_DOLLAR, STANDARD_STRING, SQL_TEXT_E, ESCAPE_STRING }

SQL_TEXT_E => { SQL_TEXT, ESCAPE_STRING, SQL_UNQUOTED_IDENTIFIER_OR_KEYWORD }

SQL_TEXT_DOLLAR => { SQL_TEXT, NUMBER, DOLLAR_STRING_START_DELIM }

QUOTED_IDENTIFIER => { QUOTED_IDENTIFIER, QUOTED_IDENTIFIER_QUOTE }

QUOTED_IDENTIFIER_QUOTE => { SQL_TEXT, QUOTED_IDENTIFIER }

DOLLAR_STRING_START_DELIM => { DOLLAR_STRING_START_DELIM, DOLLAR_STRING }

DOLLAR_STRING => { DOLLAR_STRING, DOLLAR_STRING_DOLLAR }

DOLLAR_STRING_END_DELIM => { DOLLAR_STRING_END_DELIM, SQL_TEXT, DOLLAR_STRING }

STANDARD_STRING => { STANDARD_STRING, STANDARD_STRING_QUOTE }

STANDARD_STRING_QUOTE => { SQL_TEXT, STANDARD_STRING }

ESCAPE_STRING => { ESCAPE_STRING, ESCAPE_STRING_ESCAPE }

ESCAPE_STRING_ESCAPE => { SQL_TEXT, ESCAPE_STRING }


NUMBER consumes sequential digits and period chars and returns to SQL_TEXT at any non-digit. (That way it can handle Pg's lazy parser quirks like SELECT 123"f" being legal, and equivalent to SELECT 123 AS "f").

SQL_UNQUOTED_IDENTIFIER_OR_KEYWORD is needed because a $ within an identifier is part of the identifier, so it can't just be consumed as SQL_TEXT.

For dollar strings, when a $ is found when reading SQL text (not an identifier/keyword), enter SQL_TEXT_DOLLAR. What comes next must be a parameter or the start of a dollar string. If the next char is a digit then it's a parameter so switch to NUMBER, since dollar-quoted string delims follow identifier rules and unquoted identifiers can't start with a number. Otherwise switch to DOLLAR_STRING_START_DELIM and consume until a $ is found or something illegal in an identifier is found. Or of course it could be lone $ which is bogus syntax but as far as we're concerned still just SQL_TEXT. Really, this is just looking for a dollar-quote start and doesn't care what it finds if it isn't a valid dollar-quote start.

If a valid dollar-quote delim is found switch to DOLLAR_STRING and read until we find the matching delim using a similar process, entering DOLLAR_STRING_DOLLAR, looking for param vs end delim, etc. When a full delimiter is read compare to the start delimiter and switch back to SQL_TEXT mode if it matches, otherwise remain in DOLLAR_STRING.

If an invalid dollar string delim was found switch back to SQL_TEXT (since it wasn't a valid beginning of a dollar string) and continue.

For QUOTED_IDENTIFIER_QUOTE and STANDARD_STRING_QUOTE, it found a " or ' while reading a quoted identifier or standard string and transitioned into that state. If the next char doubles the quote it'll return to reading the string; otherwise it'll return to the SQL_TEXT state since the identifier or literal has ended.

Similarly with ESCAPE_STRING_ESCAPE. Having found an escape, consume the next char even if it's a quote and return to the string parsing.

All this ignores psql backslash commands.
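
A heavily reduced sketch of such a splitter, to give the flavor. This is hypothetical code: it tracks only quoting and comments, deliberately omits E'' escape strings, psql backslash commands, and the parenthesis tracking CREATE RULE would need, and treats any $word$ as a delimiter even though real dollar-quote tags cannot start with a digit:

```c
#include <ctype.h>
#include <string.h>

typedef enum
{
    S_TEXT, S_SQUOTE, S_DQUOTE, S_DOLLAR, S_LINE_COMMENT, S_BLOCK_COMMENT
} SplitState;

/* Return a pointer to the first command-terminating ';' in sql, or NULL. */
const char *
find_command_end(const char *sql)
{
    SplitState  state = S_TEXT;
    char        tag[64];        /* active dollar-quote delimiter, e.g. "$fn$" */
    size_t      taglen = 0;
    const char *p;

    for (p = sql; *p; p++)
    {
        switch (state)
        {
            case S_TEXT:
                if (*p == ';')
                    return p;
                else if (*p == '\'')
                    state = S_SQUOTE;
                else if (*p == '"')
                    state = S_DQUOTE;
                else if (*p == '-' && p[1] == '-')
                    state = S_LINE_COMMENT;
                else if (*p == '/' && p[1] == '*')
                {
                    state = S_BLOCK_COMMENT;
                    p++;
                }
                else if (*p == '$')
                {
                    /* try to read a dollar-quote start delimiter: $tag$ */
                    const char *q = p + 1;

                    while (isalnum((unsigned char) *q) || *q == '_')
                        q++;
                    if (*q == '$' && (size_t) (q - p + 1) < sizeof(tag))
                    {
                        taglen = q - p + 1;
                        memcpy(tag, p, taglen);
                        state = S_DOLLAR;
                        p = q;
                    }
                    /* else: lone '$' or a parameter; stay in S_TEXT */
                }
                break;
            case S_SQUOTE:
                if (*p == '\'')
                {
                    if (p[1] == '\'')
                        p++;    /* doubled quote stays in the string */
                    else
                        state = S_TEXT;
                }
                break;
            case S_DQUOTE:
                if (*p == '"')
                {
                    if (p[1] == '"')
                        p++;    /* doubled quote stays in the identifier */
                    else
                        state = S_TEXT;
                }
                break;
            case S_DOLLAR:
                if (*p == '$' && strncmp(p, tag, taglen) == 0)
                {
                    p += taglen - 1;    /* matched the closing delimiter */
                    state = S_TEXT;
                }
                break;
            case S_LINE_COMMENT:
                if (*p == '\n')
                    state = S_TEXT;
                break;
            case S_BLOCK_COMMENT:
                if (*p == '*' && p[1] == '/')
                {
                    state = S_TEXT;
                    p++;
                }
                break;
        }
    }
    return NULL;                /* incomplete command */
}
```

Even this stripped-down version is fiddly enough to make the point: once you add escape strings, parameters, and paren counting you really do have half an SQL lexer.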

Have I missed anything really obvious? Does it seem useful to have more re-usable statement splitting code? Is there any sane justification for doing anything but extracting what psql does verbatim while carefully changing nothing?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


> On Dec 12, 2015, at 9:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <hornschnorter@gmail.com> writes:
>>> On Dec 12, 2015, at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... In general, though, I'd rather not try to
>>> teach InteractiveBackend() such a large amount about SQL syntax.
>
>> I use CREATE RULE within startup files in the fork that I maintain.  I have
>> lots of them, totaling perhaps 50k lines of rule code.  I don't think any of that
>> code would have a problem with the double-newline separation you propose,
>> which seems a more elegant solution to me.
>
> Yeah?  Just for proof-of-concept, could you run your startup files with
> the postgres.c patch as proposed, and see whether you get any failures?

Given all the changes I've made to initdb.c in my fork, that patch
of yours doesn't apply.

mark


Craig Ringer <craig@2ndquadrant.com> writes:
> On 13 December 2015 at 06:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not particularly wedded to this rule.  In principle we could go so
>> far as to import psql's code that parses commands and figures out which
>> semicolons are command terminators --- but that is a pretty large chunk
>> of code, and I think it'd really be overkill considering that initdb
>> deals only with fixed input scripts.

> Shouldn't that be a bison/flex job anyway, rather than hand-coded?

It is, if you're speaking of how psql does it.

I thought about trying to get the backend's existing lexer to do it,
but that code will want to throw an error if it sees unterminated
input (such as an incomplete slash-star comment).  I'm not sure that
it'd be a good thing to try to make that lexer serve two masters.

I'm also getting less and less enthused about trying to share code with
psql.  In the first place, the backend has no interest in recognizing
psql backslash-commands, nor does it need to deal with some of the weird
constraints psql has like having to handle non-backend-safe encodings.
In the second, while it's reasonable for psql to deal with CREATE RULE
syntax by counting parentheses, there's a good argument that that is
not the behavior we want for noninteractive situations such as reading
information_schema.sql.  We won't, for example, have anything
corresponding to psql's changing input prompt to help debug problems.
In the third place, it's just difficult and ugly to write code that
will work in both backend and frontend contexts.  We've done it,
certainly, but not for any problem as involved as this would be.
        regards, tom lane



Mark Dilger <hornschnorter@gmail.com> writes:
> I use CREATE RULE within startup files in the fork that I maintain.  I have
> lots of them, totaling perhaps 50k lines of rule code.  I don't think any of that
> code would have a problem with the double-newline separation you propose,
> which seems a more elegant solution to me.  Admittedly, the double-newline
> separation would need to be documented at the top of each sql file, otherwise
> it would be quite surprising to those unfamiliar with it.

> You mentioned upthread that introducing a syntax error in one of these files
> results in a not-so-helpful error message that dumps the contents of the
> entire file.  I can confirm that happens, and is hardly useful.

Not having heard any ideas that sounded better than semi-newline-newline,
I went ahead and finished up this patch on that basis.  Attached are two
patch files; the first one redefines the behavior of -j, and the second
one modifies initdb to use only one standalone-backend run.  I present
them this way to emphasize that the -j change doesn't break much of
anything: initdb still works if you apply only the first patch.  And
I didn't change anything in the initdb input files, except for adding
the comment documentation Mark suggests above.

In passing in the first patch, I got rid of the TCOP_DONTUSENEWLINE
#define, which could not have been used by anyone in a very long time
because it would break initdb.  I then realized that
src/include/tcop/tcopdebug.h is completely dead code, because the
other debugging symbol it claims to specify got ripped out long ago.
And to add insult to injury, that file is included noplace.  I imagine
Bruce's pgrminclude tool got rid of the inclusion that must once have
existed in postgres.c, after observing that postgres.c still compiled
without it :-(.  (That tool really requires more adult supervision
than it has gotten.) So anyway, this patch removes tcopdebug.h entirely.

The second patch consists of removing extra backend starts/stops
and converting all of initdb's code to run in -j mode, rather than the
mishmash of -j and not-j behavior that was there before.  I changed
all the semicolon-newlines in initdb's command strings to
semicolon-newline-newlines.  As mentioned before, only a small number of
those changes *had* to be made to get it to work, namely the ones around
VACUUM and CREATE DATABASE statements, but I felt that for consistency
and error localization all of them should be changed.  I also failed to
resist the temptation to const-ify some of the arrays more thoroughly.

Barring objections I'll push this into HEAD soon.

            regards, tom lane

diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index e2e9909..d60d4ff 100644
*** a/doc/src/sgml/ref/postgres-ref.sgml
--- b/doc/src/sgml/ref/postgres-ref.sgml
*************** PostgreSQL documentation
*** 529,535 ****
      </indexterm>

      <para>
!      The following options only apply to the single-user mode.
      </para>

      <variablelist>
--- 529,537 ----
      </indexterm>

      <para>
!      The following options only apply to the single-user mode
!      (see <xref linkend="app-postgres-single-user"
!      endterm="app-postgres-single-user-title">).
      </para>

      <variablelist>
*************** PostgreSQL documentation
*** 558,564 ****
        <term><option>-E</option></term>
        <listitem>
         <para>
!         Echo all commands.
         </para>
        </listitem>
       </varlistentry>
--- 560,566 ----
        <term><option>-E</option></term>
        <listitem>
         <para>
!         Echo all commands to standard output before executing them.
         </para>
        </listitem>
       </varlistentry>
*************** PostgreSQL documentation
*** 567,573 ****
        <term><option>-j</option></term>
        <listitem>
         <para>
!         Disables use of newline as a statement delimiter.
         </para>
        </listitem>
       </varlistentry>
--- 569,576 ----
        <term><option>-j</option></term>
        <listitem>
         <para>
!         Use semicolon followed by two newlines, rather than just newline,
!         as the command entry terminator.
         </para>
        </listitem>
       </varlistentry>
*************** PostgreSQL documentation
*** 760,767 ****
    </para>
   </refsect1>

!  <refsect1>
!   <title>Usage</title>

     <para>
      To start a single-user mode server, use a command like
--- 763,770 ----
    </para>
   </refsect1>

!  <refsect1 id="app-postgres-single-user">
!   <title id="app-postgres-single-user-title">Single-User Mode</title>

     <para>
      To start a single-user mode server, use a command like
*************** PostgreSQL documentation
*** 778,807 ****
      entry terminator; there is no intelligence about semicolons,
      as there is in <application>psql</>.  To continue a command
      across multiple lines, you must type backslash just before each
!     newline except the last one.
     </para>

     <para>
!     But if you use the <option>-j</> command line switch, then newline does
!     not terminate command entry.  In this case, the server will read the standard input
!     until the end-of-file (<acronym>EOF</>) marker, then
!     process the input as a single command string.  Backslash-newline is not
!     treated specially in this case.
     </para>

     <para>
      To quit the session, type <acronym>EOF</acronym>
      (<keycombo action="simul"><keycap>Control</><keycap>D</></>, usually).
!     If you've
!     used <option>-j</>, two consecutive <acronym>EOF</>s are needed to exit.
     </para>

     <para>
      Note that the single-user mode server does not provide sophisticated
      line-editing features (no command history, for example).
!     Single-User mode also does not do any background processing, like
!     automatic checkpoints.
!
     </para>
   </refsect1>

--- 781,820 ----
      entry terminator; there is no intelligence about semicolons,
      as there is in <application>psql</>.  To continue a command
      across multiple lines, you must type backslash just before each
!     newline except the last one.  The backslash and adjacent newline are
!     both dropped from the input command.  Note that this will happen even
!     when within a string literal or comment.
     </para>

     <para>
!     But if you use the <option>-j</> command line switch, a single newline
!     does not terminate command entry; instead, the sequence
!     semicolon-newline-newline does.  That is, type a semicolon immediately
!     followed by a completely empty line.  Backslash-newline is not
!     treated specially in this mode.  Again, there is no intelligence about
!     such a sequence appearing within a string literal or comment.
!    </para>
!
!    <para>
!     In either input mode, if you type a semicolon that is not just before or
!     part of a command entry terminator, it is considered a command separator.
!     When you do type a command entry terminator, the multiple statements
!     you've entered will be executed as a single transaction.
     </para>

     <para>
      To quit the session, type <acronym>EOF</acronym>
      (<keycombo action="simul"><keycap>Control</><keycap>D</></>, usually).
!     If you've entered any text since the last command entry terminator,
!     then <acronym>EOF</acronym> will be taken as a command entry terminator,
!     and another <acronym>EOF</> will be needed to exit.
     </para>

     <para>
      Note that the single-user mode server does not provide sophisticated
      line-editing features (no command history, for example).
!     Single-user mode also does not do any background processing, such as
!     automatic checkpoints or replication.
     </para>
   </refsect1>

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 6e1b241..2239859 100644
*** a/src/backend/catalog/information_schema.sql
--- b/src/backend/catalog/information_schema.sql
***************
*** 5,10 ****
--- 5,18 ----
   * Copyright (c) 2003-2015, PostgreSQL Global Development Group
   *
   * src/backend/catalog/information_schema.sql
+  *
+  * Note: this file is read in single-user -j mode, which means that the
+  * command terminator is semicolon-newline-newline; whenever the backend
+  * sees that, it stops and executes what it's got.  If you write a lot of
+  * statements without empty lines between, they'll all get quoted to you
+  * in any error message about one of them, so don't do that.  Also, you
+  * cannot write a semicolon immediately followed by an empty line in a
+  * string literal (including a function body!) or a multiline comment.
   */

  /*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ccc030f..536c805 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 4,9 ****
--- 4,17 ----
   * Copyright (c) 1996-2015, PostgreSQL Global Development Group
   *
   * src/backend/catalog/system_views.sql
+  *
+  * Note: this file is read in single-user -j mode, which means that the
+  * command terminator is semicolon-newline-newline; whenever the backend
+  * sees that, it stops and executes what it's got.  If you write a lot of
+  * statements without empty lines between, they'll all get quoted to you
+  * in any error message about one of them, so don't do that.  Also, you
+  * cannot write a semicolon immediately followed by an empty line in a
+  * string literal (including a function body!) or a multiline comment.
   */

  CREATE VIEW pg_roles AS
diff --git a/src/backend/snowball/snowball.sql.in b/src/backend/snowball/snowball.sql.in
index 2f68393..68363f1 100644
*** a/src/backend/snowball/snowball.sql.in
--- b/src/backend/snowball/snowball.sql.in
***************
*** 1,6 ****
! -- src/backend/snowball/snowball.sql.in$

- -- text search configuration for _LANGNAME_ language
  CREATE TEXT SEARCH DICTIONARY _DICTNAME_
      (TEMPLATE = snowball, Language = _LANGNAME_ _STOPWORDS_);

--- 1,22 ----
! /*
!  * text search configuration for _LANGNAME_ language
!  *
!  * Copyright (c) 2007-2015, PostgreSQL Global Development Group
!  *
!  * src/backend/snowball/snowball.sql.in
!  *
!  * _LANGNAME_ and certain other macros are replaced for each language;
!  * see the Makefile for details.
!  *
!  * Note: this file is read in single-user -j mode, which means that the
!  * command terminator is semicolon-newline-newline; whenever the backend
!  * sees that, it stops and executes what it's got.  If you write a lot of
!  * statements without empty lines between, they'll all get quoted to you
!  * in any error message about one of them, so don't do that.  Also, you
!  * cannot write a semicolon immediately followed by an empty line in a
!  * string literal (including a function body!) or a multiline comment.
!  */

  CREATE TEXT SEARCH DICTIONARY _DICTNAME_
      (TEMPLATE = snowball, Language = _LANGNAME_ _STOPWORDS_);

diff --git a/src/backend/snowball/snowball_func.sql.in b/src/backend/snowball/snowball_func.sql.in
index e7d4510..debb0e0 100644
*** a/src/backend/snowball/snowball_func.sql.in
--- b/src/backend/snowball/snowball_func.sql.in
***************
*** 1,4 ****
! -- src/backend/snowball/snowball_func.sql.in$

  SET search_path = pg_catalog;

--- 1,21 ----
! /*
!  * Create underlying C functions for Snowball stemmers
!  *
!  * Copyright (c) 2007-2015, PostgreSQL Global Development Group
!  *
!  * src/backend/snowball/snowball_func.sql.in
!  *
!  * This file is combined with multiple instances of snowball.sql.in to
!  * build snowball_create.sql, which is executed during initdb.
!  *
!  * Note: this file is read in single-user -j mode, which means that the
!  * command terminator is semicolon-newline-newline; whenever the backend
!  * sees that, it stops and executes what it's got.  If you write a lot of
!  * statements without empty lines between, they'll all get quoted to you
!  * in any error message about one of them, so don't do that.  Also, you
!  * cannot write a semicolon immediately followed by an empty line in a
!  * string literal (including a function body!) or a multiline comment.
!  */

  SET search_path = pg_catalog;

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1dc2eb0..4ae2548 100644
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*************** static CachedPlanSource *unnamed_stmt_ps
*** 157,174 ****

  /* assorted command-line switches */
  static const char *userDoption = NULL;    /* -D switch */
-
  static bool EchoQuery = false;    /* -E switch */
!
! /*
!  * people who want to use EOF should #define DONTUSENEWLINE in
!  * tcop/tcopdebug.h
!  */
! #ifndef TCOP_DONTUSENEWLINE
! static int    UseNewLine = 1;        /* Use newlines query delimiters (the default) */
! #else
! static int    UseNewLine = 0;        /* Use EOF as query delimiters */
! #endif   /* TCOP_DONTUSENEWLINE */

  /* whether or not, and why, we were canceled by conflict with recovery */
  static bool RecoveryConflictPending = false;
--- 157,164 ----

  /* assorted command-line switches */
  static const char *userDoption = NULL;    /* -D switch */
  static bool EchoQuery = false;    /* -E switch */
! static bool UseSemiNewlineNewline = false;        /* -j switch */

  /* whether or not, and why, we were canceled by conflict with recovery */
  static bool RecoveryConflictPending = false;
*************** static int
*** 219,226 ****
  InteractiveBackend(StringInfo inBuf)
  {
      int            c;                /* character read from getc() */
-     bool        end = false;    /* end-of-input flag */
-     bool        backslashSeen = false;    /* have we seen a \ ? */

      /*
       * display a prompt and obtain input from the user
--- 209,214 ----
*************** InteractiveBackend(StringInfo inBuf)
*** 230,284 ****

      resetStringInfo(inBuf);

!     if (UseNewLine)
      {
!         /*
!          * if we are using \n as a delimiter, then read characters until the
!          * \n.
!          */
!         while ((c = interactive_getc()) != EOF)
          {
!             if (c == '\n')
              {
!                 if (backslashSeen)
                  {
                      /* discard backslash from inBuf */
                      inBuf->data[--inBuf->len] = '\0';
!                     backslashSeen = false;
                      continue;
                  }
                  else
                  {
!                     /* keep the newline character */
                      appendStringInfoChar(inBuf, '\n');
                      break;
                  }
              }
-             else if (c == '\\')
-                 backslashSeen = true;
-             else
-                 backslashSeen = false;
-
-             appendStringInfoChar(inBuf, (char) c);
          }

!         if (c == EOF)
!             end = true;
!     }
!     else
!     {
!         /*
!          * otherwise read characters until EOF.
!          */
!         while ((c = interactive_getc()) != EOF)
!             appendStringInfoChar(inBuf, (char) c);
!
!         /* No input before EOF signal means time to quit. */
!         if (inBuf->len == 0)
!             end = true;
      }

!     if (end)
          return EOF;

      /*
--- 218,273 ----

      resetStringInfo(inBuf);

!     /*
!      * Read characters until EOF or the appropriate delimiter is seen.
!      */
!     while ((c = interactive_getc()) != EOF)
      {
!         if (c == '\n')
          {
!             if (UseSemiNewlineNewline)
              {
!                 /*
!                  * In -j mode, semicolon followed by two newlines ends the
!                  * command; otherwise treat newline as regular character.
!                  */
!                 if (inBuf->len > 1 &&
!                     inBuf->data[inBuf->len - 1] == '\n' &&
!                     inBuf->data[inBuf->len - 2] == ';')
!                 {
!                     /* might as well drop the second newline */
!                     break;
!                 }
!             }
!             else
!             {
!                 /*
!                  * In plain mode, newline ends the command unless preceded by
!                  * backslash.
!                  */
!                 if (inBuf->len > 0 &&
!                     inBuf->data[inBuf->len - 1] == '\\')
                  {
                      /* discard backslash from inBuf */
                      inBuf->data[--inBuf->len] = '\0';
!                     /* discard newline too */
                      continue;
                  }
                  else
                  {
!                     /* keep the newline character, but end the command */
                      appendStringInfoChar(inBuf, '\n');
                      break;
                  }
              }
          }

!         /* Not newline, or newline treated as regular character */
!         appendStringInfoChar(inBuf, (char) c);
      }

!     /* No input before EOF signal means time to quit. */
!     if (c == EOF && inBuf->len == 0)
          return EOF;

      /*
*************** process_postgres_switches(int argc, char
*** 3391,3397 ****

              case 'j':
                  if (secure)
!                     UseNewLine = 0;
                  break;

              case 'k':
--- 3380,3386 ----

              case 'j':
                  if (secure)
!                     UseSemiNewlineNewline = true;
                  break;

              case 'k':
diff --git a/src/include/tcop/tcopdebug.h b/src/include/tcop/tcopdebug.h
index d7145ce..e69de29 100644
*** a/src/include/tcop/tcopdebug.h
--- b/src/include/tcop/tcopdebug.h
***************
*** 1,44 ****
- /*-------------------------------------------------------------------------
-  *
-  * tcopdebug.h
-  *      #defines governing debugging behaviour in the traffic cop
-  *
-  *
-  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
-  * Portions Copyright (c) 1994, Regents of the University of California
-  *
-  * src/include/tcop/tcopdebug.h
-  *
-  *-------------------------------------------------------------------------
-  */
- #ifndef TCOPDEBUG_H
- #define TCOPDEBUG_H
-
- /* ----------------------------------------------------------------
-  *        debugging defines.
-  *
-  *        If you want certain debugging behaviour, then #define
-  *        the variable to 1, else #undef it. -cim 10/26/89
-  * ----------------------------------------------------------------
-  */
-
- /* ----------------
-  *        TCOP_SHOWSTATS controls whether or not buffer and
-  *        access method statistics are shown for each query.  -cim 2/9/89
-  * ----------------
-  */
- #undef TCOP_SHOWSTATS
-
- /* ----------------
-  *        TCOP_DONTUSENEWLINE controls the default setting of
-  *        the UseNewLine variable in postgres.c
-  * ----------------
-  */
- #undef TCOP_DONTUSENEWLINE
-
- /* ----------------------------------------------------------------
-  *        #defines controlled by above definitions
-  * ----------------------------------------------------------------
-  */
-
- #endif   /* TCOPDEBUG_H */
--- 0 ----
diff --git a/src/backend/utils/mb/conversion_procs/Makefile b/src/backend/utils/mb/conversion_procs/Makefile
index 8481721..cad5948 100644
*** a/src/backend/utils/mb/conversion_procs/Makefile
--- b/src/backend/utils/mb/conversion_procs/Makefile
*************** $(SQLSCRIPT): Makefile
*** 181,186 ****
--- 181,187 ----
          echo "DROP CONVERSION pg_catalog.$$name;"; \
          echo "CREATE DEFAULT CONVERSION pg_catalog.$$name FOR '$$se' TO '$$de' FROM $$func;"; \
              echo "COMMENT ON CONVERSION pg_catalog.$$name IS 'conversion for $$se to $$de';"; \
+         echo; \
      done > $@

  $(REGRESSION_SCRIPT): Makefile
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index feeff9e..31b2588 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
***************
*** 79,85 ****
  /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
  extern const char *select_default_timezone(const char *share_path);

! static const char *auth_methods_host[] = {"trust", "reject", "md5", "password", "ident", "radius",
  #ifdef ENABLE_GSS
      "gss",
  #endif
--- 79,86 ----
  /* Ideally this would be in a .h file, but it hardly seems worth the trouble */
  extern const char *select_default_timezone(const char *share_path);

! static const char *const auth_methods_host[] = {
!     "trust", "reject", "md5", "password", "ident", "radius",
  #ifdef ENABLE_GSS
      "gss",
  #endif
*************** static const char *auth_methods_host[] =
*** 95,109 ****
  #ifdef USE_SSL
      "cert",
  #endif
! NULL};
! static const char *auth_methods_local[] = {"trust", "reject", "md5", "password", "peer", "radius",
  #ifdef USE_PAM
      "pam", "pam ",
  #endif
  #ifdef USE_LDAP
      "ldap",
  #endif
! NULL};

  /*
   * these values are passed in by makefile defines
--- 96,113 ----
  #ifdef USE_SSL
      "cert",
  #endif
!     NULL
! };
! static const char *const auth_methods_local[] = {
!     "trust", "reject", "md5", "password", "peer", "radius",
  #ifdef USE_PAM
      "pam", "pam ",
  #endif
  #ifdef USE_LDAP
      "ldap",
  #endif
!     NULL
! };

  /*
   * these values are passed in by makefile defines
*************** static char *authwarning = NULL;
*** 185,193 ****
   * (no quoting to worry about).
   */
  static const char *boot_options = "-F";
! static const char *backend_options = "--single -F -O -c search_path=pg_catalog -c exit_on_error=true";

! static const char *subdirs[] = {
      "global",
      "pg_xlog",
      "pg_xlog/archive_status",
--- 189,197 ----
   * (no quoting to worry about).
   */
  static const char *boot_options = "-F";
! static const char *backend_options = "--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true";

! static const char *const subdirs[] = {
      "global",
      "pg_xlog",
      "pg_xlog/archive_status",
*************** static void set_null_conf(void);
*** 244,264 ****
  static void test_config_settings(void);
  static void setup_config(void);
  static void bootstrap_template1(void);
! static void setup_auth(void);
! static void get_set_pwd(void);
! static void setup_depend(void);
! static void setup_sysviews(void);
! static void setup_description(void);
! static void setup_collation(void);
! static void setup_conversion(void);
! static void setup_dictionary(void);
! static void setup_privileges(void);
  static void set_info_version(void);
! static void setup_schema(void);
! static void load_plpgsql(void);
! static void vacuum_db(void);
! static void make_template0(void);
! static void make_postgres(void);
  static void fsync_pgdata(void);
  static void trapsig(int signum);
  static void check_ok(void);
--- 248,268 ----
  static void test_config_settings(void);
  static void setup_config(void);
  static void bootstrap_template1(void);
! static void setup_auth(FILE *cmdfd);
! static void get_set_pwd(FILE *cmdfd);
! static void setup_depend(FILE *cmdfd);
! static void setup_sysviews(FILE *cmdfd);
! static void setup_description(FILE *cmdfd);
! static void setup_collation(FILE *cmdfd);
! static void setup_conversion(FILE *cmdfd);
! static void setup_dictionary(FILE *cmdfd);
! static void setup_privileges(FILE *cmdfd);
  static void set_info_version(void);
! static void setup_schema(FILE *cmdfd);
! static void load_plpgsql(FILE *cmdfd);
! static void vacuum_db(FILE *cmdfd);
! static void make_template0(FILE *cmdfd);
! static void make_postgres(FILE *cmdfd);
  static void fsync_pgdata(void);
  static void trapsig(int signum);
  static void check_ok(void);
*************** bootstrap_template1(void)
*** 1545,1589 ****
   * set up the shadow password table
   */
  static void
! setup_auth(void)
  {
!     PG_CMD_DECL;
!     const char **line;
!     static const char *pg_authid_setup[] = {
          /*
           * The authid table shouldn't be readable except through views, to
           * ensure passwords are not publicly visible.
           */
!         "REVOKE ALL on pg_authid FROM public;\n",
          NULL
      };

-     fputs(_("initializing pg_authid ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = pg_authid_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * get the superuser password if required, and call postgres to set it
   */
  static void
! get_set_pwd(void)
  {
-     PG_CMD_DECL;
-
      char       *pwd1,
                 *pwd2;

--- 1549,1576 ----
   * set up the shadow password table
   */
  static void
! setup_auth(FILE *cmdfd)
  {
!     const char *const * line;
!     static const char *const pg_authid_setup[] = {
          /*
           * The authid table shouldn't be readable except through views, to
           * ensure passwords are not publicly visible.
           */
!         "REVOKE ALL on pg_authid FROM public;\n\n",
          NULL
      };

      for (line = pg_authid_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
  }

  /*
   * get the superuser password if required, and call postgres to set it
   */
  static void
! get_set_pwd(FILE *cmdfd)
  {
      char       *pwd1,
                 *pwd2;

*************** get_set_pwd(void)
*** 1640,1675 ****
          pwd1 = pg_strdup(pwdbuf);

      }
-     printf(_("setting password ... "));
-     fflush(stdout);

!     snprintf(cmd, sizeof(cmd),
!              "\"%s\" %s template1 >%s",
!              backend_exec, backend_options,
!              DEVNULL);
!
!     PG_CMD_OPEN;
!
!     PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n",
                     username, escape_quotes(pwd1));

-     /* MM: pwd1 is no longer needed, freeing it */
      free(pwd1);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * set up pg_depend
   */
  static void
! setup_depend(void)
  {
!     PG_CMD_DECL;
!     const char **line;
!     static const char *pg_depend_setup[] = {
          /*
           * Make PIN entries in pg_depend for all objects made so far in the
           * tables that the dependency code handles.  This is overkill (the
--- 1627,1647 ----
          pwd1 = pg_strdup(pwdbuf);

      }

!     PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
                     username, escape_quotes(pwd1));

      free(pwd1);
  }

  /*
   * set up pg_depend
   */
  static void
! setup_depend(FILE *cmdfd)
  {
!     const char *const * line;
!     static const char *const pg_depend_setup[] = {
          /*
           * Make PIN entries in pg_depend for all objects made so far in the
           * tables that the dependency code handles.  This is overkill (the
*************** setup_depend(void)
*** 1684,1845 ****
           * First delete any already-made entries; PINs override all else, and
           * must be the only entries for their objects.
           */
!         "DELETE FROM pg_depend;\n",
!         "VACUUM pg_depend;\n",
!         "DELETE FROM pg_shdepend;\n",
!         "VACUUM pg_shdepend;\n",

          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_class;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_proc;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_type;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_cast;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_constraint;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_attrdef;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_language;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_operator;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_opclass;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_opfamily;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_amop;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_amproc;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_rewrite;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_trigger;\n",

          /*
           * restriction here to avoid pinning the public namespace
           */
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
          " FROM pg_namespace "
!         "    WHERE nspname LIKE 'pg%';\n",

          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_parser;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_dict;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_template;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_config;\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_collation;\n",
          "INSERT INTO pg_shdepend SELECT 0,0,0,0, tableoid,oid, 'p' "
!         " FROM pg_authid;\n",
          NULL
      };

-     fputs(_("initializing dependencies ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = pg_depend_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * set up system views
   */
  static void
! setup_sysviews(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **sysviews_setup;

-     fputs(_("creating system views ... "), stdout);
-     fflush(stdout);
-
      sysviews_setup = readfile(system_views_file);

-     /*
-      * We use -j here to avoid backslashing stuff in system_views.sql
-      */
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s -j template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = sysviews_setup; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
          free(*line);
      }

-     PG_CMD_CLOSE;
-
      free(sysviews_setup);
-
-     check_ok();
  }

  /*
   * load description data
   */
  static void
! setup_description(void)
  {
-     PG_CMD_DECL;
-
-     fputs(_("loading system objects' descriptions ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_description ( "
                  "    objoid oid, "
                  "    classname name, "
                  "    objsubid int4, "
!                 "    description text) WITHOUT OIDS;\n");

!     PG_CMD_PRINTF1("COPY tmp_pg_description FROM E'%s';\n",
                     escape_quotes(desc_file));

      PG_CMD_PUTS("INSERT INTO pg_description "
                  " SELECT t.objoid, c.oid, t.objsubid, t.description "
                  "  FROM tmp_pg_description t, pg_class c "
!                 "    WHERE c.relname = t.classname;\n");

      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_shdescription ( "
                  " objoid oid, "
                  " classname name, "
!                 " description text) WITHOUT OIDS;\n");

!     PG_CMD_PRINTF1("COPY tmp_pg_shdescription FROM E'%s';\n",
                     escape_quotes(shdesc_file));

      PG_CMD_PUTS("INSERT INTO pg_shdescription "
                  " SELECT t.objoid, c.oid, t.description "
                  "  FROM tmp_pg_shdescription t, pg_class c "
!                 "   WHERE c.relname = t.classname;\n");

      /* Create default descriptions for operator implementation functions */
      PG_CMD_PUTS("WITH funcdescs AS ( "
--- 1656,1773 ----
           * First delete any already-made entries; PINs override all else, and
           * must be the only entries for their objects.
           */
!         "DELETE FROM pg_depend;\n\n",
!         "VACUUM pg_depend;\n\n",
!         "DELETE FROM pg_shdepend;\n\n",
!         "VACUUM pg_shdepend;\n\n",

          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_class;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_proc;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_type;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_cast;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_constraint;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_attrdef;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_language;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_operator;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_opclass;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_opfamily;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_amop;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_amproc;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_rewrite;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_trigger;\n\n",

          /*
           * restriction here to avoid pinning the public namespace
           */
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
          " FROM pg_namespace "
!         "    WHERE nspname LIKE 'pg%';\n\n",

          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_parser;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_dict;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_template;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_ts_config;\n\n",
          "INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
!         " FROM pg_collation;\n\n",
          "INSERT INTO pg_shdepend SELECT 0,0,0,0, tableoid,oid, 'p' "
!         " FROM pg_authid;\n\n",
          NULL
      };

      for (line = pg_depend_setup; *line != NULL; line++)
          PG_CMD_PUTS(*line);
  }

  /*
   * set up system views
   */
  static void
! setup_sysviews(FILE *cmdfd)
  {
      char      **line;
      char      **sysviews_setup;

      sysviews_setup = readfile(system_views_file);

      for (line = sysviews_setup; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
          free(*line);
      }

      free(sysviews_setup);
  }

  /*
   * load description data
   */
  static void
! setup_description(FILE *cmdfd)
  {
      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_description ( "
                  "    objoid oid, "
                  "    classname name, "
                  "    objsubid int4, "
!                 "    description text) WITHOUT OIDS;\n\n");

!     PG_CMD_PRINTF1("COPY tmp_pg_description FROM E'%s';\n\n",
                     escape_quotes(desc_file));

      PG_CMD_PUTS("INSERT INTO pg_description "
                  " SELECT t.objoid, c.oid, t.objsubid, t.description "
                  "  FROM tmp_pg_description t, pg_class c "
!                 "    WHERE c.relname = t.classname;\n\n");

      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_shdescription ( "
                  " objoid oid, "
                  " classname name, "
!                 " description text) WITHOUT OIDS;\n\n");

!     PG_CMD_PRINTF1("COPY tmp_pg_shdescription FROM E'%s';\n\n",
                     escape_quotes(shdesc_file));

      PG_CMD_PUTS("INSERT INTO pg_shdescription "
                  " SELECT t.objoid, c.oid, t.description "
                  "  FROM tmp_pg_shdescription t, pg_class c "
!                 "   WHERE c.relname = t.classname;\n\n");

      /* Create default descriptions for operator implementation functions */
      PG_CMD_PUTS("WITH funcdescs AS ( "
*************** setup_description(void)
*** 1852,1862 ****
                  "  FROM funcdescs "
                  "  WHERE opdesc NOT LIKE 'deprecated%' AND "
                  "  NOT EXISTS (SELECT 1 FROM pg_description "
!           "    WHERE objoid = p_oid AND classoid = 'pg_proc'::regclass);\n");
!
!     PG_CMD_CLOSE;
!
!     check_ok();
  }

  #ifdef HAVE_LOCALE_T
--- 1780,1786 ----
                  "  FROM funcdescs "
                  "  WHERE opdesc NOT LIKE 'deprecated%' AND "
                  "  NOT EXISTS (SELECT 1 FROM pg_description "
!         "    WHERE objoid = p_oid AND classoid = 'pg_proc'::regclass);\n\n");
  }

  #ifdef HAVE_LOCALE_T
*************** normalize_locale_name(char *new, const c
*** 1899,1905 ****
   * populate pg_collation
   */
  static void
! setup_collation(void)
  {
  #if defined(HAVE_LOCALE_T) && !defined(WIN32)
      int            i;
--- 1823,1829 ----
   * populate pg_collation
   */
  static void
! setup_collation(FILE *cmdfd)
  {
  #if defined(HAVE_LOCALE_T) && !defined(WIN32)
      int            i;
*************** setup_collation(void)
*** 1907,1934 ****
      char        localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
      int            count = 0;

-     PG_CMD_DECL;
- #endif
-
-     fputs(_("creating collations ... "), stdout);
-     fflush(stdout);
-
- #if defined(HAVE_LOCALE_T) && !defined(WIN32)
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
      locale_a_handle = popen_check("locale -a", "r");
      if (!locale_a_handle)
          return;                    /* complaint already printed */

-     PG_CMD_OPEN;
-
      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_collation ( "
                  "    collname name, "
                  "    locale name, "
!                 "    encoding int) WITHOUT OIDS;\n");

      while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
      {
--- 1831,1844 ----
      char        localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
      int            count = 0;

      locale_a_handle = popen_check("locale -a", "r");
      if (!locale_a_handle)
          return;                    /* complaint already printed */

      PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_collation ( "
                  "    collname name, "
                  "    locale name, "
!                 "    encoding int) WITHOUT OIDS;\n\n");

      while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
      {
*************** setup_collation(void)
*** 1988,1994 ****

          quoted_locale = escape_quotes(localebuf);

!         PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n",
                         quoted_locale, quoted_locale, enc);

          /*
--- 1898,1904 ----

          quoted_locale = escape_quotes(localebuf);

!         PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
                         quoted_locale, quoted_locale, enc);

          /*
*************** setup_collation(void)
*** 2000,2006 ****
          {
              char       *quoted_alias = escape_quotes(alias);

!             PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n",
                             quoted_alias, quoted_locale, enc);
              free(quoted_alias);
          }
--- 1910,1916 ----
          {
              char       *quoted_alias = escape_quotes(alias);

!             PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
                             quoted_alias, quoted_locale, enc);
              free(quoted_alias);
          }
*************** setup_collation(void)
*** 2008,2014 ****
      }

      /* Add an SQL-standard name */
!     PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n", PG_UTF8);

      /*
       * When copying collations to the final location, eliminate aliases that
--- 1918,1924 ----
      }

      /* Add an SQL-standard name */
!     PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);

      /*
       * When copying collations to the final location, eliminate aliases that
*************** setup_collation(void)
*** 2029,2048 ****
                  "   encoding, locale, locale "
                  "  FROM tmp_pg_collation"
                  "  WHERE NOT EXISTS (SELECT 1 FROM pg_collation WHERE collname = tmp_pg_collation.collname)"
!        "  ORDER BY collname, encoding, (collname = locale) DESC, locale;\n");

      pclose(locale_a_handle);
-     PG_CMD_CLOSE;

-     check_ok();
      if (count == 0 && !debug)
      {
          printf(_("No usable system locales were found.\n"));
          printf(_("Use the option \"--debug\" to see details.\n"));
      }
- #else                            /* not HAVE_LOCALE_T && not WIN32 */
-     printf(_("not supported on this platform\n"));
-     fflush(stdout);
  #endif   /* not HAVE_LOCALE_T  && not WIN32 */
  }

--- 1939,1953 ----
                  "   encoding, locale, locale "
                  "  FROM tmp_pg_collation"
                  "  WHERE NOT EXISTS (SELECT 1 FROM pg_collation WHERE collname = tmp_pg_collation.collname)"
!      "  ORDER BY collname, encoding, (collname = locale) DESC, locale;\n\n");

      pclose(locale_a_handle);

      if (count == 0 && !debug)
      {
          printf(_("No usable system locales were found.\n"));
          printf(_("Use the option \"--debug\" to see details.\n"));
      }
  #endif   /* not HAVE_LOCALE_T  && not WIN32 */
  }

*************** setup_collation(void)
*** 2050,2071 ****
   * load conversion functions
   */
  static void
! setup_conversion(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **conv_lines;

-     fputs(_("creating conversions ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      conv_lines = readfile(conversion_file);
      for (line = conv_lines; *line != NULL; line++)
      {
--- 1955,1965 ----
   * load conversion functions
   */
  static void
! setup_conversion(FILE *cmdfd)
  {
      char      **line;
      char      **conv_lines;

      conv_lines = readfile(conversion_file);
      for (line = conv_lines; *line != NULL; line++)
      {
*************** setup_conversion(void)
*** 2075,2109 ****
      }

      free(conv_lines);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * load extra dictionaries (Snowball stemmers)
   */
  static void
! setup_dictionary(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **conv_lines;

-     fputs(_("creating dictionaries ... "), stdout);
-     fflush(stdout);
-
-     /*
-      * We use -j here to avoid backslashing stuff
-      */
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s -j template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      conv_lines = readfile(dictionary_file);
      for (line = conv_lines; *line != NULL; line++)
      {
--- 1969,1985 ----
      }

      free(conv_lines);
  }

  /*
   * load extra dictionaries (Snowball stemmers)
   */
  static void
! setup_dictionary(FILE *cmdfd)
  {
      char      **line;
      char      **conv_lines;

      conv_lines = readfile(dictionary_file);
      for (line = conv_lines; *line != NULL; line++)
      {
*************** setup_dictionary(void)
*** 2112,2121 ****
      }

      free(conv_lines);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
--- 1988,1993 ----
*************** setup_dictionary(void)
*** 2130,2168 ****
   * set (NOT NULL).
   */
  static void
! setup_privileges(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **priv_lines;
      static char *privileges_setup[] = {
          "UPDATE pg_class "
          "  SET relacl = E'{\"=r/\\\\\"$POSTGRES_SUPERUSERNAME\\\\\"\"}' "
!         "  WHERE relkind IN ('r', 'v', 'm', 'S') AND relacl IS NULL;\n",
!         "GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n",
!         "GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n",
!         "REVOKE ALL ON pg_largeobject FROM PUBLIC;\n",
          NULL
      };

-     fputs(_("setting privileges on built-in objects ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      priv_lines = replace_token(privileges_setup, "$POSTGRES_SUPERUSERNAME",
                                 escape_quotes(username));
      for (line = priv_lines; *line != NULL; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
--- 2002,2025 ----
   * set (NOT NULL).
   */
  static void
! setup_privileges(FILE *cmdfd)
  {
      char      **line;
      char      **priv_lines;
      static char *privileges_setup[] = {
          "UPDATE pg_class "
          "  SET relacl = E'{\"=r/\\\\\"$POSTGRES_SUPERUSERNAME\\\\\"\"}' "
!         "  WHERE relkind IN ('r', 'v', 'm', 'S') AND relacl IS NULL;\n\n",
!         "GRANT USAGE ON SCHEMA pg_catalog TO PUBLIC;\n\n",
!         "GRANT CREATE, USAGE ON SCHEMA public TO PUBLIC;\n\n",
!         "REVOKE ALL ON pg_largeobject FROM PUBLIC;\n\n",
          NULL
      };

      priv_lines = replace_token(privileges_setup, "$POSTGRES_SUPERUSERNAME",
                                 escape_quotes(username));
      for (line = priv_lines; *line != NULL; line++)
          PG_CMD_PUTS(*line);
  }

  /*
*************** set_info_version(void)
*** 2197,2223 ****
   * load info schema and populate from features file
   */
  static void
! setup_schema(void)
  {
-     PG_CMD_DECL;
      char      **line;
      char      **lines;

-     fputs(_("creating information schema ... "), stdout);
-     fflush(stdout);
-
      lines = readfile(info_schema_file);

-     /*
-      * We use -j here to avoid backslashing stuff in information_schema.sql
-      */
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s -j template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = lines; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
--- 2054,2066 ----
   * load info schema and populate from features file
   */
  static void
! setup_schema(FILE *cmdfd)
  {
      char      **line;
      char      **lines;

      lines = readfile(info_schema_file);

      for (line = lines; *line != NULL; line++)
      {
          PG_CMD_PUTS(*line);
*************** setup_schema(void)
*** 2226,2390 ****

      free(lines);

-     PG_CMD_CLOSE;
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      PG_CMD_PRINTF1("UPDATE information_schema.sql_implementation_info "
                     "  SET character_value = '%s' "
!                    "  WHERE implementation_info_name = 'DBMS VERSION';\n",
                     infoversion);

      PG_CMD_PRINTF1("COPY information_schema.sql_features "
                     "  (feature_id, feature_name, sub_feature_id, "
                     "  sub_feature_name, is_supported, comments) "
!                    " FROM E'%s';\n",
                     escape_quotes(features_file));
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * load PL/pgsql server-side language
   */
  static void
! load_plpgsql(void)
  {
!     PG_CMD_DECL;
!
!     fputs(_("loading PL/pgSQL server-side language ... "), stdout);
!     fflush(stdout);
!
!     snprintf(cmd, sizeof(cmd),
!              "\"%s\" %s template1 >%s",
!              backend_exec, backend_options,
!              DEVNULL);
!
!     PG_CMD_OPEN;
!
!     PG_CMD_PUTS("CREATE EXTENSION plpgsql;\n");
!
!     PG_CMD_CLOSE;
!
!     check_ok();
  }

  /*
   * clean everything up in template1
   */
  static void
! vacuum_db(void)
  {
-     PG_CMD_DECL;
-
-     fputs(_("vacuuming database template1 ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      /* Run analyze before VACUUM so the statistics are frozen. */
!     PG_CMD_PUTS("ANALYZE;\nVACUUM FREEZE;\n");
!
!     PG_CMD_CLOSE;
!
!     check_ok();
  }

  /*
   * copy template1 to template0
   */
  static void
! make_template0(void)
  {
!     PG_CMD_DECL;
!     const char **line;
!     static const char *template0_setup[] = {
!         "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;\n",

          /*
           * We use the OID of template0 to determine lastsysoid
           */
          "UPDATE pg_database SET datlastsysoid = "
          "    (SELECT oid FROM pg_database "
!         "    WHERE datname = 'template0');\n",

          /*
           * Explicitly revoke public create-schema and create-temp-table
           * privileges in template1 and template0; else the latter would be on
           * by default
           */
!         "REVOKE CREATE,TEMPORARY ON DATABASE template1 FROM public;\n",
!         "REVOKE CREATE,TEMPORARY ON DATABASE template0 FROM public;\n",

!         "COMMENT ON DATABASE template0 IS 'unmodifiable empty database';\n",

          /*
           * Finally vacuum to clean up dead rows in pg_database
           */
!         "VACUUM FULL pg_database;\n",
          NULL
      };

-     fputs(_("copying template1 to template0 ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = template0_setup; *line; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
   * copy template1 to postgres
   */
  static void
! make_postgres(void)
  {
!     PG_CMD_DECL;
!     const char **line;
!     static const char *postgres_setup[] = {
!         "CREATE DATABASE postgres;\n",
!         "COMMENT ON DATABASE postgres IS 'default administrative connection database';\n",
          NULL
      };

-     fputs(_("copying template1 to postgres ... "), stdout);
-     fflush(stdout);
-
-     snprintf(cmd, sizeof(cmd),
-              "\"%s\" %s template1 >%s",
-              backend_exec, backend_options,
-              DEVNULL);
-
-     PG_CMD_OPEN;
-
      for (line = postgres_setup; *line; line++)
          PG_CMD_PUTS(*line);
-
-     PG_CMD_CLOSE;
-
-     check_ok();
  }

  /*
--- 2069,2158 ----

      free(lines);

      PG_CMD_PRINTF1("UPDATE information_schema.sql_implementation_info "
                     "  SET character_value = '%s' "
!                    "  WHERE implementation_info_name = 'DBMS VERSION';\n\n",
                     infoversion);

      PG_CMD_PRINTF1("COPY information_schema.sql_features "
                     "  (feature_id, feature_name, sub_feature_id, "
                     "  sub_feature_name, is_supported, comments) "
!                    " FROM E'%s';\n\n",
                     escape_quotes(features_file));
  }

  /*
   * load PL/pgsql server-side language
   */
  static void
! load_plpgsql(FILE *cmdfd)
  {
!     PG_CMD_PUTS("CREATE EXTENSION plpgsql;\n\n");
  }

  /*
   * clean everything up in template1
   */
  static void
! vacuum_db(FILE *cmdfd)
  {
      /* Run analyze before VACUUM so the statistics are frozen. */
!     PG_CMD_PUTS("ANALYZE;\n\nVACUUM FREEZE;\n\n");
  }

  /*
   * copy template1 to template0
   */
  static void
! make_template0(FILE *cmdfd)
  {
!     const char *const *line;
!     static const char *const template0_setup[] = {
!         "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false;\n\n",

          /*
           * We use the OID of template0 to determine lastsysoid
           */
          "UPDATE pg_database SET datlastsysoid = "
          "    (SELECT oid FROM pg_database "
!         "    WHERE datname = 'template0');\n\n",

          /*
           * Explicitly revoke public create-schema and create-temp-table
           * privileges in template1 and template0; else the latter would be on
           * by default
           */
!         "REVOKE CREATE,TEMPORARY ON DATABASE template1 FROM public;\n\n",
!         "REVOKE CREATE,TEMPORARY ON DATABASE template0 FROM public;\n\n",

!         "COMMENT ON DATABASE template0 IS 'unmodifiable empty database';\n\n",

          /*
           * Finally vacuum to clean up dead rows in pg_database
           */
!         "VACUUM FULL pg_database;\n\n",
          NULL
      };

      for (line = template0_setup; *line; line++)
          PG_CMD_PUTS(*line);
  }

  /*
   * copy template1 to postgres
   */
  static void
! make_postgres(FILE *cmdfd)
  {
!     const char *const *line;
!     static const char *const postgres_setup[] = {
!         "CREATE DATABASE postgres;\n\n",
!         "COMMENT ON DATABASE postgres IS 'default administrative connection database';\n\n",
          NULL
      };

      for (line = postgres_setup; *line; line++)
          PG_CMD_PUTS(*line);
  }

  /*
*************** check_authmethod_unspecified(const char
*** 2794,2802 ****
  }

  static void
! check_authmethod_valid(const char *authmethod, const char **valid_methods, const char *conntype)
  {
!     const char **p;

      for (p = valid_methods; *p; p++)
      {
--- 2562,2570 ----
  }

  static void
! check_authmethod_valid(const char *authmethod, const char *const *valid_methods, const char *conntype)
  {
!     const char *const *p;

      for (p = valid_methods; *p; p++)
      {
*************** warn_on_mount_point(int error)
*** 3303,3308 ****
--- 3071,3077 ----
  void
  initialize_data_directory(void)
  {
+     PG_CMD_DECL;
      int            i;

      setup_signals();
*************** initialize_data_directory(void)
*** 3343,3377 ****
       */
      write_version_file("base/1");

!     /* Create the stuff we don't need to use bootstrap mode for */

!     setup_auth();
      if (pwprompt || pwfilename)
!         get_set_pwd();

!     setup_depend();

!     setup_sysviews();

!     setup_description();

!     setup_collation();

!     setup_conversion();

!     setup_dictionary();

!     setup_privileges();

!     setup_schema();

!     load_plpgsql();

!     vacuum_db();

!     make_template0();

!     make_postgres();
  }


--- 3112,3162 ----
       */
      write_version_file("base/1");

!     /*
!      * Create the stuff we don't need to use bootstrap mode for, using a
!      * backend running in simple standalone mode.
!      */
!     fputs(_("performing post-bootstrap initialization ... "), stdout);
!     fflush(stdout);

!     snprintf(cmd, sizeof(cmd),
!              "\"%s\" %s template1 >%s",
!              backend_exec, backend_options,
!              DEVNULL);
!
!     PG_CMD_OPEN;
!
!     setup_auth(cmdfd);
      if (pwprompt || pwfilename)
!         get_set_pwd(cmdfd);

!     setup_depend(cmdfd);

!     setup_sysviews(cmdfd);

!     setup_description(cmdfd);

!     setup_collation(cmdfd);

!     setup_conversion(cmdfd);

!     setup_dictionary(cmdfd);

!     setup_privileges(cmdfd);

!     setup_schema(cmdfd);

!     load_plpgsql(cmdfd);

!     vacuum_db(cmdfd);

!     make_template0(cmdfd);

!     make_postgres(cmdfd);
!
!     PG_CMD_CLOSE;
!
!     check_ok();
  }


diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index f955725..40e06f6 100644
*** a/src/tools/msvc/Install.pm
--- b/src/tools/msvc/Install.pm
*************** sub GenerateConversionScript
*** 365,371 ****
          $sql .=
  "CREATE DEFAULT CONVERSION pg_catalog.$name FOR '$se' TO '$de' FROM $func;\n";
          $sql .=
! "COMMENT ON CONVERSION pg_catalog.$name IS 'conversion for $se to $de';\n";
      }
      open($F, ">$target/share/conversion_create.sql")
        || die "Could not write to conversion_create.sql\n";
--- 365,371 ----
          $sql .=
  "CREATE DEFAULT CONVERSION pg_catalog.$name FOR '$se' TO '$de' FROM $func;\n";
          $sql .=
! "COMMENT ON CONVERSION pg_catalog.$name IS 'conversion for $se to $de';\n\n";
      }
      open($F, ">$target/share/conversion_create.sql")
        || die "Could not write to conversion_create.sql\n";