initdb / bootstrap design - Mailing list pgsql-hackers

From Andres Freund
Subject initdb / bootstrap design
Date
Msg-id 20220216021219.ygzrtb3hd5bn7olz@alap3.anarazel.de
List pgsql-hackers
Hi,

[1] reminded me of a topic that I wanted to bring up at some point:

To me the division of labor between initdb and bootstrap doesn't make much
sense anymore:


initdb reads postgres.bki, replaces a few tokens, starts postgres in bootstrap
mode, and then painstakingly feeds the postgres.bki lines to the server.

Given that bootstrap mode has a dedicated parser, invoked from only a single
place, what's the point of initdb doing the preprocessing and then incurring
pipe overhead?

Sure, there are a few tokens that we replace in initdb. As it turns out, only
two rows are actually variable: the username of the initial superuser in
pg_authid, and the pg_database row for template1, where encoding, lc_collate
and lc_ctype vary. The rest are all compile-time constant replacements that we
could do as part of genbki.pl.
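
To make the preprocessing step concrete, here is a minimal sketch of a
per-line token replacement. It is not initdb's actual code: the helper name
substitute_token() and its signature are hypothetical, made up for
illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from initdb): copy "line" into "out",
 * replacing every occurrence of "token" with "value".
 *
 * Returns 0 on success, or -1 if the result does not fit into "out".
 */
static int
substitute_token(const char *line, const char *token, const char *value,
                 char *out, size_t outsize)
{
    size_t      toklen = strlen(token);
    size_t      vallen = strlen(value);
    size_t      used = 0;

    assert(toklen > 0);

    if (outsize == 0)
        return -1;

    while (*line != '\0')
    {
        const char *src;
        size_t      n;

        if (strncmp(line, token, toklen) == 0)
        {
            /* token found: emit the replacement value instead */
            src = value;
            n = vallen;
            line += toklen;
        }
        else
        {
            /* ordinary character: copy it through unchanged */
            src = line;
            n = 1;
            line += 1;
        }

        /* always leave room for the terminating NUL byte */
        if (used + n >= outsize)
            return -1;

        memcpy(out + used, src, n);
        used += n;
    }

    out[used] = '\0';
    return 0;
}
```

With only two variable rows left, a helper like this would run on a handful
of lines instead of on the whole file; everything else could already be
final by the time genbki.pl writes postgres.bki.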

It seems we could save a good number of context switches by opening
postgres.bki just before boot_yyparse() in BootstrapModeMain() and having the
parser read it. The pg_authid / pg_database rows we could just do via explicit
insertions in BootstrapModeMain(), with the values provided by command-line
arguments?
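
As a rough sketch of the "read the file in-process" idea, the loop below
hands the contents of an already opened file straight to a consumer callback,
with no second process and no pipe in between. All names here (feed_lines,
line_consumer, count_bytes) are hypothetical stand-ins; in the server the
consumer would be the bootstrap parser, which this sketch does not attempt to
model.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for "whatever parses the input". */
typedef void (*line_consumer) (const char *chunk, void *state);

/*
 * Read "fp" to the end and pass its contents to "consume".
 *
 * Lines longer than the buffer are delivered in several chunks, so a real
 * parser would have to cope with partial lines. Returns the number of lines
 * read, or -1 on a read error.
 */
static long
feed_lines(FILE *fp, line_consumer consume, void *state)
{
    char        buf[8192];
    long        nlines = 0;
    int         midline = 0;

    assert(fp != NULL && consume != NULL);

    while (fgets(buf, sizeof buf, fp) != NULL)
    {
        consume(buf, state);

        if (strchr(buf, '\n') != NULL)
        {
            /* chunk ends a line */
            nlines++;
            midline = 0;
        }
        else
            midline = 1;
    }

    /* count a final line that lacks a trailing newline */
    if (midline)
        nlines++;

    return ferror(fp) ? -1 : nlines;
}

/* Example consumer: just adds up how many bytes it was given. */
static void
count_bytes(const char *chunk, void *state)
{
    *(size_t *) state += strlen(chunk);
}
```

A caller would fopen() the file, call feed_lines(fp, count_bytes, &total),
and fclose() it again. The point is only that the reading side needs nothing
from initdb once no per-line rewriting is left to do.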


Similarly, since the introduction of extensions at the latest, the server
knows how to execute SQL from a file. Why don't we just process
information_schema.sql, system_views.sql et al that way?


If we don't need a dedicated "input" mode feeding boot_yyparse() in bootstrap
mode anymore (because bootstrap mode feeds it from postgres.bki directly), we
likely could avoid the restart between bootstrap and single-user mode. Afaics
that restart is only really needed because we need to send SQL after
bootstrap_template1(). That'd likely be a nice speedup, because we wouldn't
need to write the bootstrap contents from shared buffers to the OS just to
read them back in single-user mode.


I don't plan to work on this immediately, but I thought it's worth bringing up
anyway.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20220216012953.6d7bzmsblqou3ru4%40alap3.anarazel.de


