Thread: Move bki file pre-processing from initdb to bootstrap
Hi All, This patch moves the pre-processing for tokens in the bki file from initdb to bootstrap. With these changes the bki file will only be opened once in bootstrap and parsing will be done by the bootstrap parser. The flow of bki file processing will be as follows: - In initdb gather the values used to replace the tokens in the bki file. - Pass these values into postgres bootstrap startup using '-i' option as key-value pairs. - In bootstrap open the bki file (the bki file name was received as a parameter). - During the parsing of the bki file, replace the tokens received as parameters with their values. Related discussion can be found here: https://www.postgresql.org/message-id/20220216021219.ygzrtb3hd5bn7olz%40alap3.anarazel.de Note: Currently the patch breaks on windows due to placement of extra quotes when passing parameters (Thanks to Thomas Munro for helping me find that). Will follow up with v2 fixing the windows issues on passing the parameters and format fixes. Please review and provide feedback. -- Thanks and Regards, Krishnakumar (KK). [Microsoft]
Attachment
Krishnakumar R <kksrcv001@gmail.com> writes: > This patch moves the pre-processing for tokens in the bki file from > initdb to bootstrap. With these changes the bki file will only be > opened once in bootstrap and parsing will be done by the bootstrap > parser. You haven't provided any iota of evidence why this would be an improvement. regards, tom lane
On 01.09.23 14:37, Tom Lane wrote: > Krishnakumar R <kksrcv001@gmail.com> writes: >> This patch moves the pre-processing for tokens in the bki file from >> initdb to bootstrap. With these changes the bki file will only be >> opened once in bootstrap and parsing will be done by the bootstrap >> parser. > > You haven't provided any iota of evidence why this would be an > improvement. I had played with similar ideas in the past, because it would shave some time of initdb, which would accumulate noticeably over a full test run. But now with the initdb caching mechanism, I wonder whether this is still needed.
Hi, On 2023-09-01 14:59:57 +0200, Peter Eisentraut wrote: > On 01.09.23 14:37, Tom Lane wrote: > > Krishnakumar R <kksrcv001@gmail.com> writes: > > > This patch moves the pre-processing for tokens in the bki file from > > > initdb to bootstrap. With these changes the bki file will only be > > > opened once in bootstrap and parsing will be done by the bootstrap > > > parser. > > > > You haven't provided any iota of evidence why this would be an > > improvement. > > I had played with similar ideas in the past, because it would shave some > time of initdb, which would accumulate noticeably over a full test run. > > But now with the initdb caching mechanism, I wonder whether this is still > needed. I think it's still relevant - it's not just our own test infrastructure that runs a lot of initdbs, it's also lots of projects using postgres. The main reason I'd like to move this infrastructure to the backend is that I really would like to get rid of single user mode. It adds complications all over, it's barely tested, pointlessly hard to use. I wrote a rough prototype of that a while back: https://postgr.es/m/20220220214439.bhc35hhbaub6dush%40alap3.anarazel.de Greetings, Andres Freund
On 01.09.23 10:01, Krishnakumar R wrote: > This patch moves the pre-processing for tokens in the bki file from > initdb to bootstrap. With these changes the bki file will only be > opened once in bootstrap and parsing will be done by the bootstrap > parser. I did some rough performance tests on this. I get about a 10% improvement on initdb run time, so this appears to have merit. I wonder whether we can reduce the number of symbols that we need this treatment for. For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER, FLOAT8PASSBYVAL, these are known at build time, so we could have genbki.pl substitute them at build time. The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder whether we can eliminate the need for them. Right now, these are only used in the bki entry for the template1 database. How about initdb creates template0 first, with hardcoded default encoding, collation, etc., and then creates template1 from that, using the normal CREATE DATABASE command with the appropriate options. Or initdb could just run an UPDATE on pg_database to put the right settings in place. I don't like this part so much, because it adds like 4 more places each of these variables is mentioned, which increases the mental and testing overhead for dealing with these features (which are an area of active development). In general, it would be good if this could be factored a bit more so each variable doesn't have to be hardcoded in so many places. Some more detailed comments on the code: + boot_yylval.str = pstrdup(yytext); + sprintf(boot_yylval.str, "%d", NAMEDATALEN); This is weird. You are first assigning the text and then overwriting it with the numeric value? You can also use boot_yylval.ival for storing numbers. + if (bootp_null(ebootp, ebootp->username)) return NULLVAL; Add proper line breaks in the code. +bool bootp_null(extra_bootstrap_params *e, char *s) Add a comment what this function is supposed to do. This function could be static. + while ((flag = getopt(argc, argv, "B:c:d:D:Fi:kr:X:-:")) != -1) You should use an option letter that isn't already in use in one of the other modes of "postgres". We try to keep those consistent. New options should be added to the --help output (usage() in main.c). + elog(INFO, "Open bki file %s\n", bki_file); + boot_yyin = fopen(bki_file, "r"); Why is this needed? It already reads the bki file from stdin? + printfPQExpBuffer(&cmd, "\"%s\" --boot -X %d %s %s %s %s -i %s=%s,%s=%s,%s=%s," + "%s=%s,%s=%s,%s=%s,%s=%s,%s=%c", + backend_exec, + wal_segment_size_mb * (1024 * 1024), + boot_options, extra_options, + data_checksums ? "-k" : "", + debug ? "-d 5" : "", This appears to undo some of the changes done in cccdbc5d95. +#define BOOT_LC_COLLATE "lc_collate" +#define BOOT_LC_CTYPE "lc_ctype" +#define BOOT_ICU_LOCALE "icu_locale" etc. This doesn't look particularly useful. You can just use the strings directly.
Thank you, Peter, Andres and Tom for your comments and thoughts. Hi Peter, > For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER, > FLOAT8PASSBYVAL, these are known at build time, so we could have > genbki.pl substitute them at build time. I have modified the patch to use genbki to generate these ones during build time. > The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder > whether we can eliminate the need for them. Right now, these are only > used in the bki entry for the template1 database. How about initdb > creates template0 first, with hardcoded default encoding, collation, > etc., and then creates template1 from that, using the normal CREATE > DATABASE command with the appropriate options. Or initdb could just run > an UPDATE on pg_database to put the right settings in place. Using a combination of this suggestion and discussions Andres pointed to in this thread, updated the patch to add placeholder values first into template1 and then do UPDATEs in initdb itself. > You should use an option letter that isn't already in use in one of the > other modes of "postgres". We try to keep those consistent. > > New options should be added to the --help output (usage() in main.c). Used a -b option under bootstrap mode and added help. > elog(INFO, "Open bki file %s\n", bki_file); > + boot_yyin = fopen(bki_file, "r"); > > Why is this needed? It already reads the bki file from stdin? We no longer open the bki file in initdb and pass to postgres to parse from stdin, instead we open the bki file directly in bootstrap and pass the file stream to the parser. Hence the need to switch the yyin. Have added a comment in the commit logs to capture this. The version comparison has been moved from initdb to bootstrap. This created some compatibility problems with windows tests. For now I kept the version check to not have \n added, which worked fine and serves the purpose. However hoping to have something better in v3 in addition to addressing any other comments. Please let me know your thoughts and review comments. -- Thanks and Regards, Krishnakumar (KK). [Microsoft] On Tue, Sep 19, 2023 at 3:18 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 01.09.23 10:01, Krishnakumar R wrote: > > This patch moves the pre-processing for tokens in the bki file from > > initdb to bootstrap. With these changes the bki file will only be > > opened once in bootstrap and parsing will be done by the bootstrap > > parser. > > I did some rough performance tests on this. I get about a 10% > improvement on initdb run time, so this appears to have merit. > > I wonder whether we can reduce the number of symbols that we need this > treatment for. > > For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER, > FLOAT8PASSBYVAL, these are known at build time, so we could have > genbki.pl substitute them at build time. > > The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder > whether we can eliminate the need for them. Right now, these are only > used in the bki entry for the template1 database. How about initdb > creates template0 first, with hardcoded default encoding, collation, > etc., and then creates template1 from that, using the normal CREATE > DATABASE command with the appropriate options. Or initdb could just run > an UPDATE on pg_database to put the right settings in place. > > I don't like this part so much, because it adds like 4 more places each > of these variables is mentioned, which increases the mental and testing > overhead for dealing with these features (which are an area of active > development). > > In general, it would be good if this could be factored a bit more so > each variable doesn't have to be hardcoded in so many places. > > > Some more detailed comments on the code: > > + boot_yylval.str = pstrdup(yytext); > + sprintf(boot_yylval.str, "%d", NAMEDATALEN); > > This is weird. You are first assigning the text and then overwriting it > with the numeric value? > > You can also use boot_yylval.ival for storing numbers. > > + if (bootp_null(ebootp, ebootp->username)) return > NULLVAL; > > Add proper line breaks in the code. > > +bool bootp_null(extra_bootstrap_params *e, char *s) > > Add a comment what this function is supposed to do. > > This function could be static. > > + while ((flag = getopt(argc, argv, "B:c:d:D:Fi:kr:X:-:")) != -1) > > You should use an option letter that isn't already in use in one of the > other modes of "postgres". We try to keep those consistent. > > New options should be added to the --help output (usage() in main.c). > > + elog(INFO, "Open bki file %s\n", bki_file); > + boot_yyin = fopen(bki_file, "r"); > > Why is this needed? It already reads the bki file from stdin? > > + printfPQExpBuffer(&cmd, "\"%s\" --boot -X %d %s %s %s %s -i > %s=%s,%s=%s,%s=%s," > + "%s=%s,%s=%s,%s=%s,%s=%s,%s=%c", > + backend_exec, > + wal_segment_size_mb * (1024 * 1024), > + boot_options, extra_options, > + data_checksums ? "-k" : "", > + debug ? "-d 5" : "", > > This appears to undo some of the changes done in cccdbc5d95. > > +#define BOOT_LC_COLLATE "lc_collate" > +#define BOOT_LC_CTYPE "lc_ctype" > +#define BOOT_ICU_LOCALE "icu_locale" > > etc. This doesn't look particularly useful. You can just use the > strings directly. >
Attachment
> The version comparison has been moved from initdb to bootstrap. This > created some compatibility problems with windows tests. For now I kept > the version check to not have \n added, which worked fine and serves > the purpose. However hoping to have something better in v3 in addition > to addressing any other comments. With help from Thomas, figured out that on windows fopen uses binary mode in the backend which causes issues with EOL. Please find the attached patch updated with a fix for this. -- Thanks and Regards, Krishnakumar (KK). [Microsoft]
Attachment
On 06.10.23 02:24, Krishnakumar R wrote: >> elog(INFO, "Open bki file %s\n", bki_file); >> + boot_yyin = fopen(bki_file, "r"); >> >> Why is this needed? It already reads the bki file from stdin? > We no longer open the bki file in initdb and pass to postgres to parse > from stdin, instead we open the bki file directly in bootstrap and > pass the file stream to the parser. Hence the need to switch the yyin. > Have added a comment in the commit logs to capture this. Why this change? I mean, there is nothing wrong with it, but I don't follow how changing from reading from stdin to reading from a named file is related to moving the parameter substitution from initdb to the backend. One effect of this is that we would now have two different ways initdb interacts with the backend. In bootstrap mode, it reads from a named file, and the second run (the one that loads the system views etc.) reads from stdin. It's already confusing enough, so any further divergence should be adequately explained.
On 17.10.23 03:32, Krishnakumar R wrote: >> The version comparison has been moved from initdb to bootstrap. This >> created some compatibility problems with windows tests. For now I kept >> the version check to not have \n added, which worked fine and serves >> the purpose. However hoping to have something better in v3 in addition >> to addressing any other comments. > > With help from Thomas, figured out that on windows fopen uses binary > mode in the backend which causes issues with EOL. Please find the > attached patch updated with a fix for this. I suggest that this patch set be split up into three incremental parts: 1. Move some build-time settings from initdb to postgres.bki. 2. The database locale handling. 3. The bki file handling. Each of these topics really needs a separate detailed consideration.
Thank you for review, Peter.
Makes sense on the split part. Was starting to think in same lines, at the end of last iteration. Will come back shortly.
On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 17.10.23 03:32, Krishnakumar R wrote:
>> The version comparison has been moved from initdb to bootstrap. This
>> created some compatibility problems with windows tests. For now I kept
>> the version check to not have \n added, which worked fine and serves
>> the purpose. However hoping to have something better in v3 in addition
>> to addressing any other comments.
>
> With help from Thomas, figured out that on windows fopen uses binary
> mode in the backend which causes issues with EOL. Please find the
> attached patch updated with a fix for this.
I suggest that this patch set be split up into three incremental parts:
1. Move some build-time settings from initdb to postgres.bki.
2. The database locale handling.
3. The bki file handling.
Each of these topics really needs a separate detailed consideration.
On Sat, 11 Nov 2023 at 00:03, Krishnakumar R <kksrcv001@gmail.com> wrote: > > Thank you for review, Peter. > > Makes sense on the split part. Was starting to think in same lines, at the end of last iteration. Will come back shortly. > > On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> On 17.10.23 03:32, Krishnakumar R wrote: >> >> The version comparison has been moved from initdb to bootstrap. This >> >> created some compatibility problems with windows tests. For now I kept >> >> the version check to not have \n added, which worked fine and serves >> >> the purpose. However hoping to have something better in v3 in addition >> >> to addressing any other comments. >> > >> > With help from Thomas, figured out that on windows fopen uses binary >> > mode in the backend which causes issues with EOL. Please find the >> > attached patch updated with a fix for this. >> >> I suggest that this patch set be split up into three incremental parts: >> >> 1. Move some build-time settings from initdb to postgres.bki. >> 2. The database locale handling. >> 3. The bki file handling. >> >> Each of these topics really needs a separate detailed consideration. The patch which you submitted has been awaiting your attention for quite some time now. As such, we have moved it to "Returned with Feedback" and removed it from the reviewing queue. Depending on timing, this may be reversible. Kindly address the feedback you have received, and resubmit the patch to the next CommitFest. Regards, Vignesh