Thread: pgbench - extend initialization phase control
Hello devs, the attached patch adds some more control on the initialization phase. In particular, ( and ) allow to begin/commit explicitely, and G generates the data server-side instead of client side, which might be a good idea depending on the available bandwidth. Together with the previously submitted patch about getting stats on the initialization phase, the idea is to possibly improve this phase, or use it as a benchmark tool in itself. -- Fabien.
Attachment
Does both client/server side data generation in a single command make sense?
Hello Ibrar, > Does both client/server side data generation in a single command make > sense? I think yes, especially with the other patch which adds timing measures to the initialization phases. It really depends what you want to test. With client-side generation you test the libpq COPY interface and network protocol for bulk loading. With server-side generation you are get the final result faster when network bandwidth is low, and somehow you are testing a different kind of small query which generates a lot of data. -- Fabien.
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested Other than that, the patch looks good to me. The new status of this patch is: Ready for Committer
Hello Ibrar, > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: tested, passed > Documentation: not tested > > Other than that, the patch looks good to me. > > The new status of this patch is: Ready for Committer Thanks for the review. Attached v2 is a rebase after ce8f9467. -- Fabien.
Attachment
Hello Fabien, > ---------- Forwarded message --------- > From: Fabien COELHO <coelho@cri.ensmp.fr> > Date: Tue, Jul 16, 2019 at 4:58 PM > Subject: Re: pgbench - extend initialization phase control > To: Ibrar Ahmed <ibrar.ahmad@gmail.com> > Cc: PostgreSQL Developers <pgsql-hackers@lists.postgresql.org> > > > > Hello Ibrar, > >> The following review has been posted through the commitfest >> application: >> make installcheck-world: tested, passed >> Implements feature: tested, passed >> Spec compliant: tested, passed >> Documentation: not tested >> >> Other than that, the patch looks good to me. >> >> The new status of this patch is: Ready for Committer > > Thanks for the review. > > Attached v2 is a rebase after ce8f9467. Thanks for your new patch. But I failed to apply it. Please rebase it against HEAD. Regards, --------- Anna
>> Attached v2 is a rebase after ce8f9467. Here is rebase v3. -- Fabien.
Attachment
> > Here is rebase v3. Hi, Thanks for your new patch. Failed regression test. It's necessary to change the first a in “allowed step characters are” to uppercase A in the regression test of 002_pgbench_no_server.pl. The behavior of "g" is different between v12 and the patche, and backward compatibility is lost. In v12, BEGIN and COMMIT are specified only by choosing "g". It's a problem that backward compatibility is lost. When using ( and ) with the -I, the documentation should indicate that double quotes are required, and "v" not be able to enclose in ( and ). Regards, -- Anna
Hi, When g is specified, null is inserted in the filler column of pgbentch_tellrs, acounts, branches. But when G is specified, empty string is inserted. Do you have any intention of this difference? -- Anna
Hello, > Failed regression test. It's necessary to change the first a in “allowed > step characters are” to uppercase A in the regression test of > 002_pgbench_no_server.pl. Argh. I think I ran the test, then stupidly updated the message afterwards to better match best practices, without rechecking:-( > The behavior of "g" is different between v12 and the patche, and > backward compatibility is lost. In v12, BEGIN and COMMIT are specified > only by choosing "g". It's a problem that backward compatibility is > lost. Somehow yes, but I do not see this as an actual problem from a functional point of view: it just means that if you use a 'dtgvp' with the newer version and if the inserts were to fail, then they are not under an explicit transaction, so previous inserts are not cleaned up. However, this is a pretty unlikely case, and anyway the error is reported, so any user would be expected not to go on after the initialization phase. So basically I do not see the very small regression for an unlikely corner case to induce any problem in practice. The benefit of controlling where begin/end actually occur is that it may have an impact on performance, and it allows to check that. > When using ( and ) with the -I, the documentation should indicate that double > quotes are required, Or single quotes, or backslash, if launch from the command line. I added a mention of escaping or protection in the doc in that case. > and "v" not be able to enclose in ( and ). That is a postgresql limitation, which may evolve. There could be others. I updated the doc to say that some commands may not work inside an explicit transaction. > When g is specified, null is inserted in the filler column of > pgbentch_tellrs, acounts, branches. But when G is specified, empty > string is inserted. Indeed there is a small diff. ISTM that the actual filling with the initial client version is NULL for branches and tellers, and a blank-padded string for accounts. I fixed the patch so that the end-result is the same with both g and G. > Do you have any intention of this difference? Yes and no. I intended that tellers & branches filler are filled, but I did not really notice that the client side was implicitely using NULL, although it says so in a comment. Although I'm not happy with the fact because it cheats with the benchmark design which requires the filler columns to be really filled and stored as is, it is indeed the place to change this (bad) behavior. Attached a v4 with the updates described above. -- Fabien.
Attachment
> Attached a v4 with the updates described above. Hi, Thanks for updating the patch. All tests are passed. There is no problem in operation. -- Anna
On Thu, Oct 17, 2019 at 8:09 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello, > > > Failed regression test. It's necessary to change the first a in “allowed > > step characters are” to uppercase A in the regression test of > > 002_pgbench_no_server.pl. > > Argh. I think I ran the test, then stupidly updated the message afterwards > to better match best practices, without rechecking:-( > > > The behavior of "g" is different between v12 and the patche, and > > backward compatibility is lost. In v12, BEGIN and COMMIT are specified > > only by choosing "g". It's a problem that backward compatibility is > > lost. > > Somehow yes, but I do not see this as an actual problem from a functional > point of view: it just means that if you use a 'dtgvp' with the newer > version and if the inserts were to fail, then they are not under an > explicit transaction, so previous inserts are not cleaned up. However, > this is a pretty unlikely case, and anyway the error is reported, so any > user would be expected not to go on after the initialization phase. > > So basically I do not see the very small regression for an unlikely corner > case to induce any problem in practice. > > The benefit of controlling where begin/end actually occur is that it may > have an impact on performance, and it allows to check that. I still fail to understand the benefit of addition of () settings. Could you clarify what case () settings are useful for? You are thinking to execute all initialization SQL statements within single transaction, e.g., -I (dtgp), for some reasons? > > When using ( and ) with the -I, the documentation should indicate that double > > quotes are required, > > Or single quotes, or backslash, if launch from the command line. I added a > mention of escaping or protection in the doc in that case. What about using, for example, b (BEGIN) and c (COMMIT) instead to avoid such restriction? > > and "v" not be able to enclose in ( and ). > > That is a postgresql limitation, which may evolve. There could be others. > I updated the doc to say that some commands may not work inside an > explicit transaction. I think that it's better to check whehter "v" is enclosed with () or not at the beginning of pgbench, and report an error if it is. Otherwise, if -I (dtgv) is specified, pgbench reports an error after time-consuming data generation is performed, and of course that data generation is rollbacked. Regards, -- Fujii Masao
Hello Masao-san, >> The benefit of controlling where begin/end actually occur is that it may >> have an impact on performance, and it allows to check that. > > I still fail to understand the benefit of addition of () settings. > Could you clarify what case () settings are useful for? You are > thinking to execute all initialization SQL statements within > single transaction, e.g., -I (dtgp), for some reasons? Yep. Or anything else, including without (), to allow checking the performance impact or non impact of transactions on the initialization phase. >>> When using ( and ) with the -I, the documentation should indicate that double >>> quotes are required, >> >> Or single quotes, or backslash, if launch from the command line. I added a >> mention of escaping or protection in the doc in that case. > > What about using, for example, b (BEGIN) and c (COMMIT) instead > to avoid such restriction? It is indeed possible. Using a open/close symmetric character ( (), {}, []) looks more pleasing and allows to see easily whether everything is properly closed. I switched to {} which does not generate the same quoting issue in shell. > I think that it's better to check whehter "v" is enclosed with () or not > at the beginning of pgbench, and report an error if it is. > > Otherwise, if -I (dtgv) is specified, pgbench reports an error after > time-consuming data generation is performed, and of course that data > generation is rollbacked. Patch v5 attached added a check for v inside (), although I'm not keen on putting it there, and uses {} instead of (). -- Fabien.
Attachment
On Thu, Oct 24, 2019 at 9:16 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masao-san, > > >> The benefit of controlling where begin/end actually occur is that it may > >> have an impact on performance, and it allows to check that. > > > > I still fail to understand the benefit of addition of () settings. > > Could you clarify what case () settings are useful for? You are > > thinking to execute all initialization SQL statements within > > single transaction, e.g., -I (dtgp), for some reasons? > > Yep. Or anything else, including without (), to allow checking the > performance impact or non impact of transactions on the initialization > phase. Is there actually such performance impact? AFAIR most time-consuming part in initialization phase is the generation of pgbench_accounts data. This part is performed within single transaction whether () are specified or not. No? So I'm not sure how () are useful to check performance impact in init phase. Maybe I'm missing something... > >>> When using ( and ) with the -I, the documentation should indicate that double > >>> quotes are required, > >> > >> Or single quotes, or backslash, if launch from the command line. I added a > >> mention of escaping or protection in the doc in that case. > > > > What about using, for example, b (BEGIN) and c (COMMIT) instead > > to avoid such restriction? > > It is indeed possible. Using a open/close symmetric character ( (), {}, > []) looks more pleasing and allows to see easily whether everything is > properly closed. I switched to {} which does not generate the same quoting > issue in shell. > > > I think that it's better to check whehter "v" is enclosed with () or not > > at the beginning of pgbench, and report an error if it is. > > > > Otherwise, if -I (dtgv) is specified, pgbench reports an error after > > time-consuming data generation is performed, and of course that data > > generation is rollbacked. > > Patch v5 attached added a check for v inside (), although I'm not keen on > putting it there, and uses {} instead of (). Thanks for updating the patch! Regards, -- Fujii Masao
Hello, >> Yep. Or anything else, including without (), to allow checking the >> performance impact or non impact of transactions on the initialization >> phase. > > Is there actually such performance impact? AFAIR most time-consuming part in > initialization phase is the generation of pgbench_accounts data. Maybe. If you cannot check, you can only guess. Probably it should be small, but the current version does not allow to check whether it is so. -- Fabien.
On Fri, Oct 25, 2019 at 12:06 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello, > > >> Yep. Or anything else, including without (), to allow checking the > >> performance impact or non impact of transactions on the initialization > >> phase. > > > > Is there actually such performance impact? AFAIR most time-consuming part in > > initialization phase is the generation of pgbench_accounts data. > > Maybe. If you cannot check, you can only guess. Probably it should be > small, but the current version does not allow to check whether it is so. Could you elaborate what you actually want to measure the performance impact by adding explicit begin and commit? Currently pgbench -i issues the following queries. The data generation part is already executed within single transaction. You want to execute not only data generation but also drop/creation of tables within single transaction, and measure how much performance impact happens? I'm sure that would be negligible. Or you want to execute data generate in multiple transactions, i.e., execute each statement for data generation (e.g., one INSERT) in single transaction, and then want to measure the performance impact? But the patch doesn't enable us to do such data generation yet. So I'm thinking that it's maybe better to commit the addtion of "G" option first separately. And then we can discuss how much "(" and ")" options are useful later. ------------------------------------------ drop table if exists pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers create table pgbench_history(tid int,bid int,aid int,delta int,mtime timestamp,filler char(22)) create table pgbench_tellers(tid int not null,bid int,tbalance int,filler char(84)) with (fillfactor=100) create table pgbench_accounts(aid int not null,bid int,abalance int,filler char(84)) with (fillfactor=100) create table pgbench_branches(bid int not null,bbalance int,filler char(88)) with (fillfactor=100) begin truncate table pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers insert into pgbench_branches(bid,bbalance) values(1,0) insert into pgbench_tellers(tid,bid,tbalance) values (1,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (2,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (3,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (4,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (5,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (6,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (7,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (8,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (9,1,0) insert into pgbench_tellers(tid,bid,tbalance) values (10,1,0) copy pgbench_accounts from stdin commit vacuum analyze pgbench_branches vacuum analyze pgbench_tellers vacuum analyze pgbench_accounts vacuum analyze pgbench_history alter table pgbench_branches add primary key (bid) alter table pgbench_tellers add primary key (tid) alter table pgbench_accounts add primary key (aid) ------------------------------------------ Regards, -- Fujii Masao
Hello Masao-san, >> Maybe. If you cannot check, you can only guess. Probably it should be >> small, but the current version does not allow to check whether it is so. > > Could you elaborate what you actually want to measure the performance > impact by adding explicit begin and commit? Currently pgbench -i issues > the following queries. The data generation part is already executed within > single transaction. You want to execute not only data generation but also > drop/creation of tables within single transaction, and measure how much > performance impact happens? I'm sure that would be negligible. > Or you want to execute data generate in multiple transactions, i.e., > execute each statement for data generation (e.g., one INSERT) in single > transaction, and then want to measure the performance impact? > But the patch doesn't enable us to do such data generation yet. Indeed, you cannot do this precise thing, but you can do others. > So I'm thinking that it's maybe better to commit the addtion of "G" option > first separately. And then we can discuss how much "(" and ")" options > are useful later. Attached patch v6 only provides G - server side data generation. -- Fabien.
Attachment
On Mon, Oct 28, 2019 at 10:36 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masao-san, > > >> Maybe. If you cannot check, you can only guess. Probably it should be > >> small, but the current version does not allow to check whether it is so. > > > > Could you elaborate what you actually want to measure the performance > > impact by adding explicit begin and commit? Currently pgbench -i issues > > the following queries. The data generation part is already executed within > > single transaction. You want to execute not only data generation but also > > drop/creation of tables within single transaction, and measure how much > > performance impact happens? I'm sure that would be negligible. > > Or you want to execute data generate in multiple transactions, i.e., > > execute each statement for data generation (e.g., one INSERT) in single > > transaction, and then want to measure the performance impact? > > But the patch doesn't enable us to do such data generation yet. > > Indeed, you cannot do this precise thing, but you can do others. > > > So I'm thinking that it's maybe better to commit the addtion of "G" option > > first separately. And then we can discuss how much "(" and ")" options > > are useful later. > > Attached patch v6 only provides G - server side data generation. Thanks for the patch! + snprintf(sql, sizeof(sql), + "insert into pgbench_branches(bid,bbalance) " + "select bid, 0 " + "from generate_series(1, %d) as bid", scale); "scale" should be "nbranches * scale". + snprintf(sql, sizeof(sql), + "insert into pgbench_accounts(aid,bid,abalance,filler) " + "select aid, (aid - 1) / %d + 1, 0, '' " + "from generate_series(1, %d) as aid", naccounts, scale * naccounts); Like client-side data generation, INT64_FORMAT should be used here instead of %d? If large scale factor is specified, the query for generating pgbench_accounts data can take a very long time. While that query is running, operators may be likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep running to the end. - for (step = initialize_steps; *step != '\0'; step++) + for (const char *step = initialize_steps; *step != '\0'; step++) Per PostgreSQL basic coding style, ISTM that "const char *step" should be declared separately from "for" loop, like the original. - fprintf(stderr, "unrecognized initialization step \"%c\"\n", + fprintf(stderr, + "unrecognized initialization step \"%c\"\n" + "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n", *step); - fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n"); The original message seems better to me. So what about just appending "G" into the above latter message? That is, "allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n" - <term><literal>g</literal> (Generate data)</term> + <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term> Isn't it better to explain a bit more what "client-side / server-side data generation" is? For example, something like When "g" (client-side data generation) is specified, data is generated in pgbench client and sent to the server. When "G" (server-side data generation) is specified, only queries are sent from pgbench client and then data is generated in the server. If the network bandwidth is low between pgbench and the server, using "G" might make the data generation faster. Regards, -- Fujii Masao
Hello Masao-san, > + snprintf(sql, sizeof(sql), > + "insert into pgbench_branches(bid,bbalance) " > + "select bid, 0 " > + "from generate_series(1, %d) as bid", scale); > > "scale" should be "nbranches * scale". Yep, even if nbranches is 1, it should be there. > + snprintf(sql, sizeof(sql), > + "insert into pgbench_accounts(aid,bid,abalance,filler) " > + "select aid, (aid - 1) / %d + 1, 0, '' " > + "from generate_series(1, %d) as aid", naccounts, scale * naccounts); > > Like client-side data generation, INT64_FORMAT should be used here > instead of %d? Indeed. > If large scale factor is specified, the query for generating pgbench_accounts > data can take a very long time. While that query is running, operators may be > likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench > should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep > running to the end. Hmmm. Why not. Now the infra to do that seems to already exists twice, once in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c". I cannot say I'm thrilled to replicate this once more. I think that the reasonable option is to share this in fe-utils and then to reuse it from there. However, ISTM that such a restructuring patch which not belong to this feature. > - for (step = initialize_steps; *step != '\0'; step++) > + for (const char *step = initialize_steps; *step != '\0'; step++) > > Per PostgreSQL basic coding style, C99 (20 years ago) is new the norm, and this style is now allowed, there are over a hundred instances of these already. I tend to use that where appropriate. > - fprintf(stderr, "unrecognized initialization step \"%c\"\n", > + fprintf(stderr, > + "unrecognized initialization step \"%c\"\n" > + "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n", > *step); > - fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", > \"p\", \"f\"\n"); > > The original message seems better to me. So what about just appending "G" > into the above latter message? That is, > "allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n" I needed this list in several places, so it makes sense to share the definition, and frankly the list of half a dozen comma-separated chars does not strike me as much better than just giving the allowed chars directly. So the simpler the better, from my point of view. > Isn't it better to explain a bit more what "client-side / server-side data > generation" is? For example, something like Ok. Attached v7 does most of the above, but the list of char message and the signal handling. The first one does not look really better to me, and the second one belongs to a restructuring patch that I'll try to submit. -- Fabien.
Attachment
Hello Masao-san, >> If large scale factor is specified, the query for generating >> pgbench_accounts data can take a very long time. While that query is >> running, operators may be likely to do Ctrl-C to cancel the data >> generation. In this case, IMO pgbench should cancel the query, i.e., >> call PQcancel(). Otherwise, the query will keep running to the end. > > Hmmm. Why not. Now the infra to do that seems to already exists twice, once > in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c". > > I cannot say I'm thrilled to replicate this once more. I think that the > reasonable option is to share this in fe-utils and then to reuse it from > there. However, ISTM that such a restructuring patch which not belong to this > feature. [...] I just did a patch to share the code: https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1910311939430.27369@lancre https://commitfest.postgresql.org/25/2336/ -- Fabien.
On Thu, Oct 31, 2019 at 11:54 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masao-san, > > > + snprintf(sql, sizeof(sql), > > + "insert into pgbench_branches(bid,bbalance) " > > + "select bid, 0 " > > + "from generate_series(1, %d) as bid", scale); > > > > "scale" should be "nbranches * scale". > > Yep, even if nbranches is 1, it should be there. > > > + snprintf(sql, sizeof(sql), > > + "insert into pgbench_accounts(aid,bid,abalance,filler) " > > + "select aid, (aid - 1) / %d + 1, 0, '' " > > + "from generate_series(1, %d) as aid", naccounts, scale * naccounts); > > > > Like client-side data generation, INT64_FORMAT should be used here > > instead of %d? > > Indeed. > > > If large scale factor is specified, the query for generating pgbench_accounts > > data can take a very long time. While that query is running, operators may be > > likely to do Ctrl-C to cancel the data generation. In this case, IMO pgbench > > should cancel the query, i.e., call PQcancel(). Otherwise, the query will keep > > running to the end. > > Hmmm. Why not. Now the infra to do that seems to already exists twice, > once in "src/bin/psql/common.c" and once in "src/bin/scripts/common.c". > > I cannot say I'm thrilled to replicate this once more. I think that the > reasonable option is to share this in fe-utils and then to reuse it from > there. However, ISTM that such a restructuring patch which not belong to > this feature. Understood. Ok, let's discuss this in other thread that you started. > > - for (step = initialize_steps; *step != '\0'; step++) > > + for (const char *step = initialize_steps; *step != '\0'; step++) > > > > Per PostgreSQL basic coding style, > > C99 (20 years ago) is new the norm, and this style is now allowed, there > are over a hundred instances of these already. I tend to use that where > appropriate. Yes, I understood there are several places using such style. But I still wonder why we should apply such change here. If there is the reason why this change is necessary here, I'm OK with that. But if not, basically I'd like to avoid the change. Otherwise it may make the back-patch a bit harder when we change the surrounding code. > > - fprintf(stderr, "unrecognized initialization step \"%c\"\n", > > + fprintf(stderr, > > + "unrecognized initialization step \"%c\"\n" > > + "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n", > > *step); > > - fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", > > \"p\", \"f\"\n"); > > > > The original message seems better to me. So what about just appending "G" > > into the above latter message? That is, > > "allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n" > > I needed this list in several places, so it makes sense to share the > definition, and frankly the list of half a dozen comma-separated chars > does not strike me as much better than just giving the allowed chars > directly. So the simpler the better, from my point of view. OK. > > Isn't it better to explain a bit more what "client-side / server-side data > > generation" is? For example, something like > > Ok. > > Attached v7 does most of the above, but the list of char message and the > signal handling. The first one does not look really better to me, and the > second one belongs to a restructuring patch that I'll try to submit. Thanks for updating the patch! Attached is the slightly updated version of the patch. Based on your patch, I added the descriptions about logging of "g" and "G" steps into the doc, and did some cosmetic changes. Barrying any objections, I'm thinking to commit this patch. While reviewing the patch, I found that current code allows space character to be specified in -I. That is, checkInitSteps() accepts space character. Why should we do this? Probably I understand why runInitSteps() needs to accept space character (because "v" in the specified string with -I is replaced with a space character when --no-vacuum option is given). But I'm not sure why that's also necessary in checkInitSteps(). Instead, we should treat a space character as invalid in checkInitSteps()? Regards, -- Fujii Masao
Attachment
Hello, >>> - for (step = initialize_steps; *step != '\0'; step++) >>> + for (const char *step = initialize_steps; *step != '\0'; step++) > > But I still wonder why we should apply such change here. Because it removes one declaration and reduces the scope of one variable? > If there is the reason why this change is necessary here, Nope, such changes are never necessary. > I'm OK with that. But if not, basically I'd like to avoid the change. > Otherwise it may make the back-patch a bit harder > when we change the surrounding code. I think that this is small enough so that it can be managed, if any back patch occurs on the surrounding code, which is anyway pretty unlikely. > Attached is the slightly updated version of the patch. Based on your > patch, I added the descriptions about logging of "g" and "G" steps into > the doc, and did some cosmetic changes. Barrying any objections, > I'm thinking to commit this patch. I'd suggest: "to print one message each ..." -> "to print one message every ..." "to print no progress ..." -> "not to print any progress ..." I would not call "fprintf(stderr" twice in a row if I can call it once. > While reviewing the patch, I found that current code allows space > character to be specified in -I. That is, checkInitSteps() accepts > space character. Why should we do this? > Probably I understand why runInitSteps() needs to accept space character > (because "v" in the specified string with -I is replaced with a space > character when --no-vacuum option is given). Yes, that is the reason, otherwise the string would have to be shifted. > But I'm not sure why that's also necessary in checkInitSteps(). Instead, > we should treat a space character as invalid in checkInitSteps()? I think that it may break --no-vacuum, and I thought that there may be other option which remove things, eventually. Also, having a NO-OP looks ok to me. -- Fabien.
On Wed, Nov 6, 2019 at 6:23 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello, > > >>> - for (step = initialize_steps; *step != '\0'; step++) > >>> + for (const char *step = initialize_steps; *step != '\0'; step++) > > > > But I still wonder why we should apply such change here. > > Because it removes one declaration and reduces the scope of one variable? > > > If there is the reason why this change is necessary here, > > Nope, such changes are never necessary. > > > I'm OK with that. But if not, basically I'd like to avoid the change. > > Otherwise it may make the back-patch a bit harder > > when we change the surrounding code. > > I think that this is small enough so that it can be managed, if any back > patch occurs on the surrounding code, which is anyway pretty unlikely. > > > Attached is the slightly updated version of the patch. Based on your > > patch, I added the descriptions about logging of "g" and "G" steps into > > the doc, and did some cosmetic changes. Barrying any objections, > > I'm thinking to commit this patch. > > I'd suggest: > > "to print one message each ..." -> "to print one message every ..." > > "to print no progress ..." -> "not to print any progress ..." > > I would not call "fprintf(stderr" twice in a row if I can call it once. Thanks for the suggestion! I updated the patch in that way and committed it! This commit doesn't include the change "for (const char ...)" and "merge two fprintf into one" ones that we were discussing. Because they are trivial but I'm not sure if they are improvements or not, yet. If they are, probably it's better to apply such changes to all the places having the similar issues. But that seems overkill. > > > While reviewing the patch, I found that current code allows space > > character to be specified in -I. That is, checkInitSteps() accepts > > space character. Why should we do this? > > > Probably I understand why runInitSteps() needs to accept space character > > (because "v" in the specified string with -I is replaced with a space > > character when --no-vacuum option is given). > > Yes, that is the reason, otherwise the string would have to be shifted. > > > But I'm not sure why that's also necessary in checkInitSteps(). Instead, > > we should treat a space character as invalid in checkInitSteps()? > > I think that it may break --no-vacuum, and I thought that there may be > other option which remove things, eventually. Also, having a NO-OP looks > ok to me. As far as I read the code, checkInitSteps() checks the initialization steps that users specified. The initialization steps string that "v" was replaced with blank character is not given to checkInitSteps(). So ISTM that dropping the handling of blank character from checkInitSteps() doesn't break --no-vacuum. Regards, -- Fujii Masao
2019-11-06 11:31 に Fujii Masao さんは書きました: > On Wed, Nov 6, 2019 at 6:23 AM Fabien COELHO <coelho@cri.ensmp.fr> > wrote: >> >> >> Hello, >> >> >>> - for (step = initialize_steps; *step != '\0'; step++) >> >>> + for (const char *step = initialize_steps; *step != '\0'; step++) >> > >> > But I still wonder why we should apply such change here. >> >> Because it removes one declaration and reduces the scope of one >> variable? >> >> > If there is the reason why this change is necessary here, >> >> Nope, such changes are never necessary. >> >> > I'm OK with that. But if not, basically I'd like to avoid the change. >> > Otherwise it may make the back-patch a bit harder >> > when we change the surrounding code. >> >> I think that this is small enough so that it can be managed, if any >> back >> patch occurs on the surrounding code, which is anyway pretty unlikely. >> >> > Attached is the slightly updated version of the patch. Based on your >> > patch, I added the descriptions about logging of "g" and "G" steps into >> > the doc, and did some cosmetic changes. Barrying any objections, >> > I'm thinking to commit this patch. >> >> I'd suggest: >> >> "to print one message each ..." -> "to print one message every ..." >> >> "to print no progress ..." -> "not to print any progress ..." >> >> I would not call "fprintf(stderr" twice in a row if I can call it >> once. > > Thanks for the suggestion! > I updated the patch in that way and committed it! > > This commit doesn't include the change "for (const char ...)" > and "merge two fprintf into one" ones that we were discussing. > Because they are trivial but I'm not sure if they are improvements > or not, yet. If they are, probably it's better to apply such changes > to all the places having the similar issues. But that seems overkill. > >> >> > While reviewing the patch, I found that current code allows space >> > character to be specified in -I. That is, checkInitSteps() accepts >> > space character. Why should we do this? >> >> > Probably I understand why runInitSteps() needs to accept space character >> > (because "v" in the specified string with -I is replaced with a space >> > character when --no-vacuum option is given). >> >> Yes, that is the reason, otherwise the string would have to be >> shifted. >> >> > But I'm not sure why that's also necessary in checkInitSteps(). Instead, >> > we should treat a space character as invalid in checkInitSteps()? >> >> I think that it may break --no-vacuum, and I thought that there may be >> other option which remove things, eventually. Also, having a NO-OP >> looks >> ok to me. > > As far as I read the code, checkInitSteps() checks the initialization > steps that users specified. The initialization steps string that > "v" was replaced with blank character is not given to checkInitSteps(). > So ISTM that dropping the handling of blank character from > checkInitSteps() doesn't break --no-vacuum. > This is a patch which does not allow space character in -I options . Regard, Yu Kimura
Attachment
>>> I think that it may break --no-vacuum, and I thought that there may be >>> other option which remove things, eventually. Also, having a NO-OP looks >>> ok to me. >> >> As far as I read the code, checkInitSteps() checks the initialization >> steps that users specified. The initialization steps string that >> "v" was replaced with blank character is not given to checkInitSteps(). >> So ISTM that dropping the handling of blank character from >> checkInitSteps() doesn't break --no-vacuum. >> > This is a patch which does not allow space character in -I options . I do not think that this is desirable. It would be a regression, and allowing a no-op is not an issue in anyway. -- Fabien Coelho - CRI, MINES ParisTech
On Thu, Nov 7, 2019 at 5:18 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > >>> I think that it may break --no-vacuum, and I thought that there may be > >>> other option which remove things, eventually. Also, having a NO-OP looks > >>> ok to me. > >> > >> As far as I read the code, checkInitSteps() checks the initialization > >> steps that users specified. The initialization steps string that > >> "v" was replaced with blank character is not given to checkInitSteps(). > >> So ISTM that dropping the handling of blank character from > >> checkInitSteps() doesn't break --no-vacuum. > >> > > This is a patch which does not allow space character in -I options . > > I do not think that this is desirable. It would be a regression, and > allowing a no-op is not an issue in anyway. Why is that regression, you think? I think that's an oversight. If I'm missing something and accepting a blank character as no-op in also checkInitSteps() is really necessary for some reasons, which should be documented. But, if so, another question is; why should only blank character be treated as no-op, in checkInitSteps()? Regards, -- Fujii Masao
Hello Masao-san, >> I do not think that this is desirable. It would be a regression, and >> allowing a no-op is not an issue in anyway. > > Why is that regression, you think? Because "pgbench -I ' d'" currently works and it would cease to work after the patch. > I think that's an oversight. If I'm missing something and accepting a > blank character as no-op in also checkInitSteps() is really necessary > for some reasons, which should be documented. But, if so, another > question is; why should only blank character be treated as no-op, in > checkInitSteps()? The idea is to have one character that can be substituted to remove any operation. On principle, allowing a no-op character, whatever the choice, is a good idea, because it means that the caller can take advantage of that if need be. I think that the actual oversight is that the checkInitSteps should be called at the beginning of processing initialization steps rather than while processing -I, because currently other places modify the initialization string (no-vacuum, foreign key) and thus are not checked. I agree that it should be documented. Attached patch adds a doc and moves the check where it should be, and modifies a test with an explicit no-op space initialization step. -- Fabien.
Attachment
On Thu, Nov 7, 2019 at 6:35 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > Hello Masao-san, > > >> I do not think that this is desirable. It would be a regression, and > >> allowing a no-op is not an issue in anyway. > > > > Why is that regression, you think? > > Because "pgbench -I ' d'" currently works and it would cease to work after > the patch. If the behavior has been documented and visible to users, I agree that it should not be dropped for compatibility basically. But in this case, that was not. > > I think that's an oversight. If I'm missing something and accepting a > > blank character as no-op in also checkInitSteps() is really necessary > > for some reasons, which should be documented. But, if so, another > > question is; why should only blank character be treated as no-op, in > > checkInitSteps()? > > The idea is to have one character that can be substituted to remove any > operation. Probably I understand that idea is necessary in the internal of pgbench because pgbench internally may modify the initialization steps string. But I'm not sure why it needs to be exposed, yet. > On principle, allowing a no-op character, whatever the choice, is a good > idea, because it means that the caller can take advantage of that if need > be. > > I think that the actual oversight is that the checkInitSteps should be > called at the beginning of processing initialization steps rather than > while processing -I, because currently other places modify the > initialization string (no-vacuum, foreign key) and thus are not checked. As far as I read the code, runInitSteps() does the check. If the initialization steps string contains unrecognized character, runInitSteps() emits an error. * (We could just leave it to runInitSteps() to fail if there are wrong * characters, but since initialization can take awhile, it seems friendlier * to check during option parsing.) The above comment in checkInitSteps() seems to explain why checkInitSteps() is called at the beginning of processing initialization steps. Regards, -- Fujii Masao
Hello, >> I think that the actual oversight is that the checkInitSteps should be >> called at the beginning of processing initialization steps rather than >> while processing -I, because currently other places modify the >> initialization string (no-vacuum, foreign key) and thus are not checked. > > As far as I read the code, runInitSteps() does the check. If the initialization > steps string contains unrecognized character, runInitSteps() emits an error. Sure, but the previous step have been executed and committed, the point of the check is to detect the issue before starting the execution. > * (We could just leave it to runInitSteps() to fail if there are wrong > * characters, but since initialization can take awhile, it seems friendlier > * to check during option parsing.) > > The above comment in checkInitSteps() seems to explain why > checkInitSteps() is called at the beginning of processing initialization > steps. Yep, the comment is right in the motivation, but not accurate anymore wrt the submitted patch. V2 attached updates this comment. -- Fabien.