Re: pgbench - extend initialization phase control - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: pgbench - extend initialization phase control |
Date | |
Msg-id | CAHGQGwHWEyTXxZh46qgFY8a2bDF_EYeUdp3+_Hy=qLZSzwVPKg@mail.gmail.com |
In response to | Re: pgbench - extend initialization phase control (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses | Re: pgbench - extend initialization phase control |
List | pgsql-hackers |
On Mon, Oct 28, 2019 at 10:36 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masao-san,
>
> >> Maybe. If you cannot check, you can only guess. Probably it should be
> >> small, but the current version does not allow to check whether it is so.
> >
> > Could you elaborate what you actually want to measure the performance
> > impact by adding explicit begin and commit? Currently pgbench -i issues
> > the following queries. The data generation part is already executed within
> > single transaction. You want to execute not only data generation but also
> > drop/creation of tables within single transaction, and measure how much
> > performance impact happens? I'm sure that would be negligible.
> > Or you want to execute data generate in multiple transactions, i.e.,
> > execute each statement for data generation (e.g., one INSERT) in single
> > transaction, and then want to measure the performance impact?
> > But the patch doesn't enable us to do such data generation yet.
>
> Indeed, you cannot do this precise thing, but you can do others.
>
> > So I'm thinking that it's maybe better to commit the addition of "G" option
> > first separately. And then we can discuss how much "(" and ")" options
> > are useful later.
>
> Attached patch v6 only provides G - server side data generation.

Thanks for the patch!

+    snprintf(sql, sizeof(sql),
+             "insert into pgbench_branches(bid,bbalance) "
+             "select bid, 0 "
+             "from generate_series(1, %d) as bid", scale);

"scale" should be "nbranches * scale".

+    snprintf(sql, sizeof(sql),
+             "insert into pgbench_accounts(aid,bid,abalance,filler) "
+             "select aid, (aid - 1) / %d + 1, 0, '' "
+             "from generate_series(1, %d) as aid", naccounts, scale * naccounts);

Like client-side data generation, INT64_FORMAT should be used here
instead of %d?

If a large scale factor is specified, the query for generating
pgbench_accounts data can take a very long time. While that query is
running, operators are likely to press Ctrl-C to cancel the data
generation. In this case, IMO pgbench should cancel the query, i.e.,
call PQcancel(). Otherwise, the query will keep running to the end.

-    for (step = initialize_steps; *step != '\0'; step++)
+    for (const char *step = initialize_steps; *step != '\0'; step++)

Per PostgreSQL basic coding style, ISTM that "const char *step"
should be declared separately from the "for" loop, like the original.

-            fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+            fprintf(stderr,
+                    "unrecognized initialization step \"%c\"\n"
+                    "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
                     *step);
-            fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");

The original message seems better to me. So what about just appending "G"
to the latter message above? That is,

    "allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"

-      <term><literal>g</literal> (Generate data)</term>
+      <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>

Isn't it better to explain a bit more what "client-side / server-side data
generation" is? For example, something like

    When "g" (client-side data generation) is specified, data is generated
    in pgbench client and sent to the server. When "G" (server-side data
    generation) is specified, only queries are sent from pgbench client
    and then data is generated in the server. If the network bandwidth is low
    between pgbench and the server, using "G" might make the data generation
    faster.

Regards,

-- 
Fujii Masao
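For reference, the two review points about the generated queries (row count "nbranches * scale" for pgbench_branches, and a 64-bit format for the pgbench_accounts row count) could look roughly like the sketch below. This is only an illustration, not the patch itself: the helper names build_branches_sql() and build_accounts_sql() are hypothetical, NBRANCHES and NACCOUNTS stand in for pgbench's nbranches and naccounts, and standard C's PRId64 is used where PostgreSQL code would use INT64_FORMAT.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Rows per scale unit; stand-ins for pgbench's nbranches and naccounts. */
#define NBRANCHES 1
#define NACCOUNTS 100000

/* Hypothetical helper: server-side INSERT for pgbench_branches. */
static int
build_branches_sql(char *buf, size_t len, int scale)
{
    /* Row count is nbranches * scale, not just scale. */
    return snprintf(buf, len,
                    "insert into pgbench_branches(bid,bbalance) "
                    "select bid, 0 "
                    "from generate_series(1, %d) as bid",
                    NBRANCHES * scale);
}

/* Hypothetical helper: server-side INSERT for pgbench_accounts. */
static int
build_accounts_sql(char *buf, size_t len, int scale)
{
    /*
     * Widen before multiplying: with 100000 accounts per scale unit, a
     * 32-bit int product overflows once scale reaches 21475.
     */
    int64_t total = (int64_t) NACCOUNTS * scale;

    /* PRId64 is the portable stand-in for INT64_FORMAT in this sketch. */
    return snprintf(buf, len,
                    "insert into pgbench_accounts(aid,bid,abalance,filler) "
                    "select aid, (aid - 1) / %d + 1, 0, '' "
                    "from generate_series(1, %" PRId64 ") as aid",
                    NACCOUNTS, total);
}
```

With a scale of 30000, for example, the accounts query asks generate_series for 3000000000 rows, which does not fit in a 32-bit int; that is the case the INT64_FORMAT comment is about.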