Re: pgbench - extend initialization phase control - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: pgbench - extend initialization phase control
Msg-id CAHGQGwHWEyTXxZh46qgFY8a2bDF_EYeUdp3+_Hy=qLZSzwVPKg@mail.gmail.com
In response to Re: pgbench - extend initialization phase control  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench - extend initialization phase control  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Mon, Oct 28, 2019 at 10:36 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Masao-san,
>
> >> Maybe. If you cannot check, you can only guess. Probably it should be
> >> small, but the current version does not allow to check whether it is so.
> >
> > Could you elaborate what you actually want to measure the performance
> > impact by adding explicit begin and commit? Currently pgbench -i issues
> > the following queries. The data generation part is already executed within
> > single transaction. You want to execute not only data generation but also
> > drop/creation of tables within single transaction, and measure how much
> > performance impact happens? I'm sure that would be negligible.
> > Or you want to execute data generate in multiple transactions, i.e.,
> > execute each statement for data generation (e.g., one INSERT) in single
> > transaction, and then want to measure the performance impact?
> > But the patch doesn't enable us to do such data generation yet.
>
> Indeed, you cannot do this precise thing, but you can do others.
>
> > So I'm thinking that it's maybe better to commit the addition of "G" option
> > first separately. And then we can discuss how much "(" and ")" options
> > are useful later.
>
> Attached patch v6 only provides G - server side data generation.

Thanks for the patch!

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_branches(bid,bbalance) "
+ "select bid, 0 "
+ "from generate_series(1, %d) as bid", scale);

"scale" should be "nbranches * scale".

+ snprintf(sql, sizeof(sql),
+ "insert into pgbench_accounts(aid,bid,abalance,filler) "
+ "select aid, (aid - 1) / %d + 1, 0, '' "
+ "from generate_series(1, %d) as aid", naccounts, scale * naccounts);

As in client-side data generation, shouldn't INT64_FORMAT be used here
instead of %d? With a large scale factor, "scale * naccounts" can exceed
the range of a 32-bit int.
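To illustrate the two remarks above, here is a minimal sketch of how the queries could be built with 64-bit row counts. It is not the patch itself: the helper names are hypothetical, NBRANCHES and NACCOUNTS are assumed to mirror pgbench's per-scale-unit constants, and PRId64 stands in for PostgreSQL's INT64_FORMAT so that the code compiles without PostgreSQL headers.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror pgbench's rows per unit of scale factor. */
#define NBRANCHES 1
#define NACCOUNTS 100000

/*
 * Hypothetical helper: build the server-side INSERT for pgbench_branches.
 * The row count is nbranches * scale, computed in 64 bits.
 */
static int
build_branches_sql(char *buf, size_t buflen, int scale)
{
    int64_t total = (int64_t) NBRANCHES * scale;

    return snprintf(buf, buflen,
                    "insert into pgbench_branches(bid,bbalance) "
                    "select bid, 0 "
                    "from generate_series(1, %" PRId64 ") as bid",
                    total);
}

/*
 * Hypothetical helper: build the server-side INSERT for pgbench_accounts.
 * The operand is widened before the multiplication so that the product
 * cannot overflow a 32-bit int at large scale factors.
 */
static int
build_accounts_sql(char *buf, size_t buflen, int scale)
{
    int64_t total = (int64_t) NACCOUNTS * scale;

    return snprintf(buf, buflen,
                    "insert into pgbench_accounts(aid,bid,abalance,filler) "
                    "select aid, (aid - 1) / %d + 1, 0, '' "
                    "from generate_series(1, %" PRId64 ") as aid",
                    NACCOUNTS, total);
}
```

With scale = 30000 the accounts query asks for 3000000000 rows, which no longer fits in a 32-bit int; that is the case the cast is meant to cover.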

If a large scale factor is specified, the query that generates the
pgbench_accounts data can take a very long time. While that query is running,
operators are likely to press Ctrl-C to cancel the data generation. In that
case, IMO pgbench should cancel the query, i.e., call PQcancel(). Otherwise,
the query will keep running on the server until it completes.
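As a rough sketch of the idea (not a proposed implementation), a SIGINT handler could forward the interrupt to a cancel hook. Everything named here is hypothetical. The sketch deliberately avoids linking against libpq: in pgbench the hook would instead call PQcancel() on a PGcancel object obtained earlier with PQgetCancel(), and stub_cancel below is only a stand-in for that call.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical hook type; in pgbench the hook would wrap PQcancel(). */
typedef void (*cancel_hook_fn) (void);

static cancel_hook_fn cancel_hook = NULL;
static volatile sig_atomic_t cancel_requested = 0;

/* SIGINT handler: record the request and forward it to the hook, if any. */
static void
handle_sigint(int signo)
{
    (void) signo;
    cancel_requested = 1;
    if (cancel_hook != NULL)
        cancel_hook();
}

/* Install the handler; meant to be called before the long query is sent. */
static void
setup_cancel_handler(cancel_hook_fn hook)
{
    cancel_hook = hook;
    signal(SIGINT, handle_sigint);
}

/* Stand-in for the real cancel call, so the sketch runs without a server. */
static volatile sig_atomic_t stub_cancel_calls = 0;

static void
stub_cancel(void)
{
    stub_cancel_calls++;
}
```

Calling setup_cancel_handler(stub_cancel) and then raise(SIGINT) runs the hook once. Whatever the hook does has to be safe inside a signal handler, which is why a call designed for that purpose is needed rather than ordinary query functions.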

- for (step = initialize_steps; *step != '\0'; step++)
+ for (const char *step = initialize_steps; *step != '\0'; step++)

Per PostgreSQL's basic coding style, ISTM that "const char *step"
should be declared separately from the "for" loop, as in the original.

- fprintf(stderr, "unrecognized initialization step \"%c\"\n",
+ fprintf(stderr,
+ "unrecognized initialization step \"%c\"\n"
+ "Allowed step characters are: \"" ALL_INIT_STEPS "\".\n",
  *step);
- fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"v\", \"p\", \"f\"\n");

The original message seems better to me. So what about just adding "G"
to the latter message above? That is,
"allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n"
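Putting the last two comments together, the check could look roughly like the following. This is only a sketch under assumptions: check_init_steps is a hypothetical helper that returns a status instead of exiting, and the set of accepted characters is taken from the message suggested above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if every character in initialize_steps is
 * an allowed step, otherwise print the error messages and return 0.
 */
static int
check_init_steps(const char *initialize_steps)
{
    const char *step;           /* declared separately from the for loop */

    for (step = initialize_steps; *step != '\0'; step++)
    {
        if (strchr("dtgGvpf", *step) == NULL)
        {
            fprintf(stderr, "unrecognized initialization step \"%c\"\n",
                    *step);
            fprintf(stderr, "allowed steps are: \"d\", \"t\", \"g\", \"G\", \"v\", \"p\", \"f\"\n");
            return 0;
        }
    }
    return 1;
}
```

For example, check_init_steps("dtGvp") accepts the string, while check_init_steps("dtxg") reports the "x" and rejects it.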

-          <term><literal>g</literal> (Generate data)</term>
+          <term><literal>g</literal> or <literal>G</literal> (Generate data, client or server side)</term>

Isn't it better to explain a bit more about what client-side and server-side
data generation are? For example, something like

    When "g" (client-side data generation) is specified, data is generated
    in the pgbench client and sent to the server. When "G" (server-side data
    generation) is specified, only queries are sent from the pgbench client
    and the data is generated in the server. If the network bandwidth
    between pgbench and the server is low, using "G" might make the data
    generation faster.

Regards,

-- 
Fujii Masao


