Thread: [HACKERS] pgbench: Skipping the creating primary keys after initialization

[HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
Hi all,

I'd like to propose a new option -I for pgbench command which skips
the creating primary keys after initialized tables. This option is
useful for users who want to do bench marking with no index or indexes
other than btree primary index. If we initialize pgbench tables at a
large number scale factor the primary key index creation takes a long
time even if we're going to use other types of indexes. With this
option, the initialization time is reduced and you can create indexes
as you want.

Feedback is very welcome. I'll add this patch to the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I'd like to propose a new option -I for pgbench command which skips
> the creating primary keys after initialized tables. This option is
> useful for users who want to do bench marking with no index or indexes
> other than btree primary index. If we initialize pgbench tables at a
> large number scale factor the primary key index creation takes a long
> time even if we're going to use other types of indexes. With this
> option, the initialization time is reduced and you can create indexes
> as you want.
>
> Feedback is very welcome. I'll add this patch to the next CF.

I support adding an option for this, but I propose that we just make
it a long-form option, similar to --log-prefix or --index-tablespace.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Wed, Aug 2, 2017 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I'd like to propose a new option -I for pgbench command which skips
>> the creating primary keys after initialized tables. This option is
>> useful for users who want to do bench marking with no index or indexes
>> other than btree primary index. If we initialize pgbench tables at a
>> large number scale factor the primary key index creation takes a long
>> time even if we're going to use other types of indexes. With this
>> option, the initialization time is reduced and you can create indexes
>> as you want.
>>
>> Feedback is very welcome. I'll add this patch to the next CF.
>
> I support adding an option for this, but I propose that we just make
> it a long-form option, similar to --log-prefix or --index-tablespace.
>

Yeah, that's better. I'll update the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I'd like to propose a new option -I for pgbench command which skips
>> the creating primary keys after initialized tables.

> I support adding an option for this, but I propose that we just make
> it a long-form option, similar to --log-prefix or --index-tablespace.

I think we could probably do without this ... if you want a non-default
test setup, why do you need to use "pgbench -i" to create it?

It's not that there's anything greatly wrong with this particular idea,
it's just that pgbench has too many switches already, and omitting random
subsets of the initialization actions doesn't seem like it contributes
fundamental new benchmarking capability.

I could get behind a proposal that generalized pgbench's "-i" behavior
in some meaningful way.  I wonder whether it would be possible to convert
that behavior into a script.  Some of what it does is just SQL commands
with injected parameters, which pgbench does already.  There's also
data-loading actions, which could be converted to backslash commands
perhaps.  Then desires like this could be addressed by invoking a
customized script instead of complicating pgbench's option set.
        regards, tom lane



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Tatsuo Ishii
Date:
> I think we could probably do without this ... if you want a non-default
> test setup, why do you need to use "pgbench -i" to create it?
> 
> It's not that there's anything greatly wrong with this particular idea,
> it's just that pgbench has too many switches already, and omitting random
> subsets of the initialization actions doesn't seem like it contributes
> fundamental new benchmarking capability.
> 
> I could get behind a proposal that generalized pgbench's "-i" behavior
> in some meaningful way.  I wonder whether it would be possible to convert
> that behavior into a script.  Some of what it does is just SQL commands
> with injected parameters, which pgbench does already.  There's also
> data-loading actions, which could be converted to backslash commands
> perhaps.  Then desires like this could be addressed by invoking a
> customized script instead of complicating pgbench's option set.

+1.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



On Wed, Aug 2, 2017 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Aug 1, 2017 at 9:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I'd like to propose a new option -I for pgbench command which skips
>>> the creating primary keys after initialized tables.
>
>> I support adding an option for this, but I propose that we just make
>> it a long-form option, similar to --log-prefix or --index-tablespace.
>
> I think we could probably do without this ... if you want a non-default
> test setup, why do you need to use "pgbench -i" to create it?
>
> It's not that there's anything greatly wrong with this particular idea,
> it's just that pgbench has too many switches already, and omitting random
> subsets of the initialization actions doesn't seem like it contributes
> fundamental new benchmarking capability.
>
> I could get behind a proposal that generalized pgbench's "-i" behavior
> in some meaningful way.  I wonder whether it would be possible to convert
> that behavior into a script.  Some of what it does is just SQL commands
> with injected parameters, which pgbench does already.  There's also
> data-loading actions, which could be converted to backslash commands
> perhaps.  Then desires like this could be addressed by invoking a
> customized script instead of complicating pgbench's option set.

I've actually wanted this exact thing multiple times: most recently,
to make a non-unique btree index instead of a unique one, and to make
a hash index instead of a btree one.  I don't object to a modest
effort at coming up with a more general mechanism here, but I also
think the switch as proposed is something that would have met my real
needs on multiple occasions.  I've probably had 10 different occasions
when I wanted all of the standard pgbench initialization *except for*
something different about the indexes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> I've actually wanted this exact thing multiple times: most recently,
> to make a non-unique btree index instead of a unique one, and to make
> a hash index instead of a btree one.  I don't object to a modest
> effort at coming up with a more general mechanism here, but I also
> think the switch as proposed is something that would have met my real
> needs on multiple occasions.  I've probably had 10 different occasions
> when I wanted all of the standard pgbench initialization *except for*
> something different about the indexes.

Sure, but "no indexes at all" is hardly ever the real goal, is it?
So the switch as proposed is only solving part of your problem.
I'd rather see a solution that addresses a larger range of desires.

Or in other words, this looks to me quite a bit like the hackery
that resulted in pgbench's -S and -N options, before we figured out
that making it scriptable was a better answer.
        regards, tom lane



On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sure, but "no indexes at all" is hardly ever the real goal, is it?

Right.

> So the switch as proposed is only solving part of your problem.
> I'd rather see a solution that addresses a larger range of desires.

That's reasonable.

> Or in other words, this looks to me quite a bit like the hackery
> that resulted in pgbench's -S and -N options, before we figured out
> that making it scriptable was a better answer.

But it's not very clear to me how we could make this case scriptable,
and it would probably not be much different from just using the
proposed option and then running the script afterwards yourself via
psql.  The thing about -N and -S is that those scripts are being run
repeatedly, so pgbench has to be involved.  If you just want to create
different/extra indexes, you can do that yourself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Or in other words, this looks to me quite a bit like the hackery
>> that resulted in pgbench's -S and -N options, before we figured out
>> that making it scriptable was a better answer.

> But it's not very clear to me how we could make this case scriptable,

Well, I'm imagining that "-i" would essentially become a short form
of "-b initialize", as already happened for -S and -N, where the script
looks something like

drop table if exists pgbench_branches;
create table pgbench_branches ( bid int not null,bbalance int,filler char(88)
);
\load_data pgbench_branches [ other parameters to-be-determined ]
alter table pgbench_branches add primary key (bid);
... repeat for other tables ...

and we'd document that the same way we do for the existing built-in
scripts.  Then, if there's something you don't like about it, you
just paste the script into a file and edit to taste.

I'm sure there's complexities that would only become apparent when
someone tries to write the patch, but that seems to me like a better
foundation for this class of desires than extending the option set
with various one-off options having no discernible architecture.

> If you just want to create
> different/extra indexes, you can do that yourself.

Sure, but there's no end to the number of small variations on this
theme that somebody might want.  For example, we realized years ago
that the "filler" fields as-implemented don't really meet the intent
of the TPC-B spec (cf comment in the init() function).  If someone
comes along with a patch adding a "--really-tpc-b" option to change
the table declarations and/or data loading code to fix that, will we
take that patch?  What about one that wants all the id fields (not
just accounts.aid) to be bigint, or one that wants the balance fields
to be numeric?

You can say "let 'em set up the tables manually if they want that",
but I don't see why a nonstandard set of indexes is much different.
        regards, tom lane



On Wed, Aug 2, 2017 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Or in other words, this looks to me quite a bit like the hackery
>>> that resulted in pgbench's -S and -N options, before we figured out
>>> that making it scriptable was a better answer.
>
>> But it's not very clear to me how we could make this case scriptable,
>
> Well, I'm imagining that "-i" would essentially become a short form
> of "-b initialize", as already happened for -S and -N, where the script
> looks something like
>
> drop table if exists pgbench_branches;
> create table pgbench_branches (
>   bid int not null,bbalance int,filler char(88)
> );
> \load_data pgbench_branches [ other parameters to-be-determined ]
> alter table pgbench_branches add primary key (bid);
> ... repeat for other tables ...
>
> and we'd document that the same way we do for the existing built-in
> scripts.  Then, if there's something you don't like about it, you
> just paste the script into a file and edit to taste.

I imagine that would be useful for some use cases, but it's a heck of
a lot more work than just writing --no-indexes-please.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 2, 2017 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, I'm imagining that "-i" would essentially become a short form
>> of "-b initialize", as already happened for -S and -N, where the script
>> looks something like ...

> I imagine that would be useful for some use cases, but it's a heck of
> a lot more work than just writing --no-indexes-please.

Of course.  It's also a heck of a lot more flexible.  Adding on another
ad-hoc option that does the minimum possible amount of work needed to
address one specific problem is always going to be less work; but after
we repeat that process five or ten times, we're going to have a mess.
        regards, tom lane



On Wed, Aug 2, 2017 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Of course.  It's also a heck of a lot more flexible.  Adding on another
> ad-hoc option that does the minimum possible amount of work needed to
> address one specific problem is always going to be less work; but after
> we repeat that process five or ten times, we're going to have a mess.

Well, I still like Masahiko-san's proposal, but I'm not prepared to
keep arguing about it right now.  Maybe some other people will weigh
in with an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Michael Paquier
Date:
On Wed, Aug 2, 2017 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Aug 2, 2017 at 10:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Or in other words, this looks to me quite a bit like the hackery
>>> that resulted in pgbench's -S and -N options, before we figured out
>>> that making it scriptable was a better answer.
>
>> But it's not very clear to me how we could make this case scriptable,
>
> Well, I'm imagining that "-i" would essentially become a short form
> of "-b initialize", as already happened for -S and -N, where the script
> looks something like

Yes, I would imagine a facility where one could do pgbench $script and
issue a complete test set. Here is for example a funky idea: let's
separate each script with a set of meta-commands, \init being what is
used just for initialization, and then use \script to define a set of
commands with a custom weight. Say:
\init
CREATE TABLE foo(a int);
\script select_query [weight N]
SELECT count(*) FROM foo;
\script insert_query [weight N]
INSERT INTO foo VALUES ('1');

That may be over-engineering things, but personally I don't like much
having just a switch to remove indexes. Next time we will come with
another option that only selects a portion of the indexes created.
-- 
Michael



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Thu, Aug 3, 2017 at 2:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 2, 2017 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Of course.  It's also a heck of a lot more flexible.  Adding on another
>> ad-hoc option that does the minimum possible amount of work needed to
>> address one specific problem is always going to be less work; but after
>> we repeat that process five or ten times, we're going to have a mess.
>
> Well, I still like Masahiko-san's proposal, but I'm not prepared to
> keep arguing about it right now.  Maybe some other people will weigh
> in with an opinion.
>

My motivation of this proposal is same as what Robert has. I
understand that ad-hoc option can solve only the part of big problem
and it could be cause of mess. However It seems me that the script
especially for table initialization will not be flexible than we
expected. I mean, even if we provide some meta commands for table
initialization or data loading, these meta commands work for only
pgbench tables (i.g., pgbench_accounts, pgbench_branches and so on).
If we want to create other tables and load data to them as we want we
can do that using psql -f. So an alternative ways is having a flexible
style option for example --custom-initialize = { [load, create_pkey,
create_fkey, vacuum], ... }. That would solve this in a better way.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello,

> My motivation of this proposal is same as what Robert has. I
> understand that ad-hoc option can solve only the part of big problem
> and it could be cause of mess. However It seems me that the script
> especially for table initialization will not be flexible than we
> expected. I mean, even if we provide some meta commands for table
> initialization or data loading, these meta commands work for only
> pgbench tables (i.g., pgbench_accounts, pgbench_branches and so on).
> If we want to create other tables and load data to them as we want we
> can do that using psql -f. So an alternative ways is having a flexible
> style option for example --custom-initialize = { [load, create_pkey,
> create_fkey, vacuum], ... }. That would solve this in a better way.

Personnaly, I could be fine with a limited number of long options to 
adjust pgbench initialization to various needs, eg --use-hash-index, 
--skip-whetever-index, etc.

The flexible --custom-init idea outlined above looks nice as well.


As for a more generic solution, the easy part are the "CREATE" stuff and 
the transaction script stuff (existing pgbench scripts).

For the CREATE stuff, the script language is SQL, the command to use it is 
"psql"...

The real and hard part is to fill tables with meaningful pseudo-random 
test data which do not violate constraints for any non trivial schema 
involving foreign keys and various unique constraints.

The solution for this is SQL for trivial cases, think of:
  "INSERT INTO Foo() SELECT ... FROM generate_series(...);"

For instance the pgbench initialization is really close to:
 psql -Dscale=10 <<EOF   CREATE TABLE ... ;   INSERT INTO pgbench_account(...)    SELECT ... FROM generate_series(1,
100000* :scale) AS i;   INSERT ...   CREATE INDEX ...;   VACUUM FULL ANALYZE; EOF
 

And all existing options could probably be implemented easilly with the 
recently added conditional (\if).

So my 0.02€ is that if something is to be done, I would suggest to turn 
the creation and initialization stuff into a standard "psql" script that 
could be called from pgbench instead of integrating much more ad-hoc stuff 
into pgbench.

Note that non trivial schema initialization requires more general 
programming, so I do not believe in doing a lot at pgbench or psql levels. 
The best I could come with is a data generator which takes as input the 
schema with added directives on how to generate the various attributes 
(tool named "datafiller", that some people use:-).

-- 
Fabien.

Fabien COELHO <coelho@cri.ensmp.fr> writes:
> As for a more generic solution, the easy part are the "CREATE" stuff and 
> the transaction script stuff (existing pgbench scripts).

> For the CREATE stuff, the script language is SQL, the command to use it is 
> "psql"...

> The real and hard part is to fill tables with meaningful pseudo-random 
> test data which do not violate constraints for any non trivial schema 
> involving foreign keys and various unique constraints.

> The solution for this is SQL for trivial cases, think of:
>    "INSERT INTO Foo() SELECT ... FROM generate_series(...);"

Yeah.  I was also thinking that complicated data-generation requirements
could be handled with plpgsql DO blocks, avoiding the need for hard-wired
code inside pgbench.
        regards, tom lane



Masahiko Sawada <sawada.mshk@gmail.com> writes:
> If we want to create other tables and load data to them as we want we
> can do that using psql -f. So an alternative ways is having a flexible
> style option for example --custom-initialize = { [load, create_pkey,
> create_fkey, vacuum], ... }. That would solve this in a better way.

FWIW, I like that significantly better than your original proposal.
It'd allow people to execute parts of pgbench's standard initialization
sequence and then do other things in between (in psql).  Realistically,
that's probably about as much win as we need here --- if you're veering
far enough away from the standard scenario that that doesn't do it for
you, you might as well just write an all-custom setup script in psql.
        regards, tom lane



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
>> For the CREATE stuff, the script language is SQL, the command to use it is
>> "psql"...
>
>> The real and hard part is to fill tables with meaningful pseudo-random
>> test data which do not violate constraints for any non trivial schema
>> involving foreign keys and various unique constraints.
>
>> The solution for this is SQL for trivial cases, think of:
>>    "INSERT INTO Foo() SELECT ... FROM generate_series(...);"
>
> Yeah.  I was also thinking that complicated data-generation requirements
> could be handled with plpgsql DO blocks, avoiding the need for hard-wired
> code inside pgbench.

I do not thing that it is really be needed for what pgbench does, though. 
See attached attempt, including a no_foreign_keys option.

The only tricky thing is to have the elapsed/remaining advancement report 
on stdout, maybe with some PL/pgSQL.

Timings are very similar compared to "pgbench -i".

-- 
Fabien.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Fri, Aug 4, 2017 at 3:24 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>>> For the CREATE stuff, the script language is SQL, the command to use it
>>> is
>>> "psql"...
>>
>>
>>> The real and hard part is to fill tables with meaningful pseudo-random
>>> test data which do not violate constraints for any non trivial schema
>>> involving foreign keys and various unique constraints.
>>
>>
>>> The solution for this is SQL for trivial cases, think of:
>>>    "INSERT INTO Foo() SELECT ... FROM generate_series(...);"
>>
>>
>> Yeah.  I was also thinking that complicated data-generation requirements
>> could be handled with plpgsql DO blocks, avoiding the need for hard-wired
>> code inside pgbench.
>
>
> I do not thing that it is really be needed for what pgbench does, though.
> See attached attempt, including a no_foreign_keys option.
>
> The only tricky thing is to have the elapsed/remaining advancement report on
> stdout, maybe with some PL/pgSQL.
>
> Timings are very similar compared to "pgbench -i".
>

The generating data with plpgsql DO blocks means that we do the
data-generation on sever side rather than on client side. I think it's
preferable in a sense because could speed up initialization time by
reducing the network traffic.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Thu, Aug 3, 2017 at 11:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Masahiko Sawada <sawada.mshk@gmail.com> writes:
>> If we want to create other tables and load data to them as we want we
>> can do that using psql -f. So an alternative ways is having a flexible
>> style option for example --custom-initialize = { [load, create_pkey,
>> create_fkey, vacuum], ... }. That would solve this in a better way.
>
> FWIW, I like that significantly better than your original proposal.
> It'd allow people to execute parts of pgbench's standard initialization
> sequence and then do other things in between (in psql).  Realistically,
> that's probably about as much win as we need here --- if you're veering
> far enough away from the standard scenario that that doesn't do it for
> you, you might as well just write an all-custom setup script in psql.
>

Attached patch introduces --custom-initialize option that allows us to
specify the initialization step and its order. For example, If you
want to skip building primary keys you can specify
--custom-initialize="create_table, load_data, vacuum". Since each
custom initialization commands is invoked in specified order, for
example we also can create primary keys *before* loading data. The
data-generation is doing on client side, so progress information for
initialization is still supported.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
> Attached patch

I'll look into it.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Mahahiko-san,

My 0.02€ about the patch & feature, which only reflect my point of view:

Please add a number to patches to avoid ambiguities. This was patch was 
the second sent on the thread.

There is no need to have both custom_init & init functions. I'll suggest 
to remove function "init", rename "custom_init" as "init", and make the 
option default to what is appropriate so that it initialize the schema as
expected, and there is only one initialization mechanism.

I would suggest to make initialization sub options (no-vacuum, 
foreign-key...) rely on updating the initialization operations instead of 
being maintained separately. Currently "--no-vacuum --custom-init=vacuum" 
would skip vacuum, which is quite debatable...

I'm not sure of the "custom-initialize" option name, but why not. I 
suggest to remove "is_initialize_suite", and make custom-initialize 
require -i as seems logical and upward compatible.

ISTM that there should be short names for the phases. Maybe using only one 
letter could simplify the code significantly, dropping commas and list, 
eg:  "t" for "create_table", "d" for "load_data", "p" for "create_pkey", 
"f" for "create_fkey", "v" for "vacuum".

I do not think that allowing a space in the list is a shell-wise idea.

I'm also wondering whether using a list is a good option, because it 
implies a large parse function, list management and so on. With the one 
letter version, it could be just a string to be scanned char by char for 
operations.

Otherwise, at least allow short two letter codes (eg: ct ld pk fk va), in 
order to avoid writing a very long thing on the command line, which is 
quite a pain:
  sh> pgbench --custom-initialize=create_table,load_data,primary_key,foreign_key,vacuum ...

vs
  sh> pgbench -i -I tdpfv ...

Maybe there could be short-hands for usual setups, eg "default" for "tdpv" 
or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...

Typo in doc "initailize" -> "initialize". Option values should be 
presented in their logical execution order, i.e. put vacuum at the end.

Typo in help "initilize" -> "initialize". I would suggest to drop the space in the
option value in the presentation so that quotes are not needed.

Remove the "no-primary-keys" from the long option array as it has 
disappeared. You might consider make "custom-initialize" take the 'I' 
short option.

Ranting unrelated to this patch: the automatic aid type switching based on 
scale is a bad idea (tm), because when trying to benchmark it means that 
changing the scale also changes the schema, and you really do not need 
that. ISTM that it should always use INT8.

-- 
Fabien.

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Aug 8, 2017 at 10:50 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Mahahiko-san,
>
> My 0.02€ about the patch & feature, which only reflect my point of view:

Thank you for reviewing the patch!

> Please add a number to patches to avoid ambiguities. This was patch was the
> second sent on the thread.
>
> There is no need to have both custom_init & init functions. I'll suggest to
> remove function "init", rename "custom_init" as "init", and make the option
> default to what is appropriate so that it initialize the schema as
> expected, and there is only one initialization mechanism.
>
> I would suggest to make initialization sub options (no-vacuum,
> foreign-key...) rely on updating the initialization operations instead of
> being maintained separately. Currently "--no-vacuum --custom-init=vacuum"
> would skip vacuum, which is quite debatable...
>
> I'm not sure of the "custom-initialize" option name, but why not. I suggest
> to remove "is_initialize_suite", and make custom-initialize require -i as
> seems logical and upward compatible.
>
> ISTM that there should be short names for the phases. Maybe using only one
> letter could simplify the code significantly, dropping commas and list, eg:
> "t" for "create_table", "d" for "load_data", "p" for "create_pkey", "f" for
> "create_fkey", "v" for "vacuum".
>
> I do not think that allowing a space in the list is a shell-wise idea.

I think we can use it like --custom-initialize="create_table, vacuum"
which is similar to what we specify a connection option to psql for
example. But it will be unnecessary if we have the one letter version.

> I'm also wondering whether using a list is a good option, because it implies
> a large parse function, list management and so on. With the one letter
> version, it could be just a string to be scanned char by char for
> operations.

I basically agree with the one letter version. But I'm concerned that
it'll confuse users if we have more initialization steps for the
pgbench initialization. If we add more various initialization steps it
makes pgbench command hard to read and the users might have to
remember these options.

>
> Otherwise, at least allow short two letter codes (eg: ct ld pk fk va), in
> order to avoid writing a very long thing on the command line, which is quite
> a pain:
>
>   sh> pgbench
> --custom-initialize=create_table,load_data,primary_key,foreign_key,vacuum
> ...
>
> vs
>
>   sh> pgbench -i -I tdpfv ...
>
> Maybe there could be short-hands for usual setups, eg "default" for "tdpv"
> or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...

If --custom-initialize option requires for i option to be set,
"pgbench -i" means the initialization with full steps and "pgbench -i
--custom-initialize=..." means the initialization with custom
operation steps.

> Typo in doc "initailize" -> "initialize". Option values should be presented
> in their logical execution order, i.e. put vacuum at the end.
>
> Typo in help "initilize" -> "initialize". I would suggest to drop the space
> in the
> option value in the presentation so that quotes are not needed.
>
> Remove the "no-primary-keys" from the long option array as it has
> disappeared. You might consider make "custom-initialize" take the 'I' short
> option.
>
> Ranting unrelated to this patch: the automatic aid type switching based on
> scale is a bad idea (tm), because when trying to benchmark it means that
> changing the scale also changes the schema, and you really do not need that.
> ISTM that it should always use INT8.

Hmm, I think it's a valid point. Should we allow users to specify like
the above thing in the custom initialization feature as well?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello,

> I think we can use it like --custom-initialize="create_table, vacuum"
> which is similar to what we specify a connection option to psql for
> example.

Even if it is allowed, do not advertise it. Or use space as a separator 
and not comma. ISTM that with psql connections space is the higher level 
separator, not an optional thing, and comma is used for lower level 
splitting: "host=foo port=5432,5433 ..."

> But it will be unnecessary if we have the one letter version.

Sure.

>> I'm also wondering whether using a list is a good option, because it implies
>> a large parse function, list management and so on. With the one letter
>> version, it could be just a string to be scanned char by char for
>> operations.
>
> I basically agree with the one letter version. But I'm concerned that
> it'll confuse users if we have more initialization steps for the
> pgbench initialization. If we add more various initialization steps it
> makes pgbench command hard to read and the users might have to
> remember these options.

I think that if we get to the point where so many initialization steps 
that people get confused, then adding long names will not be an issue:-)

In the mean time it only needs 5 values.

>> Maybe there could be short-hands for usual setups, eg "default" for "tdpv"
>> or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...
>
> If --custom-initialize option requires for i option to be set,
> "pgbench -i" means the initialization with full steps and "pgbench -i
> --custom-initialize=..." means the initialization with custom
> operation steps.

Sure. It does not preclude the default to have a name.

>> Remove the "no-primary-keys" from the long option array as it has
>> disappeared. You might consider make "custom-initialize" take the 'I' short
>> option.
>>
>> Ranting unrelated to this patch: the automatic aid type switching based on
>> scale is a bad idea (tm), because when trying to benchmark it means that
>> changing the scale also changes the schema, and you really do not need that.
>> ISTM that it should always use INT8.
>
> Hmm, I think it's a valid point. Should we allow users to specify like
> the above thing in the custom initialization feature as well?

I would be in favor of having an option to do a tpc-b conforming schema 
which would include that, but which would also change the default balance 
type which is not large enough per spec. Maybe it could be "T".

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Aug 15, 2017 at 2:43 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello,
>
>> I think we can use it like --custom-initialize="create_table, vacuum"
>> which is similar to what we specify a connection option to psql for
>> example.
>
>
> Even if it is allowed, do not advertise it. Or use space as a separator and
> not comma. ISTM that with psql connections space is the higher level
> separator, not an optional thing, and comma is used for lower level
> splitting: "host=foo port=5432,5433 ..."
>
>> But it will be unnecessary if we have the one letter version.
>
>
> Sure.
>
>>> I'm also wondering whether using a list is a good option, because it
>>> implies
>>> a large parse function, list management and so on. With the one letter
>>> version, it could be just a string to be scanned char by char for
>>> operations.
>>
>>
>> I basically agree with the one letter version. But I'm concerned that
>> it'll confuse users if we have more initialization steps for the
>> pgbench initialization. If we add more various initialization steps it
>> makes pgbench command hard to read and the users might have to
>> remember these options.
>
>
> I think that if we get to the point where so many initialization steps that
> people get confused, then adding long names will not be an issue:-)
>
> In the mean time it only needs 5 values.

Agreed.

>
>>> Maybe there could be short-hands for usual setups, eg "default" for
>>> "tdpv"
>>> or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...
>>
>>
>> If --custom-initialize option requires for i option to be set,
>> "pgbench -i" means the initialization with full steps and "pgbench -i
>> --custom-initialize=..." means the initialization with custom
>> operation steps.
>
>
> Sure. It does not preclude the default to have a name.
>
>>> Remove the "no-primary-keys" from the long option array as it has
>>> disappeared. You might consider make "custom-initialize" take the 'I'
>>> short
>>> option.
>>>
>>> Ranting unrelated to this patch: the automatic aid type switching based
>>> on
>>> scale is a bad idea (tm), because when trying to benchmark it means that
>>> changing the scale also changes the schema, and you really do not need
>>> that.
>>> ISTM that it should always use INT8.
>>
>>
>> Hmm, I think it's a valid point. Should we allow users to specify like
>> the above thing in the custom initialization feature as well?
>
>
> I would be in favor of having an option to do a tpc-b conforming schema
> which would include that, but which would also change the default balance
> type which is not large enough per spec. Maybe it could be "T".
>

Yeah, once custom initialization patch get committed we can extend it.

Attached updated patch. I've incorporated the all comments from Fabien
to it, and changed it to single letter version.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

> Yeah, once custom initialization patch get committed we can extend it.
>
> Attached updated patch. I've incorporated the all comments from Fabien
> to it, and changed it to single letter version.

Patch applies and works.

A few comments and questions about the code and documentation:

Why not allow -I as a short option for --custom-initialize?

I do not think that the --custom-initialize option needs to appear as a 
specific synopsis in the documentation, as it is now a sub-option of -i.

checkCustomCmds: I would suggest to simplify test code with strchr
and to merge the two fprintf into one, something like:
  if (strchr("tdpfv", *cmd) == NULL) {     fprintf(stderr,            "....\n"        "....\n",        ...);     ...

Moreover there is already an error message later if checkCustomCmds fails, I think
it could be expanded and the two-line one in the function dropped 
entirely? It seems strange to have two level error messages for that.

Help message looks strange. I suggest something regexp-like:
 --custom-initialize=[tdvpf]+

I would suggest to put the various init* functions in a more logical order:
first create table, then load data, etc.

In case 0: do not exchange unlogged_tables & foreign_keys gratuiously.

After checking the initial code, I understand that the previous default 
was "tdvp", but you put "tdpv". I'm unsure whether vaccuum would do 
something to the indexes and that would make more sense. In doubt, I 
suggest to keep the previous default.

Maybe --foreign-keys should really do "tdvpf"?

I may be okay with disallowing --foreign-keys and --no-vacuum if --custom-init is used,
but then there is no need to test it again in init... I think that in any case 'f' and 'v'
should always trigger the corresponding initializations.

On the other hand, I think that it could be more pragmatic with these 
options, i.e. --foreign-keys could just append 'f' to the current command 
if not already there, and '--no-vacuum' could remove 'v' if there, on the 
fly, so that nothing would be banned. This would require to always have a 
malloced custom_init string. It would allow to remove the "foreign_keys" 
global variable. "is_no_vacuum" is probably still needed for benchmarking.
This way there would be no constraints and "is_custom_init" could be 
dropped as well.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Wed, Aug 16, 2017 at 4:55 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>> Yeah, once custom initialization patch get committed we can extend it.
>>
>> Attached updated patch. I've incorporated the all comments from Fabien
>> to it, and changed it to single letter version.
>
>
> Patch applies and works.
>
> A few comments and questions about the code and documentation:

Thank you for the comments!

>
> Why not allow -I as a short option for --custom-initialize?

Other options for similar purpose such as --foreign-keys also have
only a long option. Since I think --custom-initialize option is in the
same context as other options I didn't add short option to it for now.
Because the options name is found by a prefix searching we can use a
short name --cu for now.

> I do not think that the --custom-initialize option needs to appear as a
> specific synopsis in the documentation, as it is now a sub-option of -i.
>
> checkCustomCmds: I would suggest to simplify test code with strchr
> and to merge the two fprintf into one, something like:
>
>   if (strchr("tdpfv", *cmd) == NULL) {
>      fprintf(stderr,
>             "....\n"
>             "....\n",
>             ...);
>      ...
>
> Moreover there is already an error message later if checkCustomCmds fails, I
> think
> it could be expanded and the two-line one in the function dropped entirely?
> It seems strange to have two level error messages for that.
>
> Help message looks strange. I suggest something regexp-like:
>
>  --custom-initialize=[tdvpf]+
>
> I would suggest to put the various init* functions in a more logical order:
> first create table, then load data, etc.
>
> In case 0: do not exchange unlogged_tables & foreign_keys gratuiously.
>
> After checking the initial code, I understand that the previous default was
> "tdvp", but you put "tdpv". I'm unsure whether vaccuum would do something to
> the indexes and that would make more sense. In doubt, I suggest to keep the
> previous default.
>
> Maybe --foreign-keys should really do "tdvpf"?
>
> I may be okay with disallowing --foreign-keys and --no-vacuum if
> --custom-init is used,
> but then there is no need to test it again in init... I think that in any
> case 'f' and 'v'
> should always trigger the corresponding initializations.
>
> On the other hand, I think that it could be more pragmatic with these
> options, i.e. --foreign-keys could just append 'f' to the current command if
> not already there, and '--no-vacuum' could remove 'v' if there, on the fly,
> so that nothing would be banned. This would require to always have a
> malloced custom_init string. It would allow to remove the "foreign_keys"
> global variable. "is_no_vacuum" is probably still needed for benchmarking.
> This way there would be no constraints and "is_custom_init" could be dropped
> as well.

I'm inclined to remove the restriction so that we can specify
--foreign-keys, --no-vacuum and --custom-initialize at the same time.
I think a list of char would be better here rather than a single
malloced string to remove particular initialization step easily.
Thought?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
>> Why not allow -I as a short option for --custom-initialize?
>
> Other options for similar purpose such as --foreign-keys also have
> only a long option. Since I think --custom-initialize option is in the
> same context as other options I didn't add short option to it for now.
> Because the options name is found by a prefix searching we can use a
> short name --cu for now.

Hmmm. I like short options:-)

> I'm inclined to remove the restriction so that we can specify
> --foreign-keys, --no-vacuum and --custom-initialize at the same time.

Ok. I favor that as well.

> I think a list of char would be better here rather than a single
> malloced string to remove particular initialization step easily.
> Thought?

My thought is not to bother with a list of char.

To remove a char from a string, I suggest to allow ' ' to stand for 
nothing and be skipped, so that substituting a letter by space would 
simply remove an initialization phase.

For adding, realloc & append.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Aug 22, 2017 at 5:15 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>>> Why not allow -I as a short option for --custom-initialize?
>>
>>
>> Other options for similar purpose such as --foreign-keys also have
>> only a long option. Since I think --custom-initialize option is in the
>> same context as other options I didn't add short option to it for now.
>> Because the options name is found by a prefix searching we can use a
>> short name --cu for now.
>
>
> Hmmm. I like short options:-)

Okay, I added -I option for custom initialization :)

>
>> I'm inclined to remove the restriction so that we can specify
>> --foreign-keys, --no-vacuum and --custom-initialize at the same time.
>
>
> Ok. I favor that as well.

If building foreign keys command is not specified in -I but
--foreign-keys options is specified (e.g. pgbench -i -I tpd
--foreign-keys) I think we can add 'f' to the end of the
initialization commands.

>
>> I think a list of char would be better here rather than a single
>> malloced string to remove particular initialization step easily.
>> Thought?
>
>
> My thought is not to bother with a list of char.
>
> To remove a char from a string, I suggest to allow ' ' to stand for nothing
> and be skipped, so that substituting a letter by space would simply remove
> an initialization phase.

I think we should not add/remove a command of initialization commands
during parsing pgbench options in order to not depend on its order.
Therefore, if -I,  --foreign-keys and --no-vacuum are specified at the
same time, what we do is removing some 'v' commands if novacuum and
adding a 'f' command if foreignkey. Also we expect that the length of
initialization steps would not long. Using malloced string would less
the work. Ok, I changed the patch so.

Attached latest v4 patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

> Attached latest v4 patch. Please review it.

Patch applies, compiles.

The messages/options do not seem to work properly:
 sh> ./pgbench -i -I t done.

Does not seem to have initialized the tables although it was requested...
 sh> ./pgbench -i -I d creating tables... 100000 of 100000 tuples (100%) done (elapsed 0.08 s, remaining 0.00 s) done.

It seems that "d" triggered table creation... In fact it seems that the
work is done correctly, but the messages are not in the right place.

Also another issue:
 sh> ./pgbench -i --foreign-keys creating tables... 100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00
s)vacuum... set primary keys... done.
 

Foreign keys do not seem to have been set... Please check that all really 
work as expected.


About the documentation:

If a native English speaker could review the text, it would be great.

At least: "Required to invoke" -> "Require to invoke".


About the code:

is_no_vacuum should be a bool?

I'm really hesitating about the out of order processing of options. If the user writes
  sh> pgbench -i --no-vacuum -I v  done.

Then does it make sense to ignore the last thing the user asked for? ISTM 
that processing options in order and keeping the last resulting spec is 
more natural. Appending contradictory options can happen easily when 
scripting, and usual what is meant is the last one.

Again, as pointed out in the previous review, I do not like much 
checkCustomCmds implementation: switch/case, fprintf and return on error 
which will trigger another fprintf and error above... ISTM that you should 
either take into account previous comments or explain why you disagree 
with them, but not send the same code without addressing them in any way.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Quick precision to my previous review.

> sh> ./pgbench -i -I t
> done.

There should be "creating tables..."

> Does not seem to have initialized the tables although it was requested...
>
> sh> ./pgbench -i -I d
> creating tables...

Probably "filling tables..." would be more appropriate.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Sun, Aug 27, 2017 at 5:12 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>> Attached latest v4 patch. Please review it.

Thank you for reviewing this patch!

>
> Patch applies, compiles.
>
> The messages/options do not seem to work properly:
>
>  sh> ./pgbench -i -I t
>  done.

Fixed this so that it ouptut "creating tables..." as you pointed out.

> Does not seem to have initialized the tables although it was requested...
>
>  sh> ./pgbench -i -I d
>  creating tables...
>  100000 of 100000 tuples (100%) done (elapsed 0.08 s, remaining 0.00 s)
>  done.
>
> It seems that "d" triggered table creation... In fact it seems that the
> work is done correctly, but the messages are not in the right place.

Fixed, but I just removed "creating tables..." from -I d command. I
think it's not good if we change the output messages by this patch.

> Also another issue:
>
>  sh> ./pgbench -i --foreign-keys
>  creating tables...
>  100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
>  vacuum...
>  set primary keys...
>  done.
>
> Foreign keys do not seem to have been set... Please check that all really
> work as expected.

Fixed.

>
> About the documentation:
>
> If a native English speaker could review the text, it would be great.
>
> At least: "Required to invoke" -> "Require to invoke".

Fixed.

>
>
> About the code:
>
> is_no_vacuum should be a bool?

We can change it but I think there is no difference actually. So
keeping it would be better.

>
> I'm really hesitating about the out of order processing of options. If the
> user writes
>
>   sh> pgbench -i --no-vacuum -I v
>   done.
>
> Then does it make sense to ignore the last thing the user asked for? ISTM
> that processing options in order and keeping the last resulting spec is more
> natural. Appending contradictory options can happen easily when scripting,
> and usual what is meant is the last one.

Agreed. I changed it so that it processes options in order and keeps
the last resulting spec.

>
> Again, as pointed out in the previous review, I do not like much
> checkCustomCmds implementation: switch/case, fprintf and return on error
> which will trigger another fprintf and error above... ISTM that you should
> either take into account previous comments or explain why you disagree with
> them, but not send the same code without addressing them in any way.

Sorry, I didn't mean to ignore, I'd just missed the comment. Fixed it.

Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

Patch applies cleanly, compiles, works for me.

>> At least: "Required to invoke" -> "Require to invoke".
>
> Fixed.

I meant the added one about -I, not the pre-existing one about -i.

>> About the code:
>>
>> is_no_vacuum should be a bool?
>
> We can change it but I think there is no difference actually. So
> keeping it would be better.

I would like to insist a little bit: the name was declared as an int but 
passed to init and used as a bool there before the patch. Conceptually 
what is meant is really a bool, and I see no added value bar saving a few 
strokes to have an int. ISTM that recent pgbench changes have started 
turning old int-for-bool habits into using bool when bool is meant.

initialize_cmds initialization: rather use pg_strdup instead of 
pg_malloc/strcpy?

-I: pg_free before pg_strdup to avoid a small memory leak?

I'm not sure I would have bothered with sizeof(char), but why not!

I'm still a little bit annoyed by checkCustomCmds: when unhappy, it shows 
an error message and return false, which immediatly results in exit(1). 
However the pattern elsewhere in pgbench is to show the error and exit 
immediatly. I would suggest to simplify by void-ing the function and 
exiting instead of returning false.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Mon, Aug 28, 2017 at 4:59 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
> Patch applies cleanly, compiles, works for me.

Thank you for reviewing!

>
>>> At least: "Required to invoke" -> "Require to invoke".
>>
>>
>> Fixed.
>
>
> I meant the added one about -I, not the pre-existing one about -i.

Fixed.

>>> About the code:
>>>
>>> is_no_vacuum should be a bool?
>>
>>
>> We can change it but I think there is no difference actually. So
>> keeping it would be better.
>
>
> I would like to insist a little bit: the name was declared as an int but
> passed to init and used as a bool there before the patch. Conceptually what
> is meant is really a bool, and I see no added value bar saving a few strokes
> to have an int. ISTM that recent pgbench changes have started turning old
> int-for-bool habits into using bool when bool is meant.

Since is_no_vacuum is a existing one, if we follow the habit we should
change other similar variables as well: is_init_mode,
do_vacuum_accounts and debug. And I think we should change them in a
separated patch.

>
> initialize_cmds initialization: rather use pg_strdup instead of
> pg_malloc/strcpy?

Fixed.

> -I: pg_free before pg_strdup to avoid a small memory leak?

Fixed.

> I'm not sure I would have bothered with sizeof(char), but why not!
>
> I'm still a little bit annoyed by checkCustomCmds: when unhappy, it shows an
> error message and return false, which immediatly results in exit(1). However
> the pattern elsewhere in pgbench is to show the error and exit immediatly. I
> would suggest to simplify by void-ing the function and exiting instead of
> returning false.

Agreed, fixed.

After more thought, I'm bit inclined to not have a short option for
--custom-initialize because this option will be used for not primary
cases. It would be better to save short options for future
enhancements of pgbench. Thought?

Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello,

Patch applies and works.

>> I would like to insist a little bit: the name was declared as an int but
>> passed to init and used as a bool there before the patch. Conceptually what
>> is meant is really a bool, and I see no added value bar saving a few strokes
>> to have an int. ISTM that recent pgbench changes have started turning old
>> int-for-bool habits into using bool when bool is meant.
>
> Since is_no_vacuum is a existing one, if we follow the habit we should 
> change other similar variables as well: is_init_mode, do_vacuum_accounts 
> and debug. And I think we should change them in a separated patch.

Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in 
"main") and as bool (in "init"), called by main (yuk!). I see no reason to 
choose the bad one for the global:-)

> After more thought, I'm bit inclined to not have a short option for
> --custom-initialize because this option will be used for not primary
> cases. It would be better to save short options for future
> enhancements of pgbench. Thought?

I like it as is, especially as now the associated value is a simple and 
short string, I think that it makes sense to have a simple and short 
option to trigger it. Moreover -I stands cleanly for "initialization", and 
the capital stands for something a little special which it is. Its to good 
to miss.

I think that the "-I" it should be added to the "--help" line, as it is 
done with other short & long options.

Repeating "-I f" results in multiple foreign key constraints:
  Foreign-key constraints:    "pgbench_tellers_bid_fkey" FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)
"pgbench_tellers_bid_fkey1"FOREIGN KEY (bid) REFERENCES pgbench_branches(bid)    "pgbench_tellers_bid_fkey2" FOREIGN
KEY(bid) REFERENCES pgbench_branches(bid)    "pgbench_tellers_bid_fkey3" FOREIGN KEY (bid) REFERENCES
pgbench_branches(bid)

I wonder if this could be avoided easily? Maybe by setting the constraint 
name explicitely so that the second one fails on the existing one, which 
is fine, like for primary keys? Or adding a DROP CONSTRAINT IF EXISTS 
before the CREATE CONSTRAINT, like for tables? Or doing nothing about it? 
I would prefer the first option.

Maybe the initial cleanup (DROP TABLE) could be made an option added to 
the default, so that cleaning up the database could be achieved with some 
"pgbench -i -I c", instead of connecting and droping the tables one by one 
which I have done quite a few times... What do you think?

Before it is definitely engraved, I'm thinking about the letters:
 c - cleanup t - create table d - data p - primary key f - foreign key v - vacuum

I think it is mostly okay, but it is the last time to think about it. 
Using "d" for cleanup (drop) would mean finding another letter for filling 
in data... maybe "g" for data generation? "c" may have been chosen for 
"create table", but then would not be available for "cleanup". Thoughts?

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Aug 29, 2017 at 4:47 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello,
>
> Patch applies and works.
>
>>> I would like to insist a little bit: the name was declared as an int but
>>> passed to init and used as a bool there before the patch. Conceptually
>>> what
>>> is meant is really a bool, and I see no added value bar saving a few
>>> strokes
>>> to have an int. ISTM that recent pgbench changes have started turning old
>>> int-for-bool habits into using bool when bool is meant.
>>
>>
>> Since is_no_vacuum is a existing one, if we follow the habit we should
>> change other similar variables as well: is_init_mode, do_vacuum_accounts and
>> debug. And I think we should change them in a separated patch.
>
>
> Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in
> "main") and as bool (in "init"), called by main (yuk!). I see no reason to
> choose the bad one for the global:-)
>

Yeah, I think this might be a good timing to re-consider int-for-bool
habits in pgbench. If we decided to change is_no_vacuum to bool I want
to change other similar variables as well.

>> After more thought, I'm bit inclined to not have a short option for
>> --custom-initialize because this option will be used for not primary
>> cases. It would be better to save short options for future
>> enhancements of pgbench. Thought?
>
>
> I like it as is, especially as now the associated value is a simple and
> short string, I think that it makes sense to have a simple and short option
> to trigger it. Moreover -I stands cleanly for "initialization", and the
> capital stands for something a little special which it is. Its to good to
> miss.
>
> I think that the "-I" it should be added to the "--help" line, as it is done
> with other short & long options.

Okay, I'll leave it as of now. Maybe we can discuss later.

>
> Repeating "-I f" results in multiple foreign key constraints:
>
>   Foreign-key constraints:
>     "pgbench_tellers_bid_fkey" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
>     "pgbench_tellers_bid_fkey1" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
>     "pgbench_tellers_bid_fkey2" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
>     "pgbench_tellers_bid_fkey3" FOREIGN KEY (bid) REFERENCES
> pgbench_branches(bid)
>
> I wonder if this could be avoided easily? Maybe by setting the constraint
> name explicitely so that the second one fails on the existing one, which is
> fine, like for primary keys? Or adding a DROP CONSTRAINT IF EXISTS before
> the CREATE CONSTRAINT, like for tables? Or doing nothing about it? I would
> prefer the first option.

Good point, I agree with first option.

>
> Maybe the initial cleanup (DROP TABLE) could be made an option added to the
> default, so that cleaning up the database could be achieved with some
> "pgbench -i -I c", instead of connecting and droping the tables one by one
> which I have done quite a few times... What do you think?

Yeah, I sometimes wanted that. Having the cleaning up tables option
would be good idea.

> Before it is definitely engraved, I'm thinking about the letters:
>
>  c - cleanup
>  t - create table
>  d - data
>  p - primary key
>  f - foreign key
>  v - vacuum
>
> I think it is mostly okay, but it is the last time to think about it. Using
> "d" for cleanup (drop) would mean finding another letter for filling in
> data... maybe "g" for data generation? "c" may have been chosen for "create
> table", but then would not be available for "cleanup". Thoughts?

I'd say "g" for data generation would be better. Also, I'm inclined to
add a command for the unlogged tables. How about this?

c - cleanup
t - create table
u - create unlogged table
g - data generation
p - primary key
f - foreign key
v - vacuum

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello,

>> Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in
>> "main") and as bool (in "init"), called by main (yuk!). I see no reason to
>> choose the bad one for the global:-)
>
> Yeah, I think this might be a good timing to re-consider int-for-bool
> habits in pgbench. If we decided to change is_no_vacuum to bool I want
> to change other similar variables as well.

Franckly I would be fine with that, but committers might get touchy about 
"unrelated changes" in the patch... The "is_no_vacuum" is related to the 
patch and is already a bool -- if you chose the "init" definition as a 
reference -- so it is okay to bool it.

>> I think that the "-I" it should be added to the "--help" line, as it is done
>> with other short & long options.
>
> Okay, I'll leave it as of now. Maybe we can discuss later.

Maybe we did not understand one another. I'm just suggesting to insert
-I in the help line, that is change:
   "  --custom-initialize=[...]+\n"

to
   "  -I, --custom-initialize=[...]+\n"

I'm not sure it deserves to be discussed in depth later:-)

>> I wonder if this could be avoided easily? Maybe by setting the constraint
>> name explicitely so that the second one fails on the existing one, which is
>> fine, like for primary keys? [...]
>
> Good point, I agree with first option.

Ok.

>> Maybe the initial cleanup (DROP TABLE) could be made an option added to the
>> default, so that cleaning up the database could be achieved with some
>> "pgbench -i -I c", instead of connecting and droping the tables one by one
>> which I have done quite a few times... What do you think?
>
> Yeah, I sometimes wanted that. Having the cleaning up tables option
> would be good idea.

Ok.

> I'd say "g" for data generation would be better. Also, I'm inclined to
> add a command for the unlogged tables. How about this?
>
> c - [c]leanup / or [d]rop tables
> t - create table / [t]able creation or [c]reate table
> u - create unlogged table
> g - data generation / [g]enerate data
> p - [p]rimary key
> f - [f]oreign key
> v - [v]acuum

I'm okay with that. I also put an alternative with d/c above, without 
any preference from my part.

I'm not sure about "u", though. Unlogged, like tablespace, is an 
orthogonal option: other table creation options (I intend to submit one 
which conforms to the TPC-B standard, that is use an INT8 balance as INT4 
is not wide enough per spec, and always use an INT8 aid) may be also 
unlogged or tablespaced. So that would mean having two ways to trigger 
them... thus I would avoid it and keep only --unlogged.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Wed, Aug 30, 2017 at 3:39 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello,
>
>>> Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in
>>> "main") and as bool (in "init"), called by main (yuk!). I see no reason
>>> to
>>> choose the bad one for the global:-)
>>
>>
>> Yeah, I think this might be a good timing to re-consider int-for-bool
>> habits in pgbench. If we decided to change is_no_vacuum to bool I want
>> to change other similar variables as well.
>
>
> Franckly I would be fine with that, but committers might get touchy about
> "unrelated changes" in the patch... The "is_no_vacuum" is related to the
> patch and is already a bool -- if you chose the "init" definition as a
> reference -- so it is okay to bool it.

Okay, I changed only is_no_vacuum in this patch and other similar
variables would be changed in another patch.

>
>>> I think that the "-I" it should be added to the "--help" line, as it is
>>> done
>>> with other short & long options.
>>
>>
>> Okay, I'll leave it as of now. Maybe we can discuss later.
>
>
> Maybe we did not understand one another. I'm just suggesting to insert
> -I in the help line, that is change:
>
>    "  --custom-initialize=[...]+\n"
>
> to
>
>    "  -I, --custom-initialize=[...]+\n"

Fixed.

> I'm not sure it deserves to be discussed in depth later:-)

Sorry, I meant about having short option --custom-initialize.

>
>>> I wonder if this could be avoided easily? Maybe by setting the constraint
>>> name explicitely so that the second one fails on the existing one, which
>>> is
>>> fine, like for primary keys? [...]
>>
>>
>> Good point, I agree with first option.
>
>
> Ok.
>
>>> Maybe the initial cleanup (DROP TABLE) could be made an option added to
>>> the
>>> default, so that cleaning up the database could be achieved with some
>>> "pgbench -i -I c", instead of connecting and droping the tables one by
>>> one
>>> which I have done quite a few times... What do you think?
>>
>>
>> Yeah, I sometimes wanted that. Having the cleaning up tables option
>> would be good idea.
>
>
> Ok.
>
>> I'd say "g" for data generation would be better. Also, I'm inclined to
>> add a command for the unlogged tables. How about this?
>>
>> c - [c]leanup / or [d]rop tables
>> t - create table / [t]able creation or [c]reate table
>> u - create unlogged table
>> g - data generation / [g]enerate data
>> p - [p]rimary key
>> f - [f]oreign key
>> v - [v]acuum
>
>
> I'm okay with that. I also put an alternative with d/c above, without any
> preference from my part.

Personally I prefer "t" for table creation because "c" for create is a
generic word. We might want to have another initialization command
that creates something.

>
> I'm not sure about "u", though. Unlogged, like tablespace, is an orthogonal
> option: other table creation options (I intend to submit one which conforms
> to the TPC-B standard, that is use an INT8 balance as INT4 is not wide
> enough per spec, and always use an INT8 aid) may be also unlogged or
> tablespaced. So that would mean having two ways to trigger them... thus I
> would avoid it and keep only --unlogged.
>

Yeah, I think I had misunderstood it. -I option is for specifying some
particular initialization steps. So we don't need to have a command as
a option for other initializatoin commands.

Attached latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

> [...] Personally I prefer "t" for table creation because "c" for create 
> is a generic word. We might want to have another initialization command 
> that creates something.

Ok, good point.


About the patch: applies, compiles, works for me. A few minor comments.

While re-reading the documentation, I think that it should be "Set custom 
initialization steps". It could be "Require ..." when -I implied -i, but 
since -i is still required the sentence does not seem to apply as such.

"Destroying any existing tables: ..." -> "Destroy existing pgbench tables: 
...".

I would suggest to add short expanded explanations in the term definition, 
next to the triggering letter, to underline the mnemonic. Something like:
   c (cleanup)   t (table creation)   g (generate data)   v (vacuum)   p (primary key)   f (foreign key)

Also update the error message in checkCustomCommands to "ctgvpf".

Cleanup should have a message when it is executed. I suggest "cleaning 
up...".

Maybe add a comment in front of the array tables to say that the order is 
important, something like "tables in reverse foreign key dependencies 
order"?

case 'I': ISTM that initialize_cmds is necessarily already allocated, thus 
I would not bother to test before pg_free.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Thu, Aug 31, 2017 at 4:35 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>> [...] Personally I prefer "t" for table creation because "c" for create is
>> a generic word. We might want to have another initialization command that
>> creates something.
>
>
> Ok, good point.
>
>
> About the patch: applies, compiles, works for me. A few minor comments.

Thank you for dedicated reviewing this patch!

> While re-reading the documentation, I think that it should be "Set custom
> initialization steps". It could be "Require ..." when -I implied -i, but
> since -i is still required the sentence does not seem to apply as such.
>
> "Destroying any existing tables: ..." -> "Destroy existing pgbench tables:
> ...".

Fixed.

> I would suggest to add short expanded explanations in the term definition,
> next to the triggering letter, to underline the mnemonic. Something like:
>
>    c (cleanup)
>    t (table creation)
>    g (generate data)
>    v (vacuum)
>    p (primary key)
>    f (foreign key)

Nice idea, agreed.

> Also update the error message in checkCustomCommands to "ctgvpf".

Fixed.

> Cleanup should have a message when it is executed. I suggest "cleaning
> up...".

Fixed.

> Maybe add a comment in front of the array tables to say that the order is
> important, something like "tables in reverse foreign key dependencies
> order"?

Fixed.

>
> case 'I': ISTM that initialize_cmds is necessarily already allocated, thus I
> would not bother to test before pg_free.

Agreed, fixed.


Attached latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

Patch applies and compiles.

One bug found, and some minor points again. Sorry for this hopefully last 
iteration... I'm kind of an iterative person...

I've generated the doc to look a it.

Short option "-I" does not use a "=", it should be "-I 
custom_init_commands".

Also maybe it would look nicer and clearer if the short mnemonic was 
outside the literal, that is with:
  <term><literal>c</> (cleanup)</>

instead of:
  <term><literal>c (cleanup)</></>

But this is debatable. Do it the way you think is best.

Command "g" does not work after "f", something I had not tested before:
 ./pgbench -i -I ctvpfg cleaning up... creating tables... vacuum... set primary keys... set foreign keys... ERROR:
cannottruncate a table referenced in a foreign key constraint DETAIL:  Table "pgbench_history" references
"pgbench_accounts".HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE ... CASCADE.
 

I think it should work. It probably just mean to TRUNCATE all tables as 
one command, or add the suggested CASCADE. I would favor the first option.

I'm wondering whether this truncation should be yet another available 
command? Hmmm... maybe not.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Fri, Sep 1, 2017 at 4:42 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
> Patch applies and compiles.
>
> One bug found, and some minor points again. Sorry for this hopefully last
> iteration... I'm kind of an iterative person...
>
> I've generated the doc to look a it.
>
> Short option "-I" does not use a "=", it should be "-I
> custom_init_commands".
>
> Also maybe it would look nicer and clearer if the short mnemonic was outside
> the literal, that is with:
>
>   <term><literal>c</> (cleanup)</>
>
> instead of:
>
>   <term><literal>c (cleanup)</></>
>
> But this is debatable. Do it the way you think is best.
>
> Command "g" does not work after "f", something I had not tested before:
>
>  ./pgbench -i -I ctvpfg
>  cleaning up...
>  creating tables...
>  vacuum...
>  set primary keys...
>  set foreign keys...
>  ERROR:  cannot truncate a table referenced in a foreign key constraint
>  DETAIL:  Table "pgbench_history" references "pgbench_accounts".
>  HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE
> ... CASCADE.
>
> I think it should work. It probably just mean to TRUNCATE all tables as one
> command, or add the suggested CASCADE. I would favor the first option.
>
> I'm wondering whether this truncation should be yet another available
> command? Hmmm... maybe not.

Currently TRUNCATE pgbench_accounts command is executed within a
transaction started immediately before it. If we move it out of the
transaction, the table data will be truncated even if the copying data
failed. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
instead. Thought?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
>> I'm wondering whether this truncation should be yet another available
>> command? Hmmm... maybe not.
>
> Currently TRUNCATE pgbench_accounts command is executed within a
> transaction started immediately before it. If we move it out of the
> transaction, the table data will be truncated even if the copying data
> failed. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
> instead. Thought?

Keep the truncate in the transaction, and truncate both (or all?) tables 
together.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Fri, Sep 1, 2017 at 11:29 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>>> I'm wondering whether this truncation should be yet another available
>>> command? Hmmm... maybe not.
>>
>>
>> Currently TRUNCATE pgbench_accounts command is executed within a
>> transaction started immediately before it. If we move it out of the
>> transaction, the table data will be truncated even if the copying data
>> failed. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
>> instead. Thought?
>
>
> Keep the truncate in the transaction, and truncate both (or all?) tables
> together.

Attached latest patch incorporated the comments I got so far. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

>>> Currently TRUNCATE pgbench_accounts command is executed within a
>>> transaction started immediately before it. If we move it out of the
>>> transaction, the table data will be truncated even if the copying data
>>> failed. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
>>> instead. Thought?
>>
>>
>> Keep the truncate in the transaction, and truncate both (or all?) tables
>> together.
>
> Attached latest patch incorporated the comments I got so far. Please review it.

"g" does not work for me yet when after "f", only the message is slightly 
different, it chokes on another dependency, branches instead of accounts.
  sh> ./pgbench -i -I ctpfg  cleaning up...  creating tables...  set primary keys...  set foreign keys...  ERROR:
cannottruncate a table referenced in a foreign key constraint  DETAIL:  Table "pgbench_history" references
"pgbench_branches". HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE ... CASCADE.
 

I think that the whole data generation should be in *one* transaction 
which starts with "truncate pgbench_history, pgbench_branches, 
pgbench_tellers, pgbench_accounts;"

In passing, I think that the documentation should tell explicitely what 
the default value is (aka "ctgvp").

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Sep 5, 2017 at 2:33 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>>>> Currently TRUNCATE pgbench_accounts command is executed within a
>>>> transaction started immediately before it. If we move it out of the
>>>> transaction, the table data will be truncated even if the copying data
>>>> failed. Maybe we can do TRUNCATE pgbench_accounts, pgbench_history
>>>> instead. Thought?
>>>
>>>
>>>
>>> Keep the truncate in the transaction, and truncate both (or all?) tables
>>> together.
>>
>>
>> Attached latest patch incorporated the comments I got so far. Please
>> review it.
>
>
> "g" does not work for me yet when after "f", only the message is slightly
> different, it chokes on another dependency, branches instead of accounts.
>
>   sh> ./pgbench -i -I ctpfg
>   cleaning up...
>   creating tables...
>   set primary keys...
>   set foreign keys...
>   ERROR:  cannot truncate a table referenced in a foreign key constraint
>   DETAIL:  Table "pgbench_history" references "pgbench_branches".
>   HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE
> ... CASCADE.

Sorry, I'd missed something.

> I think that the whole data generation should be in *one* transaction which
> starts with "truncate pgbench_history, pgbench_branches, pgbench_tellers,
> pgbench_accounts;"

Agreed, and fixed.

>
> In passing, I think that the documentation should tell explicitely what the
> default value is (aka "ctgvp").

Agreed.

Attached the latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
> Attached the latest patch. Please review it.

Patch applies and compiles cleanly.

Three very minor points:

Typo added in the documentation: ".Each" -> ". Each".

In "case 8:" there is a very long line which lacks a newline before 
pg_realloc second argument.

I think that the check should silently accept spaces as they are ignored 
by init later, i.e.:
  sh> ./pgbench -i -I "ctg vpf"  invalid custom initialization script command " "  possible commands are: "c", "t",
"g","v", "p", "f"
 

Could just work...

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Sep 5, 2017 at 4:06 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> Attached the latest patch. Please review it.
>
>
> Patch applies and compiles cleanly.
>
> Three very minor points:
>
> Typo added in the documentation: ".Each" -> ". Each".
>
> In "case 8:" there is a very long line which lacks a newline before
> pg_realloc second argument.

Sorry, I don't follow that. You meant I should add a newline before
pg_realloc()? That is,

+                        initialize_cmds =
+                            (char *) pg_realloc(initialize_cmds,
+                                                sizeof(char) * n_cmds + 1);

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
> Sorry, I don't follow that. You meant I should add a newline before
> pg_realloc()? That is,
>
> +                        initialize_cmds =
> +                            (char *) pg_realloc(initialize_cmds,
> +                                                sizeof(char) * n_cmds + 1);

Yes. Or maybe my terminal was doing tricks, because I had the impression 
that both argument where on the same line with many tabs in between, but 
maybe I just misinterpreted the diff file. My apology if it is the case.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Wed, Sep 6, 2017 at 12:11 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> Sorry, I don't follow that. You meant I should add a newline before
>> pg_realloc()? That is,
>>
>> +                        initialize_cmds =
>> +                            (char *) pg_realloc(initialize_cmds,
>> +                                                sizeof(char) * n_cmds +
>> 1);
>
>
> Yes. Or maybe my terminal was doing tricks, because I had the impression
> that both argument where on the same line with many tabs in between, but
> maybe I just misinterpreted the diff file. My apology if it is the case.
>

I understood. It looks ugly in the patch but can be applied properly
by git apply command. Attached latest patch incorporated the comments
I got so far.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Applies, compiles, works for me.

Very very minor comments that I should have noticed before, sorry for this 
additional round trip.

In the help line, move -I just after -i, to put short options in 
alphabetical and decreasing importance order. On this line, also add the 
information about the default, something like:
  -i, --ini...       ....  -I, --custom....=[...]+  (default "ctgvp")                     ...

When/if the pgbench tap test patch get through, the feature should be 
tested there as well. No action needed now.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Wed, Sep 6, 2017 at 4:01 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Applies, compiles, works for me.
>
> Very very minor comments that I should have noticed before, sorry for this
> additional round trip.

Thank you for the dedicated review!

>
> In the help line, move -I just after -i, to put short options in
> alphabetical and decreasing importance order. On this line, also add the
> information about the default, something like:
>
>   -i, --ini...       ....
>   -I, --custom....=[...]+  (default "ctgvp")
>                      ...

Agreed. Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
>> Very very minor comments that I should have noticed before, sorry for this
>> additional round trip.
>
> Thank you for the dedicated review!

I'm someone at times pigheaded, I think in the good sense if it is 
possible, and I like to finish what I start:-)

Patch applies, compiles, works, everything is fine from my point of view.

I switched it to "Ready for Committer".

Again, if the pgbench tap test patch get through, it should be tap tested.

-- 
Fabien.



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Thu, Sep 7, 2017 at 4:15 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>>> Very very minor comments that I should have noticed before, sorry for
>>> this
>>> additional round trip.
>>
>>
>> Thank you for the dedicated review!
>
>
> I'm someone at times pigheaded, I think in the good sense if it is possible,
> and I like to finish what I start:-)
>
> Patch applies, compiles, works, everything is fine from my point of view.
>
> I switched it to "Ready for Committer".

Thanks.

>
> Again, if the pgbench tap test patch get through, it should be tap tested.
>

Thank you for the remainder, I'll add tap tests once the patch got committed.

-- 
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Fri, Sep 8, 2017 at 9:52 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Sep 7, 2017 at 4:15 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>
>>>> Very very minor comments that I should have noticed before, sorry for
>>>> this
>>>> additional round trip.
>>>
>>>
>>> Thank you for the dedicated review!
>>
>>
>> I'm someone at times pigheaded, I think in the good sense if it is possible,
>> and I like to finish what I start:-)
>>
>> Patch applies, compiles, works, everything is fine from my point of view.
>>
>> I switched it to "Ready for Committer".
>
> Thanks.
>
>>
>> Again, if the pgbench tap test patch get through, it should be tap tested.
>>
>
> Thank you for the remainder, I'll add tap tests once the patch got committed.
>

Attached the latest version patch incorporated the tap tests.
Please review it.

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

> Attached the latest version patch incorporated the tap tests.
> Please review it.

Patch applies, compilation & make check ok.

Tests are simple and provide good coverage of new functionalities.

I would suggest to add '--unlogged-tables' so speedup the tests a little.

Comment: "# Custom initialization option, including a space"... ISTM that 
there is no space. Space is tested in the next test because of the v's and 
the --no-vacuum which turned them into space, which is enough.

Regex are just check for the whole output, so putting twice "qr{vacuum}" 
does not check that vacuum appears twice, it checks twice that vacuum 
appears once. I do not think that it is worth trying to check for the v 
repetition, so I suggest to remove one from the first test. Repetition of 
' ' is checked with the second test.

Maybe you could check that the data generation message is there.

-- 
Fabien.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Sep 19, 2017 at 12:41 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>> Attached the latest version patch incorporated the tap tests.
>> Please review it.
>
>
> Patch applies, compilation & make check ok.
>
> Tests are simple and provide good coverage of new functionalities.
>
> I would suggest to add '--unlogged-tables' so speedup the tests a little.
>

Good idea, added.

> Comment: "# Custom initialization option, including a space"... ISTM that
> there is no space. Space is tested in the next test because of the v's and
> the --no-vacuum which turned them into space, which is enough.

You're right, I removed it.

> Regex are just check for the whole output, so putting twice "qr{vacuum}"
> does not check that vacuum appears twice, it checks twice that vacuum
> appears once. I do not think that it is worth trying to check for the v
> repetition, so I suggest to remove one from the first test. Repetition of '
> ' is checked with the second test.

Agreed.

> Maybe you could check that the data generation message is there.

Added the check.

Attached the latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

v14 applies, compiles and works. TAP tests provide good coverage.

ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of 
"(.*)" (memorization) in the data generation message check.

Otherwise all is well for me.

-- 
Fabien.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Wed, Sep 20, 2017 at 3:26 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
> v14 applies, compiles and works. TAP tests provide good coverage.
>
> ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of
> "(.*)" (memorization) in the data generation message check.
>

Thank you, fixed it.

> Otherwise all is well for me.
>

Attached the updated version patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

>> ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of
>> "(.*)" (memorization) in the data generation message check.
>
> Thank you, fixed it.
>
>> Otherwise all is well for me.
>>
>
> Attached the updated version patch.

Applies, compiles, make check & tap test ok, doc is fine.

All is well for me: the feature is useful, code is simple and clean, it is 
easy to invoke, easy to extend as well, which I'm planning to do once it 
gets in.

I switched the patch to "Ready for Committers". No doubt they will have 
their own opinions about it. Let's wait and see.

Thanks,

-- 
Fabien.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Thu, Sep 21, 2017 at 5:23 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>>> ISTM that you probably intended "\(.*\)" (actual parenthesis) instead of
>>> "(.*)" (memorization) in the data generation message check.
>>
>>
>> Thank you, fixed it.
>>
>>> Otherwise all is well for me.
>>>
>>
>> Attached the updated version patch.
>
>
> Applies, compiles, make check & tap test ok, doc is fine.
>
> All is well for me: the feature is useful, code is simple and clean, it is
> easy to invoke, easy to extend as well, which I'm planning to do once it
> gets in.
>
> I switched the patch to "Ready for Committers". No doubt they will have
> their own opinions about it. Let's wait and see.

Thank you for the reviewing this patch!!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Masahiko-san,

>> Attached the updated version patch.
>
> Applies, compiles, make check & tap test ok, doc is fine.
>
> All is well for me: the feature is useful, code is simple and clean, it is 
> easy to invoke, easy to extend as well, which I'm planning to do once it gets 
> in.
>
> I switched the patch to "Ready for Committers". No doubt they will have their 
> own opinions about it. Let's wait and see.

The patch needs a rebase in the documentation because of the xml-ilization 
of the sgml doc.

-- 
Fabien.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Wed, Oct 18, 2017 at 5:32 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Masahiko-san,
>
>>> Attached the updated version patch.
>>
>>
>> Applies, compiles, make check & tap test ok, doc is fine.
>>
>> All is well for me: the feature is useful, code is simple and clean, it is
>> easy to invoke, easy to extend as well, which I'm planning to do once it
>> gets in.
>>
>> I switched the patch to "Ready for Committers". No doubt they will have
>> their own opinions about it. Let's wait and see.
>
>
> The patch needs a rebase in the documentation because of the xml-ilization
> of the sgml doc.
>

Thank you for the notification! Attached rebased patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
>> The patch needs a rebase in the documentation because of the xml-ilization
>> of the sgml doc.
>>
>
> Thank you for the notification! Attached rebased patch.

Ok. The new version works for me.

-- 
Fabien.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Masahiko Sawada <sawada.mshk@gmail.com> writes:
> [ pgbench_custom_initialization_v16.patch ]

I'm starting to review this patch, and I wonder how it is that you
ended up with "c" as the command letter for dropping existing tables.
Seems like "d" for DROP would be much less confusing.  I see that at
one point "d" meant the data load step, but since you've gone with
"g" for "generate data" that conflict is gone.
        regards, tom lane


Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
Hello Tom,

> Masahiko Sawada <sawada.mshk@gmail.com> writes:
>> [ pgbench_custom_initialization_v16.patch ]
>
> I'm starting to review this patch, and I wonder how it is that you
> ended up with "c" as the command letter for dropping existing tables.
> Seems like "d" for DROP would be much less confusing.  I see that at
> one point "d" meant the data load step, but since you've gone with
> "g" for "generate data" that conflict is gone.

Indeed, you are right. As a reviewer, I can recall that there were some 
hesitations, not sure we ended up with the best possible choice.

Note that if "c" is freed by "d" (drop), then it may be worth considering 
that "t" (table) could be replaced by "c" (create).

I'm fine with anything consistent and easy to memorize, really.

-- 
Fabien.


Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Masahiko Sawada <sawada.mshk@gmail.com> writes:
>>> [ pgbench_custom_initialization_v16.patch ]

>> I'm starting to review this patch, and I wonder how it is that you
>> ended up with "c" as the command letter for dropping existing tables.
>> Seems like "d" for DROP would be much less confusing.  I see that at
>> one point "d" meant the data load step, but since you've gone with
>> "g" for "generate data" that conflict is gone.

> Indeed, you are right. As a reviewer, I can recall that there were some 
> hesitations, not sure we ended up with the best possible choice.

OK, will make the appropriate changes.

> Note that if "c" is freed by "d" (drop), then it may be worth considering 
> that "t" (table) could be replaced by "c" (create).

I thought about that, but the argument that 'c' might mean different
sorts of create steps (e.g. create index) seemed reasonable.  I think
we're best off leaving it as 't' in case of future expansion.
        regards, tom lane


Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization

From
Fabien COELHO
Date:
>> Note that if "c" is freed by "d" (drop), then it may be worth considering
>> that "t" (table) could be replaced by "c" (create).
>
> I thought about that, but the argument that 'c' might mean different 
> sorts of create steps (e.g. create index) seemed reasonable. I think 
> we're best off leaving it as 't' in case of future expansion.

Ok. Fine with me.

-- 
Fabien.


Fabien COELHO <coelho@cri.ensmp.fr> writes:
>>> Note that if "c" is freed by "d" (drop), then it may be worth considering
>>> that "t" (table) could be replaced by "c" (create).

>> I thought about that, but the argument that 'c' might mean different 
>> sorts of create steps (e.g. create index) seemed reasonable. I think 
>> we're best off leaving it as 't' in case of future expansion.

> Ok. Fine with me.

Committed that way, with some additional mostly-cosmetic changes:

* I did not much like the "--custom-initialize" option name, as that
conveys no information; arguably any option makes the initialization
"custom" in some way.  Moreover it invites confusion with the use of
"custom" elsewhere in pgbench's docs for custom benchmarking scripts.
After some thought I chose "--init-steps", though if anyone has a better
idea I'm happy to listen.  I would have gone with just "--steps", except
that the -s and -S short forms are both taken already.  (Hm, maybe we
should use --steps and not have a short form?)

* I also tried to make the code use the "step" terminology consistently
internally, since it's already using "command" to refer to commands in
benchmarking scripts.

* I fixed it so that conflicts between the step-selecting options
are resolved after we've scanned all the options.  The submitted
coding would do the wrong thing with, eg, --foreign-keys before -I.

* I changed the "drop" step to just drop all four tables in one
command; that way avoids having to make any assumption about what
foreign keys exist.  (I suppose that constraints leading in different
directions aren't all that likely, but if we're trying to cater to
not-quite-default configurations, we might as well do this.)

* Minor other cosmetic cleanup.
        regards, tom lane


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

From
Masahiko Sawada
Date:
On Tue, Nov 14, 2017 at 6:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fabien COELHO <coelho@cri.ensmp.fr> writes:
>>>> Note that if "c" is freed by "d" (drop), then it may be worth considering
>>>> that "t" (table) could be replaced by "c" (create).
>
>>> I thought about that, but the argument that 'c' might mean different
>>> sorts of create steps (e.g. create index) seemed reasonable. I think
>>> we're best off leaving it as 't' in case of future expansion.
>
>> Ok. Fine with me.
>
> Committed that way, with some additional mostly-cosmetic changes:
>
> * I did not much like the "--custom-initialize" option name, as that
> conveys no information; arguably any option makes the initialization
> "custom" in some way.  Moreover it invites confusion with the use of
> "custom" elsewhere in pgbench's docs for custom benchmarking scripts.
> After some thought I chose "--init-steps", though if anyone has a better
> idea I'm happy to listen.  I would have gone with just "--steps", except
> that the -s and -S short forms are both taken already.  (Hm, maybe we
> should use --steps and not have a short form?)
>
> * I also tried to make the code use the "step" terminology consistently
> internally, since it's already using "command" to refer to commands in
> benchmarking scripts.
>
> * I fixed it so that conflicts between the step-selecting options
> are resolved after we've scanned all the options.  The submitted
> coding would do the wrong thing with, eg, --foreign-keys before -I.
>
> * I changed the "drop" step to just drop all four tables in one
> command; that way avoids having to make any assumption about what
> foreign keys exist.  (I suppose that constraints leading in different
> directions aren't all that likely, but if we're trying to cater to
> not-quite-default configurations, we might as well do this.)
>
> * Minor other cosmetic cleanup.
>

Thank you for reviewing and committing the patch. I couldn't respond
promptly because slept but I agree with the changes.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center