Home > mailing lists

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization
Date	August 3, 2017 12:39:52
Msg-id	alpine.DEB.2.20.1708031038020.19721@lancre Whole thread
In response to	Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization
List	pgsql-hackers

Tree view

Hello,

> My motivation of this proposal is same as what Robert has. I
> understand that ad-hoc option can solve only the part of big problem
> and it could be cause of mess. However It seems me that the script
> especially for table initialization will not be flexible than we
> expected. I mean, even if we provide some meta commands for table
> initialization or data loading, these meta commands work for only
> pgbench tables (i.g., pgbench_accounts, pgbench_branches and so on).
> If we want to create other tables and load data to them as we want we
> can do that using psql -f. So an alternative ways is having a flexible
> style option for example --custom-initialize = { [load, create_pkey,
> create_fkey, vacuum], ... }. That would solve this in a better way.

Personnaly, I could be fine with a limited number of long options to 
adjust pgbench initialization to various needs, eg --use-hash-index, 
--skip-whetever-index, etc.

The flexible --custom-init idea outlined above looks nice as well.


As for a more generic solution, the easy part are the "CREATE" stuff and 
the transaction script stuff (existing pgbench scripts).

For the CREATE stuff, the script language is SQL, the command to use it is 
"psql"...

The real and hard part is to fill tables with meaningful pseudo-random 
test data which do not violate constraints for any non trivial schema 
involving foreign keys and various unique constraints.

The solution for this is SQL for trivial cases, think of:
  "INSERT INTO Foo() SELECT ... FROM generate_series(...);"

For instance the pgbench initialization is really close to:
 psql -Dscale=10 <<EOF   CREATE TABLE ... ;   INSERT INTO pgbench_account(...)    SELECT ... FROM generate_series(1,
100000* :scale) AS i;   INSERT ...   CREATE INDEX ...;   VACUUM FULL ANALYZE; EOF
 

And all existing options could probably be implemented easilly with the 
recently added conditional (\if).

So my 0.02€ is that if something is to be done, I would suggest to turn 
the creation and initialization stuff into a standard "psql" script that 
could be called from pgbench instead of integrating much more ad-hoc stuff 
into pgbench.

Note that non trivial schema initialization requires more general 
programming, so I do not believe in doing a lot at pgbench or psql levels. 
The best I could come with is a data generator which takes as input the 
schema with added directives on how to generate the various attributes 
(tool named "datafiller", that some people use:-).

-- 
Fabien.

pgsql-hackers by date:

From: Michael Paquier
Date: 03 August 2017, 12:37:32
Subject: Re: [HACKERS] Clarification in pg10's pgupgrade.html step 10(upgrading standby servers)

From: Etsuro Fujita
Date: 03 August 2017, 12:55:21
Subject: Re: [HACKERS] Update comments in nodeModifyTable.c

Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization - Mailing list pgsql-hackers

Previous

Next