Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] pgbench: Skipping the creating primary keys afterinitialization
Date
Msg-id alpine.DEB.2.20.1708031038020.19721@lancre
Whole thread Raw
In response to Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hello,

> My motivation of this proposal is same as what Robert has. I
> understand that ad-hoc option can solve only the part of big problem
> and it could be cause of mess. However It seems me that the script
> especially for table initialization will not be flexible than we
> expected. I mean, even if we provide some meta commands for table
> initialization or data loading, these meta commands work for only
> pgbench tables (i.g., pgbench_accounts, pgbench_branches and so on).
> If we want to create other tables and load data to them as we want we
> can do that using psql -f. So an alternative ways is having a flexible
> style option for example --custom-initialize = { [load, create_pkey,
> create_fkey, vacuum], ... }. That would solve this in a better way.

Personnaly, I could be fine with a limited number of long options to 
adjust pgbench initialization to various needs, eg --use-hash-index, 
--skip-whetever-index, etc.

The flexible --custom-init idea outlined above looks nice as well.


As for a more generic solution, the easy part are the "CREATE" stuff and 
the transaction script stuff (existing pgbench scripts).

For the CREATE stuff, the script language is SQL, the command to use it is 
"psql"...

The real and hard part is to fill tables with meaningful pseudo-random 
test data which do not violate constraints for any non trivial schema 
involving foreign keys and various unique constraints.

The solution for this is SQL for trivial cases, think of:
  "INSERT INTO Foo() SELECT ... FROM generate_series(...);"

For instance the pgbench initialization is really close to:
 psql -Dscale=10 <<EOF   CREATE TABLE ... ;   INSERT INTO pgbench_account(...)    SELECT ... FROM generate_series(1,
100000* :scale) AS i;   INSERT ...   CREATE INDEX ...;   VACUUM FULL ANALYZE; EOF
 

And all existing options could probably be implemented easilly with the 
recently added conditional (\if).

So my 0.02€ is that if something is to be done, I would suggest to turn 
the creation and initialization stuff into a standard "psql" script that 
could be called from pgbench instead of integrating much more ad-hoc stuff 
into pgbench.

Note that non trivial schema initialization requires more general 
programming, so I do not believe in doing a lot at pgbench or psql levels. 
The best I could come with is a data generator which takes as input the 
schema with added directives on how to generate the various attributes 
(tool named "datafiller", that some people use:-).

-- 
Fabien.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Clarification in pg10's pgupgrade.html step 10(upgrading standby servers)
Next
From: Etsuro Fujita
Date:
Subject: Re: [HACKERS] Update comments in nodeModifyTable.c