RE: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: POC: postgres_fdw insert batching
Date
Msg-id TYAPR01MB2990FB16E86A4E51BB3E700AFEFB0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: POC: postgres_fdw insert batching  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: POC: postgres_fdw insert batching
List pgsql-hackers
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
> 1) We're calling it "batch_size" but the API function is named
> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
> ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds whether the term batch or bulk is better.  Because Oracle uses "bulk insert" and "bulk
fetch",like in FETCH cur BULK COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses batch as in "batch
updates"and its API method names (addBatch, executeBatch).
 

But it seems better or common to use batch according to the etymology and the following Stack Overflow page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think GetInsertBatchSize may be better.  That is, this API
dealswith multiple records in a single INSERT statement.  Your GetModifyBatchSize will be reserved for statement
batchingwhen libpq has supported batch/pipelining to execute multiple INSERT/UPDATE/DELETE statements, as in the
followingJDBC batch updates.  What do you think?
 

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
--------------------------------------------------
Statement stmt = con.createStatement(); 
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 

// submit a batch of update commands for execution 
int[] updateCounts = stmt.executeBatch(); 
--------------------------------------------------


> 2) Do we have to lookup the batch_size in create_foreign_modify (in
> server/table options)? I'd have expected to look it up while planning
> the modify and then pass it through the list, just like the other
> FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during planning.  Unfortunately, it's also
calledfrom BeginForeignInsert(), but other stuff passed to create_foreign_modify() including the query string is
constructedthere.
 


> 3) That reminds me - should we show the batching info on EXPLAIN? That
> seems like a fairly interesting thing to show to the user. Perhaps
> showing the average batch size would also be useful? Or maybe not, we
> create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change dynamically based on the planning or system
stateunlike shared buffers and parallel workers.  OTOH, I sometimes want to see what configuration parameter values the
userset, such as work_mem, enable_*, and shared_buffers, together with the query plan (EXPLAIN and auto_explain).  For
example,it'd be nice if EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters could be included
inthat output.
 

> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
> over for every tuple. I don't know it that has measurable impact, but it
> seems a bit excessive IMO. I don't think we should support the batch
> size changing during execution (seems tricky).

Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a value that was already saved in a struct in
create_foreign_modify().


Regards
Takayuki Tsunakawa


pgsql-hackers by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Migration Oracle multitenant database to PostgreSQL ?
Next
From: Heikki Linnakangas
Date:
Subject: Re: Deduplicate aggregates and transition functions in planner