RE: POC: postgres_fdw insert batching - Mailing list pgsql-hackers
From: tsunakawa.takay@fujitsu.com
Subject: RE: POC: postgres_fdw insert batching
Date:
Msg-id: TYAPR01MB2990FB16E86A4E51BB3E700AFEFB0@TYAPR01MB2990.jpnprd01.prod.outlook.com
In response to: Re: POC: postgres_fdw insert batching (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: POC: postgres_fdw insert batching
List: pgsql-hackers
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
> 1) We're calling it "batch_size" but the API function is named
> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
> ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether the term batch or bulk is better. Oracle uses "bulk insert" and "bulk fetch", as in FETCH cur BULK COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses batch, as in "batch updates" and its API method names (addBatch, executeBatch). But "batch" seems to be the better and more common term, judging from its etymology and the following Stack Exchange page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think GetInsertBatchSize may be better. That is, this API deals with multiple records in a single INSERT statement. Your GetModifyBatchSize would be reserved for statement batching, once libpq supports batching/pipelining to execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC batch updates. What do you think?

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements
--------------------------------------------------
Statement stmt = con.createStatement();
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");
// submit a batch of update commands for execution
int[] updateCounts = stmt.executeBatch();
--------------------------------------------------

> 2) Do we have to lookup the batch_size in create_foreign_modify (in
> server/table options)? I'd have expected to look it up while planning
> the modify and then pass it through the list, just like the other
> FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during planning. Unfortunately, it's also called from BeginForeignInsert(), but the other stuff passed to create_foreign_modify(), including the query string, is constructed there.

> 3) That reminds me - should we show the batching info on EXPLAIN? That
> seems like a fairly interesting thing to show to the user. Perhaps
> showing the average batch size would also be useful? Or maybe not, we
> create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size is not for EXPLAIN, because its value doesn't change dynamically based on the planning or system state, unlike shared buffers and parallel workers. OTOH, I sometimes want to see which configuration parameter values the user set, such as work_mem, enable_*, and shared_buffers, together with the query plan (in EXPLAIN and auto_explain). For example, it'd be nice if EXPLAIN (parameters on) could do that. Some relevant FDW-related parameters could be included in that output.

> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
> over for every tuple. I don't know if that has measurable impact, but it
> seems a bit excessive IMO. I don't think we should support the batch
> size changing during execution (seems tricky).

Don't worry about this, either. GetMaxBulkInsertTuples() just returns a value that was already saved in a struct in create_foreign_modify().

Regards
Takayuki Tsunakawa
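
P.S. To make the contrast concrete: the row batching that GetInsertBatchSize would control is a single INSERT statement carrying multiple rows, as opposed to the multiple statements submitted together in CODE EXAMPLE 14-1 above. A minimal JDBC sketch of that multi-row form follows; it reuses the employees table from the example above, the extra names and values are made up, and the same Connection con is assumed. This is only an illustration of the statement shape, not the FDW code itself.

--------------------------------------------------
// One INSERT statement, three rows: the server parses and executes a
// single statement, so the per-statement round trip is paid once.
// The number of rows per statement is what a batch size setting governs.
PreparedStatement ps = con.prepareStatement(
    "INSERT INTO employees VALUES (?, ?), (?, ?), (?, ?)");
ps.setInt(1, 1000); ps.setString(2, "Joe Jones");
ps.setInt(3, 1001); ps.setString(4, "Mary Major");
ps.setInt(5, 1002); ps.setString(6, "Sam Smith");
int inserted = ps.executeUpdate();  // returns 3
--------------------------------------------------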