Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: POC: postgres_fdw insert batching
Date
Msg-id 43b5d643-da69-70a5-16d2-d04d0851b723@enterprisedb.com
Whole thread Raw
In response to RE: POC: postgres_fdw insert batching  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: POC: postgres_fdw insert batching  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On 11/23/20 3:17 AM, tsunakawa.takay@fujitsu.com wrote:
> From: Tomas Vondra <tomas.vondra@enterprisedb.com>
>> I don't think this is usable in practice, because a single session
>> may be using multiple FDW servers, with different implementations,
>> latency to the data nodes, etc. It's unlikely a single GUC value
>> will be suitable for all of them.
> 
> That makes sense.  The row size varies from table to table, so the
> user may want to tune this option to reduce memory consumption.
> 
> I think the attached patch has reflected all your comments.  I hope
> this will pass..
> 

Thanks - I didn't have time for a thorough review at the moment, so I
only skimmed through the diff and did a couple  very simple tests. And I
think overall it looks quite nice.

A couple minor comments/questions:

1) We're calling it "batch_size" but the API function is named
postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
to postgresGetModifyBatchSize()? That has the advantage it'd work if we
ever add support for batching to UPDATE/DELETE.

2) Do we have to lookup the batch_size in create_foreign_modify (in
server/table options)? I'd have expected to look it up while planning
the modify and then pass it through the list, just like the other
FdwModifyPrivateIndex stuff. But maybe that's not possible.

3) That reminds me - should we show the batching info on EXPLAIN? That
seems like a fairly interesting thing to show to the user. Perhaps
showing the average batch size would also be useful? Or maybe not, we
create the batches as large as possible, with the last one smaller.

4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
over for every tuple. I don't know it that has measurable impact, but it
seems a bit excessive IMO. I don't think we should support the batch
size changing during execution (seems tricky).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: deferred primary key and logical replication
Next
From: Stephen Frost
Date:
Subject: Re: Online verification of checksums