Home > mailing lists

Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: POC: postgres_fdw insert batching
Date	November 24, 2020 04:44:45
Msg-id	43b5d643-da69-70a5-16d2-d04d0851b723@enterprisedb.com Whole thread Raw
In response to	RE: POC: postgres_fdw insert batching ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses	RE: POC: postgres_fdw insert batching
List	pgsql-hackers

Tree view

On 11/23/20 3:17 AM, tsunakawa.takay@fujitsu.com wrote:
> From: Tomas Vondra <tomas.vondra@enterprisedb.com>
>> I don't think this is usable in practice, because a single session
>> may be using multiple FDW servers, with different implementations,
>> latency to the data nodes, etc. It's unlikely a single GUC value
>> will be suitable for all of them.
> 
> That makes sense.  The row size varies from table to table, so the
> user may want to tune this option to reduce memory consumption.
> 
> I think the attached patch has reflected all your comments.  I hope
> this will pass..
> 

Thanks - I didn't have time for a thorough review at the moment, so I
only skimmed through the diff and did a couple  very simple tests. And I
think overall it looks quite nice.

A couple minor comments/questions:

1) We're calling it "batch_size" but the API function is named
postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
to postgresGetModifyBatchSize()? That has the advantage it'd work if we
ever add support for batching to UPDATE/DELETE.

2) Do we have to lookup the batch_size in create_foreign_modify (in
server/table options)? I'd have expected to look it up while planning
the modify and then pass it through the list, just like the other
FdwModifyPrivateIndex stuff. But maybe that's not possible.

3) That reminds me - should we show the batching info on EXPLAIN? That
seems like a fairly interesting thing to show to the user. Perhaps
showing the average batch size would also be useful? Or maybe not, we
create the batches as large as possible, with the last one smaller.

4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
over for every tuple. I don't know it that has measurable impact, but it
seems a bit excessive IMO. I don't think we should support the batch
size changing during execution (seems tricky).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Amit Kapila
Date: 24 November 2020, 04:42:26
Subject: Re: deferred primary key and logical replication

From: Stephen Frost
Date: 24 November 2020, 04:54:41
Subject: Re: Online verification of checksums

Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

Previous

Next