Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers
From | David Fetter |
---|---|
Subject | Re: POC: postgres_fdw insert batching |
Date | |
Msg-id | 20201130082902.GD9795@fetter.org |
In response to | RE: POC: postgres_fdw insert batching ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
Responses | RE: POC: postgres_fdw insert batching |
List | pgsql-hackers |
On Wed, Nov 25, 2020 at 05:04:36AM +0000, tsunakawa.takay@fujitsu.com wrote:
> From: Tomas Vondra <tomas.vondra@enterprisedb.com>
> > On 11/24/20 9:45 AM, tsunakawa.takay@fujitsu.com wrote:
> > > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> > GetInsertBatchSize may be better. That is, this API deals with multiple
> > records in a single INSERT statement. Your GetModifyBatchSize will be
> > reserved for statement batching when libpq has supported batch/pipelining to
> > execute multiple INSERT/UPDATE/DELETE statements, as in the following
> > JDBC batch updates. What do you think?
> > >
> > I don't know. I was really only thinking about batching in the context
> > of a single DML command, not about batching of multiple commands at the
> > protocol level. IMHO it's far more likely we'll add support for batching
> > for DELETE/UPDATE than libpq pipelining, which seems rather different
> > from how the FDW API works. Which is why I was suggesting to use a name
> > that would work for all DML commands, not just for inserts.
>
> Right, I can't imagine now how the interaction among the client, server core and FDWs would be regarding the statement
> batching. So I'll take your suggested name.
>
> > Not sure, but I'd guess knowing whether batching is used would be
> > useful. We only print the single-row SQL query, which kinda gives the
> > impression that there's no batching.
>
> Added in postgres_fdw like "Remote SQL" when EXPLAIN VERBOSE is run.
>
> > > Don't worry about this, too. GetMaxBulkInsertTuples() just returns a value
> > that was already saved in a struct in create_foreign_modify().
> > >
> >
> > Well, I do worry for two reasons.
> > Firstly, the fact that in postgres_fdw the call is cheap does not mean
> > it'll be like that in every other FDW. Presumably, the other FDWs might
> > cache it in the struct and do the same thing, of course.
> >
> > But the fact that we're calling it over and over for each row kinda
> > seems like we allow the value to change during execution, but I very
> > much doubt the code is expecting that. I haven't tried, but assume the
> > function first returns 10 and then 100. ISTM the code will allocate
> > ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> > That can't end well. Sure, we can claim it's a bug in the FDW extension,
> > but it's also due to the API design.
>
> You worried about other FDWs than postgres_fdw. That's reasonable. I insisted in other threads that PG developers care
> only about postgres_fdw, not other FDWs, when designing the FDW interface, but I myself made the same mistake. I made
> changes so that the executor calls GetModifyBatchSize() once per relation per statement.

Please pardon me for barging in late in this discussion, but if we're
going to be using a bulk API here, wouldn't it make more sense to use
COPY, except where RETURNING is specified, in place of INSERT?

Best,
David.

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
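[Editor's note: to make the API point above concrete, here is a minimal sketch, not taken from the patch, of the kind of callback being discussed. The struct and function names (MyFdwModifyState, my_fdw_get_modify_batch_size) are hypothetical. The idea is that the FDW decides its batch size once at modify setup time and afterwards only returns the cached value, which is what makes a call-once-per-relation-per-statement contract safe.]

```c
#include "postgres.h"

#include "nodes/execnodes.h"

/*
 * Hypothetical per-relation modify state an FDW might keep; a real FDW's
 * struct would have many more fields (connection, prepared statement, ...).
 */
typedef struct MyFdwModifyState
{
	int			batch_size;		/* decided once, during modify setup */
} MyFdwModifyState;

/*
 * Sketch of the batch-size callback being discussed.  With the change
 * Tsunakawa-san describes, the executor calls this once per target
 * relation per statement, so the FDW simply hands back the value it
 * cached at setup time.  If it were called per row and returned a
 * different value each time, the ri_Slots array sized from the first
 * answer could overflow -- exactly the hazard Tomas points out.
 */
static int
my_fdw_get_modify_batch_size(ResultRelInfo *rinfo)
{
	MyFdwModifyState *fmstate = (MyFdwModifyState *) rinfo->ri_FdwState;

	return fmstate->batch_size;
}
```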
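[Editor's note: for the COPY question, a rough libpq-level sketch of what streaming a batch through COPY, instead of building a multi-row INSERT, could look like on the remote connection. The table, columns, and helper function are made up for illustration; real postgres_fdw code would build the rows from tuple slots rather than preformatted strings, and COPY has no RETURNING, hence David's caveat.]

```c
#include <stdio.h>
#include <string.h>

#include <libpq-fe.h>

/*
 * Send a batch of rows to a (hypothetical) remote table via COPY.
 * Each element of rows[] is one tab-separated, newline-terminated line
 * in COPY text format.
 */
static void
bulk_insert_via_copy(PGconn *conn, const char *const *rows, int nrows)
{
	PGresult   *res;
	int			i;

	res = PQexec(conn, "COPY target_tbl (a, b) FROM STDIN");
	if (PQresultStatus(res) != PGRES_COPY_IN)
	{
		fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
		PQclear(res);
		return;
	}
	PQclear(res);

	/* Stream the batch; one PQputCopyData call per row is enough here. */
	for (i = 0; i < nrows; i++)
		PQputCopyData(conn, rows[i], (int) strlen(rows[i]));

	/* Finish the COPY and collect its command result. */
	PQputCopyEnd(conn, NULL);
	res = PQgetResult(conn);
	if (PQresultStatus(res) != PGRES_COMMAND_OK)
		fprintf(stderr, "COPY end failed: %s", PQerrorMessage(conn));
	PQclear(res);
}
```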