Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
| From | Matheus Alcantara |
|---|---|
| Subject | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Date | |
| Msg-id | DDP9DDKVIYQ0.2LTDOCAWGG9M6@gmail.com Whole thread Raw |
| In response to | Re: postgres_fdw: Use COPY to speed up batch inserts (Matheus Alcantara <matheusssilv97@gmail.com>) |
| Responses |
Re: postgres_fdw: Use COPY to speed up batch inserts
|
| List | pgsql-hackers |
On Tue Oct 21, 2025 at 11:25 AM -03, Matheus Alcantara wrote: >>> Lastly, I don't know if we should change the EXPLAIN(ANALYZE, VERBOSE) >>> output for batch inserts that use the COPY to mention that we are >>> sending the COPY command to the remote server. I guess so? >>> >> >> Good point. We definitely should not show SQL for INSERT, when we're >> actually running a COPY. >> > This seems a bit tricky to implement. The COPY is used based on the > number of slots into the TupleTableSlot array that is used for batch > insert. The numSlots that execute_foreign_modify() receive is coming > from ResultRelInfo->ri_NumSlots during ExecInsert(). We don't have this > information during EXPLAIN that is handled by > postgresExplainForeignModify(), we only have the > ResultRelInfo->ri_BatchSize at this stage. The current idea is to use > the COPY command if the number of slots is > 1 so I'm wondering if we > should use another mechanism to enable the COPY usage, for example, we > could just use if the batch_size is configured to a number greater than > X, but what if the INSERT statement is only inserting a single row, > should we still use the COPY command to ingest a single row into the > foreign table? Any thoughts? > Thinking more about this I realize that when we are deparsing the remote SQL to be sent to the foreign server at the planner phase (via postgresPlanForeignModify()) we don't have the batch_size and number of rows information, so currently we can not know at the plan time if the COPY usage for a batch insert is visible or not because IIUC these information are only visible at query runtime. One way to make it possible is that we could simply use the PgFdwModifyState->copy_data during postgresExplainForeignModify() if it's not null. Since we will only have this information during query execution the drawback of this approach is that we would only show the COPY as a Remote SQL on during EXPLAIN(ANALYZE). Please see the attached v3 version that implements this idea. > I tried to reuse the fmstate->query field to cache the COPY sql but > running the postgres_fdw.sql regress test shows that this may not > work. When we are running a user supplied COPY command on a foreign > table the CopyMultiInsertBufferFlush() call > ri_FdwRoutine->ExecForeignBatchInsert which may pass different values > for numSlots based on the number of slots already sent to the foreign > server, and eventually it may pass numSlots as 1 which will not use the > COPY under the hood to send to the foreign server and if we cache the > COPY command into the fmstate->query this will not work because the > normal INSERT path on execute_foreign_modify uses the fmstate->query to > build a prepared statement to send to the foreign server. So basically > what I'm trying to say is that when the server is executing a COPY into > a foreign it may use the COPY command or INSERT command to send the data > to the foreign server. That being said, I decided to create a new > copy_query field on PgFdwModifyState to cache only COPY commands. Please > let me know if my understanding is wrong or if we could have a better > approach here. > Based on the information that I've mention above I think that we need some way to not mix INSERT with COPY commands when executing a COPY in a foreign table supplied by the user. Or we should disable the COPY under the hood and always fallback to INSERT or enable the COPY to use when the *numSlots is 1, so in case of an EXPLAIN(ANALYZE) output we can show the Remote SQL correctly. Is that make sense? I'm still not sure if the trigger to use the COPY command for batch insert should be *numSlots > 1 or something else. I'm open for better ideas. Thoughts? -- Matheus Alcantara
Attachment
pgsql-hackers by date: