Re: postgres_fdw: Use COPY to speed up batch inserts - Mailing list pgsql-hackers
From | Matheus Alcantara |
---|---|
Subject | Re: postgres_fdw: Use COPY to speed up batch inserts |
Date | |
Msg-id | CAFY6G8ePwjT8GiJX1AK5FDMhfq-sOnny6optgTPg98HQw7oJ0g@mail.gmail.com Whole thread Raw |
In response to | Re: postgres_fdw: Use COPY to speed up batch inserts (Tomas Vondra <tomas@vondra.me>) |
List | pgsql-hackers |
Thanks for the review! On Thu Oct 16, 2025 at 5:39 PM -03, Tomas Vondra wrote: > Thanks for the patch. Please add it to the next committfest (PG19-3) at > > https://commitfest.postgresql.org/ > > so that we don't lose track of the patch. > Here it is: https://commitfest.postgresql.org/patch/6137/ > Thanks. The code looks sensible in general, I think. I'll have a couple > minor comments. It'd be good to also update the documentation, and add > some tests to postgres_fdw.sql, to exercise this new code. > > The sgml docs (doc/src/sgml/postgres-fdw.sgml) mention batch_size, and > explain how it's related to the number of parameters in the INSERT > state. Which does not matter when using COPY under the hood, so this > should be amended/clarified in some way. It doesn't need to be > super-detailed, though. > I'll work on this and intend to have something on the next version. > A couple minor comments about the code: > > 1) buildCopySql > > - Does this really need the "(FORMAT TEXT, DELIMITER ',')" part? AFAIK > no, if we use the default copy format in convert_slot_to_copy_text. > You're right. I've removed the (FORMAT TEXT, DELIMITER ',') part. > - Shouldn't we cache the COPY SQL, similarly to how we keep insert SQL? > Instead of rebuilding it over and over for every batch. > I tried to reuse the fmstate->query field to cache the COPY sql but running the postgres_fdw.sql regress test shows that this may not work. When we are running a user supplied COPY command on a foreign table the CopyMultiInsertBufferFlush() call ri_FdwRoutine->ExecForeignBatchInsert which may pass different values for numSlots based on the number of slots already sent to the foreign server, and eventually it may pass numSlots as 1 which will not use the COPY under the hood to send to the foreign server and if we cache the COPY command into the fmstate->query this will not work because the normal INSERT path on execute_foreign_modify uses the fmstate->query to build a prepared statement to send to the foreign server. So basically what I'm trying to say is that when the server is executing a COPY into a foreign it may use the COPY command or INSERT command to send the data to the foreign server. That being said, I decided to create a new copy_query field on PgFdwModifyState to cache only COPY commands. Please let me know if my understanding is wrong or if we could have a better approach here. > 2) convert_slot_to_copy_text > > - It's probably better to not hard-code the delimiters etc. > Since now we are using the default copy format, it's safe to hard code the default delimiter which is \t? Or should we still make this parameterizable or something else? > - I wonder if the formatting needs to do something more like what > copyto.c does (through CopyToTextOneRow and CopyAttributeOutText). Maybe > not, not sure. > I'm still checking this, I think that is not needed but I want to do some more tests to make sure. > 3) execute_foreign_modify > > - I think the new block of code is a bit confusing. It essentially does > something similar to the original code, but not entirely. I suggest we > move it to a new function, and call that from execute_foreign_modify. > Fixed > - I agree it's probably better to do COPYBUFSIZ, or something like that > to send the data in smaller (but not tiny) chunks. > Fixed. I've declared the default chunk size as 8192 (which is the same used on psql/copy.c), let me know if we should use another value or perhaps make this configurable. >> Lastly, I don't know if we should change the EXPLAIN(ANALYZE, VERBOSE) >> output for batch inserts that use the COPY to mention that we are >> sending the COPY command to the remote server. I guess so? >> > > Good point. We definitely should not show SQL for INSERT, when we're > actually running a COPY. > This seems a bit tricky to implement. The COPY is used based on the number of slots into the TupleTableSlot array that is used for batch insert. The numSlots that execute_foreign_modify() receive is coming from ResultRelInfo->ri_NumSlots during ExecInsert(). We don't have this information during EXPLAIN that is handled by postgresExplainForeignModify(), we only have the ResultRelInfo->ri_BatchSize at this stage. The current idea is to use the COPY command if the number of slots is > 1 so I'm wondering if we should use another mechanism to enable the COPY usage, for example, we could just use if the batch_size is configured to a number greater than X, but what if the INSERT statement is only inserting a single row, should we still use the COPY command to ingest a single row into the foreign table? Any thoughts? -- Matheus Alcantara
Attachment
pgsql-hackers by date: