Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: POC: postgres_fdw insert batching
Date
Msg-id 20201008221421.foasnh5of66rupnm@development
Whole thread Raw
In response to RE: POC: postgres_fdw insert batching  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: POC: postgres_fdw insert batching
List pgsql-hackers
On Thu, Oct 08, 2020 at 02:40:10AM +0000, tsunakawa.takay@fujitsu.com wrote:
>Hello Tomas san,
>
>
>Thank you for picking up this.  I'm interested in this topic, too.  (As an aside, we'd like to submit a bulk insert
patchfor ECPG in the near future.)
 
>
>As others referred, Andrey-san's fast COPY to foreign partitions is also promising.  But I think your bulk INSERT is a
separatefeature and offers COPY cannot do -- data transformation during loading with INSERT SELECT and CREATE TABLE AS
SELECT.
>
>Is there anything that makes you worry and stops development?  Could I give it a try to implement this (I'm not sure I
can,sorry.  I'm worried if we can change the executor's call chain easily.)
 
>

It's primarily a matter of having too much other stuff on my plate, thus
not having time to work on this feature. I was not too worried about any
particular issue, but I wanted some feedback before spending more time
on extending the API.

I'm not sure when I'll have time to work on this again, so if you are
interested and willing to work on it, please go ahead. I'll gladly do
reviews and help you with it.

>
>> 1) Extend the FDW API?
>
>Yes, I think, because FDWs for other DBMSs will benefit from this.  (But it's questionable whether we want users to
transferdata in Postgres database to other DBMSs...)
 
>

I think transferring data to other databases is fine - interoperability
is a big advantage for users, I don't see it as something threatening
the PostgreSQL project. I doubt this would make it more likely for users
to migrate from PostgreSQL - there are many ways to do that already.


>MySQL and SQL Server has the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2),
... Oracle doesn't have this syntax, but it can use CTE as follows:
 
>
>  INSERT INTO table
>  WITH t AS (
>    SELECT record1 FROM DUAL UNION ALL
>    SELECT record2 FROM DUAL UNION ALL
>    ...
>  )
>  SELECT * FROM t;
>
>And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ...
>

True. In some cases INSERT may be replaced by COPY, but it has various
other features too.

>The API would simply be:
>
>TupleTableSlot **
>ExecForeignMultiInsert(EState *estate,
>                  ResultRelInfo *rinfo,
>                  TupleTableSlot **slot,
>                  TupleTableSlot **planSlot,
>                  int numSlots);
>
>

+1, seems quite reasonable

>> 2) What about the insert results?
>
>I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely.
OtherFDWs may be able to do it, so the API can be like above.
 
>

Yeah. I think handling complete failure should not be very difficult,
but there are cases that worry me more. For example, what if there's a
before trigger (on the remote db) that "skips" inserting some of the
rows by returning NULL?

>For the same reason, support for RETURNING clause will vary from DBMS to DBMS.
>

Yeah. I wonder if the FDW needs to indicate which features are supported
by the ExecForeignMultiInsert, e.g. by adding a function that decides
whether batch insert is supported (it might also do that internally by
calling ExecForeignInsert, of course).

>
>> 3) What about the other DML operations (DELETE/UPDATE)?
>
>I don't think they are necessary for the time being.  If we want them, they will be implemented using the libpq
batch/pipeliningas Andres-san said.
 
>

I agree.

>
>> 3) Should we do batching for COPY insteads?
>
>I'm thinking of issuing INSERT with multiple records as your patch does, because:
>
>* When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY.
>
>* COPY doesn't invoke rules unlike INSERT.  (I don't think the rule is a feature what users care about, though.)
Also,I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY.
 
>

I agree.



regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] Custom compression methods
Next
From: John Naylor
Date:
Subject: Re: speed up unicode normalization quick check