RE: POC: postgres_fdw insert batching - Mailing list pgsql-hackers
From | tsunakawa.takay@fujitsu.com |
---|---|
Subject | RE: POC: postgres_fdw insert batching |
Date | |
Msg-id | TYAPR01MB2990666A2BAC075F3A3FD1CEFEF50@TYAPR01MB2990.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: POC: postgres_fdw insert batching (Craig Ringer <craig.ringer@enterprisedb.com>) |
List | pgsql-hackers |
From: Craig Ringer <craig.ringer@enterprisedb.com> > It was not my intention to hold this patch up or greatly expand its > scope. I'll spend some time testing it out and have a closer read soon > to see if I can help progress it. Thank you, I'm relieved to hear that. Last weekend, I was scared of a possible mood that's something like "We won't acceptthe insert speedup patch for foreign tables unless you take full advantage of pipelining and achieve maximum conceivablespeed!" > I thought I gave it at the time, and a demo program. IIRC it was just > doing small multi row inserts or single row inserts. Latency would've > been a couple of hundred ms probably, I think I did something like > running on my laptop (Australia, ADSL) to a server on AWS US or EU. a couple of hundred ms, so that would be dominant in each prepare-send-execute-receive, possibly even for batch insert withhundreds of rows in each batch. Then, the synchronous batch insert of the current patch may achieve a few hundreds timesspeedup compared to a single row inserts when the batch size is hundreds or more. > > I'd like to check other DBMSs and your rich references for the FDW interface. > (My first intuition is that many major DBMSs might not have client C APIs that > can be used to implement an async pipelining FDW interface. > > Likely correct for C APIs of other traditional DBMSes. I'd be less > sure about newer non SQL ones, especially cloud oriented. For example > DynamoDB supports at least async requests in the Java client [3] and > C++ client [4]; it's not immediately clear if requests can be > pipelined, but the API suggests they can. I've checked ODBC, MySQL, Microsoft Synapse Analytics, Redshift, and BigQuery, guessing that the data warehouse may haveasynchronous/pipelining API that enables efficient data integration/migration. But none of them had one. (I seem tohave spent too long and am a bit tired... but it was a bit fun as well.) They all support INSERT with multiple recordsin its VALUES clause. So, it will be useful to provide a synchronous batch insert FDW API. I guess Oracle's OCIhas an asynchronous API, but I didn't check it. As an aside, MySQL 8.0.16 added support for asynchronous execution in its C API, but it allows only one active SQL statementin each connection. Likewise, although the ODBC standard defines asynchronous execution (SQLSetStmtAttr(SQL_ASYNC_ENABLE)and SQLCompleteAsync), SQL Server and Synapse Analytics only allows only one active statementper connection. psqlODBC doesn't support asynchronous execution. > Most things with a REST-like API can do a fair bit of concurrency > though. Multiple async nonblocking HTTP connections can be serviced at > once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0 > streams [2]. This is relevant for any REST-like API. I'm not sure if this is related, Google deprecated Batch HTTP API [1]. [1] https://cloud.google.com/bigquery/batch Regards Takayuki Tsunakawa
pgsql-hackers by date: