RE: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: POC: postgres_fdw insert batching
Date
Msg-id TYAPR01MB2990666A2BAC075F3A3FD1CEFEF50@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: POC: postgres_fdw insert batching  (Craig Ringer <craig.ringer@enterprisedb.com>)
List pgsql-hackers
From: Craig Ringer <craig.ringer@enterprisedb.com>
> It was not my intention to hold this patch up or greatly expand its
> scope. I'll spend some time testing it out and have a closer read soon
> to see if I can help progress it.

Thank you, I'm relieved to hear that.  Last weekend, I was scared of a possible mood that's something like "We won't
acceptthe insert speedup patch for foreign tables unless you take full advantage of pipelining and achieve maximum
conceivablespeed!"
 


> I thought I gave it at the time, and a demo program. IIRC it was just
> doing small multi row inserts or single row inserts. Latency would've
> been a couple of hundred ms probably, I think I did something like
> running on my laptop (Australia, ADSL) to a server on AWS US or EU.

a couple of hundred ms, so that would be dominant in each prepare-send-execute-receive, possibly even for batch insert
withhundreds of rows in each batch.  Then, the synchronous batch insert of the current patch may achieve a few hundreds
timesspeedup compared to a single row inserts when the batch size is hundreds or more.
 


> > I'd like to check other DBMSs and your rich references for the FDW interface.
> (My first intuition is that many major DBMSs might not have client C APIs that
> can be used to implement an async pipelining FDW interface.
> 
> Likely correct for C APIs of other traditional DBMSes. I'd be less
> sure about newer non SQL ones, especially cloud oriented. For example
> DynamoDB supports at least async requests in the Java client [3] and
> C++ client [4]; it's not immediately clear if requests can be
> pipelined, but the API suggests they can.

I've checked ODBC, MySQL, Microsoft Synapse Analytics, Redshift, and BigQuery, guessing that the data warehouse may
haveasynchronous/pipelining API that enables efficient data integration/migration.  But none of them had one.  (I seem
tohave spent too long and am a bit tired... but it was a bit fun as well.)  They all support INSERT with multiple
recordsin its VALUES clause.  So, it will be useful to provide a synchronous batch insert FDW API.  I guess Oracle's
OCIhas an asynchronous API, but I didn't check it.
 

As an aside, MySQL 8.0.16 added support for asynchronous execution in its C API, but it allows only one active SQL
statementin each connection.  Likewise, although the ODBC standard defines asynchronous execution
(SQLSetStmtAttr(SQL_ASYNC_ENABLE)and SQLCompleteAsync), SQL Server and Synapse Analytics only allows only one active
statementper connection.  psqlODBC doesn't support asynchronous execution.
 


> Most things with a REST-like API can do a fair bit of concurrency
> though. Multiple async nonblocking HTTP connections can be serviced at
> once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
> streams [2]. This is relevant for any REST-like API.

I'm not sure if this is related, Google deprecated Batch HTTP API [1].


[1]
https://cloud.google.com/bigquery/batch


Regards
Takayuki Tsunakawa


pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Is Recovery actually paused?
Next
From: Masahiko Sawada
Date:
Subject: Re: [PATCH] BUG FIX: Core dump could happen when VACUUM FULL in standalone mode