Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: POC: postgres_fdw insert batching
Date
Msg-id CAGRY4nzbrMK19FQiP_DoNQbCD9rOGhkOqXTcjY7-d1A1FyHGGw@mail.gmail.com
Whole thread Raw
In response to RE: POC: postgres_fdw insert batching  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: POC: postgres_fdw insert batching  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Fri, 27 Nov 2020, 14:06 tsunakawa.takay@fujitsu.com,
<tsunakawa.takay@fujitsu.com> wrote:
>
>
Also, I'm afraid it requires major surgery or reform of executor.  I
don't want it to delay the release of reasonably good (10x)
improvement with the synchronous interface.)


Totally sensible. If it isn't feasible without significant executor
change that's all that needs to be said.

I was afraid that'd be the case given the executor's pull flow but
just didn't know enough.

It was not my intention to hold this patch up or greatly expand its
scope. I'll spend some time testing it out and have a closer read soon
to see if I can help progress it.

I know Andres did some initial work on executor parallelism and
pipelining. I should take a look.

> > But in the libpq pipelining patch I demonstrated a 300 times (3000%) performance improvement on a test workload...
>
> Wow, impressive  number.  I've just seen it in the beginning of the libpq pipelining thread (oh, already four years
ago..!)

Yikes.

> Could you share the workload and the network latency (ping time)?  I'm sorry I'm just overlooking it.

I thought I gave it at the time, and a demo program. IIRC it was just
doing small multi row inserts or single row inserts. Latency would've
been a couple of hundred ms probably, I think I did something like
running on my laptop (Australia, ADSL) to a server on AWS US or EU.

> Thank you for your (always) concise explanation.

You joke! I am many things but despite my best efforts concise is
rarely one of them.

> I'd like to check other DBMSs and your rich references for the FDW interface.  (My first intuition is that many major
DBMSsmight not have client C APIs that can be used to implement an async pipelining FDW interface.
 

Likely correct for C APIs of other traditional DBMSes. I'd be less
sure about newer non SQL ones, especially cloud oriented. For example
DynamoDB supports at least async requests in the Java client [3] and
C++ client [4]; it's not immediately clear if requests can be
pipelined, but the API suggests they can.

Most things with a REST-like API can do a fair bit of concurrency
though. Multiple async nonblocking HTTP connections can be serviced at
once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
streams [2]. This is relevant for any REST-like API.

> (It'd be kind of you to send emails in text format.  I've changed the format of this reply from HTML to text.)

I try to remember. Stupid Gmail.  Sorry. On mobile it offers very
little control over format but I'll do my best when I can.

[1] https://en.wikipedia.org/wiki/HTTP_pipelining
[2] https://blog.restcase.com/http2-benefits-for-rest-apis/
[3] https://aws.amazon.com/blogs/developer/asynchronous-requests-with-the-aws-sdk-for-java/
[4]
https://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_dynamo_d_b_1_1_dynamo_d_b_client.html#ab631edaccca5f3f8988af15e7e9aa4f0



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Add docs stub for recovery.conf
Next
From: Craig Ringer
Date:
Subject: Re: Asynchronous Append on postgres_fdw nodes.