However, I believe the FDW interface as it's implemented today is not designed to allow that (we pretty much just invoke the FDW callbacks as if it were a local AM). It assumes the calls are synchronous, and redesigning it to work in an async way is a much larger and more complex patch than what's being discussed here.
I do think the FDW extension proposed here (adding the bulk-insert callback) is useful in general, for two reasons: (a) even if most client libraries support some sort of pipelining, some don't, and (b) I'd bet it's still more efficient to send one large insert than pipelining many individual inserts.
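To make point (b) concrete, here is a rough sketch of what "one large insert" means at the SQL level: a single statement carrying many VALUES tuples, as opposed to one statement per row. This is not the code from the patch or from postgres_fdw; the function names (build_multirow_insert, flatten_params) and the table and column names are made up for illustration.

```python
def build_multirow_insert(table, columns, nrows):
    """Build one INSERT statement with nrows VALUES tuples.

    Placeholders are numbered $1, $2, ... across all rows, so the whole
    batch can be sent as a single parameterized statement.
    Hypothetical helper, for illustration only.
    """
    ncols = len(columns)
    tuples = []
    for r in range(nrows):
        # Row r uses placeholders r*ncols+1 .. (r+1)*ncols.
        placeholders = ", ".join(f"${r * ncols + c + 1}" for c in range(ncols))
        tuples.append(f"({placeholders})")
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(tuples)}"


def flatten_params(rows):
    """Flatten a list of row tuples into one parameter list, row by row."""
    return [value for row in rows for value in row]


rows = [(1, "a"), (2, "b"), (3, "c")]
sql = build_multirow_insert("t", ["id", "val"], len(rows))
params = flatten_params(rows)
print(sql)
print(params)
```

With pipelining, the same three rows would instead go out as three single-row INSERTs without waiting for each reply; the multi-value form above sends them as one statement with one parameter list.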
That being said, I'm against expanding the scope of this patch to also require a redesign of the whole FDW infrastructure - that would likely mean no such improvement landing in PG14. If the libpq pipelining patch seems likely to get committed, we can try using it for the bulk-insert callback (instead of the current multi-value stuff).
I totally agree on all points. It was not my intent to expand the scope of this significantly and I really don't want to hold it back.
I raised the interface consideration in case it was something easy to accommodate. It's not, so that's done, topic over.