> > 3) What about the other DML operations (DELETE/UPDATE)?
> >
> > The other DML operations could probably benefit from the batching too.
> > INSERT was good enough for a PoC, but having batching only for INSERT
> > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> > of quals, but likely doable.
>
> Bulk INSERTs are more common in a sharded environment because of data
> load in, say, OLAP systems. Bulk update/delete are rare, although not
> that rare. So if an approach just supports bulk INSERT and not bulk
> UPDATE/DELETE, that will address a large number of use cases IMO. But if
> we can make everything work together, that would be good as well.
In most cases, I think entire UPDATE/DELETE operations would be pushed down to the remote side by DirectModify, so I'm not sure we really need bulk UPDATE/DELETE.
That may not be true for a partitioned table whose partitions are foreign tables, especially given the work that Amit Langote is doing [1]. It really depends on postgres_fdw's ability to detect that the DML modifying each of the partitions can be pushed down, and that may not come easily.
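To make the pushdown question concrete: whether an UPDATE on a foreign table runs as a direct modification is visible in EXPLAIN output. A minimal sketch, assuming a hypothetical postgres_fdw foreign table `ft_orders` backed by remote table `orders` (the exact plan text varies by version):

```sql
-- When the whole statement is shippable, EXPLAIN shows a
-- "Foreign Update" node carrying the complete remote UPDATE,
-- i.e. DirectModify applies and no rows travel over the wire:
EXPLAIN (VERBOSE, COSTS OFF)
UPDATE ft_orders SET status = 'done' WHERE status = 'pending';
-- (approximate plan shape)
--  Update on public.ft_orders
--    ->  Foreign Update on public.ft_orders
--          Remote SQL: UPDATE public.orders SET status = 'done'
--                      WHERE status = 'pending'

-- If a qual or target expression is not shippable (e.g. it calls a
-- local function), the plan instead fetches rows with a Foreign Scan
-- and updates them one by one -- the case batching would help with.
```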
> > 3) Should we do batching for COPY instead?
> >
> > While looking at multi_insert, I've realized it's mostly exactly what
> > the new "batching insert" API function would need to be. But it's only
> > really used in COPY, so I wonder if we should just abandon the idea of
> > batching INSERTs and do batching COPY for FDW tables.
> I think we have to find out which performs
> better, COPY or batch INSERT.
Maybe I'm missing something, but the COPY patch [1] seems more promising to me, because 1) it would not get the remote side's planner and executor involved, and 2) the data would be loaded more efficiently by multi-insert on the remote side. (Yeah, COPY doesn't support RETURNING, but it's rare that RETURNING is needed in a bulk load, as you mentioned.)
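A sketch of the client-visible tradeoff between the two approaches (the table and column names are hypothetical; this illustrates only the statement shapes, not the FDW internals):

```sql
-- Bulk load via COPY: the cheapest path on the remote side (it goes
-- through multi-insert, bypassing per-row planning), but it cannot
-- return generated values:
COPY orders (customer_id, amount) FROM STDIN;

-- Batched multi-row INSERT: invokes the planner/executor on the
-- remote side for the statement, but supports RETURNING, e.g. to
-- fetch generated keys:
INSERT INTO orders (customer_id, amount)
VALUES (1, 10.0), (2, 20.0), (3, 30.0)
RETURNING id;
```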