Re: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: POC: postgres_fdw insert batching
Msg-id f0fc3678-699a-2bc5-d238-928c5b2382d2@enterprisedb.com
In response to RE: POC: postgres_fdw insert batching  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses Re: POC: postgres_fdw insert batching  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
RE: POC: postgres_fdw insert batching  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
Hi,

Thanks for working on this!

On 11/10/20 1:45 AM, tsunakawa.takay@fujitsu.com wrote:
> Hello,
> 
> 
> The attached patch implements the new bulk insert routine for
> postgres_fdw and the executor utilizing it.  It passes make
> check-world.
> 

I haven't done any testing yet, just a quick review.

I see the patch builds the "bulk" query in execute_foreign_modify. IMO
that's something we should do earlier, when we're building the simple
query (for 1-row inserts). I'd understand if the concern was the
overhead for 1-row inserts, i.e. not planning the bulk query until it's
actually needed, but I'm not sure that actually helps.

Or was the goal to build a query for every possible number of slots? I
don't think that's really useful, considering it requires deallocating
the old plan, preparing a new one, etc. IMO it should be sufficient to
have two queries - one for 1-row inserts, one for the full batch. The
last incomplete batch can be inserted using a loop of 1-row queries.

That's what my patch was doing, but I'm not insisting on that - it just
seems like a better approach to me. So feel free to argue why this is
better.
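
To make that a bit more concrete, here is a rough sketch of what I mean
using plain libpq (this is not the patch's actual code path -
postgres_fdw goes through its own connection handling - and the
statement names, the helper and the parameter layout are all made up;
error handling omitted):

#include "libpq-fe.h"

#define BATCH_SIZE  100     /* rows per full batch */
#define NCOLS       2       /* columns per row */

/*
 * Sketch only: assumes two statements were prepared earlier with
 * PQprepare():
 *
 *   "single_stmt": INSERT INTO t(a, b) VALUES ($1, $2)
 *   "batch_stmt":  INSERT INTO t(a, b) VALUES ($1, $2), ($3, $4), ...
 *                  with BATCH_SIZE * NCOLS parameters
 *
 * "values" holds nrows * NCOLS textual parameters, row-major.  Full
 * batches go through the batch statement; the trailing partial batch
 * is flushed with the existing single-row statement, so only two
 * plans ever exist on the remote side.
 */
static void
flush_rows(PGconn *conn, const char *const *values, int nrows)
{
    int         offset = 0;

    while (nrows - offset >= BATCH_SIZE)
    {
        PGresult   *res = PQexecPrepared(conn, "batch_stmt",
                                         BATCH_SIZE * NCOLS,
                                         values + (offset * NCOLS),
                                         NULL, NULL, 0);

        PQclear(res);
        offset += BATCH_SIZE;
    }

    /* leftover rows: reuse the existing 1-row statement */
    for (; offset < nrows; offset++)
    {
        PGresult   *res = PQexecPrepared(conn, "single_stmt",
                                         NCOLS,
                                         values + (offset * NCOLS),
                                         NULL, NULL, 0);

        PQclear(res);
    }
}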


> I measured performance in a basic non-partitioned case by modifying
> Tomas-san's scripts.  They perform an INSERT SELECT statement that
> copies one million records.  The table consists of two integer
> columns, with a primary key on one of them.  You can run the
> attached prepare.sql once to set things up.  local.sql inserts into
> the table directly, while fdw.sql inserts through a foreign table.
> 
> The performance results (average of 5 runs) were as follows, on a
> Linux host where the average round-trip time of "ping localhost" was
> 34 us:
> 
> master, local:    6.1 seconds
> master, fdw:    125.3 seconds
> patched, fdw:    11.1 seconds (11x improvement)
> 

Nice. I don't think we can get much closer to local master, so 6.1 vs.
11.1 seconds looks quite acceptable.

> 
> The patch accumulates at most 100 records in ModifyTableState before
> inserting in bulk.  Also, when an input record targets a different
> relation (= partition) than the already accumulated records, the
> accumulated records are inserted first and the new record is stored
> for a later insert.
> 
> [Issues]
> 
> 1. Do we want a GUC parameter, say, max_bulk_insert_records =
> (integer), to control the number of records inserted at once? The
> range of allowed values would be between 1 and 1,000.  1 disables
> bulk insert.  A possible reason for needing this kind of parameter
> would be to limit the amount of memory used for accumulated records,
> which could become prohibitively large if each record is big.  I
> don't think this is a must, but I think we can have it.
> 

I think it'd be good to have such a GUC, even if only for testing and
development. We should probably have a way to disable the batching,
which the GUC could also do, I think. So +1 to having the GUC.
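
Just to illustrate what I have in mind (the name, default and range are
simply the values you floated above, nothing I'm wedded to), a sketch of
such a GUC hooked into the module's _PG_init might look like this
(postgres_fdw already has PG_MODULE_MAGIC):

#include "postgres.h"
#include "utils/guc.h"

/* sketch only - setting it to 1 effectively disables batching */
static int  max_bulk_insert_records = 100;

void
_PG_init(void)
{
    DefineCustomIntVariable("postgres_fdw.max_bulk_insert_records",
                            "Maximum number of rows accumulated for one remote INSERT.",
                            "Setting this to 1 disables insert batching.",
                            &max_bulk_insert_records,
                            100,    /* default */
                            1,      /* min - no batching */
                            1000,   /* max, per the suggested range */
                            PGC_USERSET,
                            0,
                            NULL, NULL, NULL);
}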

> 2. Should we accumulate records per relation in ResultRelInfo
> instead? That is, when inserting into a partitioned table that has
> foreign partitions, delay insertion until a certain number of input
> records accumulate, and then insert accumulated records per relation
> (e.g., 50 records to relation A, 30 records to relation B, and 20
> records to relation C.)  If we do that,
> 

I think there's a chunk of text missing here? If we do that, then what?

Anyway, I don't see why accumulating the records in ResultRelInfo would
be better than what the patch does now. It seems fairly specific to
FDWs, so keeping it in the FDW state seems appropriate. What
would be the advantage of stashing it in ResultRelInfo?
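
Just so we're talking about the same thing, keeping the buffer in the
FDW's own per-relation modify state could look roughly like this (field
names invented, only meant to show the shape; the real PgFdwModifyState
in postgres_fdw.c has more fields):

#include "postgres.h"
#include "executor/tuptable.h"

typedef struct PgFdwModifyState
{
    /* ... existing fields: connection, prepared statement name, ... */

    int              batch_size;     /* rows to accumulate before flushing */
    int              num_pending;    /* rows currently buffered */
    TupleTableSlot **pending_slots;  /* buffered rows, up to batch_size */
} PgFdwModifyState;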

> * The order of insertion differs from the order of input records.  Is
> it OK?
> 

I think that's OK for most use cases, and if it's not (e.g. when there's
something requiring the exact order of writes) then it's not possible to
use batching. That's one of the reasons why I think we should have a GUC
to disable the batching.

> * Should the maximum count of accumulated records be applied per
> relation or the query? When many foreign partitions belong to a
> partitioned table, if the former is chosen, it may use much memory in
> total.  If the latter is chosen, the records per relation could be
> few and thus the benefit of bulk insert could be small.
> 

I think it needs to be applied per relation, because that's the level at
which we can do it easily and consistently. The whole point is to send
data in sufficiently large chunks to minimize the communication overhead
(latency etc.), but if you enforce it "per query" that seems hard.

Imagine you're inserting data into a table with many partitions - how do
you pick the number of rows to accumulate? The table may have 10 or 1000
partitions, we may be inserting into all partitions or just a small
subset, not all partitions may be foreign, etc. It seems pretty
difficult to pick and enforce a reliable limit at the query level. But
maybe I'm missing something and it's easier than I think?

Of course, you're entirely correct that enforcing this at the partition
level may require a lot of memory (e.g. buffering 100 rows in each of
1,000 foreign partitions adds up quickly if the rows are wide). Sadly,
I don't see a way around that, except for (a) disabling batching or
(b) sorting the data so that we insert into one partition at a time.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


