RE: [POC] Fast COPY FROM command for the table with foreign partitions - Mailing list pgsql-hackers
From: tsunakawa.takay@fujitsu.com
Subject: RE: [POC] Fast COPY FROM command for the table with foreign partitions
Msg-id: TYAPR01MB2990D01994E41A47D69F3F0EFE8E9@TYAPR01MB2990.jpnprd01.prod.outlook.com
In response to: Re: [POC] Fast COPY FROM command for the table with foreign partitions ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>)
Responses: Re: [POC] Fast COPY FROM command for the table with foreign partitions
List: pgsql-hackers
From: Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>
> On 2/9/21 9:35 AM, tsunakawa.takay@fujitsu.com wrote:
> > * Why is a separate FDW connection established for each COPY?  To avoid
> > using the same FDW connection for multiple foreign table partitions in a
> > single COPY run?
> With separate connection you can init a 'COPY FROM' session for each
> foreign partition just one time on partition initialization.
>
> > * In what kind of test did you get 2-4x performance gain?  COPY into many
> > foreign table partitions where the input rows are ordered randomly enough
> > that many rows don't accumulate in the COPY buffer?
> I used 'INSERT INTO .. SELECT * FROM generate_series(1, N)' to generate
> test data and HASH partitioning to avoid skews.

I guess you used many hash partitions.  Sadly, the current COPY implementation only accumulates either 1,000 rows or 64 KB of input data (very small!) before flushing all CopyMultiInsertBuffers.  One CopyMultiInsertBuffer corresponds to one partition.  Flushing a CopyMultiInsertBuffer calls ExecForeignCopy() once, which connects to a remote database, runs COPY FROM STDIN, and disconnects.  Here, the flushing trigger (1,000 rows or 64 KB of input data, whichever comes first) is so small that if there are many target partitions, the amount of data for each partition is small.

Looking at the triggering threshold values, the description (of MAX_BUFFERED_TUPLES at least) seems to indicate that they take effect per CopyMultiInsertBuffer:

/*
 * No more than this many tuples per CopyMultiInsertBuffer
 *
 * Caution: Don't make this too big, as we could end up with this many
 * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
 * multiInsertBuffers list.  Increasing this can cause quadratic growth in
 * memory requirements during copies into partitioned tables with a large
 * number of partitions.
 */
#define MAX_BUFFERED_TUPLES		1000

/*
 * Flush buffers if there are >= this many bytes, as counted by the input
 * size, of tuples stored.
 */
#define MAX_BUFFERED_BYTES		65535

But these thresholds take effect across all CopyMultiInsertBuffers:

/*
 * Returns true if the buffers are full
 */
static inline bool
CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
{
	if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
		miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
		return true;
	return false;
}

So, I think the direction to take is to allow more data to accumulate before flushing.

I'm not very excited about the way 0003 and 0004 establish a new connection for each partition; it adds flags to many places, and postgresfdw_xact_callback() has to be aware of COPY-specific processing.  Plus, we have to take care of the message difference you found in the regression test.

Why don't we focus on committing the basic part and addressing the extended part (0003 and 0004) separately later?  As Tang-san and you showed, the basic part already demonstrated impressive improvement.  If there's no objection, I'd like to make this ready for committer in a few days.

Regards
Takayuki Tsunakawa