RE: [POC] Fast COPY FROM command for the table with foreign partitions - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: [POC] Fast COPY FROM command for the table with foreign partitions
Date
Msg-id TYAPR01MB2990D01994E41A47D69F3F0EFE8E9@TYAPR01MB2990.jpnprd01.prod.outlook.com
In response to Re: [POC] Fast COPY FROM command for the table with foreign partitions  ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>)
Responses Re: [POC] Fast COPY FROM command for the table with foreign partitions
List pgsql-hackers
From: Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>
> On 2/9/21 9:35 AM, tsunakawa.takay@fujitsu.com wrote:
> > * Why is a separate FDW connection established for each COPY?  To avoid
> using the same FDW connection for multiple foreign table partitions in a single
> COPY run?
> With separate connection you can init a 'COPY FROM' session for each
> foreign partition just one time on partition initialization.
> >
> > * In what kind of test did you get 2-4x performance gain?  COPY into many
> foreign table partitions where the input rows are ordered randomly enough that
> many rows don't accumulate in the COPY buffer?
> I used 'INSERT INTO .. SELECT * FROM generate_series(1, N)' to generate
> test data and HASH partitioning to avoid skews.

I guess you used many hash partitions.  Sadly, the current COPY implementation only accumulates either 1,000 rows or 64 KB of input data (very small!) before flushing all CopyMultiInsertBuffers.  One CopyMultiInsertBuffer corresponds to one partition.  Flushing a CopyMultiInsertBuffer calls ExecForeignCopy() once, which connects to a remote database, runs COPY FROM STDIN, and disconnects.  Here, the flushing trigger (1,000 rows or 64 KB of input data, whichever comes first) is so small that if there are many target partitions, the amount of data for each partition is small.
 

Looking at the triggering threshold values, the description (of MAX_BUFFERED_TUPLES at least) seems to indicate that they take effect per CopyMultiInsertBuffer:
 


/*
 * No more than this many tuples per CopyMultiInsertBuffer
 *
 * Caution: Don't make this too big, as we could end up with this many
 * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
 * multiInsertBuffers list.  Increasing this can cause quadratic growth in
 * memory requirements during copies into partitioned tables with a large
 * number of partitions.
 */
#define MAX_BUFFERED_TUPLES     1000

/*
 * Flush buffers if there are >= this many bytes, as counted by the input
 * size, of tuples stored.
 */
#define MAX_BUFFERED_BYTES      65535


But these thresholds take effect across all CopyMultiInsertBuffers:


/*
 * Returns true if the buffers are full
 */
static inline bool
CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
{
    if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
        miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
        return true;
    return false;
}


So, I think the direction to take is to allow more data to accumulate before flushing.  I'm not very excited about the way 0003 and 0004 establish a new connection for each partition; it adds flags to many places, and postgresfdw_xact_callback() has to be aware of COPY-specific processing.  Plus, we have to take care of the message difference you found in the regression test.
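For comparison, a per-buffer fullness check, which would flush only the partition whose buffer actually filled up, might look something like this.  This is a hypothetical sketch, not the patch's code; BufferSketch and its fields are invented for illustration (the real CopyMultiInsertBuffer tracks more state):

```c
#include <stdbool.h>

/* Same thresholds as copyfrom.c, but here imagined as per-buffer limits. */
#define MAX_BUFFERED_TUPLES 1000
#define MAX_BUFFERED_BYTES  65535

/* Hypothetical per-partition buffer state. */
typedef struct BufferSketch
{
    int     nused;      /* tuples currently buffered for this partition */
    int     bytes;      /* input bytes currently buffered for this partition */
} BufferSketch;

/*
 * Per-buffer variant of CopyMultiInsertInfoIsFull(): only the partition
 * whose own buffer crossed a threshold gets flushed, so the remaining
 * partitions keep accumulating data and each ExecForeignCopy() call
 * carries a larger batch.
 */
static inline bool
BufferSketchIsFull(const BufferSketch *buf)
{
    return buf->nused >= MAX_BUFFERED_TUPLES ||
           buf->bytes >= MAX_BUFFERED_BYTES;
}
```

Of course, a scheme like this would still need a global cap (as the MAX_BUFFERED_TUPLES comment warns) to bound total memory across many partitions.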
 

Why don't we focus on committing the basic part and addressing the extended part (0003 and 0004) separately later?  As Tang-san and you showed, the basic part already demonstrated impressive improvement.  If there's no objection, I'd like to make this ready for committer in a few days.
 


Regards
Takayuki Tsunakawa


