
RE: Parallel copy - Mailing list pgsql-hackers

From	Hou, Zhijie
Subject	RE: Parallel copy
Date	December 7, 2020 09:30:26
Msg-id	7182ec632f944be4b383ffd4fae14aa9@G08CNEXMBPEKD05.g08.fujitsu.local
In response to	Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
Responses	Re: Parallel copy
List	pgsql-hackers

> > 4.
> > A suggestion for CacheLineInfo.
> >
> > It uses appendBinaryStringXXX to store the line in memory.
> > appendBinaryStringXXX will double the str memory when there is not enough
> > space.
> >
> > How about calling enlargeStringInfo in advance, if we already know the whole
> > line size?
> > It can avoid some memory waste and may improve performance a little.
> >
> 
> Here we will not know the size beforehand, in some cases we will start
> processing the data when current block is populated and keep processing
> block by block, we will come to know of the size at the end. We cannot use
> enlargeStringInfo because of this.
> 
> Attached v11 patch has the fix for this, it also includes the changes to
> rebase on top of head.

Thanks for the explanation.

I think there are still cases where we can know the size.

+         * line_size will be set. Read the line_size again to be sure if it is
+         * completed or partial block.
+         */
+        dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+        if (dataSize != -1)
+        {

If I am not wrong, this seems to be the branch that processes a populated block.
I think we can check copiedSize here: if copiedSize == 0, nothing has been
copied yet, so dataSize is the size of the whole line, and in this case we can
do the enlarge up front.
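
To make the idea concrete, here is a rough standalone sketch. It does not use
the patch's code or PostgreSQL's real StringInfo: LineBuf, linebuf_enlarge,
linebuf_append and copy_line are hypothetical stand-ins I made up for
illustration, and the reallocs counter exists only to show the effect. The
assumption is that the line size is either published before the first byte is
copied, or still unknown (-1 read as a uint32).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a StringInfo-like buffer (NOT the real one). */
typedef struct LineBuf
{
    char   *data;
    size_t  len;        /* bytes used, excluding the trailing NUL */
    size_t  maxlen;     /* bytes allocated */
    int     reallocs;   /* number of growths, for illustration only */
} LineBuf;

void
linebuf_init(LineBuf *buf)
{
    buf->maxlen = 16;
    buf->data = malloc(buf->maxlen);
    assert(buf->data != NULL);
    buf->data[0] = '\0';
    buf->len = 0;
    buf->reallocs = 0;
}

/* Make room for "needed" more bytes, doubling the allocation until it fits. */
void
linebuf_enlarge(LineBuf *buf, size_t needed)
{
    size_t  required = buf->len + needed + 1;   /* +1 for the NUL */
    size_t  newlen = buf->maxlen;

    if (required <= buf->maxlen)
        return;                 /* already enough space */
    while (newlen < required)
        newlen *= 2;
    buf->data = realloc(buf->data, newlen);
    assert(buf->data != NULL);
    buf->maxlen = newlen;
    buf->reallocs++;
}

/* Append n bytes, growing the buffer on demand. */
void
linebuf_append(LineBuf *buf, const char *src, size_t n)
{
    linebuf_enlarge(buf, n);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
}

/*
 * Copy a line that arrives chunk by chunk. line_size is the published
 * size of the whole line, or UINT32_MAX if it is not known yet.
 */
void
copy_line(LineBuf *buf, const char *line, size_t total, size_t chunk,
          uint32_t line_size)
{
    size_t  copied_size = 0;

    /* Size known before anything was copied: reserve the space once. */
    if (line_size != UINT32_MAX && copied_size == 0)
        linebuf_enlarge(buf, line_size);

    while (copied_size < total)
    {
        size_t  n = total - copied_size;

        if (n > chunk)
            n = chunk;
        linebuf_append(buf, line + copied_size, n);
        copied_size += n;
    }
}
```

With the size known up front, the buffer grows once and every later append
finds enough space; with the size unknown, it grows several times as the
blocks arrive. The contents end up the same either way.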


Best regards,
houzj
