RE: Parallel copy - Mailing list pgsql-hackers

From Hou, Zhijie
Subject RE: Parallel copy
Date
Msg-id 7182ec632f944be4b383ffd4fae14aa9@G08CNEXMBPEKD05.g08.fujitsu.local
In response to Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Responses Re: Parallel copy
List pgsql-hackers
> > 4.
> > A suggestion for CacheLineInfo.
> >
> > It uses appendBinaryStringXXX to store the line in memory.
> > appendBinaryStringXXX will double the str memory when there is not enough
> > space.
> >
> > How about calling enlargeStringInfo in advance, if we already know the whole
> > line size?
> > It can avoid some memory waste and may improve performance a little.
> >
> 
> Here we will not know the size beforehand, in some cases we will start
> processing the data when current block is populated and keep processing
> block by block, we will come to know of the size at the end. We cannot use
> enlargeStringInfo because of this.
> 
> Attached v11 patch has the fix for this, it also includes the changes to
> rebase on top of head.

Thanks for the explanation.

I think there are still cases where we can know the size.

+         * line_size will be set. Read the line_size again to be sure if it is
+         * completed or partial block.
+         */
+        dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+        if (dataSize != -1)
+        {

If I am not wrong, this seems to be the branch that processes a populated block.
I think we can check copiedSize here: if copiedSize == 0, then dataSize is the
size of the whole line, and in that case we can do the enlarge in advance.
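
To illustrate the idea, here is a rough standalone sketch. It is not the patch
code: LineBuf, linebuf_enlarge, linebuf_append, copy_chunk and copy_line are
hypothetical stand-ins I made up so the example compiles without the
PostgreSQL headers. The growth policy only imitates the doubling behaviour
described above, and dataSize is modelled as a plain long where -1 means
"size not known yet".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a StringInfo-like buffer (not the real type). */
typedef struct LineBuf
{
    char   *data;
    size_t  len;        /* bytes used, excluding the trailing NUL */
    size_t  maxlen;     /* bytes allocated */
    int     nresize;    /* number of reallocations, for illustration only */
} LineBuf;

static void
linebuf_init(LineBuf *buf)
{
    buf->maxlen = 16;
    buf->data = malloc(buf->maxlen);
    assert(buf->data != NULL);
    buf->data[0] = '\0';
    buf->len = 0;
    buf->nresize = 0;
}

/* Make room for 'needed' more bytes, doubling the allocation until it fits. */
static void
linebuf_enlarge(LineBuf *buf, size_t needed)
{
    size_t  required = buf->len + needed + 1;
    size_t  newlen = buf->maxlen;

    if (required <= buf->maxlen)
        return;                 /* already enough space */
    while (newlen < required)
        newlen *= 2;
    buf->data = realloc(buf->data, newlen);
    assert(buf->data != NULL);
    buf->maxlen = newlen;
    buf->nresize++;
}

static void
linebuf_append(LineBuf *buf, const char *src, size_t n)
{
    linebuf_enlarge(buf, n);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
}

/*
 * Copy one chunk of a line. If nothing has been copied yet and the whole
 * line size is already known, reserve all of it before the first append.
 */
static void
copy_chunk(LineBuf *buf, size_t copiedSize, long dataSize,
           const char *chunk, size_t n)
{
    if (copiedSize == 0 && dataSize != -1)
        linebuf_enlarge(buf, (size_t) dataSize);
    linebuf_append(buf, chunk, n);
}

/* Copy a whole line chunk by chunk; returns how often the buffer grew. */
static int
copy_line(LineBuf *buf, const char *line, size_t total, size_t chunk,
          long dataSize)
{
    size_t  copiedSize = 0;

    linebuf_init(buf);
    while (copiedSize < total)
    {
        size_t  n = total - copiedSize < chunk ? total - copiedSize : chunk;

        copy_chunk(buf, copiedSize, dataSize, line + copiedSize, n);
        copiedSize += n;
    }
    return buf->nresize;
}
```

In this sketch, a 1000-byte line copied in 100-byte chunks grows the buffer
once when the size is known up front and four times when it is not. The final
allocation is the same either way because the growth is still by doubling, so
the gain here is mainly the avoided reallocations and copies.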


Best regards,
houzj




