Home > mailing lists

Re: Parallel copy - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Parallel copy
Date	October 19, 2020 09:11:04
Msg-id	CAA4eK1+==1q3mk5Lpsyzw5SGDPfv+RLuZ7w47Oo0St6jpbcs+Q@mail.gmail.com Whole thread Raw
In response to	RE: Parallel copy ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
Responses	Re: Parallel copy
List	pgsql-hackers

Tree view

On Sun, Oct 18, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi Vignesh,
>
> After having a look over the patch,
> I have some suggestions for
> 0003-Allow-copy-from-command-to-process-data-from-file.patch.
>
> 1.
>
> +static uint32
> +EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
> +                                  char **whereClauseStr, char **rangeTableStr,
> +                                  char **attnameListStr, char **notnullListStr,
> +                                  char **nullListStr, char **convertListStr)
> +{
> +       uint32          strsize = MAXALIGN(sizeof(SerializedParallelCopyState));
> +
> +       strsize += EstimateStringSize(cstate->null_print);
> +       strsize += EstimateStringSize(cstate->delim);
> +       strsize += EstimateStringSize(cstate->quote);
> +       strsize += EstimateStringSize(cstate->escape);
>
>
> It use function EstimateStringSize to get the strlen of null_print, delim, quote and escape.
> But the length of null_print seems has been stored in null_print_len.
> And delim/quote/escape must be 1 byte, so I think call strlen again seems unnecessary.
>
> How about  " strsize += sizeof(uint32) + cstate->null_print_len + 1"
>

+1. This seems like a good suggestion but add comments for
delim/quote/escape to indicate that we are considering one-byte for
each. I think this will obviate the need of function
EstimateStringSize. Another thing in this regard is that we normally
use add_size function to compute the size but I don't see that being
used in this and nearby computation. That helps us to detect overflow
of addition if any.

EstimateCstateSize()
{
..
+
+ strsize++;
..
}

Why do we need this additional one-byte increment? Does it make sense
to add a small comment for the same?

> 2.
> +       strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);
>
> +       copiedsize += CopyStringToSharedMemory(cstate, whereClauseStr,
> +                                                                                  shmptr + copiedsize);
>
> Some string length is counted for two times.
> The ' whereClauseStr ' has call strlen in EstimateNodeSize once and call strlen in CopyStringToSharedMemory again.
> I don't know wheather it's worth to refacor the code to avoid duplicate strlen . what do you think ?
>

It doesn't seem worth to me. We probably need to use additional
variables to save those lengths. I think it will add more
code/complexity than we will save. See EstimateParamListSpace and
SerializeParamList where we get the typeLen each time, that way code
looks neat to me and we are don't going to save much by not following
a similar thing here.

-- 
With Regards,
Amit Kapila.

pgsql-hackers by date:

From: "tsunakawa.takay@fujitsu.com"
Date: 19 October 2020, 09:07:17
Subject: RE: Transactions involving multiple postgres foreign servers, take 2

From: Peter Eisentraut
Date: 19 October 2020, 09:19:10
Subject: Make procedure OUT parameters work with JDBC

Re: Parallel copy - Mailing list pgsql-hackers

Previous

Next