Re: Improvements in Copy From - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Improvements in Copy From
Msg-id 20200911.155804.359271394064499501.horikyota.ntt@gmail.com
In response to Re: Improvements in Copy From  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers
At Thu, 10 Sep 2020 21:55:27 +0300, Surafel Temesgen <surafel3000@gmail.com> wrote in 
> On Thu, Sep 10, 2020 at 1:17 PM vignesh C <vignesh21@gmail.com> wrote:
> 
> >
> > >
> > > We have a patch for the column matching feature [1] that may need the header
> > > line to be processed further. Even without that, I think it is preferable,
> > > performance-wise, to process the header line for nothing rather than add
> > > those checks to the loop.
> >
> > I had seen that patch. I feel that the change to match the header, when a
> > header is specified, can be addressed in this patch if that patch gets
> > committed first, or vice versa. We are doing a lot of processing on
> > data that we then do nothing with. Shouldn't this be skipped when it is
> > not required? A similar check is already present in NextCopyFromRawFields
> > to skip the header.
> >
> 
> The existing check is unavoidable, but we are better off without the checks
> added by the patch. For very large files the loop may iterate millions, if
> not billions, of times, and I am sure doing the check that many times will
> cause a more noticeable performance degradation than processing one extra
> line.

FWIW, I thought the same thing when I saw the additional if-conditions. They
cost more than they gain.
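To make the per-iteration cost argument concrete, here is a toy sketch. It is not the patch under discussion and not COPY's actual code; the function names are hypothetical. Variant A tests for the header on every byte it scans, while variant B consumes the header once up front and then runs a loop that knows nothing about headers. Both count the newline-terminated data lines in a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant A: the header test runs for every input byte. */
static size_t count_lines_check_in_loop(const char *buf, size_t len, bool header)
{
    bool in_header = header;
    size_t lines = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
    {
        if (in_header)              /* extra branch on every iteration */
        {
            if (buf[i] == '\n')
                in_header = false;  /* header ends here; not counted */
            continue;
        }
        if (buf[i] == '\n')
            lines++;
    }
    return lines;
}

/* Hypothetical variant B: skip the header once, then scan without checks. */
static size_t count_lines_skip_once(const char *buf, size_t len, bool header)
{
    size_t start = 0;
    size_t lines = 0;

    assert(buf != NULL || len == 0);
    if (header)
    {
        const char *nl = memchr(buf, '\n', len);

        if (nl == NULL)
            return 0;               /* input holds only a header line */
        start = (size_t) (nl - buf) + 1;
    }
    for (size_t i = start; i < len; i++)
        if (buf[i] == '\n')
            lines++;
    return lines;
}
```

For the input "a,b\n1,2\n3,4\n" with the header enabled, both return 2. The only difference is that variant A evaluates the header test once per byte of input, while variant B evaluates it once per call.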

For the first part, the patch exposes COPY_NEW_FE to CopyGetData, which I
don't think that function should need to know about. Since this doesn't seem
to offer a noticeable performance gain, I don't think we should do it. On the
contrary, if incoming data were intermittently delayed for some reason (heavy
load on the client or on the network in between), this patch would make
things worse by waiting for the delayed bits before processing the bits
already received.
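The concern about delayed input can be illustrated with a generic POSIX read loop. The helper below is hypothetical and is not PostgreSQL's CopyGetData; it only shows how the minimum number of bytes a reader insists on before returning decides whether bytes that have already arrived can be processed while the sender is stalled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until at least minread bytes have
 * arrived (or EOF is reached), never more than maxread. Returns the byte
 * count, or -1 on error. With minread == 1 the caller gets whatever is
 * already available; with minread == maxread it blocks until the buffer
 * is full, even if the sender is stalled.
 */
static ssize_t read_at_least(int fd, char *buf, size_t minread, size_t maxread)
{
    size_t total = 0;

    assert(minread <= maxread);
    while (total < minread)
    {
        ssize_t n = read(fd, buf + total, maxread - total);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;          /* interrupted by a signal; retry */
            return -1;
        }
        if (n == 0)
            break;                 /* EOF: return what we have */
        total += (size_t) n;
    }
    return (ssize_t) total;
}
```

Called with minread = 1 on a pipe that holds "abc", it returns 3 right away. Called with minread = maxread, it would stay in read() until the remaining bytes arrive or the writer closes the pipe.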

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


