Re: [PATCH] Performance Improvement For Copy From Binary Files - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: [PATCH] Performance Improvement For Copy From Binary Files
Date
Msg-id CALj2ACUefSOG=ffyhN4gNA3pCOhFwC8tpVFcBXY6utfvivYLMw@mail.gmail.com
In response to Re: [PATCH] Performance Improvement For Copy From Binary Files  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: [PATCH] Performance Improvement For Copy From Binary Files
List pgsql-hackers
>
> > This is due to the recent commit
> > cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the
> > raw_buf and line_buf allocations for binary files. Since we are using
> > raw_buf for this performance improvement feature, now, it's enough to
> > restrict only line_buf for binary files. I made the changes
> > accordingly in the v3 patch attached here.
> >
> > Regression tests (make check & make check-world) ran cleanly with the v3 patch.
>
> Thank you Bharath.  I was a bit surprised that you had also submitted
> a patch to NOT allocate raw_buf for COPY FROM ... BINARY. :-)
>

Yes, that was by me, before I started working on this feature. I think
we can backpatch that change (assuming we don't backpatch this
feature); I will make the request accordingly.

Anyway, now we no longer allocate line_buf for binary files, which is
also a good thing.

>
> > > > For the case where required nbytes may not fit into the buffer in
> > > > CopyReadFromRawBuf, I'm sure this can happen only for field data,
> > > > (field count , and field size are of fixed length and can fit in the
> > > > buffer), instead of reading them in parts of buff size into the buffer
> > > > (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to the
> > > > destination, I think we can detect this condition using requested
> > > > bytes and the buffer size and directly read from the file to the
> > > > destination buffer and then reload the raw_buffer for further
> > > > processing. I think this way, it will be good.
> > >
> > > Hmm, I'm afraid that this will make the code more complex for
> > > apparently small benefit.  Is this really that much of a problem
> > > performance wise?
> > >
> >
> > Yes, it makes the CopyReadFromRawBuf() code a bit complex from a
> > readability perspective. I'm convinced not to make the
> > abovementioned (by me) change, for three reasons: 1) the
> > readability/understandability; 2) how many use cases can we have
> > where the requested field size is greater than RAW_BUF_SIZE (64KB)?
> > Very few, I think, though I may be wrong here; 3) performance-wise
> > it may not matter much, as we do one extra memcpy only in situations
> > where field sizes are greater than 64KB (as we have already seen,
> > and as you observed as well in one of the responses [1]), and the
> > memcpy cost in that case may be negligible.
>
> Actually, an extra memcpy is incurred on every call of
> CopyReadFromRawBuf(), but I haven't seen it to be very problematic.
>

Yes.

>
> CopyReadFromRawBuf as a name for the new function might be misleading
> as long as we are only using it for binary data.  Maybe
> CopyReadBinaryData is more appropriate?  See attached v4 with these
> and a few other cosmetic changes.
>

CopyReadBinaryData() looks meaningful. +1.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


