Undocumented feature costs a lot of performance in COPY IN - Mailing list pgsql-hackers

From Tom Lane
Subject Undocumented feature costs a lot of performance in COPY IN
Msg-id 2841.1007495345@sss.pgh.pa.us
Responses Re: Undocumented feature costs a lot of performance in COPY  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Undocumented feature costs a lot of performance in  (Bill Studenmund <wrstuden@netbsd.org>)
List pgsql-hackers
I have been fooling around profiling various ways of inserting wide
(8000-byte, not all that wide) bytea fields, per Brent Verner's note
of a few days ago.  COPY IN should be, and is, the fastest way to
do it.  But I was rather startled to discover that 25% of the runtime
of COPY IN went to an inefficient way of fetching single bytes from
pqcomm.c (pq_getbytes(&ch, 1) instead of ch = pq_getbyte()), and
20% of what's left after fixing that is going into the strchr() call
in CopyReadAttribute.

Now the point of that strchr() call is to detect whether the current
character is the column delimiter.  The COPY reference page clearly
says:
By default, a text copy uses a tab ("\t") character as a delimiter between fields. The field delimiter may be changed
to any other single character with the keyword phrase USING DELIMITERS. Characters in data fields which happen to match
the delimiter character will be backslash quoted. Note that the delimiter is always a single character. If multiple
characters are specified in the delimiter string, only the first character is used.

and indeed, only the first character is used by COPY OUT.  But COPY IN
is presently coded so that if multiple characters are mentioned in
USING DELIMITERS, any one of them will be taken as a field delimiter.

I would like to change the code to just "if (c == delim[0])",
which should buy back most of that 20% and make the behavior match the
documentation.  Question for the list: is this a bad change?  Is anyone
out there actually using this undocumented behavior?
        regards, tom lane

