Re: Undocumented feature costs a lot of performance in COPY IN - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Undocumented feature costs a lot of performance in COPY IN
Date
Msg-id 3040.1007497378@sss.pgh.pa.us
In response to Re: Undocumented feature costs a lot of performance in  (Bill Studenmund <wrstuden@netbsd.org>)
Responses Re: Undocumented feature costs a lot of performance in
List pgsql-hackers
Bill Studenmund <wrstuden@netbsd.org> writes:
> One alternative would be to make the code use different paths for the
> just-one and many delimiter cases. But then COPY OUT would need fixing.

Well, it's not clear what COPY OUT should *do* with multiple
alternatives, anyway.  Pick one at random?  I guess it does that now,
if you consider "always use the first one" as a random choice.  The
real problem is that it will only backslash the first one, too.  That
means that data emitted with DELIMITERS "|_=", say, will fail to be
reloaded correctly if that same DELIMITERS string is given to COPY IN
--- because any _ or = characters in the data won't be backslashed,
but would need to be to keep COPY IN from treating them as delimiters.
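
To make the round-trip failure concrete, here is a minimal Python sketch of the behavior described above. It is a simplified model, not the actual COPY code: the function names copy_out_row and copy_in_row are hypothetical, and special sequences such as \N, \t and \n are ignored. The writer uses and backslashes only the first delimiter character; the reader treats any character in the string as a delimiter.

```python
def copy_out_row(fields, delimiters):
    """Model of COPY OUT as described: only the first delimiter is used and escaped."""
    first = delimiters[0]
    escaped = []
    for field in fields:
        # Escape backslashes first, then the first delimiter character only.
        field = field.replace("\\", "\\\\").replace(first, "\\" + first)
        escaped.append(field)
    return first.join(escaped)


def copy_in_row(line, delimiters):
    """Model of COPY IN as described: any character in `delimiters` splits a field."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # A backslash makes the next character literal.
            current.append(line[i + 1])
            i += 2
            continue
        if ch in delimiters:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


row = ["a_b", "c=d", "e|f"]
line = copy_out_row(row, "|_=")      # only "|" gets backslashed
reloaded = copy_in_row(line, "|_=")  # "_" and "=" now split fields too
print(line)
print(reloaded)
```

In this model the three-field row comes back as five fields, because the unescaped _ and = in the data are read as delimiters. Reloading the same line with DELIMITERS "|" alone recovers the original row.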

For COPY OUT's purposes, a sensible interpretation of a multicharacter
delimiter string would be that the whole string is emitted as the
delimiter.  E.g.,

	COPY OUT WITH DELIMITERS "<TAB>";

	foo<TAB>bar<TAB>baz...

But as long as COPY IN considers that delimiter spec to mean "any one of
these characters", and not a multicharacter string, we couldn't do that.
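
For comparison, a sketch of what the whole-string interpretation could look like if both directions agreed on it. This is an illustration under stated assumptions, not a proposal for the real implementation: emit_row and parse_row are hypothetical names, the delimiter is assumed not to start with a backslash, and every occurrence of the delimiter's first character in the data is backslashed so that an unescaped occurrence can only be the start of a real delimiter.

```python
def emit_row(fields, delim):
    """Join fields with the whole delimiter string, escaping its first character in data."""
    first = delim[0]  # assumed not to be a backslash
    escaped = [
        f.replace("\\", "\\\\").replace(first, "\\" + first)
        for f in fields
    ]
    return delim.join(escaped)


def parse_row(line, delim):
    """Split on the whole delimiter string, honoring backslash escapes."""
    fields, current, i = [], [], 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            # Escaped character: take it literally.
            current.append(line[i + 1])
            i += 2
        elif line.startswith(delim, i):
            # An unescaped match of the full delimiter string ends the field.
            fields.append("".join(current))
            current = []
            i += len(delim)
        else:
            current.append(line[i])
            i += 1
    fields.append("".join(current))
    return fields


simple = emit_row(["foo", "bar", "baz"], "<TAB>")
tricky_row = ["foo", "a<b", "x<TAB>y"]
tricky = emit_row(tricky_row, "<TAB>")
print(simple)
print(parse_row(tricky, "<TAB>"))
```

With a reader like this, data containing the delimiter string survives a round trip, which is exactly what the "any one of these characters" reading on the input side rules out.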

If we restrict DELIMITERS strings to be exactly one character for a
release or three, we could think about implementing this idea of
multicharacter delimiter strings later on.  Not sure if anyone really
needs it though.  In any case, the current behavior is inconsistent.
        regards, tom lane

