Thread: psql copy command - 1 char limitation on delimiter

psql copy command - 1 char limitation on delimiter

From
rey
Date:

Hi,

delimiter

The single character that separates columns within each row (line) of the file. The default is a tab character in text mode, a comma in CSV mode.


Why limit this to a single character?
For easy of use, should it be lifted, so we can specify such as '|*|'?

Rey

Re: psql copy command - 1 char limitation on delimiter

From
Tom Lane
Date:
rey <reywang@optonline.net> writes:
> Why limit this to a single character?

Performance.  Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.

            regards, tom lane

Re: psql copy command - 1 char limitation on delimiter

From
rey
Date:
On 09/25/2010 10:03 AM, Tom Lane wrote:
> rey<reywang@optonline.net>  writes:
>
>> Why limit this to a single character?
>>
> Performance.  Believe it or not, breaking fields at the delimiter is
> a significant factor in COPY speed.
>
>             regards, tom lane
>
>
True, but just for 5% to 10% degradation here.
For RDBMS, correct indexes and good logical design we are talking about
10 times or more performance gains.

Who cares about 10% waste here? Is it Oracle and other commercial RDBMS
no such limitation.


Re: psql copy command - 1 char limitation on delimiter

From
Merlin Moncure
Date:
On Sat, Sep 25, 2010 at 3:12 PM, rey <reywang@optonline.net> wrote:
> On 09/25/2010 10:03 AM, Tom Lane wrote:
>>
>> rey<reywang@optonline.net>  writes:
>>
>>>
>>> Why limit this to a single character?
>>>
>>
>> Performance.  Believe it or not, breaking fields at the delimiter is
>> a significant factor in COPY speed.
>>
>>                        regards, tom lane
>>
>>
>
> True, but just for 5% to 10% degradation here.
> For RDBMS, correct indexes and good logical design we are talking about 10
> times or more performance gains.
>
> Who cares about 10% waste here? Is it Oracle and other commercial RDBMS no
> such limitation.

Believe it or not, data loading performance is one of the most common
standard metrics people used to benchmark databases.  A large class of
applications need to slam data in the db as quickly as possible, do
some work, and slam it out/dump it.  Copy performance matters.

merlin

Re: psql copy command - 1 char limitation on delimiter

From
Steve Crawford
Date:
On 09/25/2010 07:03 AM, Tom Lane wrote:
> rey<reywang@optonline.net>  writes:
>
>> Why limit this to a single character?
>>
> Performance.  Believe it or not, breaking fields at the delimiter is
> a significant factor in COPY speed.
>
>             regards, tom lane
>
>
I agree that that multi-character (or even regex) delimiters would be
useful. Would it be reasonable for the copy process to differentiate
between single character delimiters which could be processed in
"high-speed" mode and multi-character or regex delimiters which would be
available as needed albeit at the expense of a performance hit?

Cheers,
Steve


Re: psql copy command - 1 char limitation on delimiter

From
Bruce Momjian
Date:
Steve Crawford wrote:
> On 09/25/2010 07:03 AM, Tom Lane wrote:
> > rey<reywang@optonline.net>  writes:
> >
> >> Why limit this to a single character?
> >>
> > Performance.  Believe it or not, breaking fields at the delimiter is
> > a significant factor in COPY speed.
> >
> >             regards, tom lane
> >
> >
> I agree that that multi-character (or even regex) delimiters would be
> useful. Would it be reasonable for the copy process to differentiate
> between single character delimiters which could be processed in
> "high-speed" mode and multi-character or regex delimiters which would be
> available as needed albeit at the expense of a performance hit?

I am not sure you are aware but Postgres never confuses delimiters from
data because it uses a backslash before literal data that matches
delimiters.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: psql copy command - 1 char limitation on delimiter

From
Steve Crawford
Date:
On 10/12/2010 08:28 PM, Bruce Momjian wrote:
> Steve Crawford wrote:
>
>> On 09/25/2010 07:03 AM, Tom Lane wrote:
>>
>>> rey<reywang@optonline.net>   writes:
>>>
>>>
>>>> Why limit this to a single character?
>>>>
>>>>
>>> Performance.  Believe it or not, breaking fields at the delimiter is
>>> a significant factor in COPY speed.
>>>
>>>             regards, tom lane
>>>
>>>
>>>
>> I agree that that multi-character (or even regex) delimiters would be
>> useful. Would it be reasonable for the copy process to differentiate
>> between single character delimiters which could be processed in
>> "high-speed" mode and multi-character or regex delimiters which would be
>> available as needed albeit at the expense of a performance hit?
>>
> I am not sure you are aware but Postgres never confuses delimiters from
> data because it uses a backslash before literal data that matches
> delimiters.
>
>
Yes, I am. But the discussion was about using multi-character strings as
delimiters.

But while I have encountered files using multiple-character delimiters,
I'm finding myself leaning toward the camp that says that such cases are
better processed externally by Perl/Python/sed/awk/Ruby/ETL/etc.
Especially given the "fun" of defining how to properly escape a
regex-matching string in a regex delimited file.

Cheers,
Steve