Thread: psql copy command - 1 char limitation on delimiter
Hi,
- delimiter
The single character that separates columns within each row (line) of the file. The default is a tab character in text mode, a comma in CSV mode.
Why limit this to a single character?
For easy of use, should it be lifted, so we can specify such as '|*|'?
Rey
rey <reywang@optonline.net> writes: > Why limit this to a single character? Performance. Believe it or not, breaking fields at the delimiter is a significant factor in COPY speed. regards, tom lane
On 09/25/2010 10:03 AM, Tom Lane wrote: > rey<reywang@optonline.net> writes: > >> Why limit this to a single character? >> > Performance. Believe it or not, breaking fields at the delimiter is > a significant factor in COPY speed. > > regards, tom lane > > True, but just for 5% to 10% degradation here. For RDBMS, correct indexes and good logical design we are talking about 10 times or more performance gains. Who cares about 10% waste here? Is it Oracle and other commercial RDBMS no such limitation.
On Sat, Sep 25, 2010 at 3:12 PM, rey <reywang@optonline.net> wrote: > On 09/25/2010 10:03 AM, Tom Lane wrote: >> >> rey<reywang@optonline.net> writes: >> >>> >>> Why limit this to a single character? >>> >> >> Performance. Believe it or not, breaking fields at the delimiter is >> a significant factor in COPY speed. >> >> regards, tom lane >> >> > > True, but just for 5% to 10% degradation here. > For RDBMS, correct indexes and good logical design we are talking about 10 > times or more performance gains. > > Who cares about 10% waste here? Is it Oracle and other commercial RDBMS no > such limitation. Believe it or not, data loading performance is one of the most common standard metrics people used to benchmark databases. A large class of applications need to slam data in the db as quickly as possible, do some work, and slam it out/dump it. Copy performance matters. merlin
On 09/25/2010 07:03 AM, Tom Lane wrote: > rey<reywang@optonline.net> writes: > >> Why limit this to a single character? >> > Performance. Believe it or not, breaking fields at the delimiter is > a significant factor in COPY speed. > > regards, tom lane > > I agree that that multi-character (or even regex) delimiters would be useful. Would it be reasonable for the copy process to differentiate between single character delimiters which could be processed in "high-speed" mode and multi-character or regex delimiters which would be available as needed albeit at the expense of a performance hit? Cheers, Steve
Steve Crawford wrote: > On 09/25/2010 07:03 AM, Tom Lane wrote: > > rey<reywang@optonline.net> writes: > > > >> Why limit this to a single character? > >> > > Performance. Believe it or not, breaking fields at the delimiter is > > a significant factor in COPY speed. > > > > regards, tom lane > > > > > I agree that that multi-character (or even regex) delimiters would be > useful. Would it be reasonable for the copy process to differentiate > between single character delimiters which could be processed in > "high-speed" mode and multi-character or regex delimiters which would be > available as needed albeit at the expense of a performance hit? I am not sure you are aware but Postgres never confuses delimiters from data because it uses a backslash before literal data that matches delimiters. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 10/12/2010 08:28 PM, Bruce Momjian wrote: > Steve Crawford wrote: > >> On 09/25/2010 07:03 AM, Tom Lane wrote: >> >>> rey<reywang@optonline.net> writes: >>> >>> >>>> Why limit this to a single character? >>>> >>>> >>> Performance. Believe it or not, breaking fields at the delimiter is >>> a significant factor in COPY speed. >>> >>> regards, tom lane >>> >>> >>> >> I agree that that multi-character (or even regex) delimiters would be >> useful. Would it be reasonable for the copy process to differentiate >> between single character delimiters which could be processed in >> "high-speed" mode and multi-character or regex delimiters which would be >> available as needed albeit at the expense of a performance hit? >> > I am not sure you are aware but Postgres never confuses delimiters from > data because it uses a backslash before literal data that matches > delimiters. > > Yes, I am. But the discussion was about using multi-character strings as delimiters. But while I have encountered files using multiple-character delimiters, I'm finding myself leaning toward the camp that says that such cases are better processed externally by Perl/Python/sed/awk/Ruby/ETL/etc. Especially given the "fun" of defining how to properly escape a regex-matching string in a regex delimited file. Cheers, Steve