Thread: Copy From suggestion
Hello all,
Firstly, I apologise if this is not the correct list for this subject.
Lately, I've been working on a data conversion, importing into Postgres using Copy From. The text file I'm copying from is produced by an ancient program and is either tab- or semicolon-delimited. One file contains about 1.8M rows and has a 'comments' column. The exporting program, which I am forced to use, does not surround this column with quotes, and the column contains CR/LF characters, which I must deal with (and have dealt with) before I can import the file via Copy.
Hence my suggestion: I was envisioning a parameter DELIMITER_COUNT which, if one were 100% confident that all columns are accounted for in the input file, could be used to alleviate the need to deal with CR/LFs in varchar and text columns. That is, if Copy loaded a line with fewer delimiters than DELIMITER_COUNT, the next line from the text file would be read and the assignment of columns would continue for the current row/column.
Just curious as to the thoughts out there.
Thanks to all for this excellent product, and a merry Christmas/holiday period to all.
Mark Watson
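The DELIMITER_COUNT behaviour proposed above can be approximated today with a short pre-processing pass before COPY: count delimiters on each physical line and, when a line comes up short, splice the next line onto the current row. A minimal sketch (the tab delimiter, the count of 4, and the `merge_rows` name are illustrative, not part of any posted solution); embedded line breaks are re-emitted as the `\n` escape that COPY's text format interprets as a newline:

```python
import sys

DELIMITER = "\t"      # the export is tab- or semicolon-delimited
DELIMITER_COUNT = 4   # hypothetical: a 5-column table has 4 delimiters per row

def merge_rows(lines, delim=DELIMITER, want=DELIMITER_COUNT):
    """Join physical lines until each logical row contains `want`
    delimiters, mirroring the proposed DELIMITER_COUNT behaviour."""
    buf = ""
    for line in lines:
        line = line.rstrip("\r\n")
        # Represent the embedded line break as a literal backslash-n,
        # which COPY's text format turns back into a newline on load.
        buf = line if not buf else buf + "\\n" + line
        if buf.count(delim) >= want:
            yield buf
            buf = ""
    if buf:
        yield buf  # trailing partial row, if any

if __name__ == "__main__":
    for row in merge_rows(sys.stdin):
        print(row)
```

Run as a filter (`merge.py < export.txt > fixed.txt`), the output loads with a plain COPY ... FROM in text format.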
On Friday 17 December 2010 7:46:12 am Mark Watson wrote:
> Hello all,
> Firstly, I apologise if this is not the correct list for this subject.
> Lately, I've been working on a data conversion, importing into Postgres
> using Copy From. The text file I'm copying from is produced from an ancient
> program and produces either a tab or semi-colon delimited file. One file
> contains about 1.8M rows and has a 'comments' column. The exporting
> program, which I am forced to use, does not surround this column with
> quotes and this column contains cr/lf characters, which I must deal with
> (and have dealt with) before I can import the file via Copy. Hence to my
> suggestion: I was envisioning a parameter DELIMITER_COUNT which, if one was
> 100% confident that all columns are accounted for in the input file, could
> be used to alleviate the need to deal with cr/lf's in varchar and text
> columns. i.e., if copy loaded a line with fewer delimiters than
> delimiter_count, the next line from the text file would be read and the
> assignment of columns would continue for the current row/column.
> Just curious as to the thoughts out there.
> Thanks to all for this excellent product, and a merry Christmas/holiday
> period to all.
>
> Mark Watson
A suggestion: give pgloader a look:
http://pgloader.projects.postgresql.org/
If I am following you, it might already have the solution to the multi-line problem. In particular, read the History section of the docs.
Thanks,
--
Adrian Klaver
adrian.klaver@gmail.com
Thanks, Adrian,
I'll try a Windows compile of pgloader sometime during the holidays. It's true that I already have a solution (export in chunks of <= 65000 rows, import into Excel, and re-export; Excel puts quotes around the text columns), but something faster and more efficient would really help in this case.
-Mark
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Adrian Klaver
Sent: 18 December 2010 18:05
To: pgsql-general@postgresql.org
Cc: Mark Watson
Subject: Re: [GENERAL] Copy From suggestion
On Friday 17 December 2010 7:46:12 am Mark Watson wrote:
> Hello all,
> Firstly, I apologise if this is not the correct list for this subject.
> Lately, I've been working on a data conversion, importing into Postgres
> using Copy From. The text file I'm copying from is produced from an ancient
> program and produces either a tab or semi-colon delimited file. One file
> contains about 1.8M rows and has a 'comments' column. The exporting
> program, which I am forced to use, does not surround this column with
> quotes and this column contains cr/lf characters, which I must deal with
> (and have dealt with) before I can import the file via Copy. Hence to my
> suggestion: I was envisioning a parameter DELIMITER_COUNT which, if one was
> 100% confident that all columns are accounted for in the input file, could
> be used to alleviate the need to deal with cr/lf's in varchar and text
> columns. i.e., if copy loaded a line with fewer delimiters than
> delimiter_count, the next line from the text file would be read and the
> assignment of columns would continue for the current row/column.
> Just curious as to the thoughts out there.
> Thanks to all for this excellent product, and a merry Christmas/holiday
> period to all.
>
> Mark Watson
A suggestion: give pgloader a look:
http://pgloader.projects.postgresql.org/
If I am following you, it might already have the solution to the multi-line
problem. In particular, read the History section of the docs.
Thanks,
--
Adrian Klaver
adrian.klaver@gmail.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Monday 20 December 2010, Jorge Godoy <jgodoy@gmail.com> wrote:
With OpenOffice.org that 65K limit goes away as well...
I don't know why it is still like that today for MS Office... It is almost 2011 and they still think 64K is enough? :-)
On Monday 20. December 2010 15.24.58 Jorge Godoy wrote:
> With OpenOffice.org that 65K limit goes away as well...
>
> I don't know why it is still like that today for MS Office... It is almost
> 2011 and they still think 64K is enough? :-)
Maybe there's an uncrippled «Professional» or «Enterprise» version costing an arm and a leg? ;)
regards,
Leif B. Kristensen
On Monday 20 December 2010 7:09:23 am Leif Biberg Kristensen wrote:
> On Monday 20. December 2010 15.24.58 Jorge Godoy wrote:
> > With OpenOffice.org that 65K limit goes away as well...
> >
> > I don't know why it is still like that today for MS Office... It is
> > almost 2011 and they still think 64K is enough? :-)
> Maybe there's an uncrippled «Professional» or «Enterprise» version
> costing an arm and a leg? ;)
> regards,
> Leif B. Kristensen
FYI with Office 2007 that limit went to a little over 1 million rows.
--
Adrian Klaver
adrian.klaver@gmail.com