Thread: Copy From suggestion
Hello all,
Firstly, I apologise if this is not the correct list for this subject.
Lately, I've been working on a data conversion, importing into Postgres using Copy From. The text file I'm copying from is produced by an ancient program and is either tab- or semicolon-delimited. One file contains about 1.8M rows and has a 'comments' column. The exporting program, which I am forced to use, does not surround this column with quotes, and the column contains CR/LF characters, which I must deal with (and have dealt with) before I can import the file via Copy.
Hence my suggestion: I was envisioning a parameter DELIMITER_COUNT which, if one were 100% confident that all columns are accounted for in the input file, could be used to alleviate the need to deal with CR/LFs in varchar and text columns. That is, if Copy loaded a line with fewer delimiters than DELIMITER_COUNT, the next line from the text file would be read and the assignment of columns would continue for the current row/column.
Just curious as to the thoughts out there.
Thanks to all for this excellent product, and a merry Christmas/holiday period to all.
Mark Watson
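The DELIMITER_COUNT behaviour proposed above can be approximated today with a short pre-processing pass before COPY: count delimiters on each physical line and, when a line comes up short, splice the next line onto the current row. A minimal sketch (the tab delimiter, the count of 4, and the `merge_rows` name are illustrative, not part of any posted solution); embedded line breaks are re-emitted as the `\n` escape that COPY's text format interprets as a newline:

```python
import sys

DELIMITER = "\t"      # the export is tab- or semicolon-delimited
DELIMITER_COUNT = 4   # hypothetical: a 5-column table has 4 delimiters per row

def merge_rows(lines, delim=DELIMITER, want=DELIMITER_COUNT):
    """Join physical lines until each logical row contains `want`
    delimiters, mirroring the proposed DELIMITER_COUNT behaviour."""
    buf = ""
    for line in lines:
        line = line.rstrip("\r\n")
        # Represent the embedded line break as a literal backslash-n,
        # which COPY's text format turns back into a newline on load.
        buf = line if not buf else buf + "\\n" + line
        if buf.count(delim) >= want:
            yield buf
            buf = ""
    if buf:
        yield buf  # trailing partial row, if any

if __name__ == "__main__":
    for row in merge_rows(sys.stdin):
        print(row)
```

Run as a filter (`merge.py < export.txt > fixed.txt`), the output loads with a plain COPY ... FROM in text format.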
On Friday 17 December 2010 7:46:12 am Mark Watson wrote:
> Hello all,
> Firstly, I apologise if this is not the correct list for this subject.
> Lately, I've been working on a data conversion, importing into Postgres
> using Copy From. The text file I'm copying from is produced from an ancient
> program and produces either a tab or semi-colon delimited file. One file
> contains about 1.8M rows and has a 'comments' column. The exporting
> program, which I am forced to use, does not surround this column with
> quotes and this column contains cr/lf characters, which I must deal with
> (and have dealt with) before I can import the file via Copy. Hence to my
> suggestion: I was envisioning a parameter DELIMITER_COUNT which, if one was
> 100% confident that all columns are accounted for in the input file, could
> be used to alleviate the need to deal with cr/lf's in varchar and text
> columns. i.e., if copy loaded a line with fewer delimiters than
> delimiter_count, the next line from the text file would be read and the
> assignment of columns would continue for the current row/column.
> Just curious as to the thoughts out there.
> Thanks to all for this excellent product, and a merry Christmas/holiday
> period to all.
>
> Mark Watson
A suggestion: give pgloader a look:
http://pgloader.projects.postgresql.org/
If I am following you, it might already have the solution to the multi-line problem. In particular, read the History section of the docs.
Thanks,
--
Adrian Klaver
adrian.klaver@gmail.com
Thanks, Adrian,
I'll try a Windows compile of pgloader sometime during the holidays. It's true that I already have a solution (export in chunks of <= 65000 rows, import into Excel, and re-export; Excel puts quotes around the text columns), but something faster and more efficient would really help in this case.
-Mark
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On behalf of Adrian Klaver
Sent: 18 December 2010 18:05
To: pgsql-general@postgresql.org
Cc: Mark Watson
Subject: Re: [GENERAL] Copy From suggestion
On Friday 17 December 2010 7:46:12 am Mark Watson wrote:
> Hello all,
> Firstly, I apologise if this is not the correct list for this subject.
> Lately, I've been working on a data conversion, importing into Postgres
> using Copy From. The text file I'm copying from is produced from an ancient
> program and produces either a tab or semi-colon delimited file. One file
> contains about 1.8M rows and has a 'comments' column. The exporting
> program, which I am forced to use, does not surround this column with
> quotes and this column contains cr/lf characters, which I must deal with
> (and have dealt with) before I can import the file via Copy. Hence to my
> suggestion: I was envisioning a parameter DELIMITER_COUNT which, if one was
> 100% confident that all columns are accounted for in the input file, could
> be used to alleviate the need to deal with cr/lf's in varchar and text
> columns. i.e., if copy loaded a line with fewer delimiters than
> delimiter_count, the next line from the text file would be read and the
> assignment of columns would continue for the current row/column.
> Just curious as to the thoughts out there.
> Thanks to all for this excellent product, and a merry Christmas/holiday
> period to all.
>
> Mark Watson
A suggestion: give pgloader a look:
http://pgloader.projects.postgresql.org/
If I am following you, it might already have the solution to the multi-line
problem. In particular, read the History section of the docs.
Thanks,
--
Adrian Klaver
adrian.klaver@gmail.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Monday 20 December 2010, Jorge Godoy <jgodoy@gmail.com> wrote:
With OpenOffice.org that 65K limit goes away as well...
I don't know why it is still like that today for MS Office... It is almost 2011 and they still think 64K is enough? :-)
On Monday 20. December 2010 15.24.58 Jorge Godoy wrote:
> With OpenOffice.org that 65K limit goes away as well...
>
> I don't know why it is still like that today for MS Office... It is almost
> 2011 and they still think 64K is enough? :-)
Maybe there's an uncrippled «Professional» or «Enterprise» version costing an arm and a leg? ;)
regards,
Leif B. Kristensen
On Monday 20 December 2010 7:09:23 am Leif Biberg Kristensen wrote:
> On Monday 20. December 2010 15.24.58 Jorge Godoy wrote:
> > With OpenOffice.org that 65K limit goes away as well...
> >
> > I don't know why it is still like that today for MS Office... It is
> > almost 2011 and they still think 64K is enough? :-)
> Maybe there's an uncrippled «Professional» or «Enterprise» version
> costing an arm and a leg? ;)
> regards,
> Leif B. Kristensen
FYI with Office 2007 that limit went to a little over 1 million rows.
--
Adrian Klaver
adrian.klaver@gmail.com