Thread: Importing lines of variable length from ASCII

Importing lines of variable length from ASCII

From

"Louis Becker"

Date:

25 February 2010, 10:17:48

I have an ASCII file that I try to import via the COPY command. The ASCII file is pipe delimited. Some lines in the file have no value in the last field and because there is no value in the last field, it also omits the last delimiter. It therefore seems that some lines have 11 columns while some other lines have 10 columns although these 10 columns correspond with the first 10 columns of lines that contain 11 columns. The omission of the last delimiter apparently is an accordance with the CSV files standard.

Apart from editing the ASCII files, is there a way to import these files that contain lines of variable length?

Louis

DISCLAIMER

This message is intended solely for the individual(s) and entity(s) to which it is addressed. It is confidential and may contain legally privileged information. Any unauthorized review, use, copying, storage, disclosure or distribution of this e-mail and any attachments is strictly prohibited. If you are not the named recipient or have otherwise received this communication in error, please destroy this message from your system and kindly notify the sender by e-mail.

Re: Importing lines of variable length from ASCII

From

"A. Kretschmer"

Date:

25 February 2010, 10:52:34

In response to Louis Becker :
> Hi
>
>
>
> I have an ASCII file that I try to import via the COPY command. The ASCII file
> is pipe delimited. Some lines in the file have no value in the last field and
> because there is no value in the last field, it also omits the last delimiter.
> It therefore seems that some lines have 11 columns while some other lines have
> 10 columns although these 10 columns correspond with the first 10 columns of
> lines that contain 11 columns. The omission of the last delimiter apparently is
> an accordance with the CSV files standard.
>
>
>
> Apart from editing the ASCII files, is there a way to import these files that
> contain lines of variable length?

Pipe your file through a filter like my example below:

kretschmer@tux:~$ cat file.txt
foo|bar|1|batz
bar|bla|2
this|is|3|line
kretschmer@tux:~$ cat file.txt | awk  '{cols=split($0,a,"|"); if (cols==4) print $0; else printf ("%s|\n",$0)}'
foo|bar|1|batz
bar|bla|2|
this|is|3|line
kretschmer@tux:~$


Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: Importing lines of variable length from ASCII

From

Andrej

Date:

25 February 2010, 19:36:12

On 26 February 2010 00:52, A. Kretschmer
<andreas.kretschmer@schollglas.com> wrote:
> kretschmer@tux:~$ cat file.txt | awk  '{cols=split($0,a,"|"); if (cols==4) print $0; else printf ("%s|\n",$0)}'

Slightly shorter variant:
awk -F'|' '{if(NF<4){print $0"|"}else{print $0}}' file.txt



> Regards, Andreas

Cheers,
Andrej


--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

Re: Importing lines of variable length from ASCII

From

"A. Kretschmer"

Date:

26 February 2010, 04:51:40

In response to Andrej :
> On 26 February 2010 00:52, A. Kretschmer
> <andreas.kretschmer@schollglas.com> wrote:
> > kretschmer@tux:~$ cat file.txt | awk  '{cols=split($0,a,"|"); if (cols==4) print $0; else printf ("%s|\n",$0)}'
>
> Slightly shorter variant:
> awk -F'|' '{if(NF<4){print $0"|"}else{print $0}}' file.txt

Morning,

Hey, yeah, thx.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: Importing lines of variable length from ASCII

From

"Louis Becker"

Date:

26 February 2010, 08:06:52

Hi Andre

Luckily have installed cygwin on my WindowsXP box, so I should be able
to execute your suggested command. Is there not a way for PostGres to
absorb this? I prefer not to pre-process all the time, as this would be
a daily task. If I am just able to adjust the copy query, that would be
preferred. Very few of my colleagues are Unix/Linux capable and most are
command line phobic. If I am able to save the query/command that would
be much easier.

Louis

-----Original Message-----
From: pgsql-novice-owner@postgresql.org
[mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Andrej
Sent: 25 February 2010 22:36
To: A. Kretschmer
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Importing lines of variable length from ASCII

On 26 February 2010 00:52, A. Kretschmer
<andreas.kretschmer@schollglas.com> wrote:
> kretschmer@tux:~$ cat file.txt | awk  '{cols=split($0,a,"|"); if
(cols==4) print $0; else printf ("%s|\n",$0)}'

Slightly shorter variant:
awk -F'|' '{if(NF<4){print $0"|"}else{print $0}}' file.txt

> Regards, Andreas

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes
concise.

http://www.american.edu/econ/notes/htmlmail.htm

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

DISCLAIMER
This message is intended solely for the individual(s) and entity(s) to which it is addressed.  It is confidential and
maycontain legally privileged information.   
Any unauthorized review, use, copying, storage, disclosure or distribution of this e-mail and any attachments is
strictlyprohibited.  If you are not the named  
recipient or have otherwise received this communication in error, please destroy this message from your system and
kindlynotify the sender by e-mail.

Re: Importing lines of variable length from ASCII

From

"A. Kretschmer"

Date:

26 February 2010, 08:39:57

In response to Louis Becker :
> Hi Andre
>
> Luckily have installed cygwin on my WindowsXP box, so I should be able
> to execute your suggested command. Is there not a way for PostGres to
> absorb this? I prefer not to pre-process all the time, as this would be

Not that i know of.

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: Importing lines of variable length from ASCII

From

Thom Brown

Date:

26 February 2010, 09:05:11

On 26 February 2010 09:06, Louis Becker <Louis.Becker@leo.na> wrote:
> Hi Andre
>
> Luckily have installed cygwin on my WindowsXP box, so I should be able
> to execute your suggested command. Is there not a way for PostGres to
> absorb this? I prefer not to pre-process all the time, as this would be
> a daily task. If I am just able to adjust the copy query, that would be
> preferred. Very few of my colleagues are Unix/Linux capable and most are
> command line phobic. If I am able to save the query/command that would
> be much easier.
>
> Louis
>

It appears that you will either have to fix whatever is generating the
file to have non-variable columns, or import the whole file into a
table in PostgreSQL without delimiters, and process the table to
insert into another table, sort of like this:

CREATE TABLE initial_file_import
(
    data text NOT NULL
);

/* Do your import into initial_file_import here but for now, we'll
just do some inserts */
INSERT INTO initial_file_import VALUES ('stuff|4|more
stuff|2'),('things|2|yay'),('hello|9');

CREATE TABLE file_import
(
    field1 text NOT NULL,
    field2 int NOT NULL,
    field3 text NULL,
    field4 int NULL
);

CREATE FUNCTION split_file_records() RETURNS VOID AS $$
DECLARE importline TEXT;
BEGIN
    FOR importline IN SELECT data FROM initial_file_import LOOP
        INSERT INTO file_import VALUES (
            split_part(importline, '|', 1),
            split_part(importline, '|', 2)::int,
            split_part(importline, '|', 3),
            ('0'||split_part(importline, '|', 4))::int);
END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT split_file_records();

SELECT * FROM file_import;

Note that I prefixed the numeric field which can be absent with a 0 to
prevent issues with blank strings casting to integers.

Thom