Thread: Importing lines of variable length from ASCII
Hi
I have an ASCII file that I try to import via the COPY command. The ASCII file is pipe delimited. Some lines in the file have no value in the last field and because there is no value in the last field, it also omits the last delimiter. It therefore seems that some lines have 11 columns while some other lines have 10 columns although these 10 columns correspond with the first 10 columns of lines that contain 11 columns. The omission of the last delimiter apparently is an accordance with the CSV files standard.
Apart from editing the ASCII files, is there a way to import these files that contain lines of variable length?
Louis
DISCLAIMER
This message is intended solely for the individual(s) and entity(s) to which it is addressed. It is confidential and may contain legally privileged information. Any unauthorized review, use, copying, storage, disclosure or distribution of this e-mail and any attachments is strictly prohibited. If you are not the named recipient or have otherwise received this communication in error, please destroy this message from your system and kindly notify the sender by e-mail.
In response to Louis Becker : > Hi > > > > I have an ASCII file that I try to import via the COPY command. The ASCII file > is pipe delimited. Some lines in the file have no value in the last field and > because there is no value in the last field, it also omits the last delimiter. > It therefore seems that some lines have 11 columns while some other lines have > 10 columns although these 10 columns correspond with the first 10 columns of > lines that contain 11 columns. The omission of the last delimiter apparently is > an accordance with the CSV files standard. > > > > Apart from editing the ASCII files, is there a way to import these files that > contain lines of variable length? Pipe your file through a filter like my example below: kretschmer@tux:~$ cat file.txt foo|bar|1|batz bar|bla|2 this|is|3|line kretschmer@tux:~$ cat file.txt | awk '{cols=split($0,a,"|"); if (cols==4) print $0; else printf ("%s|\n",$0)}' foo|bar|1|batz bar|bla|2| this|is|3|line kretschmer@tux:~$ Regards, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
On 26 February 2010 00:52, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote: > kretschmer@tux:~$ cat file.txt | awk '{cols=split($0,a,"|"); if (cols==4) print $0; else printf ("%s|\n",$0)}' Slightly shorter variant: awk -F'|' '{if(NF<4){print $0"|"}else{print $0}}' file.txt > Regards, Andreas Cheers, Andrej -- Please don't top post, and don't use HTML e-Mail :} Make your quotes concise. http://www.american.edu/econ/notes/htmlmail.htm
In response to Andrej : > On 26 February 2010 00:52, A. Kretschmer > <andreas.kretschmer@schollglas.com> wrote: > > kretschmer@tux:~$ cat file.txt | awk '{cols=split($0,a,"|"); if (cols==4) print $0; else printf ("%s|\n",$0)}' > > Slightly shorter variant: > awk -F'|' '{if(NF<4){print $0"|"}else{print $0}}' file.txt Morning, Hey, yeah, thx. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
Hi Andre Luckily have installed cygwin on my WindowsXP box, so I should be able to execute your suggested command. Is there not a way for PostGres to absorb this? I prefer not to pre-process all the time, as this would be a daily task. If I am just able to adjust the copy query, that would be preferred. Very few of my colleagues are Unix/Linux capable and most are command line phobic. If I am able to save the query/command that would be much easier. Louis -----Original Message----- From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Andrej Sent: 25 February 2010 22:36 To: A. Kretschmer Cc: pgsql-novice@postgresql.org Subject: Re: [NOVICE] Importing lines of variable length from ASCII On 26 February 2010 00:52, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote: > kretschmer@tux:~$ cat file.txt | awk '{cols=split($0,a,"|"); if (cols==4) print $0; else printf ("%s|\n",$0)}' Slightly shorter variant: awk -F'|' '{if(NF<4){print $0"|"}else{print $0}}' file.txt > Regards, Andreas Cheers, Andrej -- Please don't top post, and don't use HTML e-Mail :} Make your quotes concise. http://www.american.edu/econ/notes/htmlmail.htm -- Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-novice DISCLAIMER This message is intended solely for the individual(s) and entity(s) to which it is addressed. It is confidential and maycontain legally privileged information. Any unauthorized review, use, copying, storage, disclosure or distribution of this e-mail and any attachments is strictlyprohibited. If you are not the named recipient or have otherwise received this communication in error, please destroy this message from your system and kindlynotify the sender by e-mail.
In response to Louis Becker : > Hi Andre > > Luckily have installed cygwin on my WindowsXP box, so I should be able > to execute your suggested command. Is there not a way for PostGres to > absorb this? I prefer not to pre-process all the time, as this would be Not that i know of. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
On 26 February 2010 09:06, Louis Becker <Louis.Becker@leo.na> wrote: > Hi Andre > > Luckily have installed cygwin on my WindowsXP box, so I should be able > to execute your suggested command. Is there not a way for PostGres to > absorb this? I prefer not to pre-process all the time, as this would be > a daily task. If I am just able to adjust the copy query, that would be > preferred. Very few of my colleagues are Unix/Linux capable and most are > command line phobic. If I am able to save the query/command that would > be much easier. > > Louis > It appears that you will either have to fix whatever is generating the file to have non-variable columns, or import the whole file into a table in PostgreSQL without delimiters, and process the table to insert into another table, sort of like this: CREATE TABLE initial_file_import ( data text NOT NULL ); /* Do your import into initial_file_import here but for now, we'll just do some inserts */ INSERT INTO initial_file_import VALUES ('stuff|4|more stuff|2'),('things|2|yay'),('hello|9'); CREATE TABLE file_import ( field1 text NOT NULL, field2 int NOT NULL, field3 text NULL, field4 int NULL ); CREATE FUNCTION split_file_records() RETURNS VOID AS $$ DECLARE importline TEXT; BEGIN FOR importline IN SELECT data FROM initial_file_import LOOP INSERT INTO file_import VALUES ( split_part(importline, '|', 1), split_part(importline, '|', 2)::int, split_part(importline, '|', 3), ('0'||split_part(importline, '|', 4))::int); END LOOP; END; $$ LANGUAGE plpgsql; SELECT split_file_records(); SELECT * FROM file_import; Note that I prefixed the numeric field which can be absent with a 0 to prevent issues with blank strings casting to integers. Thom