Thread: Need help for import of text file

Need help for import of text file

From

Andreas

Date:

15 December 2012, 22:05:53

Hi,

I need to import textfiles that have 5 columns but there is just blanks
as delimitors.
I could use COPY to read them but there is a time column that shows
times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms" in the afternoon.

Problem here is in the morning the first digit of the hour is shown as a
blank so there are 2 blanks before the time so COPY misstakes this as an
empty column and gets confused.

Can someone point me in the direction of an COPY option I'm not aware
of, or alternativly to some console tool that I can put in the batch
before the import step and replace the 2 blanks with 1 blank.

I use an OpenSuse server so some linux tool would do.


regards
Andreas

Re: Need help for import of text file

From

Adrian Klaver

Date:

15 December 2012, 22:21:45

On 12/15/2012 11:06 AM, Andreas wrote:
> Hi,
>
> I need to import textfiles that have 5 columns but there is just blanks
> as delimitors.
> I could use COPY to read them but there is a time column that shows
> times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms" in the afternoon.
>
> Problem here is in the morning the first digit of the hour is shown as a
> blank so there are 2 blanks before the time so COPY misstakes this as an
> empty column and gets confused.
>
> Can someone point me in the direction of an COPY option I'm not aware
> of, or alternativly to some console tool that I can put in the batch
> before the import step and replace the 2 blanks with 1 blank.
>
> I use an OpenSuse server so some linux tool would do.

How big a file are we talking about?
I found using the OO/LibreOffice spreadsheet good for this, assuming a
reasonable file size.

If you use the CSV import you can make the columns where you want them
and then save the file with another delimiter(I tend to use the pipe
symbol |). Then use that file with COPY.

>
>
> regards
> Andreas
>
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Need help for import of text file

From

Peter Bex

Date:

16 December 2012, 00:19:39

On Sat, Dec 15, 2012 at 08:06:44PM +0100, Andreas wrote:
> Hi,
>
> Problem here is in the morning the first digit of the hour is shown as a
> blank so there are 2 blanks before the time so COPY misstakes this as an
> empty column and gets confused.
>
> Can someone point me in the direction of an COPY option I'm not aware
> of, or alternativly to some console tool that I can put in the batch
> before the import step and replace the 2 blanks with 1 blank.
>
> I use an OpenSuse server so some linux tool would do.

A simple sed(1) expression should do the trick:

sed -E 's/ +/ /g' old-file > new-file

GNU sed also allows in-place editing using -i, so you can avoid
writing it to a second file.  Some seds accept a different flag
to enable extended regexps.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: Need help for import of text file

From

Peter Bex

Date:

16 December 2012, 00:25:01

On Sat, Dec 15, 2012 at 10:16:55PM +0100, Peter Bex wrote:
> A simple sed(1) expression should do the trick:
>
> sed -E 's/ +/ /g' old-file > new-file

I just remembered where I could check, and the GNU sed equivalent is:

sed -r 's/ +/ /g' old-file > new-file

Sorry for the confusion.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: Need help for import of text file

From

Andreas

Date:

16 December 2012, 20:48:04

Am 15.12.2012 22:22, schrieb Peter Bex:

On Sat, Dec 15, 2012 at 10:16:55PM +0100, Peter Bex wrote:

A simple sed(1) expression should do the trick:

sed -E 's/ +/ /g' old-file > new-file

I just remembered where I could check, and the GNU sed equivalent is:

sed -r 's/ +/ /g' old-file > new-file

Sorry for the confusion.

With sed as startingpoint I figured it out.
Those 3 steps make the input files consumable for COPY

dos2unix
sed -i 's/[ \t]*$//'
sed -i 's/ / /g'

The input files get created by a simple windows batch where I can't change anything.
It uses echo to attach a line of 4 parameters to those textfiles.

How would you manage if one or more of those parameters contained blanks in some cases?
This doesn't appear, yet. But I consider this as luck. :}

The real column formats are ( TEXT, TEXT, DATE, TIME ).

Re: Need help for import of text file

From

Peter Bex

Date:

16 December 2012, 21:15:08

On Sun, Dec 16, 2012 at 06:48:35PM +0100, Andreas wrote:
> With sed as startingpoint I figured it out.
> Those 3 steps make the input files consumable for COPY
>
> 1. dos2unix
> 2. sed -i 's/[ \t]*$//'
> 3. sed -i 's/  / /g'

You can reduce this to one invocation by separating the commands
by a semicolon (or by passing multiple -e flags)

sed -i 's/[ \t]*$//;s/  / /g'

> The input files get created by a simple windows batch where I can't
> change anything.
> It uses echo to attach a line of 4 parameters to those textfiles.
>
> How would you manage if one or more of those parameters contained blanks
> in some cases?
> This doesn't appear, yet. But I consider this as luck.   :}
>
> The real column formats are ( TEXT, TEXT, DATE, TIME ).

Well, that's a bit trickier and my sed skills are rather rusty.
I'd probably use awk for these more complex tasks:

awk '/\(.*\)/ { gsub(/ +/, " "); } { print $0 }'

The "gsub" command acts like sed's "s" command with the "g" modifier.
By prefixing the block with the gsub command with a regex, it only
acts on that regex.  The regex in this example only looks for an opening
and a closing paren anywhere on the line; you might need to tweak it
to more closely match your case.  Alternatively, you could implement
a counter that skips the four lines (which can be done with both sed
and awk).

If it gets more complex than this, you can always write a proper
program in a "real" language to do it.  This can be easier to maintain.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: Need help for import of text file

From

Steve Clark

Date:

17 December 2012, 02:30:32

On 12/16/2012 01:12 PM, Peter Bex wrote:

On Sun, Dec 16, 2012 at 06:48:35PM +0100, Andreas wrote:

With sed as startingpoint I figured it out.
Those 3 steps make the input files consumable for COPY

1. dos2unix
2. sed -i 's/[ \t]*$//'
3. sed -i 's/  / /g'

You can reduce this to one invocation by separating the commands
by a semicolon (or by passing multiple -e flags)

sed -i 's/[ \t]*$//;s/  / /g'

The input files get created by a simple windows batch where I can't 
change anything.
It uses echo to attach a line of 4 parameters to those textfiles.

How would you manage if one or more of those parameters contained blanks 
in some cases?
This doesn't appear, yet. But I consider this as luck.   :}

The real column formats are ( TEXT, TEXT, DATE, TIME ).

Well, that's a bit trickier and my sed skills are rather rusty.
I'd probably use awk for these more complex tasks:

awk '/\(.*\)/ { gsub(/ +/, " "); } { print $0 }'

The "gsub" command acts like sed's "s" command with the "g" modifier.
By prefixing the block with the gsub command with a regex, it only
acts on that regex.  The regex in this example only looks for an opening
and a closing paren anywhere on the line; you might need to tweak it
to more closely match your case.  Alternatively, you could implement
a counter that skips the four lines (which can be done with both sed
and awk).

If it gets more complex than this, you can always write a proper
program in a "real" language to do it.  This can be easier to maintain.

Cheers,
Peter

why not use the squeeze option of tr.

tr -s " "

--
Stephen Clark
NetWolves
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com

Re: Need help for import of text file

From

Peter Bex

Date:

17 December 2012, 02:36:03

On Sun, Dec 16, 2012 at 06:30:24PM -0500, Steve Clark wrote:
> why not use the squeeze option of tr.
>
> tr -s " "

I wasn't aware of that one, it's even simpler and more elegant.
Thanks!

For this particular case, tr(1) won't do for the same reason
the simple sed(1) expression I gave won't do: it should only
be applied to the data, not the extra 4 lines of meta-data.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: Need help for import of text file

From

Sheraz Sharif

Date:

17 December 2012, 07:51:39

On Dec 15, 2012, at 1:06 PM, Andreas wrote:

> Hi,
>
> I need to import textfiles that have 5 columns but there is just blanks as delimitors.
> I could use COPY to read them but there is a time column that shows times as " h:mm.ss,ms" in the morning and
"hh:mm.ss,ms"in the afternoon. 
>
> Problem here is in the morning the first digit of the hour is shown as a blank so there are 2 blanks before the time
soCOPY misstakes this as an empty column and gets confused. 
>
> Can someone point me in the direction of an COPY option I'm not aware of, or alternativly to some console tool that I
canput in the batch before the import step and replace the 2 blanks with 1 blank. 
>
> I use an OpenSuse server so some linux tool would do.
>
>
> regards
> Andreas
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

sed and awk are your friends.

You might consider some text processing prior to import. I do this a lot because I work with external datasets that
requireall kinds of massaging.  

For example:

sed -e 's/^\s{2}/ /g' filename | psql DATABASE -c 'COPY table_name from STDIN'

the above will replace 2 spaces appearing at the front of the file with one space, then pipe the result to psql copy
commandthat expects input from STDIN. 

Hope that is helpful