Thread: Need help for import of text file
Hi, I need to import textfiles that have 5 columns but there is just blanks as delimitors. I could use COPY to read them but there is a time column that shows times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms" in the afternoon. Problem here is in the morning the first digit of the hour is shown as a blank so there are 2 blanks before the time so COPY misstakes this as an empty column and gets confused. Can someone point me in the direction of an COPY option I'm not aware of, or alternativly to some console tool that I can put in the batch before the import step and replace the 2 blanks with 1 blank. I use an OpenSuse server so some linux tool would do. regards Andreas
On 12/15/2012 11:06 AM, Andreas wrote: > Hi, > > I need to import textfiles that have 5 columns but there is just blanks > as delimitors. > I could use COPY to read them but there is a time column that shows > times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms" in the afternoon. > > Problem here is in the morning the first digit of the hour is shown as a > blank so there are 2 blanks before the time so COPY misstakes this as an > empty column and gets confused. > > Can someone point me in the direction of an COPY option I'm not aware > of, or alternativly to some console tool that I can put in the batch > before the import step and replace the 2 blanks with 1 blank. > > I use an OpenSuse server so some linux tool would do. How big a file are we talking about? I found using the OO/LibreOffice spreadsheet good for this, assuming a reasonable file size. If you use the CSV import you can make the columns where you want them and then save the file with another delimiter(I tend to use the pipe symbol |). Then use that file with COPY. > > > regards > Andreas > > -- Adrian Klaver adrian.klaver@gmail.com
On Sat, Dec 15, 2012 at 08:06:44PM +0100, Andreas wrote: > Hi, > > Problem here is in the morning the first digit of the hour is shown as a > blank so there are 2 blanks before the time so COPY misstakes this as an > empty column and gets confused. > > Can someone point me in the direction of an COPY option I'm not aware > of, or alternativly to some console tool that I can put in the batch > before the import step and replace the 2 blanks with 1 blank. > > I use an OpenSuse server so some linux tool would do. A simple sed(1) expression should do the trick: sed -E 's/ +/ /g' old-file > new-file GNU sed also allows in-place editing using -i, so you can avoid writing it to a second file. Some seds accept a different flag to enable extended regexps. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
On Sat, Dec 15, 2012 at 10:16:55PM +0100, Peter Bex wrote: > A simple sed(1) expression should do the trick: > > sed -E 's/ +/ /g' old-file > new-file I just remembered where I could check, and the GNU sed equivalent is: sed -r 's/ +/ /g' old-file > new-file Sorry for the confusion. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
Am 15.12.2012 22:22, schrieb Peter Bex:
On Sat, Dec 15, 2012 at 10:16:55PM +0100, Peter Bex wrote:A simple sed(1) expression should do the trick: sed -E 's/ +/ /g' old-file > new-fileI just remembered where I could check, and the GNU sed equivalent is: sed -r 's/ +/ /g' old-file > new-file Sorry for the confusion.
With sed as startingpoint I figured it out.
Those 3 steps make the input files consumable for COPY
- dos2unix
- sed -i 's/[ \t]*$//'
- sed -i 's/ / /g'
The input files get created by a simple windows batch where I can't change anything.
It uses echo to attach a line of 4 parameters to those textfiles.
How would you manage if one or more of those parameters contained blanks in some cases?
This doesn't appear, yet. But I consider this as luck. :}
The real column formats are ( TEXT, TEXT, DATE, TIME ).
On Sun, Dec 16, 2012 at 06:48:35PM +0100, Andreas wrote: > With sed as startingpoint I figured it out. > Those 3 steps make the input files consumable for COPY > > 1. dos2unix > 2. sed -i 's/[ \t]*$//' > 3. sed -i 's/ / /g' You can reduce this to one invocation by separating the commands by a semicolon (or by passing multiple -e flags) sed -i 's/[ \t]*$//;s/ / /g' > The input files get created by a simple windows batch where I can't > change anything. > It uses echo to attach a line of 4 parameters to those textfiles. > > How would you manage if one or more of those parameters contained blanks > in some cases? > This doesn't appear, yet. But I consider this as luck. :} > > The real column formats are ( TEXT, TEXT, DATE, TIME ). Well, that's a bit trickier and my sed skills are rather rusty. I'd probably use awk for these more complex tasks: awk '/\(.*\)/ { gsub(/ +/, " "); } { print $0 }' The "gsub" command acts like sed's "s" command with the "g" modifier. By prefixing the block with the gsub command with a regex, it only acts on that regex. The regex in this example only looks for an opening and a closing paren anywhere on the line; you might need to tweak it to more closely match your case. Alternatively, you could implement a counter that skips the four lines (which can be done with both sed and awk). If it gets more complex than this, you can always write a proper program in a "real" language to do it. This can be easier to maintain. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
On 12/16/2012 01:12 PM, Peter Bex wrote:
why not use the squeeze option of tr.On Sun, Dec 16, 2012 at 06:48:35PM +0100, Andreas wrote:With sed as startingpoint I figured it out. Those 3 steps make the input files consumable for COPY 1. dos2unix 2. sed -i 's/[ \t]*$//' 3. sed -i 's/ / /g'You can reduce this to one invocation by separating the commands by a semicolon (or by passing multiple -e flags) sed -i 's/[ \t]*$//;s/ / /g'The input files get created by a simple windows batch where I can't change anything. It uses echo to attach a line of 4 parameters to those textfiles. How would you manage if one or more of those parameters contained blanks in some cases? This doesn't appear, yet. But I consider this as luck. :} The real column formats are ( TEXT, TEXT, DATE, TIME ).Well, that's a bit trickier and my sed skills are rather rusty. I'd probably use awk for these more complex tasks: awk '/\(.*\)/ { gsub(/ +/, " "); } { print $0 }' The "gsub" command acts like sed's "s" command with the "g" modifier. By prefixing the block with the gsub command with a regex, it only acts on that regex. The regex in this example only looks for an opening and a closing paren anywhere on the line; you might need to tweak it to more closely match your case. Alternatively, you could implement a counter that skips the four lines (which can be done with both sed and awk). If it gets more complex than this, you can always write a proper program in a "real" language to do it. This can be easier to maintain. Cheers, Peter
tr -s " "
--
Stephen Clark
NetWolves
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
Stephen Clark
NetWolves
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
On Sun, Dec 16, 2012 at 06:30:24PM -0500, Steve Clark wrote: > why not use the squeeze option of tr. > > tr -s " " I wasn't aware of that one, it's even simpler and more elegant. Thanks! For this particular case, tr(1) won't do for the same reason the simple sed(1) expression I gave won't do: it should only be applied to the data, not the extra 4 lines of meta-data. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
On Dec 15, 2012, at 1:06 PM, Andreas wrote: > Hi, > > I need to import textfiles that have 5 columns but there is just blanks as delimitors. > I could use COPY to read them but there is a time column that shows times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms"in the afternoon. > > Problem here is in the morning the first digit of the hour is shown as a blank so there are 2 blanks before the time soCOPY misstakes this as an empty column and gets confused. > > Can someone point me in the direction of an COPY option I'm not aware of, or alternativly to some console tool that I canput in the batch before the import step and replace the 2 blanks with 1 blank. > > I use an OpenSuse server so some linux tool would do. > > > regards > Andreas > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general sed and awk are your friends. You might consider some text processing prior to import. I do this a lot because I work with external datasets that requireall kinds of massaging. For example: sed -e 's/^\s{2}/ /g' filename | psql DATABASE -c 'COPY table_name from STDIN' the above will replace 2 spaces appearing at the front of the file with one space, then pipe the result to psql copy commandthat expects input from STDIN. Hope that is helpful