Thread: Off topic? - Solution for a delimiter problem
I receive space delimited files that have spaces within double quoted text fields. Since PostgreSQL's COPY doesn't deal with multiple delimiters, I've written a crude Python script to convert the files to a tab delimited format without the double quotes and without disturbing the spaces within text fields. I'm sure someone could make it more efficient. (Probably by rewriting it in Perl.) Does anyone else need the script? Is the script something I can/should post? Andrew Gould __________________________________________________ Do You Yahoo!? Yahoo! Shopping - Thousands of Stores. Millions of Products. http://shopping.yahoo.com/
Maybe you can specify the delimiter to be space. radius=# \h copy Command: COPY Description: Copies data between files and tables Syntax: COPY [ BINARY ] table [ WITH OIDS ] FROM { 'filename' | stdin } [ [USING] DELIMITERS 'delimiter' ] [ WITH NULL AS 'null string' ] COPY [ BINARY ] table [ WITH OIDS ] TO { 'filename' | stdout } [ [USING] DELIMITERS 'delimiter' ] [ WITH NULL AS 'null string' ] That would probably be COPY table FROM filename USING DELIMITERS ' '; HTH, Poul L. Christiansen Andrew Gould wrote: > I receive space delimited files that have spaces > within double quoted text fields. > > Since PostgreSQL's COPY doesn't deal with multiple > delimiters, I've written a crude Python script to > convert the files to a tab delimited format without > the double quotes and without disturbing the spaces > within text fields. I'm sure someone could make it > more efficient. (Probably by rewriting it in Perl.) > > Does anyone else need the script? Is the script > something I can/should post? > > Andrew Gould > > __________________________________________________ > Do You Yahoo!? > Yahoo! Shopping - Thousands of Stores. Millions of Products. > http://shopping.yahoo.com/
If space was used as the delimiter without accounting for the quotation marks, text fields might be broken apart whenever a space occurred. For example, the street address "123 Sesame Street" might be split into 3 fields containing '"123', 'Sesame' and 'Street"' respectively. In the past, I imported the file into MySQL, then exported the data into a tab delimited text file for importing into PostgreSQL. The Python script saves many steps; so I was offering it up for use and improvement by others. I didn't know if I should post the script to the mailing list or send it directly to people who are interested. (It's only 1k.) Andrew Gould --- "Poul L. Christiansen" <poulc@cs.auc.dk> wrote: > Maybe you can specify the delimiter to be space. > > radius=# \h copy > Command: COPY > Description: Copies data between files and tables > Syntax: > COPY [ BINARY ] table [ WITH OIDS ] > FROM { 'filename' | stdin } > [ [USING] DELIMITERS 'delimiter' ] > [ WITH NULL AS 'null string' ] > COPY [ BINARY ] table [ WITH OIDS ] > TO { 'filename' | stdout } > [ [USING] DELIMITERS 'delimiter' ] > [ WITH NULL AS 'null string' ] > > That would probably be COPY table FROM filename > USING DELIMITERS ' '; > > HTH, > Poul L. Christiansen > > Andrew Gould wrote: > > > I receive space delimited files that have spaces > > within double quoted text fields. > > > > Since PostgreSQL's COPY doesn't deal with multiple > > delimiters, I've written a crude Python script to > > convert the files to a tab delimited format > without > > the double quotes and without disturbing the > spaces > > within text fields. I'm sure someone could make > it > > more efficient. (Probably by rewriting it in > Perl.) > > > > Does anyone else need the script? Is the script > > something I can/should post? > > > > Andrew Gould __________________________________________________ Do You Yahoo!? Yahoo! Shopping - Thousands of Stores. Millions of Products. http://shopping.yahoo.com/
Isn't this overkill? Something like: sed -e 's/\" \"/^I/g' -e 's/\"//g' <file> would do the job nicely. I'm sure people on this list can suggest even simpler ways (note: the "^I", or tab character, in the example is created by typing Ctrl-V then Ctrl-I). Sed has been (and is) my friend for many such cases. > -----Original Message----- > From: Andrew Gould [SMTP:andrewgould@yahoo.com] > Sent: Wednesday, December 20, 2000 5:58 PM > To: Postgres Mailing List > Subject: Re: [GENERAL] Off topic? - Solution for a delimiter problem > > If space was used as the delimiter without accounting > for the quotation marks, text fields might be broken > apart whenever a space occurred. For example, the > street address "123 Sesame Street" might be split into > 3 fields containing '"123', 'Sesame' and 'Street"' > respectively. In the past, I imported the file into > MySQL, then exported the data into a tab delimited > text file for importing into PostgreSQL. > > The Python script saves many steps; so I was offering > it up for use and improvement by others. I didn't > know if I should post the script to the mailing list > or send it directly to people who are interested. > (It's only 1k.) > > Andrew Gould > > --- "Poul L. Christiansen" <poulc@cs.auc.dk> wrote: > > Maybe you can specify the delimiter to be space. > > > > radius=# \h copy > > Command: COPY > > Description: Copies data between files and tables > > Syntax: > > COPY [ BINARY ] table [ WITH OIDS ] > > FROM { 'filename' | stdin } > > [ [USING] DELIMITERS 'delimiter' ] > > [ WITH NULL AS 'null string' ] > > COPY [ BINARY ] table [ WITH OIDS ] > > TO { 'filename' | stdout } > > [ [USING] DELIMITERS 'delimiter' ] > > [ WITH NULL AS 'null string' ] > > > > That would probably be COPY table FROM filename > > USING DELIMITERS ' '; > > > > HTH, > > Poul L. Christiansen > > > > Andrew Gould wrote: > > > > > I receive space delimited files that have spaces > > > within double quoted text fields. > > > > > > Since PostgreSQL's COPY doesn't deal with multiple > > > delimiters, I've written a crude Python script to > > > convert the files to a tab delimited format > > without > > > the double quotes and without disturbing the > > spaces > > > within text fields. I'm sure someone could make > > it > > > more efficient. (Probably by rewriting it in > > Perl.) > > > > > > Does anyone else need the script? Is the script > > > something I can/should post? > > > > > > Andrew Gould > > __________________________________________________ > Do You Yahoo!? > Yahoo! Shopping - Thousands of Stores. Millions of Products. > http://shopping.yahoo.com/
I think (I could be miss-reading the sed line) you're assuming that every field is a text field. I receive many files with both text and numeric fields. The numberic fields are not double quoted; so automating the pairing of double quotes as you have done becomes more complicated. The algorithm I used reads the file one character at a time. The detection of a double quotation mark changes the value of a variable by multiplying the variable's value by -1. The variable's value acts as an on/off switch that determines whether or not a space will be replaced by a tab. This algorithm allows for any combination of text and numeric fields. Thank you for the suggestion though. I am trying to learn more about regular expressions. Andrew Gould --- Jeff Eckermann <jeckermann@verio.net> wrote: > Isn't this overkill? > Something like: > sed -e 's/\" \"/^I/g' -e 's/\"//g' <file> > would do the job nicely. I'm sure people on this > list can suggest even > simpler ways (note: the "^I", or tab character, in > the example is created by > typing Ctrl-V then Ctrl-I). > Sed has been (and is) my friend for many such cases. > > > -----Original Message----- > > From: Andrew Gould [SMTP:andrewgould@yahoo.com] > > Sent: Wednesday, December 20, 2000 5:58 PM > > To: Postgres Mailing List > > Subject: Re: [GENERAL] Off topic? - Solution for a > delimiter problem > > > > If space was used as the delimiter without > accounting > > for the quotation marks, text fields might be > broken > > apart whenever a space occurred. For example, the > > street address "123 Sesame Street" might be split > into > > 3 fields containing '"123', 'Sesame' and 'Street"' > > respectively. In the past, I imported the file > into > > MySQL, then exported the data into a tab delimited > > text file for importing into PostgreSQL. > > > > The Python script saves many steps; so I was > offering > > it up for use and improvement by others. I didn't > > know if I should post the script to the mailing > list > > or send it directly to people who are interested. > > (It's only 1k.) > > > > Andrew Gould > > > > --- "Poul L. Christiansen" <poulc@cs.auc.dk> > wrote: > > > Maybe you can specify the delimiter to be space. > > > > > > radius=# \h copy > > > Command: COPY > > > Description: Copies data between files and > tables > > > Syntax: > > > COPY [ BINARY ] table [ WITH OIDS ] > > > FROM { 'filename' | stdin } > > > [ [USING] DELIMITERS 'delimiter' ] > > > [ WITH NULL AS 'null string' ] > > > COPY [ BINARY ] table [ WITH OIDS ] > > > TO { 'filename' | stdout } > > > [ [USING] DELIMITERS 'delimiter' ] > > > [ WITH NULL AS 'null string' ] > > > > > > That would probably be COPY table FROM filename > > > USING DELIMITERS ' '; > > > > > > HTH, > > > Poul L. Christiansen > > > > > > Andrew Gould wrote: > > > > > > > I receive space delimited files that have > spaces > > > > within double quoted text fields. > > > > > > > > Since PostgreSQL's COPY doesn't deal with > multiple > > > > delimiters, I've written a crude Python script > to > > > > convert the files to a tab delimited format > > > without > > > > the double quotes and without disturbing the > > > spaces > > > > within text fields. I'm sure someone could > make > > > it > > > > more efficient. (Probably by rewriting it in > > > Perl.) > > > > > > > > Does anyone else need the script? Is the > script > > > > something I can/should post? > > > > > > > > Andrew Gould > > > > __________________________________________________ > > Do You Yahoo!? > > Yahoo! Shopping - Thousands of Stores. Millions of > Products. > > http://shopping.yahoo.com/ __________________________________________________ Do You Yahoo!? Yahoo! Shopping - Thousands of Stores. Millions of Products. http://shopping.yahoo.com/