Thread: Off topic? - Solution for a delimiter problem

Off topic? - Solution for a delimiter problem

From
Andrew Gould
Date:
I receive space delimited files that have spaces
within double quoted text fields.

Since PostgreSQL's COPY doesn't deal with multiple
delimiters, I've written a crude Python script to
convert the files to a tab delimited format without
the double quotes and without disturbing the spaces
within text fields.  I'm sure someone could make it
more efficient.  (Probably by rewriting it in Perl.)

Does anyone else need the script?  Is the script
something I can/should post?

Andrew Gould


__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/

Re: Off topic? - Solution for a delimiter problem

From
"Poul L. Christiansen"
Date:
Maybe you can specify the delimiter to be space.

radius=# \h copy
Command:     COPY
Description: Copies data between files and tables
Syntax:
COPY [ BINARY ] table [ WITH OIDS ]
    FROM { 'filename' | stdin }
    [ [USING] DELIMITERS 'delimiter' ]
    [ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
    TO { 'filename' | stdout }
    [ [USING] DELIMITERS 'delimiter' ]
    [ WITH NULL AS 'null string' ]

That would probably be COPY table FROM filename USING DELIMITERS ' ';

HTH,
Poul L. Christiansen

Andrew Gould wrote:

> I receive space delimited files that have spaces
> within double quoted text fields.
>
> Since PostgreSQL's COPY doesn't deal with multiple
> delimiters, I've written a crude Python script to
> convert the files to a tab delimited format without
> the double quotes and without disturbing the spaces
> within text fields.  I'm sure someone could make it
> more efficient.  (Probably by rewriting it in Perl.)
>
> Does anyone else need the script?  Is the script
> something I can/should post?
>
> Andrew Gould
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! Shopping - Thousands of Stores. Millions of Products.
> http://shopping.yahoo.com/


Re: Off topic? - Solution for a delimiter problem

From
Andrew Gould
Date:
If space was used as the delimiter without accounting
for the quotation marks, text fields might be broken
apart whenever a space occurred.  For example, the
street address "123 Sesame Street" might be split into
3 fields containing '"123', 'Sesame' and 'Street"'
respectively.  In the past, I imported the file into
MySQL, then exported the data into a tab delimited
text file for importing into PostgreSQL.

The Python script saves many steps; so I was offering
it up for use and improvement by others.  I didn't
know if I should post the script to the mailing list
or send it directly to people who are interested.
(It's only 1k.)

Andrew Gould

--- "Poul L. Christiansen" <poulc@cs.auc.dk> wrote:
> Maybe you can specify the delimiter to be space.
>
> radius=# \h copy
> Command:     COPY
> Description: Copies data between files and tables
> Syntax:
> COPY [ BINARY ] table [ WITH OIDS ]
>     FROM { 'filename' | stdin }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
> COPY [ BINARY ] table [ WITH OIDS ]
>     TO { 'filename' | stdout }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
>
> That would probably be COPY table FROM filename
> USING DELIMITERS ' ';
>
> HTH,
> Poul L. Christiansen
>
> Andrew Gould wrote:
>
> > I receive space delimited files that have spaces
> > within double quoted text fields.
> >
> > Since PostgreSQL's COPY doesn't deal with multiple
> > delimiters, I've written a crude Python script to
> > convert the files to a tab delimited format
> without
> > the double quotes and without disturbing the
> spaces
> > within text fields.  I'm sure someone could make
> it
> > more efficient.  (Probably by rewriting it in
> Perl.)
> >
> > Does anyone else need the script?  Is the script
> > something I can/should post?
> >
> > Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/

RE: Off topic? - Solution for a delimiter problem

From
Jeff Eckermann
Date:
Isn't this overkill?
Something like:
    sed -e 's/\" \"/^I/g' -e 's/\"//g' <file>
would do the job nicely.  I'm sure people on this list can suggest even
simpler ways (note: the "^I", or tab character, in the example is created by
typing Ctrl-V then Ctrl-I).
Sed has been (and is) my friend for many such cases.

> -----Original Message-----
> From:    Andrew Gould [SMTP:andrewgould@yahoo.com]
> Sent:    Wednesday, December 20, 2000 5:58 PM
> To:    Postgres Mailing List
> Subject:    Re: [GENERAL] Off topic? - Solution for a delimiter problem
>
> If space was used as the delimiter without accounting
> for the quotation marks, text fields might be broken
> apart whenever a space occurred.  For example, the
> street address "123 Sesame Street" might be split into
> 3 fields containing '"123', 'Sesame' and 'Street"'
> respectively.  In the past, I imported the file into
> MySQL, then exported the data into a tab delimited
> text file for importing into PostgreSQL.
>
> The Python script saves many steps; so I was offering
> it up for use and improvement by others.  I didn't
> know if I should post the script to the mailing list
> or send it directly to people who are interested.
> (It's only 1k.)
>
> Andrew Gould
>
> --- "Poul L. Christiansen" <poulc@cs.auc.dk> wrote:
> > Maybe you can specify the delimiter to be space.
> >
> > radius=# \h copy
> > Command:     COPY
> > Description: Copies data between files and tables
> > Syntax:
> > COPY [ BINARY ] table [ WITH OIDS ]
> >     FROM { 'filename' | stdin }
> >     [ [USING] DELIMITERS 'delimiter' ]
> >     [ WITH NULL AS 'null string' ]
> > COPY [ BINARY ] table [ WITH OIDS ]
> >     TO { 'filename' | stdout }
> >     [ [USING] DELIMITERS 'delimiter' ]
> >     [ WITH NULL AS 'null string' ]
> >
> > That would probably be COPY table FROM filename
> > USING DELIMITERS ' ';
> >
> > HTH,
> > Poul L. Christiansen
> >
> > Andrew Gould wrote:
> >
> > > I receive space delimited files that have spaces
> > > within double quoted text fields.
> > >
> > > Since PostgreSQL's COPY doesn't deal with multiple
> > > delimiters, I've written a crude Python script to
> > > convert the files to a tab delimited format
> > without
> > > the double quotes and without disturbing the
> > spaces
> > > within text fields.  I'm sure someone could make
> > it
> > > more efficient.  (Probably by rewriting it in
> > Perl.)
> > >
> > > Does anyone else need the script?  Is the script
> > > something I can/should post?
> > >
> > > Andrew Gould
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! Shopping - Thousands of Stores. Millions of Products.
> http://shopping.yahoo.com/

RE: Off topic? - Solution for a delimiter problem

From
Andrew Gould
Date:
I think (I could be miss-reading the sed line) you're
assuming that every field is a text field.  I receive
many files with both text and numeric fields.  The
numberic fields are not double quoted; so automating
the pairing of double quotes as you have done becomes
more complicated.

The algorithm I used reads the file one character at a
time.  The detection of a double quotation mark
changes the value of a variable by multiplying the
variable's value by -1.  The variable's value acts as
an on/off switch that determines whether or not a
space will be replaced by a tab.  This algorithm
allows for any combination of text and numeric fields.

Thank you for the suggestion though.  I am trying to
learn more about regular expressions.

Andrew Gould

--- Jeff Eckermann <jeckermann@verio.net> wrote:
> Isn't this overkill?
> Something like:
>     sed -e 's/\" \"/^I/g' -e 's/\"//g' <file>
> would do the job nicely.  I'm sure people on this
> list can suggest even
> simpler ways (note: the "^I", or tab character, in
> the example is created by
> typing Ctrl-V then Ctrl-I).
> Sed has been (and is) my friend for many such cases.
>
> > -----Original Message-----
> > From:    Andrew Gould [SMTP:andrewgould@yahoo.com]
> > Sent:    Wednesday, December 20, 2000 5:58 PM
> > To:    Postgres Mailing List
> > Subject:    Re: [GENERAL] Off topic? - Solution for a
> delimiter problem
> >
> > If space was used as the delimiter without
> accounting
> > for the quotation marks, text fields might be
> broken
> > apart whenever a space occurred.  For example, the
> > street address "123 Sesame Street" might be split
> into
> > 3 fields containing '"123', 'Sesame' and 'Street"'
> > respectively.  In the past, I imported the file
> into
> > MySQL, then exported the data into a tab delimited
> > text file for importing into PostgreSQL.
> >
> > The Python script saves many steps; so I was
> offering
> > it up for use and improvement by others.  I didn't
> > know if I should post the script to the mailing
> list
> > or send it directly to people who are interested.
> > (It's only 1k.)
> >
> > Andrew Gould
> >
> > --- "Poul L. Christiansen" <poulc@cs.auc.dk>
> wrote:
> > > Maybe you can specify the delimiter to be space.
> > >
> > > radius=# \h copy
> > > Command:     COPY
> > > Description: Copies data between files and
> tables
> > > Syntax:
> > > COPY [ BINARY ] table [ WITH OIDS ]
> > >     FROM { 'filename' | stdin }
> > >     [ [USING] DELIMITERS 'delimiter' ]
> > >     [ WITH NULL AS 'null string' ]
> > > COPY [ BINARY ] table [ WITH OIDS ]
> > >     TO { 'filename' | stdout }
> > >     [ [USING] DELIMITERS 'delimiter' ]
> > >     [ WITH NULL AS 'null string' ]
> > >
> > > That would probably be COPY table FROM filename
> > > USING DELIMITERS ' ';
> > >
> > > HTH,
> > > Poul L. Christiansen
> > >
> > > Andrew Gould wrote:
> > >
> > > > I receive space delimited files that have
> spaces
> > > > within double quoted text fields.
> > > >
> > > > Since PostgreSQL's COPY doesn't deal with
> multiple
> > > > delimiters, I've written a crude Python script
> to
> > > > convert the files to a tab delimited format
> > > without
> > > > the double quotes and without disturbing the
> > > spaces
> > > > within text fields.  I'm sure someone could
> make
> > > it
> > > > more efficient.  (Probably by rewriting it in
> > > Perl.)
> > > >
> > > > Does anyone else need the script?  Is the
> script
> > > > something I can/should post?
> > > >
> > > > Andrew Gould
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! Shopping - Thousands of Stores. Millions of
> Products.
> > http://shopping.yahoo.com/


__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/