Thread: truncate data before importing
I am importing sentences from txt file. They look like:
"0,170 A recent statistical analysis by David Barton graphically illustrates how America has plummeted from righteous living , prosperity and success in the last quarter century .
171,257 Consider the following chart compiled from his study , America : To Pray or Not to Pray . "
Each Sentence starts with coordinates and each word is delimited with tab. I want to import data to tables without coordinates, just text and if possible to convert tab delimited space with just 'space', not to have such a gap between words. Any solutions how to do it? maybe with shell script ?
On Wed, Nov 18, 2015 at 01:49:35PM +0100, Shmagi Kavtaradze wrote: > I am importing sentences from txt file. They look like: > "0,170 A recent statistical analysis by David > Barton graphically illustrates how America has > plummeted from righteous living , prosperity > and success in the last quarter century > . > Each Sentence starts with coordinates and each word is delimited with tab. > I want to import data to tables without coordinates, just text and if > possible to convert tab delimited space with just 'space', not to have > such a gap between words. Any solutions how to do it? maybe with shell > script? You can use the 'PROGRAM' in COPY syntax http://www.postgresql.org/docs/current/static/sql-copy.html -- DROP TABLE IF EXISTS Sentence; CREATE TABLE IF NOT EXISTS Sentence (s text); COPY Sentence FROM PROGRAM 'sed -re ''s/\t/ /g; s/^\S+\s+//'' file.txt' WITH (FORMAT text, NULL ''); Take care of escape sequences – backslashes in the file. If the file is on the client side see the \copy command of psql client instaed. http://www.postgresql.org/docs/9.4/static/app-psql.html#APP-PSQL-META-COMMANDS-COPY https://www.gnu.org/software/sed/manual/sed.html H.
I was not able to find any details about "\S+\s+ ", can u explain them? Also thanks a lot, this worked perfectly!
On Wed, Nov 18, 2015 at 4:36 PM, Hans Ginzel <hans@matfyz.cz> wrote:
On Wed, Nov 18, 2015 at 01:49:35PM +0100, Shmagi Kavtaradze wrote:I am importing sentences from txt file. They look like:
"0,170 A recent statistical analysis by David
Barton graphically illustrates how America has
plummeted from righteous living , prosperity
and success in the last quarter century
.
Each Sentence starts with coordinates and each word is delimited with tab.
I want to import data to tables without coordinates, just text and if
possible to convert tab delimited space with just 'space', not to have
such a gap between words. Any solutions how to do it? maybe with shell
script?
You can use the 'PROGRAM' in COPY syntax
http://www.postgresql.org/docs/current/static/sql-copy.html
-- DROP TABLE IF EXISTS Sentence;
CREATE TABLE IF NOT EXISTS Sentence (s text);
COPY Sentence
FROM PROGRAM 'sed -re ''s/\t/ /g; s/^\S+\s+//'' file.txt'
WITH (FORMAT text, NULL '');
Take care of escape sequences – backslashes in the file.
If the file is on the client side see the \copy command of psql client instaed.
http://www.postgresql.org/docs/9.4/static/app-psql.html#APP-PSQL-META-COMMANDS-COPY
https://www.gnu.org/software/sed/manual/sed.html
H.
2015-11-18 16:08 GMT-02:00 Shmagi Kavtaradze <kavtaradze.s@gmail.com>:
I was not able to find any details about "\S+\s+ ", can u explain them? Also thanks a lot, this worked perfectly!On Wed, Nov 18, 2015 at 4:36 PM, Hans Ginzel <hans@matfyz.cz> wrote:On Wed, Nov 18, 2015 at 01:49:35PM +0100, Shmagi Kavtaradze wrote:I am importing sentences from txt file. They look like:
"0,170 A recent statistical analysis by David
Barton graphically illustrates how America has
plummeted from righteous living , prosperity
and success in the last quarter century
.
Each Sentence starts with coordinates and each word is delimited with tab.
I want to import data to tables without coordinates, just text and if
possible to convert tab delimited space with just 'space', not to have
such a gap between words. Any solutions how to do it? maybe with shell
script?
You can use the 'PROGRAM' in COPY syntax
http://www.postgresql.org/docs/current/static/sql-copy.html
-- DROP TABLE IF EXISTS Sentence;
CREATE TABLE IF NOT EXISTS Sentence (s text);
COPY Sentence
FROM PROGRAM 'sed -re ''s/\t/ /g; s/^\S+\s+//'' file.txt'
WITH (FORMAT text, NULL '');
Take care of escape sequences – backslashes in the file.
If the file is on the client side see the \copy command of psql client instaed.
http://www.postgresql.org/docs/9.4/static/app-psql.html#APP-PSQL-META-COMMANDS-COPY
https://www.gnu.org/software/sed/manual/sed.html
H.
These ("S+" and "s+") are regular expressions (regex).
They are being used as part of the SED tool, which is available from your Linux bash.
I searched and found they are for whitespace characters. But they deleted sentence coordinates from text for me. Still no idea how.
On Wed, Nov 18, 2015 at 7:18 PM, Felipe Santos <felipepts@gmail.com> wrote:
2015-11-18 16:08 GMT-02:00 Shmagi Kavtaradze <kavtaradze.s@gmail.com>:I was not able to find any details about "\S+\s+ ", can u explain them? Also thanks a lot, this worked perfectly!On Wed, Nov 18, 2015 at 4:36 PM, Hans Ginzel <hans@matfyz.cz> wrote:On Wed, Nov 18, 2015 at 01:49:35PM +0100, Shmagi Kavtaradze wrote:I am importing sentences from txt file. They look like:
"0,170 A recent statistical analysis by David
Barton graphically illustrates how America has
plummeted from righteous living , prosperity
and success in the last quarter century
.
Each Sentence starts with coordinates and each word is delimited with tab.
I want to import data to tables without coordinates, just text and if
possible to convert tab delimited space with just 'space', not to have
such a gap between words. Any solutions how to do it? maybe with shell
script?
You can use the 'PROGRAM' in COPY syntax
http://www.postgresql.org/docs/current/static/sql-copy.html
-- DROP TABLE IF EXISTS Sentence;
CREATE TABLE IF NOT EXISTS Sentence (s text);
COPY Sentence
FROM PROGRAM 'sed -re ''s/\t/ /g; s/^\S+\s+//'' file.txt'
WITH (FORMAT text, NULL '');
Take care of escape sequences – backslashes in the file.
If the file is on the client side see the \copy command of psql client instaed.
http://www.postgresql.org/docs/9.4/static/app-psql.html#APP-PSQL-META-COMMANDS-COPY
https://www.gnu.org/software/sed/manual/sed.html
H.These ("S+" and "s+") are regular expressions (regex).They are being used as part of the SED tool, which is available from your Linux bash.
You´re almost there, here it is what they mean:
"''s/\t/ /g" = Replaces tab (\t) for white space (" ")
"''s/^\S+\s+//''" = Replaces everything that starts (^) + with a non-white space (S+) + and is then followed by a white space (s+) for nothing (//).
Hope it helps
2015-11-18 16:24 GMT-02:00 Shmagi Kavtaradze <kavtaradze.s@gmail.com>:
I searched and found they are for whitespace characters. But they deleted sentence coordinates from text for me. Still no idea how.On Wed, Nov 18, 2015 at 7:18 PM, Felipe Santos <felipepts@gmail.com> wrote:2015-11-18 16:08 GMT-02:00 Shmagi Kavtaradze <kavtaradze.s@gmail.com>:I was not able to find any details about "\S+\s+ ", can u explain them? Also thanks a lot, this worked perfectly!On Wed, Nov 18, 2015 at 4:36 PM, Hans Ginzel <hans@matfyz.cz> wrote:On Wed, Nov 18, 2015 at 01:49:35PM +0100, Shmagi Kavtaradze wrote:I am importing sentences from txt file. They look like:
"0,170 A recent statistical analysis by David
Barton graphically illustrates how America has
plummeted from righteous living , prosperity
and success in the last quarter century
.
Each Sentence starts with coordinates and each word is delimited with tab.
I want to import data to tables without coordinates, just text and if
possible to convert tab delimited space with just 'space', not to have
such a gap between words. Any solutions how to do it? maybe with shell
script?
You can use the 'PROGRAM' in COPY syntax
http://www.postgresql.org/docs/current/static/sql-copy.html
-- DROP TABLE IF EXISTS Sentence;
CREATE TABLE IF NOT EXISTS Sentence (s text);
COPY Sentence
FROM PROGRAM 'sed -re ''s/\t/ /g; s/^\S+\s+//'' file.txt'
WITH (FORMAT text, NULL '');
Take care of escape sequences – backslashes in the file.
If the file is on the client side see the \copy command of psql client instaed.
http://www.postgresql.org/docs/9.4/static/app-psql.html#APP-PSQL-META-COMMANDS-COPY
https://www.gnu.org/software/sed/manual/sed.html
H.These ("S+" and "s+") are regular expressions (regex).They are being used as part of the SED tool, which is available from your Linux bash.
Yes, it helped. Thanks a lot!
On Wed, Nov 18, 2015 at 7:32 PM, Felipe Santos <felipepts@gmail.com> wrote:
You´re almost there, here it is what they mean:"''s/\t/ /g" = Replaces tab (\t) for white space (" ")"''s/^\S+\s+//''" = Replaces everything that starts (^) + with a non-white space (S+) + and is then followed by a white space (s+) for nothing (//).Hope it helps2015-11-18 16:24 GMT-02:00 Shmagi Kavtaradze <kavtaradze.s@gmail.com>:I searched and found they are for whitespace characters. But they deleted sentence coordinates from text for me. Still no idea how.On Wed, Nov 18, 2015 at 7:18 PM, Felipe Santos <felipepts@gmail.com> wrote:2015-11-18 16:08 GMT-02:00 Shmagi Kavtaradze <kavtaradze.s@gmail.com>:I was not able to find any details about "\S+\s+ ", can u explain them? Also thanks a lot, this worked perfectly!On Wed, Nov 18, 2015 at 4:36 PM, Hans Ginzel <hans@matfyz.cz> wrote:On Wed, Nov 18, 2015 at 01:49:35PM +0100, Shmagi Kavtaradze wrote:I am importing sentences from txt file. They look like:
"0,170 A recent statistical analysis by David
Barton graphically illustrates how America has
plummeted from righteous living , prosperity
and success in the last quarter century
.
Each Sentence starts with coordinates and each word is delimited with tab.
I want to import data to tables without coordinates, just text and if
possible to convert tab delimited space with just 'space', not to have
such a gap between words. Any solutions how to do it? maybe with shell
script?
You can use the 'PROGRAM' in COPY syntax
http://www.postgresql.org/docs/current/static/sql-copy.html
-- DROP TABLE IF EXISTS Sentence;
CREATE TABLE IF NOT EXISTS Sentence (s text);
COPY Sentence
FROM PROGRAM 'sed -re ''s/\t/ /g; s/^\S+\s+//'' file.txt'
WITH (FORMAT text, NULL '');
Take care of escape sequences – backslashes in the file.
If the file is on the client side see the \copy command of psql client instaed.
http://www.postgresql.org/docs/9.4/static/app-psql.html#APP-PSQL-META-COMMANDS-COPY
https://www.gnu.org/software/sed/manual/sed.html
H.These ("S+" and "s+") are regular expressions (regex).They are being used as part of the SED tool, which is available from your Linux bash.