Thread: Using psql to feed a file line by line to a table column
Hello, I have a list of 400000 non-english words, each on a separate line and in UTF8 format, which I'd like to put in the "word" column of the following table (also in UTF8 and 8.4.13): create table good_words ( word varchar(64) primary key, verified boolean not null default false, stamp timestamp default current_timestamp ); Is there maybe a psql trick for that (the "psql --help" doesn't mention an "input field separator" option)? Or do I have to write a Perl-script for that task? Thank you Alex
2013/3/13 Alexander Farber <alexander.farber@gmail.com>: > Hello, > > I have a list of 400000 non-english words, > each on a separate line and in UTF8 format, > which I'd like to put in the "word" column > of the following table (also in UTF8 and 8.4.13): > > create table good_words ( > word varchar(64) primary key, > verified boolean not null default false, > stamp timestamp default current_timestamp > ); > > Is there maybe a psql trick for that > (the "psql --help" doesn't mention > an "input field separator" option)? > > Or do I have to write a Perl-script for that task? This should work from psql: \copy good_words(word) from '/path/to/file.txt' HTH Ian Barwick
Unfortunately doesn't work - On Tue, Mar 12, 2013 at 5:53 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote: > 2013/3/13 Alexander Farber <alexander.farber@gmail.com>: >> >> I have a list of 400000 non-english words, >> each on a separate line and in UTF8 format, >> which I'd like to put in the "word" column >> of the following table (also in UTF8 and 8.4.13): >> >> create table good_words ( >> word varchar(64) primary key, >> verified boolean not null default false, >> stamp timestamp default current_timestamp >> ); >> > > This should work from psql: > > \copy good_words(word) from '/path/to/file.txt' I try: bukvy=> \copy bukvy_good_words(word) from WORDS ; \copy: parse error at ";" bukvy=> \copy bukvy_good_words(word) from 'WORDS' ; \copy: parse error at ";" bukvy=> \copy bukvy_good_words(word) from "WORDS" ; \copy: parse error at ";" bukvy=> \copy bukvy_good_words(word) from '/home/afarber/WORDS' ; \copy: parse error at ";" bukvy=> \copy bukvy_good_words(word) from "/home/afarber/WORDS" ; \copy: parse error at ";" (sorry, lied you about the table name :-) The file is in the home dir and readable: # ls -al WORDS -rw-rw-r-- 1 afarber afarber 8263539 Mar 12 2013 WORDS Any ideas, what is wrong there for 8.4.13 ? Thank you Alex
2013/3/13 Alexander Farber <alexander.farber@gmail.com>: > Unfortunately doesn't work - > > On Tue, Mar 12, 2013 at 5:53 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote: >> 2013/3/13 Alexander Farber <alexander.farber@gmail.com>: >>> >>> I have a list of 400000 non-english words, >>> each on a separate line and in UTF8 format, >>> which I'd like to put in the "word" column >>> of the following table (also in UTF8 and 8.4.13): >>> >>> create table good_words ( >>> word varchar(64) primary key, >>> verified boolean not null default false, >>> stamp timestamp default current_timestamp >>> ); >>> > >> >> This should work from psql: >> >> \copy good_words(word) from '/path/to/file.txt' > > I try: > > bukvy=> \copy bukvy_good_words(word) from WORDS ; > \copy: parse error at ";" > bukvy=> \copy bukvy_good_words(word) from 'WORDS' ; > \copy: parse error at ";" > bukvy=> \copy bukvy_good_words(word) from "WORDS" ; > \copy: parse error at ";" > bukvy=> \copy bukvy_good_words(word) from '/home/afarber/WORDS' ; > \copy: parse error at ";" > bukvy=> \copy bukvy_good_words(word) from "/home/afarber/WORDS" ; > \copy: parse error at ";" > > (sorry, lied you about the table name :-) > > The file is in the home dir and readable: > > # ls -al WORDS > -rw-rw-r-- 1 afarber afarber 8263539 Mar 12 2013 WORDS > > Any ideas, what is wrong there for 8.4.13 ? Yup: \copy: parse error at ";" which is psql telling you it doesn't like the semicolon. Try leaving it out...
Thank you, this was indeed the (uneeded) semicolon at end of the COPY line. May I ask another question - On Tue, Mar 12, 2013 at 6:24 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote: >>> 2013/3/13 Alexander Farber <alexander.farber@gmail.com>: >>>> >>>> I have a list of 400000 non-english words, >>>> each on a separate line and in UTF8 format, >>>> which I'd like to put in the "word" column >>>> of the following table (also in UTF8 and 8.4.13): >>>> >>>> create table good_words ( >>>> word varchar(64) primary key, >>>> verified boolean not null default false, >>>> stamp timestamp default current_timestamp >>>> ); >> bukvy=> \copy good_words(word) from '/home/afarber/WORDS' ; >> \copy: parse error at ";" When I add few more words to my text file and then try to load it into my table again, then the COPY command will fail, because of the already stored words: bukvy=> \copy good_words(word) from WORDS ERROR: duplicate key value violates unique constraint "good_words_pkey" CONTEXT: COPY good_words, line 1: "абажур" Can't I change the behaviour to silently ignore inserting such words? I also have an INSERT trigger on my table, can I return a NULL from it or something similar? Below is my complete code: create table good_words ( word varchar(64) primary key, letters integer[33], verified boolean not null default false, stamp timestamp default current_timestamp ); create or replace function count_letters() returns trigger as $body$ declare alphabet varchar[]; i integer; begin alphabet := '{А,Б,В,Г,Д,Е,Ё,Ж,З,И,Й,К,Л,М,Н,О,П,Р,С,Т,У,Ф,Х,Ц,Ч,Ш,Щ,Ъ,Ы,Ь,Э,Ю,Я}'; for i in 1 .. 33 loop -- raise notice '%: %', i, alphabet[i]; new.letters[i] := length(new.word) - length(replace(new.word, alphabet[i], '')); end loop; return new; end; $body$ language plpgsql; create trigger count_letters before insert on good_words for each row execute procedure count_letters(); Regards Alex
2013/3/13 Alexander Farber <alexander.farber@gmail.com>: > Thank you, this was indeed the > (uneeded) semicolon at end of the COPY line. > > May I ask another question - (...) > When I add few more words to my text file > and then try to load it into my table again, > then the COPY command will fail, > because of the already stored words: > > bukvy=> \copy good_words(word) from WORDS > ERROR: duplicate key value violates unique constraint "good_words_pkey" > CONTEXT: COPY good_words, line 1: "абажур" > > Can't I change the behaviour to silently > ignore inserting such words? > > I also have an INSERT trigger on my table, > can I return a NULL from it or something similar? Yes, if you test for the presence of the word you can return NULL and the row will be discarded. See example below. Regards Ian Barwick testdb=# CREATE TABLE foo (word TEXT NOT NULL PRIMARY KEY); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo" CREATE TABLE testdb=# testdb=# INSERT INTO foo VALUES('bar'),('baz'); INSERT 0 2 testdb=# INSERT INTO foo VALUES('bar'); ERROR: duplicate key value violates unique constraint "foo_pkey" DETAIL: Key (word)=(bar) already exists. CREATE OR REPLACE FUNCTION foo_check() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$ BEGIN PERFORM TRUE FROM foo WHERE word = NEW.word; IF FOUND THEN RETURN NULL; END IF; RETURN NEW; END; $$; CREATE TRIGGER tr_foo_check BEFORE INSERT ON foo FOR EACH ROW EXECUTE PROCEDURE foo_check(); testdb=# INSERT INTO foo VALUES('bar'); INSERT 0 0
Thank you - On Thu, Mar 14, 2013 at 10:40 AM, Ian Lawrence Barwick <barwick@gmail.com> wrote: >> I also have an INSERT trigger on my table, >> can I return a NULL from it or something similar? > > Yes, if you test for the presence of the word you can return NULL > and the row will be discarded. See example below. > > > testdb=# CREATE TABLE foo (word TEXT NOT NULL PRIMARY KEY); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index > "foo_pkey" for table "foo" > CREATE TABLE > testdb=# > testdb=# INSERT INTO foo VALUES('bar'),('baz'); > INSERT 0 2 > testdb=# INSERT INTO foo VALUES('bar'); > ERROR: duplicate key value violates unique constraint "foo_pkey" > DETAIL: Key (word)=(bar) already exists. > > > CREATE OR REPLACE FUNCTION foo_check() > RETURNS TRIGGER > LANGUAGE 'plpgsql' > AS > $$ > BEGIN > PERFORM TRUE > FROM foo > WHERE word = NEW.word; > IF FOUND THEN > RETURN NULL; > END IF; > RETURN NEW; > END; > $$; > > CREATE TRIGGER tr_foo_check > BEFORE INSERT ON foo > FOR EACH ROW EXECUTE PROCEDURE foo_check(); > > testdb=# INSERT INTO foo VALUES('bar'); > INSERT 0 0 so the return value of an insert trigger is actually what get's inserted? And it has to be an BEFORE trigger? Regards Alex
2013/3/14 Alexander Farber <alexander.farber@gmail.com>: > Thank you - > > On Thu, Mar 14, 2013 at 10:40 AM, Ian Lawrence Barwick > <barwick@gmail.com> wrote: >>> I also have an INSERT trigger on my table, >>> can I return a NULL from it or something similar? >> >> Yes, if you test for the presence of the word you can return NULL >> and the row will be discarded. See example below. >> >> >> testdb=# CREATE TABLE foo (word TEXT NOT NULL PRIMARY KEY); >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "foo_pkey" for table "foo" >> CREATE TABLE >> testdb=# >> testdb=# INSERT INTO foo VALUES('bar'),('baz'); >> INSERT 0 2 >> testdb=# INSERT INTO foo VALUES('bar'); >> ERROR: duplicate key value violates unique constraint "foo_pkey" >> DETAIL: Key (word)=(bar) already exists. >> >> >> CREATE OR REPLACE FUNCTION foo_check() >> RETURNS TRIGGER >> LANGUAGE 'plpgsql' >> AS >> $$ >> BEGIN >> PERFORM TRUE >> FROM foo >> WHERE word = NEW.word; >> IF FOUND THEN >> RETURN NULL; >> END IF; >> RETURN NEW; >> END; >> $$; >> >> CREATE TRIGGER tr_foo_check >> BEFORE INSERT ON foo >> FOR EACH ROW EXECUTE PROCEDURE foo_check(); >> >> testdb=# INSERT INTO foo VALUES('bar'); >> INSERT 0 0 > > so the return value of an insert trigger > is actually what get's inserted? Yup, normally that would be the contents of the NEW record. > And it has to be an BEFORE trigger? Yes, because an AFTER trigger is fired after the row is updated, so the row can't be changed. Regards Ian Barwick