Thread: Fastest way to import only ONE column into a table? (COPY doesn't work)
What is the fastest way to import the values of *only one* column into an already existing table? Say the table looks like this: id (primary key) description created_on I want to import only a new column so the table looks like this: id (primary key) title description created_on So I have a large-ish text file with the following format: id1||title1 id2||title2 id3||title3 id4||title4 id5||title5 id6||title6 I try to import it in with COPY table1 (id, title) from "/usr/tmp/titles.txt" But this doesn't seem to be working, because COPY doesn't know how to UPDATE values, it just inserts. Appreciate any tips, because it would be nasty to have to do this with millions of UPDATE statements! TIA
Re: Fastest way to import only ONE column into a table? (COPY doesn't work)
From
Rodrigo De León
Date:
On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote: > Appreciate any tips, because it would > be nasty to have to do this with millions of UPDATE statements! - Create an interim table - COPY the data into it - Do an UPDATE ... FROM ...
Re: Fastest way to import only ONE column into a table? (COPY doesn't work)
From
"Phoenix Kiula"
Date:
On 16/08/07, Rodrigo De León <rdeleonp@gmail.com> wrote: > On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote: > > Appreciate any tips, because it would > > be nasty to have to do this with millions of UPDATE statements! > > - Create an interim table > - COPY the data into it > - Do an UPDATE ... FROM ... Thanks! I thought about it and then gave up because SQL trumped me up. Could you please suggest what the query should look like? Based on this: http://www.postgresql.org/docs/8.1/static/sql-update.html I tried this: UPDATE t1 SET title = title FROM t2 WHERE t1.id = t2.id; But I am miles from what it needs to be! The example in the docs does not refer to a column in the other table, it just increments a value in the same table based on the join ("UPDATE employees SET sales_count = sales_count + 1 FROM accounts...."). TIA!
Re: Fastest way to import only ONE column into a table? (COPY doesn't work)
From
"Phoenix Kiula"
Date:
On 16/08/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote: > > --- Phoenix Kiula <phoenix.kiula@gmail.com> wrote: > > > On 16/08/07, Rodrigo De León <rdeleonp@gmail.com> wrote: > > > On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote: > > > > Appreciate any tips, because it would > > > > be nasty to have to do this with millions of UPDATE statements! > > > > > > - Create an interim table > > > - COPY the data into it > > > - Do an UPDATE ... FROM ... > > > > > > Thanks! I thought about it and then gave up because SQL trumped me up. > > Could you please suggest what the query should look like? > > > > Based on this: > > http://www.postgresql.org/docs/8.1/static/sql-update.html > > > > I tried this: > > > > UPDATE > > t1 SET title = title FROM t2 > > WHERE > > t1.id = t2.id; > > > UPDATE T1 > SET T1.title = T2.title > FROM T2 > WHERE T1.id = T2.id > AND T1.title IS NULL; > > or > > UPDATE T1 > SET title = ( SELECT title > FROM T2 > WHERE T2.id = T1.id ) > WHERE T1.title IS NULL; Thanks much RIchard, but neither of those work. For me table t1 has over 6 million rows, and table t2 has about 600,000. In both of the queries above I suppose it is going through each and every row of table t1 and taking its own sweet time. I've dropped all indexes on t1, but the query has still been running for over 45 minutes as I write! Any other suggestions?
Re: Fastest way to import only ONE column into a table? (COPY doesn't work)
From
Andrei Kovalevski
Date:
Phoenix Kiula wrote: > On 16/08/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote: > >> --- Phoenix Kiula <phoenix.kiula@gmail.com> wrote: >> >> >>> On 16/08/07, Rodrigo De León <rdeleonp@gmail.com> wrote: >>> >>>> On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote: >>>> >>>>> Appreciate any tips, because it would >>>>> be nasty to have to do this with millions of UPDATE statements! >>>>> >>>> - Create an interim table >>>> - COPY the data into it >>>> - Do an UPDATE ... FROM ... >>>> >>> Thanks! I thought about it and then gave up because SQL trumped me up. >>> Could you please suggest what the query should look like? >>> >>> Based on this: >>> http://www.postgresql.org/docs/8.1/static/sql-update.html >>> >>> I tried this: >>> >>> UPDATE >>> t1 SET title = title FROM t2 >>> WHERE >>> t1.id = t2.id; >>> >> UPDATE T1 >> SET T1.title = T2.title >> FROM T2 >> WHERE T1.id = T2.id >> AND T1.title IS NULL; >> >> or >> >> UPDATE T1 >> SET title = ( SELECT title >> FROM T2 >> WHERE T2.id = T1.id ) >> WHERE T1.title IS NULL; >> > Thanks much RIchard, but neither of those work. For me table t1 has > over 6 million rows, and table t2 has about 600,000. In both of the > queries above I suppose it is going through each and every row of > table t1 and taking its own sweet time. I've dropped all indexes on > t1, but the query has still been running for over 45 minutes as I > write! Any other suggestions? > > I'm not sure would it be faster - but you can try to create a function which will create new empty table, then fill it with the result of SELECT query. Something like this: CREATE OR REPLACE FUNCTION add_column () RETURNS INTEGER AS $$ DECLARE r RECORD; BEGIN CREATE TABLE new_table (id integer, value varchar); FOR r IN select t1.id, t2.title value from t1 left outer join t2 on (t1.id = t2.id) LOOP INSERT INTO new_table VALUES(r.id, r.title); END LOOP; return 0; end $$ LANGUAGE plpgsql; Try this function and if its' time would be acceptable - you'll need to drop existing table and rename newly created one.
Re: Fastest way to import only ONE column into a table? (COPY doesn't work)
From
Richard Broersma Jr
Date:
--- Phoenix Kiula <phoenix.kiula@gmail.com> wrote: > > UPDATE T1 > > SET T1.title = T2.title > > FROM T2 > > WHERE T1.id = T2.id > > AND T1.title IS NULL; > > Thanks much RIchard, but neither of those work. For me table t1 has > over 6 million rows, and table t2 has about 600,000. In both of the > queries above I suppose it is going through each and every row of > table t1 and taking its own sweet time. I've dropped all indexes on > t1, but the query has still been running for over 45 minutes as I > write! Any other suggestions? You could post the explain plan for this above query to verify your theory. Also it might give ideas about whether carefully placed indexs would help. My guess is that you will need to add indexes at least on T1.id and T2.id to help reduce the join time. Also, since you mention that you already attempted this query, I would view the postgresql-logs to see if you might need to increase your check-point size temporarily just for this operation. Also, since this is hopefully a one time update, you could temporarily turn off fsync during the update. when the update is done, you should turn fsync back on. My guess is that a join and update on 6 million records in just going to take a while. Hopefully this isn't an operation that you will need to preform regularly. Regards, Richard Broersma Jr.