Thread: Fastest way to import only ONE column into a table? (COPY doesn't work)

Fastest way to import only ONE column into a table? (COPY doesn't work)

From
"Phoenix Kiula"
Date:
What is the fastest way to import the values of *only one* column into
an already existing table? Say the table looks like this:

  id (primary key)
  description
  created_on

I want to import only a new column so the table looks like this:

  id (primary key)
  title
  description
  created_on

So I have a large-ish text file with the following format:

   id1||title1
   id2||title2
   id3||title3
   id4||title4
   id5||title5
   id6||title6

I try to import it in with

   COPY table1 (id, title) from "/usr/tmp/titles.txt"

But this doesn't seem to be working, because COPY doesn't know how to
UPDATE values, it just inserts. Appreciate any tips, because it would
be nasty to have to do this with millions of UPDATE statements!

TIA

Re: Fastest way to import only ONE column into a table? (COPY doesn't work)

From
Rodrigo De León
Date:
On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote:
> Appreciate any tips, because it would
> be nasty to have to do this with millions of UPDATE statements!

- Create an interim table
- COPY the data into it
- Do an UPDATE ... FROM ...


Re: Fastest way to import only ONE column into a table? (COPY doesn't work)

From
"Phoenix Kiula"
Date:
On 16/08/07, Rodrigo De León <rdeleonp@gmail.com> wrote:
> On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote:
> > Appreciate any tips, because it would
> > be nasty to have to do this with millions of UPDATE statements!
>
> - Create an interim table
> - COPY the data into it
> - Do an UPDATE ... FROM ...


Thanks! I thought about it and then gave up because SQL trumped me up.
Could you please suggest what the query should look like?

Based on this:
http://www.postgresql.org/docs/8.1/static/sql-update.html

I tried this:

UPDATE
    t1 SET title = title FROM t2
WHERE
    t1.id = t2.id;


But I am miles from what it needs to be! The example in the docs does
not refer to a column in the other table, it just increments a value
in the same table based on the join ("UPDATE employees SET sales_count
= sales_count + 1 FROM accounts....").

TIA!

Re: Fastest way to import only ONE column into a table? (COPY doesn't work)

From
"Phoenix Kiula"
Date:
On 16/08/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
>
> --- Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>
> > On 16/08/07, Rodrigo De León <rdeleonp@gmail.com> wrote:
> > > On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote:
> > > > Appreciate any tips, because it would
> > > > be nasty to have to do this with millions of UPDATE statements!
> > >
> > > - Create an interim table
> > > - COPY the data into it
> > > - Do an UPDATE ... FROM ...
> >
> >
> > Thanks! I thought about it and then gave up because SQL trumped me up.
> > Could you please suggest what the query should look like?
> >
> > Based on this:
> > http://www.postgresql.org/docs/8.1/static/sql-update.html
> >
> > I tried this:
> >
> > UPDATE
> >     t1 SET title = title FROM t2
> > WHERE
> >     t1.id = t2.id;
>
>
> UPDATE T1
>    SET T1.title = T2.title
>   FROM T2
>  WHERE T1.id = T2.id
>    AND T1.title IS NULL;
>
> or
>
> UPDATE T1
>    SET title = ( SELECT title
>                    FROM T2
>                   WHERE T2.id = T1.id )
>  WHERE T1.title IS NULL;



Thanks much RIchard, but neither of those work. For me table t1 has
over 6 million rows, and table t2 has about 600,000. In both of the
queries above I suppose it is going through each and every row of
table t1 and taking its own sweet time. I've dropped all indexes on
t1, but the query has still been running for over 45 minutes as I
write! Any other suggestions?

Re: Fastest way to import only ONE column into a table? (COPY doesn't work)

From
Andrei Kovalevski
Date:
Phoenix Kiula wrote:
> On 16/08/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
>
>> --- Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>>
>>
>>> On 16/08/07, Rodrigo De León <rdeleonp@gmail.com> wrote:
>>>
>>>> On Aug 15, 11:46 pm, phoenix.ki...@gmail.com ("Phoenix Kiula") wrote:
>>>>
>>>>> Appreciate any tips, because it would
>>>>> be nasty to have to do this with millions of UPDATE statements!
>>>>>
>>>> - Create an interim table
>>>> - COPY the data into it
>>>> - Do an UPDATE ... FROM ...
>>>>
>>> Thanks! I thought about it and then gave up because SQL trumped me up.
>>> Could you please suggest what the query should look like?
>>>
>>> Based on this:
>>> http://www.postgresql.org/docs/8.1/static/sql-update.html
>>>
>>> I tried this:
>>>
>>> UPDATE
>>>     t1 SET title = title FROM t2
>>> WHERE
>>>     t1.id = t2.id;
>>>
>> UPDATE T1
>>    SET T1.title = T2.title
>>   FROM T2
>>  WHERE T1.id = T2.id
>>    AND T1.title IS NULL;
>>
>> or
>>
>> UPDATE T1
>>    SET title = ( SELECT title
>>                    FROM T2
>>                   WHERE T2.id = T1.id )
>>  WHERE T1.title IS NULL;
>>
> Thanks much RIchard, but neither of those work. For me table t1 has
> over 6 million rows, and table t2 has about 600,000. In both of the
> queries above I suppose it is going through each and every row of
> table t1 and taking its own sweet time. I've dropped all indexes on
> t1, but the query has still been running for over 45 minutes as I
> write! Any other suggestions?
>
>

I'm not sure would it be faster - but you can try to create a function
which will create new empty table, then fill it with the result of
SELECT query. Something like this:

CREATE OR REPLACE FUNCTION add_column () RETURNS INTEGER AS $$
DECLARE
   r RECORD;
BEGIN
    CREATE TABLE new_table (id integer,  value varchar);
    FOR r IN select t1.id, t2.title value from t1 left outer join t2 on
(t1.id = t2.id)  LOOP
       INSERT INTO new_table VALUES(r.id, r.title);
    END LOOP;

    return 0;
end
$$ LANGUAGE plpgsql;

Try this function and if its' time would be acceptable - you'll need to
drop existing table and rename newly created one.

Re: Fastest way to import only ONE column into a table? (COPY doesn't work)

From
Richard Broersma Jr
Date:
--- Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

> > UPDATE T1
> >    SET T1.title = T2.title
> >   FROM T2
> >  WHERE T1.id = T2.id
> >    AND T1.title IS NULL;
>
> Thanks much RIchard, but neither of those work. For me table t1 has
> over 6 million rows, and table t2 has about 600,000. In both of the
> queries above I suppose it is going through each and every row of
> table t1 and taking its own sweet time. I've dropped all indexes on
> t1, but the query has still been running for over 45 minutes as I
> write! Any other suggestions?

You could post the explain plan for this above query to verify your theory.  Also it might give
ideas about whether carefully placed indexs would help.

My guess is that you will need to add indexes at least on T1.id and T2.id to help reduce the join
time.  Also, since you mention that you already attempted this query, I would view the
postgresql-logs to see if you might need to increase your check-point size temporarily just for
this operation.

Also, since this is hopefully a one time update, you could temporarily turn off fsync during the
update.  when the update is done, you should turn fsync back on.

My guess is that a join and update on 6 million records in just going to take a while.  Hopefully
this isn't an operation that you will need to preform regularly.

Regards,
Richard Broersma Jr.