Re: Time consuming process ...3 million records please help - Mailing list pgsql-novice

From Sean Davis
Subject Re: Time consuming process ...3 million records please help
Date
Msg-id 44289FEB.207@mail.nih.gov
Whole thread Raw
In response to Re: Time consuming process ...3 million records please help  (Srinivas Iyyer <srini_iyyer_bio@yahoo.com>)
Responses Re: Non-Unique intems  (Srinivas Iyyer <srini_iyyer_bio@yahoo.com>)
List pgsql-novice
Srinivas Iyyer wrote:
> Hi Sean,
> thanks for your reply. in my temp table,
> temp_refseq_id column cannot be indexed due to
> duplicate values. will that effect drastically.
> I havent tested it yet. on top of that i killed 8 hrs
> process running previously on tab_update() function.
> :-)

Sri,

Columns with duplicate values can be indexed; only if you declare an
index to be "unique" will you then have a problem with inserting
duplicates.

You just need to do:

1.  Create your temp table structure.
2.  Copy in your data.
3.  Create an index for all fields that show up in a "where" or join
clause (THIS IS NECESSARY FOR ANY RDBMS TO WORK EFFICIENTLY!!!)

   create index temp_refseq_id_index on TABLENAME(temp_refseq_id);

4.  Vacuum analyze the table(s) after copying and creating the index
(thereby telling postgres what is in the table)

   vacuum analyze TABLENAME

5.  Do something along the lines of:

   insert into ....
   select from ....

That should do it, approximately.  If you still have problems, then you
will need to be more specific about what the problems are, I think.

As an aside, I have similar data stored in a postgres database, but in
some places on the order of several 10's of millions of records;
postgres is not limiting in this respect.

Sean


> thanks
> sri
>
>
>> Hi, Sri.
>>
>> I would do this differently.  You don't need the
>> function I don't
>> think.  I would do something like this (untested):
>>
>> insert into spontanno (seq_id, spontanno_imageid,
>> spontanno_genbankacc)
>> select seqdump.seq_id, referrer_keys.temp_imageid,
>>     referrer_keys.temp.genbankacc from
>> referrer_keys, seqdump
>>     where
>> referrer_keys.temp_refseq_id=seqdump.seq_acc;
>>
>> Make sure there is an index on seq_acc and
>> temp_refseq_id.  This should
>> be done in seconds to minutes, not hours.
>>
>> Sean
>
> -
>
>
>
> <snip>
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

pgsql-novice by date:

Previous
From: Srinivas Iyyer
Date:
Subject: Re: Time consuming process ...3 million records please help
Next
From: Andrea
Date:
Subject: Transfer from MySQL to PostgreSQL