Home > mailing lists

Re: Time consuming process ...3 million records please help - Mailing list pgsql-novice

From	Sean Davis
Subject	Re: Time consuming process ...3 million records please help
Date	March 27, 2006 22:31:07
Msg-id	44289FEB.207@mail.nih.gov Whole thread Raw
In response to	Re: Time consuming process ...3 million records please help (Srinivas Iyyer <srini_iyyer_bio@yahoo.com>)
Responses	Re: Non-Unique intems
List	pgsql-novice

Tree view

Srinivas Iyyer wrote:
> Hi Sean,
> thanks for your reply. in my temp table,
> temp_refseq_id column cannot be indexed due to
> duplicate values. will that effect drastically.
> I havent tested it yet. on top of that i killed 8 hrs
> process running previously on tab_update() function.
> :-)

Sri,

Columns with duplicate values can be indexed; only if you declare an
index to be "unique" will you then have a problem with inserting
duplicates.

You just need to do:

1.  Create your temp table structure.
2.  Copy in your data.
3.  Create an index for all fields that show up in a "where" or join
clause (THIS IS NECESSARY FOR ANY RDBMS TO WORK EFFICIENTLY!!!)

   create index temp_refseq_id_index on TABLENAME(temp_refseq_id);

4.  Vacuum analyze the table(s) after copying and creating the index
(thereby telling postgres what is in the table)

   vacuum analyze TABLENAME

5.  Do something along the lines of:

   insert into ....
   select from ....

That should do it, approximately.  If you still have problems, then you
will need to be more specific about what the problems are, I think.

As an aside, I have similar data stored in a postgres database, but in
some places on the order of several 10's of millions of records;
postgres is not limiting in this respect.

Sean

> thanks
> sri
>
>
>> Hi, Sri.
>>
>> I would do this differently.  You don't need the
>> function I don't
>> think.  I would do something like this (untested):
>>
>> insert into spontanno (seq_id, spontanno_imageid,
>> spontanno_genbankacc)
>> select seqdump.seq_id, referrer_keys.temp_imageid,
>>     referrer_keys.temp.genbankacc from
>> referrer_keys, seqdump
>>     where
>> referrer_keys.temp_refseq_id=seqdump.seq_acc;
>>
>> Make sure there is an index on seq_acc and
>> temp_refseq_id.  This should
>> be done in seconds to minutes, not hours.
>>
>> Sean
>
> -
>
>
>
> <snip>
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com

pgsql-novice by date:

From: Srinivas Iyyer
Date: 27 March 2006, 22:17:16
Subject: Re: Time consuming process ...3 million records please help

From: Andrea
Date: 28 March 2006, 05:38:43
Subject: Transfer from MySQL to PostgreSQL

Re: Time consuming process ...3 million records please help - Mailing list pgsql-novice

Previous

Next