Re: Time consuming process ...3 million records please help - Mailing list pgsql-novice
From | Sean Davis |
---|---|
Subject | Re: Time consuming process ...3 million records please help |
Date | |
Msg-id | 442892E6.7010408@mail.nih.gov Whole thread Raw |
In response to | Time consuming process ...3 million records please help (Srinivas Iyyer <srini_iyyer_bio@yahoo.com>) |
Responses |
Re: Time consuming process ...3 million records please help
|
List | pgsql-novice |
Srinivas Iyyer wrote: > Dear group, > I have two tables and a temp table where I uploaded > data using \copy. This temp table has "3,348,107" > lines of data. > > I wrote a plpgsql function to read each record in > temp_table, take the firt col. data (temp_refseq_id) > and get corresponding 'seq_id' from table B and insert > into table A. > > I started this process 8 hrs back. It has been running > for last 8 hrs and yet it is not finished. the reason > i did this temp table thing was to speedup process by > writing a server side function. I still did not get > to win over the time issue here. If this is a > continuous problem I will have to search for another > db system since my files from now on are huge and has > over mil records. I am betting a lot of time in this > case. > > Could any one help writing a faster function. > > thanks > looking forward to hear from people. > > Temp_table: > > temp_refseq_id | temp_imageid | temp_genbankacc > ----------------+----------------+----------------- > NM_003604 | IMAGE:1099538 | AA594716 > NM_003604 | IMAGE:853047 | AA668250 > NM_001008860 | IMAGE:3640970 | BC011775 > NM_001008860 | IMAGE:3640970 | BE737509 > NM_001008860 | IMAGE:6040319 | BU079001 > NM_001008860 | IMAGE:6040319 | BU078725 > NM_001008860 | IMAGE:3451448 | BC000957 > NM_001008860 | IMAGE:3451448 | BE539334 > NM_001008860 | IMAGE:4794135 | BG708105 > NM_001008860 | IMAGE:5214087 | BI911674 > > Table A : (I want to upload data from temp to here) > > spota_id | seq_id | spota_imageid | spota_genbacc > ----------+--------+---------------+-------- > 23 | 54525 | IMAGE:1099538 | AA594716 > > > > > Table B : This table is seqdump table where seq_id is > a FK in Table B > > seq_id | seq_acc | seq_name > ------------------------------ > 54519 | NM_152918 | EMR2 > 54520 | NM_001008860| CGGBP1 > 54521 | NM_020040 | TUBB4Q > 54522 | NM_017525 | CDC42BPG > 54523 | NM_032158 | WBSCR20C > 54524 | NM_004047 | ATP6V0B > 54525 | NM_003604 | PLCB3 > > > Function: > > CREATE FUNCTION tab_update() RETURNS integer AS ' > DECLARE > referrer_keys RECORD; > BEGIN > FOR referrer_keys IN SELECT * from > temp_spotanno LOOP > INSERT INTO spotanno(seq_id, > spotanno_imageid,spotanno_genbankacc) values((SELECT > seq_id from seqdump where seq_acc = > referrer_keys.temp_refseq_id),referrer_keys.temp_imageid,referrer_keys.temp_genbankacc); > END LOOP; > return 0; > > END; > ' LANGUAGE plpgsql; > > > Thanks > Sri > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > Hi, Sri. I would do this differently. You don't need the function I don't think. I would do something like this (untested): insert into spontanno (seq_id, spontanno_imageid, spontanno_genbankacc) select seqdump.seq_id, referrer_keys.temp_imageid, referrer_keys.temp.genbankacc from referrer_keys, seqdump where referrer_keys.temp_refseq_id=seqdump.seq_acc; Make sure there is an index on seq_acc and temp_refseq_id. This should be done in seconds to minutes, not hours. Sean
pgsql-novice by date: