Re: Time consuming process ...3 million records please help - Mailing list pgsql-novice

From Sean Davis
Subject Re: Time consuming process ...3 million records please help
Date
Msg-id 442892E6.7010408@mail.nih.gov
In response to Time consuming process ...3 million records please help  (Srinivas Iyyer <srini_iyyer_bio@yahoo.com>)
Responses Re: Time consuming process ...3 million records please help  (Srinivas Iyyer <srini_iyyer_bio@yahoo.com>)
List pgsql-novice
Srinivas Iyyer wrote:
> Dear group,
>   I have two tables and a temp table into which I uploaded
> data using \copy. This temp table has 3,348,107
> rows of data.
>
> I wrote a plpgsql function to read each record in the
> temp table, take the first column's data (temp_refseq_id),
> look up the corresponding 'seq_id' in table B, and insert
> the result into table A.
>
> I started this process 8 hrs ago. It has been running
> for the last 8 hrs and is still not finished. The reason
> I used this temp table approach was to speed up the
> process by writing a server-side function.  I still have
> not managed to beat the time issue here. If this is a
> recurring problem I will have to look for another
> db system, since my files from now on are huge and have
> over a million records. I am losing a lot of time in this
> case.
>
> Could anyone help me write a faster function?
>
> thanks
> Looking forward to hearing from people.
>
> Temp_table:
>
>  temp_refseq_id |  temp_imageid  | temp_genbankacc
> ----------------+----------------+-----------------
>  NM_003604      | IMAGE:1099538  | AA594716
>  NM_003604      | IMAGE:853047   | AA668250
>  NM_001008860   | IMAGE:3640970  | BC011775
>  NM_001008860   | IMAGE:3640970  | BE737509
>  NM_001008860   | IMAGE:6040319  | BU079001
>  NM_001008860   | IMAGE:6040319  | BU078725
>  NM_001008860   | IMAGE:3451448  | BC000957
>  NM_001008860   | IMAGE:3451448  | BE539334
>  NM_001008860   | IMAGE:4794135  | BG708105
>  NM_001008860   | IMAGE:5214087  | BI911674
>
> Table A : (I want to upload data from temp to here)
>
>  spota_id | seq_id | spota_imageid | spota_genbacc
> ----------+--------+---------------+---------------
>      23   |  54525 | IMAGE:1099538 | AA594716
>
>
>
>
> Table B : This is the seqdump table; seq_id in Table A
> is a FK referencing this table.
>
>  seq_id |  seq_acc    | seq_name
> --------+-------------+----------
>   54519 | NM_152918   | EMR2
>   54520 | NM_001008860| CGGBP1
>   54521 | NM_020040   | TUBB4Q
>   54522 | NM_017525   | CDC42BPG
>   54523 | NM_032158   | WBSCR20C
>   54524 | NM_004047   | ATP6V0B
>   54525 | NM_003604   | PLCB3
>
>
> Function:
>
> CREATE FUNCTION tab_update() RETURNS integer AS '
> DECLARE
>     referrer_keys RECORD;
> BEGIN
>     FOR referrer_keys IN SELECT * FROM temp_spotanno LOOP
>         INSERT INTO spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
>         VALUES ((SELECT seq_id FROM seqdump
>                  WHERE seq_acc = referrer_keys.temp_refseq_id),
>                 referrer_keys.temp_imageid,
>                 referrer_keys.temp_genbankacc);
>     END LOOP;
>     RETURN 0;
> END;
> ' LANGUAGE plpgsql;
>
>
> Thanks
> Sri
>
Hi, Sri.

I would do this differently.  I don't think you need the function at
all.  I would do something like this (untested):

insert into spotanno (seq_id, spotanno_imageid, spotanno_genbankacc)
select seqdump.seq_id, temp_spotanno.temp_imageid,
    temp_spotanno.temp_genbankacc from temp_spotanno, seqdump
    where temp_spotanno.temp_refseq_id = seqdump.seq_acc;

Make sure there are indexes on seqdump.seq_acc and
temp_spotanno.temp_refseq_id.  This should finish in seconds to
minutes, not hours.
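
As a minimal sketch (the index names below are just placeholders;
seqdump and temp_spotanno are the tables from your function):

-- index the two join columns (index names are only suggestions)
create index seqdump_seq_acc_idx on seqdump (seq_acc);
create index temp_spotanno_refseq_idx on temp_spotanno (temp_refseq_id);

-- refresh planner statistics so the new indexes get used sensibly
analyze seqdump;
analyze temp_spotanno;

With a single set-based insert the planner can use the index on
seq_acc for the join instead of running one subselect per row, which
is where the hours were going.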

Sean
