Re: Please comment on pgsql speed at handling 550,000 records - Mailing list pgsql-novice
From | Srinivas Iyyer |
---|---|
Subject | Re: Please comment on pgsql speed at handling 550,000 records |
Date | |
Msg-id | 20060207183328.45953.qmail@web34507.mail.mud.yahoo.com Whole thread Raw |
In response to | Re: Please comment on pgsql speed at handling 550,000 records (Michael Swierczek <mike.swierczek@gmail.com>) |
List | pgsql-novice |
Hi Mike, I am using a Dell Precission 670, RAM: 2 GB Processor: two Intel® XeonTM 3.00GHz, 2MB L2 Cache OS : Red Hat® Enterprise Linux WS v4, Chipset : .Intel E7525 chipset Disks : 1.5TB SATA disk space (3x500); If I am not wrong they are 7200RPM disks thanks Srini --- Michael Swierczek <mike.swierczek@gmail.com> wrote: > Dear Srinivas, > It may be helpful to know what operating > system you are using, and > also the amount of system RAM, the processor type, > and the hard drive sizes > and speeds. > > -Mike > > On 2/7/06, Srinivas Iyyer > <srini_iyyer_bio@yahoo.com> wrote: > > > > > > dear group, > > I am running a file with 550,000 lines. This data > is > > not so complex. The scheme is > > Table A > > > > seq_id | seq_name | seq_identifier > > 1 | ABC | NM_0000023 > > .... > > > > (23k lines) > > > > > > Table B: > > > > gos_id | go_id | go_cat | go_name > > 1 GO:00003 | P | death > > ...... > > (7k) > > > > > > Now table C > > > > seq_id | gos_id > > 1 1 > > 1 | 2 > > 1 | 200 > > > > ... > > (550K lines) > > > > > > > > > > Table Temp_C > > seq_name | go_id > > ABC | GO:9993934 > > ABC | GO:3489343 > > .... > > (550,500 lines) > > > > > > question: > > > > TAble A and B populated easily without question. > > > > Table C, is completely a relationship table. > > I wrote a python script that writes all 500K lines > > with select statements inside. > > > > insert into tablec (seq_id,gos_id) values ( > > (select seq_id from table a where seq_name > ='xxxx'), > > (select gos_id from table b where go_id = > 'xxxx')); > > > > > > Such 500K + insert statements took a long time and > I > > had to abort it. > > > > 3.5 hrs of time, 300K lines got inserted > > > > > > I wrote a plpgsql function: > > > > create function gosup() returns integer AS ' > > DECLARE > > referrer_key RECORD; > > BEIGN > > FOR referrer_key IN SELECT * from TEMP_C LOOP > > INSERT INTO tableC(seq_id,gos_id) values( > > (select seq_id from table a where seq_name > > =referrer_key.seq_name), > > (select gos_id from table b where go_id = > > referrer_key.go_id)) > > END LOOP > > return 0; > > END; > > ' LANGUAGE plpgsql; > > > > so this function is written to make the inserting > > faster. > > > > I used \copy command to create temp_c. > > > > > > The ironical point is, even now it is taking over > an > > hour (still running as of now). > > > > So experts, > > > > do you really thing inserting 550K lines is really > a > > long time consuming step no matter what method you > > choose (such as regular insert statements or > writing a > > procedure to automate this step on server side). > > > > Could any one throw some light on DB performance > (no > > matter wheter it is postgres, oracle, SyBase or > > MySQL). > > > > In specific what is the state of efficiency in > > postgres. > > > > Thank you. > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > > > ---------------------------(end of > broadcast)--------------------------- > > TIP 1: if posting/reading through Usenet, please > send an appropriate > > subscribe-nomail command to > majordomo@postgresql.org so that your > > message can get through to the mailing list > cleanly > > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
pgsql-novice by date: