Re: Performance Concern - Mailing list pgsql-performance
From | John Pagakis
---|---
Subject | Re: Performance Concern
Date |
Msg-id | KKEBKDPPLALEFHBEAOCCAECPDEAA.john@pagakis.com
In response to | Re: Performance Concern (Sean Shanny <shannyconsulting@earthlink.net>)
List | pgsql-performance
Sean -

I believe auto-commit was off (I'm not at the box right now). I'll play with
the commit interval; I know commits are expensive operations.

Thanks for item 2. I was toying with the notion of pre-creating 100,000
bazes, off-loading them to a file, and then seeing if COPY would be any
faster; you saved me the effort of experimenting.

Thanks for the benefit of your experience.

__________________________________________________________________
John Pagakis
Email: ih8spam_thebfh@toolsmythe.com

"Oh, you hate your job? Why didn't you say so? There's a support group for
that. It's called EVERYBODY, and they meet at the bar."
    -- Drew Carey

This signature generated by
     ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
     www.spazmodicfrog.com


-----Original Message-----
From: Sean Shanny [mailto:shannyconsulting@earthlink.net]
Sent: Friday, October 24, 2003 11:31 AM
To: john@pagakis.com
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Concern

John,

Are you treating each insertion as a separate transaction? If so, performance
will suffer. I am doing the same thing in building a data warehouse using PG;
I have to load millions of records each night. I do two different things
(sketches of both follow below):

1) If I need to keep the insertions inside the java process, I turn off
auto-commit and issue a commit every n insertions (5000 seems to give me the
best performance for my setup). Make sure you do a final commit in a finally
block so you don't miss anything.

2) Dump all the data to a file and then use a
psql COPY <table> (columns) FROM 'file path' call to load it. Very fast.

--sean
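As a minimal JDBC sketch of item 1 - the connection details, column values,
and baz_number generator below are placeholders, not details from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BazLoader {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "password");
            conn.setAutoCommit(false);  // one transaction now spans many inserts
            try {
                PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO baz (baz_number, create_user, last_update_user) "
                        + "VALUES (?, ?, ?)");
                for (int i = 1; i <= 100000; i++) {
                    ps.setString(1, makeBazNumber(i));  // stand-in for the real generator
                    ps.setString(2, "loader");
                    ps.setString(3, "loader");
                    ps.executeUpdate();
                    if (i % 5000 == 0) {
                        conn.commit();  // commit every n = 5000 insertions
                    }
                }
                ps.close();
            } finally {
                conn.commit();  // final commit so the last partial batch isn't lost
                conn.close();
            }
        }

        // Hypothetical stand-in for John's alphanumeric baz_number algorithm.
        static String makeBazNumber(int i) {
            return String.format("AB%013d", i);  // 15 characters, fits CHAR(15)
        }
    }

And a sketch of item 2, writing the rows tab-delimited (the default text
format COPY expects) so the file can be loaded in a single statement; the
path and column list are likewise assumptions:

    import java.io.PrintWriter;

    public class BazDumper {
        public static void main(String[] args) throws Exception {
            // Placeholder path. COPY's default text format is one row per
            // line with tab-separated columns.
            PrintWriter out = new PrintWriter("/tmp/baz.dat", "UTF-8");
            try {
                for (int i = 1; i <= 100000; i++) {
                    out.println(String.format("AB%013d\tloader\tloader", i));
                }
            } finally {
                out.close();
            }
            // Then load it from psql. Server-side COPY reads the file on the
            // database server (and needs superuser); psql's \copy reads it
            // client-side:
            //   COPY baz (baz_number, create_user, last_update_user) FROM '/tmp/baz.dat';
        }
    }

In both sketches the columns left out of the column list (is_cancelled and
the two timestamps) are filled in by their defaults.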
John Pagakis wrote:

>Greetings.
>
>I have a table that will require 100,000 rows initially.
>
>Assume the following (some of the field names have been changed for
>confidentiality reasons):
>
>CREATE TABLE baz (
>    baz_number CHAR(15) NOT NULL,
>    customer_id CHAR(39),
>    foobar_id INTEGER,
>    is_cancelled BOOL DEFAULT false NOT NULL,
>    create_user VARCHAR(60) NOT NULL,
>    create_datetime TIMESTAMP DEFAULT now() NOT NULL,
>    last_update_user VARCHAR(60) NOT NULL,
>    last_update_datetime TIMESTAMP DEFAULT now() NOT NULL,
>    CONSTRAINT PK_baz PRIMARY KEY (baz_number)
>);
>
>ALTER TABLE baz
>    ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
>
>ALTER TABLE baz
>    ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);
>
>
>Using JDBC, it took approximately one hour to insert 100,000 records. I
>have an algorithm to generate a unique baz_number - it is a mixture of
>alphas and numerics.
>
>There is a purchase table; one purchase can have many associated baz
>records, but the baz records will be pre-allocated - baz.customer_id allows
>null. The act of purchasing a baz will cause baz.customer_id to be
>populated from the customer_id (key) field in the purchase table.
>
>If it took an hour to insert 100,000 records, I can only imagine how much
>time it will take if one customer were to attempt to purchase all 100,000
>baz. Certainly too long for a web page.
>
>I've not had to deal with this kind of volume in Postgres before; I have my
>suspicions on what is wrong here (could it be using a CHAR(15) as a key?)
>but I'd *LOVE* any thoughts.
>
>Would I be better off making the key an identity field and not indexing on
>baz_number?
>
>Thanks in advance for any help.
>
>__________________________________________________________________
>John Pagakis
>Email: ih8spam_thebfh@toolsmythe.com
>
>
>"The best way to make your dreams come true is to wake up."
>    -- Paul Valery
>
>This signature generated by
>     ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
>     www.spazmodicfrog.com
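On the identity-field question above, a minimal sketch of what that could
look like - an editorial illustration, not something proposed in the thread:
an integer surrogate key from a SERIAL, with baz_number demoted to a UNIQUE
column. Note the UNIQUE constraint still maintains an index on baz_number, so
the gain is a cheaper primary key and integer foreign keys, not the
disappearance of that index.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class BazSurrogateKey {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/test", "user", "password");
            Statement st = conn.createStatement();
            st.executeUpdate(
                    "CREATE TABLE baz ("
                    + " baz_id      SERIAL PRIMARY KEY,"        // integer surrogate key
                    + " baz_number  CHAR(15) NOT NULL UNIQUE,"  // still indexed, via UNIQUE
                    + " customer_id CHAR(39)"                   // remaining columns as in the original
                    + ")");
            st.close();
            conn.close();
        }
    }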