Re: Performance Concern - Mailing list pgsql-performance
From:           John Pagakis
Subject:        Re: Performance Concern
Date:
Msg-id:         KKEBKDPPLALEFHBEAOCCIEDDDEAA.thebfh@toolsmythe.com
In response to: Re: Performance Concern (Christopher Browne <cbbrowne@libertyrms.info>)
Responses:      Re: Performance Concern
List:           pgsql-performance
Christopher - Thanks.

Answer 1: I believe auto-commit was off (but I'm not at my dev box right now). I'll double-check that and the commit interval.

Answer 2: Ah ha!! No indexes on the FKs. I'll try that. Yes, each baz is uniquely identifiable. I had started an SP to generate the key but scrapped it when I saw no rand() function in plpgsql. Did I miss something?

Turns out switching to ints brought no improvement on the inserts, but a rather large one on the updates. Also, I saw evidence in my testing that Postgres seemed to like doing single updates as opposed to being handed a group of updates; see my response to Rod Taylor's post here. (And Rod, if you're reading this: you were *GREAT* in "The Time Machine" <g>!!)

Answer 3: Oh, there was no question three .... <g>!!

Thanks again Christopher!!

__________________________________________________________________
John Pagakis
Email: ih8spam_thebfh@toolsmythe.com

"I am *SINCERE* about life, but I'm not *SERIOUS* about it."
    -- Alan Watts

This signature generated by
    ... and I Quote!!(tm) Copyright (c) 1999
    SpaZmodic Frog Software, Inc.
    www.spazmodicfrog.com

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Christopher Browne
Sent: Friday, October 24, 2003 12:11 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Concern

john@pagakis.com ("John Pagakis") writes:
> Greetings.
>
> I have a table that will require 100,000 rows initially.
>
> Assume the following (some of the field names have been changed for
> confidentiality reasons):
>
> CREATE TABLE baz (
>     baz_number           CHAR(15) NOT NULL,
>     customer_id          CHAR(39),
>     foobar_id            INTEGER,
>     is_cancelled         BOOL DEFAULT false NOT NULL,
>     create_user          VARCHAR(60) NOT NULL,
>     create_datetime      TIMESTAMP DEFAULT 'now()' NOT NULL,
>     last_update_user     VARCHAR(60) NOT NULL,
>     last_update_datetime TIMESTAMP DEFAULT 'now()' NOT NULL,
>     CONSTRAINT PK_baz PRIMARY KEY (baz_number)
> );
>
> ALTER TABLE baz
>     ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
>
> ALTER TABLE baz
>     ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);
>
> Using JDBC, it took approximately one hour to insert 100,000 records.
> I have an algorithm to generate a unique baz_number - it is a mixture
> of alpha and numerics.

Question #1: How did you do the inserts?

If AUTO-COMMIT was turned on, then that would indicate that you invoked 100,000 transactions, and that would contribute considerably to the process being slow. Put them all in as one transaction and you'd probably see it run in a fraction of the time.

Question #2: Do you have indices on purchase(customer_id) and on foobar(foobar_id)? If not, then the foreign key check would be rather inefficient.
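Concretely, the two fixes might look something like this (an untested sketch: the index names and the 'loader' user are invented, and the indexes are redundant wherever those columns are already primary keys):

CREATE INDEX idx_purchase_customer_id ON purchase (customer_id);
CREATE INDEX idx_foobar_foobar_id ON foobar (foobar_id);

-- One transaction instead of 100,000; from JDBC this means
-- setAutoCommit(false) plus a single commit() at the end.
BEGIN;
INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('ABC123DEF456GHI', 'loader', 'loader');
-- ... the remaining 99,999 INSERTs ...
COMMIT;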
> There is a purchase table; one purchase can have many associated baz
> records, but the baz records will be pre-allocated - baz.customer_id
> allows null.  The act of purchasing a baz will cause baz.customer_id
> to be populated from the customer_id (key) field in the purchase
> table.
>
> If it took an hour to insert 100,000 records, I can only imagine how
> much time it will take if one customer were to attempt to purchase
> all 100,000 baz.  Certainly too long for a web page.

I take it that each "baz" is a uniquely identifiable product, akin to (say) an RSA certificate or the like?

By the way, if you set up a stored procedure in PostgreSQL that can generate the "baz_number" identifiers, you could probably do the inserts Right Well Fast...

Consider the following. I have a stored procedure, genauth(), which generates quasi-random values. (They're passwords, sort of...)

cctld=# explain analyze insert into baz (baz_number, create_user, last_update_user)
cctld-#   select substr(genauth(), 1, 15), 'cbbrowne', 'cbbrowne' from big_table;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on big_table  (cost=0.00..789.88 rows=28988 width=0) (actual time=0.20..1713.60 rows=28988 loops=1)
 Total runtime: 3197.40 msec
(2 rows)

It took about 3 seconds to insert 28988 rows into baz. (big_table, also renamed to protect the innocent, has 28988 rows. I didn't care about its contents, just that it had a bunch of rows.)

And the above is on a cheap desktop PC with IDE disk.

> I've not had to deal with this kind of volume in Postgres before; I
> have my suspicions on what is wrong here (could it be using a
> CHAR(15) as a key?) but I'd *LOVE* any thoughts.
>
> Would I be better off making the key an identity field and not
> indexing on baz_number?

That might be something of an improvement, but it oughtn't be cripplingly different to use a text field rather than an integer. What's crippling is submitting 100,000 queries in 100,000 transactions. Cut THAT down to size and you'll see performance return to being reasonable.

--
"cbbrowne","@","libertyrms.info" <http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
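genauth() itself was never posted, so what follows is a rough illustration only of that sort of generator in PL/pgSQL. It also bears on the rand() question above: PostgreSQL's built-in is random(), not rand(). The function name here is invented, and md5() and generate_series() postdate the 7.x releases that were current at the time:

-- Illustrative sketch only; not Christopher's genauth().
CREATE OR REPLACE FUNCTION gen_baz_number() RETURNS text AS $$
BEGIN
    -- md5() over random() yields hex text; trim it to the
    -- 15-character key width.  Collisions are possible, so a real
    -- generator must still guarantee uniqueness against the PK.
    RETURN upper(substr(md5(random()::text), 1, 15));
END;
$$ LANGUAGE plpgsql;

-- Bulk-load 100,000 rows in a single statement and transaction:
INSERT INTO baz (baz_number, create_user, last_update_user)
SELECT gen_baz_number(), 'loader', 'loader'
FROM generate_series(1, 100000);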