Re: Performance Concern - Mailing list pgsql-performance

From: John Pagakis
Subject: Re: Performance Concern
Message-ID: KKEBKDPPLALEFHBEAOCCIEDDDEAA.thebfh@toolsmythe.com
In response to: Re: Performance Concern (Christopher Browne <cbbrowne@libertyrms.info>)
List: pgsql-performance

Christopher -
Thanks.

Answer 1:
I believe auto commit was off (but I'm not at my dev box right now).  I'll
double-check that and the commit interval.
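
(Just so I test the right shape: one transaction around the whole batch,
along these lines -- values made up; through JDBC that would be
conn.setAutoCommit(false) up front and a single conn.commit() at the end:)

BEGIN;
INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('ABC123DEF456GHI', 'jpagakis', 'jpagakis');
-- ... the other 99,998 inserts ...
INSERT INTO baz (baz_number, create_user, last_update_user)
    VALUES ('JKL789MNO012PQR', 'jpagakis', 'jpagakis');
COMMIT;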

Answer 2:
Ah ha!!  No indexes on FKs.  I'll try that.
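
(Concretely, I take it you mean something like this -- the index names are
my own:)

CREATE INDEX purchase_customer_id_idx ON purchase (customer_id);
CREATE INDEX foobar_foobar_id_idx ON foobar (foobar_id);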

Yes, each baz is uniquely identifiable.  I had started a stored procedure to
generate the key, but scrapped it when I saw no rand() function in pl/pgsql.
Did I miss something?
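
(Looking again, it may be that the built-in is spelled random() rather than
rand().  What I had started was roughly the sketch below -- gen_baz_number
is just a name I picked, it assumes plpgsql is installed in the database,
and random characters alone don't guarantee uniqueness, so the primary key
still has to catch collisions:)

CREATE FUNCTION gen_baz_number() RETURNS text AS '
DECLARE
    chars  TEXT := ''ABCDEFGHJKMNPQRSTUVWXYZ23456789'';
    result TEXT := '''';
    i      INTEGER;
BEGIN
    -- pick 15 characters at random from chars
    FOR i IN 1..15 LOOP
        result := result ||
            substr(chars, 1 + (random() * (length(chars) - 1))::int, 1);
    END LOOP;
    RETURN result;
END;
' LANGUAGE 'plpgsql';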

Turns out switching to ints made no improvement on the inserts, but a rather
large one on the updates.  Also, I saw evidence in my testing that Postgres
seemed to like doing single updates as opposed to being handed a group of
updates; see my response to Rod Taylor's post here.  (And Rod, if you're
reading this: you were *GREAT* in "The Time Machine" <g>!!)
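
(By "a group of updates" I mean one statement touching several rows, as
opposed to one statement per row -- roughly these two shapes, key values
made up:)

-- one row at a time:
UPDATE baz SET customer_id = 'C-0001' WHERE baz_number = 'ABC123DEF456GHI';
-- versus several rows in one statement:
UPDATE baz SET customer_id = 'C-0001'
    WHERE baz_number IN ('ABC123DEF456GHI', 'JKL789MNO012PQR');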

Answer 3:
Oh, there was no question three .... <g>!!


Thanks again Christopher!!

__________________________________________________________________
John Pagakis
Email: ih8spam_thebfh@toolsmythe.com


"I am *SINCERE* about life, but I'm not *SERIOUS* about it."
        -- Alan Watts

This signature generated by
     ... and I Quote!!(tm) Copyright (c) 1999 SpaZmodic Frog Software, Inc.
                                              www.spazmodicfrog.com


-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Christopher Browne
Sent: Friday, October 24, 2003 12:11 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Concern


john@pagakis.com ("John Pagakis") writes:
> Greetings.
>
> I have a table that will require 100,000 rows initially.
>
> Assume the following (some of the field names have been changed for
> confidentiality reasons):
>
> CREATE TABLE baz (
>     baz_number CHAR(15) NOT NULL,
>     customer_id CHAR(39),
>     foobar_id INTEGER,
>     is_cancelled BOOL DEFAULT false NOT NULL,
>     create_user VARCHAR(60) NOT NULL,
>     create_datetime TIMESTAMP DEFAULT 'now()' NOT NULL,
>     last_update_user VARCHAR(60) NOT NULL,
>     last_update_datetime TIMESTAMP DEFAULT 'now()' NOT NULL,
>     CONSTRAINT PK_baz PRIMARY KEY (baz_number)
> );
>
> ALTER TABLE baz
>     ADD FOREIGN KEY (customer_id) REFERENCES purchase (customer_id);
>
> ALTER TABLE baz
>     ADD FOREIGN KEY (foobar_id) REFERENCES foobar (foobar_id);
>
> Using JDBC, it took approximately one hour to insert 100,000 records.  I
> have an algorithm to generate a unique baz_number - it is a mixture of
> alpha and numerics.

Question #1:  How did you do the inserts?

If AUTO-COMMIT was turned on, then that would indicate that you
invoked 100,000 transactions, and that would contribute considerably
to the process being slow.  Put them all in as one transaction and
you'd probably see it run in a fraction of the time.

Question #2.  Do you have indices on purchase(customer_id) and on
foobar(foobar_id)?

If not, then the foreign key check would be rather inefficient.

> There is a purchase table; one purchase can have many associated baz
> records, but the baz records will be pre-allocated - baz.customer_id
> allows null.  The act of purchasing a baz will cause baz.customer_id
> to be populated from the customer_id (key) field in the purchase
> table.
>
> If it took an hour to insert 100,000 records, I can only imagine how
> much time it will take if one customer were to attempt to purchase
> all 100,000 baz.  Certainly too long for a web page.
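
Note, too, that the purchase step needn't be 100,000 statements; if one
customer really did buy every unallocated baz, that's a single UPDATE,
something like this (customer and user values invented for the sketch):

update baz
   set customer_id = 'C-0001',
       last_update_user = 'webapp',
       last_update_datetime = now()
 where customer_id is null;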

I take it that each "baz" is a uniquely identifiable product, akin to
(say) an RSA certificate or the like?

By the way, if you set up a stored procedure in PostgreSQL that can
generate the "baz_number" identifiers, you could probably do the
inserts Right Well Fast...

Consider the following.  I have a stored procedure, genauth(), which
generates quasi-random values.  (They're passwords, sort of...)

cctld=# explain analyze insert into baz (baz_number, create_user, last_update_user)
cctld-#   select substr(genauth(), 1, 15), 'cbbrowne', 'cbbrowne' from big_table;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on big_table  (cost=0.00..789.88 rows=28988 width=0) (actual time=0.20..1713.60 rows=28988 loops=1)
 Total runtime: 3197.40 msec
(2 rows)

It took about 3 seconds to insert 28988 rows into baz.  (big_table,
also renamed, to protect the innocent, has 28988 rows. I didn't care
about its contents, just that it had a bunch of rows.)

And the above is on a cheap desktop PC with IDE disk.

> I've not had to deal with this kind of volume in Postgres before; I
> have my suspicions on what is wrong here (could it be using a CHAR(
> 15 ) as a key?)  but I'd *LOVE* any thoughts.

> Would I be better off making the key an identity field and not
> indexing on baz_number?

That might be something of an improvement, but it oughtn't be
cripplingly different to use a text field rather than an integer.
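
If you want to try it, the usual idiom is a SERIAL surrogate key, keeping
baz_number as a plain UNIQUE column -- a sketch only:

CREATE TABLE baz (
    baz_id               SERIAL PRIMARY KEY,        -- integer surrogate key
    baz_number           CHAR(15) NOT NULL UNIQUE,  -- still unique, no longer the PK
    customer_id          CHAR(39),
    foobar_id            INTEGER,
    is_cancelled         BOOL DEFAULT false NOT NULL,
    create_user          VARCHAR(60) NOT NULL,
    create_datetime      TIMESTAMP DEFAULT now() NOT NULL,  -- unquoted now(): evaluated per insert
    last_update_user     VARCHAR(60) NOT NULL,
    last_update_datetime TIMESTAMP DEFAULT now() NOT NULL
);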

What's crippling is submitting 100,000 queries in 100,000
transactions.  Cut THAT down to size and you'll see performance return
to being reasonable.
--
"cbbrowne","@","libertyrms.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

