Re: index creation order? - Mailing list pgsql-performance

From Josh Berkus
Subject Re: index creation order?
Date
Msg-id 200310310910.30504.josh@agliodbs.com
Whole thread Raw
In response to index creation order?  (Allen Landsidel <all@biosys.net>)
Responses Re: index creation order?  (Allen Landsidel <all@biosys.net>)
List pgsql-performance
Allen,

> a) CREATE TABLE with no indexes or keys.  Run the COPY (fast, ~30min), then
> CREATE INDEX on each column it's needed on, and ALTER TABLE for the pk and
> each fk needed.

Did you ANALYZE after the copy?

> If there isn't a significant difference between all of them, performance
> wise, I think something is dreadfully wrong here.  Running "a", the ALTER
> TABLE to add the PK ran for 17 hours and still wasn't finished.

Adding the *primary key* locked up?   This seems unlikely; we have a known
problem with *foreign* keys until the current beta.  But I've added primary
keys on 20Gb tables and had it complete in a couple of hours.  Ignore this
adivice and look for Stephan Szabo's FK patch instead if what you really
meant was that the FK creation locked up.

> shared_buffers = 30000
hmmm ... 236MB ....
> max_fsm_pages = 2000000
2MB, fine ...
> wal_buffers = 128
1MB, also fine ...
> sort_mem = 1310720 (1.2GB)
Problem here.   As documented everywhere, sort_mem is allocated *per sort* not
per query, user, or shared.   This means that if the "add PK" operation
involves 2 or more sorts (not sure, haven't tested it), then you're
allocating .7GB RAM more than you acutally have.  This may be the cause of
your problem, particularly if *anything* is going on concurrent to the load.

> checkpoint_segments = 64
IF you have the disk space (+ 2GB) I'd raise this to 150-300 during the load
operation.

> commit_delay = 20000
> commit_siblings = 2
These settings are for heavy multi-user update activity. They are not useful
for a single-user load, and may even lower performance.

> stats_start_collector = true
> stats_command_string = true
> stats_row_level = true
> stats_block_level = true

If you can do without stats collection during load, I would suggest that you
do so.  The above add both RAM and I/O overhead to your operation.

--
Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Pg+Linux swap use
Next
From: Allen Landsidel
Date:
Subject: Re: index creation order?