Thread: Dump performance problems following server crash

Dump performance problems following server crash

From
Kim
Date:
Hello Performance,

Yesterday, with help from the admin list, we took care of a problem we were having with creating tables/views/indexes following a server overheat & crash (an index on pg_attribute was corrupted, causing the create to hang and go through the roof on memory usage until it failed out - a reindex fixed it like charm). Following the repair of creates (we also took the system into single user to run reindex system), and with no other known problems with the db itself, we immediately began a dump of the database.

Typical dump time: ~12 hours, we dump overnight but there is still decently heavy activity. However, this dump has the box to itself and after 10 hours we are only about 20% done just with pulling schema for the indexes - something that typically takes it 4-6 hours to complete all schema entirely. Load on the machine is minimal, along with memory usage by the dump process itself (864M, not large for this system). It is definitely moving, but just very slowly. At this point we are attempting to determine if this is a machine level problem (which we haven't seen sign of yet) or still a potential problem in postgres. The dump is currently doing I/O at 10mbps, but in testing our sys admin reports he has no problem getting stronger I/O stats from other processes.

The current dump query running:
SELECT t.tableoid, t.oid, t.relname as indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef, t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype, c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid = t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid = c.oid) WHERE i.indrelid = '44240'::pg_catalog.oid ORDER BY indexname

Amount of time it took me to run the query from console: ~5secs (I'm counting in my head, sophisticated, eh?)

We tend to respond to slow queries with vacs and analyzes, but considering these are system tables that have recently been reindexed, how likely is it that we could improve things by doing a vac? At this point we plan to halt the dump and run a vac full on the db, but any ideas you may have as to why the dump is sluggish on getting this information, I'd appreciate them.

spec info -
Postgres 8.1.4
db size: 200+ GB
101,745 tables
314,821 indexes
1,569 views
maintenance_work_mem = 262144

Server:
OS: Solaris 10
Sunfire X4100 XL
2x AMD Opteron Model 275 dual core procs
8GB of ram
(this server overheated & crashed due to a cooling problem at the hosting service)

On top of a:
Sun Storedge 6130
14x 146GB Drives in a Raid 5
Brocade 200E switches
Emulex 4gb HBAs
(this server had no known problems)

Thanks in advance,
Kim Hatcher

Re: Dump performance problems following server crash

From
Tom Lane
Date:
Kim <kim@myemma.com> writes:
> The current dump query running:
> SELECT t.tableoid, t.oid, t.relname as indexname,
> pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef, t.relnatts as
> indnkeys, i.indkey, i.indisclustered, c.contype, c.conname, c.tableoid
> as contableoid, c.oid as conoid, (SELECT spcname FROM
> pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace
> FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
> i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
> t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
> pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
> = c.oid) WHERE i.indrelid = '44240'::pg_catalog.oid ORDER BY indexname

> Amount of time it took me to run the query from console: ~5secs (I'm
> counting in my head, sophisticated, eh?)

Even 5 seconds is way too long.  You've apparently still got something
corrupted somewhere.  Did you reindex *all* the system catalogs?

            regards, tom lane

Re: Dump performance problems following server crash

From
Kim
Date:
We dropped into single user mode and ran reindex system - it was my understanding this would reindex them all, including shared catalogs - but perhaps not?

Kim

Tom Lane wrote:
Kim <kim@myemma.com> writes: 
The current dump query running:
SELECT t.tableoid, t.oid, t.relname as indexname, 
pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef, t.relnatts as 
indnkeys, i.indkey, i.indisclustered, c.contype, c.conname, c.tableoid 
as contableoid, c.oid as conoid, (SELECT spcname FROM 
pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as tablespace 
FROM pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid = 
i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid = 
t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN 
pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid 
= c.oid) WHERE i.indrelid = '44240'::pg_catalog.oid ORDER BY indexname   
 
Amount of time it took me to run the query from console: ~5secs (I'm 
counting in my head, sophisticated, eh?)   
Even 5 seconds is way too long.  You've apparently still got something
corrupted somewhere.  Did you reindex *all* the system catalogs?
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
 

--

kim hatcher
senior developer, emma®
e: kim@myemma.com
p: 800 595 4401
w: www.myemma.com