Postgres server locks up - Mailing list pgsql-general
From | ryan_rs@c4.com (Ryan) |
---|---|
Subject | Postgres server locks up |
Date | |
Msg-id | 7299ab58.0110081128.50cf8fba@posting.google.com Whole thread Raw |
Responses |
Re: Postgres server locks up
|
List | pgsql-general |
I changed the number of shared buffers to 3000 and my database locks on a simple query. I must kill the database with pg_ctl stop -m i. Neither "smart" nor "fast" stops appear to succeed. One CPU gets pinned. When I set the number of shared buffers to 64 everything is fine. No data appears to have been corrupted, but I haven't been able to do a thorough check. I really didn't have a chance to do much of a postmortem. I only have what I can get from my logs: - schema - query - query plan - vacuum results - postgres log If anybody wants more data I can reproduce the problem. It is very repeatable. Also, I prefer to discover what went wrong rather than simply upgrade to 7.1.3 and hope for the best. None of the fixes/enhancements listed in 7.1.3 seem relevant to this problem. I am running postgres 7.1.2 on Solaris 5.7 E450, 4 processors, 2 gig ram. I added the following lines to /etc/system: set shmsys:shminfo_shmmax=0x10000000 set shmsys:shminfo_shmmin=1 set shmsys:shminfo_shmmni=256 set shmsys:shminfo_shmseg=256 set semsys:seminfo_semmap=256 set semsys:seminfo_semmni=512 set semsys:seminfo_semmsl=32 set semsys:seminfo_semmns=512 set rlim_fd_max=65535 set rlim_fd_cur=65535 Here is the query that locks the DB: select i8, i5, count(*), sum( float8mi(d2,d1) ), sum(i9)::int4 as doll from calls_1001548800 group by i5, i8; i8 has about 5 unique values. i5 has two unique values (boolean) plus NULL. The query was entered into psql by hand. Here is the table def: CREATE TABLE calls_1001548800 ( pk int4 primary key, t1 text NOT NULL, t2 text NOT NULL, i1 int4 NOT NULL, i2 int4 NOT NULL, d1 double precision NOT NULL, d2 double precision NOT NULL, d3 double precision NOT NULL, d4 double precision NOT NULL, i3 int4 NOT NULL, d5 real NOT NULL, i4 int4 NOT NULL, t3 text, i5 boolean, i6 int4 NOT NULL, i7 int4 NOT NULL, i8 int2 NOT NULL, i9 int4, t4 text ); CREATE INDEX calls_1001548800_d21 on calls_1001548800 ( float8mi(d2,d1) ); CREATE INDEX calls_1001548800_i3 on calls_1001548800 ( i3 ); CREATE INDEX calls_1001548800_i9 on calls_1001548800 ( i9 ); Vacuum of the table (before the query): 5681:DEBUG: --Relation calls_1001548800-- 5682:DEBUG: Pages 915: Changed 54, reaped 559, Empty 0, New 0; Tup 33842: Vac 1077, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 1 42, MaxLen 648; Re-using: Free/Avail. Space 279912/279912; EndEmpty/Avail. Pages 0/559. CPU 0.00s/0.11u sec. 5683:DEBUG: Index calls_1001548800_pkey: Pages 179; Tuples 33842: Deleted 1077. CPU 0.00s/0.66u sec. 5684:DEBUG: Index calls_1001548800_d21: Pages 95; Tuples 33842: Deleted 1077. CPU 0.00s/0.70u sec. 5685:DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES 5686:DEBUG: Index calls_1001548800_i3: Pages 98; Tuples 33842: Deleted 1077. CPU 0.39s/0.78u sec. 5687:DEBUG: Index calls_1001548800_i9: Pages 100; Tuples 33842: Deleted 1077. CPU 0.03s/0.66u sec. 5688:DEBUG: Rel calls_1001548800: Pages: 915 --> 886; Tuple(s) moved: 1122. CPU 0.00s/1.16u sec. 5689:DEBUG: Index calls_1001548800_pkey: Pages 179; Tuples 33842: Deleted 1122. CPU 0.00s/0.38u sec. 5690:DEBUG: Index calls_1001548800_d21: Pages 95; Tuples 33842: Deleted 1122. CPU 0.00s/0.40u sec. 5691:DEBUG: Index calls_1001548800_i3: Pages 102; Tuples 33842: Deleted 1122. CPU 0.01s/0.39u sec. 5692:DEBUG: Index calls_1001548800_i9: Pages 104; Tuples 33842: Deleted 1122. CPU 0.00s/0.40u sec. Here is a snippet of the log file showing the query plan and the events after the query. 7553:NOTICE: QUERY PLAN: 7554: 7555:Aggregate (cost=3770.44..4108.86 rows=3384 width=19) 7556: -> Group (cost=3770.44..3939.65 rows=33842 width=19) 7557: -> Sort (cost=3770.44..3770.44 rows=33842 width=19) 7558: -> Seq Scan on calls_1001548800 (cost=0.00..1224.42 rows=33 842 width=19) 7559: 7560:Smart Shutdown request at Tue Oct 2 18:34:08 2001 7561:Fast Shutdown request at Tue Oct 2 18:35:03 2001 7562:Aborting any active transaction... 7563:FATAL 1: This connection has been terminated by the administrator. 7564:Immediate Shutdown request at Tue Oct 2 18:39:47 2001 7565:NOTICE: Message from PostgreSQL backend: 7566: The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory. 7567: I have rolled back the current transaction and am going to termina te your database system connection and exit. 7568: Please reconnect to the database system and repeat your query. 7569:DEBUG: database system was interrupted at 2001-10-02 18:11:06 PDT 7570:DEBUG: CheckPoint record at (39, 1811142680) 7571:DEBUG: Redo record at (39, 1811142680); Undo record at (0, 0); Shutdown FA LSE 7572:DEBUG: NextTransactionId: 2086582; NextOid: 134710037 7573:DEBUG: database system was not properly shut down; automatic recovery in p rogress... 7574:DEBUG: ReadRecord: record with zero len at (39, 1811142744) 7575:DEBUG: redo is not required 7576:DEBUG: database system is in production state Ryan
pgsql-general by date: