Re: Postgres server locks up, HELP! - Mailing list pgsql-hackers
From | ryan_rs@c4.com (Ryan) |
---|---|
Subject | Re: Postgres server locks up, HELP! |
Date | |
Msg-id | 7299ab58.0110101007.4fcd7491@posting.google.com Whole thread Raw |
List | pgsql-hackers |
Tom, Bruce, any suggestions? ryan_rs@c4.com (Ryan) wrote in message news:<7299ab58.0110081128.50cf8fba@posting.google.com>... > I changed the number of shared buffers to 3000 and my database locks > on a simple query. I must kill the database with pg_ctl stop -m i. > Neither "smart" nor "fast" stops appear to succeed. One CPU gets > pinned. > > When I set the number of shared buffers to 64 everything is fine. No > data appears to have been corrupted, but I haven't been able to do a > thorough check. I really didn't have a chance to do much of a > postmortem. I only have what I can get from my logs: > - schema > - query > - query plan > - vacuum results > - postgres log > > If anybody wants more data I can reproduce the problem. It is very > repeatable. Also, I prefer to discover what went wrong rather than > simply upgrade to 7.1.3 and hope for the best. None of the > fixes/enhancements listed in 7.1.3 seem relevant to this problem. > > I am running postgres 7.1.2 on Solaris 5.7 E450, 4 processors, 2 gig > ram. > > I added the following lines to /etc/system: > set shmsys:shminfo_shmmax=0x10000000 > set shmsys:shminfo_shmmin=1 > set shmsys:shminfo_shmmni=256 > set shmsys:shminfo_shmseg=256 > set semsys:seminfo_semmap=256 > set semsys:seminfo_semmni=512 > set semsys:seminfo_semmsl=32 > set semsys:seminfo_semmns=512 > set rlim_fd_max=65535 > set rlim_fd_cur=65535 > > Here is the query that locks the DB: > select i8, i5, count(*), sum( float8mi(d2,d1) ), sum(i9)::int4 as > doll from > calls_1001548800 group by i5, i8; > i8 has about 5 unique values. > i5 has two unique values (boolean) plus NULL. > > The query was entered into psql by hand. > > > Here is the table def: > CREATE TABLE calls_1001548800 ( > pk int4 primary key, > t1 text NOT NULL, > t2 text NOT NULL, > i1 int4 NOT NULL, > i2 int4 NOT NULL, > d1 double precision NOT NULL, > d2 double precision NOT NULL, > d3 double precision NOT NULL, > d4 double precision NOT NULL, > i3 int4 NOT NULL, > d5 real NOT NULL, > i4 int4 NOT NULL, > t3 text, > i5 boolean, > i6 int4 NOT NULL, > i7 int4 NOT NULL, > i8 int2 NOT NULL, > i9 int4, > t4 text > ); > CREATE INDEX calls_1001548800_d21 on calls_1001548800 ( > float8mi(d2,d1) ); > CREATE INDEX calls_1001548800_i3 on calls_1001548800 ( i3 ); > CREATE INDEX calls_1001548800_i9 on calls_1001548800 ( i9 ); > > > Vacuum of the table (before the query): > > 5681:DEBUG: --Relation calls_1001548800-- > 5682:DEBUG: Pages 915: Changed 54, reaped 559, Empty 0, New 0; Tup > 33842: Vac 1077, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 1 > 42, MaxLen 648; Re-using: Free/Avail. Space 279912/279912; > EndEmpty/Avail. Pages 0/559. CPU 0.00s/0.11u sec. > 5683:DEBUG: Index calls_1001548800_pkey: Pages 179; Tuples 33842: > Deleted 1077. CPU 0.00s/0.66u sec. > 5684:DEBUG: Index calls_1001548800_d21: Pages 95; Tuples 33842: > Deleted 1077. CPU 0.00s/0.70u sec. > 5685:DEBUG: XLogWrite: new log file created - consider increasing > WAL_FILES > 5686:DEBUG: Index calls_1001548800_i3: Pages 98; Tuples 33842: > Deleted 1077. CPU 0.39s/0.78u sec. > 5687:DEBUG: Index calls_1001548800_i9: Pages 100; Tuples 33842: > Deleted 1077. CPU 0.03s/0.66u sec. > 5688:DEBUG: Rel calls_1001548800: Pages: 915 --> 886; Tuple(s) moved: > 1122. CPU 0.00s/1.16u sec. > 5689:DEBUG: Index calls_1001548800_pkey: Pages 179; Tuples 33842: > Deleted 1122. CPU 0.00s/0.38u sec. > 5690:DEBUG: Index calls_1001548800_d21: Pages 95; Tuples 33842: > Deleted 1122. CPU 0.00s/0.40u sec. > 5691:DEBUG: Index calls_1001548800_i3: Pages 102; Tuples 33842: > Deleted 1122. CPU 0.01s/0.39u sec. > 5692:DEBUG: Index calls_1001548800_i9: Pages 104; Tuples 33842: > Deleted 1122. CPU 0.00s/0.40u sec. > > > Here is a snippet of the log file showing the query plan and the > events after the query. > > 7553:NOTICE: QUERY PLAN: > 7554: > 7555:Aggregate (cost=3770.44..4108.86 rows=3384 width=19) > 7556: -> Group (cost=3770.44..3939.65 rows=33842 width=19) > 7557: -> Sort (cost=3770.44..3770.44 rows=33842 width=19) > 7558: -> Seq Scan on calls_1001548800 > (cost=0.00..1224.42 rows=33 > 842 width=19) > 7559: > 7560:Smart Shutdown request at Tue Oct 2 18:34:08 2001 > 7561:Fast Shutdown request at Tue Oct 2 18:35:03 2001 > 7562:Aborting any active transaction... > 7563:FATAL 1: This connection has been terminated by the > administrator. > 7564:Immediate Shutdown request at Tue Oct 2 18:39:47 2001 > 7565:NOTICE: Message from PostgreSQL backend: > 7566: The Postmaster has informed me that some other backend died > abnormally > and possibly corrupted shared memory. > 7567: I have rolled back the current transaction and am going > to termina > te your database system connection and exit. > 7568: Please reconnect to the database system and repeat your query. > 7569:DEBUG: database system was interrupted at 2001-10-02 18:11:06 > PDT > 7570:DEBUG: CheckPoint record at (39, 1811142680) > 7571:DEBUG: Redo record at (39, 1811142680); Undo record at (0, 0); > Shutdown FA > LSE > 7572:DEBUG: NextTransactionId: 2086582; NextOid: 134710037 > 7573:DEBUG: database system was not properly shut down; automatic > recovery in p > rogress... > 7574:DEBUG: ReadRecord: record with zero len at (39, 1811142744) > 7575:DEBUG: redo is not required > 7576:DEBUG: database system is in production state > > > Ryan
pgsql-hackers by date: