Postgres server locks up - Mailing list pgsql-general

From ryan_rs@c4.com (Ryan)
Subject Postgres server locks up
Date
Msg-id 7299ab58.0110081128.50cf8fba@posting.google.com
Whole thread Raw
Responses Re: Postgres server locks up  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I changed the number of shared buffers to 3000 and my database locks
on a simple query.  I must kill the database with pg_ctl stop -m i.
Neither "smart" nor "fast" stops appear to succeed.  One CPU gets
pinned.

When I set the number of shared buffers to 64 everything is fine.  No
data appears to have been corrupted, but I haven't been able to do a
thorough check.  I really didn't have a chance to do much of a
postmortem.  I only have what I can get from my logs:
 - schema
 - query
 - query plan
 - vacuum results
 - postgres log

If anybody wants more data I can reproduce the problem.  It is very
repeatable.  Also, I prefer to discover what went wrong rather than
simply upgrade to 7.1.3 and hope for the best.  None of the
fixes/enhancements listed in 7.1.3 seem relevant to this problem.

I am running postgres 7.1.2 on Solaris 5.7 E450, 4 processors, 2 gig
ram.

I added the following lines to /etc/system:
  set shmsys:shminfo_shmmax=0x10000000
  set shmsys:shminfo_shmmin=1
  set shmsys:shminfo_shmmni=256
  set shmsys:shminfo_shmseg=256
  set semsys:seminfo_semmap=256
  set semsys:seminfo_semmni=512
  set semsys:seminfo_semmsl=32
  set semsys:seminfo_semmns=512
  set rlim_fd_max=65535
  set rlim_fd_cur=65535

Here is the query that locks the DB:
  select i8, i5, count(*), sum( float8mi(d2,d1) ), sum(i9)::int4 as
doll from
  calls_1001548800 group by i5, i8;
i8 has about 5 unique values.
i5 has two unique values (boolean) plus NULL.

The query was entered into psql by hand.


Here is the table def:
  CREATE TABLE calls_1001548800 (
  pk    int4 primary key,
  t1    text NOT NULL,
  t2    text NOT NULL,
  i1    int4 NOT NULL,
  i2    int4 NOT NULL,
  d1    double precision NOT NULL,
  d2    double precision NOT NULL,
  d3    double precision NOT NULL,
  d4    double precision NOT NULL,
  i3    int4 NOT NULL,
  d5    real NOT NULL,
  i4    int4 NOT NULL,
  t3    text,
  i5    boolean,
  i6    int4 NOT NULL,
  i7    int4 NOT NULL,
  i8    int2 NOT NULL,
  i9    int4,
  t4    text
  );
  CREATE INDEX calls_1001548800_d21 on calls_1001548800 (
float8mi(d2,d1) );
  CREATE INDEX calls_1001548800_i3 on calls_1001548800 ( i3 );
  CREATE INDEX calls_1001548800_i9 on calls_1001548800 ( i9 );


Vacuum of the table (before the query):

5681:DEBUG:  --Relation calls_1001548800--
5682:DEBUG:  Pages 915: Changed 54, reaped 559, Empty 0, New 0; Tup
33842: Vac 1077, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 1
42, MaxLen 648; Re-using: Free/Avail. Space 279912/279912;
EndEmpty/Avail. Pages 0/559. CPU 0.00s/0.11u sec.
5683:DEBUG:  Index calls_1001548800_pkey: Pages 179; Tuples 33842:
Deleted 1077. CPU 0.00s/0.66u sec.
5684:DEBUG:  Index calls_1001548800_d21: Pages 95; Tuples 33842:
Deleted 1077. CPU 0.00s/0.70u sec.
5685:DEBUG:  XLogWrite: new log file created - consider increasing
WAL_FILES
5686:DEBUG:  Index calls_1001548800_i3: Pages 98; Tuples 33842:
Deleted 1077. CPU 0.39s/0.78u sec.
5687:DEBUG:  Index calls_1001548800_i9: Pages 100; Tuples 33842:
Deleted 1077. CPU 0.03s/0.66u sec.
5688:DEBUG:  Rel calls_1001548800: Pages: 915 --> 886; Tuple(s) moved:
1122. CPU 0.00s/1.16u sec.
5689:DEBUG:  Index calls_1001548800_pkey: Pages 179; Tuples 33842:
Deleted 1122. CPU 0.00s/0.38u sec.
5690:DEBUG:  Index calls_1001548800_d21: Pages 95; Tuples 33842:
Deleted 1122. CPU 0.00s/0.40u sec.
5691:DEBUG:  Index calls_1001548800_i3: Pages 102; Tuples 33842:
Deleted 1122. CPU 0.01s/0.39u sec.
5692:DEBUG:  Index calls_1001548800_i9: Pages 104; Tuples 33842:
Deleted 1122. CPU 0.00s/0.40u sec.


Here is a snippet of the log file showing the query plan and the
events after the query.

7553:NOTICE:  QUERY PLAN:
7554:
7555:Aggregate  (cost=3770.44..4108.86 rows=3384 width=19)
7556:  ->  Group  (cost=3770.44..3939.65 rows=33842 width=19)
7557:        ->  Sort  (cost=3770.44..3770.44 rows=33842 width=19)
7558:              ->  Seq Scan on calls_1001548800
(cost=0.00..1224.42 rows=33
842 width=19)
7559:
7560:Smart Shutdown request at Tue Oct  2 18:34:08 2001
7561:Fast Shutdown request at Tue Oct  2 18:35:03 2001
7562:Aborting any active transaction...
7563:FATAL 1:  This connection has been terminated by the
administrator.
7564:Immediate Shutdown request at Tue Oct  2 18:39:47 2001
7565:NOTICE:  Message from PostgreSQL backend:
7566:   The Postmaster has informed me that some other backend  died
abnormally
and possibly corrupted shared memory.
7567:   I have rolled back the current transaction and am       going
to termina
te your database system connection and exit.
7568:   Please reconnect to the database system and repeat your query.
7569:DEBUG:  database system was interrupted at 2001-10-02 18:11:06
PDT
7570:DEBUG:  CheckPoint record at (39, 1811142680)
7571:DEBUG:  Redo record at (39, 1811142680); Undo record at (0, 0);
Shutdown FA
LSE
7572:DEBUG:  NextTransactionId: 2086582; NextOid: 134710037
7573:DEBUG:  database system was not properly shut down; automatic
recovery in p
rogress...
7574:DEBUG:  ReadRecord: record with zero len at (39, 1811142744)
7575:DEBUG:  redo is not required
7576:DEBUG:  database system is in production state


Ryan

pgsql-general by date:

Previous
From: Kevin HaleBoyes
Date:
Subject: triggers
Next
From: Laurent MARTIN
Date:
Subject: Help with DB corruption: corrupted catalogs!!!