Re: Postgres server locks up, HELP! - Mailing list pgsql-hackers

From ryan_rs@c4.com (Ryan)
Subject Re: Postgres server locks up, HELP!
Date
Msg-id 7299ab58.0110101007.4fcd7491@posting.google.com
Whole thread Raw
List pgsql-hackers
Tom, Bruce, any suggestions?

ryan_rs@c4.com (Ryan) wrote in message news:<7299ab58.0110081128.50cf8fba@posting.google.com>...
> I changed the number of shared buffers to 3000 and my database locks
> on a simple query.  I must kill the database with pg_ctl stop -m i. 
> Neither "smart" nor "fast" stops appear to succeed.  One CPU gets
> pinned.
> 
> When I set the number of shared buffers to 64 everything is fine.  No
> data appears to have been corrupted, but I haven't been able to do a
> thorough check.  I really didn't have a chance to do much of a
> postmortem.  I only have what I can get from my logs:
>  - schema
>  - query
>  - query plan
>  - vacuum results
>  - postgres log
> 
> If anybody wants more data I can reproduce the problem.  It is very
> repeatable.  Also, I prefer to discover what went wrong rather than
> simply upgrade to 7.1.3 and hope for the best.  None of the
> fixes/enhancements listed in 7.1.3 seem relevant to this problem.
> 
> I am running postgres 7.1.2 on Solaris 5.7 E450, 4 processors, 2 gig
> ram.
> 
> I added the following lines to /etc/system:
>   set shmsys:shminfo_shmmax=0x10000000
>   set shmsys:shminfo_shmmin=1
>   set shmsys:shminfo_shmmni=256
>   set shmsys:shminfo_shmseg=256
>   set semsys:seminfo_semmap=256
>   set semsys:seminfo_semmni=512
>   set semsys:seminfo_semmsl=32
>   set semsys:seminfo_semmns=512
>   set rlim_fd_max=65535
>   set rlim_fd_cur=65535
> 
> Here is the query that locks the DB:
>   select i8, i5, count(*), sum( float8mi(d2,d1) ), sum(i9)::int4 as
> doll from
>   calls_1001548800 group by i5, i8;
> i8 has about 5 unique values.
> i5 has two unique values (boolean) plus NULL.
> 
> The query was entered into psql by hand.
> 
> 
> Here is the table def:
>   CREATE TABLE calls_1001548800 (
>   pk    int4 primary key,
>   t1    text NOT NULL,
>   t2    text NOT NULL,
>   i1    int4 NOT NULL,
>   i2    int4 NOT NULL,
>   d1    double precision NOT NULL,
>   d2    double precision NOT NULL,
>   d3    double precision NOT NULL,
>   d4    double precision NOT NULL,
>   i3    int4 NOT NULL,
>   d5    real NOT NULL,
>   i4    int4 NOT NULL,
>   t3    text,
>   i5    boolean,
>   i6    int4 NOT NULL,
>   i7    int4 NOT NULL,
>   i8    int2 NOT NULL,
>   i9    int4,
>   t4    text
>   );
>   CREATE INDEX calls_1001548800_d21 on calls_1001548800 (
> float8mi(d2,d1) );
>   CREATE INDEX calls_1001548800_i3 on calls_1001548800 ( i3 );
>   CREATE INDEX calls_1001548800_i9 on calls_1001548800 ( i9 );
> 
> 
> Vacuum of the table (before the query):
> 
> 5681:DEBUG:  --Relation calls_1001548800--
> 5682:DEBUG:  Pages 915: Changed 54, reaped 559, Empty 0, New 0; Tup
> 33842: Vac 1077, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 1
> 42, MaxLen 648; Re-using: Free/Avail. Space 279912/279912;
> EndEmpty/Avail. Pages 0/559. CPU 0.00s/0.11u sec.
> 5683:DEBUG:  Index calls_1001548800_pkey: Pages 179; Tuples 33842:
> Deleted 1077. CPU 0.00s/0.66u sec.
> 5684:DEBUG:  Index calls_1001548800_d21: Pages 95; Tuples 33842:
> Deleted 1077. CPU 0.00s/0.70u sec.
> 5685:DEBUG:  XLogWrite: new log file created - consider increasing
> WAL_FILES
> 5686:DEBUG:  Index calls_1001548800_i3: Pages 98; Tuples 33842:
> Deleted 1077. CPU 0.39s/0.78u sec.
> 5687:DEBUG:  Index calls_1001548800_i9: Pages 100; Tuples 33842:
> Deleted 1077. CPU 0.03s/0.66u sec.
> 5688:DEBUG:  Rel calls_1001548800: Pages: 915 --> 886; Tuple(s) moved:
> 1122. CPU 0.00s/1.16u sec.
> 5689:DEBUG:  Index calls_1001548800_pkey: Pages 179; Tuples 33842:
> Deleted 1122. CPU 0.00s/0.38u sec.
> 5690:DEBUG:  Index calls_1001548800_d21: Pages 95; Tuples 33842:
> Deleted 1122. CPU 0.00s/0.40u sec.
> 5691:DEBUG:  Index calls_1001548800_i3: Pages 102; Tuples 33842:
> Deleted 1122. CPU 0.01s/0.39u sec.
> 5692:DEBUG:  Index calls_1001548800_i9: Pages 104; Tuples 33842:
> Deleted 1122. CPU 0.00s/0.40u sec.
> 
> 
> Here is a snippet of the log file showing the query plan and the
> events after the query.
> 
> 7553:NOTICE:  QUERY PLAN:
> 7554:
> 7555:Aggregate  (cost=3770.44..4108.86 rows=3384 width=19)
> 7556:  ->  Group  (cost=3770.44..3939.65 rows=33842 width=19)
> 7557:        ->  Sort  (cost=3770.44..3770.44 rows=33842 width=19)
> 7558:              ->  Seq Scan on calls_1001548800 
> (cost=0.00..1224.42 rows=33
> 842 width=19)
> 7559:
> 7560:Smart Shutdown request at Tue Oct  2 18:34:08 2001
> 7561:Fast Shutdown request at Tue Oct  2 18:35:03 2001
> 7562:Aborting any active transaction...
> 7563:FATAL 1:  This connection has been terminated by the
> administrator.
> 7564:Immediate Shutdown request at Tue Oct  2 18:39:47 2001
> 7565:NOTICE:  Message from PostgreSQL backend:
> 7566:   The Postmaster has informed me that some other backend  died
> abnormally
> and possibly corrupted shared memory.
> 7567:   I have rolled back the current transaction and am       going
> to termina
> te your database system connection and exit.
> 7568:   Please reconnect to the database system and repeat your query.
> 7569:DEBUG:  database system was interrupted at 2001-10-02 18:11:06
> PDT
> 7570:DEBUG:  CheckPoint record at (39, 1811142680)
> 7571:DEBUG:  Redo record at (39, 1811142680); Undo record at (0, 0);
> Shutdown FA
> LSE
> 7572:DEBUG:  NextTransactionId: 2086582; NextOid: 134710037
> 7573:DEBUG:  database system was not properly shut down; automatic
> recovery in p
> rogress...
> 7574:DEBUG:  ReadRecord: record with zero len at (39, 1811142744)
> 7575:DEBUG:  redo is not required
> 7576:DEBUG:  database system is in production state
> 
> 
> Ryan


pgsql-hackers by date:

Previous
From: Patrice Hédé
Date:
Subject: Mule internal code ?
Next
From: "Dmitry Chernikov"
Date:
Subject: Problem in pg_dump 7.1.2 dump order