Race-condition with failed block-write? - Mailing list pgsql-bugs

From Arjen van der Meijden
Subject Race-condition with failed block-write?
Date
Msg-id 43214D5F.9080502@tweakers.net
Whole thread Raw
Responses Re: Race-condition with failed block-write?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Race-condition with failed block-write?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi list,

I noticed a development machine was responding terribly slow, and found
out postgresql had consumed all available memory:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres  7465  0.0  0.0  16636   928 ?        S    Sep05   0:00
/usr/bin/postmaster -D /var/lib/postgresql/data
postgres  7472  0.0 81.0 3144408 832032 ?      D    Sep05   0:56  \_
postgres: writer process
postgres  7473  0.0  0.0   7552   916 ?        S    Sep05   0:00  \_
postgres: stats buffer process
postgres  7474  0.0  0.0   6848   900 ?        S    Sep05   0:00      \_
postgres: stats collector process

It had taken over 800MB of memory. The machine is a P4 3.0 Ghz (with HT
enabled), 1GB of memory a few SATA disks and runs with a recent Gentoo +
2.6.12.5 kernel.
The postgresql version that failed was 8.0.3, it may or may not be worth
knowing that it runs a 8.1devel on another port and from another directory.

The 8.0.3-database was not actively in use, since our current
development work was using the 8.1devel.

In the postgresql.log a write-failure messages was repeated enough to
make the log file 50MB larger:

This was the first one:
[ - 2005-09-07 13:03:47 CEST @] ERROR:  xlog flush request 29/67713428
is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:47 CEST @] CONTEXT:  writing block 21 of relation
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] ERROR:  xlog flush request 29/67713428
is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:48 CEST @] CONTEXT:  writing block 21 of relation
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] WARNING:  could not write block 21 of
1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] DETAIL:  Multiple failures --- write
error may be permanent.


And those 4 lines were repeated up untill the memory was empty, as it
seems from the log:


TopMemoryContext: -1095880208 total in 264213 blocks; 537938888 free
(924739 chunks); -1633819096 used
MdSmgr: 8192 total in 1 blocks; 8024 free (0 chunks); 168 used
Pending Ops Table: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
DynaHash: 8192 total in 1 blocks; 7488 free (0 chunks); 704 used
smgr relation table: 8192 total in 1 blocks; 4048 free (0 chunks); 4144 used
LockTable (locallock hash): 8192 total in 1 blocks; 6112 free (0
chunks); 2080 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
[ - 2005-09-09 02:42:22 CEST @] ERROR:  out of memory
[ - 2005-09-09 02:42:22 CEST @] DETAIL:  Failed on request of size 16000.


After issueing an immediate-shutdown, it starts and runs just fine:

[ - 2005-09-09 10:19:51 CEST @] LOG:  received fast shutdown request
[ - 2005-09-09 10:23:01 CEST @] LOG:  received immediate shutdown request
[ - 2005-09-09 10:27:20 CEST @] LOG:  database system was interrupted at
2005-09-05 11:24:08 CEST
[ - 2005-09-09 10:27:20 CEST @] LOG:  checkpoint record is at 29/2E73E44C
[ - 2005-09-09 10:27:20 CEST @] LOG:  redo record is at 29/2E73E44C;
undo record is at 0/0; shutdown TRUE
[ - 2005-09-09 10:27:20 CEST @] LOG:  next transaction ID: 166361; next
OID: 55768251
[ - 2005-09-09 10:27:20 CEST @] LOG:  database system was not properly
shut down; automatic recovery in progress
[ - 2005-09-09 10:27:20 CEST @] LOG:  record with zero length at 29/2E73E488
[ - 2005-09-09 10:27:20 CEST @] LOG:  redo is not required
[ - 2005-09-09 10:27:20 CEST @] LOG:  database system is ready


At 2005-09-05 the machine was rebooted and the only query since then
(which may have triggered the issue) was:
[2005-09-07 13:03:35 CEST - 2005-09-07 13:03:37 CEST acm@acm] LOG:
statement: SELECT c.oid,
           n.nspname,
           c.relname
         FROM pg_catalog.pg_class c
              LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
         WHERE pg_catalog.pg_table_is_visible(c.oid)
               AND c.relname ~ '^pwprijs$'
         ORDER BY 2, 3;

I shut down the postgresql-server after the restart again to prevent
further changes to the data-files of it.

Best regards,

Arjen van der Meijden

pgsql-bugs by date:

Previous
From: Jim.Gray@Bull.com
Date:
Subject: Re: BUG #1862: ECPG Connect, host variable trailing blanks
Next
From: ""
Date:
Subject: BUG #1871: operations with data types