Very bizarre bug with corrupted index - Mailing list pgsql-bugs
From | Josh Berkus |
---|---|
Subject | Very bizarre bug with corrupted index |
Date | |
Msg-id | 200305021108.22359.josh@agliodbs.com Whole thread Raw |
Responses |
Re: Very bizarre bug with corrupted index
|
List | pgsql-bugs |
Folks, Here's the setup: The machine: an x86 machine running generic Linux and about 15 server=20 applications other than PostgreSQL; it is short on both RAM and disk I/O,= =20 resulting in frequent use of swap, which is also slow. The database: a medium-large (> 100mb) multi-function database including= =20 several large tables (> 200,000 records) The index: the index in question is on an aggregates-caching table; that is= ,=20 when updates are made to one of the largest tables, a trigger (written in C= =20 by Joe Conway) automatically updates category totals in the caching table.= =20=20 the index is on the category column of the table, which serves as the prima= ry=20 key. The story:=20=20=20 -- At 2am, the machine had an unexpected power-out and restarted apparently= =20 without problems (the logs show no significant errors at startup, anyway). -- At 8pm, a user makes 3 updates to the table launching the triggers. The= y=20 get an error message: "ERROR: Index pk_table_aggs is not a btree" What's= =20 really puzzling about the client's report is that the first two updates=20 appear to have succeeded, and only the 3rd update triggered the error and= =20 failed (according to the client). -- After trying a number of things and documenting the problem, the client= =20 dropped and rebuilt the corrupted index and things worked normally. Here are the relevant portions of the database log: 2003-04-21 02:10:17 [2857] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:10:38 [2861] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:10:38 [2860] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:14:38 [1521] DEBUG: database system was interrupted at=20 2003-04-21 02:07:20 UTC 2003-04-21 02:14:38 [1521] DEBUG: checkpoint record is at 6/76D71AB4 2003-04-21 02:14:38 [1521] DEBUG: redo record is at 6/76D71AB4; undo rec= ord=20 is at 0/0; shutdown FALSE 2003-04-21 02:14:38 [1521] DEBUG: next transaction id: 63994551; next oi= d:=20 2377601 2003-04-21 02:14:38 [1521] DEBUG: database system was not properly shut= =20 down; automatic recovery in progress 2003-04-21 02:14:38 [1521] DEBUG: redo starts at 6/76D71AF4 2003-04-21 02:14:40 [1521] DEBUG: ReadRecord: record with zero length at= =20 6/7725D26C 2003-04-21 02:14:40 [1521] DEBUG: redo done at 6/7725D248 2003-04-21 02:14:44 [1521] DEBUG: database system is ready 2003-04-21 02:15:51 [2914] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:15:51 [2913] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:15:51 [2912] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:15:51 [2911] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:15:51 [2908] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 02:19:47 [3086] DEBUG: recycled transaction log file=20 0000000600000076 *** 2003-04-21 20:03:08 [2555] ERROR: Index pk_table_aggs is not a btree 2003-04-21 20:03:19 [2828] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 20:03:19 [2825] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 20:03:19 [2822] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 20:03:19 [2818] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection=20 2003-04-21 20:03:19 [2555] DEBUG: pq_recvbuf: unexpected EOF on client= =20 connection 2003-04-21 20:03:20 [18330] ERROR: Index pk_table_aggs is not a btree Copies of the actual corrupted index are available. An examination of the= m=20 shows something really bizarre; the entire index is zero-filled! It all= =20 looks like this: ***************************************************************** * PostgreSQL File/Block Formatted Dump Utility * * File: 2369452 * Options used: -idx -S 8192=20 * * Dump created on: Thu May 1 19:57:15 2003 ***************************************************************** Block 0 ****************************************************** 0000: 00000000 00000000 00000000 00000000 ................ 0010: 00000000 00000000 00000000 00000000 ................ 0020: 00000000 00000000 00000000 00000000 ................ 0030: 00000000 00000000 00000000 00000000 ................ 0040: 00000000 00000000 00000000 00000000 ................ 0050: 00000000 00000000 00000000 00000000 ................ 0060: 00000000 00000000 00000000 00000000 ................ 0070: 00000000 00000000 00000000 00000000 ................ 0080: 00000000 00000000 00000000 00000000 ................ 0090: 00000000 00000000 00000000 00000000 ................ 00a0: 00000000 00000000 00000000 00000000 ................ 00b0: 00000000 00000000 00000000 00000000 ................ 00c0: 00000000 00000000 00000000 00000000 ................ 00d0: 00000000 00000000 00000000 00000000 ................ 00e0: 00000000 00000000 00000000 00000000 ................ 00f0: 00000000 00000000 00000000 00000000 ................ 0100: 00000000 00000000 00000000 00000000 ................ 0110: 00000000 00000000 00000000 00000000 ................ 0120: 00000000 00000000 00000000 00000000 ................ 0130: 00000000 00000000 00000000 00000000 ................ 0140: 00000000 00000000 00000000 00000000 ................ 0150: 00000000 00000000 00000000 00000000 ................ 0160: 00000000 00000000 00000000 00000000 ................ 0170: 00000000 00000000 00000000 00000000 ............... etc. Questions: 1) Is this problem likely to be related to the unexpected power-out, or=20 unrelated? 2) What could cause even a corrupted index to be zero-filled? 3) Could this problem be due to a memory error since the client is using=20 non-ECC RAM? If so, how likely is it and can the chances be reduced? 4) Most importantly: Is this a Postgres bug that needs patching? --=20 Josh Berkus Aglio Database Solutions San Francisco
pgsql-bugs by date: