Re: Postgresql + corrupted disk = data loss. (Need help for database recover) - Mailing list pgsql-general

From Oleg Serov
Subject Re: Postgresql + corrupted disk = data loss. (Need help for database recover)
Date
Msg-id CAMkKa8DMHb+vuy4k4v7_mDtbVoeqjw_V1hU8iQgG8U10ENujNg@mail.gmail.com
Whole thread Raw
In response to Re: Postgresql + corrupted disk = data loss. (Need help for database recover)  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-general
I think, the main problem is that, postgres reads wrong xlog-s file.

I lunched strace to postgres process, and then i grep the log:
# cat /tmp/strace-log  | fgrep xlog
5546  stat("pg_xlog", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546  stat("pg_xlog/archive_status", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546  open("pg_xlog/00000001.history", O_RDONLY) = -1 ENOENT (No such file or directory)
5546  open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4
5546  open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4

There is only one file.

But i have a lot of files (i parsed filenames for more readable format):
0000000100000010000000D2        tli:    1       log:    16      seg:    210
0000000100000010000000EE        tli:    1       log:    16      seg:    238
0000000100000010000000D3        tli:    1       log:    16      seg:    211
0000000100000010000000E2        tli:    1       log:    16      seg:    226
0000000100000010000000D5        tli:    1       log:    16      seg:    213
0000000100000010000000E8        tli:    1       log:    16      seg:    232
0000000100000010000000F7        tli:    1       log:    16      seg:    247
0000000100000010000000DF        tli:    1       log:    16      seg:    223
0000000100000010000000DC        tli:    1       log:    16      seg:    220
0000000100000010000000E7        tli:    1       log:    16      seg:    231
0000000100000010000000EA        tli:    1       log:    16      seg:    234
0000000100000010000000D1        tli:    1       log:    16      seg:    209
0000000100000010000000DD        tli:    1       log:    16      seg:    221
0000000100000010000000F5        tli:    1       log:    16      seg:    245
0000000100000010000000E0        tli:    1       log:    16      seg:    224
0000000100000010000000EB        tli:    1       log:    16      seg:    235
0000000100000010000000D0        tli:    1       log:    16      seg:    208
0000000100000010000000F4        tli:    1       log:    16      seg:    244
0000000100000010000000F6        tli:    1       log:    16      seg:    246
0000000100000010000000D7        tli:    1       log:    16      seg:    215
0000000100000010000000DB        tli:    1       log:    16      seg:    219
0000000100000010000000E4        tli:    1       log:    16      seg:    228
0000000100000010000000DE        tli:    1       log:    16      seg:    222
0000000100000010000000E9        tli:    1       log:    16      seg:    233
0000000100000010000000D4        tli:    1       log:    16      seg:    212
0000000100000010000000D9        tli:    1       log:    16      seg:    217
0000000100000010000000F3        tli:    1       log:    16      seg:    243
0000000100000010000000E5        tli:    1       log:    16      seg:    229
0000000100000010000000DA        tli:    1       log:    16      seg:    218
0000000100000010000000EC        tli:    1       log:    16      seg:    236
0000000100000010000000D6        tli:    1       log:    16      seg:    214
0000000100000010000000EF        tli:    1       log:    16      seg:    239
0000000100000010000000E6        tli:    1       log:    16      seg:    230
0000000100000010000000E1        tli:    1       log:    16      seg:    225
0000000100000010000000F0        tli:    1       log:    16      seg:    240
0000000100000010000000D8        tli:    1       log:    16      seg:    216
0000000100000010000000CF        tli:    1       log:    16      seg:    207
0000000100000010000000ED        tli:    1       log:    16      seg:    237
0000000100000010000000E3        tli:    1       log:    16      seg:    227
0000000100000010000000F1        tli:    1       log:    16      seg:    241
0000000100000010000000F2        tli:    1       log:    16      seg:    242
0000000100000010000000F8        tli:    1       log:    16      seg:    248

So, main problem i think, is that pg_control file is corrupted (i guess).

So xlogreset-n sais:
bash-3.2$ pg_resetxlog -n /var/lib/pgsql/data
could not change directory to "/root"
pg_control values:

First log file ID after reset:        16
First log file segment after reset:   249
pg_control version number:            843
Catalog version number:               200904091
Database system identifier:           5592178670599662815
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/7760685
Latest checkpoint's NextOID:          2556003
Latest checkpoint's NextMultiXactId:  3925
Latest checkpoint's NextMultiOffset:  7901
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

And main question, how to force postgres to read this files?

3 декабря 2011 г. 18:10 пользователь Tomas Vondra <tv@fuzzy.cz> написал:
On 2.12.2011 09:16, Oleg Serov wrote:
> Hello!
>
> i've don't try to do reindex. There was enough space.

Not sure whether you tried to reindex or not. And what do you mean by
'there was enough space'? For example with ext2 (and ext3/ext4) it was
rather simple to exhaust inodes long before the device was actually
full. What filesystem are you using, anyway?

This seems like a I/O issue, you should check the hardware and the
settings (e.g. what caches are enabled etc.). Post more details, if
possible. Have you checked S.M.A.R.T. info from the drives?

> And i have a full data-directory backup, when i've stop server, before
> start.

Good. Have you moved it to a different machine? Otherwise you don't have
a backup, just a copy.

Tomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
С уважением

Олег

pgsql-general by date:

Previous
From: Torsten Zühlsdorff
Date:
Subject: Re: How to convert HEX to ASCII?
Next
From: Christoph Zwerschke
Date:
Subject: Re: Shared memory usage in PostgreSQL 9.1