Thread: Postgresql + corrupted disk = data loss. (Need help for database recover)
Hello, i have a problem.
I've got a production server, working fine. Then i've got strange error:
> ERROR: right sibling's left-link doesn't match: block 147 links to 407 instead of expected 146 in index "order_status_key"'
And decidet to backup all server. So i shut-down VPS with server and backup all data.
Then, after i booted it - and then - i've got Data loss.
I've lost data, that have been written to DB around 10-100 hours (different tables, have different last updated value).
Then i've analyzed log, and found this:
7 days ago appears this errors:
db= LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
db= WARNING: pgstat wait timeout
ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076
5 days ago:
a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/526512E0
83238 db= WARNING: could not write block 54 of base/16384/2619
83239 db= CONTEXT: writing block 54 of relation base/16384/2619
And today:
18 db= LOG: could not open file "pg_xlog/000000010000000F00000052" (log file 15, segment 82):
19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/52FDF0E0
There is any ability to recover fresh data from database?
Thanks!
I've got a production server, working fine. Then i've got strange error:
> ERROR: right sibling's left-link doesn't match: block 147 links to 407 instead of expected 146 in index "order_status_key"'
And decidet to backup all server. So i shut-down VPS with server and backup all data.
Then, after i booted it - and then - i've got Data loss.
I've lost data, that have been written to DB around 10-100 hours (different tables, have different last updated value).
Then i've analyzed log, and found this:
7 days ago appears this errors:
db= LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
db= WARNING: pgstat wait timeout
ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076
5 days ago:
a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/526512E0
83238 db= WARNING: could not write block 54 of base/16384/2619
83239 db= CONTEXT: writing block 54 of relation base/16384/2619
And today:
18 db= LOG: could not open file "pg_xlog/000000010000000F00000052" (log file 15, segment 82):
19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/52FDF0E0
There is any ability to recover fresh data from database?
Thanks!
And, i'm an idiot.
My DB version:
PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit
--
С уважением
Олег
My DB version:
PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit
2011/12/2 Oleg Serov <serovov@gmail.com>
Hello, i have a problem.
I've got a production server, working fine. Then i've got strange error:
> ERROR: right sibling's left-link doesn't match: block 147 links to 407 instead of expected 146 in index "order_status_key"'
And decidet to backup all server. So i shut-down VPS with server and backup all data.
Then, after i booted it - and then - i've got Data loss.
I've lost data, that have been written to DB around 10-100 hours (different tables, have different last updated value).
Then i've analyzed log, and found this:
7 days ago appears this errors:
db= LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
db= WARNING: pgstat wait timeout
ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076
5 days ago:
a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/526512E0
83238 db= WARNING: could not write block 54 of base/16384/2619
83239 db= CONTEXT: writing block 54 of relation base/16384/2619
And today:
18 db= LOG: could not open file "pg_xlog/000000010000000F00000052" (log file 15, segment 82):
19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/52FDF0E0
There is any ability to recover fresh data from database?
Thanks!
--
С уважением
Олег
Re: Postgresql + corrupted disk = data loss. (Need help for database recover)
From
Venkat Balaji
Date:
2011/12/2 Oleg Serov <serovov@gmail.com>
And, i'm an idiot.
My DB version:
PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit2011/12/2 Oleg Serov <serovov@gmail.com>Hello, i have a problem.
I've got a production server, working fine. Then i've got strange error:
> ERROR: right sibling's left-link doesn't match: block 147 links to 407 instead of expected 146 in index "order_status_key"'
And decidet to backup all server. So i shut-down VPS with server and backup all data.
Then, after i booted it - and then - i've got Data loss.
This seems to be an Index corruption. Did you try re-indexing ? Index creation might have failed, re-indexing would re-organize the Index tuples. If you are sure about disk corruption, try and "re-create" or "create concurrent Index" on a different disk.
I've lost data, that have been written to DB around 10-100 hours (different tables, have different last updated value).
Then i've analyzed log, and found this:
7 days ago appears this errors:
db= LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
db= WARNING: pgstat wait timeout
ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076
This should be a free space issue, do you have enough space in "pg_stat_tmp" disk ?
5 days ago:
a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/526512E0
83238 db= WARNING: could not write block 54 of base/16384/2619
83239 db= CONTEXT: writing block 54 of relation base/16384/2619
And today:
18 db= LOG: could not open file "pg_xlog/000000010000000F00000052" (log file 15, segment 82):
19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/52FDF0E0
There is any ability to recover fresh data from database?
What kind of backups you have available ?
Thanks
VB
Hello!
i've don't try to do reindex. There was enough space.
And i have a full data-directory backup, when i've stop server, before start.
--
С уважением
Олег
i've don't try to do reindex. There was enough space.
And i have a full data-directory backup, when i've stop server, before start.
2011/12/2 Venkat Balaji <venkat.balaji@verse.in>
2011/12/2 Oleg Serov <serovov@gmail.com>And, i'm an idiot.
My DB version:
PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51), 64-bit2011/12/2 Oleg Serov <serovov@gmail.com>Hello, i have a problem.
I've got a production server, working fine. Then i've got strange error:
> ERROR: right sibling's left-link doesn't match: block 147 links to 407 instead of expected 146 in index "order_status_key"'
And decidet to backup all server. So i shut-down VPS with server and backup all data.
Then, after i booted it - and then - i've got Data loss.This seems to be an Index corruption. Did you try re-indexing ? Index creation might have failed, re-indexing would re-organize the Index tuples. If you are sure about disk corruption, try and "re-create" or "create concurrent Index" on a different disk.
I've lost data, that have been written to DB around 10-100 hours (different tables, have different last updated value).
Then i've analyzed log, and found this:
7 days ago appears this errors:
db= LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat":
db= WARNING: pgstat wait timeout
ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076This should be a free space issue, do you have enough space in "pg_stat_tmp" disk ?
5 days ago:
a lot of: ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/526512E0
83238 db= WARNING: could not write block 54 of base/16384/2619
83239 db= CONTEXT: writing block 54 of relation base/16384/2619And today:
18 db= LOG: could not open file "pg_xlog/000000010000000F00000052" (log file 15, segment 82):
19 db= ERROR: xlog flush request F/DC1A22D8 is not satisfied --- flushed only to F/52FDF0E0
There is any ability to recover fresh data from database?What kind of backups you have available ?
ThanksVB
--
С уважением
Олег
Re: Postgresql + corrupted disk = data loss. (Need help for database recover)
From
Craig Ringer
Date:
On 12/02/2011 09:08 AM, Oleg Serov wrote: > Then i've analyzed log, and found this: > 7 days ago appears this errors: > db= LOG: could not rename temporary statistics file > "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": > db= WARNING: pgstat wait timeout > ERROR: missing chunk number 0 for toast value 2550017 in pg_toast_17076 Now that you've taken a file-level backup (hopefully copied to a different computer), do you think it might be worth doing an fsck of the file system? I'm wondering if your underlying storage has been doing something dodgy. -- Craig Ringer
Re: Postgresql + corrupted disk = data loss. (Need help for database recover)
From
Tomas Vondra
Date:
On 2.12.2011 09:16, Oleg Serov wrote: > Hello! > > i've don't try to do reindex. There was enough space. Not sure whether you tried to reindex or not. And what do you mean by 'there was enough space'? For example with ext2 (and ext3/ext4) it was rather simple to exhaust inodes long before the device was actually full. What filesystem are you using, anyway? This seems like a I/O issue, you should check the hardware and the settings (e.g. what caches are enabled etc.). Post more details, if possible. Have you checked S.M.A.R.T. info from the drives? > And i have a full data-directory backup, when i've stop server, before > start. Good. Have you moved it to a different machine? Otherwise you don't have a backup, just a copy. Tomas
I think, the main problem is that, postgres reads wrong xlog-s file.
I lunched strace to postgres process, and then i grep the log:
# cat /tmp/strace-log | fgrep xlog
5546 stat("pg_xlog", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546 stat("pg_xlog/archive_status", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546 open("pg_xlog/00000001.history", O_RDONLY) = -1 ENOENT (No such file or directory)
5546 open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4
5546 open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4
There is only one file.
But i have a lot of files (i parsed filenames for more readable format):
0000000100000010000000D2 tli: 1 log: 16 seg: 210
0000000100000010000000EE tli: 1 log: 16 seg: 238
0000000100000010000000D3 tli: 1 log: 16 seg: 211
0000000100000010000000E2 tli: 1 log: 16 seg: 226
0000000100000010000000D5 tli: 1 log: 16 seg: 213
0000000100000010000000E8 tli: 1 log: 16 seg: 232
0000000100000010000000F7 tli: 1 log: 16 seg: 247
0000000100000010000000DF tli: 1 log: 16 seg: 223
0000000100000010000000DC tli: 1 log: 16 seg: 220
0000000100000010000000E7 tli: 1 log: 16 seg: 231
0000000100000010000000EA tli: 1 log: 16 seg: 234
0000000100000010000000D1 tli: 1 log: 16 seg: 209
0000000100000010000000DD tli: 1 log: 16 seg: 221
0000000100000010000000F5 tli: 1 log: 16 seg: 245
0000000100000010000000E0 tli: 1 log: 16 seg: 224
0000000100000010000000EB tli: 1 log: 16 seg: 235
0000000100000010000000D0 tli: 1 log: 16 seg: 208
0000000100000010000000F4 tli: 1 log: 16 seg: 244
0000000100000010000000F6 tli: 1 log: 16 seg: 246
0000000100000010000000D7 tli: 1 log: 16 seg: 215
0000000100000010000000DB tli: 1 log: 16 seg: 219
0000000100000010000000E4 tli: 1 log: 16 seg: 228
0000000100000010000000DE tli: 1 log: 16 seg: 222
0000000100000010000000E9 tli: 1 log: 16 seg: 233
0000000100000010000000D4 tli: 1 log: 16 seg: 212
0000000100000010000000D9 tli: 1 log: 16 seg: 217
0000000100000010000000F3 tli: 1 log: 16 seg: 243
0000000100000010000000E5 tli: 1 log: 16 seg: 229
0000000100000010000000DA tli: 1 log: 16 seg: 218
0000000100000010000000EC tli: 1 log: 16 seg: 236
0000000100000010000000D6 tli: 1 log: 16 seg: 214
0000000100000010000000EF tli: 1 log: 16 seg: 239
0000000100000010000000E6 tli: 1 log: 16 seg: 230
0000000100000010000000E1 tli: 1 log: 16 seg: 225
0000000100000010000000F0 tli: 1 log: 16 seg: 240
0000000100000010000000D8 tli: 1 log: 16 seg: 216
0000000100000010000000CF tli: 1 log: 16 seg: 207
0000000100000010000000ED tli: 1 log: 16 seg: 237
0000000100000010000000E3 tli: 1 log: 16 seg: 227
0000000100000010000000F1 tli: 1 log: 16 seg: 241
0000000100000010000000F2 tli: 1 log: 16 seg: 242
0000000100000010000000F8 tli: 1 log: 16 seg: 248
So, main problem i think, is that pg_control file is corrupted (i guess).
So xlogreset-n sais:
bash-3.2$ pg_resetxlog -n /var/lib/pgsql/data
could not change directory to "/root"
pg_control values:
First log file ID after reset: 16
First log file segment after reset: 249
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5592178670599662815
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/7760685
Latest checkpoint's NextOID: 2556003
Latest checkpoint's NextMultiXactId: 3925
Latest checkpoint's NextMultiOffset: 7901
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
And main question, how to force postgres to read this files?
--
С уважением
Олег
I lunched strace to postgres process, and then i grep the log:
# cat /tmp/strace-log | fgrep xlog
5546 stat("pg_xlog", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546 stat("pg_xlog/archive_status", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
5546 open("pg_xlog/00000001.history", O_RDONLY) = -1 ENOENT (No such file or directory)
5546 open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4
5546 open("pg_xlog/000000010000000F00000052", O_RDONLY) = 4
There is only one file.
But i have a lot of files (i parsed filenames for more readable format):
0000000100000010000000D2 tli: 1 log: 16 seg: 210
0000000100000010000000EE tli: 1 log: 16 seg: 238
0000000100000010000000D3 tli: 1 log: 16 seg: 211
0000000100000010000000E2 tli: 1 log: 16 seg: 226
0000000100000010000000D5 tli: 1 log: 16 seg: 213
0000000100000010000000E8 tli: 1 log: 16 seg: 232
0000000100000010000000F7 tli: 1 log: 16 seg: 247
0000000100000010000000DF tli: 1 log: 16 seg: 223
0000000100000010000000DC tli: 1 log: 16 seg: 220
0000000100000010000000E7 tli: 1 log: 16 seg: 231
0000000100000010000000EA tli: 1 log: 16 seg: 234
0000000100000010000000D1 tli: 1 log: 16 seg: 209
0000000100000010000000DD tli: 1 log: 16 seg: 221
0000000100000010000000F5 tli: 1 log: 16 seg: 245
0000000100000010000000E0 tli: 1 log: 16 seg: 224
0000000100000010000000EB tli: 1 log: 16 seg: 235
0000000100000010000000D0 tli: 1 log: 16 seg: 208
0000000100000010000000F4 tli: 1 log: 16 seg: 244
0000000100000010000000F6 tli: 1 log: 16 seg: 246
0000000100000010000000D7 tli: 1 log: 16 seg: 215
0000000100000010000000DB tli: 1 log: 16 seg: 219
0000000100000010000000E4 tli: 1 log: 16 seg: 228
0000000100000010000000DE tli: 1 log: 16 seg: 222
0000000100000010000000E9 tli: 1 log: 16 seg: 233
0000000100000010000000D4 tli: 1 log: 16 seg: 212
0000000100000010000000D9 tli: 1 log: 16 seg: 217
0000000100000010000000F3 tli: 1 log: 16 seg: 243
0000000100000010000000E5 tli: 1 log: 16 seg: 229
0000000100000010000000DA tli: 1 log: 16 seg: 218
0000000100000010000000EC tli: 1 log: 16 seg: 236
0000000100000010000000D6 tli: 1 log: 16 seg: 214
0000000100000010000000EF tli: 1 log: 16 seg: 239
0000000100000010000000E6 tli: 1 log: 16 seg: 230
0000000100000010000000E1 tli: 1 log: 16 seg: 225
0000000100000010000000F0 tli: 1 log: 16 seg: 240
0000000100000010000000D8 tli: 1 log: 16 seg: 216
0000000100000010000000CF tli: 1 log: 16 seg: 207
0000000100000010000000ED tli: 1 log: 16 seg: 237
0000000100000010000000E3 tli: 1 log: 16 seg: 227
0000000100000010000000F1 tli: 1 log: 16 seg: 241
0000000100000010000000F2 tli: 1 log: 16 seg: 242
0000000100000010000000F8 tli: 1 log: 16 seg: 248
So, main problem i think, is that pg_control file is corrupted (i guess).
So xlogreset-n sais:
bash-3.2$ pg_resetxlog -n /var/lib/pgsql/data
could not change directory to "/root"
pg_control values:
First log file ID after reset: 16
First log file segment after reset: 249
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5592178670599662815
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/7760685
Latest checkpoint's NextOID: 2556003
Latest checkpoint's NextMultiXactId: 3925
Latest checkpoint's NextMultiOffset: 7901
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
And main question, how to force postgres to read this files?
3 декабря 2011 г. 18:10 пользователь Tomas Vondra <tv@fuzzy.cz> написал:
On 2.12.2011 09:16, Oleg Serov wrote:Not sure whether you tried to reindex or not. And what do you mean by
> Hello!
>
> i've don't try to do reindex. There was enough space.
'there was enough space'? For example with ext2 (and ext3/ext4) it was
rather simple to exhaust inodes long before the device was actually
full. What filesystem are you using, anyway?
This seems like a I/O issue, you should check the hardware and the
settings (e.g. what caches are enabled etc.). Post more details, if
possible. Have you checked S.M.A.R.T. info from the drives?Good. Have you moved it to a different machine? Otherwise you don't have
> And i have a full data-directory backup, when i've stop server, before
> start.
a backup, just a copy.
Tomas
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
С уважением
Олег