Re: database errors - Mailing list pgsql-hackers

From Michael Brusser
Subject Re: database errors
Date
Msg-id DEEIJKLFNJGBEMBLBAHCKEHBEKAA.michael@synchronicity.com
In response to Re: database errors  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: database errors
List pgsql-hackers
It looks like the "No such file or directory" error followed by the abort
signal resulted from manually removing the logs. pg_resetxlog took care of
this, but other problems persisted.

I got a copy of the database and installed it on the local partition.
It does seem badly corrupted; these are some of the hard errors.

pg_dump fails and dumps core:

pg_dump: ERROR:  XLogFlush: request 0/A971020 is not satisfied --- flushed only to 0/5000050
... lost synchronization with server, resetting connection

Looking at the core file:
(dbx) where 15
=>[1] _libc_kill(0x0, 0x6, 0x0, 0xffffffff, 0x2eaf00, 0xff135888), at 0xff19f938
  [2] abort(0xff1bc004, 0xff1c3a4c, 0x0, 0x7efefeff, 0x21c08, 0x2404c4), at 0xff13596c
  [3] elog(0x14, 0x267818, 0x0, 0xa971020, 0x0, 0x5006260), at 0x2407dc
  [4] XLogFlush(0xffbee908, 0xffbee908, 0x827e0, 0x0, 0x0, 0x0), at 0x78530
  [5] BufferSync(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x18df2c
  [6] FlushBufferPool(0x2, 0x1e554, 0x0, 0x30000, 0x0, 0xffbeea79), at 0x18e5c4
  [7] CreateCheckPoint(0x0, 0x0, 0x82c00, 0xff1bc004, 0x2212c, 0x83534), at 0x7d93c
  [8] BootstrapMain(0x5, 0xffbeec50, 0x10, 0xffbeec50, 0xffbeebc8, 0xffbeebc8), at 0x836bc
  [9] SSDataBase(0x3, 0x40a24a8a, 0x2e3800, 0x4, 0x2212c, 0x16f504), at 0x172590
  [10] ServerLoop(0x5091, 0x2e398c, 0x2e3800, 0xff1c2940, 0xff1bc004, 0xff1c2940), at 0x16f3a0
  [11] PostmasterMain(0x1, 0x323ad0, 0x2af000, 0x0, 0x65720000, 0x65720000), at 0x16ef88
  [12] main(0x1, 0xffbef68c, 0xffbef694, 0x2eaf08, 0x0, 0x0), at 0x12864c
======================
(I don't have the debug build at the moment to get more details)


This query fails:
LOG:  query: select count (1) from note_links_aux;
ERROR:  XLogFlush: request 0/A971020 is not satisfied --- flushed only to 0/5006260
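
For what it's worth, the XLogFlush message says that something asked for the
log to be flushed up to a position beyond where it currently ends, which would
be consistent with the logs having been removed and reset. Below is a rough
sketch for putting a number on that mismatch. It assumes both values are in
the "logid/offset" hex form the message prints and that they share the same
log id; the helper names parse_lsn and same_log_gap are made up for
illustration and are not part of PostgreSQL.

```python
def parse_lsn(text):
    """Split an 'X/Y' hex log position into (log_id, offset) integers."""
    log_id, offset = text.strip().split("/")
    return int(log_id, 16), int(offset, 16)


def same_log_gap(requested, flushed):
    """Bytes by which `requested` lies beyond `flushed`.

    Assumption: both positions have the same log id. Anything else is
    outside the scope of this sketch and raises ValueError.
    """
    req_id, req_off = parse_lsn(requested)
    fl_id, fl_off = parse_lsn(flushed)
    if req_id != fl_id:
        raise ValueError("positions are in different log ids")
    return req_off - fl_off


# Values copied from the ERROR line of the failing query.
gap = same_log_gap("0/A971020", "0/5006260")
print(f"requested position is {gap} bytes ({gap / 2**20:.1f} MiB) past the flushed one")
```

For the two positions in the error this comes out to roughly 89 MiB.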

Drop table fails:
drop table note_links_aux;
ERROR:  getObjectDescription: Rule 17019 does not exist

Are there any pointers as to why this could happen, aside
from potential memory and disk problems?

As for NFS... I know how strongly the PostgreSQL community advises
against it, but we have to face it: our customers ARE running on NFS
and they WILL be running on NFS.
Is there such a thing as "better" and "worse" NFS versions?
(I made a note of what was said about hard mount vs. soft mount, etc.)

Tom, you recommended upgrading from 7.3.2 to 7.3.6.
Our next release is using v. 7.3.4 (maybe it's not too late to upgrade).
Would v. 7.3.6 provide more protection against problems like this?

Thank you,
Mike


> -----Original Message-----
... ...
> The messages you quote certainly read like a badly corrupted database to
> me.  In the case of a local filesystem I'd be counseling you to start
> running memory and disk diagnostics.  That may still be appropriate
> here, but you had better also reconsider the decision to use NFS.
>
> If you're absolutely set on using NFS, one possibly useful tip is to
> make sure it's a hard mount not a soft mount.  If your systems support
> NFS-over-TCP instead of UDP, that might be worth trying too.
>
> Also I would strongly advise an update to PG 7.3.6.  7.3.2 has serious
> known bugs.
>
>             regards, tom lane
>



