Re: developer.pgadmin.org/nagios.pgadmin.org - Disk failure - Mailing list pgsql-www

From Dave Page
Subject Re: developer.pgadmin.org/nagios.pgadmin.org - Disk failure
Msg-id E7F85A1B5FF8D44C8A1AF6885BC9A0E401388168@ratbert.vale-housing.co.uk
Responses Re: [pgadmin-hackers] developer.pgadmin.org/nagios.pgadmin.org  (Raphaël Enrici <blacknoz@club-internet.fr>)
List pgsql-www

> -----Original Message-----
> From: Jeff MacDonald [mailto:jam@zoidtechnologies.com]
> Sent: 12 May 2006 23:19
> To: Dave Page
> Cc: Jeff MacDonald
> Subject: Re: [pgsql-www] developer.pgadmin.org/nagios.pgadmin.org - Disk failure
>
> On Fri, 2006-05-12 at 22:47 +0100, Dave Page wrote:
> > The machine hosting the developer.pgadmin.org and nagios.pgadmin.org
> > vservers is currently having serious filesystem problems, which are
> > causing disk intensive operations (like rsync, tar) to segfault for
> > currently unknown reasons.
>
> do a memory test, swap as needed, see if that solves the
> problem..

I'll try just replacing it - I have some unopened sticks for that mobo.
FWIW, a reboot with a forced fsck found no errors at all and the box is
currently working OK, but I have now found errors similar to the
following:

May 12 21:11:29 barbas rsyncd[32134]: rsync: writefd_unbuffered failed to write 4 bytes: phase "send_file_entry" [sender]: Broken pipe (32)
May 12 21:11:29 barbas rsyncd[32134]: rsync error: error in rsync protocol data stream (code 12) at io.c(1126) [sender]
May 12 22:13:52 barbas kernel:  kernel BUG at page_alloc.c:142!
May 12 22:13:52 barbas kernel: invalid operand: 0000
May 12 22:13:52 barbas kernel: CPU:    1
May 12 22:13:52 barbas kernel: EIP:    0010:[<c013cec0>]    Not tainted
May 12 22:13:52 barbas kernel: EFLAGS: 00010286
May 12 22:13:52 barbas kernel: eax: d9e18100   ebx: c262c140   ecx: c262c140   edx: 00000000
May 12 22:13:52 barbas kernel: esi: c262c140   edi: 00000000   ebp: 00000000   esp: d50d5edc
May 12 22:13:52 barbas kernel: ds: 0018   es: 0018   ss: 0018
May 12 22:13:52 barbas kernel: Process rsync (pid: 32141, stackpage=d50d5000)
May 12 22:13:52 barbas kernel: Stack: d50d5ee8 c0133ab0 00001000 c262c140 e3a59d44 00006000 c01348e9 00000000
May 12 22:13:52 barbas kernel:        00000000 00001000 c262c140 e3a59d44 00000000 c013423d d50d5f7c c262c140
May 12 22:13:52 barbas kernel:        00000000 00001000 00001000 00000001 00000000 0000013b e3a59c80 c01347f0
May 12 22:13:52 barbas kernel: Call Trace:    [<c0133ab0>] [<c01348e9>] [<c013423d>] [<c01347f0>] [<c01347f0>]
May 12 22:13:52 barbas kernel:   [<c0134a2f>] [<c01347f0>] [<c0145a50>] [<c0108fdf>]
May 12 22:13:52 barbas kernel:
May 12 22:13:52 barbas kernel: Code: 0f 0b 8e 00 6b ba 37 c0 e9 ba fd ff ff 8b 69 60 85 ed 0f 85

Could well be a duff stick I guess, given that it died in the page
allocator (page_alloc.c).
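For what it's worth, the "Code:" line can be read as a deliberate BUG()
rather than a random crash. Assuming the i386 BUG() layout used by
2.4-era kernels (a ud2 instruction, bytes 0f 0b, followed inline by a
16-bit line number and a 32-bit pointer to the source file name), a
quick sketch like the one below decodes the bytes at EIP. The helper
name decode_bug_site is made up for illustration; it is not a kernel or
library function.

```python
import struct
from typing import Tuple

# Bytes from the "Code:" line of the oops above (they start at EIP).
CODE = "0f 0b 8e 00 6b ba 37 c0 e9 ba fd ff ff 8b 69 60 85 ed 0f 85"


def decode_bug_site(code_hex: str) -> Tuple[int, int]:
    """Return (line, file_ptr) from an inline i386 BUG() record.

    ASSUMED layout (2.4-era i386 kernels): ud2 opcode 0f 0b, then a
    little-endian 16-bit __LINE__, then a little-endian 32-bit pointer
    to the __FILE__ string in kernel memory.
    """
    raw = bytes.fromhex(code_hex)  # fromhex skips the spaces between bytes
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("bytes at EIP do not start with ud2")
    # "<" = little-endian, no padding: H (2 bytes) at offset 2, I (4 bytes) at offset 4
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr


if __name__ == "__main__":
    line, file_ptr = decode_bug_site(CODE)
    print(f"BUG() at line {line}, file name string at {file_ptr:#010x}")
    # prints: BUG() at line 142, file name string at 0xc037ba6b
```

If that layout assumption holds, the embedded line number comes out as
142, matching the "kernel BUG at page_alloc.c:142!" message. So it was
the page allocator's own sanity check that fired, which fits the duff
stick theory without proving it.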

> the quicker solution may be to just put the backup
> machine into production rather than running exhaustive memory tests.

Yes, well, the backup machine was going into production anyway, to get
things out of the current 3U chassis and into a 1U one with full OOB
(out-of-band) management. The only problem is that I'm still awaiting
delivery of a cable for the external tape drive in the rack, so I can
only do rsync/scp backups until that arrives.

Regards, Dave.
