Re: Storage Manager crash at mdwrite() - Mailing list pgsql-hackers

From Tareq Aljabban
Subject Re: Storage Manager crash at mdwrite()
Date
Msg-id CAGOe0aLib5pRdHu3Nz06e=C9-k0T317RSHSxT06xMyY1-5tdhQ@mail.gmail.com
In response to Re: Storage Manager crash at mdwrite()  (Greg Stark <stark@mit.edu>)
List pgsql-hackers


On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> There is a lot of difference between those two. In particular, it looks
>> like the problem you are seeing is coming from the background writer,
>> which is not running during initdb.
>
> The difference that comes to mind is that the postmaster forks. If the
> library opens any connections prior to forking and then uses them
> after forking that would work at first but it would get confused
> quickly once more than one backend tries to use the same connection.
> The data being sent would all be mixed together and they would see
> responses to requests other processes sent.
> You need to ensure that any network connections are opened up *after*
> the new processes are forked.
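
For illustration only, the advice quoted above (open connections after the fork, never before) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a fix for any particular library: FsConnection, fs_connect(), get_connection() and child_reopens_after_fork() are hypothetical names, and fs_connect() only simulates opening a connection. The idea is to remember which process opened the handle and to open a fresh one whenever the current pid differs.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a client-library handle (e.g. a remote FS connection). */
typedef struct {
    int   is_open;
    pid_t owner_pid;   /* process that opened the connection */
    int   generation;  /* how many times the connection was (re)opened */
} FsConnection;

static FsConnection conn = {0, 0, 0};

/* Hypothetical: a real version would call the library's connect routine here. */
static void fs_connect(FsConnection *c)
{
    c->is_open = 1;
    c->owner_pid = getpid();
    c->generation++;
}

/* Return a connection that was opened by the calling process itself. */
static FsConnection *get_connection(void)
{
    /* Open on first use, or re-open if the handle was inherited across fork(). */
    if (!conn.is_open || conn.owner_pid != getpid())
        fs_connect(&conn);
    assert(conn.owner_pid == getpid());
    return &conn;
}

/* Returns 1 if a forked child opens its own connection instead of
 * reusing the one the parent opened before the fork. */
static int child_reopens_after_fork(void)
{
    int parent_gen = get_connection()->generation;  /* opened pre-fork */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int reopened = (get_connection()->generation == parent_gen + 1);
        _exit(reopened ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Every code path that touches the remote store would go through get_connection() rather than a handle created at startup. If the library keeps other state from before the fork (threads, an embedded runtime), a pid check like this would presumably not be enough on its own.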

That's right. It turned out that the cause of the problem is that HDFS has trouble dealing with forked processes. However, there is no clear suggestion on how to fix this.
I attached gdb to the writer process and got the following backtrace:

#0  0xb76f0430 in __kernel_vsyscall ()
#1  0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2  0x0840ab46 in pg_usleep (microsec=200000) at pgsleep.c:43
#3  0x0829ca9a in BgWriterNap () at bgwriter.c:642
#4  0x0829c882 in BackgroundWriterMain () at bgwriter.c:540
#5  0x0811b0ec in AuxiliaryProcessMain (argc=2, argv=0xbf982308) at bootstrap.c:417
#6  0x082a9af1 in StartChildProcess (type=BgWriterProcess) at postmaster.c:4427
#7  0x082a75de in reaper (postgres_signal_arg=17) at postmaster.c:2390
#8  <signal handler called>
#9  0xb76f0430 in __kernel_vsyscall ()
#10 0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#11 0x082a5b62 in ServerLoop () at postmaster.c:1391
#12 0x082a53e2 in PostmasterMain (argc=3, argv=0xa525c28) at postmaster.c:1092
#13 0x0822dfa8 in main (argc=3, argv=0xa525c28) at main.c:188

Any ideas?
