Re: strange buildfarm failures - Mailing list pgsql-hackers
From | Stefan Kaltenbrunner |
---|---|
Subject | Re: strange buildfarm failures |
Date | |
Msg-id | 46302E43.4020509@kaltenbrunner.cc |
In response to | Re: strange buildfarm failures (Alvaro Herrera <alvherre@commandprompt.com>) |
List | pgsql-hackers |
Alvaro Herrera wrote:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>> Stefan Kaltenbrunner wrote:
>>>> two of my buildfarm members had different but pretty weird looking
>>>> failures lately:
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-04-25%2002:03:03
>>>> and
>>>>
>>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=emu&dt=2007-04-24%2014:35:02
>>>>
>>>> any ideas on what might be causing those?
>
> Just for the record, quagga and emu failures don't seem related to the
> report below. They don't crash; the regression.diffs contains data that
> suggests that there may be data corruption of some sort.
>
> INSERT INTO INET_TBL (c, i) VALUES ('192.168.1.2/30', '192.168.1.226');
> ERROR: invalid cidr value: "%{"
>
> This doesn't seem to make much sense.

Yeah, on further reflection the failures from emu and quagga look
unrelated to the issue lionfish is experiencing.

>
>>> lionfish just failed too:
>>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-25%2005:30:09
>> And had a similar failure a few days ago. The curious thing is that
>> what we get in the postmaster log is
>>
>> LOG: server process (PID 23405) was terminated by signal 6: Aborted
>> LOG: terminating any other active server processes
>>
>> You would think SIGABRT would come from an assertion failure, but
>> there's no preceding assertion message in the log. The other
>> characteristic of these crashes is that *all* of the failing regression
>> instances report "terminating connection because of crash of another
>> server process", which suggests strongly that the crash was in an
>> autovacuum process (if it were bgwriter or stats collector the
>> postmaster would've said so). So I think the recent autovac patches
>> are at fault. I spent a bit of time trolling for a spot where the code
>> might abort() without having printed anything, but didn't find one.
>
> Hmm. I kept an eye on the buildfarm for a few days, but saw nothing
> that could be connected to autovacuum so I neglected it.
>
> This is the other failure:
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2007-04-20%2005:30:14
>
> It shows the same pattern. I am baffled -- I don't understand how it
> can die without reporting the error.

I should have mentioned this initially, but I think the failure from
2007-04-20 is not related at all. That failure was very likely caused by
the kernel running completely out of memory (lionfish is a very
resource-starved box, with only 48MB of RAM and 128MB of swap at that
time - do we have a recent patch that increases memory usage quite a
lot?). I immediately added another 128MB of swap after that, and I don't
think yesterday's failure has the same cause (at least there are no
kernel logs that indicate a similar issue).

>
> Apparently it crashes rather frequently, so it shouldn't be too
> difficult to reproduce on manual runs. If we could get it to run with a
> higher debug level, it might prove helpful to further pinpoint the
> problem.

A manual run of the buildfarm script takes ~4.5 hours on lionfish ;-)

>
> The core file would be much better obviously (first and foremost to
> confirm that it's autovacuum that's crashing ... )

I will see what I can come up with ...


Stefan
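
To illustrate the point quoted above about a process dying from signal 6 with no preceding message, here is a minimal sketch. It is not PostgreSQL code: the helper `run_and_describe` and the two child commands are hypothetical names made up for illustration, and it assumes a POSIX system where a parent sees a negative return code for a signal-terminated child. It only shows that a bare abort() leaves the supervising process with nothing but the signal number, whereas an assertion-style failure normally writes a message first.

```python
import signal
import subprocess
import sys


def run_and_describe(argv):
    """Run a child and describe how it ended, roughly as a supervisor would.

    Hypothetical helper for illustration; returns (status_text, stderr_text).
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        signum = -proc.returncode
        name = signal.Signals(signum).name
        return f"terminated by signal {signum} ({name})", proc.stderr
    return f"exited with code {proc.returncode}", proc.stderr


# Child that calls abort() directly: it dies without printing anything itself.
SILENT_ABORT = [sys.executable, "-c", "import os; os.abort()"]

# Child that writes a message before aborting, as a failed assertion usually does.
NOISY_ABORT = [
    sys.executable,
    "-c",
    "import os, sys; sys.stderr.write('assertion failed\\n'); "
    "sys.stderr.flush(); os.abort()",
]

if __name__ == "__main__":
    for label, argv in (("silent", SILENT_ABORT), ("noisy", NOISY_ABORT)):
        status, err = run_and_describe(argv)
        print(f"{label}: {status}; stderr={err!r}")
```

In both cases the parent sees the same termination status; only the child's own output says why it aborted, which is why a core file is the more direct way to find the failing process and call site.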