Re: signal 11 on AIX: 7.4.2 - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: signal 11 on AIX: 7.4.2 |
Date | |
Msg-id | 200406171712.i5HHCAU10882@candle.pha.pa.us |
In response to | Re: signal 11 on AIX: 7.4.2 (Andrew Sullivan <ajs@crankycanuck.ca>) |
Responses | Re: signal 11 on AIX: 7.4.2 |
List | pgsql-hackers |
Andrew Sullivan wrote:
> On Mon, May 10, 2004 at 11:59:40AM -0400, Andrew Sullivan wrote:
> >
> > On the weekend, we ran a set of tests on the offending system to see
> > if we could re-create it. We set up the triggering conditions just
> > as they'd been when it happened, and alas, no segfault. So although
> > this was pretty much regularly reproducible when it actually
> > happened, it's now a note to the Journal of Irreproducible Results.
> > I hate when that happens.
>
> I hate it even more when the symptom comes back inexplicably. We had
> it again. For the record, here's what gdb says (there are some
> high-bit characters in here; dunno how they'll come though in mail):
>
> (gdb) bt
> #0  0xd01d7778 in memmove () from /usr/lib/libc.a(shr.o)
> #1  0xd0326e1c in getaddrinfo2 () from /usr/lib/libc.a(shr.o)
> #2  0xd0327b6c in getaddrinfo () from /usr/lib/libc.a(shr.o)
> #3  0x10058668 in WriteControlFile () at xlog.c:2121
> #4  0x101f8f78 in init_execution_state (src=0x202acd8c "",
>     argOidVect=0x7308710b, nargs=4, rettype=539520040, haspolyarg=-104 '\230')
>     at functions.c:121
> #5  0x101f9304 in init_sql_fcache (finfo=0xdeadbeef) at functions.c:250
> #6  0x101fa57c in set_tz (tz=0x7308710b <Address 0x7308710b out of bounds>)
>     at variable.c:261
> #7  0x101fa9a4 in assign_timezone (value=0x202ad398 "", doit=-1 '\377',
>     interactive=-8 '\370') at variable.c:584
> #8  0x1000466c in PostgresMain (argc=1, argv=0x2002cf38, username=0x1 "")
>     at postgres.c:2560
> #9  0x100040b0 in PostgresMain (argc=537240896, argv=0xdeadbeef,
>     username=0xdeadbeef <Address 0xdeadbeef out of bounds>) at postgres.c:2307
> #10 0x10002530 in exec_parse_message (query_string=0x20000a24 "",
>     stmt_name=0x5 "", paramTypes=0x0, numParams=0) at postgres.c:1216
> #11 0x10001f84 in exec_simple_query (
>     query_string=0x2005a540 '\377' <repeats 40 times>) at postgres.c:980
> #12 0x100005f0 in main (argc=1, argv=0xdeadbeef) at main.c:228

Well, the bad news is that this backtrace isn't very useful.

It states that the query you sent was 40 0xff's. It also says you called assign_timezone(), which called set_tz(), which then shows it calling init_sql_fcache() (impossible), which later calls WriteControlFile() (impossible), which calls getaddrinfo() (impossible).

My only guess is that getaddrinfo() in your libc has a bug that is corrupting the stack (hence the improper backtrace) and then crashing.

As to the cause, I assume this is not reproducible, right? Is there something unusual about your DNS setup, or something that changed recently, that might cause getaddrinfo() to do something new?

Of course, memmove() itself might be causing the problem, and the getaddrinfo() frames could be just another corrupt part of the backtrace.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073