Re: pg11.1: dsa_area could not attach to segment - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: pg11.1: dsa_area could not attach to segment |
Date | |
Msg-id | 20190212021428.GA31721@telsasoft.com |
In response to | Re: pg11.1: dsa_area could not attach to segment (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: pg11.1: dsa_area could not attach to segment
Re: pg11.1: dsa_area could not attach to segment |
List | pgsql-hackers |
On Tue, Feb 12, 2019 at 10:57:51AM +1100, Thomas Munro wrote:
> > On current REL_11_STABLE branch with PANIC level i see this backtrace for failed parallel process:
> >
> > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1 0x00007f3b36983535 in __GI_abort () at abort.c:79
> > #2 0x000055f03ab87a4e in errfinish (dummy=dummy@entry=0) at elog.c:555
> > #3 0x000055f03ab899e0 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x55f03ad86900 "dsa_area could not attach to segment") at elog.c:1376
> > #4 0x000055f03abaa1e2 in get_segment_by_index (area=area@entry=0x55f03cdd6bf0, index=index@entry=7) at dsa.c:1743
> > #5 0x000055f03abaa8ab in get_best_segment (area=area@entry=0x55f03cdd6bf0, npages=npages@entry=8) at dsa.c:1993
> > #6 0x000055f03ababdb8 in dsa_allocate_extended (area=0x55f03cdd6bf0, size=size@entry=32768, flags=flags@entry=0) at dsa.c:701
>
> Ok, this contains some clues I didn't have before. Here we see that a
> request for a 32KB chunk of memory led to a traversal the linked list
> of segments in a given bin, and at some point we followed a link to
> segment index number 7, which turned out to be bogus. We tried to
> attach to the segment whose handle is stored in
> area->control->segment_handles[7] and it was not known to dsm.c. It
> wasn't DSM_HANDLE_INVALID, or you'd have got a different error
> message. That means that it wasn't a segment that had been freed by
> destroy_superblock(), or it'd hold DSM_HANDLE_INVALID.
>
> Hmm. So perhaps the bin list was corrupted (the segment index was bad

I think there is corruption *somewhere* due to never being able to do this (and
looks very broken?)

(gdb) p segment_map
$1 = (dsa_segment_map *) 0x1
(gdb) print segment_map->header
Cannot access memory at address 0x11

> Can we please see the stderr output of dsa_dump(area), added just
> before the PANIC? Can we see the value of "handle" when the error is
> raised, and the directory listing for /dev/shm (assuming Linux) after
> the crash (maybe you need restart_after_crash = off to prevent
> automatic cleanup)?

PANIC: dsa_area could not attach to segment index:8 handle:1076305344

I think it needs to be:

| if (segment == NULL) {
|     LWLockRelease(DSA_AREA_LOCK(area));
|     dsa_dump(area);
|     elog(PANIC, "dsa_area could not attach to segment index:%zd handle:%d", index, handle);
| }

..but that triggers recursion:

#0 0x00000037b9c32495 in raise () from /lib64/libc.so.6
#1 0x00000037b9c33c75 in abort () from /lib64/libc.so.6
#2 0x0000000000a395c0 in errfinish (dummy=0) at elog.c:567
#3 0x0000000000a3bbf6 in elog_finish (elevel=22, fmt=0xc9faa0 "dsa_area could not attach to segment index:%zd handle:%d") at elog.c:1389
#4 0x0000000000a6b97a in get_segment_by_index (area=0x1659200, index=8) at dsa.c:1747
#5 0x0000000000a6a3dc in dsa_dump (area=0x1659200) at dsa.c:1093
#6 0x0000000000a6b946 in get_segment_by_index (area=0x1659200, index=8) at dsa.c:1744
[...]
#717 0x0000000000a6a3dc in dsa_dump (area=0x1659200) at dsa.c:1093
#718 0x0000000000a6b946 in get_segment_by_index (area=0x1659200, index=8) at dsa.c:1744
#719 0x0000000000a6a3dc in dsa_dump (area=0x1659200) at dsa.c:1093
#720 0x0000000000a6b946 in get_segment_by_index (area=0x1659200, index=8) at dsa.c:1744
#721 0x0000000000a6c150 in get_best_segment (area=0x1659200, npages=8) at dsa.c:1997
#722 0x0000000000a69680 in dsa_allocate_extended (area=0x1659200, size=32768, flags=0) at dsa.c:701
#723 0x00000000007052eb in ExecParallelHashTupleAlloc (hashtable=0x7f56ff9b40e8, size=112, shared=0x7fffda8c36a0) at nodeHash.c:2837
#724 0x00000000007034f3 in ExecParallelHashTableInsert (hashtable=0x7f56ff9b40e8, slot=0x1608948, hashvalue=2677813320) at nodeHash.c:1693
#725 0x0000000000700ba3 in MultiExecParallelHash (node=0x1607f40) at nodeHash.c:288
#726 0x00000000007007ce in MultiExecHash (node=0x1607f40) at nodeHash.c:112
#727 0x00000000006e94d7 in MultiExecProcNode (node=0x1607f40) at execProcnode.c:501
[...]

[pryzbyj@telsasoft-db postgresql]$ ls -lt /dev/shm |head
total 353056
-rw-------. 1 pryzbyj pryzbyj 1048576 Feb 11 13:51 PostgreSQL.821164732
-rw-------. 1 pryzbyj pryzbyj 2097152 Feb 11 13:51 PostgreSQL.1990121974
-rw-------. 1 pryzbyj pryzbyj 2097152 Feb 11 12:54 PostgreSQL.847060172
-rw-------. 1 pryzbyj pryzbyj 2097152 Feb 11 12:48 PostgreSQL.1369859581
-rw-------. 1 postgres postgres 21328 Feb 10 21:00 PostgreSQL.1155375187
-rw-------. 1 pryzbyj pryzbyj 196864 Feb 10 18:52 PostgreSQL.2136009186
-rw-------. 1 pryzbyj pryzbyj 2097152 Feb 10 18:49 PostgreSQL.1648026537
-rw-------. 1 pryzbyj pryzbyj 2097152 Feb 10 18:49 PostgreSQL.827867206
-rw-------. 1 pryzbyj pryzbyj 2097152 Feb 10 18:49 PostgreSQL.1684837530

Justin
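
The recursion in the backtrace arises because dsa_dump() itself goes through get_segment_by_index() (dsa.c:1093), so the failing lookup re-enters the dump, which re-enters the lookup. One possible way to break that cycle is a reentrancy flag, sketched below as a toy model. Every name here (mock_area, mock_dump, mock_get_segment_by_index, in_dump) is hypothetical and none of it is PostgreSQL's dsa.c; the failing lookup simply returns false where the real code would elog(PANIC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MOCK_NSEGMENTS 16

/* Hypothetical stand-in for a dsa_area; NOT PostgreSQL's real struct. */
typedef struct mock_area
{
    bool attachable[MOCK_NSEGMENTS]; /* can segment i be attached? */
    bool in_dump;                    /* true while mock_dump() is running */
    int  dump_calls;                 /* number of times mock_dump() was entered */
} mock_area;

static void mock_dump(mock_area *area);

/*
 * Toy version of a segment lookup. Returns true if the segment can be
 * attached. On failure it dumps the area for diagnostics, but only if we
 * are not already inside a dump; without that check, dump -> lookup -> dump
 * would recurse until the stack is exhausted.
 */
static bool
mock_get_segment_by_index(mock_area *area, size_t index)
{
    assert(index < MOCK_NSEGMENTS);

    if (area->attachable[index])
        return true;

    if (!area->in_dump)
        mock_dump(area);

    fprintf(stderr, "could not attach to segment index:%zu\n", index);
    return false;               /* the real code would elog(PANIC) here */
}

/* Toy dump: walks every segment through the same lookup path. */
static void
mock_dump(mock_area *area)
{
    area->in_dump = true;
    area->dump_calls++;

    for (size_t i = 0; i < MOCK_NSEGMENTS; i++)
    {
        if (!mock_get_segment_by_index(area, i))
            fprintf(stderr, "  segment %zu: <unattachable>\n", i);
    }

    area->in_dump = false;
}
```

With segment 8 marked unattachable, a lookup of index 8 enters mock_dump() exactly once; the nested lookup made by the dump sees in_dump set and reports the failure without dumping again.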