Thread: Segmentation fault with core dump

Segmentation fault with core dump

From
Glauco Torres
Date:
Hi group,

I'm using PG 9.6.6 and I'm seeing a segmentation fault that recurs anywhere from every few days to every two weeks.
This server is a replica; the other servers (the master and the other slaves) do not have this problem.

I have not been able to identify what triggers it, but I do have the PostgreSQL log and the core dump generated by the crash.

The server has 60 GB of RAM, and PG is configured with:

shared_buffers = 14GB
work_mem = 192MB


Below are the relevant details.

$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)


postgres=# select version();
                                                 version                                                 
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)


# cat postgresql-Mon.log | grep 'was terminated by signal 11: Segmentation fault'
2018-01-08 01:51:27.909 -03 [85039]: [102-1] user=,db=,app=,client= LOG:  server process (PID 40286) was terminated by signal 11: Segmentation fault
2018-01-08 05:09:51.929 -03 [85039]: [107-1] user=,db=,app=,client= LOG:  server process (PID 62427) was terminated by signal 11: Segmentation fault
2018-01-08 06:33:46.840 -03 [85039]: [112-1] user=,db=,app=,client= LOG:  server process (PID 72156) was terminated by signal 11: Segmentation fault
2018-01-08 13:59:37.422 -03 [119484]: [4-1] user=,db=,app=,client= LOG:  server process (PID 124190) was terminated by signal 11: Segmentation fault
2018-01-08 14:09:41.590 -03 [119484]: [9-1] user=,db=,app=,client= LOG:  checkpointer process (PID 124528) was terminated by signal 11: Segmentation fault
2018-01-08 15:18:06.379 -03 [119484]: [13-1] user=,db=,app=,client= LOG:  server process (PID 129026) was terminated by signal 11: Segmentation fault
2018-01-08 15:23:15.586 -03 [119484]: [18-1] user=,db=,app=,client= LOG:  server process (PID 6528) was terminated by signal 11: Segmentation fault
2018-01-08 15:55:32.029 -03 [119484]: [23-1] user=,db=,app=,client= LOG:  server process (PID 8762) was terminated by signal 11: Segmentation fault
2018-01-08 20:52:16.344 -03 [14804]: [5-1] user=,db=,app=,client= LOG:  checkpointer process (PID 14828) was terminated by signal 11: Segmentation fault
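
The excerpt above mixes "server process" and "checkpointer process" crashes, so it can help to tally them by process type. The following is a minimal sketch, assuming the log keeps the message format shown above; the function name `tally_segfaults` and the `sample` list are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Matches the postmaster message format seen in the log excerpt above.
CRASH_RE = re.compile(
    r"LOG:\s+(?P<kind>.+?) process \(PID (?P<pid>\d+)\) "
    r"was terminated by signal 11: Segmentation fault"
)


def tally_segfaults(lines):
    """Count signal-11 terminations per process type (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts


# Two lines copied from the excerpt; a real run would read the log file.
sample = [
    "2018-01-08 14:09:41.590 -03 [119484]: [9-1] user=,db=,app=,client= "
    "LOG:  checkpointer process (PID 124528) was terminated by signal 11: "
    "Segmentation fault",
    "2018-01-08 15:18:06.379 -03 [119484]: [13-1] user=,db=,app=,client= "
    "LOG:  server process (PID 129026) was terminated by signal 11: "
    "Segmentation fault",
]
print(tally_segfaults(sample))
```

Passing an open log file instead of `sample` should work the same way, since the function only iterates over lines.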

(gdb) bt
#0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c, pb=pb@entry=0x1be06d2d06644) at bufmgr.c:4137
#1  0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006", b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>, c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20 <ckpt_buforder_comparator>)
    at qsort.c:107
#2  0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380, n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20 <ckpt_buforder_comparator>) at qsort.c:157
#3  0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>, n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20 <ckpt_buforder_comparator>) at qsort.c:203
#4  0x00000000006a81cf in BufferSync (flags=flags@entry=128) at bufmgr.c:1863
#5  0x00000000006a8477 in CheckPointBuffers (flags=flags@entry=128) at bufmgr.c:2578
#6  0x00000000004dd781 in CheckPointGuts (checkPointRedo=<optimized out>, flags=<optimized out>) at xlog.c:8698
#7  0x00000000004e9faf in CreateRestartPoint (flags=<optimized out>) at xlog.c:8856
#8  0x000000000066977c in CheckpointerMain () at checkpointer.c:490
#9  0x00000000004f2820 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffd8bac2b80) at bootstrap.c:429
#10 0x0000000000673330 in StartChildProcess (type=CheckpointerProcess) at postmaster.c:5252
#11 0x0000000000674b1f in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4949
#12 <signal handler called>
#13 0x00007f6fc75a0b83 in __select_nocancel () from /lib64/libc.so.6
#14 0x000000000046ef32 in ServerLoop () at postmaster.c:1683
#15 0x0000000000675b69 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x27ca210) at postmaster.c:1327
#16 0x000000000047053e in main (argc=3, argv=0x27ca210) at main.c:228



#0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c, pb=pb@entry=0x1be06d2d06644) at bufmgr.c:4137
        a = 0x7f6fa9ef4b2c
        b = 0x1be06d2d06644
#1  0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006", b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>, c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20 <ckpt_buforder_comparator>)
    at qsort.c:107
No locals.
#2  0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380, n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20 <ckpt_buforder_comparator>) at qsort.c:157
        d = 350293923535640
        pa = <optimized out>
        pb = <optimized out>
        pc = <optimized out>
        pd = <optimized out>
        pl = 0x7f6fa9ef4b2c "\177\006"
        pm = 0x7f6fa9f06174 "\177\006"
        pn = 0xa7428f0f82428 <Address 0xa7428f0f82428 out of bounds>
        d1 = <optimized out>
        d2 = <optimized out>
        r = <optimized out>
        swaptype = 2
        presorted = 0
#3  0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>, n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20 <ckpt_buforder_comparator>) at qsort.c:203
        pa = <optimized out>
        pb = 0x7f6fa9f3a49c "\177\006"
        pc = <optimized out>
        pd = 0x7f6faa0c8840 "\177\006"
        pl = <optimized out>
        pm = <optimized out>
        pn = 0x7f6faa0c8854 "\177\006"
        d1 = <optimized out>
        d2 = 1631160
        r = <optimized out>
        swaptype = 2
        presorted = 0
#4  0x00000000006a81cf in BufferSync (flags=flags@entry=128) at bufmgr.c:1863
        buf_state = <optimized out>
        buf_id = 1835008
        num_to_scan = 111473
        num_spaces = <optimized out>
        num_processed = <optimized out>
        num_written = <optimized out>
        per_ts_stat = 0x0
        last_tsid = <optimized out>
        ts_heap = 0x7f6c266a8340
        i = <optimized out>
        mask = -2139095040
        wb_context = {max_pending = 0xc287ec <checkpoint_flush_after>, nr_pending = 0, pending_writebacks = {{tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606246}, forkNum = MAIN_FORKNUM, blockNum = 428}}, {tag = {
                rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606246}, forkNum = MAIN_FORKNUM, blockNum = 429}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum = MAIN_FORKNUM,
                blockNum = 54}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 58}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606252},
                forkNum = MAIN_FORKNUM, blockNum = 81}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 98}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060,
                  relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 158}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 160}}, {tag = {rnode = {spcNode = 1663,
                  dbNode = 69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 173}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 177}}, {tag = {rnode = {
                  spcNode = 1663, dbNode = 69060, relNode = 412606252}, forkNum = MAIN_FORKNUM, blockNum = 191}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606257}, forkNum = MAIN_FORKNUM, blockNum = 24}}, {
              tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606257}, forkNum = MAIN_FORKNUM, blockNum = 31}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 412606257}, forkNum = MAIN_FORKNUM,
                blockNum = 36}}, {tag = {rnode = {spcNode = 1663, dbNode = 8156905, relNode = 0}, forkNum = VISIBILITYMAP_FORKNUM, blockNum = 1}}, {tag = {rnode = {spcNode = 1663, dbNode = 69060, relNode = 20971520},
                forkNum = 14828, blockNum = 825242228}}, {tag = {rnode = {spcNode = 825240888, dbNode = 540553261, relNode = 893005874}, forkNum = 942815793, blockNum = 909194542}}, {tag = {rnode = {spcNode = 858795296,
                  dbNode = 875649824, relNode = 1563963960}, forkNum = 844832826, blockNum = 825046578}}, {tag = {rnode = {spcNode = 1937055837, dbNode = 742224485, relNode = 742220388}, forkNum = 1030778977,
                blockNum = 1768710956}}, {tag = {rnode = {spcNode = 1031040613, dbNode = 1196379168, relNode = 1914708026}, forkNum = 1635021669, blockNum = 8156905}}, {tag = {rnode = {spcNode = 0, dbNode = 2, relNode = 1},
                forkNum = 1920409658, blockNum = 543519855}}, {tag = {rnode = {spcNode = 6684672, dbNode = 14828, relNode = 825242228}, forkNum = 825240888, blockNum = 540553261}}, {tag = {rnode = {spcNode = 893005874,
                  dbNode = 943012401, relNode = 942945582}, forkNum = 858795296, blockNum = 875649824}}, {tag = {rnode = {spcNode = 1563963960, dbNode = 844832826, relNode = 825047346}, forkNum = 1937055837,
                blockNum = 742224485}}, {tag = {rnode = {spcNode = 742220388, dbNode = 1030778977, relNode = 1768710956}, forkNum = 1031040613, blockNum = 1196379168}}, {tag = {rnode = {spcNode = 1914708026, dbNode = 1635021669,
                  relNode = 1869640818}, forkNum = 544501353, blockNum = 1918989427}}, {tag = {rnode = {spcNode = 1735289204, dbNode = 1769218106, relNode = 1862952301}, forkNum = 1030513012, blockNum = 775501362}}, {tag = {
                rnode = {spcNode = 540096568, dbNode = 1931492211, relNode = 543387257}, forkNum = 1701603686, blockNum = 859323763}}, {tag = {rnode = {spcNode = 1814047799, dbNode = 1701277295, relNode = 809333875},
                forkNum = 875573294, blockNum = 539783968}}, {tag = {rnode = {spcNode = 1919252065, dbNode = 1030055777, relNode = 808463920}, forkNum = 997400624, blockNum = 1936286752}}, {tag = {rnode = {spcNode = 1668178292,
                  dbNode = 809057637, relNode = 808794162}, forkNum = 742550304, blockNum = 1953719584}}, {tag = {rnode = {spcNode = 1952542057, dbNode = 808533349, relNode = 960049456}, forkNum = 1114316856, blockNum = 8476938}},
            {tag = {rnode = {spcNode = 0, dbNode = 8477036, relNode = 0}, forkNum = -951086535, blockNum = 32623}}, {tag = {rnode = {spcNode = 6, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 4294967295, relNode = 4294967295}, forkNum = -951080648, blockNum = 32623}}, {tag = {rnode = {spcNode = 2, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 0, relNode = 0}, forkNum = -1951655728, blockNum = 32765}}, {tag = {rnode = {spcNode = 2343311504, dbNode = 32765, relNode = 32}, forkNum = MAIN_FORKNUM, blockNum = 8459196}}, {tag = {
                rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 8156905, dbNode = 0, relNode = 2}, forkNum = FSM_FORKNUM, blockNum = 0}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 17235968, relNode = 14828}, forkNum = 825242228, blockNum = 825240888}}, {tag = {rnode = {spcNode = 540553261, dbNode = 893005874, relNode = 942815793}, forkNum = 909194542,
                blockNum = 858795296}}, {tag = {rnode = {spcNode = 875649824, dbNode = 1563963960, relNode = 844832826}, forkNum = 825046834, blockNum = 1937055837}}, {tag = {rnode = {spcNode = 742224485, dbNode = 742220388,
                  relNode = 1030778977}, forkNum = 1768710956, blockNum = 1031040613}}, {tag = {rnode = {spcNode = 1196379168, dbNode = 1914708026, relNode = 1987011429}, forkNum = 544830053, blockNum = 1953719666}}, {tag = {
                rnode = {spcNode = 544502369, dbNode = 1852403568, relNode = 1952522356}, forkNum = 1110782752, blockNum = 959983411}}, {tag = {rnode = {spcNode = 825312577, dbNode = 839528498, relNode = 758657328},
                forkNum = 808268080, blockNum = 808591416}}, {tag = {rnode = {spcNode = 976303418, dbNode = 892221490, relNode = 757085745}, forkNum = 1528836912, blockNum = 842544177}}, {tag = {rnode = {spcNode = 540695864,
                  dbNode = 942813787, relNode = 542978349}, forkNum = 1919251317, blockNum = 1650732093}}, {tag = {rnode = {spcNode = 1885416509, dbNode = 1663843696, relNode = 1852139884}, forkNum = 1142963572,
                blockNum = 1229018181}}, {tag = {rnode = {spcNode = 538982988, dbNode = 1953718636, relNode = 1836016416}, forkNum = 1952803952, blockNum = 1948279909}}, {tag = {rnode = {spcNode = 1936613746, dbNode = 1769235297,
                  relNode = 1998614127}, forkNum = 1629516641, blockNum = 1869357172}}, {tag = {rnode = {spcNode = 1769218151, dbNode = 840983917, relNode = 758657328}, forkNum = 808268080, blockNum = 808591416}}, {tag = {rnode = {
                  spcNode = 976303418, dbNode = 858667058, relNode = 909523250}, forkNum = 171126829, blockNum = 10}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM,
                blockNum = 0}} <repeats 13 times>, {tag = {rnode = {spcNode = 6987421, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 2343312048}}, {tag = {rnode = {spcNode = 32765, dbNode = 0, relNode = 0},
                forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 15}, forkNum = MAIN_FORKNUM, blockNum = 3957080064}}, {tag = {rnode = {spcNode = 13747, dbNode = 6991741, relNode = 0},
                forkNum = 15, blockNum = 0}}, {tag = {rnode = {spcNode = 6985347, dbNode = 0, relNode = 707}, forkNum = MAIN_FORKNUM, blockNum = 6992616}}, {tag = {rnode = {spcNode = 0, dbNode = 15, relNode = 0}, forkNum = 15,
                blockNum = 0}}, {tag = {rnode = {spcNode = 2343313312, dbNode = 32765, relNode = 2343314560}, forkNum = 32765, blockNum = 1}}, {tag = {rnode = {spcNode = 0, dbNode = 6992908, relNode = 0}, forkNum = 2019518320,
                blockNum = 6778732}}, {tag = {rnode = {spcNode = 808464432, dbNode = 858796080, relNode = 808464432}, forkNum = 876754227, blockNum = 808464432}}, {tag = {rnode = {spcNode = 909193264, dbNode = 0, relNode = 0},
                forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0},
                forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 1}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 2343313776, relNode = 32765},
                forkNum = -949801968, blockNum = 32623}}, {tag = {rnode = {spcNode = 8449408, dbNode = 0, relNode = 3344033888}, forkNum = 32623, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0},
                forkNum = -1951653520, blockNum = 32765}}, {tag = {rnode = {spcNode = 2343313760, dbNode = 32765, relNode = 8449415}, forkNum = MAIN_FORKNUM, blockNum = 4}}, {tag = {rnode = {spcNode = 0, dbNode = 8449408,
                  relNode = 0}, forkNum = -951086535, blockNum = 32623}}, {tag = {rnode = {spcNode = 8459912, dbNode = 0, relNode = 42329571}, forkNum = MAIN_FORKNUM, blockNum = 8459902}}, {tag = {rnode = {spcNode = 0,
                  dbNode = 3343880761, relNode = 32623}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 3343886648, dbNode = 32623, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 0, relNode = 0}, forkNum = -1951654688, blockNum = 32765}}, {tag = {rnode = {spcNode = 3343886648, dbNode = 32623, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 1}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 0, relNode = 0}, forkNum = -4, blockNum = 4294967295}}, {tag = {rnode = {spcNode = 0, dbNode = 16, relNode = 6796}, forkNum = MAIN_FORKNUM, blockNum = 3344033888}}, {tag = {rnode = {
                  spcNode = 32623, dbNode = 9616944, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 32623}}, {tag = {rnode = {spcNode = 0, dbNode = 32765, relNode = 0}, forkNum = 32765, blockNum = 2343313700}}, {tag = {rnode = {
                  spcNode = 32765, dbNode = 0, relNode = 0}, forkNum = 4, blockNum = 32623}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 4294967295}}, {tag = {rnode = {
                  spcNode = 4294967295, dbNode = 3343886648, relNode = 32623}, forkNum = FSM_FORKNUM, blockNum = 32623}}, {tag = {rnode = {spcNode = 9616896, dbNode = 0, relNode = 3344033888}, forkNum = 32623, blockNum = 0}}, {
              tag = {rnode = {spcNode = 0, dbNode = 8449408, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 25, dbNode = 32765, relNode = 8449415}, forkNum = MAIN_FORKNUM, blockNum = 2}}, {tag = {
                rnode = {spcNode = 0, dbNode = 9616896, relNode = 0}, forkNum = -951086535, blockNum = 32623}}, {tag = {rnode = {spcNode = 9434791, dbNode = 0, relNode = 3343880761}, forkNum = 32623, blockNum = 28}}, {tag = {
                rnode = {spcNode = 0, dbNode = 2343312880, relNode = 32765}, forkNum = -951080648, blockNum = 32623}}, {tag = {rnode = {spcNode = 3343886648, dbNode = 32623, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {
              tag = {rnode = {spcNode = 4294967295, dbNode = 40, relNode = 48}, forkNum = -1951652928, blockNum = 32765}}, {tag = {rnode = {spcNode = 2343314176, dbNode = 32765, relNode = 2343314208}, forkNum = 32765,
                blockNum = 9434794}}, {tag = {rnode = {spcNode = 0, dbNode = 3, relNode = 0}, forkNum = 9434791, blockNum = 0}}, {tag = {rnode = {spcNode = 3343880761, dbNode = 32623, relNode = 58}, forkNum = MAIN_FORKNUM,
                blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 48, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = InvalidForkNumber,
                blockNum = 2343314320}}, {tag = {rnode = {spcNode = 32765, dbNode = 2343314304, relNode = 32765}, forkNum = 9564263, blockNum = 0}}, {tag = {rnode = {spcNode = 2343313056, dbNode = 32765, relNode = 3343886648},
                forkNum = 32623, blockNum = 3343880761}}, {tag = {rnode = {spcNode = 32623, dbNode = 3343886648, relNode = 32623}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 4294967293, dbNode = 4294967295,
                  relNode = 0}, forkNum = 10, blockNum = 229}}, {tag = {rnode = {spcNode = 0, dbNode = 3343886648, relNode = 32623}, forkNum = 32, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0},
                forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 32765, dbNode = 2343314149, relNode = 32765}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 2343314464, dbNode = 32765,
                  relNode = 2343314448}, forkNum = 32765, blockNum = 9734857}}, {tag = {rnode = {spcNode = 0, dbNode = 8448198, relNode = 0}, forkNum = 9734855, blockNum = 0}}, {tag = {rnode = {spcNode = 3343880761,
                  dbNode = 32623, relNode = 9616944}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 32623, dbNode = 0, relNode = 0}, forkNum = 9434791, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0,
                  relNode = 3}, forkNum = MAIN_FORKNUM, blockNum = 9434794}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 2343313296, dbNode = 32765,
                  relNode = 2343313232}, forkNum = 32765, blockNum = 0}}, {tag = {rnode = {spcNode = 32765, dbNode = 2343314608, relNode = 32765}, forkNum = MAIN_FORKNUM, blockNum = 32765}}, {tag = {rnode = {spcNode = 0,
                  dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 4, relNode = 32623}, forkNum = 32, blockNum = 48}}, {tag = {rnode = {spcNode = 0, dbNode = 32765,
                  relNode = 0}, forkNum = 32765, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = INIT_FORKNUM, blockNum = 4294967295}}, {tag = {rnode = {spcNode = 3343886648, dbNode = 32623,
                  relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 4294967295, relNode = 4294967295}, forkNum = -951080648, blockNum = 32623}}, {tag = {rnode = {spcNode = 0,
                  dbNode = 32623, relNode = 32}, forkNum = 48, blockNum = 2343314848}}, {tag = {rnode = {spcNode = 32765, dbNode = 0, relNode = 0}, forkNum = 9734855, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0,
                  relNode = 6}, forkNum = 32623, blockNum = 9734860}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 32765,
                  relNode = 2343314533}, forkNum = 32765, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 32765}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0,
                  relNode = 4294967295}, forkNum = InvalidForkNumber, blockNum = 3343886648}}, {tag = {rnode = {spcNode = 32623, dbNode = 0, relNode = 32623}, forkNum = 16, blockNum = 48}}, {tag = {rnode = {spcNode = 2343315152,
                  dbNode = 32765, relNode = 2343314944}, forkNum = 32765, blockNum = 9434791}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = INIT_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 9434794,
                  dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 4294967295, relNode = 4294967295}, forkNum = -1951652320, blockNum = 32765}}, {tag = {rnode = {
                  spcNode = 2343314960, dbNode = 32765, relNode = 41731220}, forkNum = MAIN_FORKNUM, blockNum = 9795134}}, {tag = {rnode = {spcNode = 0, dbNode = 41731192, relNode = 0}, forkNum = -951086535, blockNum = 32623}}, {
              tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 4}, forkNum = 10, blockNum = 16}}, {tag = {rnode = {spcNode = 48, dbNode = 2343315296, relNode = 32765}, forkNum = -1951652208, blockNum = 32765}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 3}}, {tag = {rnode = {spcNode = 32623, dbNode = 3343886648, relNode = 32623}, forkNum = -1951653488, blockNum = 32765}}, {tag = {rnode = {
                  spcNode = 2343313744, dbNode = 32765, relNode = 4294967295}, forkNum = InvalidForkNumber, blockNum = 3343886648}}, {tag = {rnode = {spcNode = 32623, dbNode = 0, relNode = 32623}, forkNum = MAIN_FORKNUM,
                blockNum = 48}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = 32765, blockNum = 9734855}}, {tag = {rnode = {spcNode = 0, dbNode = 32, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {
              tag = {rnode = {spcNode = 3343886648, dbNode = 32623, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 32765}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {
                rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 4294967295}}, {tag = {rnode = {spcNode = 4294967295, dbNode = 3343886648, relNode = 32623}, forkNum = 8, blockNum = 16}}, {tag = {
                rnode = {spcNode = 2309886240, dbNode = 1127760177, relNode = 2343313776}, forkNum = 32765, blockNum = 2343314384}}, {tag = {rnode = {spcNode = 32765, dbNode = 2343313776, relNode = 32765}, forkNum = 8449408,
                blockNum = 0}}, {tag = {rnode = {spcNode = 2343314152, dbNode = 32765, relNode = 1}, forkNum = MAIN_FORKNUM, blockNum = 1023}}, {tag = {rnode = {spcNode = 0, dbNode = 2343314384, relNode = 32765},
                forkNum = -950277931, blockNum = 32623}}, {tag = {rnode = {spcNode = 4222451713, dbNode = 0, relNode = 2343314384}, forkNum = 32765, blockNum = 2343314384}}, {tag = {rnode = {spcNode = 32765, dbNode = 2343314384,
                  relNode = 32765}, forkNum = -1951652912, blockNum = 32765}}, {tag = {rnode = {spcNode = 2343314409, dbNode = 32765, relNode = 2343315407}, forkNum = 32765, blockNum = 2343314384}}, {tag = {rnode = {
                  spcNode = 32765, dbNode = 2343315407, relNode = 32765}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 0, relNode = 0}, forkNum = 12, blockNum = 4}}, {tag = {rnode = {spcNode = 9616896, dbNode = 0, relNode = 2343305216}, forkNum = 942833661, blockNum = 0}}, {tag = {rnode = {spcNode = 0,
                  dbNode = 2343314512, relNode = 32765}, forkNum = -1951652784, blockNum = 32765}}, {tag = {rnode = {spcNode = 12, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 8}}, {tag = {rnode = {spcNode = 48,
                  dbNode = 2343315808, relNode = 32765}, forkNum = InvalidForkNumber, blockNum = 808615933}}, {tag = {rnode = {spcNode = 2343314048, dbNode = 32765, relNode = 2343314576}, forkNum = 32765, blockNum = 3347493952}}, {
              tag = {rnode = {spcNode = 32623, dbNode = 0, relNode = 0}, forkNum = -947468368, blockNum = 32623}}, {tag = {rnode = {spcNode = 1951653217, dbNode = 4294934530, relNode = 2343314080}, forkNum = 32765,
                blockNum = 2343314079}}, {tag = {rnode = {spcNode = 32765, dbNode = 2343314624, relNode = 32765}, forkNum = 12, blockNum = 0}}, {tag = {rnode = {spcNode = 9616896, dbNode = 0, relNode = 2343314408},
                forkNum = 32765, blockNum = 1}}, {tag = {rnode = {spcNode = 0, dbNode = 2343314096, relNode = 32765}, forkNum = 6796, blockNum = 0}}, {tag = {rnode = {spcNode = 155648, dbNode = 0, relNode = 526916352},
                forkNum = 32620, blockNum = 6}}, {tag = {rnode = {spcNode = 0, dbNode = 3347498848, relNode = 32623}, forkNum = -947468448, blockNum = 32623}}, {tag = {rnode = {spcNode = 1, dbNode = 0, relNode = 1951653025},
                forkNum = -32766, blockNum = 2052}}, {tag = {rnode = {spcNode = 0, dbNode = 513, relNode = 0}, forkNum = 64, blockNum = 0}}, {tag = {rnode = {spcNode = 561, dbNode = 155, relNode = 2343314272}, forkNum = 32765,
                blockNum = 8}}, {tag = {rnode = {spcNode = 0, dbNode = 1, relNode = 0}, forkNum = 118, blockNum = 120}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 2343314271}, forkNum = 32765, blockNum = 1}}, {tag = {
                rnode = {spcNode = 0, dbNode = 118, relNode = 120}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 124, dbNode = 32765, relNode = 6987421}, forkNum = MAIN_FORKNUM, blockNum = 4036731840}}, {
              tag = {rnode = {spcNode = 13747, dbNode = 2343314384, relNode = 32765}, forkNum = 124, blockNum = 0}}, {tag = {rnode = {spcNode = 384, dbNode = 0, relNode = 6}, forkNum = MAIN_FORKNUM, blockNum = 3347498848}}, {
              tag = {rnode = {spcNode = 32623, dbNode = 42324488, relNode = 0}, forkNum = 42324506, blockNum = 0}}, {tag = {rnode = {spcNode = 4, dbNode = 0, relNode = 3344354262}, forkNum = 32623, blockNum = 42324416}}, {tag = {
                rnode = {spcNode = 0, dbNode = 42324464, relNode = 0}, forkNum = -192, blockNum = 4294967295}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 3}, forkNum = MAIN_FORKNUM, blockNum = 3344353169}}, {tag = {
                rnode = {spcNode = 32623, dbNode = 128, relNode = 0}, forkNum = INIT_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 128, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 4029030200}}, {tag = {
                rnode = {spcNode = 13747, dbNode = 3, relNode = 0}, forkNum = INIT_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 3344353005, dbNode = 32623, relNode = 41743440}, forkNum = MAIN_FORKNUM, blockNum = 6985401}}, {
              tag = {rnode = {spcNode = 0, dbNode = 42324416, relNode = 0}, forkNum = 4895616, blockNum = 0}}, {tag = {rnode = {spcNode = 41743440, dbNode = 0, relNode = 6985401}, forkNum = MAIN_FORKNUM, blockNum = 42324416}}, {
              tag = {rnode = {spcNode = 0, dbNode = 6876528, relNode = 0}, forkNum = 41733255, blockNum = 0}}, {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 0, relNode = 0}, forkNum = MAIN_FORKNUM, blockNum = 0}}, {tag = {rnode = {spcNode = 2343314816, dbNode = 4, relNode = 3343787445}, forkNum = 32623, blockNum = 41680896}}, {tag = {rnode = {
                  spcNode = 0, dbNode = 0, relNode = 0}, forkNum = 41867936, blockNum = 0}}...}}
#5  0x00000000006a8477 in CheckPointBuffers (flags=flags@entry=128) at bufmgr.c:2578
No locals.
#6  0x00000000004dd781 in CheckPointGuts (checkPointRedo=<optimized out>, flags=<optimized out>) at xlog.c:8698
No locals.
#7  0x00000000004e9faf in CreateRestartPoint (flags=<optimized out>) at xlog.c:8856
        lastCheckPointRecPtr = <optimized out>
        lastCheckPointEndPtr = <optimized out>
        lastCheckPoint = <optimized out>
        PriorRedoPtr = <optimized out>
        xtime = <optimized out>
        __func__ = "CreateRestartPoint"
#8  0x000000000066977c in CheckpointerMain () at checkpointer.c:490
        ckpt_performed = 0 '\000'
        do_restartpoint = 1 '\001'
        flags = 128
        do_checkpoint = <optimized out>
        now = 1515455518
        elapsed_secs = 300
        cur_timeout = <optimized out>
        rc = <optimized out>
        local_sigjmp_buf = {{__jmpbuf = {140726946769872, -6702958939962280073, 2, 8447397, 0, 41868768, -6702958940010514569, 6701736465640912759}, __mask_was_saved = 1, __saved_mask = {__val = {18446744066192964103,
                140726946769792, 2, 8447397, 0, 41868768, 6968122, 0, 0, 0, 0, 0, 4, 8, 6700557, 1}}}}
        checkpointer_context = 0x27cbb88
        __func__ = "CheckpointerMain"
#9  0x00000000004f2820 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffd8bac2b80) at bootstrap.c:429
        progname = 0x80e5a5 "postgres"
        flag = <optimized out>
        userDoption = 0x0
        __func__ = "AuxiliaryProcessMain"
#10 0x0000000000673330 in StartChildProcess (type=CheckpointerProcess) at postmaster.c:5252
        pid = <optimized out>
        av = {0x80e5a5 "postgres", 0x7ffd8bac2bd0 "-x4", 0x0, 0xffec105a240ead00 <Address 0xffec105a240ead00 out of bounds>, 0x1 <Address 0x1 out of bounds>, 0x39e6 <Address 0x39e6 out of bounds>, 0x7ffd8bac2bf4 "",
          0x27ede00 "\260\215\301", 0x7f6fc9248800 "", 0x0}
        ac = 2
        typebuf = "-x4\000\000\000\000\000vCg\000\000\000\000\000\360>N\307o\177\000\000\065\233N\307o\177\000"
#11 0x0000000000674b1f in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4949
        save_errno = 0
        __func__ = "sigusr1_handler"
#12 <signal handler called>
No symbol table info available.
#13 0x00007f6fc75a0b83 in __select_nocancel () from /lib64/libc.so.6
No symbol table info available.
#14 0x000000000046ef32 in ServerLoop () at postmaster.c:1683
        timeout = {tv_sec = 59, tv_usec = 999708}
        rmask = {fds_bits = {120, 0 <repeats 15 times>}}
        selres = <optimized out>
        now = <optimized out>
        readmask = {fds_bits = {120, 0 <repeats 15 times>}}
        last_lockfile_recheck_time = 1515438348
        last_touch_time = 1515438348
        __func__ = "ServerLoop"
#15 0x0000000000675b69 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x27ca210) at postmaster.c:1327
        opt = <optimized out>
        status = <optimized out>
        userDoption = <optimized out>
        listen_addr_saved = <optimized out>
        i = <optimized out>
        output_config_variable = <optimized out>
        __func__ = "PostmasterMain"
#16 0x000000000047053e in main (argc=3, argv=0x27ca210) at main.c:228


Kind Regards,
    Glauco Torres





Re: Segmentation fault with core dump

From
Tom Lane
Date:
Glauco Torres <torres.glauco@gmail.com> writes:
> (gdb) bt
> #0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
> pb=pb@entry=0x1be06d2d06644)
> at bufmgr.c:4137
> #1  0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
> b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
> c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
> <ckpt_buforder_comparator>)
>     at qsort.c:107
> #2  0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
> n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> <ckpt_buforder_comparator>) at qsort.c:157
> #3  0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
> n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> <ckpt_buforder_comparator>) at qsort.c:203
> #4  0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
> bufmgr.c:1863

Hm.  I'm not normally one to jump to the conclusion that something is a
compiler bug, but it's hard to explain this stack trace any other way.
The value of "n" passed to the inner invocation of pg_qsort should not
have been more than 29914, but working from either the value of d or the
value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
looks a lot more like an address in the array than a proper value of n.
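
The arithmetic behind that estimate can be checked directly from the backtrace. The following Python sketch assumes that the med3 call at qsort.c:157 has the usual form med3(a, a + d, a + 2*d) with d = (n / 8) * es. That form is an assumption about the source, not something the trace shows, and the helper name implied_n_range is made up for illustration. Under that assumption it recovers the range of n consistent with the three pointers in frame #1.

```python
# Values copied from frames #1-#3 of the quoted backtrace.
a = 0x7f6fa9ef4b2c      # first med3 argument (start of the sub-array)
b = 0x1be06d2d06644     # second med3 argument
c = 0x2fc9dfbb1815c     # third med3 argument
es = 20                 # element size passed to pg_qsort
n_outer = 111473        # n@entry of the outer pg_qsort call


def implied_n_range(a, b, c, es):
    """Return (lo, hi) for n, assuming med3(a, a + d, a + 2*d), d = (n // 8) * es."""
    d = b - a
    assert c - b == d, "arguments are not evenly spaced"
    assert d % es == 0, "stride is not a whole number of elements"
    eighth = d // es            # equals n // 8 under the assumption
    return eighth * 8, eighth * 8 + 7


lo, hi = implied_n_range(a, b, c, es)
print("implied n between", hex(lo), "and", hex(hi))
print("larger than the whole array:", lo > n_outer)
```

With these inputs the pointers are evenly spaced, and the only values of n that fit are 0x7f6fa9f3a470 through 0x7f6fa9f3a477, far above the 111473 elements in the whole array.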

I suppose this might be due to a corrupted copy of the postgres executable
rather than an actual compiler bug.  Did you build it yourself?

BTW, I notice that ckpt_buforder_comparator assumes it can't possibly
see the same block ID twice in the array, which I think is an
unsupportable assumption.  But I cannot see a way that that could lead
to a crash in pg_qsort --- at worst it might cause a little inefficiency.
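
To make the comparator remark concrete, here is a small Python sketch. cmp_no_ties is a hypothetical stand-in that, like the comparator described above, never reports two keys as equal. It is not the PostgreSQL code, and Python's built-in sort is not pg_qsort, so this only illustrates the broken contract and says nothing definite about pg_qsort's behaviour.

```python
from functools import cmp_to_key


def cmp_no_ties(x, y):
    """Hypothetical stand-in: falls through to 1 when the keys are equal."""
    return -1 if x < y else 1


def cmp_with_ties(x, y):
    """Same ordering, but equal keys compare as 0."""
    return (x > y) - (x < y)


blocks = [7, 3, 7, 1, 3, 9]     # made-up block numbers with duplicates
sorted_no_ties = sorted(blocks, key=cmp_to_key(cmp_no_ties))
sorted_with_ties = sorted(blocks, key=cmp_to_key(cmp_with_ties))

print(sorted_no_ties == sorted_with_ties)
# For a duplicate key the first comparator claims "greater" in both directions.
print(cmp_no_ties(7, 7), cmp_with_ties(7, 7))
```

In this sketch both comparators give the same sorted list; the only difference is that the first one answers "greater" for a key compared with itself.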

            regards, tom lane


Re: Segmentation fault with core dump

From
Merlin Moncure
Date:
On Wed, Jan 10, 2018 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Glauco Torres <torres.glauco@gmail.com> writes:
>> (gdb) bt
>> #0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
>> pb=pb@entry=0x1be06d2d06644)
>> at bufmgr.c:4137
>> #1  0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
>> b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
>> c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
>> <ckpt_buforder_comparator>)
>>     at qsort.c:107
>> #2  0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
>> n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
>> <ckpt_buforder_comparator>) at qsort.c:157
>> #3  0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
>> n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
>> <ckpt_buforder_comparator>) at qsort.c:203
>> #4  0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
>> bufmgr.c:1863
>
> Hm.  I'm not normally one to jump to the conclusion that something is a
> compiler bug, but it's hard to explain this stack trace any other way.
> The value of "n" passed to the inner invocation of pg_qsort should not
> have been more than 29914, but working from either the value of d or the
> value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
> looks a lot more like an address in the array than a proper value of n.
>
> I suppose this might be due to a corrupted copy of the postgres executable
> rather than an actual compiler bug.  Did you build it yourself?
>
> BTW, I notice that ckpt_buforder_comparator assumes it can't possibly
> see the same block ID twice in the array, which I think is an
> unsupportable assumption.  But I cannot see a way that that could lead
> to a crash in pg_qsort --- at worst it might cause a little inefficiency.

simple
SELECT version();
...can give a lot of hints on who/what compiled the database if you don't know.

merlin


Re: Segmentation fault with core dump

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Glauco Torres <torres.glauco@gmail.com> writes:
> > (gdb) bt
> > #0  ckpt_buforder_comparator (pa=pa@entry=0x7f6fa9ef4b2c,
> > pb=pb@entry=0x1be06d2d06644)
> > at bufmgr.c:4137
> > #1  0x0000000000801268 in med3 (a=0x7f6fa9ef4b2c "\177\006",
> > b=0x1be06d2d06644 <Address 0x1be06d2d06644 out of bounds>,
> > c=0x2fc9dfbb1815c <Address 0x2fc9dfbb1815c out of bounds>, cmp=0x6a4d20
> > <ckpt_buforder_comparator>)
> >     at qsort.c:107
> > #2  0x0000000000801621 in pg_qsort (a=0x7f6fa9ef4b2c, a@entry=0x7f6fa9ea8380,
> > n=<optimized out>, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> > <ckpt_buforder_comparator>) at qsort.c:157
> > #3  0x00000000008015e2 in pg_qsort (a=0x7f6fa9ea8380, n=<optimized out>,
> > n@entry=111473, es=es@entry=20, cmp=cmp@entry=0x6a4d20
> > <ckpt_buforder_comparator>) at qsort.c:203
> > #4  0x00000000006a81cf in BufferSync (flags=flags@entry=128) at
> > bufmgr.c:1863
> 
> Hm.  I'm not normally one to jump to the conclusion that something is a
> compiler bug, but it's hard to explain this stack trace any other way.
> The value of "n" passed to the inner invocation of pg_qsort should not
> have been more than 29914, but working from either the value of d or the
> value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
> looks a lot more like an address in the array than a proper value of n.
> 
> I suppose this might be due to a corrupted copy of the postgres executable
> rather than an actual compiler bug.  Did you build it yourself?

Hmm, is this something that can be explained by using a different
postgres executable in GDB than the one that produced the core file?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Segmentation fault with core dump

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> Hm.  I'm not normally one to jump to the conclusion that something is a
>> compiler bug, but it's hard to explain this stack trace any other way.
>> The value of "n" passed to the inner invocation of pg_qsort should not
>> have been more than 29914, but working from either the value of d or the
>> value of pn leads to the conclusion that it was 0x7f6fa9f3a470, which
>> looks a lot more like an address in the array than a proper value of n.

> Hmm, is this something that can be explained by using a different
> postgres executable in GDB than the one that produced the core file?

That would result in nonsensical gdb output, most likely; but Glauco's
trace is internally consistent enough that I doubt gdb is lying to us.
In any case, the crash is an observable fact :-(

            regards, tom lane


Re: Segmentation fault with core dump

From
Alvaro Herrera
Date:
Merlin Moncure wrote:

> simple
> SELECT version();
> ...can give a lot of hints on who/what compiled the database if you don't know.

Probably, this is why Glauco included the output in his opening letter.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Segmentation fault with core dump

From
Glauco Torres
Date:


> That would result in nonsensical gdb output, most likely; but Glauco's
> trace is internally consistent enough that I doubt gdb is lying to us.
> In any case, the crash is an observable fact :-(


The system is a CentOS 7, and PG was installed using PGDG's YUM repository.

We are fairly sure that the binary loaded into `gdb` is the same one that crashed. Specifically, the path used was `/usr/pgsql-9.6/bin/postmaster`, and we have been running 9.6.6 (the most recent 9.6 minor release as of today) for a few weeks, so the binaries should not have been upgraded while the server was up, especially since we restarted the service to enable core dump creation. This is not the first crash, although it is the only one with a core dump so far. We can send a new gdb stack trace if it happens again.

More information:
$ uname -a
Linux pg-iii.br 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ /usr/pgsql-9.6/bin/pg_config
BINDIR = /usr/pgsql-9.6/bin
DOCDIR = /usr/pgsql-9.6/doc
HTMLDIR = /usr/pgsql-9.6/doc/html
INCLUDEDIR = /usr/pgsql-9.6/include
PKGINCLUDEDIR = /usr/pgsql-9.6/include
INCLUDEDIR-SERVER = /usr/pgsql-9.6/include/server
LIBDIR = /usr/pgsql-9.6/lib
PKGLIBDIR = /usr/pgsql-9.6/lib
LOCALEDIR = /usr/pgsql-9.6/share/locale
MANDIR = /usr/pgsql-9.6/share/man
SHAREDIR = /usr/pgsql-9.6/share
SYSCONFDIR = /etc/sysconfig/pgsql
PGXS = /usr/pgsql-9.6/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-rpath' '--prefix=/usr/pgsql-9.6' '--includedir=/usr/pgsql-9.6/include' '--mandir=/usr/pgsql-9.6/share/man' '--datadir=/usr/pgsql-9.6/share' '--libdir=/usr/pgsql-9.6/lib' '--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64' '--with-openssl' '--with-pam' '--with-gssapi' '--with-includes=/usr/include' '--with-libraries=/usr/lib64' '--enable-nls' '--enable-dtrace' '--with-uuid=e2fs' '--with-libxml' '--with-libxslt' '--with-ldap' '--with-selinux' '--with-systemd' '--with-system-tzdata=/usr/share/zoneinfo' '--sysconfdir=/etc/sysconfig/pgsql' '--docdir=/usr/pgsql-9.6/doc' '--htmldir=/usr/pgsql-9.6/doc/html' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,--as-needed'
CC = gcc
CPPFLAGS = -DFRONTEND -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
CFLAGS_SL = -fPIC
LDFLAGS = -L../../src/common -Wl,--as-needed -L/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-9.6/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lselinux -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm 
VERSION = PostgreSQL 9.6.6

Regards,
    Glauco

Re: Segmentation fault with core dump

From
Tom Lane
Date:
Glauco Torres <torres.glauco@gmail.com> writes:
>> The system is a CentOS 7, and PG was installed using PGDG's YUM repository.

Might be worth comparing sha1sum's of the postgres executable between
this server and one that's not having the problem, just to eliminate
the corrupted-binary theory.

            regards, tom lane


Re: Segmentation fault with core dump

From
Glauco Torres
Date:


> Might be worth comparing sha1sum's of the postgres executable between
> this server and one that's not having the problem, just to eliminate
> the corrupted-binary theory.

The output is the same on both servers:

$ sha1sum /usr/pgsql-9.6/bin/postmaster
56bcb4d644a8b00f07e9bd42f9a3f02be7ff2523  /usr/pgsql-9.6/bin/postmaster


Today I let it run to generate another core dump. The backtrace follows:

(gdb) bt
#0  tbm_comparator (left=left@entry=0x1d5ca08, right=right@entry=0x3acdb70) at tidbitmap.c:1031
#1  0x0000000000801268 in med3 (a=0x1d5ca08 "\350>\337\001", b=0x3acdb70 <Address 0x3acdb70 out of bounds>, c=0x583ecd8 <Address 0x583ecd8 out of bounds>, cmp=0x603ca0 <tbm_comparator>) at qsort.c:107
#2  0x0000000000801621 in pg_qsort (a=0x1d5ca08, n=<optimized out>, n@entry=10477, es=es@entry=8, cmp=cmp@entry=0x603ca0 <tbm_comparator>) at qsort.c:157
#3  0x0000000000604a7b in tbm_begin_iterate (tbm=tbm@entry=0x1dd8a00) at tidbitmap.c:635
#4  0x00000000005d3a89 in BitmapHeapNext (node=node@entry=0x1dc2ef0) at nodeBitmapHeapscan.c:110
#5  0x00000000005caf1a in ExecScanFetch (recheckMtd=0x5d35b0 <BitmapHeapRecheck>, accessMtd=0x5d35f0 <BitmapHeapNext>, node=0x1dc2ef0) at execScan.c:95
#6  ExecScan (node=node@entry=0x1dc2ef0, accessMtd=accessMtd@entry=0x5d35f0 <BitmapHeapNext>, recheckMtd=recheckMtd@entry=0x5d35b0 <BitmapHeapRecheck>) at execScan.c:180
#7  0x00000000005d3cff in ExecBitmapHeapScan (node=node@entry=0x1dc2ef0) at nodeBitmapHeapscan.c:440
#8  0x00000000005c3fb8 in ExecProcNode (node=node@entry=0x1dc2ef0) at execProcnode.c:437
#9  0x00000000005de877 in ExecNestLoop (node=node@entry=0x1dc0148) at nodeNestloop.c:174
#10 0x00000000005c3f28 in ExecProcNode (node=node@entry=0x1dc0148) at execProcnode.c:476
#11 0x00000000005de7d7 in ExecNestLoop (node=node@entry=0x1dbfdd8) at nodeNestloop.c:123
#12 0x00000000005c3f28 in ExecProcNode (node=node@entry=0x1dbfdd8) at execProcnode.c:476
#13 0x00000000005d624d in MultiExecHash (node=node@entry=0x1dbf9b8) at nodeHash.c:104
#14 0x00000000005c40c0 in MultiExecProcNode (node=node@entry=0x1dbf9b8) at execProcnode.c:577
#15 0x00000000005d6cb9 in ExecHashJoin (node=node@entry=0x1dbe688) at nodeHashjoin.c:178
#16 0x00000000005c3f08 in ExecProcNode (node=node@entry=0x1dbe688) at execProcnode.c:484
#17 0x00000000005de7d7 in ExecNestLoop (node=node@entry=0x1dbc6e0) at nodeNestloop.c:123
#18 0x00000000005c3f28 in ExecProcNode (node=node@entry=0x1dbc6e0) at execProcnode.c:476
#19 0x00000000005de7d7 in ExecNestLoop (node=node@entry=0x1dbc520) at nodeNestloop.c:123
#20 0x00000000005c3f28 in ExecProcNode (node=0x1dbc520) at execProcnode.c:476
#21 0x00000000005cf619 in fetch_input_tuple (aggstate=aggstate@entry=0x1dbbc48) at nodeAgg.c:598
#22 0x00000000005d10ff in agg_retrieve_direct (aggstate=0x1dbbc48) at nodeAgg.c:2067
#23 ExecAgg (node=node@entry=0x1dbbc48) at nodeAgg.c:1892
#24 0x00000000005c3ec8 in ExecProcNode (node=node@entry=0x1dbbc48) at execProcnode.c:503
#25 0x00000000005c03a7 in ExecutePlan (dest=0x1b607c8, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1dbbc48, estate=0x1dbba58)
    at execMain.c:1566
#26 standard_ExecutorRun (queryDesc=0x1c98de0, direction=<optimized out>, count=0) at execMain.c:338
#27 0x00007f016577e0a5 in pgss_ExecutorRun (queryDesc=0x1c98de0, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:877
#28 0x00000000006d3a97 in PortalRunSelect (portal=portal@entry=0x1ad9278, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x1b607c8) at pquery.c:948
#29 0x00000000006d4eab in PortalRun (portal=0x1ad9278, count=9223372036854775807, isTopLevel=<optimized out>, dest=0x1b607c8, altdest=0x1b607c8, completionTag=0x7ffdfe32a700 "") at pquery.c:789
#30 0x00000000006d2371 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:1969
#31 0x000000000046f8d4 in BackendRun (port=0x1add6d0) at postmaster.c:4294
#32 BackendStartup (port=0x1add6d0) at postmaster.c:3968
#33 ServerLoop () at postmaster.c:1719
#34 0x0000000000675b69 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1ab1210) at postmaster.c:1327
#35 0x000000000047053e in main (argc=3, argv=0x1ab1210) at main.c:228

Regards,
    Glauco

Re: Segmentation fault with core dump

From
Tom Lane
Date:
Glauco Torres <torres.glauco@gmail.com> writes:
> Today I let it run to generate another core dump. The backtrace follows:

> (gdb) bt
> #0  tbm_comparator (left=left@entry=0x1d5ca08, right=right@entry=0x3acdb70)
> at tidbitmap.c:1031
> #1  0x0000000000801268 in med3 (a=0x1d5ca08 "\350>\337\001", b=0x3acdb70
> <Address 0x3acdb70 out of bounds>, c=0x583ecd8 <Address 0x583ecd8 out of
> bounds>, cmp=0x603ca0 <tbm_comparator>) at qsort.c:107
> #2  0x0000000000801621 in pg_qsort (a=0x1d5ca08, n=<optimized out>,
> n@entry=10477,
> es=es@entry=8, cmp=cmp@entry=0x603ca0 <tbm_comparator>) at qsort.c:157
> #3  0x0000000000604a7b in tbm_begin_iterate (tbm=tbm@entry=0x1dd8a00) at
> tidbitmap.c:635

Oh ho!  I was wondering to myself "if pg_qsort is broken, why isn't
his system falling over everywhere?".  The answer evidently is
"yes, it is falling over everywhere".  This symptom looks pretty much
like what you had before, ie far-out-of-range addresses getting passed
to med3(), but the qsort call site is completely different.
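
The numbers in this second trace can be checked with a little arithmetic. The Python sketch below assumes that the med3 call at qsort.c:157 has the usual form med3(a, a + d, a + 2*d) with d = (n / 8) * es; that is an assumption about the source, and the variable names are made up for illustration. It compares the stride a healthy call would use for n = 10477 with the stride seen in frame #1.

```python
# Values copied from frames #1 and #2 of the second backtrace.
a = 0x1d5ca08        # first med3 argument, also the array base
b = 0x3acdb70        # second med3 argument
c = 0x583ecd8        # third med3 argument
es = 8               # element size passed to pg_qsort
n_entry = 10477      # n@entry reported by gdb

# Stride a healthy call would use, assuming d = (n // 8) * es.
expected_d = (n_entry // 8) * es
observed_d = b - a

# Smallest n that would produce the observed stride.
implied_n = (observed_d // es) * 8

# Address of the last array element for the reported n.
last_element = a + (n_entry - 1) * es

print("expected b:", hex(a + expected_d))
print("observed b:", hex(b))
print("evenly spaced:", c - b == observed_d)
print("implied n:", hex(implied_n))
print("last element:", hex(last_element))
```

With these inputs the implied n comes out as 0x1d71168, which is the same number as the address of the last element, a + (n - 1) * es. That fits the earlier remark that the bogus n looked like an address inside the array; the sketch says nothing about why it happens.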

Since you've eliminated the idea that the executable file per se
is different from your working servers, I think we're now down to
the conclusion that there's something flaky about the hardware
on this server.  Maybe it's misexecuting integer divide every so
often --- though it's hard to guess why only pg_qsort would be
affected.

            regards, tom lane