Thread: pg11.1 jit segv
Crash is reproducible but only when JIT=on. postgresql11-llvmjit-11.1-1PGDG.rhel7.x86_64 [2769871.453033] postmaster[8582]: segfault at 7f083bddb780 ip 00007f08127e814e sp 00007ffe463506e0 error 4 [2770774.470600] postmaster[29410]: segfault at 7f0812eeb6c8 ip 00007f08127eb4f0 sp 00007ffe463506e0 error 4 Core was generated by `postgres: telsasoft ts 192.168.122.11(41908) SELECT '. Program terminated with signal 11, Segmentation fault. (gdb) bt #0 0x00007f08127e814e in ?? () #1 0x0000000000000000 in ?? () [pryzbyj@database ~]$ sudo -u postgres valgrind /usr/pgsql-11/bin/postgres --single -D /var/lib/pgsql/11/data ts <tmp/sql-jit-crash-2018-11-15 ==26448== Conditional jump or move depends on uninitialised value(s) ==26448== at 0x1B510F09: getAdjustedPtr(llvm::IRBuilder<llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter>&,llvm::DataLayout const&, llvm::Value*, llvm::APInt, llvm::Type*, llvm::Twine) (SROA.cpp:1531) ==26448== by 0x1B511C52: llvm::sroa::AllocaSliceRewriter::getNewAllocaSlicePtr(llvm::IRBuilder<llvm::ConstantFolder, (anonymousnamespace)::IRBuilderPrefixedInserter>&, llvm::Type*) (SROA.cpp:2313) ==26448== by 0x1B516BA0: llvm::sroa::AllocaSliceRewriter::visitIntrinsicInst(llvm::IntrinsicInst&) (SROA.cpp:2921) ==26448== by 0x1B5190CC: visitCall (Instruction.def:190) ==26448== by 0x1B5190CC: llvm::InstVisitor<llvm::sroa::AllocaSliceRewriter, bool>::visit(llvm::Instruction&) (Instruction.def:190) ==26448== by 0x1B51947A: visit (InstVisitor.h:114) ==26448== by 0x1B51947A: llvm::sroa::AllocaSliceRewriter::visit((anonymous namespace)::Slice const*) (SROA.cpp:2262) ==26448== by 0x1B51D426: llvm::SROA::rewritePartition(llvm::AllocaInst&, llvm::sroa::AllocaSlices&, llvm::sroa::Partition&)(SROA.cpp:3921) ==26448== by 0x1B51E630: llvm::SROA::splitAlloca(llvm::AllocaInst&, llvm::sroa::AllocaSlices&) (SROA.cpp:4029) ==26448== by 0x1B51F25D: llvm::SROA::runOnAlloca(llvm::AllocaInst&) (SROA.cpp:4156) ==26448== by 0x1B52048A: llvm::SROA::runImpl(llvm::Function&, llvm::DominatorTree&, llvm::AssumptionCache&) (SROA.cpp:4243) ==26448== by 0x1B520E40: llvm::sroa::SROALegacyPass::runOnFunction(llvm::Function&) (SROA.cpp:4296) ==26448== by 0x1AC49C31: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1514) ==26448== by 0x1B6A805E: RunPassOnSCC (Timer.h:149) ==26448== by 0x1B6A805E: RunAllPassesOnSCC (CallGraphSCCPass.cpp:419) ==26448== by 0x1B6A805E: (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (CallGraphSCCPass.cpp:474) ==26448== ==26448== Conditional jump or move depends on uninitialised value(s) ==26448== at 0x1B510F09: getAdjustedPtr(llvm::IRBuilder<llvm::ConstantFolder, (anonymous namespace)::IRBuilderPrefixedInserter>&,llvm::DataLayout const&, llvm::Value*, llvm::APInt, llvm::Type*, llvm::Twine) (SROA.cpp:1531) ==26448== by 0x1B511C52: llvm::sroa::AllocaSliceRewriter::getNewAllocaSlicePtr(llvm::IRBuilder<llvm::ConstantFolder, (anonymousnamespace)::IRBuilderPrefixedInserter>&, llvm::Type*) (SROA.cpp:2313) ==26448== by 0x1B515EF0: llvm::sroa::AllocaSliceRewriter::visitMemSetInst(llvm::MemSetInst&) (SROA.cpp:2656) ==26448== by 0x1B5190CC: visitCall (Instruction.def:190) ==26448== by 0x1B5190CC: llvm::InstVisitor<llvm::sroa::AllocaSliceRewriter, bool>::visit(llvm::Instruction&) (Instruction.def:190) ==26448== by 0x1B51947A: visit (InstVisitor.h:114) ==26448== by 0x1B51947A: llvm::sroa::AllocaSliceRewriter::visit((anonymous namespace)::Slice const*) (SROA.cpp:2262) ==26448== by 0x1B51D426: llvm::SROA::rewritePartition(llvm::AllocaInst&, llvm::sroa::AllocaSlices&, llvm::sroa::Partition&)(SROA.cpp:3921) ==26448== by 0x1B51E630: llvm::SROA::splitAlloca(llvm::AllocaInst&, llvm::sroa::AllocaSlices&) (SROA.cpp:4029) ==26448== by 0x1B51F25D: llvm::SROA::runOnAlloca(llvm::AllocaInst&) (SROA.cpp:4156) ==26448== by 0x1B52048A: llvm::SROA::runImpl(llvm::Function&, llvm::DominatorTree&, llvm::AssumptionCache&) (SROA.cpp:4243) ==26448== by 0x1B520E40: llvm::sroa::SROALegacyPass::runOnFunction(llvm::Function&) (SROA.cpp:4296) ==26448== by 0x1AC49C31: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1514) ==26448== by 0x1B6A805E: RunPassOnSCC (Timer.h:149) ==26448== by 0x1B6A805E: RunAllPassesOnSCC (CallGraphSCCPass.cpp:419) ==26448== by 0x1B6A805E: (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (CallGraphSCCPass.cpp:474) ==26448== ==26448== Invalid write of size 8 ==26448== at 0x4C2E0C3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022) ==26448== by 0x824075: UnknownInlinedFun (string3.h:51) ==26448== by 0x824075: varstrfastcmp_locale (varlena.c:2135) ==26448== by 0x87C6F3: ApplySortComparator (sortsupport.h:224) ==26448== by 0x87C6F3: comparetup_heap (tuplesort.c:3560) ==26448== by 0x8793F4: qsort_tuple (qsort_tuple.c:112) ==26448== by 0x87EE53: tuplesort_performsort (tuplesort.c:1811) ==26448== by 0x628CE2: ExecSort (nodeSort.c:118) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x610D48: UnknownInlinedFun (executor.h:237) ==26448== by 0x610D48: fetch_input_tuple (nodeAgg.c:406) ==26448== by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736) ==26448== by 0x61277F: ExecAgg (nodeAgg.c:1551) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x60A764: ExecScanFetch (execScan.c:95) ==26448== by 0x60A764: ExecScan (execScan.c:162) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== Address 0x1f77c1c0 is 0 bytes after a block of size 8,192 alloc'd ==26448== at 0x4C29C23: malloc (vg_replace_malloc.c:299) ==26448== by 0x86D5AD: AllocSetContextCreateExtended (aset.c:477) ==26448== by 0x87AA3D: tuplesort_begin_common (tuplesort.c:697) ==26448== by 0x87DAA9: tuplesort_begin_heap (tuplesort.c:812) ==26448== by 0x628C93: ExecSort (nodeSort.c:89) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x610D48: UnknownInlinedFun (executor.h:237) ==26448== by 0x610D48: fetch_input_tuple (nodeAgg.c:406) ==26448== by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736) ==26448== by 0x61277F: ExecAgg (nodeAgg.c:1551) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x60A764: ExecScanFetch (execScan.c:95) ==26448== by 0x60A764: ExecScan (execScan.c:162) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x61B364: UnknownInlinedFun (executor.h:237) ==26448== by 0x61B364: MultiExecPrivateHash (nodeHash.c:164) ==26448== by 0x61B364: MultiExecHash (nodeHash.c:114) ==26448== ==26448== Invalid read of size 8 ==26448== at 0x4C2E0CE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022) ==26448== by 0x824075: UnknownInlinedFun (string3.h:51) ==26448== by 0x824075: varstrfastcmp_locale (varlena.c:2135) ==26448== by 0x87C6F3: ApplySortComparator (sortsupport.h:224) ==26448== by 0x87C6F3: comparetup_heap (tuplesort.c:3560) ==26448== by 0x8793F4: qsort_tuple (qsort_tuple.c:112) ==26448== by 0x87EE53: tuplesort_performsort (tuplesort.c:1811) ==26448== by 0x628CE2: ExecSort (nodeSort.c:118) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x610D48: UnknownInlinedFun (executor.h:237) ==26448== by 0x610D48: fetch_input_tuple (nodeAgg.c:406) ==26448== by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736) ==26448== by 0x61277F: ExecAgg (nodeAgg.c:1551) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x60A764: ExecScanFetch (execScan.c:95) ==26448== by 0x60A764: ExecScan (execScan.c:162) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== Address 0x1f77e200 is 0 bytes after a block of size 8,192 alloc'd ==26448== at 0x4C29C23: malloc (vg_replace_malloc.c:299) ==26448== by 0x86D5AD: AllocSetContextCreateExtended (aset.c:477) ==26448== by 0x87AA5A: tuplesort_begin_common (tuplesort.c:710) ==26448== by 0x87DAA9: tuplesort_begin_heap (tuplesort.c:812) ==26448== by 0x628C93: ExecSort (nodeSort.c:89) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x610D48: UnknownInlinedFun (executor.h:237) ==26448== by 0x610D48: fetch_input_tuple (nodeAgg.c:406) ==26448== by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736) ==26448== by 0x61277F: ExecAgg (nodeAgg.c:1551) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x60A764: ExecScanFetch (execScan.c:95) ==26448== by 0x60A764: ExecScan (execScan.c:162) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x61B364: UnknownInlinedFun (executor.h:237) ==26448== by 0x61B364: MultiExecPrivateHash (nodeHash.c:164) ==26448== by 0x61B364: MultiExecHash (nodeHash.c:114) ==26448== ==26448== Invalid read of size 8 ==26448== at 0x4C2E0C0: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022) ==26448== by 0x824075: UnknownInlinedFun (string3.h:51) ==26448== by 0x824075: varstrfastcmp_locale (varlena.c:2135) ==26448== by 0x87C6F3: ApplySortComparator (sortsupport.h:224) ==26448== by 0x87C6F3: comparetup_heap (tuplesort.c:3560) ==26448== by 0x8793F4: qsort_tuple (qsort_tuple.c:112) ==26448== by 0x87EE53: tuplesort_performsort (tuplesort.c:1811) ==26448== by 0x628CE2: ExecSort (nodeSort.c:118) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x610D48: UnknownInlinedFun (executor.h:237) ==26448== by 0x610D48: fetch_input_tuple (nodeAgg.c:406) ==26448== by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736) ==26448== by 0x61277F: ExecAgg (nodeAgg.c:1551) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x60A764: ExecScanFetch (execScan.c:95) ==26448== by 0x60A764: ExecScan (execScan.c:162) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== Address 0x1f77e210 is 16 bytes after a block of size 8,192 alloc'd ==26448== at 0x4C29C23: malloc (vg_replace_malloc.c:299) ==26448== by 0x86D5AD: AllocSetContextCreateExtended (aset.c:477) ==26448== by 0x87AA5A: tuplesort_begin_common (tuplesort.c:710) ==26448== by 0x87DAA9: tuplesort_begin_heap (tuplesort.c:812) ==26448== by 0x628C93: ExecSort (nodeSort.c:89) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x610D48: UnknownInlinedFun (executor.h:237) ==26448== by 0x610D48: fetch_input_tuple (nodeAgg.c:406) ==26448== by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736) ==26448== by 0x61277F: ExecAgg (nodeAgg.c:1551) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x60A764: ExecScanFetch (execScan.c:95) ==26448== by 0x60A764: ExecScan (execScan.c:162) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x61B364: UnknownInlinedFun (executor.h:237) ==26448== by 0x61B364: MultiExecPrivateHash (nodeHash.c:164) ==26448== by 0x61B364: MultiExecHash (nodeHash.c:114) ==26448== ==26448== ==26448== More than 10000000 total errors detected. I'm not reporting any more. ==26448== Final error counts will be inaccurate. Go fix your program! ==26448== Rerun with --error-limit=no to disable this cutoff. Note ==26448== that errors may occur in your program without prior warning from ==26448== Valgrind, because errors are no longer being displayed. ==26448== ==26448== ==26448== Process terminating with default action of signal 11 (SIGSEGV) ==26448== Access not within mapped region at address 0x23337000 ==26448== at 0x4C2E0CE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022) ==26448== by 0x824075: UnknownInlinedFun (string3.h:51) ==26448== by 0x824075: varstrfastcmp_locale (varlena.c:2135) ==26448== by 0x87C6F3: ApplySortComparator (sortsupport.h:224) ==26448== by 0x87C6F3: comparetup_heap (tuplesort.c:3560) ==26448== by 0x8793F4: qsort_tuple (qsort_tuple.c:112) ==26448== by 0x87EE53: tuplesort_performsort (tuplesort.c:1811) ==26448== by 0x628CE2: ExecSort (nodeSort.c:118) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x610D48: UnknownInlinedFun (executor.h:237) ==26448== by 0x610D48: fetch_input_tuple (nodeAgg.c:406) ==26448== by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736) ==26448== by 0x61277F: ExecAgg (nodeAgg.c:1551) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== by 0x60A764: ExecScanFetch (execScan.c:95) ==26448== by 0x60A764: ExecScan (execScan.c:162) ==26448== by 0x609067: ExecProcNodeInstr (execProcnode.c:461) ==26448== If you believe this happened as a result of a stack ==26448== overflow in your program's main thread (unlikely but ==26448== possible), you can try to increase the size of the ==26448== main thread stack using the --main-stacksize= flag. ==26448== The main thread stack size used in this run was 8388608. = Let me know if there's anything else I can provide. Justin
Hi, On 2018-11-15 16:39:59 -0600, Justin Pryzby wrote: > Crash is reproducible but only when JIT=on. > > postgresql11-llvmjit-11.1-1PGDG.rhel7.x86_64 > > [2769871.453033] postmaster[8582]: segfault at 7f083bddb780 ip 00007f08127e814e sp 00007ffe463506e0 error 4 > [2770774.470600] postmaster[29410]: segfault at 7f0812eeb6c8 ip 00007f08127eb4f0 sp 00007ffe463506e0 error 4 > > Core was generated by `postgres: telsasoft ts 192.168.122.11(41908) SELECT '. > Program terminated with signal 11, Segmentation fault. Thanks for reporting! > (gdb) bt > #0 0x00007f08127e814e in ?? () > #1 0x0000000000000000 in ?? () Could you enable jit_debugging_support and reproduce? That should give a more useful backtrace. It'd also be interesting if toggling jit_inline_above_cost=-1, jit_optimize_above_cost=-1 and jit_tuple_deforming=false, each set individually, make a difference. > Let me know if there's anything else I can provide. Any way for me to reproduce this locally? I.e. can you narrow this down to a reproducible test case? Greetings, Andres Freund
On Thu, Nov 15, 2018 at 02:47:55PM -0800, Andres Freund wrote: > > (gdb) bt > > #0 0x00007f08127e814e in ?? () > > #1 0x0000000000000000 in ?? () > > Could you enable jit_debugging_support and reproduce? That should give > a more useful backtrace. Core was generated by `postgres: pryzbyj ts [local] EXPLAIN '. Program terminated with signal 11, Segmentation fault. #0 0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.1-3.el7_5.1.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64 glib4 (gdb) bt #0 0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6 #1 0x0000000000824076 in memcpy (__len=18446744073709551612, __src=0x6c35818, __dest=<optimized out>) at /usr/include/bits/string3.h:51 #2 varstrfastcmp_locale (x=113465364, y=113465636, ssup=<optimized out>) at varlena.c:2135 #3 0x000000000087c6f4 in ApplySortComparator (ssup=0x6c33c40, isNull2=<optimized out>, datum2=<optimized out>, isNull1=false,datum1=<optimized out>) at ../../../../src/include/utils/sortsupport.h:224 #4 comparetup_heap (a=<optimized out>, b=<optimized out>, state=0x6c337f0) at tuplesort.c:3560 #5 0x00000000008793f5 in qsort_tuple (a=0x8311958, n=5272, cmp_tuple=0x87c2d0 <comparetup_heap>, state=state@entry=0x6c337f0)at qsort_tuple.c:112 #6 0x000000000087d2db in tuplesort_sort_memtuples (state=state@entry=0x6c337f0) at tuplesort.c:3320 #7 0x000000000087ee54 in tuplesort_performsort (state=state@entry=0x6c337f0) at tuplesort.c:1811 #8 0x0000000000628ce3 in ExecSort (pstate=0x5e50ed8) at nodeSort.c:118 #9 0x0000000000609068 in ExecProcNodeInstr (node=0x5e50ed8) at execProcnode.c:461 #10 0x0000000000610d49 in ExecProcNode (node=0x5e50ed8) at ../../../src/include/executor/executor.h:237 #11 fetch_input_tuple (aggstate=aggstate@entry=0x5e50aa0) at nodeAgg.c:406 #12 0x0000000000612780 in agg_retrieve_direct (aggstate=0x5e50aa0) at nodeAgg.c:1736 #13 ExecAgg (pstate=0x5e50aa0) at nodeAgg.c:1551 #14 0x0000000000609068 in ExecProcNodeInstr (node=0x5e50aa0) at execProcnode.c:461 #15 0x000000000060a765 in ExecScanFetch (recheckMtd=0x62be50 <SubqueryRecheck>, accessMtd=0x62be70 <SubqueryNext>, node=0x5e508e0)at execScan.c:95 #16 ExecScan (node=0x5e508e0, accessMtd=0x62be70 <SubqueryNext>, recheckMtd=0x62be50 <SubqueryRecheck>) at execScan.c:162 #17 0x0000000000609068 in ExecProcNodeInstr (node=0x5e508e0) at execProcnode.c:461 #18 0x000000000061b365 in ExecProcNode (node=0x5e508e0) at ../../../src/include/executor/executor.h:237 #19 MultiExecPrivateHash (node=0x5e50720) at nodeHash.c:164 #20 MultiExecHash (node=node@entry=0x5e50720) at nodeHash.c:114 #21 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e50720) at execProcnode.c:501 #22 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x5e4f620) at nodeHashjoin.c:290 #23 ExecHashJoin (pstate=0x5e4f620) at nodeHashjoin.c:565 #24 0x0000000000609068 in ExecProcNodeInstr (node=0x5e4f620) at execProcnode.c:461 #25 0x000000000061b365 in ExecProcNode (node=0x5e4f620) at ../../../src/include/executor/executor.h:237 #26 MultiExecPrivateHash (node=0x5e4f460) at nodeHash.c:164 #27 MultiExecHash (node=node@entry=0x5e4f460) at nodeHash.c:114 #28 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e4f460) at execProcnode.c:501 #29 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x38344f8) at nodeHashjoin.c:290 #30 ExecHashJoin (pstate=0x38344f8) at nodeHashjoin.c:565 #31 0x0000000000609068 in ExecProcNodeInstr (node=0x38344f8) at execProcnode.c:461 #32 0x000000000061b365 in ExecProcNode (node=0x38344f8) at ../../../src/include/executor/executor.h:237 #33 MultiExecPrivateHash (node=0x38343c8) at nodeHash.c:164 #34 MultiExecHash (node=node@entry=0x38343c8) at nodeHash.c:114 #35 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x38343c8) at execProcnode.c:501 #36 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x3833148) at nodeHashjoin.c:290 #37 ExecHashJoin (pstate=0x3833148) at nodeHashjoin.c:565 #38 0x0000000000609068 in ExecProcNodeInstr (node=0x3833148) at execProcnode.c:461 #39 0x0000000000628cd6 in ExecProcNode (node=0x3833148) at ../../../src/include/executor/executor.h:237 #40 ExecSort (pstate=0x3832bd0) at nodeSort.c:107 #41 0x0000000000609068 in ExecProcNodeInstr (node=0x3832bd0) at execProcnode.c:461 #42 0x0000000000610d49 in ExecProcNode (node=0x3832bd0) at ../../../src/include/executor/executor.h:237 #43 fetch_input_tuple (aggstate=aggstate@entry=0x3832e30) at nodeAgg.c:406 #44 0x0000000000612780 in agg_retrieve_direct (aggstate=0x3832e30) at nodeAgg.c:1736 #45 ExecAgg (pstate=0x3832e30) at nodeAgg.c:1551 #46 0x0000000000609068 in ExecProcNodeInstr (node=0x3832e30) at execProcnode.c:461 #47 0x00000000006025ba in ExecProcNode (node=0x3832e30) at ../../../src/include/executor/executor.h:237 #48 ExecutePlan (execute_once=<optimized out>, dest=0xced3a0 <donothingDR>, direction=<optimized out>, numberTuples=0, sendTuples=true,operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x3832e30, estate=0x3832970) at execMain.c:1723 #49 standard_ExecutorRun (queryDesc=0x5c3ec68, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 #50 0x00007f8191ade08d in pgss_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at pg_stat_statements.c:892 #51 0x00007f80eb8bf5e1 in explain_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at auto_explain.c:268 #52 0x00000000005a497c in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x5c32cd8, into=into@entry=0x0, es=es@entry=0x207aaf8, queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5 #53 0x00000000005a4c87 in ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0, es=0x207aaf8, queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' (' ||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as1 #54 0x00000000005a5174 in ExplainQuery (pstate=pstate@entry=0x2078dc8, stmt=stmt@entry=0x1fe1df8, queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u4 #55 0x00000000007469dd in standard_ProcessUtility (pstmt=pstmt@entry=0x1fe1ea8, queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5 #56 0x00007f8191ae0025 in pgss_ProcessUtility (pstmt=0x1fe1ea8, queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' (' ||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as5 #57 0x0000000000743e86 in PortalRunUtility (portal=0x1f12be0, pstmt=0x1fe1ea8, isTopLevel=<optimized out>, setHoldSnapshot=<optimizedout>, dest=0x2078d38, completionTag=0x7ffc304d0e20 "") at pquery.c:1178 #58 0x0000000000744bc2 in FillPortalStore (portal=portal@entry=0x1f12be0, isTopLevel=isTopLevel@entry=true) at pquery.c:1038 #59 0x000000000074564f in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimizedout>, dest=0x7f81a09e2dc8, altdest=0x7f81a09e2dc8, completionTag=0x7ffc304d1040 "") at pquery.c:768 #60 0x0000000000741577 in exec_simple_query (query_string=<optimized out>) at postgres.c:1145 #61 0x0000000000742842 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimizedout>) at postgres.c:4182 #62 0x00000000004801dd in BackendRun (port=0x1ec3f70) at postmaster.c:4361 #63 BackendStartup (port=0x1ec3f70) at postmaster.c:4033 #64 ServerLoop () at postmaster.c:1706 #65 0x00000000006d5d49 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1e98280) at postmaster.c:1379 #66 0x000000000048103f in main (argc=3, argv=0x1e98280) at main.c:228 ( (gdb) bt f #0 0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6 No symbol table info available. #1 0x0000000000824076 in memcpy (__len=18446744073709551612, __src=0x6c35818, __dest=<optimized out>) at /usr/include/bits/string3.h:51 No locals. #2 varstrfastcmp_locale (x=113465364, y=113465636, ssup=<optimized out>) at varlena.c:2135 arg1 = 0x6c35814 arg2 = 0x6c35924 arg1_match = false sss = 0x6c33d60 a1p = 0x6c35818 "" a2p = 0x6c35928 "" len1 = -4 len2 = -4 result = <optimized out> #3 0x000000000087c6f4 in ApplySortComparator (ssup=0x6c33c40, isNull2=<optimized out>, datum2=<optimized out>, isNull1=false,datum1=<optimized out>) at ../../../../src/include/utils/sortsupport.h:224 compare = <optimized out> isNull2 = false isNull1 = false ssup = 0x6c33c40 datum2 = <optimized out> compare = <optimized out> #4 comparetup_heap (a=<optimized out>, b=<optimized out>, state=0x6c337f0) at tuplesort.c:3560 sortKey = 0x6c33c40 ltup = {t_len = 248, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 42884}, t_tableOid = 0, t_data = 0x6c357f8} rtup = {t_len = 248, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, t_tableOid = 0, t_data = 0x6c35908} tupDesc = 0x5ede208 nkey = 1 attno = 2 datum1 = <optimized out> datum2 = <optimized out> isnull1 = false isnull2 = false #5 0x00000000008793f5 in qsort_tuple (a=0x8311958, n=5272, cmp_tuple=0x87c2d0 <comparetup_heap>, state=state@entry=0x6c337f0)at qsort_tuple.c:112 pa = <optimized out> pb = <optimized out> pc = <optimized out> pd = <optimized out> pl = <optimized out> pm = 0x8311970 pn = <optimized out> d1 = <optimized out> d2 = <optimized out> r = <optimized out> presorted = 1 #6 0x000000000087d2db in tuplesort_sort_memtuples (state=state@entry=0x6c337f0) at tuplesort.c:3320 No locals. #7 0x000000000087ee54 in tuplesort_performsort (state=state@entry=0x6c337f0) at tuplesort.c:1811 oldcontext = 0x3832860 __func__ = "tuplesort_performsort" #8 0x0000000000628ce3 in ExecSort (pstate=0x5e50ed8) at nodeSort.c:118 plannode = <optimized out> outerNode = 0x5e50fe8 tupDesc = <optimized out> node = 0x5e50ed8 estate = 0x3832970 dir = ForwardScanDirection tuplesortstate = 0x6c337f0 slot = <optimized out> #9 0x0000000000609068 in ExecProcNodeInstr (node=0x5e50ed8) at execProcnode.c:461 result = <optimized out> #10 0x0000000000610d49 in ExecProcNode (node=0x5e50ed8) at ../../../src/include/executor/executor.h:237 No locals. #11 fetch_input_tuple (aggstate=aggstate@entry=0x5e50aa0) at nodeAgg.c:406 slot = <optimized out> #12 0x0000000000612780 in agg_retrieve_direct (aggstate=0x5e50aa0) at nodeAgg.c:1736 econtext = 0x5e50e28 firstSlot = 0x674fa50 numGroupingSets = 1 node = 0x5c03788 tmpcontext = 0x5e50cc8 peragg = 0x6754918 outerslot = <optimized out> nextSetSize = <optimized out> pergroups = 0x6755928 result = <optimized out> hasGroupingSets = false currentSet = <optimized out> numReset = 1 i = <optimized out> #13 ExecAgg (pstate=0x5e50aa0) at nodeAgg.c:1551 node = 0x5e50aa0 result = 0x0 #14 0x0000000000609068 in ExecProcNodeInstr (node=0x5e50aa0) at execProcnode.c:461 result = <optimized out> #15 0x000000000060a765 in ExecScanFetch (recheckMtd=0x62be50 <SubqueryRecheck>, accessMtd=0x62be70 <SubqueryNext>, node=0x5e508e0)at execScan.c:95 estate = 0x3832970 #16 ExecScan (node=0x5e508e0, accessMtd=0x62be70 <SubqueryNext>, recheckMtd=0x62be50 <SubqueryRecheck>) at execScan.c:162 econtext = 0x5e509f0 qual = 0x0 projInfo = 0x675dbb8 #17 0x0000000000609068 in ExecProcNodeInstr (node=0x5e508e0) at execProcnode.c:461 result = <optimized out> #18 0x000000000061b365 in ExecProcNode (node=0x5e508e0) at ../../../src/include/executor/executor.h:237 No locals. #19 MultiExecPrivateHash (node=0x5e50720) at nodeHash.c:164 outerNode = <optimized out> hashkeys = 0x67662f0 econtext = 0x5e50830 hashvalue = 0 hashtable = 0x6a7d468 #20 MultiExecHash (node=node@entry=0x5e50720) at nodeHash.c:114 No locals. #21 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e50720) at execProcnode.c:501 __func__ = "MultiExecProcNode" #22 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x5e4f620) at nodeHashjoin.c:290 outerNode = 0x5e4f8e0 hashNode = <optimized out> econtext = <optimized out> node = 0x5e4f620 joinqual = <optimized out> otherqual = <optimized out> hashtable = 0x6a7d468 hashvalue = 0 batchno = 48 parallel_state = <optimized out> #23 ExecHashJoin (pstate=0x5e4f620) at nodeHashjoin.c:565 No locals. #24 0x0000000000609068 in ExecProcNodeInstr (node=0x5e4f620) at execProcnode.c:461 result = <optimized out> #25 0x000000000061b365 in ExecProcNode (node=0x5e4f620) at ../../../src/include/executor/executor.h:237 No locals. #26 MultiExecPrivateHash (node=0x5e4f460) at nodeHash.c:164 outerNode = <optimized out> hashkeys = 0x676ef80 econtext = 0x5e4f570 hashvalue = 0 hashtable = 0x6a7cf48 #27 MultiExecHash (node=node@entry=0x5e4f460) at nodeHash.c:114 No locals. #28 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x5e4f460) at execProcnode.c:501 __func__ = "MultiExecProcNode" #29 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x38344f8) at nodeHashjoin.c:290 outerNode = 0x5c4b4b8 hashNode = <optimized out> econtext = <optimized out> node = 0x38344f8 joinqual = <optimized out> otherqual = <optimized out> hashtable = 0x6a7cf48 hashvalue = 0 batchno = 0 parallel_state = <optimized out> #30 ExecHashJoin (pstate=0x38344f8) at nodeHashjoin.c:565 No locals. #31 0x0000000000609068 in ExecProcNodeInstr (node=0x38344f8) at execProcnode.c:461 result = <optimized out> #32 0x000000000061b365 in ExecProcNode (node=0x38344f8) at ../../../src/include/executor/executor.h:237 No locals. #33 MultiExecPrivateHash (node=0x38343c8) at nodeHash.c:164 outerNode = <optimized out> hashkeys = 0x6778c40 econtext = 0x3834048 hashvalue = 0 hashtable = 0x6a7ce38 #34 MultiExecHash (node=node@entry=0x38343c8) at nodeHash.c:114 No locals. #35 0x0000000000609610 in MultiExecProcNode (node=node@entry=0x38343c8) at execProcnode.c:501 __func__ = "MultiExecProcNode" #36 0x000000000061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x3833148) at nodeHashjoin.c:290 outerNode = 0x3833408 hashNode = <optimized out> econtext = <optimized out> node = 0x3833148 joinqual = <optimized out> otherqual = <optimized out> hashtable = 0x6a7ce38 hashvalue = 0 batchno = 96457208 parallel_state = <optimized out> #37 ExecHashJoin (pstate=0x3833148) at nodeHashjoin.c:565 No locals. #38 0x0000000000609068 in ExecProcNodeInstr (node=0x3833148) at execProcnode.c:461 result = <optimized out> #39 0x0000000000628cd6 in ExecProcNode (node=0x3833148) at ../../../src/include/executor/executor.h:237 No locals. #40 ExecSort (pstate=0x3832bd0) at nodeSort.c:107 plannode = <optimized out> outerNode = 0x3833148 tupDesc = <optimized out> node = 0x3832bd0 estate = 0x3832970 dir = ForwardScanDirection tuplesortstate = 0x6c0f6f0 slot = <optimized out> #41 0x0000000000609068 in ExecProcNodeInstr (node=0x3832bd0) at execProcnode.c:461 result = <optimized out> #42 0x0000000000610d49 in ExecProcNode (node=0x3832bd0) at ../../../src/include/executor/executor.h:237 No locals. #43 fetch_input_tuple (aggstate=aggstate@entry=0x3832e30) at nodeAgg.c:406 slot = <optimized out> #44 0x0000000000612780 in agg_retrieve_direct (aggstate=0x3832e30) at nodeAgg.c:1736 econtext = 0x3833098 firstSlot = 0x677a550 numGroupingSets = 1 node = 0x5c06950 tmpcontext = 0x3832ce0 peragg = 0x67873d0 outerslot = <optimized out> nextSetSize = <optimized out> pergroups = 0x6a752e8 result = <optimized out> hasGroupingSets = false currentSet = <optimized out> numReset = 1 i = <optimized out> #45 ExecAgg (pstate=0x3832e30) at nodeAgg.c:1551 node = 0x3832e30 result = 0x0 #46 0x0000000000609068 in ExecProcNodeInstr (node=0x3832e30) at execProcnode.c:461 result = <optimized out> #47 0x00000000006025ba in ExecProcNode (node=0x3832e30) at ../../../src/include/executor/executor.h:237 No locals. #48 ExecutePlan (execute_once=<optimized out>, dest=0xced3a0 <donothingDR>, direction=<optimized out>, numberTuples=0, sendTuples=true,operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x3832e30, estate=0x3832970) at execMain.c:1723 slot = <optimized out> current_tuple_count = 0 #49 standard_ExecutorRun (queryDesc=0x5c3ec68, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 estate = 0x3832970 operation = CMD_SELECT dest = 0xced3a0 <donothingDR> sendTuples = <optimized out> oldcontext = 0x2078b00 __func__ = "standard_ExecutorRun" #50 0x00007f8191ade08d in pgss_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at pg_stat_statements.c:892 save_exception_stack = 0x7ffc304d0890 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {34056952, -4317634825662262871, 140721118841376, 96726120, 0, 0, 4318053775260225961,4258026755233780137}, __mask_was_saved = 0, __saved_mask = {__val = {33258448, 2, 8603255, 33258448, 5392503, 16, 0, 72057594037927936, 336, 1, 58927200, 58927472, 8861374, 7, 1, 58928688}}}} #51 0x00007f80eb8bf5e1 in explain_ExecutorRun (queryDesc=0x5c3ec68, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at auto_explain.c:268 save_exception_stack = 0x7ffc304d0cf0 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {34056952, -4317634825662262871, 140721118841376, 96726120, 0, 0, 4318053775153271209,4257441557765001641}, __mask_was_saved = 0, __saved_mask = {__val = {336, 1, 58927200, 96726120, 8861374, 2147483647, 1, 34048768, 6356030, 96726120, 96726120, 13907488, 140191684360137, 13554592, 810355232,34056952}}}} #52 0x00000000005a497c in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x5c32cd8, into=into@entry=0x0, es=es@entry=0x207aaf8, queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5 dir = <optimized out> dest = <optimized out> queryDesc = 0x5c3ec68 starttime = {tv_sec = 2774013, tv_nsec = 519809227} totaltime = 0 eflags = <optimized out> instrument_option = <optimized out> #53 0x00000000005a4c87 in ExplainOneQuery (query=<optimized out>, cursorOptions=<optimized out>, into=0x0, es=0x207aaf8, queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' (' ||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as1 plan = 0x5c32cd8 planstart = {tv_sec = 2774013, tv_nsec = 111256886} planduration = {tv_sec = 0, tv_nsec = 408551920} params = 0x0 queryString = 0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || '(' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.per. es = 0x207aaf8 into = 0x0 queryEnv = 0x0 cursorOptions = <optimized out> query = <optimized out> #54 0x00000000005a5174 in ExplainQuery (pstate=pstate@entry=0x2078dc8, stmt=stmt@entry=0x1fe1df8, queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u4 l = 0x20e15a8 es = 0x207aaf8 tstate = <optimized out> rewritten = <optimized out> lc = <optimized out> timing_set = <optimized out> summary_set = <optimized out> __func__ = "ExplainQuery" #55 0x00000000007469dd in standard_ProcessUtility (pstmt=pstmt@entry=0x1fe1ea8, queryString=queryString@entry=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid|| ' (' || gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_u5 parsetree = 0x1fe1df8 isTopLevel = true isAtomicContext = false pstate = 0x2078dc8 __func__ = "standard_ProcessUtility" #56 0x00007f8191ae0025 in pgss_ProcessUtility (pstmt=0x1fe1ea8, queryString=0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' (' ||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period as5 save_exception_stack = 0x7ffc304d0f10 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {32582624, -4317636606361366103, 34049336, 32848032, 32582624, 2, 4318053775111328169,4258026755403256233}, __mask_was_saved = 0, __saved_mask = {__val = {0, 0, 8860127, 140721118842400, 16, 32088528, 32847712, 24, 34049336, 140721118842400, 8860127, 32848032, 32582624, 32848032,8926208, 2}}}} duration = {tv_sec = 33430760, tv_nsec = 33985024} bufusage = {shared_blks_hit = -140721118842111, shared_blks_read = 1028, shared_blks_dirtied = 257, shared_blks_written= 32, local_blks_hit = 528280977713, local_blks_read = 140721118842112, local_blks_dirtied = 4, local_blks_written = 0, temp_blks_read = 511101108338, temp_blks_written = 0, blk_read_time = {tv_sec = 140721118842111,tv_nsec = 140721118842160}, blk_write_time = {tv_sec = 2, tv_nsec = 0}} start = {tv_sec = 2774013, tv_nsec = 108551544} rows = <optimized out> bufusage_start = {shared_blks_hit = 900, shared_blks_read = 0, shared_blks_dirtied = 0, shared_blks_written = 0,local_blks_hit = <optimized out>, local_blks_read = <optimized out>, local_blks_dirtied = <optimized out>, local_blks_written = <optimized out>, temp_blks_read = <optimized out>, temp_blks_written= <optimized out>, blk_read_time = {tv_sec = <optimized out>, tv_nsec = <optimized out>}, blk_write_time = {tv_sec = <optimized out>, tv_nsec = <optimized out>}} parsetree = <optimized out> #57 0x0000000000743e86 in PortalRunUtility (portal=0x1f12be0, pstmt=0x1fe1ea8, isTopLevel=<optimized out>, setHoldSnapshot=<optimizedout>, dest=0x2078d38, completionTag=0x7ffc304d0e20 "") at pquery.c:1178 utilityStmt = <optimized out> snapshot = 0x1f538a0 #58 0x0000000000744bc2 in FillPortalStore (portal=portal@entry=0x1f12be0, isTopLevel=isTopLevel@entry=true) at pquery.c:1038 treceiver = 0x2078d38 completionTag = "\000\377\377\377\000\000\000\000\260z\372\001\000\000\000\000\360\240\351\001\000\000\000\000\245c\207\000\000\000\000\000\020\214\a\002\000\000\000\000\260z\372\001\000\000\000\000\a\000\000\000\000\" __func__ = "FillPortalStore" #59 0x000000000074564f in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimizedout>, dest=0x7f81a09e2dc8, altdest=0x7f81a09e2dc8, completionTag=0x7ffc304d1040 "") at pquery.c:768 save_exception_stack = 0x7ffc304d1220 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {0, -4317636606344588887, 33431128, 140721118842944, 32582624, 2, 4318053774943556009,-4317633223726624343}, __mask_was_saved = 0, __saved_mask = {__val = {480, 10132060, 8746404, 1, 32590832, 10476062, 34048768, 2, 1, 2, 112, 33431080, 32101840, 10233096, 8861374, 2}}}} result = <optimized out> nprocessed = <optimized out> saveTopTransactionResourceOwner = 0x1ed5a08 saveTopTransactionContext = 0x1f53760 saveActivePortal = 0x0 saveResourceOwner = 0x1ed5a08 savePortalContext = 0x0 saveMemoryContext = 0x1f53760 __func__ = "PortalRun" #60 0x0000000000741577 in exec_simple_query (query_string=<optimized out>) at postgres.c:1145 dest = <optimized out> oldcontext = <optimized out> parsetree_list = <optimized out> parsetree_item = <optimized out> save_log_statement_stats = <optimized out> was_logged = <optimized out> use_implicit_block = <optimized out> msec_str = "\000 \000\000\000\000\000\000?\252c\000\000\000\000\000\247\235\354\001\000\000\000\000p?\354\001\002\000\000" __func__ = "exec_simple_query" #61 0x0000000000742842 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimizedout>) at postgres.c:4182 firstchar = 32267608 input_message = { data = 0x1ec5d58 "explain(analyze)\nSELECT site_office as site_gran, site_location AS bs, sect_mscid || ' (' ||gsm_site_name_to_sect_name (site_name, sect_num, sect_name) || ')' AS sitename, data_utrancell.period a} local_sigjmp_buf = {{__jmpbuf = {32292536, -4317633050175762007, 32259952, 0, 0, 32260608, 4318053775948091817, -4317633221742456407},__mask_was_saved = 1, __saved_mask = {__val = {0, 32260608, 140194680307352, 0, 206158430256, 140721118843760, 140721118843552, 32102304, 32080672, 16, 66, 32102304, 32292536, 0, 32292616,32292784}}}} send_ready_for_query = false disable_idle_in_transaction_timeout = <optimized out> __func__ = "PostgresMain" #62 0x00000000004801dd in BackendRun (port=0x1ec3f70) at postmaster.c:4361 ac = 1 secs = 595637552 usecs = 441639 i = 1 av = 0x1ecbfe8 maxac = <optimized out> #63 BackendStartup (port=0x1ec3f70) at postmaster.c:4033 bn = 0x1ec4200 pid = 0 #64 ServerLoop () at postmaster.c:1706 rmask = {fds_bits = {32, 0 <repeats 15 times>}} selres = <optimized out> now = <optimized out> readmask = {fds_bits = {120, 0 <repeats 15 times>}} last_lockfile_recheck_time = <optimized out> last_touch_time = 1542322298 __func__ = "ServerLoop" #65 0x00000000006d5d49 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1e98280) at postmaster.c:1379 opt = <optimized out> status = <optimized out> userDoption = <optimized out> listen_addr_saved = true i = <optimized out> output_config_variable = <optimized out> __func__ = "PostmasterMain" #66 0x000000000048103f in main (argc=3, argv=0x1e98280) at main.c:228 No locals. (gdb) > It'd also be interesting if toggling jit_inline_above_cost=-1, > jit_optimize_above_cost=-1 and jit_tuple_deforming=false, each set > individually, make a difference. Crashes with jit_tuple_deforming=true (others off). Doesn't crash with other combinations: Options: Inlining false, Optimization true, Expressions true, Deforming false Options: Inlining true, Optimization false, Expressions true, Deforming false > Any way for me to reproduce this locally? I.e. can you narrow this down > to a reproducible test case? Will work on it. Justin
Hi, On 2018-11-15 17:03:35 -0600, Justin Pryzby wrote: > On Thu, Nov 15, 2018 at 02:47:55PM -0800, Andres Freund wrote: > > > (gdb) bt > > > #0 0x00007f08127e814e in ?? () > > > #1 0x0000000000000000 in ?? () > > > > Could you enable jit_debugging_support and reproduce? That should give > > a more useful backtrace. > Core was generated by `postgres: pryzbyj ts [local] EXPLAIN '. > Program terminated with signal 11, Segmentation fault. > #0 0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6 > Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.1-3.el7_5.1.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64 glib4 > (gdb) bt > #0 0x00007f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6 > ... Huh, that's the same crash? Because I don't see any evalexpr functions in the stack, and without those the above bt should have worked... > > It'd also be interesting if toggling jit_inline_above_cost=-1, > > jit_optimize_above_cost=-1 and jit_tuple_deforming=false, each set > > individually, make a difference. > > Crashes with jit_tuple_deforming=true (others off). > Doesn't crash with other combinations: > Options: Inlining false, Optimization true, Expressions true, Deforming false > Options: Inlining true, Optimization false, Expressions true, Deforming false Oh, interesting. That helps. Could you perhaps show an EXPLAIN VERBOSE of the query plan? > > Any way for me to reproduce this locally? I.e. can you narrow this down > > to a reproducible test case? > > Will work on it. Thanks! Greetings, Andres Freund
On Thu, Nov 15, 2018 at 03:14:01PM -0800, Andres Freund wrote: > Huh, that's the same crash? Because I don't see any evalexpr functions > in the stack, and without those the above bt should have worked... TTBOMK it's the same .. Is it odd if i'm seeing this: (odd because of "ANOTHER") ? I guess that's maybe because it's running parallel query ? psql:tmp/sql-jit-crash-2018-11-15.jtp:16: WARNING: terminating connection because of crash of another server process I'm started trying to minimize the query. Here's what's left: SELECT * FROM daily_eric_umts_rnc_utrancell_view t2 JOIN (SELECT * FROM eric_umts_rbs_sector_carrier_view t1 WHERE t1.start_time >= '2018-07-01 00:00:00' AND t1.start_time <'2018-07-09 00:00:00') AS t1 USING(start_time) WHERE (t2.start_time >= '2018-07-01 00:00:00' AND t2.start_time < '2018-07-09 00:00:00') Verbose plan, munged for brevity/sanity due to joining wide tables, and redacted since the view probably has to be considered proprietary. Hopefully the remaining bits are still useful. I replaced column names with x. Gather (cost=481961.60..482102.41 rows=1180 width=8068) Output: [ there are 1700+ columns here... ] Workers Planned: 3 -> Merge Join (cost=480961.60..480984.41 rows=381 width=8068) Output: ARRAY[x, x, ..., COALESCE(x, 0), COALESCE(x, 0)], ARRAY[..., COALESCE(x, 0), COALESCE(x, 0), COALESCE(x,0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COA LESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x,0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALE SCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0),COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESC E(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0),COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0)], ((x)::numeric / 10.0), CASE WHEN (x ISNULL) THEN ARRAY[..., eric_umts_rbs_sector_carrier_201807.pmaveragerssi_064] ELSE ARRAY[...] END, CASE WHEN (x IS NULL)THEN '{-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-60.0}'::numeric[] ELSE '{-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-39.5}'::numeric[] END,ARRAY[x], x, x Merge Cond: (eric_umts_rbs_sector_carrier_201807.start_time = t2.start_time) -> Sort (cost=480421.29..480428.89 rows=3039 width=1652) Output: [...] Sort Key: eric_umts_rbs_sector_carrier_201807.start_time -> Parallel Append (cost=0.00..480245.49 rows=3039 width=1652) -> Parallel Seq Scan on child.eric_umts_rbs_sector_carrier_201807 (cost=0.00..480230.30 rows=3039width=1652) Output: [...] Filter: ((eric_umts_rbs_sector_carrier_201807.start_time >= '2018-07-01 00:00:00-07'::timestampwith time zone) AND (eric_umts_rbs_sector_carrier_201807.start_time < '2018-07-09 00:00:00-07'::timesta mp with time zone)) -> Sort (cost=540.31..540.65 rows=139 width=7760) Output: [...] Sort Key: t2.start_time -> Append (cost=0.00..535.36 rows=139 width=7760) -> Seq Scan on public.daily_eric_umts_rnc_utrancell_view t2 (cost=0.00..0.00 rows=1 width=7760) Output: [...] Filter: ((t2.start_time >= '2018-07-01 00:00:00'::timestamp without time zone) AND (t2.start_time< '2018-07-09 00:00:00'::timestamp without time zone)) -> Bitmap Heap Scan on child.daily_eric_umts_rnc_utrancell_view_201807 t2_1 (cost=9.83..534.66 rows=138width=7760) Output: [...] Recheck Cond: ((t2_1.start_time >= '2018-07-01 00:00:00'::timestamp without time zone) AND (t2_1.start_time< '2018-07-09 00:00:00'::timestamp without time zone)) -> Bitmap Index Scan on daily_eric_umts_rnc_utrancell_view_201807_unique_idx (cost=0.00..9.79rows=138 width=0) Index Cond: ((t2_1.start_time >= '2018-07-01 00:00:00'::timestamp without time zone) AND(t2_1.start_time < '2018-07-09 00:00:00'::timestamp without time zone)) JIT: Functions: 19 Options: Inlining false, Optimization false, Expressions true, Deforming true (28 rows) Justin
On Thu, Nov 15, 2018 at 06:03:34PM -0600, Justin Pryzby wrote: > Verbose plan, munged for brevity/sanity due to joining wide tables, and > redacted since the view probably has to be considered proprietary. Hopefully > the remaining bits are still useful. I replaced column names with x. Actually the view isn't as intricate as I thought, but I'd like to avoid publishing it for sake of simplicity. I replaced the view with its underlying table and now I get: [pryzbyj@database ~]$ time psql ts -f tmp/sql-jit-crash-2018-11-15.jtp psql:tmp/sql-jit-crash-2018-11-15.jtp:12: ERROR: invalid memory alloc request size 2447889552 It's unclear if that's a useful hint, a separate problem, or a nonissue.. Justin
Hi, On 2018-11-15 18:11:05 -0600, Justin Pryzby wrote: > On Thu, Nov 15, 2018 at 06:03:34PM -0600, Justin Pryzby wrote: > > Verbose plan, munged for brevity/sanity due to joining wide tables, and > > redacted since the view probably has to be considered proprietary. Hopefully > > the remaining bits are still useful. I replaced column names with x. > > Actually the view isn't as intricate as I thought, but I'd like to avoid > publishing it for sake of simplicity. I replaced the view with its underlying > table and now I get: > > [pryzbyj@database ~]$ time psql ts -f tmp/sql-jit-crash-2018-11-15.jtp > psql:tmp/sql-jit-crash-2018-11-15.jtp:12: ERROR: invalid memory alloc request size 2447889552 > > It's unclear if that's a useful hint, a separate problem, or a nonissue.. It's probably the same / closely related issue. ISTM that JITed deforming either doesn't deform correctly, or skips deforming erroneously. I'm about to commit some changes to 12/master that'd possibly make it easier to find issues like this. Is there any chance that it's easier to repro this on master than making a reproducible test case? Greetings, Andres Freund
On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote: > I'm about to commit some changes to 12/master that'd possibly make it > easier to find issues like this. Is there any chance that it's easier to > repro this on master than making a reproducible test case? Yes, very possibly - this is on a customer's server, so their data, plus views, etc are potentially involved until excluded. Justin
On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote: > I'm about to commit some changes to 12/master that'd possibly make it > easier to find issues like this. Are you referring to this a future commit ? commit 763f2edd92095b1ca2f4476da073a28505c13820 Rejigger materializing and fetching a HeapTuple from a slot. I was able to reproduce under HEAD with pg_restored data. I guess you're right that the "memory alloc failure" is related/same thing, I've seen it intermittently with queries which also sometimes crash (and also sometimes don't). Note that when it crashes, it seems to take a longer time to do so than the query would normally take. Like we're walking off the end of an array, say. I've been able to reproduce the crash with a self join of a table (no view, no expressions, no parallel, directly querying a relkind='r' child). In that case, enable_bitmapscan=on and jit_tuple_deforming=on are both needed to crash, and jit_debugging_support=on does not yield a useful bt. The table is not too special, but was probably ALTERed to add columns a good number of times by one of our processes. It has ~1100 columns, including arrays, and some with null_frac=1. I'm trying to come up with a test case involving column types and order. (gdb) bt #0 0x00007f81a08b8b98 in ?? () #1 0x0000000000000000 in ?? () ts=# SET jit=on;SET jit_above_cost=0;explain(analyze off,verbose off) SELECT a.* FROM child.daily_eric_umts_rnc_utrancell_view_201804a JOIN child.daily_eric_umts_rnc_utrancell_view_201804 b USING(start_time,sect_id)WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time BETWEEN '2018-04-30' AND'2018-05-04'; SET SET QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=527.36..1038.17 rows=1 width=7760) Hash Cond: ((a.start_time = b.start_time) AND (a.sect_id = b.sect_id)) -> Bitmap Heap Scan on daily_eric_umts_rnc_utrancell_view_201804 a (cost=9.78..515.59 rows=133 width=7760) Recheck Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <= '2018-05-0400:00:00'::timestamp without time zone)) -> Bitmap Index Scan on daily_eric_umts_rnc_utrancell_view_201804_unique_idx (cost=0.00..9.74 rows=133 width=0) Index Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <= '2018-05-0400:00:00'::timestamp without time zone)) -> Hash (cost=515.59..515.59 rows=133 width=12) -> Bitmap Heap Scan on daily_eric_umts_rnc_utrancell_view_201804 b (cost=9.78..515.59 rows=133 width=12) Recheck Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <= '2018-05-0400:00:00'::timestamp without time zone)) -> Bitmap Index Scan on daily_eric_umts_rnc_utrancell_view_201804_unique_idx (cost=0.00..9.74 rows=133 width=0) Index Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (start_time <= '2018-05-0400:00:00'::timestamp without time zone)) JIT: Functions: 19 Options: Inlining false, Optimization false, Expressions true, Deforming true BTW find attached patch which I believe corrects some comments. Justin
Attachment
On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote: > Are you referring to this a future commit ? this OR a future commit > The table is not too special, but was probably ALTERed to add columns a good > number of times by one of our processes. It has ~1100 columns, including > arrays, and some with null_frac=1. I'm trying to come up with a test case > involving column types and order. I don't have a failing test case yet but here's what the columns look like: ts=# SELECT attnum, null_frac, atttypid::regtype, attnotnull, attname, attalign , attstorage, attbyval, attlen, attislocalFROM pg_stats s JOIN pg_attribute a USING(attname) JOIN pg_class c ON s.tablename=c.relname WHERE c.oid=a.attrelidAND tablename='daily_eric_umts_rnc_utrancell_view_201804' AND (attnotnull OR null_frac>0.9 OR atttypid::regtype::textLIKE '%[]' OR NOT atttypid::regtype::text~'int$|integer|double|numeric' OR attlen=-1 OR NOT attbyvalOR atthasmissing OR attisdropped OR attnum BETWEEN 80 AND 99) ORDER BY 1; attnum | null_frac | atttypid | attnotnull | attname | attalign| attstorage | attbyval | attlen | attislocal --------+-----------+-----------------------------+------------+-------------------------------------------------+----------+------------+----------+--------+------------ 1 | 0 | timestamp without time zone | t | start_time | d | p | t | 8 | t 2 | 0 | integer | t | site_id | i | p | t | 4 | t 3 | 0 | integer | t | sect_id | i | p | t | 4 | t 4 | 0 | integer | t | rnc_id | i | p | t | 4 | t 5 | 0 | text | t | utrancell | i | x | f | -1 | t 30 | 1 | bigint | f | dl_alt_chcode_alloc | d | p | t | 8 | t 31 | 1 | integer | f | dl_alt_chcode_alloc_min | i | p | t | 4 | t 32 | 1 | integer | f | dl_alt_chcode_alloc_max | i | p | t | 4 | t 45 | 0 | integer[] | f | dch_ul_rlc_user_tput_samples | i | x | f | -1 | t 46 | 0 | integer[] | f | dch_ul_rlc_user_tput_samples_min | i | x | f | -1 | t 47 | 0 | integer[] | f | dch_ul_rlc_user_tput_samples_max | i | x | f | -1 | t 51 | 0 | numeric | f | ps_int_sum_latency_2 | i | m | f | -1 | t 69 | 0 | numeric | f | mbytes_ul_srb_only_eul | i | m | f | -1 | t [...] 87 | 0 | numeric | f | mbytes_dl_active_cs57 | i | m | f | -1 | t 88 | 0 | numeric | f | mbytes_dl_active_cs57_min | i | m | f | -1 | t 89 | 0 | numeric | f | mbytes_dl_active_cs57_max | i | m | f | -1 | t If I query for cs57, it doesen't crash (in 500ms), but if I query for the next column, cs57_min, it does (in 18000ms). Here's a new error message instead of a crash this time: ts=# SET jit=on;SET jit_above_cost=0;explain(analyze on,verbose off) SELECT b.mbytes_dl_active_cs57_min FROM child.daily_eric_umts_rnc_utrancell_view_201804a JOIN child.daily_eric_umts_rnc_utrancell_view_201804 b USING(start_time,sect_id)WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time BETWEEN '2018-04-30' AND'2018-05-04'; SET SET ERROR: out of memory DETAIL: Failed on request of size 425170160 in memory context "HashBatchContext". Here's verbose output you requested, sans expressions: ts=# SET jit=on;SET jit_above_cost=0;explain(analyze off,verbose) SELECT b.mbytes_dl_active_cs57_min FROM child.daily_eric_umts_rnc_utrancell_view_201804a JOIN child.daily_eric_umts_rnc_utrancell_view_201804 b USING(start_time,sect_id)WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time BETWEEN '2018-04-30' AND'2018-05-04'; SET SET QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=2224.45..4491.19 rows=22 width=3) Output: b.mbytes_dl_active_cs57_min Hash Cond: ((a.start_time = b.start_time) AND (a.sect_id = b.sect_id)) -> Index Only Scan using daily_eric_umts_rnc_utrancell_view_201804_unique_idx on child.daily_eric_umts_rnc_utrancell_view_201804a (cost=0.29..2214.33 rows=656 width=12) Output: a.start_time, a.site_id, a.rnc_id, a.sect_id, a.utrancell Index Cond: ((a.start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (a.start_time <= '2018-05-0400:00:00'::timestamp without time zone)) -> Hash (cost=2214.33..2214.33 rows=656 width=15) Output: b.mbytes_dl_active_cs57_min, b.start_time, b.sect_id -> Index Scan using daily_eric_umts_rnc_utrancell_view_201804_unique_idx on child.daily_eric_umts_rnc_utrancell_view_201804b (cost=0.29..2214.33 rows=656 width=15) Output: b.mbytes_dl_active_cs57_min, b.start_time, b.sect_id Index Cond: ((b.start_time >= '2018-04-30 00:00:00'::timestamp without time zone) AND (b.start_time <= '2018-05-0400:00:00'::timestamp without time zone)) JIT: Functions: 19 Options: Inlining false, Optimization false, Expressions true, Deforming true If I query instead for a.mbytes_dl_active_cs57_min, I get no crash (yet). Justin
Hi, On 2018-11-16 08:38:26 -0600, Justin Pryzby wrote: > On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote: > > I'm about to commit some changes to 12/master that'd possibly make it > > easier to find issues like this. > > Are you referring to this a future commit ? > commit 763f2edd92095b1ca2f4476da073a28505c13820 > Rejigger materializing and fetching a HeapTuple from a slot. I was thinking of commit 15d8f83128e15de97de61430d0b9569f5ebecc26 Author: Andres Freund <andres@anarazel.de> Date: 2018-11-15 22:00:30 -0800 Verify that expected slot types match returned slot types. (and then the followup fix in a387a3df) (will try to catch up later with the rest of what you wrote, after I had some coffee). - Andres
On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote: > BTW find attached patch which I believe corrects some comments. Updated. Some of the changes may be gratuitous, but changed while I was already looking. Also note that I had to remove -flto=thin to compile under RH7. Justin
Attachment
On Fri, Nov 16, 2018 at 08:29:27AM -0800, Andres Freund wrote: > > On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote: > > > I'm about to commit some changes to 12/master that'd possibly make it > commit 15d8f83128e15de97de61430d0b9569f5ebecc26 I don't think it had to do with your commit, but I recompiled HEAD with debug + casserts and have this to show. TRAP: unrecognized TOAST vartag("1", File: "heaptuple.c", Line: 1705) #0 0x00007faf5fac9277 in raise () from /lib64/libc.so.6 #1 0x00007faf5faca968 in abort () from /lib64/libc.so.6 #2 0x000000000088e347 in ExceptionalCondition (conditionName=conditionName@entry=0xa8a69e "1", errorType=errorType@entry=0x8deda8"unrecognized TOAST vartag", fileName=fileName@entry=0x8dff82 "heaptuple.c", lineNumber=lineNumber@entry=1705) at assert.c:54 #3 0x0000000000489830 in varsize_any (p=<optimized out>) at heaptuple.c:1705 #4 0x00007faf60c98560 in ?? () #5 0x00000000ffffffff in ?? () #6 0x000000000008c21c in ?? () #7 0x00000000004897d0 in ?? () at heaptuple.c:1690 #8 0x000000000008c21c in ?? () #9 0x000000000008c21c in ?? () #10 0x000000000000021c in ?? () #11 0x00000000004897d0 in ?? () at heaptuple.c:1690 [...]
Hi, On 2018-11-16 19:23:44 -0600, Justin Pryzby wrote: > On Fri, Nov 16, 2018 at 08:29:27AM -0800, Andres Freund wrote: > > > On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote: > > > > I'm about to commit some changes to 12/master that'd possibly make it > > commit 15d8f83128e15de97de61430d0b9569f5ebecc26 > > I don't think it had to do with your commit, but I recompiled HEAD with debug + > casserts and have this to show. That's probably just the same issue as before, namely random data somehow being produced as the result of tuple deforming. Greetings, Andres Freund
On Fri, Nov 16, 2018 at 05:47:24PM -0800, Andres Freund wrote: > That's probably just the same issue as before, namely random data > somehow being produced as the result of tuple deforming. Does this help at all? ts=# SELECT utrancell FROM child.daily_eric_umts_rnc_utrancell_view_201807 LIMIT 9; ERROR: invalid memory alloc request size 18446744073709551613 Note that's a table named after a view, and not itself a view.. (gdb) b errfinish Breakpoint 3 at 0x891a00: file elog.c, line 411. (gdb) c Continuing. SELECT start_time, site_id, sect_id, rnc_id, utrancell, eul_harq_ps_trans_tti2_1 FROM child.daily_eric_umts_rnc_utrancell_view_201801LIMIT 1; 1: start_time (typeid = 1114, len = 8, typmod = -1, byval = t) 2: site_id (typeid = 23, len = 4, typmod = -1, byval = t) 3: sect_id (typeid = 23, len = 4, typmod = -1, byval = t) 4: rnc_id (typeid = 23, len = 4, typmod = -1, byval = t) 5: utrancell (typeid = 25, len = -1, typmod = -1, byval = f) 6: eul_harq_ps_trans_tti2_1 (typeid = 20, len = 8, typmod = -1, byval = t) ---- 1: start_time = "2000-01-01 00:00:00" (typeid = 1114, len = 8, typmod = -1, byval = t) 2: site_id = "0" (typeid = 23, len = 4, typmod = -1, byval = t) 3: sect_id = "0" (typeid = 23, len = 4, typmod = -1, byval = t) 4: rnc_id = "0" (typeid = 23, len = 4, typmod = -1, byval = t) The error occurs when selecting the 5th column, which is "text not null"; but, the other fields are also garbage (namely 0). #0 slot_compile_deform (context=context@entry=0xe184b8, desc=desc@entry=0xe9baf0, ops=ops@entry=0xa21020 <TTSOpsBufferHeapTuple>,natts=5) at llvmjit_deform.c:36 #1 0x00007fffe72535f5 in llvm_compile_expr (state=0xed8458) at llvmjit_expr.c:325 #2 0x000000000061e429 in ExecReadyExpr (state=state@entry=0xed8458) at execExpr.c:628 #3 0x00000000006219b4 in ExecBuildProjectionInfo (targetList=<optimized out>, econtext=<optimized out>, slot=<optimizedout>, parent=parent@entry=0xed7d40, inputDesc=inputDesc@entry=0xe9baf0) at execExpr.c:472 #4 0x00000000006370b2 in ExecAssignProjectionInfo (planstate=planstate@entry=0xed7d40, inputDesc=inputDesc@entry=0xe9baf0)at execUtils.c:501 #5 0x00000000006371c6 in ExecConditionalAssignProjectionInfo (planstate=planstate@entry=0xed7d40, inputDesc=0xe9baf0, varno=<optimizedout>) at execUtils.c:539 #6 0x0000000000632627 in ExecAssignScanProjectionInfo (node=node@entry=0xed7d40) at execScan.c:240 #7 0x0000000000655867 in ExecInitSeqScan (node=node@entry=0xeecbb8, estate=estate@entry=0xed78d8, eflags=eflags@entry=16)at nodeSeqscan.c:182 #8 0x0000000000630e53 in ExecInitNode (node=0xeecbb8, estate=estate@entry=0xed78d8, eflags=eflags@entry=16) at execProcnode.c:207 #9 0x000000000064c78a in ExecInitLimit (node=node@entry=0xeecc50, estate=estate@entry=0xed78d8, eflags=eflags@entry=16)at nodeLimit.c:368 #10 0x0000000000630b52 in ExecInitNode (node=node@entry=0xeecc50, estate=estate@entry=0xed78d8, eflags=eflags@entry=16) atexecProcnode.c:363 #11 0x000000000062ac45 in InitPlan (eflags=16, queryDesc=<optimized out>) at execMain.c:1025 #12 standard_ExecutorStart (queryDesc=<optimized out>, eflags=16) at execMain.c:265 #13 0x0000000000784fdb in PortalStart (portal=portal@entry=0xe48528, params=params@entry=0x0, eflags=eflags@entry=0, snapshot=snapshot@entry=0x0)at pquery.c:520 #14 0x0000000000781218 in exec_simple_query (query_string=0xe56568 "SELECT start_time,site_id,sect_id,rnc_id,utrancell FROMchild.daily_eric_umts_rnc_utrancell_view_201801 LIMIT 1\n") at postgres.c:1176 #15 0x00000000007825c2 in PostgresMain (argc=argc@entry=17, argv=argv@entry=0xddbb40, dbname=0xdfcd70 "postgres", dbname@entry=0x0,username=<optimized out>) at postgres.c:4243 #16 0x000000000047d8be in main (argc=17, argv=0xddbb40) at main.c:224 #0 errfinish (dummy=dummy@entry=0) at elog.c:411 #1 0x0000000000895379 in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0xaabf68 "invalid memory alloc request size%zu") at elog.c:1365 #2 0x00000000008bb46e in palloc (size=18446744073709551613) at mcxt.c:934 #3 0x000000000086ad3a in text_to_cstring (t=0x7fffe7b83344) at varlena.c:189 #4 0x0000000000897ed5 in FunctionCall1Coll (flinfo=flinfo@entry=0x7fffffffdce0, collation=collation@entry=0, arg1=arg1@entry=140737080996676)at fmgr.c:1123 #5 0x0000000000899206 in OutputFunctionCall (val=140737080996676, flinfo=0x7fffffffdce0) at fmgr.c:1755 #6 OidOutputFunctionCall (functionId=<optimized out>, val=val@entry=140737080996676) at fmgr.c:1838 #7 0x000000000048b57b in debugtup (slot=0xed8330, self=<optimized out>) at printtup.c:625 #8 0x0000000000629a50 in ExecutePlan (execute_once=<optimized out>, dest=0xa66160 <debugtupDR>, direction=<optimized out>,numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0xed7ad0, estate=0xed7878) at execMain.c:1748 #9 standard_ExecutorRun (queryDesc=0xe6ac78, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 #10 0x000000000078409b in PortalRunSelect (portal=portal@entry=0xe484c8, forward=forward@entry=true, count=0, count@entry=9223372036854775807,dest=dest@entry=0xa66160 <debugtupDR>) at pquery.c:932 #11 0x00000000007855b8 in PortalRun (portal=portal@entry=0xe484c8, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,run_once=run_once@entry=true, dest=dest@entry=0xa66160 <debugtupDR>, altdest=altdest@entry=0xa66160 <debugtupDR>, completionTag=completionTag@entry=0x7fffffffdfc0 "") at pquery.c:773 #12 0x000000000078128d in exec_simple_query (query_string=0xe56508 "SELECT start_time,site_id,sect_id,rnc_id,utrancell,eul_harq_ps_trans_tti2_1FROM child.daily_eric_umts_rnc_utrancell_view_201801LIMIT 1;\n") at postgres.c:1215 #13 0x00000000007825c2 in PostgresMain (argc=argc@entry=15, argv=argv@entry=0xddbb40, dbname=0xdfcd30 "postgres", dbname@entry=0x0,username=<optimized out>) at postgres.c:4243 #14 0x000000000047d8be in main (argc=15, argv=0xddbb40) at main.c:224 Justin
On Fri, Nov 16, 2018 at 10:24:46AM -0600, Justin Pryzby wrote: > On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote: > > The table is not too special, but was probably ALTERed to add columns a good > > number of times by one of our processes. It has ~1100 columns, including > > arrays, and some with null_frac=1. I'm trying to come up with a test case > > involving column types and order. Try this ? SELECT 'DROP TABLE t; CREATE TABLE t (a3 text, a1 int[], '||array_to_string(array_agg('c'||i||' bigint default 0'),',')||');INSERT INTO t VALUES(0)' FROM generate_series(1,999) i; \gexec SET jit=on; SET jit_above_cost=0; SELECT a3 FROM t LIMIT 9; That's given all sorts of nice errors: ERROR: invalid memory alloc request size 18446744073709551613 ERROR: compressed data is corrupted And occasionally crashes and/or returns unrelated data: = '0', $21 = '0', $22 = '0', $23 = '0', $24 = '0', $25 = '2741'\x03 n 21782 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location Justin
On 2018-11-17 17:37:15 -0600, Justin Pryzby wrote: > On Fri, Nov 16, 2018 at 10:24:46AM -0600, Justin Pryzby wrote: > > On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote: > > > The table is not too special, but was probably ALTERed to add columns a good > > > number of times by one of our processes. It has ~1100 columns, including > > > arrays, and some with null_frac=1. I'm trying to come up with a test case > > > involving column types and order. > > Try this ? > > SELECT 'DROP TABLE t; CREATE TABLE t (a3 text, a1 int[], '||array_to_string(array_agg('c'||i||' bigint default 0'),',')||');INSERT INTO t VALUES(0)' FROM generate_series(1,999) i; > \gexec > SET jit=on; SET jit_above_cost=0; SELECT a3 FROM t LIMIT 9; > > That's given all sorts of nice errors: > > ERROR: invalid memory alloc request size 18446744073709551613 > ERROR: compressed data is corrupted > > And occasionally crashes and/or returns unrelated data: > > = '0', $21 = '0', $22 = '0', $23 = '0', $24 = '0', $25 = '2741'\x03 > n 21782 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location Ah, hah. The issue is that t_hoff is larger than 128 here (due to the size of the NULL bitmap), and apparently getelementptr interprets an i8 > 128 as a signed integer. Which thus yields a negative offset from the start of the tuple, which predictably doesn't work great. v_hoff = l_load_struct_gep(b, v_tuplep, FIELDNO_HEAPTUPLEHEADERDATA_HOFF, "t_hoff"); v_tupdata_base = LLVMBuildGEP(b, LLVMBuildBitCast(b, v_tuplep, l_ptr(LLVMInt8Type()), ""), &v_hoff, 1, "v_tupdata_base"); I'd missed the "These integers are treated as signed values where relevant." bit in the getelementptr docs http://llvm.org/docs/LangRef.html#getelementptr-instruction The fix is easy enough, just adding a v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), ""); fixes the issue for me. Could you check that the attached patch this also fixes your original issue? Going through the code to see if there's other occurances of this. Greetings, Andres Freund
Attachment
On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote: > Could you check that the attached patch this also fixes your original > issue? Going through the code to see if there's other occurances of > this. Confirmed that fixes my crash. Thanks, Justin
Hi, On 2018-11-26 22:56:09 -0600, Justin Pryzby wrote: > On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote: > > Could you check that the attached patch this also fixes your original > > issue? Going through the code to see if there's other occurances of > > this. > > Confirmed that fixes my crash. Thanks a lot for narrowing down your crash to something I can reproduce! Here's a more complete patch, with a testcase. Tom, the test creates a 1100k column table (using \set ECHO none + gexec), but with a small row. Currently it's not dropped after the table, as I thought it might be worthwhile to be tested by pg_dump/upgrade etc too. You're probably the person most concerned with test runtimes, ... Any concerns about that? The table creation is quick*, on the order of 30ms. Greetings, Andres Freund *at least as long as there's no default columns and the table's not dropped, seems we have somewhat of an O(N^2) situation going on when dropping a table with many columns that have default columns - we re-build the cache entry after each dropped default value. But as the max is 1600 columns, that's not too bad.
Attachment
On 2018-11-27 00:26:55 -0800, Andres Freund wrote: > Hi, > > On 2018-11-26 22:56:09 -0600, Justin Pryzby wrote: > > On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote: > > > Could you check that the attached patch this also fixes your original > > > issue? Going through the code to see if there's other occurances of > > > this. > > > > Confirmed that fixes my crash. > > Thanks a lot for narrowing down your crash to something I can reproduce! > > > Here's a more complete patch, with a testcase. > > Tom, the test creates a 1100k column table (using \set ECHO none + > gexec), but with a small row. Currently it's not dropped after the > table, as I thought it might be worthwhile to be tested by > pg_dump/upgrade etc too. You're probably the person most concerned with > test runtimes, ... Any concerns about that? The table creation is > quick*, on the order of 30ms. And pushed. Justin, thanks again for reporting the bug and then narrowing it down to a reproducible test case! Would've been much harder to diagnose without that. I'll look into your comments patch in a bit. Greetings, Andres Freund
On Tue, Nov 27, 2018 at 10:24:52AM -0800, Andres Freund wrote: > And pushed. Justin, thanks again for reporting the bug and then > narrowing it down to a reproducible test case! Would've been much harder > to diagnose without that. > > I'll look into your comments patch in a bit. Thanks for implementing and patching it :) And thanks for remembering the patch, and reminding me. Here's an updated copy with additional hunks. Justin