Thread: SegFault on 9.6.14
Greetings Hackers.

We have a reproducible case of $subject that issues a backtrace such as
seen below.

The query, which I'd prefer to sanitize before sending, is <30 lines of,
at a glance, not terribly complex logic.

It nonetheless dies hard after a few seconds of running and, as expected,
results in an automatic all-backend restart.

Please advise on how to proceed. Thanks!

bt
#0  initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001')
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233
#1  0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529
#2  0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280
#3  0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158
#4  0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475
#5  0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166
#6  0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019
#7  0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226
<about 30 lines omitted>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote:
>Greetings Hackers.
>
>We have a reproduceable case of $subject that issues a backtrace such as
>seen below.
>
>The query that I'd prefer to sanitize before sending is <30 lines of at
>a glance, not terribly complex logic.
>
>It nonetheless dies hard after a few seconds of running and as expected,
>results in an automatic all-backend restart.
>
>Please advise on how to proceed. Thanks!
>
>bt
>#0  initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001')
>    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233
>#1  0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529
>#2  0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280
>#3  0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158
>#4  0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475
>#5  0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166
>#6  0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019
>#7  0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226
><about 30 lines omitted>
>

Hmmm, that means it's crashing here:

   if (scan->rs_parallel != NULL)
       scan->rs_nblocks = scan->rs_parallel->phs_nblocks;        <--- here
   else
       scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);

But clearly, scan is valid (otherwise it'd crash on the if condition),
and scan->rs_parallel must be non-NULL. Which probably means the pointer
is (no longer) valid.

Could it be that the rs_parallel DSM disappears on rescan, or something
like that?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: > >>Greetings Hackers. >> >>We have a reproduceable case of $subject that issues a backtrace such as >>seen below. >> >>The query that I'd prefer to sanitize before sending is <30 lines of at >>a glance, not terribly complex logic. >> >>It nonetheless dies hard after a few seconds of running and as expected, >>results in an automatic all-backend restart. >> >>Please advise on how to proceed. Thanks! >> >>bt >>#0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') >> at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 >>#1 0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529 >>#2 0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280 >>#3 0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158 >>#4 0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475 >>#5 0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166 >>#6 0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019 >>#7 0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226 >><about 30 lines omitted> >> > > Hmmm, that means it's crashing here: > > if (scan->rs_parallel != NULL) > scan->rs_nblocks = scan->rs_parallel->phs_nblocks; <--- here > else > scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd); > > But clearly, scan is valid (otherwise it'd crash on the if condition), > and scan->rs_parallel must me non-NULL. Which probably means the pointer > is (no longer) valid. > > Could it be that the rs_parallel DSM disappears on rescan, or something > like that? No clue but something I just tried was to disable parallelism by setting max_parallel_workers_per_gather to 0 and however the query has not finished after a few minutes, there is no crash. Please advise. Thx > > > regards -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net
On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > >> On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: >> >>>Greetings Hackers. >>> >>>We have a reproduceable case of $subject that issues a backtrace such as >>>seen below. >>> >>>The query that I'd prefer to sanitize before sending is <30 lines of at >>>a glance, not terribly complex logic. >>> >>>It nonetheless dies hard after a few seconds of running and as expected, >>>results in an automatic all-backend restart. >>> >>>Please advise on how to proceed. Thanks! >>> >>>bt >>>#0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') >>> at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 >>>#1 0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529 >>>#2 0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280 >>>#3 0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158 >>>#4 0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475 >>>#5 0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166 >>>#6 0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019 >>>#7 0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226 >>><about 30 lines omitted> >>> >> >> Hmmm, that means it's crashing here: >> >> if (scan->rs_parallel != NULL) >> scan->rs_nblocks = scan->rs_parallel->phs_nblocks; <--- here >> else >> scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd); >> >> But clearly, scan is valid (otherwise it'd crash on the if condition), >> and scan->rs_parallel must me non-NULL. Which probably means the pointer >> is (no longer) valid. >> >> Could it be that the rs_parallel DSM disappears on rescan, or something >> like that? > >No clue but something I just tried was to disable parallelism by setting >max_parallel_workers_per_gather to 0 and however the query has not >finished after a few minutes, there is no crash. > That might be a hint my rough analysis was somewhat correct. The question is whether the non-parallel plan does the same thing. Maybe it picks a plan that does not require rescans, or something like that. >Please advise. > It would be useful to see (a) exacution plan of the query, (b) full backtrace and (c) a bit of context for the place where it crashed. Something like (in gdb): bt full list p *scan regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
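If it's easier to capture all of that in one shot, the same output can be pulled from the core file non-interactively; a minimal sketch, assuming the binary path from the crash report and a placeholder core path (--batch and -ex are standard gdb options):

gdb --batch \
    -ex 'bt full' \
    -ex 'list' \
    -ex 'p *scan' \
    /usr/lib/postgresql/9.6/bin/postgres /path/to/core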
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: > >>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> >>> On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: >>> >>>>Greetings Hackers. >>>> >>>>We have a reproduceable case of $subject that issues a backtrace such as >>>>seen below. >>>> >>>>The query that I'd prefer to sanitize before sending is <30 lines of at >>>>a glance, not terribly complex logic. >>>> >>>>It nonetheless dies hard after a few seconds of running and as expected, >>>>results in an automatic all-backend restart. >>>> >>>>Please advise on how to proceed. Thanks! >>>> >>>>bt >>>>#0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') >>>> at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 >>>>#1 0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529 >>>>#2 0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280 >>>>#3 0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158 >>>>#4 0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475 >>>>#5 0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166 >>>>#6 0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019 >>>>#7 0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226 >>>><about 30 lines omitted> >>>> >>> >>> Hmmm, that means it's crashing here: >>> >>> if (scan->rs_parallel != NULL) >>> scan->rs_nblocks = scan->rs_parallel->phs_nblocks; <--- here >>> else >>> scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd); >>> >>> But clearly, scan is valid (otherwise it'd crash on the if condition), >>> and scan->rs_parallel must me non-NULL. Which probably means the pointer >>> is (no longer) valid. >>> >>> Could it be that the rs_parallel DSM disappears on rescan, or something >>> like that? >> >>No clue but something I just tried was to disable parallelism by setting >>max_parallel_workers_per_gather to 0 and however the query has not >>finished after a few minutes, there is no crash. >> > > That might be a hint my rough analysis was somewhat correct. The > question is whether the non-parallel plan does the same thing. Maybe it > picks a plan that does not require rescans, or something like that. > >>Please advise. >> > > It would be useful to see (a) exacution plan of the query, (b) full > backtrace and (c) a bit of context for the place where it crashed. > > Something like (in gdb): > > bt full > list > p *scan The p *scan did nothing unless I ran it first however my gdb $foo isn't strong presently. I'll need to sanitize the explain output but can do so ASAP and send it along. Thx! 
$ gdb /usr/lib/postgresql/9.6/bin/postgres core GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/lib/postgresql/9.6/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/04/6f55a5ce6ce05064edfc8feee61c6cb039d296.debug...done. done. [New LWP 31654] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `postgres: foo_eis_segfault: jsievers staging 10.220.22.26(57948) SELECT '. Program terminated with signal SIGSEGV, Segmentation fault. #0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 233 /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c: No such file or directory. (gdb) p *scan $1 = {rs_rd = 0x7fa6c6935a08, rs_snapshot = 0x55d7a7c2e630, rs_nkeys = 0, rs_key = 0x0, rs_bitmapscan = 0 '\000', rs_samplescan= 0 '\000', rs_pageatatime = 1 '\001', rs_allow_strat = 1 '\001', rs_allow_sync = 1 '\001', rs_temp_snap = 1 '\001', rs_nblocks = 198714, rs_startblock = 1920300133,rs_numblocks = 4294967295, rs_strategy = 0x55d7a7daa6a0, rs_syncscan = 1 '\001', rs_inited = 0 '\000', rs_ctup = {t_len = 114, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 62879},ip_posid = 77}, t_tableOid = 994804890, t_data = 0x0}, rs_cblock = 4294967295, rs_cbuf = 0, rs_parallel = 0x7fa673a54108, rs_cindex = 76, rs_ntuples = 77, rs_vistuples = {1,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 28255, 25711, 24421, 25705, 12576, 8247, 29754, 29281, 25959,27764, 29545, 8308, 31528, 16724, 18258, 21573, 20037, 21076, 8281, 25914, 28792, 8306, 22139, 21057, 14880, 24950, 28274, 8303, 12337, 14880, 24950, 24946, 29812, 28526, 13088, 14880,24950, 29810, 28793, 8293, 13106, 14880, 24950, 29810, 28793, 28525, 8292, 12589, 14880, 24950, 25458, 27759, 26988, 8292, 8240, 30266, 29281, 25964, 25974, 29548, 28789, 12320, 14880,24950, 28274, 28527, 25708, 12576, 8240, 30266, 29281, 24943, 29812, 28526, 13088, 14880, 28524, 24931, 26996, 28271, 13344, 13110, 8317, 29242, 29541, 28526, 12576, 14880, 25970,28275, 28001, 8293, 15932, 14880, 25970, 29555, 29295, 26484, 28530, 28789, 25970, 8294, 8240, 29242, 29541, 29295, 26473, 25204, 8300, 8240, 29242, 29541, 29295, 26473, 28515, 8300,8240, 29242, 29541...}} (gdb) bt full #0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') at 
/build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 allow_strat = <optimized out> allow_sync = <optimized out> #1 0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529 __func__ = "heap_rescan" #2 0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280 scan = <optimized out> #3 0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158 __func__ = "ExecReScan" #4 0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475 No locals. #5 0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166 __func__ = "ExecReScan" #6 0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019 No locals. #7 0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226 __func__ = "ExecReScan" #8 0x000055d7a7433ce7 in ExecProcNode (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:381 result = <optimized out> __func__ = "ExecProcNode" #9 0x000055d7a7452989 in ExecSort (node=node@entry=0x55d7a7d83ea0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSort.c:103 plannode = <optimized out> outerNode = 0x55d7a7d84110 tupDesc = <optimized out> estate = 0x55d7a7d5fee8 dir = ForwardScanDirection tuplesortstate = 0x55d7a7dd2448 slot = <optimized out> #10 0x000055d7a7433de8 in ExecProcNode (node=0x55d7a7d83ea0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:495 result = <optimized out> __func__ = "ExecProcNode" #11 0x000055d7a743ffe9 in fetch_input_tuple (aggstate=aggstate@entry=0x55d7a7d83528) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:598 slot = <optimized out> #12 0x000055d7a7441bb3 in agg_retrieve_direct (aggstate=0x55d7a7d83528) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:2078 econtext = 0x55d7a7d838b0 pergroup = 0x55d7a7d8e758 firstSlot = 0x55d7a7d83960 numGroupingSets = 1 node = 0x7fa6c68a5da8 tmpcontext = 0x55d7a7d83750 peragg = 0x55d7a7d8d6b8 outerslot = <optimized out> nextSetSize = <optimized out> result = <optimized out> hasGroupingSets = 0 '\000' currentSet = <optimized out> numReset = 1 i = <optimized out> ---Type <return> to continue, or q <return> to quit--- #13 ExecAgg (node=node@entry=0x55d7a7d83528) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:1903 result = <optimized out> #14 0x000055d7a7433dc8 in ExecProcNode (node=node@entry=0x55d7a7d83528) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:503 result = <optimized out> __func__ = "ExecProcNode" #15 0x000055d7a744af74 in ExecLimit (node=node@entry=0x55d7a7d83288) at 
/build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:91 direction = ForwardScanDirection slot = <optimized out> outerPlan = 0x55d7a7d83528 __func__ = "ExecLimit" #16 0x000055d7a7433d28 in ExecProcNode (node=node@entry=0x55d7a7d83288) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:531 result = <optimized out> __func__ = "ExecProcNode" #17 0x000055d7a744ff69 in ExecNestLoop (node=node@entry=0x55d7a7d60cd0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeNestloop.c:174 nl = 0x7fa6c68a6048 innerPlan = 0x55d7a7d83288 outerPlan = 0x55d7a7d610c0 outerTupleSlot = <optimized out> innerTupleSlot = <optimized out> joinqual = 0x0 otherqual = 0x0 econtext = 0x55d7a7d60de0 lc = <optimized out> #18 0x000055d7a7433e28 in ExecProcNode (node=node@entry=0x55d7a7d60cd0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:476 result = <optimized out> __func__ = "ExecProcNode" #19 0x000055d7a7452989 in ExecSort (node=node@entry=0x55d7a7d60a60) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSort.c:103 plannode = <optimized out> outerNode = 0x55d7a7d60cd0 tupDesc = <optimized out> estate = 0x55d7a7d5fee8 dir = ForwardScanDirection tuplesortstate = 0x55d7a7d98398 slot = <optimized out> #20 0x000055d7a7433de8 in ExecProcNode (node=0x55d7a7d60a60) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:495 result = <optimized out> __func__ = "ExecProcNode" #21 0x000055d7a743ffe9 in fetch_input_tuple (aggstate=aggstate@entry=0x55d7a7d60088) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:598 slot = <optimized out> #22 0x000055d7a7441bb3 in agg_retrieve_direct (aggstate=0x55d7a7d60088) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:2078 econtext = 0x55d7a7d60440 pergroup = 0x55d7a7d91230 firstSlot = 0x55d7a7d604f0 numGroupingSets = 1 node = 0x7fa6c68a6328 tmpcontext = 0x55d7a7d602b0 peragg = 0x55d7a7d90190 outerslot = <optimized out> nextSetSize = <optimized out> ---Type <return> to continue, or q <return> to quit--- result = <optimized out> hasGroupingSets = 0 '\000' currentSet = <optimized out> numReset = 1 i = <optimized out> #23 ExecAgg (node=node@entry=0x55d7a7d60088) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:1903 result = <optimized out> #24 0x000055d7a7433dc8 in ExecProcNode (node=node@entry=0x55d7a7d60088) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:503 result = <optimized out> __func__ = "ExecProcNode" #25 0x000055d7a742ff2e in ExecutePlan (dest=0x7fa673a96308, direction=<optimized out>, numberTuples=0, sendTuples=<optimizedout>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55d7a7d60088, estate=0x55d7a7d5fee8) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execMain.c:1567 slot = <optimized out> current_tuple_count = 0 #26 standard_ExecutorRun (queryDesc=0x55d7a7d54718, direction=<optimized out>, count=0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execMain.c:339 estate = 0x55d7a7d5fee8 operation = CMD_SELECT dest = 0x7fa673a96308 sendTuples = <optimized out> #27 0x00007fa6c7027515 in explain_ExecutorRun (queryDesc=0x55d7a7d54718, 
direction=ForwardScanDirection, count=0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../contrib/auto_explain/auto_explain.c:281 save_exception_stack = 0x7fff4aeeaa80 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {94384722106264, 8229023444991490729, 0, 94384722102040, 0, 1, 8229023444890827433,8250672449167702697}, __mask_was_saved = 0, __saved_mask = { __val = {94384721739856, 140734450543072, 94384714940022, 140354273004312, 140354273004312, 140734450543104,94384714691234, 2, 2, 140734450543200, 94384711690034, 2, 3462443396, 8388608, 3547611511646930944, 140734450543200}}}} #28 0x00007fa6c6e1fdb0 in pgss_ExecutorRun (queryDesc=0x55d7a7d54718, direction=ForwardScanDirection, count=0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../contrib/pg_stat_statements/pg_stat_statements.c:875 save_exception_stack = 0x7fff4aeeac20 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {94384722106264, 8229023444960033449, 0, 94384722102040, 0, 1, 8229023444993587881,8250670555334589097}, __mask_was_saved = 0, __saved_mask = { __val = {4294967296, 140354272256808, 94384714928429, 16, 94384719269552, 24, 94384720895528, 94384722102040,0, 140734450543408, 94384714928429, 94384722106264, 94384720895528, 140734450543440, 94384714994982, 94384722106264}}}} #29 0x000055d7a7553167 in PortalRunSelect (portal=portal@entry=0x55d7a7d55798, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807,dest=dest@entry=0x7fa673a96308) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/pquery.c:948 queryDesc = 0x55d7a7d54718 direction = <optimized out> nprocessed = <optimized out> __func__ = "PortalRunSelect" #30 0x000055d7a75547a0 in PortalRun (portal=portal@entry=0x55d7a7d55798, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1'\001', dest=dest@entry=0x7fa673a96308, altdest=altdest@entry=0x7fa673a96308, completionTag=completionTag@entry=0x7fff4aeeb050 "") at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/pquery.c:789 save_exception_stack = 0x7fff4aeeaf00 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {94384721085312, 8229023445033433769, 94384722106264, 140352881779464, 94384721085584,2, 8229023444955839145, 2765920793019169449}, __mask_was_saved = 0, __saved_mask = {__val = {0, 12099560782865280144, 0, 8, 8, 140734450544226, 1, 88, 94384722106264,94384715935530, 94384721085584, 140734450543840, 94384714930017, 2, 94384722106264, 140734450543872}}}} result = <optimized out> nprocessed = <optimized out> saveTopTransactionResourceOwner = 0x55d7a7c118e8 ---Type <return> to continue, or q <return> to quit--- saveTopTransactionContext = 0x55d7a7c10eb8 saveActivePortal = 0x0 saveResourceOwner = 0x55d7a7c118e8 savePortalContext = 0x0 saveMemoryContext = 0x55d7a7c10eb8 __func__ = "PortalRun" #31 0x000055d7a75512d6 in exec_simple_query ( query_string=0x55d7a7ce6b38 "select v.account_id, COUNT(cnt.clicks), te.description,\nl.product_id\nfrom nbox_nc_ah.tracking_eventste\njoin nbox_nc_ah.page_views pv on pv.page_view_id = te.page_view_id\njoin nbox_nc_ah.visitsv on v"...) 
at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/postgres.c:1109 parsetree = 0x55d7a7c5c380 portal = 0x55d7a7d55798 snapshot_set = <optimized out> commandTag = <optimized out> completionTag = "\000\370\253\247\327U\000\000\240F\252\247\327U\000\000\200\260\356J\377\177\000\000\215\326g\247\327U\000\000\300\260\356J\377\177\000\000d\261\356J\377\177\000\000\240\260\356J\377\177\000\000v\031F\247\327U\000" querytree_list = <optimized out> plantree_list = 0x7fa673a962d8 receiver = 0x7fa673a96308 format = 0 dest = DestRemote parsetree_list = 0x55d7a7c5c4b0 save_log_statement_stats = 0 '\000' was_logged = 0 '\000' msec_str = "\020\261\356J\377\177\000\000(\002", '\000' <repeats 14 times>, "\340?\256\247\327U\000" parsetree_item = 0x55d7a7c5c490 isTopLevel = 1 '\001' #32 PostgresMain (argc=<optimized out>, argv=argv@entry=0x55d7a7c56830, dbname=0x55d7a7c11b88 "staging", username=<optimizedout>) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/postgres.c:4101 query_string = 0x55d7a7ce6b38 "select v.account_id, COUNT(cnt.clicks), te.description,\nl.product_id\nfrom nbox_nc_ah.tracking_eventste\njoin nbox_nc_ah.page_views pv on pv.page_view_id = te.page_view_id\njoin nbox_nc_ah.visitsv on v"... firstchar = -1479190632 input_message = { data = 0x55d7a7ce6b38 "select v.account_id, COUNT(cnt.clicks), te.description,\nl.product_id\nfrom nbox_nc_ah.tracking_eventste\njoin nbox_nc_ah.page_views pv on pv.page_view_id = te.page_view_id\njoin nbox_nc_ah.visitsv on v"..., len = 1042, maxlen = 2048, cursor = 1042} local_sigjmp_buf = {{__jmpbuf = {140734450544288, 8229023445169748649, 94384721061936, 1, 94384721061720, 94384721052928,8229023445035530921, 2765920790734322345}, __mask_was_saved = 1, __saved_mask = {__val = {0, 94386201296895, 94384713689589, 18446603339259007057, 140354407146656,0, 1305670059009, 32, 4, 489626271867, 0, 0, 532575944823, 140734450544608, 0, 140734450544704}}}} send_ready_for_query = 0 '\000' disable_idle_in_transaction_timeout = <optimized out> __func__ = "PostgresMain" #33 0x000055d7a72c6a1b in BackendRun (port=0x55d7a7c54500) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:4339 ac = 1 secs = 616545808 usecs = 503344 i = 1 av = 0x55d7a7c56830 maxac = <optimized out> #34 BackendStartup (port=0x55d7a7c54500) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:4013 bn = <optimized out> ---Type <return> to continue, or q <return> to quit--- pid = <optimized out> #35 ServerLoop () at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:1722 rmask = {fds_bits = {16, 0 <repeats 15 times>}} selres = <optimized out> now = <optimized out> readmask = {fds_bits = {48, 0 <repeats 15 times>}} last_lockfile_recheck_time = 1563230588 last_touch_time = 1563230588 __func__ = "ServerLoop" #36 0x000055d7a74ed281 in PostmasterMain (argc=13, argv=<optimized out>) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:1330 opt = <optimized out> status = <optimized out> userDoption = <optimized out> listen_addr_saved = 1 '\001' i = <optimized out> output_config_variable = <optimized out> __func__ = "PostmasterMain" #37 0x000055d7a72c7bf1 in main (argc=13, argv=0x55d7a7c0f840) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/main/main.c:228 No locals. 
(gdb) list 228 in /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c (gdb) > > > > regards -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net
On Mon, Jul 15, 2019 at 08:20:00PM -0500, Jerry Sievers wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > >> On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: >> >>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>> >>>> On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: >>>> >>>>>Greetings Hackers. >>>>> >>>>>We have a reproduceable case of $subject that issues a backtrace such as >>>>>seen below. >>>>> >>>>>The query that I'd prefer to sanitize before sending is <30 lines of at >>>>>a glance, not terribly complex logic. >>>>> >>>>>It nonetheless dies hard after a few seconds of running and as expected, >>>>>results in an automatic all-backend restart. >>>>> >>>>>Please advise on how to proceed. Thanks! >>>>> >>>>>bt >>>>>#0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') >>>>> at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 >>>>>#1 0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529 >>>>>#2 0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280 >>>>>#3 0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158 >>>>>#4 0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475 >>>>>#5 0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166 >>>>>#6 0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019 >>>>>#7 0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226 >>>>><about 30 lines omitted> >>>>> >>>> >>>> Hmmm, that means it's crashing here: >>>> >>>> if (scan->rs_parallel != NULL) >>>> scan->rs_nblocks = scan->rs_parallel->phs_nblocks; <--- here >>>> else >>>> scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd); >>>> >>>> But clearly, scan is valid (otherwise it'd crash on the if condition), >>>> and scan->rs_parallel must me non-NULL. Which probably means the pointer >>>> is (no longer) valid. >>>> >>>> Could it be that the rs_parallel DSM disappears on rescan, or something >>>> like that? >>> >>>No clue but something I just tried was to disable parallelism by setting >>>max_parallel_workers_per_gather to 0 and however the query has not >>>finished after a few minutes, there is no crash. >>> >> >> That might be a hint my rough analysis was somewhat correct. The >> question is whether the non-parallel plan does the same thing. Maybe it >> picks a plan that does not require rescans, or something like that. >> >>>Please advise. >>> >> >> It would be useful to see (a) exacution plan of the query, (b) full >> backtrace and (c) a bit of context for the place where it crashed. >> >> Something like (in gdb): >> >> bt full >> list >> p *scan > >The p *scan did nothing unless I ran it first however my gdb $foo isn't >strong presently. 
Hmm, the rs_parallel pointer looks sane (it's not obvious garbage). Can
you try this?

    p *scan->rs_parallel

Another question - are you sure this is not an OOM issue? That might
sometimes look like SIGSEGV due to overcommit. What's the memory
consumption / is there anything in dmesg?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
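For the OOM side of that question, a rough sketch of the usual checks (the exact kernel log wording varies by kernel and distribution, so treat the grep pattern as a guess):

# any OOM-killer activity around the time of the crash?
dmesg -T | grep -i -E 'out of memory|oom|killed process'

# overcommit policy and current memory headroom
cat /proc/sys/vm/overcommit_memory
free -m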
On Tue, Jul 16, 2019 at 8:22 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On Mon, Jul 15, 2019 at 08:20:00PM -0500, Jerry Sievers wrote:
> >>>>>We have a reproduceable case of $subject that issues a backtrace such as
> >>>>>seen below.
> >>>>>#0  initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001')
> >>>>>    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233
> >>>>>#1  0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529
> >>>>>#2  0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280
> >>>>>#3  0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158
> >>>>>#4  0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475
> >>>>>#5  0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166

Here's a query that rescans a gather node repeatedly on 9.6 in case it
helps someone build a repro, but it works fine here.

--
Thomas Munro
https://enterprisedb.com
Attachment
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Mon, Jul 15, 2019 at 08:20:00PM -0500, Jerry Sievers wrote: > >>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> >>> On Mon, Jul 15, 2019 at 07:22:55PM -0500, Jerry Sievers wrote: >>> >>>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>>> >>>>> On Mon, Jul 15, 2019 at 06:48:05PM -0500, Jerry Sievers wrote: >>>>> >>>>>>Greetings Hackers. >>>>>> >>>>>>We have a reproduceable case of $subject that issues a backtrace such as >>>>>>seen below. >>>>>> >>>>>>The query that I'd prefer to sanitize before sending is <30 lines of at >>>>>>a glance, not terribly complex logic. >>>>>> >>>>>>It nonetheless dies hard after a few seconds of running and as expected, >>>>>>results in an automatic all-backend restart. >>>>>> >>>>>>Please advise on how to proceed. Thanks! >>>>>> >>>>>>bt >>>>>>#0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') >>>>>> at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 >>>>>>#1 0x000055d7a72fa8d0 in heap_rescan (scan=0x55d7a7daa0b0, key=key@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1529 >>>>>>#2 0x000055d7a7451fef in ExecReScanSeqScan (node=node@entry=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:280 >>>>>>#3 0x000055d7a742d36e in ExecReScan (node=0x55d7a7d85100) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:158 >>>>>>#4 0x000055d7a7445d38 in ExecReScanGather (node=node@entry=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:475 >>>>>>#5 0x000055d7a742d255 in ExecReScan (node=0x55d7a7d84d30) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:166 >>>>>>#6 0x000055d7a7448673 in ExecReScanHashJoin (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeHashjoin.c:1019 >>>>>>#7 0x000055d7a742d29e in ExecReScan (node=node@entry=0x55d7a7d84110) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execAmi.c:226 >>>>>><about 30 lines omitted> >>>>>> >>>>> >>>>> Hmmm, that means it's crashing here: >>>>> >>>>> if (scan->rs_parallel != NULL) >>>>> scan->rs_nblocks = scan->rs_parallel->phs_nblocks; <--- here >>>>> else >>>>> scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd); >>>>> >>>>> But clearly, scan is valid (otherwise it'd crash on the if condition), >>>>> and scan->rs_parallel must me non-NULL. Which probably means the pointer >>>>> is (no longer) valid. >>>>> >>>>> Could it be that the rs_parallel DSM disappears on rescan, or something >>>>> like that? >>>> >>>>No clue but something I just tried was to disable parallelism by setting >>>>max_parallel_workers_per_gather to 0 and however the query has not >>>>finished after a few minutes, there is no crash. >>>> >>> >>> That might be a hint my rough analysis was somewhat correct. The >>> question is whether the non-parallel plan does the same thing. Maybe it >>> picks a plan that does not require rescans, or something like that. >>> >>>>Please advise. >>>> >>> >>> It would be useful to see (a) exacution plan of the query, (b) full >>> backtrace and (c) a bit of context for the place where it crashed. 
>>> >>> Something like (in gdb): >>> >>> bt full >>> list >>> p *scan >> >>The p *scan did nothing unless I ran it first however my gdb $foo isn't >>strong presently. > > Hmm, the rs_parallel pointer looks sane (it's not obvious garbage). Can > you try this? > > p *scan->rs_parallel $ gdb /usr/lib/postgresql/9.6/bin/postgres core GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/lib/postgresql/9.6/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/04/6f55a5ce6ce05064edfc8feee61c6cb039d296.debug...done. done. [New LWP 31654] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `postgres: foo_eis_segfault: jsievers staging 10.220.22.26(57948) SELECT '. Program terminated with signal SIGSEGV, Segmentation fault. #0 initscan (scan=scan@entry=0x55d7a7daa0b0, key=0x0, keep_startblock=keep_startblock@entry=1 '\001') at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:233 233 /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c: No such file or directory. (gdb) p *scan->rs_parallel Cannot access memory at address 0x7fa673a54108 (gdb) > > Another question - are you sure this is not an OOM issue? That might > sometimes look like SIGSEGV due to overcommit. What's the memory > consumption / is there anything in dmesg? Below is all I got after a prior dmesg -c... dmesg -c [5441294.442062] postgres[12033]: segfault at 7f3d011d2110 ip 000055666def9a31 sp 00007ffc37be9a70 error 4 in postgres[55666de23000+653000] Thanks! > > regards -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net
On Wed, Jul 17, 2019 at 11:06 AM Jerry Sievers <gsievers19@comcast.net> wrote:
> (gdb) p *scan->rs_parallel
> Cannot access memory at address 0x7fa673a54108

So I guess one question is: was it a valid address that's been
unexpectedly unmapped, or is the pointer corrupted?  Any chance you
can strace the backend and pull out the map, unmap calls?

--
Thomas Munro
https://enterprisedb.com
On Wed, Jul 17, 2019 at 11:11 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> map, unmap

mmap, munmap

--
Thomas Munro
https://enterprisedb.com
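A sketch of how those calls could be captured from the backend before running the query (the <backend_pid> placeholder is hypothetical - whatever the session reports, e.g. via SELECT pg_backend_pid(); the strace options are standard):

strace -f -tt -e trace=mmap,munmap -o /tmp/backend-mmap.log -p <backend_pid>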
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Jul 17, 2019 at 11:06 AM Jerry Sievers <gsievers19@comcast.net> wrote: > >> (gdb) p *scan->rs_parallel >> Cannot access memory at address 0x7fa673a54108 > > So I guess one question is: was it a valid address that's been > unexpectedly unmapped, or is the pointer corrupted? Any chance you > can strace the backend and pull out the map, unmap calls? I'll dig further. Here is a sanitized look at the query and explain plan... The segfault happens $immediately upon issuance of the query. begin; -- This setting makes the segfault go away --set local max_parallel_workers_per_gather to 0; explain select v.account_id, COUNT(cnt.clicks), te.description, l.product_id from thing3.thing10 te join thing3.thing9 pv on pv.page_view_id = te.page_view_id join thing3.thing11 v on v.visit_id = pv.visit_id left join thing6.thing12 l on v.account_id=l.account_id left join lateral ( select MAX(v.visit_id) ,COUNT(*) as clicks from thing3.thing10 te join thing3.thing9 pv on pv.page_view_id = te.page_view_id join thing3.thing11 v on v.visit_id = pv.visit_id where te.description in ('thing7', 'thing8') and v.account_id=l.account_id GROUP BY v.account_id, v.visit_id order by v.account_id, v.visit_id desc limit 1 )cnt on true where (te.description in ('thing4', 'thing5') or te.description like'%auto%') and te.created_at > '2019-06-24 00:00:00' --and l.loan_status_id in (5,6) group by v.account_id, te.description, l.product_id; abort; BEGIN QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=12300178.71..12300179.79 rows=48 width=44) Group Key: v.account_id, te.description, l.product_id -> Sort (cost=12300178.71..12300178.83 rows=48 width=44) Sort Key: v.account_id, te.description, l.product_id -> Nested Loop Left Join (cost=251621.81..12300177.37 rows=48 width=44) -> Gather (cost=1001.55..270403.27 rows=48 width=40) Workers Planned: 3 -> Nested Loop Left Join (cost=1.56..269398.47 rows=15 width=40) -> Nested Loop (cost=1.13..269391.71 rows=14 width=32) -> Nested Loop (cost=0.57..269368.66 rows=39 width=32) -> Parallel Seq Scan on thing10 te (cost=0.00..269228.36 rows=39 width=32) Filter: ((created_at > '2019-06-24 00:00:00'::timestamp without time zone) AND(((description)::text = ANY ('{thing4,thing5}'::text[])) OR ((description)::text ~~ '%auto%'::text))) -> Index Scan using page_views_pkey on thing9 pv (cost=0.57..3.59 rows=1 width=8) Index Cond: (page_view_id = te.page_view_id) -> Index Scan using visits_pkey on thing11 v (cost=0.56..0.58 rows=1 width=8) Index Cond: (visit_id = pv.visit_id) -> Index Scan using index_loans_on_account_id on thing12 l (cost=0.42..0.46 rows=2 width=8) Index Cond: (v.account_id = account_id) -> Limit (cost=250620.25..250620.27 rows=1 width=20) -> GroupAggregate (cost=250620.25..250620.27 rows=1 width=20) Group Key: v_1.visit_id -> Sort (cost=250620.25..250620.26 rows=1 width=8) Sort Key: v_1.visit_id DESC -> Hash Join (cost=1154.34..250620.24 rows=1 width=8) Hash Cond: (te_1.page_view_id = pv_1.page_view_id) -> Gather (cost=1000.00..250452.00 rows=3706 width=4) Workers Planned: 3 -> Parallel Seq Scan on thing10 te_1 (cost=0.00..249081.40 rows=1195 width=4) Filter: ((description)::text = ANY ('{thing7,thing8}'::text[])) -> Hash (cost=152.85..152.85 rows=119 width=12) -> Nested Loop 
(cost=1.01..152.85 rows=119 width=12) -> Index Scan using index_visits_on_account_id on thing11 v_1 (cost=0.43..15.63rows=18 width=8) Index Cond: (account_id = l.account_id) -> Index Scan using index_pv_on_visit on thing9 pv_1 (cost=0.57..7.55rows=7 width=8) Index Cond: (visit_id = v_1.visit_id) (35 rows) ROLLBACK -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net
Thomas Munro <thomas.munro@gmail.com> writes:

> On Wed, Jul 17, 2019 at 11:06 AM Jerry Sievers <gsievers19@comcast.net> wrote:
>
>> (gdb) p *scan->rs_parallel
>> Cannot access memory at address 0x7fa673a54108
>
> So I guess one question is: was it a valid address that's been
> unexpectedly unmapped, or is the pointer corrupted?  Any chance you
> can strace the backend and pull out the map, unmap calls?

There were about 60k lines from strace including these few...

mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3d0127a000
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3d01239000
mmap(NULL, 287624, PROT_READ|PROT_WRITE, MAP_SHARED, 124, 0) = 0x7f3d011f2000
mmap(NULL, 262504, PROT_READ|PROT_WRITE, MAP_SHARED, 124, 0) = 0x7f3d011b1000
munmap(0x7f3d011b1000, 262504) = 0

Thx

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
On Wed, Jul 17, 2019 at 11:33 AM Jerry Sievers <gsievers19@comcast.net> wrote:
>   ->  Nested Loop Left Join  (cost=251621.81..12300177.37 rows=48 width=44)
>         ->  Gather  (cost=1001.55..270403.27 rows=48 width=40)
>         ->  Limit  (cost=250620.25..250620.27 rows=1 width=20)
>               ->  Gather  (cost=1000.00..250452.00 rows=3706 width=4)

One observation is that it's a rescan a bit like the one in the
unsuccessful repro attempt I posted, but it has *two* Gather nodes in
it (and thus two parallel query DSM segments), and only one of them
should be rescanned.  From the backtrace we see that it is indeed the
expected one being rescanned, the one under the Limit operator.
Neither of them should be getting unmapped in the leader, though, and
AFAIK nothing happening in the workers could cause this effect; the
leader would have to explicitly unmap the thing.

On Wed, Jul 17, 2019 at 11:42 AM Jerry Sievers <gsievers19@comcast.net> wrote:
> mmap(NULL, 287624, PROT_READ|PROT_WRITE, MAP_SHARED, 124, 0) = 0x7f3d011f2000
> mmap(NULL, 262504, PROT_READ|PROT_WRITE, MAP_SHARED, 124, 0) = 0x7f3d011b1000
> munmap(0x7f3d011b1000, 262504) = 0

Ok, there go our two parallel query DSM segments, and there it is
being unmapped.  Hmm.  Any chance you could attach a debugger, and
"break munmap", "cont", and then show us the backtrace "bt" when that
is reached?

--
Thomas Munro
https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Jul 17, 2019 at 11:33 AM Jerry Sievers <gsievers19@comcast.net> wrote: > >> -> Nested Loop Left Join (cost=251621.81..12300177.37 rows=48 width=44) >> -> Gather (cost=1001.55..270403.27 rows=48 width=40) > >> -> Limit (cost=250620.25..250620.27 rows=1 width=20) > >> -> Gather (cost=1000.00..250452.00 rows=3706 width=4) > > One observation is that it's a rescan a bit like the one in the > unsuccessful repro attempt I posted, but it has *two* Gather nodes in > it (and thus two parallel query DSM segments), and only one of them > should be rescanned, and from the backtrace we see that it is indeed > the expected one, the one under the Limit operator. Neither of them > should be getting unmapped in the leader though and AFAIK nothing > happening in the workers could cause this effect, the leader would > have to explicitly unmap the thing AFAIK. > > On Wed, Jul 17, 2019 at 11:42 AM Jerry Sievers <gsievers19@comcast.net> wrote: >> mmap(NULL, 287624, PROT_READ|PROT_WRITE, MAP_SHARED, 124, 0) = 0x7f3d011f2000 >> mmap(NULL, 262504, PROT_READ|PROT_WRITE, MAP_SHARED, 124, 0) = 0x7f3d011b1000 >> munmap(0x7f3d011b1000, 262504) = 0 > > Ok, there go our two parallel query DSM segments, and there it is > being unmapped. Hmm. Any chance you could attach a debugger, and > "break munmap", "cont", and then show us the backtrace "bt" when that > is reached? gdb /usr/lib/postgresql/9.6/bin/postgres 21640 GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/lib/postgresql/9.6/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/04/6f55a5ce6ce05064edfc8feee61c6cb039d296.debug...done. done. Attaching to program: /usr/lib/postgresql/9.6/bin/postgres, process 21640 Reading symbols from /usr/lib/x86_64-linux-gnu/libxml2.so.2...Reading symbols from /usr/lib/debug/.build-id/d3/57ce1dba1fab803eddf48922123ffd0a303676.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libpam.so.0...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libssl.so.1.0.0...Reading symbols from /usr/lib/debug/.build-id/ff/69ea60ebe05f2dd689d2b26fc85a73e5fbc3a0.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0...Reading symbols from /usr/lib/debug/.build-id/15/ffeb43278726b025f020862bf51302822a40ec.debug...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.23.so...done. done. Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.23.so...done. done. 
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.23.so...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2...Reading symbols from /usr/lib/debug/.build-id/38/90d33727391e4a85dc0f819ab0aa29bb5dfc86.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libsystemd.so.0...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libicuuc.so.55...Reading symbols from /usr/lib/debug/.build-id/46/3d8b610702d64ae0803c7dfcaa02cfb4c6477b.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...Reading symbols from /usr/lib/debug/.build-id/8d/9bd4ce26e45ef16075c67d5f5eeafd8b562832.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/liblzma.so.5...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libaudit.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libk5crypto.so.3...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libcom_err.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libcom_err.so.2.1...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5support.so.0...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/b1/7c21299099640a6d863e423d99265824e7bb16.debug...done. done. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.23.so...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2...Reading symbols from /usr/lib/debug/.build-id/8e/613d0b8d8e3537785637424782be8502ababd2.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libresolv.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libresolv-2.23.so...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libsasl2.so.2...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgssapi.so.3...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgnutls.so.30...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libselinux.so.1...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libgcrypt.so.20...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libicudata.so.55...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libgcc_s.so.1...done. done. Reading symbols from /lib/x86_64-linux-gnu/libkeyutils.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libheimntlm.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5.so.26...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libasn1.so.8...(no debugging symbols found)...done. 
Reading symbols from /usr/lib/x86_64-linux-gnu/libhcrypto.so.4...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libroken.so.18...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libp11-kit.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libidn.so.11...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libtasn1.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libnettle.so.6...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libhogweed.so.4...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libgmp.so.10...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libpcre.so.3...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libpcre.so.3.13.2...done. done. Reading symbols from /lib/x86_64-linux-gnu/libgpg-error.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libwind.so.0...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libheimbase.so.1...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libhx509.so.5...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0...Reading symbols from /usr/lib/debug/.build-id/3b/0454e57467057071f7ad49651e0fa7b01cf5c7.debug...done. done. Reading symbols from /lib/x86_64-linux-gnu/libcrypt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libcrypt-2.23.so...done. done. Reading symbols from /usr/lib/x86_64-linux-gnu/libffi.so.6...Reading symbols from /usr/lib/debug/.build-id/9d/9c958f1f4894afef6aecd90d1c430ea29ac34f.debug...done. done. Reading symbols from /usr/lib/postgresql/9.6/lib/auto_explain.so...Reading symbols from /usr/lib/debug/.build-id/94/ab76178c50b0e098f2bd0f3501d9cb6562c743.debug...done. done. Reading symbols from /usr/lib/postgresql/9.6/lib/pg_stat_statements.so...Reading symbols from /usr/lib/debug/.build-id/cf/f288800c22fd97059aaf8e425ae17e29fb88fb.debug...done. done. Reading symbols from /usr/lib/postgresql/9.6/lib/pglogical.so...(no debugging symbols found)...done. Reading symbols from /usr/lib/x86_64-linux-gnu/libpq.so.5...(no debugging symbols found)...done. Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.23.so...done. done. 0x00007f3d093379f3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84 84 ../sysdeps/unix/syscall-template.S: No such file or directory. (gdb) break munmap Breakpoint 1 at 0x7f3d09331740: file ../sysdeps/unix/syscall-template.S, line 84. (gdb) cont Continuing. Program received signal SIGUSR1, User defined signal 1. hash_search_with_hash_value (hashp=0x5566701baa68, keyPtr=keyPtr@entry=0x7ffc37be9790, hashvalue=hashvalue@entry=1634369601, action=action@entry=HASH_FIND, foundPtr=foundPtr@entry=0x0) at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/utils/hash/dynahash.c:959 959 /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/utils/hash/dynahash.c: No such file or directory. 
(gdb) bt
#0  hash_search_with_hash_value (hashp=0x5566701baa68, keyPtr=keyPtr@entry=0x7ffc37be9790, hashvalue=hashvalue@entry=1634369601, action=action@entry=HASH_FIND, foundPtr=foundPtr@entry=0x0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/utils/hash/dynahash.c:959
#1  0x000055666e1224ca in BufTableLookup (tagPtr=tagPtr@entry=0x7ffc37be9790, hashcode=hashcode@entry=1634369601)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/storage/buffer/buf_table.c:96
#2  0x000055666e12527d in BufferAlloc (foundPtr=0x7ffc37be978b "", strategy=0x556670360418, blockNum=53, forkNum=MAIN_FORKNUM, relpersistence=112 'p', smgr=0x5566702a5990)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/storage/buffer/bufmgr.c:1013
#3  ReadBuffer_common (smgr=0x5566702a5990, relpersistence=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=53, mode=RBM_NORMAL, strategy=0x556670360418, hit=0x7ffc37be9837 "")
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/storage/buffer/bufmgr.c:745
#4  0x000055666e125b15 in ReadBufferExtended (reln=0x7f3d015e2670, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=53, mode=mode@entry=RBM_NORMAL, strategy=<optimized out>)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/storage/buffer/bufmgr.c:664
#5  0x000055666defc036 in heapgetpage (scan=scan@entry=0x5566703484f8, page=page@entry=53)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:375
#6  0x000055666defd5c2 in heapgettup_pagemode (key=0x0, nkeys=0, dir=ForwardScanDirection, scan=0x5566703484f8)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1036
#7  heap_getnext (scan=scan@entry=0x5566703484f8, direction=direction@entry=ForwardScanDirection)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/heap/heapam.c:1787
#8  0x000055666e053e21 in SeqNext (node=node@entry=0x556670328c48)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:80
#9  0x000055666e03d711 in ExecScanFetch (recheckMtd=0x55666e053de0 <SeqRecheck>, accessMtd=0x55666e053df0 <SeqNext>, node=0x556670328c48)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execScan.c:95
#10 ExecScan (node=node@entry=0x556670328c48, accessMtd=accessMtd@entry=0x55666e053df0 <SeqNext>, recheckMtd=recheckMtd@entry=0x55666e053de0 <SeqRecheck>)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execScan.c:180
#11 0x000055666e053ea8 in ExecSeqScan (node=node@entry=0x556670328c48)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSeqscan.c:127
---Type <return> to continue, or q <return> to quit---

>
>
> --
> Thomas Munro
> https://enterprisedb.com
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
On Wed, Jul 17, 2019 at 12:05 PM Jerry Sievers <gsievers19@comcast.net> wrote:
> Program received signal SIGUSR1, User defined signal 1.

Oh, we need to ignore those pesky signals with "handle SIGUSR1 noprint nostop".

--
Thomas Munro
https://enterprisedb.com
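A minimal sketch of the attach sequence this implies (assuming the backend PID of the session that will run the query has already been looked up, e.g. via pg_backend_pid(); gdb then stops on its own when the SIGSEGV arrives):

$ gdb -p <backend_pid>
(gdb) handle SIGUSR1 noprint nostop
(gdb) cont
Continuing.
    ... run the problem query from the attached client session ...
(gdb) bt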
Thomas Munro <thomas.munro@gmail.com> writes:

> On Wed, Jul 17, 2019 at 12:05 PM Jerry Sievers <gsievers19@comcast.net> wrote:
>
>> Program received signal SIGUSR1, User defined signal 1.
>
> Oh, we need to ignore those pesky signals with "handle SIGUSR1 noprint nostop".

Is this the right sequencing?

1. Start client and get backend pid
2. GDB; handle SIGUSR1, break, cont
3. Run query
4. bt

Thanks

Don't think I am doing this correctly.  Please advise.

handle SIGUSR1 noprint nostop
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1

(gdb) break munmap
Breakpoint 1 at 0x7f3d09331740: file ../sysdeps/unix/syscall-template.S, line 84.
(gdb) cont
Continuing.

Breakpoint 1, munmap () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  munmap () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055666e12d7f4 in dsm_impl_posix (impl_private=0x22, elevel=19, mapped_size=0x556670205890, mapped_address=0x556670205888, request_size=0, handle=<optimized out>, op=DSM_OP_DETACH)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/storage/ipc/dsm_impl.c:259
#2  dsm_impl_op (op=op@entry=DSM_OP_DETACH, handle=<optimized out>, request_size=request_size@entry=0, impl_private=impl_private@entry=0x556670205880, mapped_address=mapped_address@entry=0x556670205888, mapped_size=mapped_size@entry=0x556670205890, elevel=19)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/storage/ipc/dsm_impl.c:176
#3  0x000055666e12efb1 in dsm_detach (seg=0x556670205860)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/storage/ipc/dsm.c:738
#4  0x000055666df31369 in DestroyParallelContext (pcxt=0x556670219b68)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/access/transam/parallel.c:750
#5  0x000055666e0357bb in ExecParallelCleanup (pei=0x7f3d012218b0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execParallel.c:575
#6  0x000055666e047ca2 in ExecShutdownGather (node=node@entry=0x55667033bed0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeGather.c:443
#7  0x000055666e0359f5 in ExecShutdownNode (node=0x55667033bed0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:820
#8  0x000055666e0777e1 in planstate_tree_walker (planstate=0x55667033b2b0, walker=0x55666e0359a0 <ExecShutdownNode>, context=0x0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/nodes/nodeFuncs.c:3636
#9  0x000055666e0777e1 in planstate_tree_walker (planstate=0x55667033b040, walker=0x55666e0359a0 <ExecShutdownNode>, context=0x0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/nodes/nodeFuncs.c:3636
#10 0x000055666e0777e1 in planstate_tree_walker (planstate=planstate@entry=0x55667033a6c8, walker=walker@entry=0x55666e0359a0 <ExecShutdownNode>, context=context@entry=0x0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/nodes/nodeFuncs.c:3636
#11 0x000055666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830
#12 0x000055666e04d0ff in ExecLimit (node=node@entry=0x55667033a428)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:139
#13 0x000055666e035d28 in ExecProcNode (node=node@entry=0x55667033a428)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:531
#14 0x000055666e051f69 in ExecNestLoop (node=node@entry=0x55667031c660)
---Type <return> to continue, or q <return> to quit---
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeNestloop.c:174
#15 0x000055666e035e28 in ExecProcNode (node=node@entry=0x55667031c660)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:476
#16 0x000055666e054989 in ExecSort (node=node@entry=0x55667031c3f0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeSort.c:103
#17 0x000055666e035de8 in ExecProcNode (node=0x55667031c3f0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:495
#18 0x000055666e041fe9 in fetch_input_tuple (aggstate=aggstate@entry=0x55667031ba18)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:598
#19 0x000055666e043bb3 in agg_retrieve_direct (aggstate=0x55667031ba18)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:2078
#20 ExecAgg (node=node@entry=0x55667031ba18)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeAgg.c:1903
#21 0x000055666e035dc8 in ExecProcNode (node=node@entry=0x55667031ba18)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:503
#22 0x000055666e031f2e in ExecutePlan (dest=0x7f3d01277aa8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55667031ba18, estate=0x55667031b878)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execMain.c:1567
#23 standard_ExecutorRun (queryDesc=0x556670320a78, direction=<optimized out>, count=0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execMain.c:339
#24 0x00007f3d01cd0515 in explain_ExecutorRun (queryDesc=0x556670320a78, direction=ForwardScanDirection, count=0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../contrib/auto_explain/auto_explain.c:281
#25 0x00007f3d01ac8db0 in pgss_ExecutorRun (queryDesc=0x556670320a78, direction=ForwardScanDirection, count=0)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../contrib/pg_stat_statements/pg_stat_statements.c:875
#26 0x000055666e155167 in PortalRunSelect (portal=portal@entry=0x5566701d6df8, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x7f3d01277aa8)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/pquery.c:948
#27 0x000055666e1567a0 in PortalRun (portal=portal@entry=0x5566701d6df8, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x7f3d01277aa8, altdest=altdest@entry=0x7f3d01277aa8, completionTag=completionTag@entry=0x7ffc37bea670 "")
---Type <return> to continue, or q <return> to quit---
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/pquery.c:789
#28 0x000055666e1532d6 in exec_simple_query (
    query_string=0x5566702a4c68 "select v.account_id, COUNT(cnt.clicks), te.description,\nl.product_id\nfrom nbox_nc_ah.tracking_events te\njoin nbox_nc_ah.page_views pv on pv.page_view_id = te.page_view_id\njoin nbox_nc_ah.visits v on v"...)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/postgres.c:1109
#29 PostgresMain (argc=<optimized out>, argv=argv@entry=0x556670204630, dbname=0x5566701bab88 "staging", username=<optimized out>)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/tcop/postgres.c:4101
#30 0x000055666dec8a1b in BackendRun (port=0x5566701fd500)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:4339
#31 BackendStartup (port=0x5566701fd500)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:4013
#32 ServerLoop ()
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:1722
#33 0x000055666e0ef281 in PostmasterMain (argc=13, argv=<optimized out>)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/postmaster/postmaster.c:1330
#34 0x000055666dec9bf1 in main (argc=13, argv=0x5566701b8840)
    at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/main/main.c:228
(gdb)
(gdb) quit
A debugging session is active.

        Inferior 1 [process 32291] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/lib/postgresql/9.6/bin/postgres, process 32291
root@pgdev01:/home/jsievers#

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
On Wed, Jul 17, 2019 at 12:26 PM Jerry Sievers <gsievers19@comcast.net> wrote:
> Is this the right sequencing?
>
> 1. Start client and get backend pid
> 2. GDB; handle SIGUSR1, break, cont
> 3. Run query
> 4. bt

Perfect, thanks.  I think I just spotted something:

> #11 0x000055666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8)
>     at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830
> #12 0x000055666e04d0ff in ExecLimit (node=node@entry=0x55667033a428)
>     at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:139

https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/executor/nodeLimit.c#L139

Limit thinks it's OK to "shut down" the subtree, but if you shut down a
Gather node you can't rescan it later because it destroys its shared
memory.  Oops.  Not sure what to do about that yet.

--
Thomas Munro
https://enterprisedb.com
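For context, the branch of ExecLimit() that the backtrace and the link above point at is the early-shutdown path taken when the LIMIT count is reached. Paraphrased (not an exact excerpt of the REL9_6_STABLE source), it looks roughly like this:

    /* Paraphrased from ExecLimit()'s LIMIT_INWINDOW handling; not an exact excerpt. */
    if (!node->noCount &&
        node->position - node->offset >= node->count)
    {
        node->lstate = LIMIT_WINDOWEND;

        /*
         * Release resources early, on the assumption that the subplan won't
         * be needed again -- but this shuts down the Gather underneath,
         * which destroys its shared memory even though a rescan may still
         * arrive later.
         */
        if (!(node->eflags & EXEC_FLAG_BACKWARD))
            (void) ExecShutdownNode(outerPlan);

        return NULL;
    }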
On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > #11 0x000055666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8) > > at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830 > > #12 0x000055666e04d0ff in ExecLimit (node=node@entry=0x55667033a428) > > at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:139 > > https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/executor/nodeLimit.c#L139 > > Limit thinks it's OK to "shut down" the subtree, but if you shut down a > Gather node you can't rescan it later because it destroys its shared > memory. Oops. Not sure what to do about that yet. CCing Amit and Robert, authors of commits 19df1702 and 69de1718. -- Thomas Munro https://enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Jul 17, 2019 at 12:26 PM Jerry Sievers <gsievers19@comcast.net> wrote: > >> Is this the right sequencing? >> >> 1. Start client and get backend pid >> 2. GDB; handle SIGUSR1, break, cont >> 3. Run query >> 4. bt > > Perfect, thanks. I think I just spotted something: Dig that! Great big thanks to you and Tomas, et al for jumping on this. Please let know if there's anything else I can submit that would be helpful. > >> #11 0x000055666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8) >> at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830 >> #12 0x000055666e04d0ff in ExecLimit (node=node@entry=0x55667033a428) >> at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:139 > > https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/executor/nodeLimit.c#L139 > > Limit thinks it's OK to "shut down" the subtree, but if you shut down a > Gather node you can't rescan it later because it destroys its shared > memory. Oops. Not sure what to do about that yet. > > > -- > Thomas Munro > https://enterprisedb.com > > > -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@comcast.net
On Wed, Jul 17, 2019 at 12:57 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > #11 0x000055666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8)
> > >     at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830
> > > #12 0x000055666e04d0ff in ExecLimit (node=node@entry=0x55667033a428)
> > >     at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:139
> >
> > https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/executor/nodeLimit.c#L139
> >
> > Limit thinks it's OK to "shut down" the subtree, but if you shut down a
> > Gather node you can't rescan it later because it destroys its shared
> > memory.  Oops.  Not sure what to do about that yet.
>
> CCing Amit and Robert, authors of commits 19df1702 and 69de1718.

Here's a repro (I'm sure you can find a shorter one, this one's hacked up
from join_hash.sql, basically just adding LIMIT):

create table join_foo as select generate_series(1, 3000) as id, 'xxxxx'::text as t;
alter table join_foo set (parallel_workers = 0);
create table join_bar as select generate_series(0, 10000) as id, 'xxxxx'::text as t;
alter table join_bar set (parallel_workers = 2);

set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set max_parallel_workers_per_gather = 2;
set enable_material = off;
set enable_mergejoin = off;
set work_mem = '1GB';

select count(*)
from join_foo
left join (select b1.id, b1.t
           from join_bar b1
           join join_bar b2 using (id)
           limit 1000) ss
  on join_foo.id < ss.id + 1
 and join_foo.id > ss.id - 1;

--
Thomas Munro
https://enterprisedb.com
On Wed, Jul 17, 2019 at 6:28 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > #11 0x000055666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8) > > > at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830 > > > #12 0x000055666e04d0ff in ExecLimit (node=node@entry=0x55667033a428) > > > at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:139 > > > > https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/executor/nodeLimit.c#L139 > > > > Limit thinks it's OK to "shut down" the subtree, but if you shut down a > > Gather node you can't rescan it later because it destroys its shared > > memory. Oops. Not sure what to do about that yet. > Yeah, that is a problem. Actually, what we need here is to wait-for-workers-to-finish and collect all the instrumentation information. We don't need to destroy the shared memory at this stage, but we don't have a special purpose function which can just allow us to collect stats. One idea could be that we create a special purpose function which sounds like a recipe of code duplication, another could be that somehow pass the information through ExecShutdownNode to Gather/GatherMerge that they don't destroy shared memory. Immediately, I can't think of better ideas, but it is possible that there is some better way to deal with this. > CCing Amit and Robert, authors of commits 19df1702 and 69de1718. > Thanks for diagnosing the issue. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 17, 2019 at 4:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 17, 2019 at 6:28 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > On Wed, Jul 17, 2019 at 12:44 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > #11 0x000055666e0359df in ExecShutdownNode (node=node@entry=0x55667033a6c8) > > > > at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/execProcnode.c:830 > > > > #12 0x000055666e04d0ff in ExecLimit (node=node@entry=0x55667033a428) > > > > at /build/postgresql-9.6-5O8OLM/postgresql-9.6-9.6.14/build/../src/backend/executor/nodeLimit.c:139 > > > > > > https://github.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/executor/nodeLimit.c#L139 > > > > > > Limit thinks it's OK to "shut down" the subtree, but if you shut down a > > > Gather node you can't rescan it later because it destroys its shared > > > memory. Oops. Not sure what to do about that yet. > > > > Yeah, that is a problem. Actually, what we need here is to > wait-for-workers-to-finish and collect all the instrumentation > information. We don't need to destroy the shared memory at this > stage, but we don't have a special purpose function which can just > allow us to collect stats. One idea could be that we create a special > purpose function which sounds like a recipe of code duplication, > another could be that somehow pass the information through > ExecShutdownNode to Gather/GatherMerge that they don't destroy shared > memory. Immediately, I can't think of better ideas, but it is > possible that there is some better way to deal with this. > I am not able to come up with anything better. Robert, Thomas, do you see any problem with this idea or do you have any better ideas to fix this issue? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 18, 2019 at 6:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Jul 17, 2019 at 4:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Yeah, that is a problem. Actually, what we need here is to > > wait-for-workers-to-finish and collect all the instrumentation > > information. We don't need to destroy the shared memory at this > > stage, but we don't have a special purpose function which can just > > allow us to collect stats. One idea could be that we create a special > > purpose function which sounds like a recipe of code duplication, > > another could be that somehow pass the information through > > ExecShutdownNode to Gather/GatherMerge that they don't destroy shared > > memory. Immediately, I can't think of better ideas, but it is > > possible that there is some better way to deal with this. > > I am not able to come up with anything better. Robert, Thomas, do you > see any problem with this idea or do you have any better ideas to fix > this issue? Hmm, so something like a new argument "bool final" added to the ExecXXXShutdown() functions, which receives false in this case to tell it that there could be a rescan so keep the parallel context around. Or alternatively a separate function with another end-of-scan type of name that I'm having trouble inventing, which is basically the same but a bigger patch. If you add a new argument you might in theory want to pass that on to the ShutdownForeignScan and ShutdownCustomScan callbacks, but we obviously can't change those APIs in the back branches. If you add a new function instead you might theoretically want to add it to those APIs too, which you also can't really do in the back branches either (well even if you could, existing extensions won't register anything). I think the new argument version is probably better because I suspect only Gather would really ever have any reason to treat the two cases differently, and all existing cases in or out of core would just keep doing what they're doing. So I think adding "bool final" (or better name) would probably work out OK, and I don't have a better idea. -- Thomas Munro https://enterprisedb.com
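To make the shape of that concrete, a sketch of the variant being discussed (never committed; the extra parameter and its name are hypothetical) might look like this:

    /* Hypothetical sketch only -- this signature change was discussed but not adopted. */
    extern bool ExecShutdownNode(PlanState *node, bool final);

    /*
     * nodeLimit.c's early shutdown would pass final = false (a rescan may
     * still happen), while the end-of-execution call in ExecutePlan() would
     * pass final = true.  ExecShutdownGather() would reap workers and collect
     * instrumentation in both cases, but destroy the parallel context only
     * when final is true.
     */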
Thomas Munro <thomas.munro@gmail.com> writes: > Hmm, so something like a new argument "bool final" added to the > ExecXXXShutdown() functions, which receives false in this case to tell > it that there could be a rescan so keep the parallel context around. I think this is going in the wrong direction. Nodes should *always* assume that a rescan is possible until ExecEndNode is called. See the commentary about EXEC_FLAG_REWIND in executor.h: * REWIND indicates that the plan node should try to efficiently support * rescans without parameter changes. (Nodes must support ExecReScan calls * in any case, but if this flag was not given, they are at liberty to do it * through complete recalculation. Note that a parameter change forces a * full recalculation in any case.) If nodeLimit is doing something that's incompatible with that, it's nodeLimit's fault; and similarly for the parallel machinery. If you want to do otherwise, you are going to be inventing a whole bunch of complicated and doubtless-initially-buggy control logic to pass down information about whether a rescan might be possible. That doesn't sound like a recipe for a back-patchable fix. Perhaps we could consider redesigning the rules around REWIND in a future version, but that's not where to focus the bug fix effort. regards, tom lane
On Thu, Jul 18, 2019 at 7:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > Hmm, so something like a new argument "bool final" added to the > > ExecXXXShutdown() functions, which receives false in this case to tell > > it that there could be a rescan so keep the parallel context around. > > I think this is going in the wrong direction. Nodes should *always* > assume that a rescan is possible until ExecEndNode is called. > I am thinking that why not we remove the part of destroying the parallel context (and shared memory) from ExecShutdownGather (and ExecShutdownGatherMerge) and then do it at the time of ExecEndGather (and ExecEndGatherMerge)? This should fix the bug in hand and seems to be more consistent with our overall design principles. I have not tried to code it to see if there are any other repercussions of the same but seems worth investigating. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 19, 2019 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Jul 18, 2019 at 7:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > > Hmm, so something like a new argument "bool final" added to the > > > ExecXXXShutdown() functions, which receives false in this case to tell > > > it that there could be a rescan so keep the parallel context around. > > > > I think this is going in the wrong direction. Nodes should *always* > > assume that a rescan is possible until ExecEndNode is called. > > I am thinking that why not we remove the part of destroying the > parallel context (and shared memory) from ExecShutdownGather (and > ExecShutdownGatherMerge) and then do it at the time of ExecEndGather > (and ExecEndGatherMerge)? This should fix the bug in hand and seems > to be more consistent with our overall design principles. I have not > tried to code it to see if there are any other repercussions of the > same but seems worth investigating. What do you think? I tried moving ExecParallelCleanup() into ExecEndGather(). The first problem is that ExecutePlan() wraps execution in EnterParallelMode()/ExitParallelMode(), but ExitParallelMode() fails an assertion that no parallel context is active because ExecEndGather() hasn't run yet. The enclosing ExecutorStart()/ExecutorEnd() calls are further down the call stack, in ProcessQuery(). So some more restructuring might be needed to exit parallel mode later, but then I feel like you might be getting way out of back-patchable territory, especially if it involves moving code to the other side of the executor hook boundary. Is there an easier way? Another idea from the band-aid-solutions-that-are-easy-to-back-patch department: in ExecutePlan() where we call ExecShutdownNode(), we could write EXEC_FLAG_DONE into estate->es_top_eflags, and then have ExecGatherShutdown() only run ExecParallelCleanup() if it sees that flag. That's not beautiful, but it's less churn that the 'bool final' argument we discussed before, and could be removed in master when we have a better way. Stepping back a bit, it seems like we need two separate tree-walking calls: one to free resources not needed anymore by the current rescan (workers), and another to free resources not needed ever again (parallel context). That could be spelled ExecShutdownNode(false) and ExecShutdownNode(true), or controlled with the EXEC_FLAG_DONE kluge, or a new additional ExecSomethingSomethingNode() function, or as you say, perhaps the second thing could be incorporated into ExecEndNode(). I suspect that the Shutdown callbacks for Hash, Hash Join, Custom Scan and Foreign Scan might not be needed anymore if we could keep the parallel context around until after the run ExecEndNode(). -- Thomas Munro https://enterprisedb.com
On Tue, Jul 23, 2019 at 9:11 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Jul 19, 2019 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I am thinking that why not we remove the part of destroying the > > parallel context (and shared memory) from ExecShutdownGather (and > > ExecShutdownGatherMerge) and then do it at the time of ExecEndGather > > (and ExecEndGatherMerge)? This should fix the bug in hand and seems > > to be more consistent with our overall design principles. I have not > > tried to code it to see if there are any other repercussions of the > > same but seems worth investigating. What do you think? > > I tried moving ExecParallelCleanup() into ExecEndGather(). The first > problem is that ExecutePlan() wraps execution in > EnterParallelMode()/ExitParallelMode(), but ExitParallelMode() fails > an assertion that no parallel context is active because > ExecEndGather() hasn't run yet. The enclosing > ExecutorStart()/ExecutorEnd() calls are further down the call stack, > in ProcessQuery(). So some more restructuring might be needed to exit > parallel mode later, but then I feel like you might be getting way out > of back-patchable territory, especially if it involves moving code to > the other side of the executor hook boundary. Is there an easier way? > If we have to follow the solution on these lines, then I don't see an easier way. One idea could be that we relax the assert in ExitParallelMode so that it doesn't expect parallel context to be gone by that time, but not sure if that is a good idea because it is used in some other places as well. I feel in general it is a good assertion that before we leave parallel mode, the parallel context should be gone as that ensures we won't do any parallel activity after that. > Another idea from the band-aid-solutions-that-are-easy-to-back-patch > department: in ExecutePlan() where we call ExecShutdownNode(), we > could write EXEC_FLAG_DONE into estate->es_top_eflags, and then have > ExecGatherShutdown() only run ExecParallelCleanup() if it sees that > flag. That's not beautiful, but it's less churn that the 'bool final' > argument we discussed before, and could be removed in master when we > have a better way. > Right, that will be lesser code churn and it can also work. However, one thing that needs some thought is till now es_top_eflags is only set in ExecutorStart and same is mentioned in comments where it is declared and it seems we are going to change that with this idea. How about having a separate function ExecBlahShutdown which will clean up resources as parallel context and can be called only from ExecutePlan where we are calling ExecShutdownNode? I think both these and the other solution we have discussed are on similar lines and another idea could be to relax the assert which again is not a superb idea. > Stepping back a bit, it seems like we need two separate tree-walking > calls: one to free resources not needed anymore by the current rescan > (workers), and another to free resources not needed ever again > (parallel context). That could be spelled ExecShutdownNode(false) and > ExecShutdownNode(true), or controlled with the EXEC_FLAG_DONE kluge, > or a new additional ExecSomethingSomethingNode() function, or as you > say, perhaps the second thing could be incorporated into > ExecEndNode(). I suspect that the Shutdown callbacks for Hash, Hash > Join, Custom Scan and Foreign Scan might not be needed anymore if we > could keep the parallel context around until after the run > ExecEndNode(). 
> I think we need those to collect instrumentation information. I guess that has to be done before we call InstrStopNode, otherwise, we might miss some instrumentation information. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 23, 2019 at 9:11 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > > Another idea from the band-aid-solutions-that-are-easy-to-back-patch > > department: in ExecutePlan() where we call ExecShutdownNode(), we > > could write EXEC_FLAG_DONE into estate->es_top_eflags, and then have > > ExecGatherShutdown() only run ExecParallelCleanup() if it sees that > > flag. That's not beautiful, but it's less churn that the 'bool final' > > argument we discussed before, and could be removed in master when we > > have a better way. > > > > Right, that will be lesser code churn and it can also work. However, > one thing that needs some thought is till now es_top_eflags is only > set in ExecutorStart and same is mentioned in comments where it is > declared and it seems we are going to change that with this idea. How > about having a separate function ExecBlahShutdown which will clean up > resources as parallel context and can be called only from ExecutePlan > where we are calling ExecShutdownNode? I think both these and the > other solution we have discussed are on similar lines and another idea > could be to relax the assert which again is not a superb idea. > It seems we don't have a clear preference for any particular solution among these and neither there appears to be any better idea. I guess we can wait for a few days to see if Robert has any views on this, otherwise, pick one of the above and move ahead. Robert, let us know if you have any preference or better idea to fix this problem? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 26, 2019 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Right, that will be lesser code churn and it can also work. However, > > one thing that needs some thought is till now es_top_eflags is only > > set in ExecutorStart and same is mentioned in comments where it is > > declared and it seems we are going to change that with this idea. How > > about having a separate function ExecBlahShutdown which will clean up > > resources as parallel context and can be called only from ExecutePlan > > where we are calling ExecShutdownNode? I think both these and the > > other solution we have discussed are on similar lines and another idea > > could be to relax the assert which again is not a superb idea. > > It seems we don't have a clear preference for any particular solution > among these and neither there appears to be any better idea. I guess > we can wait for a few days to see if Robert has any views on this, > otherwise, pick one of the above and move ahead. I take the EXEC_FLAG_DONE idea back. It's ugly and too hard to verify that every appropriate path sets it, and a flag that means the opposite would be even more of a kluge, and generally I think I was looking at this too myopically: I was looking for a way to shut down processes ASAP without giving up the shared memory we'll need for rescanning, but what I should have been looking at is the reason you did that in the first place: to get the instrumentation data. Can you explain why it's necessary to do that explicitly for Limit? Wouldn't the right place to collect instrumentation be at the end of execution when Shutdown will run in all cases anyway (and possibly also during ExecParallelReinitialize() or something like that if it's being clobbered by rescans, I didn't check)? What's special about Limit? Today while poking at this and trying to answer those questions for myself, I realised that the repro I posted earlier[1] crashes exactly as Jerry reported on REL9_6_STABLE, but in later release branches it runs to completion. That's because the crashing code was removed in commit 41b0dd98 "Separate reinitialization of shared parallel-scan state from ExecReScan.". So newer branches get past that problem, but they all spit out tons of each of these three warnings: WARNING: buffer refcount leak: [172] (rel=base/12558/16390, blockNum=5, flags=0x93800000, refcount=1 2998) ... WARNING: relcache reference leak: relation "join_bar" not closed ... WARNING: Snapshot reference leak: Snapshot 0x7ff20383bfb0 still referenced ... Oops. I don't know exactly why yet, but the problem goes away if you just comment out the offending ExecShutdownNode() call in nodeLimit.c. I tried to understand whether the buffer stats were wrong with that code commented out (Adrien Nayrat's original complaint[2]), but I ran out of time for debugging adventures today. [1] https://www.postgresql.org/message-id/CA%2BhUKGJyqDp9FZSHLTjiNMcz-c6%3DRdStB%2BUjVZsR8wfHnJXy8Q%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/86137f17-1dfb-42f9-7421-82fd786b04a1%40anayrat.info -- Thomas Munro https://enterprisedb.com
On Sat, Jul 27, 2019 at 8:29 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Jul 26, 2019 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Right, that will be lesser code churn and it can also work. However, > > > one thing that needs some thought is till now es_top_eflags is only > > > set in ExecutorStart and same is mentioned in comments where it is > > > declared and it seems we are going to change that with this idea. How > > > about having a separate function ExecBlahShutdown which will clean up > > > resources as parallel context and can be called only from ExecutePlan > > > where we are calling ExecShutdownNode? I think both these and the > > > other solution we have discussed are on similar lines and another idea > > > could be to relax the assert which again is not a superb idea. > > > > It seems we don't have a clear preference for any particular solution > > among these and neither there appears to be any better idea. I guess > > we can wait for a few days to see if Robert has any views on this, > > otherwise, pick one of the above and move ahead. > > I take the EXEC_FLAG_DONE idea back. It's ugly and too hard to verify > that every appropriate path sets it, and a flag that means the > opposite would be even more of a kluge, and generally I think I was > looking at this too myopically: I was looking for a way to shut down > processes ASAP without giving up the shared memory we'll need for > rescanning, but what I should have been looking at is the reason you > did that in the first place: to get the instrumentation data. Can you > explain why it's necessary to do that explicitly for Limit? Wouldn't > the right place to collect instrumentation be at the end of execution > when Shutdown will run in all cases anyway (and possibly also during > ExecParallelReinitialize() or something like that if it's being > clobbered by rescans, I didn't check)? What's special about Limit? > I think here you are missing the point that to collect the instrumentation information one also need to use InstrStartNode and InstrStopNode. So, for the Limit node, the InstrStopNode would be already done by the time we call shutdown of workers at the end of execution. To know a bit more details, see [1][2][3]. > Today while poking at this and trying to answer those questions for > myself, I realised that the repro I posted earlier[1] crashes exactly > as Jerry reported on REL9_6_STABLE, but in later release branches it > runs to completion. That's because the crashing code was removed in > commit 41b0dd98 "Separate reinitialization of shared parallel-scan > state from ExecReScan.". > > So newer branches get past that problem, but they all spit out tons of > each of these three warnings: > > WARNING: buffer refcount leak: [172] (rel=base/12558/16390, > blockNum=5, flags=0x93800000, refcount=1 2998) > ... > WARNING: relcache reference leak: relation "join_bar" not closed > ... > WARNING: Snapshot reference leak: Snapshot 0x7ff20383bfb0 still referenced > ... > > Oops. > This is exactly due to the same problem that before rescans, we have destroyed the shared memory. If you do the earlier trick of not cleaning up shared memory till ExecEndNode, then you won't see this problem. 
[1] - https://www.postgresql.org/message-id/CAA4eK1KZEbYKj9HHP-6WqqjAXuoB%2BWJu-w1s9uovj%3DeeBxC48Q%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CA%2BTgmoY3kcTcc5bFCZeY5NMFna-xaMPuTHA-z-z2Bmfg%2Bdb-XQ%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAA4eK1L0KAZWgnRJz%3DVNVpyS3FFbVh8E5egyziaR0E10bC204Q%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 18, 2019 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think this is going in the wrong direction.  Nodes should *always*
> assume that a rescan is possible until ExecEndNode is called.
> If you want to do otherwise, you are going to be inventing a whole
> bunch of complicated and doubtless-initially-buggy control logic
> to pass down information about whether a rescan might be possible.
> That doesn't sound like a recipe for a back-patchable fix.  Perhaps
> we could consider redesigning the rules around REWIND in a future
> version, but that's not where to focus the bug fix effort.

So, if I can summarize how we got here, as best I understand it:

0. The historic behavior of the executor is to assume it's OK to leak resources for the lifetime of the query. Nodes that are executed to completion generally do some cleanup, but we feel free (as under Limit) to just stop executing a node without giving it any hint that it should release resources. So a Sort may hold onto a terabyte of memory and an index scan may keep holding a pin even after there's no theoretical way of ever needing those resources again, and we just don't care.

1. Parallel query made that perhaps-already-shaky assumption a lot more problematic. Partly that's because workers are a more scarce and considerably heavier resource than anything else, and moreover act as a container for anything else, so whatever you were leaking before, you can now leak N times more of it, plus N processes, until the end of the query. However, there's a correctness reason too, which is that when a node has a copy in the leader and a copy in every worker, each copy has its own instrumentation data (startup time, run time, nloops, etc) and we can only fold all that together once the node is done executing, because it's really hard to add up a bunch of numbers before the numbers are done changing. We could've made the instrumentation shared throughout, but if we had, we could have contention for updating the instrumentation data, which seems like it'd be bad.

2. To fix that correctness problem, we decided to try to shut down the node under a limit node when we're done with it (commit 85c9d3475e4f680dbca7c04fe096af018f3b8760). At a certain level, this looks fundamentally necessary to me. If you're going to have N separate copies of the instrumentation, and you want to add them up when you're done, then you have to decide to be done at some point; otherwise you don't know when to add them up, and maybe won't add them up at all, and then you'll be sad. This does not mean that the exact timing couldn't be changed somehow, but if you want a correct implementation, you have to shut down Limit's sub-node after you're done executing it (so that you can get the instrumentation data from the workers after it's final) and before you start destroying DSM segments and stuff (so that you get the instrumentation data from the workers before it vanishes).

3. The aforementioned commit turned out to be buggy in at least two ways, precisely because it didn't do a good enough job predicting when the Limit needed to be shut down. First, there was commit 2cd0acfdade82f3cab362fd9129d453f81cc2745, where we missed the fact that you could hit the Limit and then back up. Second, there's the present issue, where the Limit gets rescanned.
So, given all that, if we want to adopt Tom's position that we should always cater to a possible rescan, then we're going to have to rethink the way that instrumentation data gets consolidated from workers into the leader in such a way that we can consolidate multiple times without ending up with the wrong answer. The other option is to do what I understand Amit and Thomas to be proposing, which is to do a better job identifying the case where we're "done for good" and can trigger the shutdown fearlessly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 31, 2019 at 12:05 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Jul 18, 2019 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I think this is going in the wrong direction. Nodes should *always* > > assume that a rescan is possible until ExecEndNode is called. > > If you want to do otherwise, you are going to be inventing a whole > > bunch of complicated and doubtless-initially-buggy control logic > > to pass down information about whether a rescan might be possible. > > That doesn't sound like a recipe for a back-patchable fix. Perhaps > > we could consider redesigning the rules around REWIND in a future > > version, but that's not where to focus the bug fix effort. > > So, if I can summarize how we got here, as best I understand it: > Thanks for the summarization. This looks mostly correct to me. > 0. The historic behavior of the executor is to assume it's OK to leak > resources for the lifetime of the query. Nodes that are executed to > completion generally do some cleanup, but we feel free (as under > Limit) to just stop executing a node without giving it any hint that > it should release resources. So a Sort may hold onto a terabyte of > memory and an index scan may keep holding a pin even after there's no > theoretical way of ever needing those resources again, and we just > don't care. > > 1. Parallel query made that perhaps-already-shaky assumption a lot > more problematic. Partly that's because workers are a a more scarce > and considerably heavier resource than anything else, and moreover act > as a container for anything else, so whatever you were leaking before, > you can now leak N times more of it, plus N processes, until the end > of the query. However, there's a correctness reason too, which is that > when a node has a copy in the leader and a copy in every worker, each > copy has its own instrumentation data (startup time, run time, nloops, > etc) and we can only fold all that together once the node is done > executing, because it's really hard to add up a bunch of numbers > before the numbers are done changing. We could've made the > instrumentation shared throughout, but if we had, we could have > contention for updating the instrumentation data, which seems like > it'd be bad. > > 2. To fix that correctness problem, we decided to try to shut down the > node under a limit node when we're done with it (commit > 85c9d3475e4f680dbca7c04fe096af018f3b8760). At a certain level, this > looks fundamentally necessary to me. If you're going to have N > separate copies of the instrumentation, and you want to add them up > when you're done, then you have to decide to be done at some point; > otherwise you don't know when to add them up, and maybe won't add them > up at all, and then you'll be sad. This does not mean that the exact > timing couldn't be changed somehow, but if you want a correct > implementation, you have to shut down Limit's sub-node after you're > done executing it (so that you can get the instrumentation data from > the workers after it's final) and before you start destroying DSM > segments and stuff (so that you get the instrumentation data from the > workers before it vanishes). > > 3. The aforementioned commit turned out to be buggy in at least to two > ways, precisely because it didn't do a good enough job predicting when > the Limit needed to be shut down. First, there was commit > 2cd0acfdade82f3cab362fd9129d453f81cc2745, where we missed the fact > that you could hit the Limit and then back up. 
>

We have not missed it; rather, we decided to handle it separately because it appears to impact some different cases as well [1][2].

> Second, there's the
> present issue, where the Limit gets rescanned.
>
> So, given all that, if we want to adopt Tom's position that we should
> always cater to a possible rescan, then we're going to have to rethink
> the way that instrumentation data gets consolidated from workers into
> the leader in such a way that we can consolidate multiple times
> without ending up with the wrong answer.
>

The other idea we had discussed which comes closer to adopting Tom's position was that during ExecShutdownNode, we just destroy the parallel workers, collect the instrumentation data, and don't destroy the parallel context. The parallel context could then be destroyed in the ExecEndNode (ExecEndGather(Merge)) code path. The problem with this idea is that ExitParallelMode doesn't expect a parallel context to still be active. Now, we can either change the location of Exit/EnterParallelMode or relax that restriction. As mentioned above, that restriction appears good to me, so I am not in favor of changing it unless we have some other solid way to enforce it. I am not sure if this idea is better than the other approaches we are discussing.

> The other option is to do
> what I understand Amit and Thomas to be proposing, which is to do a
> better job identifying the case where we're "done for good" and can
> trigger the shutdown fearlessly.
>

Yes, this sounds like a safe fix for the back-branches. We might want to go with this for the back-branches and then see if we can come up with a better way to fix it for HEAD.

[1] - https://www.postgresql.org/message-id/CA%2BTgmoYAxqmE13UOOSU%3DmE-hBGnTfYakb3dOoOJ_043Oc%3D6Xug%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1KwCx9qQk%3DKo4LFTwoYg9B8TSccPAc%3DEoJR88rQpCYVdA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas <robertmhaas@gmail.com> wrote: >> The other option is to do >> what I understand Amit and Thomas to be proposing, which is to do a >> better job identifying the case where we're "done for good" and can >> trigger the shutdown fearlessly. > Yes, this sounds safe fix for back-branches. Actually, my point was exactly that I *didn't* think that would be a safe fix for the back branches --- at least, not unless you're okay with a very conservative and hence resource-leaky method for deciding when it's safe to shut down sub-nodes. We could do something involving (probably) adding new eflags bits to pass this sort of info down to child plan nodes. But that's going to require design and coding, and it will not be backwards compatible. At least not from the point of view of any extension that's doing anything in that area. regards, tom lane
On Wed, Jul 31, 2019 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Jul 18, 2019 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > I think this is going in the wrong direction. Nodes should *always* > > > assume that a rescan is possible until ExecEndNode is called. > > > If you want to do otherwise, you are going to be inventing a whole > > > bunch of complicated and doubtless-initially-buggy control logic > > > to pass down information about whether a rescan might be possible. > > > That doesn't sound like a recipe for a back-patchable fix. Perhaps > > > we could consider redesigning the rules around REWIND in a future > > > version, but that's not where to focus the bug fix effort. > > > > So, if I can summarize how we got here, as best I understand it: > > > > Thanks for the summarization. This looks mostly correct to me. > > > 0. The historic behavior of the executor is to assume it's OK to leak > > resources for the lifetime of the query. Nodes that are executed to > > completion generally do some cleanup, but we feel free (as under > > Limit) to just stop executing a node without giving it any hint that > > it should release resources. So a Sort may hold onto a terabyte of > > memory and an index scan may keep holding a pin even after there's no > > theoretical way of ever needing those resources again, and we just > > don't care. > > > > 1. Parallel query made that perhaps-already-shaky assumption a lot > > more problematic. Partly that's because workers are a a more scarce > > and considerably heavier resource than anything else, and moreover act > > as a container for anything else, so whatever you were leaking before, > > you can now leak N times more of it, plus N processes, until the end > > of the query. However, there's a correctness reason too, which is that > > when a node has a copy in the leader and a copy in every worker, each > > copy has its own instrumentation data (startup time, run time, nloops, > > etc) and we can only fold all that together once the node is done > > executing, because it's really hard to add up a bunch of numbers > > before the numbers are done changing. We could've made the > > instrumentation shared throughout, but if we had, we could have > > contention for updating the instrumentation data, which seems like > > it'd be bad. > > > > 2. To fix that correctness problem, we decided to try to shut down the > > node under a limit node when we're done with it (commit > > 85c9d3475e4f680dbca7c04fe096af018f3b8760). At a certain level, this > > looks fundamentally necessary to me. If you're going to have N > > separate copies of the instrumentation, and you want to add them up > > when you're done, then you have to decide to be done at some point; > > otherwise you don't know when to add them up, and maybe won't add them > > up at all, and then you'll be sad. This does not mean that the exact > > timing couldn't be changed somehow, but if you want a correct > > implementation, you have to shut down Limit's sub-node after you're > > done executing it (so that you can get the instrumentation data from > > the workers after it's final) and before you start destroying DSM > > segments and stuff (so that you get the instrumentation data from the > > workers before it vanishes). > > > > 3. 
The aforementioned commit turned out to be buggy in at least to two > > ways, precisely because it didn't do a good enough job predicting when > > the Limit needed to be shut down. First, there was commit > > 2cd0acfdade82f3cab362fd9129d453f81cc2745, where we missed the fact > > that you could hit the Limit and then back up. > > > > We have not missed it, rather we decided to it separately because it > appears to impact some different cases as well [1][2]. > > > Second, there's the > > present issue, where the Limit gets rescanned. > > > > So, given all that, if we want to adopt Tom's position that we should > > always cater to a possible rescan, then we're going to have to rethink > > the way that instrumentation data gets consolidated from workers into > > the leader in such a way that we can consolidate multiple times > > without ending up with the wrong answer. > > > > The other idea we had discussed which comes closer to adopting Tom's > position was that during ExecShutdownNode, we just destroy parallel > workers, collect instrumentation data and don't destroy the parallel > context. The parallel context could be destroyed in ExecEndNode > (ExecEndGather(Merge)) code path. The problem with this idea is that > ExitParallelMode doesn't expect parallel context to be active. Now, > we can either change the location of Exit/EnterParallelMode or relax > that restriction. As mentioned above that restriction appears good to > me, so I am not in favor of changing it unless we have some other > solid way to install it. I am not sure if this idea is better than > other approaches we are discussing. > > I have made a patch based on the above lines. I have tested the scenarios which Thomas had shared in the earlier mail and few more tests based on Thomas's tests. I'm not sure if we will be going ahead with this solution or not. Let me know your opinion on the same. If you feel this approach is ok, we can add few of this tests into pg tests. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 7, 2019 at 3:15 PM vignesh C <vignesh21@gmail.com> wrote: > > On Wed, Jul 31, 2019 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > The other idea we had discussed which comes closer to adopting Tom's > > position was that during ExecShutdownNode, we just destroy parallel > > workers, collect instrumentation data and don't destroy the parallel > > context. The parallel context could be destroyed in ExecEndNode > > (ExecEndGather(Merge)) code path. The problem with this idea is that > > ExitParallelMode doesn't expect parallel context to be active. Now, > > we can either change the location of Exit/EnterParallelMode or relax > > that restriction. As mentioned above that restriction appears good to > > me, so I am not in favor of changing it unless we have some other > > solid way to install it. I am not sure if this idea is better than > > other approaches we are discussing. > > > > > I have made a patch based on the above lines. > I have tested the scenarios which Thomas had shared in the earlier > mail and few more tests based on Thomas's tests. > I'm not sure if we will be going ahead with this solution or not. > Let me know your opinion on the same. > If you feel this approach is ok, we can add few of this tests into pg tests. > This patch is on the lines of what I had in mind, but I see some problems in this which are explained below. The other approach to fix this was to move Enter/ExitParallelMode to the outer layer. For ex., can we enter in parallel mode during InitPlan and exit from it during ExecEndPlan? That might not be good to backpatch, but it might turn out to be more robust than the current approach. Few comments on your patch: 1. @@ -569,13 +569,6 @@ ExecParallelCleanup(ParallelExecutorInfo *pei) if (pei->instrumentation) ExecParallelRetrieveInstrumentation(pei->planstate, pei->instrumentation); - - if (pei->pcxt != NULL) - { - DestroyParallelContext(pei->pcxt); - pei->pcxt = NULL; - } - pfree(pei); } Here, you have just removed parallel context-free, but I think we can't detach from parallel context area here as well, otherwise, it will create similar problems in other cases. Note, that we create the area only in ExecInitParallelPlan and just reuse it in ExecParallelReinitialize. So, if we allow getting it destroyed in ExecParallelCleanup (which is called via ExecShutdownNode), we won't have access to it in rescan code path. IT is better if we have a test for the same as well. I think we should only retrieve the instrumentation information here. Also, if we do that, then we might also want to change function name and comments atop of this function. 2. ExecEndGather(GatherState *node) { + ParallelExecutorInfo *pei = node->pei; ExecShutdownGather(node); + + if (pei != NULL) + { + if (pei->pcxt != NULL) + { + DestroyParallelContext(pei->pcxt); + pei->pcxt = NULL; + } + + pfree(pei); + node->pei = NULL; + } I feel that it is better you move a collection of instrumentation information from ExecParallelCleanup to a separate function and then use ExecParallelCleanup here. 3. extern bool IsInParallelMode(void); +extern bool getParallelModeLevel(void); To be consistent, it better to name the function as GetParallelModeLevel. 4. 
@@ -1461,6 +1461,8 @@ ExecEndPlan(PlanState *planstate, EState *estate)
         ExecEndNode(subplanstate);
     }

+    if (estate->es_use_parallel_mode)
+        Assert (getParallelModeLevel() > 0 || !ParallelContextActive());

Add some comments here to explain this Assert. I am not sure if this is correct because it won't fail even if the parallel mode level is non-zero and there is no parallel context. At this stage, we must have exited parallel mode.

5.
explain analyze select count(*) from join_foo left join (select b1.id from join_bar b1 limit 1000) ss

All the tests in your test file use a left join to reproduce the issue, but I think it should be reproducible with an inner join as well. This comment is not to say that your test case is wrong, but I want to see if we can simplify it further.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
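To visualize review comment 2 above -- moving the instrumentation retrieval out of ExecParallelCleanup into a separate function and then letting ExecEndGather call ExecParallelCleanup -- a rough sketch could look like the following. The helper name is invented for illustration, only functions and fields already visible in the quoted diff are used, and this is a sketch of the suggestion, not the code that was eventually committed.

/*
 * Hypothetical helper (name invented): retrieve worker statistics without
 * tearing anything down.  The shutdown path would call only this, leaving
 * the parallel context alive for a possible rescan.
 */
static void
ExecParallelCollectInstrumentation(ParallelExecutorInfo *pei)
{
    if (pei->instrumentation)
        ExecParallelRetrieveInstrumentation(pei->planstate,
                                            pei->instrumentation);
}

/*
 * ExecEndGather would then perform the full cleanup exactly once, at
 * executor-end time, by calling ExecParallelCleanup() (which, under this
 * suggestion, keeps destroying the context but no longer retrieves
 * instrumentation itself) instead of open-coding DestroyParallelContext()
 * and pfree() as in the quoted patch.
 */
void
ExecEndGather(GatherState *node)
{
    ExecShutdownGather(node);

    if (node->pei != NULL)
    {
        ExecParallelCleanup(node->pei); /* destroys pcxt and frees pei */
        node->pei = NULL;
    }

    /* ... remainder of ExecEndGather unchanged and omitted here ... */
}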
On Wed, Aug 7, 2019 at 5:45 AM vignesh C <vignesh21@gmail.com> wrote: > I have made a patch based on the above lines. > I have tested the scenarios which Thomas had shared in the earlier > mail and few more tests based on Thomas's tests. > I'm not sure if we will be going ahead with this solution or not. > Let me know your opinion on the same. > If you feel this approach is ok, we can add few of this tests into pg tests.

I think this patch is bizarre:

- It introduces a new function called getParallelModeLevel(), which is capitalized differently from the other functions that do similar things, and then uses it to do the same thing that could have been done with the existing function IsInParallelMode().

- It contains an "if" statement whose only content is an Assert(). Don't write if (a) Assert(b); write Assert(!a || b).

- It contains zero lines of comment changes, which is obviously not enough for a patch that proposes to fix a very thorny issue. This failure has two parts. First, it adds no new comments to explain the bug being fixed or the theory of operation of the new code. Second, it does not even bother updating existing comments that are falsified by the patch, such as the function header comments for ExecParallelCleanup and ExecShutdownGather.

- It changes what ExecParallelCleanup does while adjusting only one of the two callers to match the behavior change. nodeGatherMerge.c manages to be completely untouched by this patch. If you change what a function does, you really need to grep for all the calls to that function and adjust all callers to match the new set of expectations.

It's a little hard to get past all of those issues and look at what the patch actually does, but I'm going to try: the theory of operation of the patch seems to be that we can skip destroying the parallel context when performing ExecParallelCleanup and in fact when exiting parallel mode, and then when we get to executor end time the context will still be there and we can fish the instrumentation out of it. But this seems problematic for several reasons. For one thing, as Amit already observed, the code currently contains an assertion which ensures that a ParallelContext can't outlive the time spent in parallel mode, and it doesn't seem desirable to relax that assertion (this patch removes it).

But beyond that, the issue here is that the Limit node is shutting down the Gather node too early, and the right fix must be to stop doing that, not to change the definition of what it means to shut down a node, as this patch does. So maybe a possible approach here - which I think is more or less what Tom is proposing - is:

1. Remove the code from ExecLimit() that calls ExecShutdownNode().
2. Adjust ExecutePlan() so that it ensures that ExecShutdownNode() gets called at the very end of execution, at least when execute_once is set, before exiting parallel mode.
3. Figure out, possibly at a later time or only in HEAD, how to make the early call to ExecShutdownNode() in ExecLimit() safe, and then put it back. I think we could do this by passing down some information indicating which nodes are potentially rescanned by other nodes higher up in the tree; there's the separate question of whether rescans can happen due to cursor operations, but the execute_once stuff can handle that aspect of it, I think.

I'm not quite sure that approach is altogether correct so I'd appreciate some analysis on that point.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
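As a tiny standalone illustration of the Assert() style point above (plain C assert() stands in for PostgreSQL's Assert() macro, and the function and parameter names are invented):

#include <assert.h>
#include <stdbool.h>

static void
check_invariant(bool in_parallel_mode, bool context_destroyed)
{
    /* Discouraged style: a branch whose entire body is an assertion. */
    if (in_parallel_mode)
        assert(context_destroyed);

    /* Preferred style: state the same implication in a single assertion. */
    assert(!in_parallel_mode || context_destroyed);
}

Both forms check the same condition; the second reads as the invariant it expresses ("being in parallel mode implies the context has been destroyed") rather than as control flow.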
On Sat, Aug 10, 2019 at 12:59 AM Robert Haas <robertmhaas@gmail.com> wrote: > But beyond that, the issue here is that the Limit node is shutting > down the Gather node too early, and the right fix must be to stop > doing that, not to change the definition of what it means to shut down > a node, as this patch does. So maybe a possible approach here - which > I think is more or less what Tom is proposing - is: > > 1. Remove the code from ExecLimit() that calls ExecShutdownNode(). > 2. Adjust ExecutePlan() so that it ensures that ExecShutdownNode() > gets called at the very end of execution, at least when execute_once > is set, before exiting parallel mode. > 3. Figure out, possibly at a later time or only in HEAD, how to make > the early call to ExecShutdownNode() in ExecLimit() safe, and then put it > back. I think we could do this by passing down some information > indicating which nodes are potentially rescanned by other nodes higher > up in the tree; there's the separate question of whether rescans can > happen due to cursor operations, but the execute_once stuff can handle > that aspect of it, I think. > > I'm not quite sure that approach is altogether correct so I'd > appreciate some analysis on that point.

I'm not sure exactly what we should do yet, but one thought I wanted to resurrect from older discussions is that we now think it was a mistake to give every Gather node its own DSM segment, having seen queries in the wild where that decision interacted badly with a large number of partitions. In 13 we should try to figure out how to have a single DSM segment allocated for all Gather[Merge] nodes in the tree (and remove the embarrassing band-aid hack in commit fd7c0fa7). That's possibly relevant because it means we'd have a ParallelContext or some new overarching object that has a lifetime that is longer than the individual Gather nodes' processes and instrumentation data. I'm not saying we need to discuss any details of this other concern now; I'm just wondering out loud if the whole problem in this thread goes away automatically when we fix it.

-- Thomas Munro https://enterprisedb.com
On 2019-Aug-12, Thomas Munro wrote: > That's possibly relevant because it means we'd have a ParallelContext > or some new overarching object that has a lifetime that is longer than > the individual Gather nodes' processes and instrumentation data. I'm > not saying we need to discuss any details of this other concern now, > I'm just wondering out loud if the whole problem in this thread goes > away automatically when we fix it. How likely is it that we would ever be able to release memory from a Sort (or, say, a hashjoin hash table) when it's done being read, but before completing the whole plan? As I understand, right now we hold onto a lot of memory after such plans have been fully read, for no good reason other than executor being unaware of this. This might not be directly related to the problem at hand, since it's not just parallel plans that are affected. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 12, 2019 at 3:07 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > How likely is it that we would ever be able to release memory from a > Sort (or, say, a hashjoin hash table) when it's done being read, but > before completing the whole plan? As I understand, right now we hold > onto a lot of memory after such plans have been fully read, for no good > reason other than executor being unaware of this. This might not be > directly related to the problem at hand, since it's not just parallel > plans that are affected. Being able to do that sort of thing was one of my goals in designing the ExecShutdownNode stuff. Unfortunately, it's clear from this bug report that it's still a few bricks short of a load, and Tom doesn't seem real optimistic about how easy it will be to buy those bricks at discount prices. But I hope we persist in trying to get there, because I don't like the idea of saying that we'll never be smart enough to know we're done with any part of the plan until we're definitely done with the whole thing. I think that's leaving too much money on the table. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 13, 2019 at 7:07 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2019-Aug-12, Thomas Munro wrote: > > That's possibly relevant because it means we'd have a ParallelContext > > or some new overarching object that has a lifetime that is longer than > > the individual Gather nodes' processes and instrumentation data. I'm > > not saying we need to discuss any details of this other concern now, > > I'm just wondering out loud if the whole problem in this thread goes > > away automatically when we fix it. > > How likely is it that we would ever be able to release memory from a > Sort (or, say, a hashjoin hash table) when it's done being read, but > before completing the whole plan? As I understand, right now we hold > onto a lot of memory after such plans have been fully read, for no good > reason other than executor being unaware of this. This might not be > directly related to the problem at hand, since it's not just parallel > plans that are affected. Right, AIUI we hold onto that memory because it's a nice optimisation to be able to rescan the sorted data or reuse the hash table (single batch, non-parallel hash joins only for now). We have no disincentive, because our memory model doesn't care about the total peak memory usage (ie all nodes). Some other RDBMSs do care about that, and somehow consider the peak memory usage (that is, considering early memory release) when comparing join orders. However, I think it's independent of the DSM lifetime question, because the main Parallel Context DSM segment is really small, it has a small fixed header and then a small object per parallel-aware node, and isn't used for holding the hash table for Parallel Hash and probably wouldn't be used for a future hypothetical Parallel Sort (if it's implemented the way I imagine at least). It contains a DSA area, which creates more DSM segments as it needs them, and nodes can opt to free DSA memory sooner, which will likely result in those extra DSM segments being freed; you can see that happening in Parallel Hash which in fact does give back memory quite eagerly. (I'm the first to admit that it's weird that DSM segments can hold DSA areas and DSA areas are made up of DSM segments; that falls out of the choice to use DSM segments both for storage and as a lifetime management system for shared resources, and I wouldn't be surprised if we reconsider that as we get more experience and ideas.) -- Thomas Munro https://enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > Being able to do that sort of thing was one of my goals in designing > the ExecShutdownNode stuff. Unfortunately, it's clear from this bug > report that it's still a few bricks short of a load, and Tom doesn't > seem real optimistic about how easy it will be to buy those bricks at > discount prices. But I hope we persist in trying to get there, because > I don't like the idea of saying that we'll never be smart enough to > know we're done with any part of the plan until we're definitely done > with the whole thing. I think that's leaving too much money on the > table. To clarify my position --- I think it's definitely possible to improve the situation a great deal. We "just" have to pass down more information about whether rescans are possible. What I don't believe is that that leads to a bug fix that would be sane to back-patch as far as 9.6. regards, tom lane
On Mon, Aug 12, 2019 at 5:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > To clarify my position --- I think it's definitely possible to improve > the situation a great deal. We "just" have to pass down more information > about whether rescans are possible. What I don't believe is that that > leads to a bug fix that would be sane to back-patch as far as 9.6. Sounds like a fair opinion. I'm not sure how complicated the fix would be so I don't know whether I agree with your opinion, but you usually have a fairly good intuition for such things, so... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Aug 13, 2019 at 3:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > Being able to do that sort of thing was one of my goals in designing > > the ExecShutdownNode stuff. Unfortunately, it's clear from this bug > > report that it's still a few bricks short of a load, and Tom doesn't > > seem real optimistic about how easy it will be to buy those bricks at > > discount prices. But I hope we persist in trying to get there, because > > I don't like the idea of saying that we'll never be smart enough to > > know we're done with any part of the plan until we're definitely done > > with the whole thing. I think that's leaving too much money on the > > table. > > To clarify my position --- I think it's definitely possible to improve > the situation a great deal. We "just" have to pass down more information > about whether rescans are possible. >

Right, you have speculated above that it is possible via adding some eflag bits. Can you please describe a bit more about that idea, so that somebody else can try to write a patch? I think if someone other than you tries to write a patch without having some sort of upfront design, it might lead to a lot of re-work. It would be great if you have an interest in doing the leg work, which can then be extended to fix the issue in the parallel query; but if not, at least let us know the idea you have in mind in a bit more detail.

> What I don't believe is that that > leads to a bug fix that would be sane to back-patch as far as 9.6. >

Fair enough. In such a situation, we can plan to revert the earlier fix for the Limit node and tell people that the same will be fixed in PG-13.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Tue, Aug 13, 2019 at 3:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> To clarify my position --- I think it's definitely possible to improve >> the situation a great deal. We "just" have to pass down more information >> about whether rescans are possible. > Right, you have speculated above that it is possible via adding some > eflag bits. Can you please describe a bit more about that idea, so > that somebody else can try to write a patch?

Well, there are two components to solving this problem:

1. What are we going to do about the executor's external API?

Right now, callers of ExecutorStart don't have to say whether they might call ExecutorRewind. We need some way for callers to make a binding promise that they won't do any such thing. Perhaps we just want to invent another flag that's like EXEC_FLAG_BACKWARD, but it's not clear how it should interact with the existing "soft" REWIND flag. Nor do I know how far up the call stack we will have to make changes to make it possible to promise as much as we can -- for instance, will we have to adapt the SPI interfaces?

2. What happens inside ExecutorStart in response to such promises?

I imagine that we translate them into additional eflags bits that get passed down to node init functions, possibly with modification (e.g., nodeNestloop.c would have to revoke the no-rescans promise to its inner input). You'd need to work out what the most convenient set of conventions is (positive or negative sense of the flag bits, etc), and go through all the non-leaf node types to determine what they can pass down.

(BTW, unless I'm missing something, there's not currently any enforcement of EXEC_FLAG_BACKWARD, ie a caller can fail to pass that and then try to back up anyway. We probably want to improve that situation, and also enforce this new flag about ExecutorRewind.)

The reason I'm dubious about back-patching this is that each of these things seems likely to affect external code. Point 1 could affect external callers of the executor, and point 2 is likely to have consequences for FDWs and custom-scan providers. Maybe we can set things up so that everything defaults in a safe direction for unchanged code, but I don't want to contort the design just to do that.

regards, tom lane
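As a standalone sketch of the flag pass-down pattern described in point 2 above: EXEC_FLAG_NO_RESCAN below is a purely hypothetical flag, not an existing PostgreSQL eflags bit, and the functions are toy stand-ins rather than real executor code.

#include <stdio.h>

/* Hypothetical promise from the caller: ExecutorRewind will never be called. */
#define EXEC_FLAG_NO_RESCAN 0x0100      /* value chosen arbitrarily for this sketch */

/* Toy stand-in for initializing the inner subtree (think ExecInitNode). */
static void
init_inner_subtree(int eflags)
{
    printf("inner subtree sees eflags = 0x%04x\n", eflags);
}

/*
 * Toy stand-in for a nestloop-like node: it rescans its inner side for
 * every outer tuple, so it must revoke the no-rescan promise before
 * passing the flags down, while leaving all other bits untouched.
 */
static void
init_nestloop_like_node(int eflags)
{
    init_inner_subtree(eflags & ~EXEC_FLAG_NO_RESCAN);
}

int
main(void)
{
    init_nestloop_like_node(EXEC_FLAG_NO_RESCAN);   /* inner side sees 0x0000 */
    return 0;
}

The same masking step, repeated at every non-leaf node that can rescan part of its subtree, is what would let leaf nodes such as a parallel seq scan know whether an early shutdown is safe.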
On Tue, Aug 13, 2019 at 9:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > On Tue, Aug 13, 2019 at 3:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> To clarify my position --- I think it's definitely possible to improve > >> the situation a great deal. We "just" have to pass down more information > >> about whether rescans are possible. > > > Right, you have speculated above that it is possible via adding some > > eflag bits. Can you please describe a bit more about that idea, so > > that somebody else can try to write a patch? > > Well, there are two components to solving this problem: > > 1. What are we going to do about the executor's external API? > > Right now, callers of ExecutorStart don't have to say whether they > might call ExecutorRewind. We need some way for callers to make a > binding promise that they won't do any such thing. Perhaps we just > want to invent another flag that's like EXEC_FLAG_BACKWARD, but it's > not clear how it should interact with the existing "soft" REWIND > flag.

Yeah, making it interact with REWIND will be a bit of a challenge, as I think the REWIND flag already indicates the same thing to some extent. Do I understand correctly that we have some form of rule such that if EXEC_FLAG_REWIND is set or the node's chgParam is NULL, then we can expect that the node can support a rescan? If that is true, then maybe we need to design this new flag in such a way that it covers the existing cases of REWIND as well.

Another point which I am wondering is why can't we use the existing REWIND flag to solve the current issue, basically if we have access to that information in nodeLimit.c (ExecLimit), then can't we just pass down that to ExecShutdownNode? I guess the problem could be that if the Limit node doesn't support REWIND, but one of the nodes beneath it does, then we won't be able to rely on the information passed to ExecShutdownNode.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > Another point which I am wondering is why can't we use the existing > REWIND flag to solve the current issue, basically if we have access to > that information in nodeLimit.c (ExecLimit), then can't we just pass > down that to ExecShutdownNode? The existing REWIND flag tells subnodes whether they should *optimize* for getting rewound or not. I don't recall right now (well past midnight) why that seemed like a useful definition, but if you grep for places that are paying attention to that flag, I'm sure you'll find out. We probably don't want to give up that distinction --- if it had been equally good to define the flag as a hard yes-or-no, I'm sure we would have taken that definition, because it's simpler. regards, tom lane
On Fri, Aug 9, 2019 at 6:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Aug 7, 2019 at 5:45 AM vignesh C <vignesh21@gmail.com> wrote: > > I have made a patch based on the above lines. > > I have tested the scenarios which Thomas had shared in the earlier > > mail and few more tests based on Thomas's tests. > > I'm not sure if we will be going ahead with this solution or not. > > Let me know your opinion on the same. > > If you feel this approach is ok, we can add few of this tests into pg tests. > > I think this patch is bizarre: > > - It introduces a new function called getParallelModeLevel(), which is > randomly capitalized different from the other functions that do > similar things, and then uses it to do the same thing that could have > been done with the existing function IsInParallelMode(). > - It contains an "if" statement whose only content is an Assert(). > Don't write if (a) Assert(b); write Assert(!a || b). > - It contains zero lines of comment changes, which is obviously not > enough for a patch that proposes to fix a very thorny issue. This > failure has two parts. First, it adds no new comments to explain the > bug being fixed or the theory of operation of the new code. Second, it > does not even bother updating existing comments that are falsified by > the patch, such as the function header comments for > ExecParallelCleanup and ExecShutdownGather. > - It changes what ExecParallelCleanup does while adjusting only one of > the two callers to match the behavior change. nodeGatherMerge.c > manages to be completed untouched by this patch. If you change what a > function does, you really need to grep for all the calls to that > function and adjust all callers to match the new set of expectations. > > It's a little hard to get past all of those issues and look at what > the patch actually does, but I'm going to try: the theory of operation > of the patch seems to be that we can skip destroying the parallel > context when performing ExecParallelCleanup and in fact when exiting > parallel mode, and then when we get to executor end time the context > will still be there and we can fish the instrumentation out of it. But > this seems problematic for several reasons. For one thing, as Amit > already observed, the code currently contains an assertion which > ensure that a ParallelContext can't outlive the time spent in parallel > mode, and it doesn't seem desirable to relax that assertion (this > patch removes it). > > But beyond that, the issue here is that the Limit node is shutting > down the Gather node too early, and the right fix must be to stop > doing that, not to change the definition of what it means to shut down > a node, as this patch does. So maybe a possible approach here - which > I think is more or less what Tom is proposing - is: > > 1. Remove the code from ExecLimit() that calls ExecShutdownNode(). > Attached patch does that. I have also added one test as a separate patch so that later if we introduce shutting down resources in Limit node, we don't break anything. As of now, I have kept it separate for easy verification, but if we decide to go with this approach and test appears fine, we can merge it along with the fix. > 2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode() > gets called at the very end of execution, at least when execute_once > is set, before exiting parallel mode. > I am not sure if I completely understand this point. AFAICS, the ExecuteShutdownNode is called whenever we are done getting the tuples. 
One place where it is not there in that function is when we assume the destination is closed, basically the below code:

ExecutePlan()
{
..
    if (!dest->receiveSlot(slot, dest))
        break;
..
}

Do you expect this case to be dealt with as well, or do you have something else in mind? The other possibility could be that we move the shutdown of the node to the end of the function, where we exit parallel mode, but doing that led to some regression failures on my machine. I will investigate the same.

> 3. Figure out, possibly at a later time or only in HEAD, how to make > the early call to ExecLimit() in ExecShutdownNode(), and then put it > back.

Okay, Tom has suggested a design to address this, but that will be for HEAD only. To be clear, I am not planning to spend time on that at this moment, but OTOH, I want the bug reported in this thread to be closed, so for now, we need to proceed with some minimum fix as mentioned by you in the above two points. If someone else can write a patch, I can help in the review of the same.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
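To visualize what point 2 of the proposed approach might look like structurally, here is a heavily condensed sketch of the tail of an ExecutePlan()-like loop. This is not the real ExecutePlan(): most of the real function is omitted, the variable names are borrowed from the discussion above, and the exact placement of the shutdown call is precisely what is being debated.

    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(planstate);

        if (TupIsNull(slot))
            break;              /* the plan is exhausted */

        if (!dest->receiveSlot(slot, dest))
            break;              /* the receiver wants no more tuples */

        if (numberTuples && ++current_tuple_count >= numberTuples)
            break;              /* caller-imposed row cap reached */
    }

    /*
     * Idea from point 2: when the caller promised a single execution
     * (execute_once), shut the whole tree down here -- after every way of
     * leaving the loop -- so that worker instrumentation is accumulated in
     * the leader before parallel mode is exited.
     */
    if (execute_once)
        (void) ExecShutdownNode(planstate);

    if (use_parallel_mode)
        ExitParallelMode();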
On Mon, Sep 2, 2019 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Aug 9, 2019 at 6:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > But beyond that, the issue here is that the Limit node is shutting > > down the Gather node too early, and the right fix must be to stop > > doing that, not to change the definition of what it means to shut down > > a node, as this patch does. So maybe a possible approach here - which > > I think is more or less what Tom is proposing - is: > > > > 1. Remove the code from ExecLimit() that calls ExecShutdownNode(). > > > > Attached patch does that. I have also added one test as a separate > patch so that later if we introduce shutting down resources in Limit > node, we don't break anything. As of now, I have kept it separate for > easy verification, but if we decide to go with this approach and test > appears fine, we can merge it along with the fix. > I have merged the code change and test case patch as I felt that it is good to cover this case. I have slightly changed the test case to make its output predictable (made the inner scan ordered so that the query always produces the same result). One more thing I am not able to come up with some predictable test case for 9.6 branches as it doesn't support Gather Merge which is required for this particular test to always produce predictable output. There could be some better way to write this test, so any input in that regards or otherwise is welcome. So, if we commit this patch the containing test case will be for branches HEAD~10, but the code will be for HEAD~9.6. > > 2. Adjust ExecutePlan() so that it ensures that ExecuteShutdownNode() > > gets called at the very end of execution, at least when execute_once > > is set, before exiting parallel mode. > > > > I am not sure if I completely understand this point. AFAICS, the > ExecuteShutdownNode is called whenever we are done getting the tuples. > One place where it is not there in that function is when we assume > destination is closed, basically below code: > ExecutePlan() > { > .. > if (!dest->receiveSlot(slot, dest)) > break; > .. > } > > Do you expect this case to be also dealt or you have something else in > mind? > It still appears problematic, but I couldn't come up with a test case to reproduce the problem. I'll try some more on this, but I think this anyway can be done separately once we have a test to show the problem. > The other possibility could be that we move the shutdown of the > node at the end of the function when we exit parallel mode but doing > that lead to some regression failure on my machine. I will > investigate the same. > This was failing because use_parallel_mode flag in function ExecutePlan() won't be set for workers and hence they won't get a chance to accumulate its stats. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Sep 5, 2019 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Sep 2, 2019 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Aug 9, 2019 at 6:29 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > > > > But beyond that, the issue here is that the Limit node is shutting > > > down the Gather node too early, and the right fix must be to stop > > > doing that, not to change the definition of what it means to shut down > > > a node, as this patch does. So maybe a possible approach here - which > > > I think is more or less what Tom is proposing - is: > > > > > > 1. Remove the code from ExecLimit() that calls ExecShutdownNode(). > > > > > > > Attached patch does that. I have also added one test as a separate > > patch so that later if we introduce shutting down resources in Limit > > node, we don't break anything. As of now, I have kept it separate for > > easy verification, but if we decide to go with this approach and test > > appears fine, we can merge it along with the fix. > > > > I have merged the code change and test case patch as I felt that it is > good to cover this case. I have slightly changed the test case to > make its output predictable (made the inner scan ordered so that the > query always produces the same result). One more thing I am not able > to come up with some predictable test case for 9.6 branches as it > doesn't support Gather Merge which is required for this particular > test to always produce predictable output. There could be some better > way to write this test, so any input in that regards or otherwise is > welcome. So, if we commit this patch the containing test case will be > for branches HEAD~10, but the code will be for HEAD~9.6. > Robert, Thomas, do you have any more suggestions related to this. I am planning to commit the above-discussed patch (Forbid Limit node to shutdown resources.) coming Monday, so that at least the reported problem got fixed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > Robert, Thomas, do you have any more suggestions related to this. I > am planning to commit the above-discussed patch (Forbid Limit node to > shutdown resources.) coming Monday, so that at least the reported > problem got fixed. I think that your commit message isn't very clear about what the actual issue is. And the patch itself doesn't add any comments or anything to try to clear it up. So I wouldn't favor committing it in this form. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Sep 13, 2019 at 1:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Robert, Thomas, do you have any more suggestions related to this. I > > am planning to commit the above-discussed patch (Forbid Limit node to > > shutdown resources.) coming Monday, so that at least the reported > > problem got fixed. > > I think that your commit message isn't very clear about what the > actual issue is. And the patch itself doesn't add any comments or > anything to try to clear it up. So I wouldn't favor committing it in > this form.

Is the proposed commit message at the bottom of this email an improvement?

Do I understand correctly that, with this patch, we can only actually lose statistics in the case where we rescan? That is, precisely the case that crashes (9.6) or spews warnings (10+)? In a quick non-rescan test with the ExecShutdownNode() removed, I don't see any problem with the buffer numbers on my screen:

postgres=# explain (analyze, buffers, timing off, costs off) select count(*) from t limit 50000;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit (actual rows=1 loops=1)
   Buffers: shared hit=16210 read=28038
   ->  Finalize Aggregate (actual rows=1 loops=1)
         Buffers: shared hit=16210 read=28038
         ->  Gather (actual rows=3 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               Buffers: shared hit=16210 read=28038
               ->  Partial Aggregate (actual rows=1 loops=3)
                     Buffers: shared hit=16210 read=28038
                     ->  Parallel Seq Scan on t (actual rows=3333333 loops=3)
                           Buffers: shared hit=16210 read=28038
 Planning Time: 0.086 ms
 Execution Time: 436.669 ms
(14 rows)

===
Don't shut down Gather[Merge] early under Limit.

Revert part of commit 19df1702f5.

Early shutdown was added by that commit so that we could collect statistics from workers, but unfortunately it interacted badly with rescans. Rescanning a Limit over a Gather node could produce a SEGV on 9.6 and resource leak warnings on later releases. By reverting the early shutdown code, we might lose statistics in some cases of Limit over Gather, but that will require further study to fix.

Author: Amit Kapila, testcase by Vignesh C
Reported-by: Jerry Sievers
Diagnosed-by: Thomas Munro
Backpatch-through: 9.6
Discussion: https://postgr.es/m/87ims2amh6.fsf@jsievers.enova.com
===
On Thu, Oct 17, 2019 at 10:51 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Sep 13, 2019 at 1:35 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 12, 2019 at 8:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Robert, Thomas, do you have any more suggestions related to this. I > > > am planning to commit the above-discussed patch (Forbid Limit node to > > > shutdown resources.) coming Monday, so that at least the reported > > > problem got fixed. > > > > I think that your commit message isn't very clear about what the > > actual issue is. And the patch itself doesn't add any comments or > > anything to try to clear it up. So I wouldn't favor committing it in > > this form. > > Is the proposed commit message at the bottom of this email an improvement? > > Do I understand correctly that, with this patch, we can only actually > lose statistics in the case where we rescan? > No, it can lose statistics without a rescan as well. To understand this in detail, you might want to read the emails I pointed to in one of the emails above [1] in this thread. > That is, precisely the > case that crashes (9.6) or spews warnings (10+)? In a quick > non-rescan test with the ExecShutdownNode() removed, I don't see any > problem with the buffer numbers on my screen: > Try removing the aggregate function. Basically, the Limit node has to finish before consuming all the rows sent by a parallel node beneath it. > > === > Don't shut down Gather[Merge] early under Limit. > > Revert part of commit 19df1702f5. > > Early shutdown was added by that commit so that we could collect > statistics from workers, but unfortunately it interacted badly with > rescans. Rescanning a Limit over a Gather node could produce a SEGV > on 9.6 and resource leak warnings on later releases. By reverting the > early shutdown code, we might lose statistics in some cases of Limit > over Gather, but that will require further study to fix. > How about some text like below? I have added slightly different text to explain the reason for the problem. "Early shutdown was added by that commit so that we could collect statistics from workers, but unfortunately, it interacted badly with rescans. The problem is that we ended up destroying the parallel context which is required for rescans. This leads to rescans of a Limit node over a Gather node to produce unpredictable results as it tries to access destroyed parallel context. By reverting the early shutdown code, we might lose statistics in some cases of Limit over Gather, but that will require further study to fix." I am not sure but we can even add a comment in the place where we are removing some code (in nodeLimit.c) to indicate that 'Ideally we should shutdown parallel resources here to get the correct stats, but that would lead to rescans misbehaving when there is a Gather [Merge] node beneath it. (Explain the reason for misbehavior and the ideas we discussed in this thread to fix the same) ........." I can try to come up with comments in nodeLimit.c on the above lines if we think that is a good idea? [1] - https://www.postgresql.org/message-id/CAA4eK1Ja-eoavXcr0eq7w7hP%3D64VP49k%3DNMFxwhtK28NHfBOdA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 18, 2019 at 10:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 10:51 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > === > > Don't shut down Gather[Merge] early under Limit. > > > > Revert part of commit 19df1702f5. > > > > Early shutdown was added by that commit so that we could collect > > statistics from workers, but unfortunately it interacted badly with > > rescans. Rescanning a Limit over a Gather node could produce a SEGV > > on 9.6 and resource leak warnings on later releases. By reverting the > > early shutdown code, we might lose statistics in some cases of Limit > > over Gather, but that will require further study to fix. > > > > How about some text like below? I have added slightly different text > to explain the reason for the problem. > > "Early shutdown was added by that commit so that we could collect > statistics from workers, but unfortunately, it interacted badly with > rescans. The problem is that we ended up destroying the parallel > context which is required for rescans. This leads to rescans of a > Limit node over a Gather node to produce unpredictable results as it > tries to access destroyed parallel context. By reverting the early > shutdown code, we might lose statistics in some cases of Limit over > Gather, but that will require further study to fix." > > I am not sure but we can even add a comment in the place where we are > removing some code (in nodeLimit.c) to indicate that 'Ideally we > should shutdown parallel resources here to get the correct stats, but > that would lead to rescans misbehaving when there is a Gather [Merge] > node beneath it. (Explain the reason for misbehavior and the ideas we > discussed in this thread to fix the same) ........." > > I can try to come up with comments in nodeLimit.c on the above lines > if we think that is a good idea? > I have modified the commit message as proposed above and additionally added comments in nodeLimit.c. I think we should move ahead with this bug-fix patch. If we don't like the comment, it can anyway be improved later. Any suggestions? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
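For readers following along, the nodeLimit.c comment under discussion would sit at the point in ExecLimit() where the early ExecShutdownNode() call used to be. One possible wording is sketched below, purely as an illustration and not necessarily the text that was committed:

/*
 * At this point we know the subplan will not return any more tuples to
 * this node, so ideally we would call ExecShutdownNode() on it to stop
 * parallel workers and accumulate their instrumentation into the leader.
 * However, doing so destroys the parallel context, and if a node above
 * us rescans this Limit (for example, when it sits on the inner side of
 * a join), the rescan would then touch already-destroyed parallel state.
 * Until the executor can promise that no rescan is possible, leave the
 * subplan alone and rely on normal executor shutdown to collect the
 * statistics.
 */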
On Mon, Nov 18, 2019 at 2:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I have modified the commit message as proposed above and additionally > added comments in nodeLimit.c. I think we should move ahead with this > bug-fix patch. If we don't like the comment, it can anyway be > improved later. > > Any suggestions? > If there are no further suggestions or objections, I will commit this early next week, probably on Monday. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 20, 2019 at 5:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 18, 2019 at 2:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I have modified the commit message as proposed above and additionally > > added comments in nodeLimit.c. I think we should move ahead with this > > bug-fix patch. If we don't like the comment, it can anyway be > > improved later. > > > > Any suggestions? > > > > If there are no further suggestions or objections, I will commit this > early next week, probably on Monday. > Yesterday, I pushed this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com