Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free - Mailing list pgsql-bugs

From Chris Hofstaedtler
Subject Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free
Date
Msg-id adTAq--RZYaHjxP9@zeha.at
Whole thread Raw
In response to Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free
List pgsql-bugs
* Thomas Munro <thomas.munro@gmail.com> [260328 08:03]:
>On Thu, Feb 26, 2026 at 11:28 AM PG Bug reporting form
><noreply@postgresql.org> wrote:
>> #0  0x0000564830c3c9d1 in dsa_free (area=0x5648621a9280, dp=<optimized out>)
>> at ./build/../src/backend/utils/mmgr/dsa.c:845
>> 845     ./build/../src/backend/utils/mmgr/dsa.c: No such file or directory.
>> (gdb) bt full
>> #0  0x0000564830c3c9d1 in dsa_free (area=0x5648621a9280, dp=<optimized out>)
>> at ./build/../src/backend/utils/mmgr/dsa.c:845
>>         segment_map = 0x5648621a92b8
>>         pageno = 0
>>         span_pointer = 0
>>         span = <optimized out>
>>         superblock = <optimized out>
>>         object = <optimized out>
>>         size = <optimized out>
>>         size_class = <optimized out>
>> #1  0x0000564830917ae1 in ExecHashTableDetach (hashtable=0x564862147fa0) at
>> ./build/../src/backend/executor/nodeHash.c:3439
>>         i = <optimized out>
>>         pstate = 0x7ef9970b5040
>
>Hmm, it's not supposed to be possible for two backends to reach the
>dsa_free() at nodeHash.c:3439.  But it does smell a bit like that
>that's what happened... hmmm. We were in _RUN phase, and then this
>barrier should only let one process through to _FREE.  I'd be
>interested in the contents of pstate->build_barrier in frame 1 if you
>get another core file with the same stack in it.

Sorry for the lack of followup. Currently we are thinking our 
problem matches the bug described in https://postgr.es/m/044401dcabac$fe432490$fac96db0$@icloud.com
("[BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment 
odd-sized path").

>Can you share the
>outline of the query plan?   Does it happen to have a full/right PHJ
>in it?

It appears we forgot to capture more of the logs (before they expired).
Here is a plan from a smaller machine carrying a similar dataset -
unfortunately I cannot guarantee it's the same plan, although it seems close.
Indeed it has a "Parallel Hash Right Join" node.


  Nested Loop  (cost=13030498.63..46923224.94 rows=21144 width=36)
    ->  Nested Loop  (cost=13030498.19..46911677.41 rows=21144 width=44)
          ->  GroupAggregate  (cost=13030497.75..46808339.62 rows=54669 width=2120)
                Group Key: subscriber.id
                Filter: (last_agg(nsr.voicemail_present ORDER BY nsr.sort) = 'true'::text)
                ->  Gather Merge  (cost=13030497.75..19559670.85 rows=53993659 width=23)
                      Workers Planned: 5
                      ->  Sort  (cost=13029497.68..13056494.51 rows=10798732 width=23)
                            Sort Key: subscriber.id, nsr.sort
                            ->  Parallel Hash Left Join  (cost=10633766.78..11767970.44 rows=10798732 width=23)
                                  Hash Cond: (subscriber.id = subscription.subscriber_id)
                                  Filter: ((nsr.postpaid IS NULL) OR (nsr.postpaid = (subscription.id IS NOT NULL)))
                                  ->  Hash Left Join  (cost=10585690.41..11619799.44 rows=11184278 width=24)
                                        Hash Cond: (subscriber.id = subscriber_options.subscriber_id)
                                        Filter: ((nsr.option_ids IS NULL) OR (nsr.option_ids <@
subscriber_options.option_ids))
                                        ->  Parallel Hash Right Join  (cost=9142123.87..10141640.84 rows=13177920
width=49)
                                              Hash Cond: ((optionvalue_lso.subscriber_id = subscriber.id) AND
(optionvalue_lso.option_id= nsr.optionvalue_option_id) AND (optionvalue_lso.value = nsr.optionvalue_value))
 
                                              Filter: ((nsr.optionvalue_option_id IS NULL) OR (optionvalue_lso.id IS
NOTNULL))
 
                                              ->  Parallel Seq Scan on lnk_subscriber_option optionvalue_lso
(cost=0.00..728660.20rows=1901347 width=21)
 
                                                    Filter: ((activationdate <= now()) AND (deletiondate > now()))
                                              ->  Parallel Hash  (cost=8769950.27..8769950.27 rows=13177920 width=62)
                                                    ->  Parallel Hash Join  (cost=2888653.49..8769950.27 rows=13177920
width=62)
                                                          Hash Cond: (lnk_subscriber_simcard.subscriber_id =
subscriber.id)
                                                          Join Filter: (((nsr.brand_id IS NULL) OR (nsr.brand_id =
subscriber.brand_id))AND ((nsr.registered IS NULL) OR (nsr.registered = subscriber.registered)) AND
((nsr.customer_category_idsIS NULL) OR (basedata.customercategory_id = ANY (nsr.customer_category_ids))) AND
((nsr.subscriberstatus_idsIS NULL) OR (lnk_subscriber_status.subscriberstatus_id = ANY (nsr.subscriberstatus_ids))))
 
                                                          ->  Nested Loop  (cost=1148075.17..6707980.64 rows=34348223
width=126)
                                                                Join Filter: ((simcard.imsi >= nsr.imsi_start) AND
(simcard.imsi<= nsr.imsi_end))
 
                                                                ->  Parallel Hash Join  (cost=1148075.17..1298129.01
rows=1807801width=12)
 
                                                                      Hash Cond: (lnk_subscriber_simcard.simcard_id =
simcard.id)
                                                                      ->  Parallel Seq Scan on lnk_subscriber_simcard
(cost=0.00..145308.36rows=1807801 width=8)
 
                                                                            Filter: ((activationdate <= now()) AND
(deletiondate> now()))
 
                                                                      ->  Parallel Hash  (cost=1118684.52..1118684.52
rows=2351252width=12)
 
                                                                            ->  Parallel Seq Scan on simcard
(cost=0.00..1118684.52rows=2351252 width=12)
 
                                                                ->  Materialize  (cost=0.00..7.56 rows=171 width=138)
                                                                      ->  Seq Scan on network_settings_rule nsr
(cost=0.00..6.71rows=171 width=138)
 
                                                          ->  Parallel Hash  (cost=1717950.74..1717950.74 rows=1810207
width=21)
                                                                ->  Parallel Hash Left Join
(cost=778201.62..1717950.74rows=1810207 width=21)
 
                                                                      Hash Cond: (subscriber.id =
lnk_subscriber_basedata.subscriber_id)
                                                                      ->  Parallel Hash Join
(cost=342951.95..1270264.89rows=1810207 width=17)
 
                                                                            Hash Cond:
(lnk_subscriber_status.subscriber_id= subscriber.id)
 
                                                                            ->  Parallel Bitmap Heap Scan on
lnk_subscriber_status (cost=102927.10..1025488.25 rows=1810207 width=8)
 
                                                                                  Recheck Cond: (deletiondate > now())
                                                                                  Filter: (activationdate <= now())
                                                                                  ->  Bitmap Index Scan on
idx_subscriber_status_deletiondate (cost=0.00..100664.35 rows=9051037 width=0)
 
                                                                                        Index Cond: (deletiondate >
now())
                                                                            ->  Parallel Hash
(cost=216653.82..216653.82rows=1869682 width=9)
 
                                                                                  ->  Parallel Seq Scan on subscriber
(cost=0.00..216653.82rows=1869682 width=9)
 
                                                                      ->  Parallel Hash  (cost=413397.43..413397.43
rows=1748179width=12) 
                                                                            ->  Parallel Hash Left Join
(cost=228746.63..413397.43rows=1748179 width=12)
 
                                                                                  Hash Cond:
(lnk_subscriber_basedata.basedata_id= basedata.id)
 
                                                                                  ->  Parallel Seq Scan on
lnk_subscriber_basedata (cost=0.00..180061.84 rows=1748179 width=12)
 
                                                                                        Filter: ((now() >=
activationdate)AND (now() < deletiondate))
 
                                                                                  ->  Parallel Hash
(cost=201487.39..201487.39rows=2180739 width=8)
 
                                                                                        ->  Parallel Seq Scan on
basedata (cost=0.00..201487.39 rows=2180739 width=8)
 
                                        ->  Hash  (cost=1434914.41..1434914.41 rows=692170 width=36)
                                              ->  Subquery Scan on subscriber_options  (cost=1345069.22..1434914.41
rows=692170width=36)
 
                                                    ->  HashAggregate  (cost=1345069.22..1427992.71 rows=692170
width=36)
                                                          Group Key: lnk_subscriber_option.subscriber_id
                                                          Planned Partitions: 8
                                                          ->  Index Scan using idx_option_subscriber_combined on
lnk_subscriber_option (cost=0.57..1033129.48 rows=9506735 width=8)
 
                                                                Index Cond: ((activationdate <= now()) AND
(deletiondate> now()))
 
                                  ->  Parallel Hash  (cost=46189.54..46189.54 rows=150946 width=8)
                                        ->  Parallel Hash Join  (cost=14649.64..46189.54 rows=150946 width=8)
                                              Hash Cond: (lnk_subscription_status.subscription_id = subscription.id)
                                              ->  Hash Join  (cost=1.27..31144.93 rows=150946 width=4)
                                                    Hash Cond: (lnk_subscription_status.subscriptionstatus_id =
subscriptionstatus.id)
                                                    ->  Parallel Seq Scan on lnk_subscription_status
(cost=0.00..30421.01rows=211324 width=8)
 
                                                          Filter: ((activationdate <= now()) AND (deletiondate >
now()))
                                                    ->  Hash  (cost=1.14..1.14 rows=10 width=4)
                                                          ->  Seq Scan on subscriptionstatus  (cost=0.00..1.14 rows=10
width=4)
                                                                Filter: activestate
                                              ->  Parallel Hash  (cost=11245.39..11245.39 rows=272239 width=8)
                                                    ->  Parallel Seq Scan on subscription  (cost=0.00..11245.39
rows=272239width=8)
 
          ->  Index Scan using idx_lnk_subscriber_msisdn_subscriber_deletiondate on lnk_subscriber_msisdn
(cost=0.44..1.87rows=1 width=8)
 
                Index Cond: ((subscriber_id = subscriber.id) AND (deletiondate > now()))
                Filter: (activationdate <= now())
    ->  Index Scan using pkey_msisdn on msisdn  (cost=0.43..0.53 rows=1 width=20)
          Index Cond: (id = lnk_subscriber_msisdn.msisdn_id)


For now our plan is to wait for the next 18.x stable release, and then reenable
enable_parallel_hash. Then see if it occours again.

As a side-note, my colleagues noted that the problematic query has 
drastic better timings with enable_parallel_hash = off. Something on our list
for future exploration.

Best,
Chris




pgsql-bugs by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
Next
From: John Naylor
Date:
Subject: Re: BUG #19450: Where is checksum_block.inc.c after master install?