Re: [BUGS] bug or simply not enough stack space? - Mailing list pgsql-bugs

From Merlin Moncure
Subject Re: [BUGS] bug or simply not enough stack space?
Date
Msg-id CAHyXU0yofd1JJs-B3n7qW7AjfjDuHm4VPuHXXqUMQ02E0UOP4A@mail.gmail.com
Whole thread Raw
In response to Re: bug or simply not enough stack space?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUGS] bug or simply not enough stack space?
List pgsql-bugs
On Sat, Jul 18, 2009 at 2:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> >> I think I might have been overenthusiastic in trying to free resources
> >> during a subtransaction abort.  Try this patch and see if you notice
> >> any bad side-effects.
>
> > All examples I had that crashed and burned, now work correctly and/or bail out
> > correctly where needed.
>
> > No side-effects noticed.
>
> Well, it had some :-(.  I've committed a better version of that patch,
> which will be in 8.4.1.  However, if you're not seeing any serious
> memory leakage then you could keep using the patch you have for the
> time being.

Reviving this ancient thread.  I saw "did not find subXID" errors, in
9.6.12.  Here is what happened.


2019-10-03 19:54:01 CDT [: @] []: LOG:  checkpoints are occurring too
frequently (14 seconds apart)
2019-10-03 19:54:01 CDT [: @] []: HINT:  Consider increasing the
configuration parameter "max_wal_size".
2019-10-03 19:54:15 CDT [: @] []: LOG:  checkpoints are occurring too
frequently (14 seconds apart)
2019-10-03 19:54:15 CDT [: @] []: HINT:  Consider increasing the
configuration parameter "max_wal_size".
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
loadhistorydatafromysm_testing2() line 99 during exception cleanup
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: LOG:  could not send data to client: Broken
pipe
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
loadhistorydatafromysm_testing2() line 99 during exception cleanup
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: STATEMENT:  select
LoadHistoryDataFromYSM_testing2();
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: ERROR:  failed to re-find shared lock object
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
loadhistorydatafromysm_testing2() line 99 during exception cleanup
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: STATEMENT:  select
LoadHistoryDataFromYSM_testing2();
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  AbortSubTransaction while in ABORT
state
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
2019-10-03 19:53:31 CDT [: @] []: LOG:  checkpoints are occurring too
frequently (16 seconds apart)
2019-10-03 19:53:31 CDT [: @] []: HINT:  Consider increasing the
configuration parameter "max_wal_size".
2019-10-03 19:53:31 CDT [: @] []: LOG:  checkpoints are occurring too
frequently (16 seconds apart)
2019-10-03 19:53:31 CDT [: @] []: HINT:  Consider increasing the
configuration parameter "max_wal_size".
2019-10-03 19:53:47 CDT [: @] []: LOG:  checkpoints are occurring too
frequently (16 seconds apart)
2019-10-03 19:53:47 CDT [: @] []: HINT:  Consider increasing the
configuration parameter "max_wal_size".
2019-10-03 19:54:01 CDT [: @] []: LOG:  checkpoints are occurring too
frequently (14 seconds apart)
2019-10-03 19:54:01 CDT [: @] []: HINT:  Consider increasing the
configuration parameter "max_wal_size".
2019-10-03 19:54:15 CDT [: @] []: LOG:  checkpoints are occurring too
frequently (14 seconds apart)
2019-10-03 19:54:15 CDT [: @] []: HINT:  Consider increasing the
configuration parameter "max_wal_size".
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
loadhistorydatafromysm_testing2() line 99 during exception cleanup
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: LOG:  could not send data to client: Broken
pipe
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
loadhistorydatafromysm_testing2() line 99 during exception cleanup
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: STATEMENT:  select
LoadHistoryDataFromYSM_testing2();
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: ERROR:  failed to re-find shared lock object
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
loadhistorydatafromysm_testing2() line 99 during exception cleanup
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: STATEMENT:  select
LoadHistoryDataFromYSM_testing2();
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  AbortSubTransaction while in ABORT
state
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: ERROR:  failed to re-find shared lock object
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  AbortSubTransaction while in ABORT
state
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: ERROR:  failed to re-find shared lock object
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  AbortSubTransaction while in ABORT
state
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: ERROR:  failed to re-find shared lock object
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  AbortSubTransaction while in ABORT
state
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  you don't own a lock of type
ExclusiveLock
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: WARNING:  ReleaseLockIfHeld: failed??
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: ERROR:  failed to re-find shared lock object
2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
[10.22.236.83(54943)]: PANIC:  ERRORDATA_STACK_SIZE exceeded
2019-10-03 19:58:37 CDT [: @] []: LOG:  server process (PID 10715) was
terminated by signal 6: Aborted
2019-10-03 19:58:37 CDT [: @] []: LOG:  terminating any other active
server processes

Couple interesting things here:
*) Logs are strongly out of order
*) "loadhistorydatafromysm_testing2()" is using pl/sh, which is a
known source of weird (but rare) instability issues (I'm assuming this
is underlying cause of issue)
*) everything restarted and is working fine
*) chance of providing reproducible case is zero

Bringing this up due to this thread:
https://github.com/laurenz/oracle_fdw/issues/328

I can't help but wonder if we have some kind of obscure issue that is
related to C extension problems; just throwing a data point on the
table.

merlin



pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #16038: Alter table - SegFault
Next
From: Alvaro Herrera
Date:
Subject: Re: BUG #16038: Alter table - SegFault