Merlin Moncure <mmoncure@gmail.com> writes:
> Reviving this ancient thread. I saw "did not find subXID" errors in
> 9.6.12. Here is what happened:
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: WARNING: did not find subXID 384134 in MyProc
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: CONTEXT: PL/pgSQL function loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: LOG: could not send data to client: Broken pipe
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: CONTEXT: PL/pgSQL function loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: STATEMENT: select LoadHistoryDataFromYSM_testing2();
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: ERROR: failed to re-find shared lock object
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: CONTEXT: PL/pgSQL function loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2] [10.22.236.83(54943)]: STATEMENT: select LoadHistoryDataFromYSM_testing2();
[ and then we get into recursive error-during-error-cleanup failures ]
Yeah, something has left stuff in a bad state here.
> *) "loadhistorydatafromysm_testing2()" is using pl/sh, which is a
> known source of weird (but rare) instability issues (I'm assuming this
> is the underlying cause of the issue)
Hm. Yeah, I'd be way more interested if this could be reproduced
without pl/sh.
> I can't help but wonder if we have some kind of obscure issue that is
> related to C extension problems; just throwing a data point on the
> table.
Well, there's nothing too obscure about the rule that error cleanup
needs to avoid doing anything that might cause another error, for fear
of causing infinite recursion. I suspect that the underlying issue is
that pl/sh is violating that rule somewhere. The other thread you point
to suggests that maybe oracle_fdw also used to do that, and fixed it.
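
To make the rule concrete, here is a minimal, language-neutral sketch in
Python. It is not PostgreSQL or pl/sh code, and every name in it
(Resource, cleanup, run) is hypothetical. The point it illustrates is
only that the cleanup path traps its own failures and reports them as
warnings, so that it never throws a second error while the first one is
still being handled.

```python
class Resource:
    """Hypothetical stand-in for something that must be released on error."""

    def __init__(self, name, fail_on_release=False):
        self.name = name
        self.fail_on_release = fail_on_release
        self.released = False

    def release(self):
        if self.fail_on_release:
            raise RuntimeError(f"could not release {self.name}")
        self.released = True


def cleanup(resources):
    """Release everything in reverse order; never raise.

    A failure is downgraded to a warning string so that the remaining
    resources are still released and the caller's original error is
    not replaced by a new one.
    """
    warnings = []
    for res in reversed(resources):
        try:
            res.release()
        except Exception as exc:
            warnings.append(str(exc))
    return warnings


def run(work, resources):
    """Run work(), then clean up; return (status, cleanup warnings)."""
    status = "ok"
    try:
        work()
    except Exception as exc:
        status = f"error: {exc}"
    return status, cleanup(resources)
```

If cleanup() were allowed to raise instead, the handler that called it
would itself need cleaning up, which is the recursion described above.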
regards, tom lane