Re: [BUGS] bug or simply not enough stack space? - Mailing list pgsql-bugs

From Tom Lane
Subject Re: [BUGS] bug or simply not enough stack space?
Date
Msg-id 14790.1570205194@sss.pgh.pa.us
In response to Re: [BUGS] bug or simply not enough stack space?  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-bugs
Merlin Moncure <mmoncure@gmail.com> writes:
> Reviving this ancient thread.  I saw "did not find subXID" errors, in
> 9.6.12.  Here is what happened.

> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: WARNING:  did not find subXID 384134 in MyProc
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
> loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: LOG:  could not send data to client: Broken
> pipe
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
> loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: STATEMENT:  select
> LoadHistoryDataFromYSM_testing2();
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: ERROR:  failed to re-find shared lock object
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: CONTEXT:  PL/pgSQL function
> loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms@cds2]
> [10.22.236.83(54943)]: STATEMENT:  select
> LoadHistoryDataFromYSM_testing2();

[ and then we get into recursive error-during-error-cleanup failures ]

Yeah, something has left stuff in a bad state here.

> *) "loadhistorydatafromysm_testing2()" is using pl/sh, which is a
> known source of weird (but rare) instability issues (I'm assuming this
> is underlying cause of issue)

Hm.  Yeah, I'd be way more interested if this could be reproduced
without pl/sh.

> I can't help but wonder if we have some kind of obscure issue that is
> related to C extension problems; just throwing a data point on the
> table.

Well, there's nothing too obscure about the rule that error cleanup
needs to avoid doing anything that might cause another error, for fear
of causing infinite recursion.  I suspect that the underlying issue is
that pl/sh is violating that rule somewhere.  The other thread you point
to suggests that maybe oracle_fdw also used to do that, and fixed it.
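
To make that rule concrete, here is a toy sketch in Python. It is not PostgreSQL or pl/sh code, and every name in it (safe_cleanup, run_with_cleanup, the step names) is hypothetical. It only illustrates the general idea: each cleanup step is guarded so that a failure inside it is recorded as a warning rather than raised, which keeps the error path from re-entering itself.

```python
def safe_cleanup(steps):
    """Run every cleanup step without ever raising.

    `steps` is a list of (name, callable) pairs. A failing step is
    recorded as a warning string and the remaining steps still run.
    """
    warnings = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Demote the failure to a warning; re-raising here would
            # start a second round of error handling mid-cleanup.
            warnings.append(f"{name}: {exc}")
    return warnings


def run_with_cleanup(work, steps):
    """Run `work`; on error, perform guarded cleanup and report the outcome."""
    try:
        work()
    except Exception as exc:
        return ("error", str(exc), safe_cleanup(steps))
    return ("ok", None, [])


if __name__ == "__main__":
    def failing_work():
        raise ValueError("boom")

    def release_lock():
        # Hypothetical cleanup step that itself fails.
        raise RuntimeError("lock not found")

    def close_file():
        pass

    print(run_with_cleanup(
        failing_work,
        [("release_lock", release_lock), ("close_file", close_file)],
    ))
```

If a cleanup step were allowed to raise instead, the handler for that second error would need its own cleanup, and so on, which is the recursion described above.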

            regards, tom lane


