Thread: PANIC :Call AbortTransaction when transaction id is no normal

PANIC :Call AbortTransaction when transaction id is no normal

From
Thunder
Date:

Hello,

The process crashed when running in bootstrap mode and received signal to  shutdown.

From the call stack we can see that the transaction id is 1, which is BootstrapTransactionId.

During TransactionLogFetch function, which fetch commit status of specified transaction id, it will return COMITTED when transcation id is BootstrapTransactionId.

Then it will crash because we cannot abort transaction while it was already committed.

(gdb) bt
#0  0x00007f4598f02617 in raise () from /lib64/libc.so.6
#1  0x00007f4598f03d08 in abort () from /lib64/libc.so.6
#2  0x000000000106001d in errfinish (dummy=0) at elog.c:564
#3  0x0000000001064788 in elog_finish (elevel=22, fmt=0x1190408 "cannot abort transaction %u, it was already committed") at elog.c:1385
#4  0x0000000000633e5a in RecordTransactionAbort (isSubXact=false) at xact.c:1584
#5  0x00000000006361fa in AbortTransaction () at xact.c:2614
#6  0x000000000063a5b9 in AbortOutOfAnyTransaction () at xact.c:4423
#7  0x000000000108f417 in ShutdownPostgres (code=1, arg=0) at postinit.c:1221
#8  0x0000000000d3554a in shmem_exit (code=1) at ipc.c:239
#9  0x0000000000d35395 in proc_exit_prepare (code=1) at ipc.c:194
#10 0x0000000000d352bb in proc_exit (code=1) at ipc.c:107
#11 0x000000000105ffbb in errfinish (dummy=0) at elog.c:550
#12 0x0000000000d9a020 in ProcessInterrupts () at postgres.c:3019
#13 0x00000000005489aa in heapgetpage (scan=0x34895f0, page=1) at heapam.c:384
#14 0x000000000054c48d in heapgettup_pagemode (scan=0x34895f0, dir=ForwardScanDirection, nkeys=1, key=0x33cf150) at heapam.c:1052
#15 0x000000000054e3e0 in heap_getnext (scan=0x34895f0, direction=ForwardScanDirection) at heapam.c:1850
#16 0x00000000006fe364 in index_update_stats (rel=0x7f459b770030, hasindex=true, reltuples=0) at index.c:2167
#17 0x00000000006ff40b in index_build (heapRelation=0x7f459b770030, indexRelation=0x7f459b704ca8, indexInfo=0x345c008, isprimary=false, isreindex=false, parallel=false) at index.c:2398
#18 0x00000000006e3ae3 in build_indices () at bootstrap.c:1112
#19 0x00000000006dbc8b in boot_yyparse () at bootparse.y:399
#20 0x00000000006e17f4 in BootstrapModeMain () at bootstrap.c:516
#21 0x00000000006e1578 in AuxiliaryProcessMain (argc=6, argv=0x33299b8) at bootstrap.c:436
#22 0x0000000000aab0be in main (argc=7, argv=0x33299b0) at main.c:220
(gdb) f 4
#4  0x0000000000633e5a in RecordTransactionAbort (isSubXact=false) at xact.c:1584
1584                    elog(PANIC, "cannot abort transaction %u, it was already committed",
(gdb) p xid
$4 = 1

I try to fix this issue and check whether it's normal transaction id before we do abort.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 20feeec327..dbf2bf567a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -4504,8 +4504,13 @@ RollbackAndReleaseCurrentSubTransaction(void)
 void
 AbortOutOfAnyTransaction(void)
 {
+       TransactionId xid = GetCurrentTransactionIdIfAny();
        TransactionState s = CurrentTransactionState;
 
+       /* Check to see if the transaction ID is a permanent one because we cannot abort it */
+       if (!TransactionIdIsNormal(xid))
+               return;
+
        /* Ensure we're not running in a doomed memory context */
        AtAbort_Memory();

Can we fix in this way?

Thanks

Ray





 

Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Kuntal Ghosh
Date:
Hello,

On Mon, May 13, 2019 at 10:15 AM Thunder <thunder1@126.com> wrote:

I try to fix this issue and check whether it's normal transaction id before we do abort.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 20feeec327..dbf2bf567a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -4504,8 +4504,13 @@ RollbackAndReleaseCurrentSubTransaction(void)
 void
 AbortOutOfAnyTransaction(void)
 {
+       TransactionId xid = GetCurrentTransactionIdIfAny();
        TransactionState s = CurrentTransactionState;
 
+       /* Check to see if the transaction ID is a permanent one because we cannot abort it */
+       if (!TransactionIdIsNormal(xid))
+               return;
+
        /* Ensure we're not running in a doomed memory context */
        AtAbort_Memory();

Can we fix in this way?

If we fix the issue in this way, we're certainly not going to do all those portal,locks,memory,resource owner cleanups that are done inside AbortTransaction() for a normal transaction ID. But, I'm not sure how relevant those steps are since the database is anyway shutting down.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Michael Paquier
Date:
On Mon, May 13, 2019 at 01:25:19PM +0530, Kuntal Ghosh wrote:
> If we fix the issue in this way, we're certainly not going to do all
> those portal,locks,memory,resource owner cleanups that are done
> inside AbortTransaction() for a normal transaction ID. But, I'm not
> sure how relevant those steps are since the database is anyway
> shutting down.

And it is happening in bootstrap, meaning that the data folder is most
likely toast, and needs to be reinitialized.  TransactionLogFetch()
treats bootstrap and frozen XIDs as always committed, so from this
perspective it is not wrong either to complain that this transaction
has already been committed when attempting to abort it.  Not sure
what's a more user-friendly behavior in this case though.
--
Michael

Attachment

Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, May 13, 2019 at 01:25:19PM +0530, Kuntal Ghosh wrote:
>> If we fix the issue in this way, we're certainly not going to do all
>> those portal,locks,memory,resource owner cleanups that are done
>> inside AbortTransaction() for a normal transaction ID. But, I'm not
>> sure how relevant those steps are since the database is anyway
>> shutting down.

> And it is happening in bootstrap, meaning that the data folder is most
> likely toast, and needs to be reinitialized.

Indeed, initdb is going to remove the data directory if the bootstrap run
crashes.

If we do anything at all about this, my thought would just be to change
bootstrap_signals() so that it points all the signal handlers at
quickdie(), or maybe something equivalent to quickdie() but printing
a more apropos message, or even just set them all to SIGDFL since that
means process termination for all of these.  die() isn't really the right
thing, precisely because it thinks it can trigger transaction abort,
which makes no sense in bootstrap mode.

But ... that code's been like that for decades and nobody's complained
before.  Why are we worried about bootstrap's response to signals at all?

            regards, tom lane



Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Kuntal Ghosh
Date:
On Mon, May 13, 2019 at 7:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, May 13, 2019 at 01:25:19PM +0530, Kuntal Ghosh wrote:
>> If we fix the issue in this way, we're certainly not going to do all
>> those portal,locks,memory,resource owner cleanups that are done
>> inside AbortTransaction() for a normal transaction ID. But, I'm not
>> sure how relevant those steps are since the database is anyway
>> shutting down.

> And it is happening in bootstrap, meaning that the data folder is most
> likely toast, and needs to be reinitialized.

Indeed, initdb is going to remove the data directory if the bootstrap run
crashes.

But ... that code's been like that for decades and nobody's complained
before.  Why are we worried about bootstrap's response to signals at all?

+1
 
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Tom Lane
Date:
I wrote:
> If we do anything at all about this, my thought would just be to change
> bootstrap_signals() so that it points all the signal handlers at
> quickdie(), or maybe something equivalent to quickdie() but printing
> a more apropos message, or even just set them all to SIGDFL since that
> means process termination for all of these.  die() isn't really the right
> thing, precisely because it thinks it can trigger transaction abort,
> which makes no sense in bootstrap mode.

After further thought I like the SIG_DFL answer, as per attached proposed
patch.  With this, you get SIGINT behavior like this if you manage to
catch it in bootstrap mode (which is not that easy these days):

selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ^Cchild process was terminated by signal 2: Interrupt
initdb: removing data directory "/home/postgres/testversion/data"

That seems perfectly fine from here.

> But ... that code's been like that for decades and nobody's complained
> before.  Why are we worried about bootstrap's response to signals at all?

I'm still wondering why the OP cares.  Still, the PANIC message that you
get right now is potentially confusing.

            regards, tom lane

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index d8776e1..825433d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -558,11 +558,16 @@ bootstrap_signals(void)
 {
     Assert(!IsUnderPostmaster);

-    /* Set up appropriately for interactive use */
-    pqsignal(SIGHUP, die);
-    pqsignal(SIGINT, die);
-    pqsignal(SIGTERM, die);
-    pqsignal(SIGQUIT, die);
+    /*
+     * We don't actually need any non-default signal handling in bootstrap
+     * mode; "curl up and die" is a sufficient response for all these cases.
+     * But let's just make sure the signals are set that way, since our parent
+     * process initdb has them set differently.
+     */
+    pqsignal(SIGHUP, SIG_DFL);
+    pqsignal(SIGINT, SIG_DFL);
+    pqsignal(SIGTERM, SIG_DFL);
+    pqsignal(SIGQUIT, SIG_DFL);
 }

 /*

Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Michael Paquier
Date:
On Mon, May 13, 2019 at 09:37:32AM -0400, Tom Lane wrote:
> But ... that code's been like that for decades and nobody's complained
> before.  Why are we worried about bootstrap's response to signals at all?

Yeah, I don't think that it is something worth bothering either.  As
you mentioned the data folder would be removed by default.  Or perhaps
the reporter has another case in mind which could justify a change in
the signal handlers?  I am ready to hear that case, but there is
nothing about the reason why it could be a benefit.

The patch proposed upthread is not something I find correct anyway,
I'd rather have the abort path complain loudly about a bootstrap
transaction that fails instead of just ignoring it, because it is the
kind of transaction which must never fail.  And it seems to me that it
can be handy for development purposes.
--
Michael

Attachment
On our server when process crash and core dump file generated we will receive complaining phone call.
That's why i try to fix it.






At 2019-05-14 07:53:36, "Michael Paquier" <michael@paquier.xyz> wrote: >On Mon, May 13, 2019 at 09:37:32AM -0400, Tom Lane wrote: >> But ... that code's been like that for decades and nobody's complained >> before. Why are we worried about bootstrap's response to signals at all? > >Yeah, I don't think that it is something worth bothering either. As >you mentioned the data folder would be removed by default. Or perhaps >the reporter has another case in mind which could justify a change in >the signal handlers? I am ready to hear that case, but there is >nothing about the reason why it could be a benefit. > >The patch proposed upthread is not something I find correct anyway, >I'd rather have the abort path complain loudly about a bootstrap >transaction that fails instead of just ignoring it, because it is the >kind of transaction which must never fail. And it seems to me that it >can be handy for development purposes. >-- >Michael


 

Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Tom Lane
Date:
[ please don't top-post on the PG lists ]

Thunder  <thunder1@126.com> writes:
> At 2019-05-14 07:53:36, "Michael Paquier" <michael@paquier.xyz> wrote:
>> On Mon, May 13, 2019 at 09:37:32AM -0400, Tom Lane wrote:
>>> But ... that code's been like that for decades and nobody's complained
>>> before.  Why are we worried about bootstrap's response to signals at all?

> On our server when process crash and core dump file generated we will receive complaining phone call.
> That's why i try to fix it.

OK, that's fair.  The SIG_DFL change I suggested will fix that problem
for SIGINT etc (except SIGQUIT, for which you should be *expecting*
a core file).  I agree with Michael that we do not wish to change what
happens for an internal error; but external signals do not represent
a bug in PG, so forcing a PANIC for those seems unwarranted.

            regards, tom lane



Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Michael Paquier
Date:
On Mon, May 13, 2019 at 11:28:51PM -0400, Tom Lane wrote:
> OK, that's fair.  The SIG_DFL change I suggested will fix that problem
> for SIGINT etc (except SIGQUIT, for which you should be *expecting*
> a core file).  I agree with Michael that we do not wish to change what
> happens for an internal error; but external signals do not represent
> a bug in PG, so forcing a PANIC for those seems unwarranted.

No objections from here to change the signal handlers.  Still, I would
like to understand why the bootstrap process has been signaled to
begin with, particularly for an initdb, which is not really something
that should happen on a server where an instance runs.  If you have a
too aggressive monitoring job, you may want to revisit that as well,
because it is able to complain just with an initdb.
--
Michael

Attachment

Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Andres Freund
Date:
Hi,

On 2019-05-14 12:37:39 +0900, Michael Paquier wrote:
> Still, I would like to understand why the bootstrap process has been
> signaled to begin with, particularly for an initdb, which is not
> really something that should happen on a server where an instance
> runs.  If you have a too aggressive monitoring job, you may want to
> revisit that as well, because it is able to complain just with an
> initdb.

Shutdown, timeout, resource exhaustion all seem like possible
causes. Don't think any of them warrant a core file - as the OP
explains, that'll often trigger pages etc.

Greetings,

Andres Freund



Re: PANIC :Call AbortTransaction when transaction id is no normal

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-14 12:37:39 +0900, Michael Paquier wrote:
>> Still, I would like to understand why the bootstrap process has been
>> signaled to begin with, particularly for an initdb, which is not
>> really something that should happen on a server where an instance
>> runs.  If you have a too aggressive monitoring job, you may want to
>> revisit that as well, because it is able to complain just with an
>> initdb.

> Shutdown, timeout, resource exhaustion all seem like possible
> causes. Don't think any of them warrant a core file - as the OP
> explains, that'll often trigger pages etc.

Yeah.  The case I was thinking about was mostly "start initdb,
decide I didn't want to do that, hit control-C".  That cleans up
without much fuss *except* if you manage to hit the window
where it's running bootstrap, and then it spews this scary-looking
error.  It's less scary-looking with the SIG_DFL patch, which
I've now pushed.

            regards, tom lane