cannot abort transaction 2737414167, it was already committed - Mailing list pgsql-hackers

From Justin Pryzby
Subject cannot abort transaction 2737414167, it was already committed
Date
Msg-id ZYw8gVOMF9gfp6i5@pryzbyj2023
List pgsql-hackers
We had this:

< 2023-12-25 04:06:20.062 MST telsasoft >ERROR:  could not open file "pg_tblspc/16395/PG_16_202307071/16384/121010871": Input/output error
< 2023-12-25 04:06:20.062 MST telsasoft >STATEMENT:  commit
< 2023-12-25 04:06:20.062 MST telsasoft >WARNING:  AbortTransaction while in COMMIT state
< 2023-12-25 04:06:20.062 MST telsasoft >PANIC:  cannot abort transaction 2737414167, it was already committed
< 2023-12-25 04:06:20.473 MST  >LOG:  server process (PID 14678) was terminated by signal 6: Aborted

The application is a daily cronjob which would've just done:

begin;
lo_unlink(); -- the client-side function called from pygresql;
DELETE FROM tbl WHERE col=%s;
commit;

The table being removed would've been a transient (but not "temporary")
table created ~1 day prior.
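
Roughly, the job amounts to the sketch below. It assumes PyGreSQL's classic pg.DB wrapper and its begin, getlo, query_formatted, commit and rollback methods. The function name purge_daily and its arguments are hypothetical, and tbl/col are the same placeholders as in the statements above; this is not the actual application code.

```python
def purge_daily(db, lo_oid, col_value):
    """Unlink one large object and delete the matching rows in one transaction.

    `db` is assumed to behave like a PyGreSQL classic `pg.DB` instance;
    `lo_oid` and `col_value` are hypothetical inputs to the cron job.
    """
    db.begin()
    try:
        # Client-side lo_unlink: fetch the large object handle and remove it.
        db.getlo(lo_oid).unlink()
        # Same statement as in the report; %s is filled in by query_formatted.
        db.query_formatted("DELETE FROM tbl WHERE col=%s", (col_value,))
    except Exception:
        # A failure before COMMIT can simply be rolled back.
        db.rollback()
        raise
    # The I/O error in the log above was reported for this statement.
    db.commit()
```

Note that the try/except only covers the statements before the commit; the failure reported here was raised while the server was processing the commit itself, so nothing on the client side could have rolled it back.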

It's possible that the filesystem had an IO error, but I can't find any
evidence of that.  Postgres is running entirely on zfs, which says:

scan: scrub repaired 0B in 00:07:03 with 0 errors on Mon Dec 25 04:49:07 2023
errors: No known data errors

My main question is why an I/O error would cause the backend to PANIC
(taking down the server), rather than just raising an ERROR.

This is pg16 compiled at efa8f6064, running under CentOS 7.  ZFS is 2.2.2,
but the pool hasn't been upgraded to use the features new since 2.1.

(gdb) bt
#0  0x00007fc961089387 in raise () from /lib64/libc.so.6
#1  0x00007fc96108aa78 in abort () from /lib64/libc.so.6
#2  0x00000000009438b7 in errfinish (filename=filename@entry=0xac8e20 "xact.c", lineno=lineno@entry=1742, funcname=funcname@entry=0x9a6600 <__func__.32495> "RecordTransactionAbort") at elog.c:604
#3  0x000000000054d6ab in RecordTransactionAbort (isSubXact=isSubXact@entry=false) at xact.c:1741
#4  0x000000000054d7bd in AbortTransaction () at xact.c:2814
#5  0x000000000054e015 in AbortCurrentTransaction () at xact.c:3415
#6  0x0000000000804e4e in PostgresMain (dbname=0x12ea840 "ts", username=0x12ea828 "telsasoft") at postgres.c:4354
#7  0x000000000077bdd6 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4465
#8  BackendStartup (port=0x12e44c0) at postmaster.c:4193
#9  ServerLoop () at postmaster.c:1783
#10 0x000000000077ce9a in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x12ad280) at postmaster.c:1467
#11 0x00000000004ba8b8 in main (argc=3, argv=0x12ad280) at main.c:198

#3  0x000000000054d6ab in RecordTransactionAbort (isSubXact=isSubXact@entry=false) at xact.c:1741
        xid = 2737414167
        rels = 0x94f549 <hash_seq_init+73>
        ndroppedstats = 0
        droppedstats = 0x0

#4  0x000000000054d7bd in AbortTransaction () at xact.c:2814
        is_parallel_worker = false

-- 
Justin


