Thread: BUG #16184: Segmentation Fault during update
The following bug has been logged on the website: Bug reference: 16184 Logged by: Piotr Wlodarczyk Email address: piotrwlodarczyk89@gmail.com PostgreSQL version: 12.1 Operating system: Linux dev-g1-db-synchro.gnb.adm 3.10.0-1062.7.1.el Description: Hello geeks, We're preparing to upgrade postgresql from 11.5 to 12.1 but still we have some bugs and issues. Most of them are our fault, but now we have problem we don't know how to play with it. Log: 2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG: server process (PID 28879) was terminated by signal 11: Segmentation fault 2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL: Failed process was running: UPDATE job_abstract a SET job_status = job_status WHERE (a.xmax :: TEXT :: BIGINT < (SELECT txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT txid_snapshot_xmin( txid_current_snapshot()) <= (SELECT txid_snapshot_xmax( txid_current_snapshot())))) AND job_status = 0 AND NOT xmax = 0; Please don't ask me why this update is made in this way - I suppose to set xmax to 0 but why? I don't know. Its legacy code, being part of core of our application. Reproducting this bug is not easy and I don't know which conditions has to be met to replicate this bug. But during application is working and we made this update, segmentation fault occurs. But sometimes it works. Column job_status is telling application that this row should be preceded one more time. Up to 11.5 this bug doesn't exists.
PG Bug reporting form <noreply@postgresql.org> writes: > Log: > 2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG: > server process (PID 28879) was terminated by signal 11: Segmentation fault > 2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL: > Failed process was running: UPDATE job_abstract a > SET job_status = job_status > WHERE (a.xmax :: TEXT :: BIGINT < (SELECT > txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT Clearly a bug, but there's not much we can do with this amount of information. If you can provide a stack trace, that'd be helpful. https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend > Reproducting this bug is not easy and I don't know which conditions has to > be met to replicate this bug. But during application is working and we made > this update, segmentation fault occurs. But sometimes it works. My gut instinct is that this might be a concurrent-update problem, which would explain the lack of reproducibility. Try to get multiple sessions to update the same table row(s) at the same time. Also, we had some bugs of that ilk on 12.0, please make sure you really are updated to 12.1. regards, tom lane
Hi,
sorry for the late reply, but it was not so easy to get more logs. So what we have now is listed below:
(gdb) bt
#0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520) at execMain.c:2735
#1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520, relation=relation@entry=0x7f694c76f838, rti=14, inputslot=inputslot@entry=0x31bc408) at execMain.c:2454
#2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428, tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>, planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520, estate=estate@entry=0x2aff450, canSetTag=true) at nodeModifyTable.c:1387
#3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at nodeModifyTable.c:2223
#4 0x0000000000614302 in ExecProcNode (node=0x2ae9428) at ../../../src/include/executor/executor.h:239
#5 ExecutePlan (execute_once=<optimized out>, dest=0x2f82d88, direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_UPDATE, use_parallel_mode=<optimized out>, planstate=0x2ae9428, estate=0x2aff450) at execMain.c:1646
#6 standard_ExecutorRun (queryDesc=0x2c7c460, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#7 0x000000000076365a in ProcessQuery (plan=<optimized out>, sourceText=0x2a9d7a0 " UPDATE job_abstract a\n SET job_status = job_status\n WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n "..., params=0x0, queryEnv=0x0,
dest=0x2f82d88, completionTag=0x7ffc94e3d250 "") at pquery.c:161
#8 0x0000000000763882 in PortalRunMulti (portal=portal@entry=0x2be30a0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2f82d88, altdest=altdest@entry=0x2f82d88, completionTag=completionTag@entry=0x7ffc94e3d250 "") at pquery.c:1283
#9 0x00000000007643b5 in PortalRun (portal=portal@entry=0x2be30a0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x2f82d88, altdest=altdest@entry=0x2f82d88, completionTag=completionTag@entry=0x7ffc94e3d250 "") at pquery.c:796
#10 0x00000000007604b5 in exec_simple_query (query_string=0x2a9d7a0 " UPDATE job_abstract a\n SET job_status = job_status\n WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n "...) at postgres.c:1215
#11 0x0000000000761782 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2ac50a8, dbname=0x2ac4f88 "synchro_test", username=<optimized out>) at postgres.c:4236
#12 0x0000000000483d47 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4437
#13 BackendStartup (port=0x2abd4c0) at postmaster.c:4128
#14 ServerLoop () at postmaster.c:1704
#15 0x00000000006f106f in PostmasterMain (argc=argc@entry=7, argv=argv@entry=0x2a98330) at postmaster.c:1377
#16 0x0000000000484cd3 in main (argc=7, argv=0x2a98330) at main.c:228
(gdb)
#0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520) at execMain.c:2735
#1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520, relation=relation@entry=0x7f694c76f838, rti=14, inputslot=inputslot@entry=0x31bc408) at execMain.c:2454
#2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428, tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>, planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520, estate=estate@entry=0x2aff450, canSetTag=true) at nodeModifyTable.c:1387
#3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at nodeModifyTable.c:2223
#4 0x0000000000614302 in ExecProcNode (node=0x2ae9428) at ../../../src/include/executor/executor.h:239
#5 ExecutePlan (execute_once=<optimized out>, dest=0x2f82d88, direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_UPDATE, use_parallel_mode=<optimized out>, planstate=0x2ae9428, estate=0x2aff450) at execMain.c:1646
#6 standard_ExecutorRun (queryDesc=0x2c7c460, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#7 0x000000000076365a in ProcessQuery (plan=<optimized out>, sourceText=0x2a9d7a0 " UPDATE job_abstract a\n SET job_status = job_status\n WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n "..., params=0x0, queryEnv=0x0,
dest=0x2f82d88, completionTag=0x7ffc94e3d250 "") at pquery.c:161
#8 0x0000000000763882 in PortalRunMulti (portal=portal@entry=0x2be30a0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2f82d88, altdest=altdest@entry=0x2f82d88, completionTag=completionTag@entry=0x7ffc94e3d250 "") at pquery.c:1283
#9 0x00000000007643b5 in PortalRun (portal=portal@entry=0x2be30a0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x2f82d88, altdest=altdest@entry=0x2f82d88, completionTag=completionTag@entry=0x7ffc94e3d250 "") at pquery.c:796
#10 0x00000000007604b5 in exec_simple_query (query_string=0x2a9d7a0 " UPDATE job_abstract a\n SET job_status = job_status\n WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n "...) at postgres.c:1215
#11 0x0000000000761782 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2ac50a8, dbname=0x2ac4f88 "synchro_test", username=<optimized out>) at postgres.c:4236
#12 0x0000000000483d47 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4437
#13 BackendStartup (port=0x2abd4c0) at postmaster.c:4128
#14 ServerLoop () at postmaster.c:1704
#15 0x00000000006f106f in PostmasterMain (argc=argc@entry=7, argv=argv@entry=0x2a98330) at postmaster.c:1377
#16 0x0000000000484cd3 in main (argc=7, argv=0x2a98330) at main.c:228
(gdb)
Is it enought?
On Thu, Jan 2, 2020 at 4:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Log:
> 2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
> server process (PID 28879) was terminated by signal 11: Segmentation fault
> 2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
> Failed process was running: UPDATE job_abstract a
> SET job_status = job_status
> WHERE (a.xmax :: TEXT :: BIGINT < (SELECT
> txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT
Clearly a bug, but there's not much we can do with this amount of
information. If you can provide a stack trace, that'd be helpful.
https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
> Reproducting this bug is not easy and I don't know which conditions has to
> be met to replicate this bug. But during application is working and we made
> this update, segmentation fault occurs. But sometimes it works.
My gut instinct is that this might be a concurrent-update problem,
which would explain the lack of reproducibility. Try to get
multiple sessions to update the same table row(s) at the same time.
Also, we had some bugs of that ilk on 12.0, please make sure you
really are updated to 12.1.
regards, tom lane
Pozdrawiam
Piotr Włodarczyk
=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes: > sorry for the late reply, but it was not so easy to get more logs. So what > we have now is listed below: > (gdb) bt > #0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520) > at execMain.c:2735 > #1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520, > relation=relation@entry=0x7f694c76f838, rti=14, > inputslot=inputslot@entry=0x31bc408) > at execMain.c:2454 > #2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428, > tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>, > planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520, > estate=estate@entry=0x2aff450, > canSetTag=true) at nodeModifyTable.c:1387 > #3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at > nodeModifyTable.c:2223 OK, so that confirms my suspicion that it was an EvalPlanQual issue (hence, you need concurrent updates on the same row to trigger it). I believe this might be the same bug we isolated and fixed just a couple of days ago: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5 The stack trace looks the same as what Oleksii showed in that thread for a non-assert build. As far as I know, that bug only triggers if a single ModifyTable plan node has to run EPQ checks for more than one target table --- so, is this query updating a partitioned table or inheritance tree, and is it plausible that it had to update rows in more than one sub-table, and that some other transaction(s) were concurrently updating those same rows? regards, tom lane
Hi,
I don't know what is EvalPlanQual but conditions you wrote about are met.
Table which is updated is main table and it name is job_abstract. The're a lot of another tables for example job_bca, job_mod, job_smth which extends job_abstract.
In my case situation is as follows: in the same time it's possible that some different transactions are in progress (started but not committed yet). The're some legacy (looks like some kind of guard) that update under some conditions table job_abstract. In another transaction(s) table job_bca (which extends job_abstract - inheritance) is updated too. This two (or more) transactions are updating the same rows, but different set of fields. I think that author of this update tried to force change xmax of job_abstract making dumb update job_status = 0 where ... job_status = 0. But everything works fine to postgres 11.5. When we upgreade to 12.1 segmentation fault occurs. But i saw in commit you linked to description: "Back-patch to v12 where the bug was introduced".
On Fri, Jan 31, 2020 at 6:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Piotr Włodarczyk <piotrwlodarczyk89@gmail.com> writes:
> sorry for the late reply, but it was not so easy to get more logs. So what
> we have now is listed below:
> (gdb) bt
> #0 0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520)
> at execMain.c:2735
> #1 0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520,
> relation=relation@entry=0x7f694c76f838, rti=14,
> inputslot=inputslot@entry=0x31bc408)
> at execMain.c:2454
> #2 0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428,
> tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
> planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520,
> estate=estate@entry=0x2aff450,
> canSetTag=true) at nodeModifyTable.c:1387
> #3 0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
> nodeModifyTable.c:2223
OK, so that confirms my suspicion that it was an EvalPlanQual issue
(hence, you need concurrent updates on the same row to trigger it).
I believe this might be the same bug we isolated and fixed just a
couple of days ago:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5
The stack trace looks the same as what Oleksii showed in that thread
for a non-assert build. As far as I know, that bug only triggers if
a single ModifyTable plan node has to run EPQ checks for more than
one target table --- so, is this query updating a partitioned table
or inheritance tree, and is it plausible that it had to update rows
in more than one sub-table, and that some other transaction(s) were
concurrently updating those same rows?
regards, tom lane
Pozdrawiam
Piotr Włodarczyk
=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes: > I don't know what is EvalPlanQual but conditions you wrote about are met. OK, then I'm pretty confident that patch will fix your problem. It'll be in 12.2 which is due out in two weeks. regards, tom lane
Ok,
thanks.
On Fri, Jan 31, 2020 at 11:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Piotr Włodarczyk <piotrwlodarczyk89@gmail.com> writes:
> I don't know what is EvalPlanQual but conditions you wrote about are met.
OK, then I'm pretty confident that patch will fix your problem.
It'll be in 12.2 which is due out in two weeks.
regards, tom lane
Pozdrawiam
Piotr Włodarczyk