Thread: BUG #16184: Segmentation Fault during update

BUG #16184: Segmentation Fault during update

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16184
Logged by:          Piotr Wlodarczyk
Email address:      piotrwlodarczyk89@gmail.com
PostgreSQL version: 12.1
Operating system:   Linux dev-g1-db-synchro.gnb.adm 3.10.0-1062.7.1.el
Description:

Hello geeks,

We're preparing to upgrade postgresql from 11.5 to 12.1 but still we have
some bugs and issues. Most of them are our fault, but now we have problem we
don't know how to play with it.

Log:
2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
server process (PID 28879) was terminated by signal 11: Segmentation fault
2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
Failed process was running: UPDATE job_abstract a
          SET job_status = job_status
          WHERE (a.xmax :: TEXT :: BIGINT < (SELECT

txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

                        txid_snapshot_xmin(

                            txid_current_snapshot())

                        <= (SELECT

                              txid_snapshot_xmax(

                                  txid_current_snapshot()))))
                AND job_status = 0 AND NOT xmax = 0;

Please don't ask me why this update is made in this way - I suppose to set
xmax to 0 but why? I don't know. Its legacy code, being part of core of our
application.

Reproducting this bug is not easy and I don't know which conditions has to
be met to replicate this bug. But during application is working and we made
this update, segmentation fault occurs. But sometimes it works. 

Column job_status is telling application that this row should be preceded
one more time. Up to 11.5 this bug doesn't exists.


Re: BUG #16184: Segmentation Fault during update

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> Log:
> 2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
> server process (PID 28879) was terminated by signal 11: Segmentation fault
> 2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
> Failed process was running: UPDATE job_abstract a
>           SET job_status = job_status
>           WHERE (a.xmax :: TEXT :: BIGINT < (SELECT
> txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

Clearly a bug, but there's not much we can do with this amount of
information.  If you can provide a stack trace, that'd be helpful.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

> Reproducting this bug is not easy and I don't know which conditions has to
> be met to replicate this bug. But during application is working and we made
> this update, segmentation fault occurs. But sometimes it works.

My gut instinct is that this might be a concurrent-update problem,
which would explain the lack of reproducibility.  Try to get
multiple sessions to update the same table row(s) at the same time.

Also, we had some bugs of that ilk on 12.0, please make sure you
really are updated to 12.1.

            regards, tom lane



Re: BUG #16184: Segmentation Fault during update

From
Piotr Włodarczyk
Date:
Hi,

sorry for the late reply, but it was not so easy to get more logs. So what we have now is listed below:

(gdb) bt
#0  0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520) at execMain.c:2735
#1  0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520, relation=relation@entry=0x7f694c76f838, rti=14, inputslot=inputslot@entry=0x31bc408) at execMain.c:2454
#2  0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428, tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>, planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520, estate=estate@entry=0x2aff450, canSetTag=true) at nodeModifyTable.c:1387
#3  0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at nodeModifyTable.c:2223
#4  0x0000000000614302 in ExecProcNode (node=0x2ae9428) at ../../../src/include/executor/executor.h:239
#5  ExecutePlan (execute_once=<optimized out>, dest=0x2f82d88, direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_UPDATE, use_parallel_mode=<optimized out>, planstate=0x2ae9428, estate=0x2aff450) at execMain.c:1646
#6  standard_ExecutorRun (queryDesc=0x2c7c460, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#7  0x000000000076365a in ProcessQuery (plan=<optimized out>, sourceText=0x2a9d7a0 " UPDATE job_abstract a\n  SET job_status = job_status\n  WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n     "..., params=0x0, queryEnv=0x0,
    dest=0x2f82d88, completionTag=0x7ffc94e3d250 "") at pquery.c:161
#8  0x0000000000763882 in PortalRunMulti (portal=portal@entry=0x2be30a0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2f82d88, altdest=altdest@entry=0x2f82d88, completionTag=completionTag@entry=0x7ffc94e3d250 "") at pquery.c:1283
#9  0x00000000007643b5 in PortalRun (portal=portal@entry=0x2be30a0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x2f82d88, altdest=altdest@entry=0x2f82d88, completionTag=completionTag@entry=0x7ffc94e3d250 "") at pquery.c:796
#10 0x00000000007604b5 in exec_simple_query (query_string=0x2a9d7a0 " UPDATE job_abstract a\n  SET job_status = job_status\n  WHERE (a.xmax :: TEXT :: BIGINT < (SELECT\n", ' ' <repeats 39 times>, "txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT\n     "...) at postgres.c:1215
#11 0x0000000000761782 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x2ac50a8, dbname=0x2ac4f88 "synchro_test", username=<optimized out>) at postgres.c:4236
#12 0x0000000000483d47 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4437
#13 BackendStartup (port=0x2abd4c0) at postmaster.c:4128
#14 ServerLoop () at postmaster.c:1704
#15 0x00000000006f106f in PostmasterMain (argc=argc@entry=7, argv=argv@entry=0x2a98330) at postmaster.c:1377
#16 0x0000000000484cd3 in main (argc=7, argv=0x2a98330) at main.c:228
(gdb)

Is it enought?

On Thu, Jan 2, 2020 at 4:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Log:
> 2020-01-02 15:42:26 CET [12729]: [95-1] user=,db=,xid=0,state=00000 LOG:
> server process (PID 28879) was terminated by signal 11: Segmentation fault
> 2020-01-02 15:42:26 CET [12729]: [96-1] user=,db=,xid=0,state=00000 DETAIL:
> Failed process was running: UPDATE job_abstract a
>           SET job_status = job_status
>           WHERE (a.xmax :: TEXT :: BIGINT < (SELECT
> txid_snapshot_xmin(txid_current_snapshot()))) AND ((SELECT

Clearly a bug, but there's not much we can do with this amount of
information.  If you can provide a stack trace, that'd be helpful.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

> Reproducting this bug is not easy and I don't know which conditions has to
> be met to replicate this bug. But during application is working and we made
> this update, segmentation fault occurs. But sometimes it works.

My gut instinct is that this might be a concurrent-update problem,
which would explain the lack of reproducibility.  Try to get
multiple sessions to update the same table row(s) at the same time.

Also, we had some bugs of that ilk on 12.0, please make sure you
really are updated to 12.1.

                        regards, tom lane


--

Pozdrawiam
Piotr Włodarczyk

Re: BUG #16184: Segmentation Fault during update

From
Tom Lane
Date:
=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes:
> sorry for the late reply, but it was not so easy to get more logs. So what
> we have now is listed below:

> (gdb) bt
> #0  0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520)
> at execMain.c:2735
> #1  0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520,
> relation=relation@entry=0x7f694c76f838, rti=14,
> inputslot=inputslot@entry=0x31bc408)
> at execMain.c:2454
> #2  0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428,
> tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
> planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520,
> estate=estate@entry=0x2aff450,
> canSetTag=true) at nodeModifyTable.c:1387
> #3  0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
> nodeModifyTable.c:2223

OK, so that confirms my suspicion that it was an EvalPlanQual issue
(hence, you need concurrent updates on the same row to trigger it).

I believe this might be the same bug we isolated and fixed just a
couple of days ago:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5

The stack trace looks the same as what Oleksii showed in that thread
for a non-assert build.  As far as I know, that bug only triggers if
a single ModifyTable plan node has to run EPQ checks for more than
one target table --- so, is this query updating a partitioned table
or inheritance tree, and is it plausible that it had to update rows
in more than one sub-table, and that some other transaction(s) were
concurrently updating those same rows?

            regards, tom lane



Re: BUG #16184: Segmentation Fault during update

From
Piotr Włodarczyk
Date:
Hi,

I don't know what is EvalPlanQual but conditions you wrote about are met.

Table which is updated is main table and it name is job_abstract. The're a lot of another tables for example job_bca, job_mod, job_smth which extends job_abstract.

In my case situation is as follows: in the same time it's possible that some different transactions are in progress (started but not committed yet). The're some legacy (looks like some kind of guard) that update under some conditions table job_abstract. In another transaction(s) table job_bca (which extends job_abstract - inheritance) is updated too. This two (or more) transactions are updating the same rows, but different set of fields. I think that author of this update tried to force change xmax of job_abstract making dumb update job_status = 0 where ... job_status = 0. But everything works fine to postgres 11.5. When we upgreade to 12.1 segmentation fault occurs. But i saw in commit you linked to description: "Back-patch to v12 where the bug was introduced".

On Fri, Jan 31, 2020 at 6:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Piotr Włodarczyk <piotrwlodarczyk89@gmail.com> writes:
> sorry for the late reply, but it was not so easy to get more logs. So what
> we have now is listed below:

> (gdb) bt
> #0  0x0000000000616912 in EvalPlanQualBegin (epqstate=epqstate@entry=0x2ae9520)
> at execMain.c:2735
> #1  0x0000000000616cfb in EvalPlanQual (epqstate=epqstate@entry=0x2ae9520,
> relation=relation@entry=0x7f694c76f838, rti=14,
> inputslot=inputslot@entry=0x31bc408)
> at execMain.c:2454
> #2  0x0000000000638bdf in ExecUpdate (mtstate=mtstate@entry=0x2ae9428,
> tupleid=0x7ffc94e3cf40, oldtuple=0x0, slot=<optimized out>,
> planSlot=0x312fc28, epqstate=epqstate@entry=0x2ae9520,
> estate=estate@entry=0x2aff450,
> canSetTag=true) at nodeModifyTable.c:1387
> #3  0x000000000063914d in ExecModifyTable (pstate=0x2ae9428) at
> nodeModifyTable.c:2223

OK, so that confirms my suspicion that it was an EvalPlanQual issue
(hence, you need concurrent updates on the same row to trigger it).

I believe this might be the same bug we isolated and fixed just a
couple of days ago:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=87fed2a197abc1397b63ee74b3fa7eb20471fff5

The stack trace looks the same as what Oleksii showed in that thread
for a non-assert build.  As far as I know, that bug only triggers if
a single ModifyTable plan node has to run EPQ checks for more than
one target table --- so, is this query updating a partitioned table
or inheritance tree, and is it plausible that it had to update rows
in more than one sub-table, and that some other transaction(s) were
concurrently updating those same rows?

                        regards, tom lane


--

Pozdrawiam
Piotr Włodarczyk

Re: BUG #16184: Segmentation Fault during update

From
Tom Lane
Date:
=?UTF-8?Q?Piotr_W=C5=82odarczyk?= <piotrwlodarczyk89@gmail.com> writes:
> I don't know what is EvalPlanQual but conditions you wrote about are met.

OK, then I'm pretty confident that patch will fix your problem.
It'll be in 12.2 which is due out in two weeks.

            regards, tom lane



Re: BUG #16184: Segmentation Fault during update

From
Piotr Włodarczyk
Date:
Ok,

thanks. 

On Fri, Jan 31, 2020 at 11:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Piotr Włodarczyk <piotrwlodarczyk89@gmail.com> writes:
> I don't know what is EvalPlanQual but conditions you wrote about are met.

OK, then I'm pretty confident that patch will fix your problem.
It'll be in 12.2 which is due out in two weeks.

                        regards, tom lane


--

Pozdrawiam
Piotr Włodarczyk