Re: BUG #15677: Crash while deleting from partitioned table - Mailing list pgsql-bugs

From Amit Langote
Subject Re: BUG #15677: Crash while deleting from partitioned table
Date
Msg-id 3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
In response to BUG #15677: Crash while deleting from partitioned table  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #15677: Crash while deleting from partitioned table  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Re: BUG #15677: Crash while deleting from partitioned table  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
Hi,

On 2019/03/08 16:29, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      15677
> Logged by:          Norbert Benkocs
> Email address:      infernorb@gmail.com
> PostgreSQL version: 11.2
> Operating system:   CentOS Linux release 7.4.1708 (Core)
> Description:        
> 
> Version: PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-36), 64-bit
> OS: CentOS Linux release 7.4.1708 (Core) 
> 
> Hello,
> 
> We have an insert/update/delete query on a partitioned table (multiple
> CTE-s) that causes our PostgreSQL server to crash once every few days. We
> haven't been able to reproduce this crash so far, and re-running the same
> query with the same parameters didn't result in a crash either. The table in
> question is updated thousands of times each hour, and most of these work
> fine.
> Previously this table was not partitioned, we started seeing the crash after
> partitioning the table.

Thanks for the report and for providing detailed information which was
useful for diagnosing the bug.

I looked at this:

> (gdb) bt
> #0  ExecInitModifyTable (node=node@entry=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> nodeModifyTable.c:2327
> #1  0x000000000060af88 in ExecInitNode (node=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> execProcnode.c:174
> #2  0x0000000000606fdd in EvalPlanQualStart (epqstate=0x3773848,
> epqstate=0x3773848, planTree=0x36c3f08, parentestate=0xa6) at
> execMain.c:3257

Note that ExecInitModifyTable() is being called from EvalPlanQualStart().

and:

> (gdb) p *mtstate
> $4 = {ps = {type = T_ModifyTableState, plan = 0x2568180, state = 0x35f1440,
> ExecProcNode = 0x626e30 <ExecModifyTable>, ExecProcNodeReal = 0x0,
> instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual
> = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0,
> chgParam = 0x0, ps_ResultTupleSlot = 0x0, ps_ExprContext = 0x0, 
>     ps_ProjInfo = 0x0, scandesc = 0x0}, operation = CMD_DELETE, canSetTag =
> false, mt_done = false, mt_plans = 0x39c8088, mt_nplans = 15, mt_whichplan =
> 0, resultRelInfo = 0x35f3f78, rootResultRelInfo = 0xc0, mt_arowmarks =

Note: rootResultRelInfo = 0xc0, which is not a valid pointer.

and:

> (gdb) p *estate
> $7 = {type = T_EState, es_direction = ForwardScanDirection, es_snapshot =
> 0x208ba70, es_crosscheck_snapshot = 0x0, es_range_table = 0x282af48,
> es_plannedstmt = 0x2829e98, es_sourceText = 0x0, es_junkFilter = 0x0,
> es_output_cid = 0, es_result_relations = 0x35f3378, es_num_result_relations
> = 34, es_result_relation_info = 0x0, 
>   es_root_result_relations = 0x0, es_num_root_result_relations = 0,

Note: es_root_result_relations = 0x0, that is, NULL.

From the above, I conclude that EvalPlanQualStart() does not copy the
value of es_root_result_relations from the parent EState.  That means
ExecInitModifyTable(), when called in the context of EvalPlanQual()
checking, starts out with the wrong (NULL) value of
es_root_result_relations, so the value it computes for rootResultRelInfo
of the ModifyTableState it's initializing is also wrong (0xc0 as seen
above).
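
To illustrate how a NULL array base can turn into a small bogus address
like 0xc0, here is a minimal standalone sketch.  It is not PostgreSQL
code: ModelResultRelInfo and model_root_rel_address() are hypothetical
names, and the 192-byte struct size and index 1 are assumptions picked
only so that the arithmetic lands on 0xc0 (the report states neither the
real struct size nor the index).  It assumes rootResultRelInfo is derived
as "array base plus index", which is what the gdb output suggests.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ResultRelInfo.  The 192-byte size is an
 * assumption chosen for illustration, not taken from the report.
 */
typedef struct ModelResultRelInfo
{
    char payload[192];
} ModelResultRelInfo;

/*
 * Model of "base + index" on an array of result-relation entries.
 * The address is computed with integer arithmetic so that the NULL-base
 * case can be shown without performing pointer arithmetic on NULL.
 */
static uintptr_t
model_root_rel_address(const ModelResultRelInfo *base, int index)
{
    return (uintptr_t) base + (uintptr_t) index * sizeof(ModelResultRelInfo);
}
```

Under these assumptions, a NULL base with index 1 gives 0xc0, while a
valid base gives the address of the second array element.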

To reproduce, use the following steps (two sessions are needed to invoke
EvalPlanQual at all):

Setup:

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
insert into p values (1);

Session 1:

begin;
update p set a = a;

Session 2:

with u as (update p set a = a returning p.*) update p set a = u.a from u;
<blocks>

Session 1:
commit;

Session 2:
<invokes-EvalPlanQual-and-crashes>
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


This can be fixed by the attached patch, which modifies EvalPlanQualStart
to copy the value of es_root_result_relations from its parent EState.
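
The idea of the fix can be sketched on a simplified model.  This is not
the attached patch and not PostgreSQL code: ModelEState,
ModelResultRelInfo and model_epq_inherit_result_rels() are hypothetical
names that keep only the fields seen in the gdb output above.  Copying
es_num_root_result_relations alongside the array pointer is an
assumption made here for consistency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for ResultRelInfo. */
typedef struct ModelResultRelInfo
{
    int relid;
} ModelResultRelInfo;

/* Hypothetical stand-in for EState with only the relevant fields. */
typedef struct ModelEState
{
    ModelResultRelInfo *es_result_relations;
    int                 es_num_result_relations;
    ModelResultRelInfo *es_root_result_relations;
    int                 es_num_root_result_relations;
} ModelEState;

/*
 * Model of the EState setup done for an EvalPlanQual recheck: the child
 * shares the parent's result-relation arrays.
 */
static void
model_epq_inherit_result_rels(ModelEState *child, const ModelEState *parent)
{
    assert(child != NULL && parent != NULL);

    child->es_result_relations = parent->es_result_relations;
    child->es_num_result_relations = parent->es_num_result_relations;

    /*
     * The step described as missing: without it the child keeps a NULL
     * es_root_result_relations.  Copying the count is an assumption.
     */
    child->es_root_result_relations = parent->es_root_result_relations;
    child->es_num_root_result_relations = parent->es_num_root_result_relations;
}
```

With the root array inherited, any "base + index" lookup done while
initializing the child plan would start from the same valid array as in
the parent.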

Thanks,
Amit
