Thread: Delete cascade with three levels bug ?

Delete cascade with three levels bug ?

From

Marcelo Costa

Date:

27 October 2009, 14:05:36

Hi, hackers<br /><br /> I have a problem at PostgreSQL 8.3.5 (Slackware Server and Win 2003 Server) SO independent.<br
/><br/>When run the scripts below I receive the error:<br /><br
/>---------------------------------------------------------------------------<br/>testes=# DELETE FROM pai WHERE co_pai
=1;<br /> server closed the connection unexpectedly<br />         This probably means the server terminated
abnormally<br/>        before or while processing the request.<br />The connection to the server was lost. Attempting
reset:Failed.<br /> ---------------------------------------------------------------------------<br /><br />The script
createsthe father table, son and grandson. All conected with FK Cascade.<br /><br />I changed the comportment of the
pg_triggerto postpone the validation of constraint<br /> (make compatible with 7.4 version because we migrate it).<br
/><br/>When I delete the father register the next error ocourr.<br /><br />PS1.: On Windows need reinicialize the
service.<br/> PS2.: On Linux all connection down but not is needed reinicialize the service.<br /> PS3.: I make a test
on8.4 version and the same error ocourr.<br /><br /><br /> -------Scripts-------<br /> CREATE TABLE father<br /> (<br
/> co_father double precision NOT NULL,<br />  no_description character varying(50) NOT NULL,<br />  CONSTRAINT
pk_fatherPRIMARY KEY (co_father)<br /> )<br /> WITH (OIDS=TRUE);<br /> ALTER TABLE father OWNER TO postgres;<br /><br
/><br/> CREATE TABLE son<br /> (<br />  co_son double precision NOT NULL,<br />  co_father double precision NOT
NULL,<br/>  no_description character varying(50) NOT NULL,<br />  CONSTRAINT pk_son PRIMARY KEY (co_son)<br /> )<br />
WITH(OIDS=TRUE);<br /> ALTER TABLE son OWNER TO postgres;<br /><br /> CREATE TABLE grandson<br /> (<br />  co_grandson
doubleprecision NOT NULL,<br />  co_son double precision NOT NULL,<br />  no_description character varying(50) NOT
NULL,<br/>  CONSTRAINT pk_grandson PRIMARY KEY (co_grandson)<br /> )<br /> WITH (OIDS=TRUE);<br /> ALTER TABLE son
OWNERTO postgres;<br /><br /> ALTER TABLE son<br />  ADD CONSTRAINT fk_son_father FOREIGN KEY (co_father)<br />    
 REFERENCESfather (co_father) MATCH SIMPLE<br />      ON UPDATE CASCADE ON DELETE CASCADE;<br /><br /> ALTER TABLE
grandson<br/>  ADD CONSTRAINT fk_grandson_son FOREIGN KEY (co_son)<br />      REFERENCES son (co_son) MATCH SIMPLE<br
/>     ON UPDATE CASCADE ON DELETE CASCADE;<br /><br /> UPDATE pg_constraint SET condeferred  = TRUE, condeferrable =
TRUE;<br/> UPDATE pg_trigger SET tgdeferrable = TRUE, tginitdeferred = TRUE;<br /><br /> INSERT INTO father VALUES(1,
'FATHER-1');<br/><br /> INSERT INTO son VALUES(1,1, 'Son FATHER-1');<br /> INSERT INTO son VALUES(2,1, 'Son
FATHER-1');<br/> INSERT INTO son VALUES(3,1, 'Son FATHER-1');<br /><br /> INSERT INTO grandson VALUES(1,1, 'Grandson
FATHER-1');<br/> INSERT INTO grandson VALUES(2,2, 'Grandson FATHER-1');<br /> INSERT INTO grandson VALUES(3,3,
'GrandsonFATHER-1');<br /><br /> DELETE FROM father WHERE co_father = 1<br /><br /><br />My LOGS
-----------------------------------------------------------------------------------------------------------<br/><br
/>0LOG: 00000: server process (PID 23470) was terminated by signal 11: Segmentation fault<br /> 0LOCATION: 
LogChildExit,postmaster.c:2529<br />0LOG:  00000: terminating any other active server processes<br />0LOCATION: 
HandleChildCrash,postmaster.c:2374<br />0FATAL:  57P03: the database system is in recovery mode<br /> 0LOCATION: 
ProcessStartupPacket,postmaster.c:1648<br /> 0LOG:  00000: all server processes terminated; reinitializing<br
/>0LOCATION: PostmasterStateMachine, postmaster.c:2690<br />0LOG:  00000: database system was interrupted; last known
upat 2009-10-27 11:43:37 BRST<br />0LOCATION:  StartupXLOG, xlog.c:4836<br /> 0DEBUG:  00000: checkpoint record is at
1/1C865BD4<br/>0LOCATION:  StartupXLOG, xlog.c:4906<br />0DEBUG:  00000: redo record is at 1/1C865BD4; shutdown TRUE<br
/>0LOCATION: StartupXLOG, xlog.c:4932<br />0DEBUG:  00000: next transaction ID: 0/27113; next OID: 67190<br />
0LOCATION: StartupXLOG, xlog.c:4936<br />0DEBUG:  00000: next MultiXactId: 1; next MultiXactOffset: 0<br />0LOCATION: 
StartupXLOG,xlog.c:4939<br />0LOG:  00000: database system was not properly shut down; automatic recovery in
progress<br/> 0LOCATION:  StartupXLOG, xlog.c:5003<br />0LOG:  00000: record with zero length at 1/1C865C14<br
/>0LOCATION: ReadRecord, xlog.c:3126<br />0LOG:  00000: redo is not required<br />0LOCATION:  StartupXLOG,
xlog.c:5146<br/>0DEBUG:  00000: transaction ID wrap limit is 2147484026, limited by database "template1"<br />
0LOCATION: SetTransactionIdLimit, varsup.c:283<br />0LOG:  00000: database system is ready to accept connections<br
/>0LOCATION: reaper, postmaster.c:2156<br /><br
/>-----------------------------------------------------------------------------------------------------------------------------<br
clear="all"/><br />Has I know, the deletion cascade with three levels don't give this problem. The same think occourr
ifI change the pg_trigger comportment to postpone validation. This is right?<br /><br />Sorry, but what wrap limit do
?<br/><br />Thanks in advanced.<br /><br />Sincerely,<br />-- <br />Marcelo Costa<br /><a
href="http://www.marcelocosta.net"target="_blank"></a>-------------------------------------------------<br />“You can't
alwaysget what want”, <br /><br />Doctor House in apology to Mike Jagger<br />

Re: Delete cascade with three levels bug ?

From

Euler Taveira de Oliveira

Date:

27 October 2009, 14:48:22

Marcelo Costa escreveu:
> Hi, hackers
> 
> I have a problem at PostgreSQL 8.3.5 (Slackware Server and Win 2003
> Server) SO independent.
> 
> When run the scripts below I receive the error:
> 
This is not a bug. There are many ways to shoot yourself in the foot; and it
is one of them...

> UPDATE pg_trigger SET tgdeferrable = TRUE, tginitdeferred = TRUE;
> 
This query will defer all of the triggers, including the cascade ones. If you
compiles with --enable-cassert you will see:

TRAP: FailedAssertion("!(afterTriggers->query_depth >= 0)", Arquivo:
"/a/pgsql/src/backend/commands/trigger.c", Linha: 3893)

Why don't you use the syntax (DEFERRABLE and INITIALLY DEFERRED)? That will do
the Right Thing (TM).

BTW, is it worth preventing such a crash putting an elog message in trigger.c?


--  Euler Taveira de Oliveira http://www.timbira.com/

Re: Delete cascade with three levels bug ?

From

Robert Haas

Date:

27 October 2009, 17:01:02

On Tue, Oct 27, 2009 at 10:41 AM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> Marcelo Costa escreveu:
>> Hi, hackers
>>
>> I have a problem at PostgreSQL 8.3.5 (Slackware Server and Win 2003
>> Server) SO independent.
>>
>> When run the scripts below I receive the error:
>>
> This is not a bug. There are many ways to shoot yourself in the foot; and it
> is one of them...
>
>> UPDATE pg_trigger SET tgdeferrable = TRUE, tginitdeferred = TRUE;
>>
> This query will defer all of the triggers, including the cascade ones. If you
> compiles with --enable-cassert you will see:
>
> TRAP: FailedAssertion("!(afterTriggers->query_depth >= 0)", Arquivo:
> "/a/pgsql/src/backend/commands/trigger.c", Linha: 3893)
>
> Why don't you use the syntax (DEFERRABLE and INITIALLY DEFERRED)? That will do
> the Right Thing (TM).
>
> BTW, is it worth preventing such a crash putting an elog message in trigger.c?

It doesn't seem right to allow a catalog change that results in an
assertion failure.  Seems like we should either prevent the catalog
change, or have an elog() there rather than Assert().

...Robert

Re: Delete cascade with three levels bug ?

From

Tom Lane

Date:

27 October 2009, 17:30:53

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 27, 2009 at 10:41 AM, Euler Taveira de Oliveira
>> BTW, is it worth preventing such a crash putting an elog message in trigger.c?

> It doesn't seem right to allow a catalog change that results in an
> assertion failure.  Seems like we should either prevent the catalog
> change, or have an elog() there rather than Assert().

We can't prevent a superuser from making stupid catalog changes.  I
agree though that it's not very nice to Assert or dump core afterwards.
Changing the assert to an elog seems like an appropriate answer,
assuming that this isn't masking some more significant issue.
        regards, tom lane

Re: Delete cascade with three levels bug ?

From

Tom Lane

Date:

27 October 2009, 18:31:48

Marcelo Costa <marcelojscosta@gmail.com> writes:
> [ trying to defer RI_FKey_cascade_del trigger crashes the backend ]

I looked at this a bit more and think that there's more to it than pilot
error.  The crash occurs because we queue a deferred trigger here:

#0  AfterTriggerSaveEvent (relinfo=0x40181778, event=1, row_trigger=1 '\001', oldtup=0x4018c1b8, newtup=0x0,
recheckIndexes=0x0,modifiedCols=0x0) at trigger.c:3890
 
#1  0x2db9c0 in ExecARDeleteTriggers (estate=0x40181778, relinfo=0x40181778, tupleid=0x4018c1b8) at trigger.c:1896
#2  0x317124 in ExecDelete (tupleid=0x7b03c388, planSlot=0x4018b2c0, epqstate=0x4018b664, estate=0x401816a8) at
nodeModifyTable.c:368
#3  0x317894 in ExecModifyTable (node=0x4018b618) at nodeModifyTable.c:738
#4  0x2fe01c in ExecProcNode (node=0x4018b618) at execProcnode.c:359
#5  0x2fc034 in ExecutePlan (estate=0x401816a8, planstate=0x4018b618, operation=CMD_DELETE, sendTuples=0 '\000',
numberTuples=0,direction=ForwardScanDirection, dest=0x40003a80) at execMain.c:1188
 
#6  0x2fadf0 in standard_ExecutorRun (queryDesc=0x4017e2a0, direction=ForwardScanDirection, count=0) at execMain.c:278
#7  0x2fac94 in ExecutorRun (queryDesc=0x40181778, direction=ForwardScanDirection, count=1) at execMain.c:227
#8  0x3268c8 in _SPI_pquery (queryDesc=0x4017e2a0, fire_triggers=0 '\000', tcount=0) at spi.c:2009
#9  0x326520 in _SPI_execute_plan (plan=0x40180288, paramLI=0x4017e280, snapshot=0x0, crosscheck_snapshot=0x0,
read_only=0,fire_triggers=0 '\000', tcount=0) at spi.c:1831
 
#10 0x32383c in SPI_execute_snapshot (plan=0x40180288, Values=0x7b03be88, Nulls=0x7b03bf88 " ", snapshot=0x0,
crosscheck_snapshot=0x0,read_only=0, fire_triggers=0, tcount=0) at spi.c:441
 
#11 0x493d08 in ri_PerformCheck (qkey=0x7b03bb20, qplan=0x40180288, fk_rel=0x40133600, pk_rel=0x401262f0,
old_tuple=0x7b03b5a8,new_tuple=0x0, detectNewRows=1, expect_OK=8, constrname=0x7b03b8cc "fk_son_father") at
ri_triggers.c:3350
#12 0x4901d4 in RI_FKey_cascade_del (fcinfo=0x40181778) at ri_triggers.c:1166
#13 0x2db140 in ExecCallTriggerFunc (trigdata=0x7b03b588, tgindx=0, finfo=0x7b03b850, instr=0x0,
per_tuple_context=0x40153b60)at trigger.c:1631
 
#14 0x2dccc8 in AfterTriggerExecute (event=0x40170678, rel=0x401262f0, trigdesc=0x1, finfo=0x4016a0d0, instr=0x0,
per_tuple_context=0x40153b60)at trigger.c:2830
 
#15 0x2dd038 in afterTriggerInvokeEvents (events=0x40170a58, firing_id=1, estate=0x40169e40, delete_ok=1 '\001') at
trigger.c:3009
#16 0x2dd5ac in AfterTriggerFireDeferred () at trigger.c:3247
#17 0x1fd230 in CommitTransaction () at xact.c:1581
#18 0x1fdfe8 in CommitTransactionCommand () at xact.c:2323
#19 0x3ed0e8 in finish_xact_command () at postgres.c:2372
#20 0x3eae20 in exec_simple_query (query_string=0x400625b8 "DELETE FROM father WHERE co_father = 1;") at
postgres.c:1039
#21 0x3ef344 in PostgresMain (argc=2, argv=0x51, username=0x4005f9d8 "postgres") at postgres.c:3573

when we are not inside any AfterTriggerBeginQuery/AfterTriggerEndQuery
pair.  Normally _SPI_pquery() would take care of that detail, but it's
been specifically told not to by the RI trigger code (notice the
fire_triggers=0 arguments).  It is not immediately obvious that an
RI trigger query could never cause AFTER triggers to be queued, so
I think this is at least a latent bug, even if this particular symptom
involves intentional misconfiguration of standard triggers.

The two reasonable fixes seem to be to change our minds about not
letting SPI fire deferred triggers in this usage, or to add code to
AfterTriggerFireDeferred to invoke AfterTriggerBegin/EndQuery.
I do not recall what the reasoning was for deferring triggers during
ri_triggers calls, and am afraid to monkey with that behavior unless
someone can lay out an argument why it's okay to change.  So modifying
AfterTriggerFireDeferred seems like the right solution.
        regards, tom lane

Re: Delete cascade with three levels bug ?

From

Tom Lane

Date:

27 October 2009, 19:39:24

I wrote:
> The crash occurs because we queue a deferred trigger here ...
> when we are not inside any AfterTriggerBeginQuery/AfterTriggerEndQuery
> pair.  Normally _SPI_pquery() would take care of that detail, but it's
> been specifically told not to by the RI trigger code (notice the
> fire_triggers=0 arguments).  It is not immediately obvious that an
> RI trigger query could never cause AFTER triggers to be queued, so
> I think this is at least a latent bug, even if this particular symptom
> involves intentional misconfiguration of standard triggers.

After further study I've gone back to the let's-just-make-the-assert-an-elog
camp.  The reason we have an issue is exactly that the cascade-delete
trigger assumes that it will never be deferred, which is how come it
can get away with telling SPI to not deal with any invoked AFTER triggers
immediately.  There isn't any good reason to support deferring that
trigger, nor is there any way to get into the state short of clueless
superuser manipulation of the catalogs.

I did look at modifying AfterTriggerFireDeferred, but concluded that's
not a great solution on two grounds:

* there doesn't seem to be any way to do it without taking a noticeable
performance hit (in particular, an extra useless scan over the deferred
trigger queue to verify everything's been fired);

* the crash occurs all the way back to 8.0, and such a fix would be a
great deal more trouble to back-port.

So elog it is ...
        regards, tom lane