Thread: Bug #514: Backend crashes periodically
Warren Volz (wrv@po.cwru.edu) reports a bug with a severity of 2 The lower the number the more severe it is. Short Description Backend crashes periodically Long Description I am using Postgres version 7.1.3 for a webmail application. At what seems like random intervals the backend dies with aerror status of 11. Whenever the backend dies the mod_perl webserver must be restarted to re-establish the DB connection.The only way I have found to reproduce this problem is to run several consecutive VACUUM commands. Here are the relevant details: DB Version: PostgreSQL v7.1.3 OS: Linux 2.4.5 #10 Fri Jun 22 02:20:21 PDT 2001 i686 Startup command: pg_ctl start -D /home/pgsql/data -l /var/local/pgsql/errlog -o "-F" Changes to config file: max_connections = 64, shared_buffers = 128, wal_files = 16 I managed to capture the core file and recompiled with -g to get the following trace: Core was generated by `postgres: sis sis [local] UPDATE '. Program terminated with signal 11, Segmentation fault. (Omitted symbol loading) #0 ExecEvalVar (variable=0x8239888, econtext=0x8239ed8, isNull=0x823b7c4 "") at execQual.c:323 323 heapTuple = slot->val; The output in our pglog file is: Nov 10 20:32:31 wedge postgres[1839]: [1-1] NOTICE: Message from PostgreSQL backend: Nov 10 20:32:31 wedge postgres[1839]: [1-2] ^IThe Postmaster has informed me that some other backend^Idied abnormally andpossibly corrupted shared memory. Nov 10 20:32:31 wedge postgres[1839]: [1-3] ^II have rolled back the current transaction and am^Igoing to terminate yourdatabase system connection and exit. Server process (pid 13230) exited with status 11 at Sat Nov 10 13:21:31 2001 Terminating any active server processes... Server processes were terminated at Sat Nov 10 13:21:31 2001 Sample Code No file was uploaded with this report
pgsql-bugs@postgresql.org writes: > Program terminated with signal 11, Segmentation fault. > (Omitted symbol loading) > #0 ExecEvalVar (variable=0x8239888, econtext=0x8239ed8, isNull=0x823b7c4 "") at execQual.c:323 > 323 heapTuple = slot->val; This isn't enough to say anything useful, but please send the results of bt and p debug_query_string in this same corefile. regards, tom lane
Tom, Here ya go... bt output: #0 ExecEvalVar (variable=0x8239888, econtext=0x8239ed8, isNull=0x823b7c4 "") at execQual.c:323 #1 0x80c1a2b in ExecEvalExpr (expression=0x8239888, econtext=0x8239ed8, isNull=0x823b7c4 "", isDone=0xbfffea0c) at execQual.c:1187 #2 0x80c1431 in ExecEvalFuncArgs (fcache=0x823b760, argList=0x82398b0, econtext=0x8239ed8) at execQual.c:614 #3 0x80c14b0 in ExecMakeFunctionResult (fcache=0x823b760, arguments=0x82398b0, econtext=0x8239ed8, isNull=0xbfffeadf "", isDone=0x0) at execQual.c:667 #4 0x80c165e in ExecEvalOper (opClause=0x8239838, econtext=0x8239ed8, isNull=0xbfffeadf "", isDone=0x0) at execQual.c:859 #5 0x80c1aa0 in ExecEvalExpr (expression=0x8239838, econtext=0x8239ed8, isNull=0xbfffeadf "", isDone=0x0) at execQual.c:1222 #6 0x80c1bc6 in ExecQual (qual=0x8239908, econtext=0x8239ed8, resultForNull=0 '\000') at execQual.c:1374 #7 0x80c56db in IndexNext (node=0x8238c98) at nodeIndexscan.c:133 #8 0x80c2063 in ExecScan (node=0x8238c98, accessMtd=0x80c55e0 <IndexNext>) at execScan.c:95 #9 0x80c58af in ExecIndexScan (node=0x8238c98) at nodeIndexscan.c:283 #10 0x80c0729 in ExecProcNode (node=0x8238c98, parent=0x8237d60) at execProcnode.c:282 #11 0x80c724c in ExecNestLoop (node=0x8237d60) at nodeNestloop.c:165 #12 0x80c0759 in ExecProcNode (node=0x8237d60, parent=0x8237d60) at execProcnode.c:297 #13 0x80c0392 in EvalPlanQualNext (estate=0x822ea18) at execMain.c:1884 #14 0x80c0369 in EvalPlanQual (estate=0x822ea18, rti=4, tid=0xbfffed28) at execMain.c:1870 #15 0x80bfcbb in ExecReplace (slot=0x822f970, tupleid=0xbfffed98, estate=0x822ea18) at execMain.c:1465 #16 0x80bf951 in ExecutePlan (estate=0x822ea18, plan=0x822df90, operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection, destfunc=0x82374e8) at execMain.c:1125 #17 0x80bedb7 in ExecutorRun (queryDesc=0x822f640, estate=0x822ea18, feature=3, count=0) at execMain.c:233 #18 0x8105ab3 in ProcessQuery (parsetree=0x820cf98, plan=0x822df90, dest=Remote) at pquery.c:295 #19 0x81044d1 in pg_exec_query_string (query_string=0x820caa0 "DELETE FROM session WHERE session_key = 'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'", dest=Remote, parse_context=0x81e6d58) at postgres.c:810 #20 0x8105564 in PostgresMain (argc=4, argv=0xbffff050, real_argc=4, real_argv=0xbffff9c4, username=0x81dcc09 "sis") at postgres.c:1908 #21 0x80f0194 in DoBackend (port=0x81dc9a0) at postmaster.c:2114 #22 0x80efd4c in BackendStartup (port=0x81dc9a0) at postmaster.c:1897 #23 0x80eef79 in ServerLoop () at postmaster.c:995 #24 0x80ee963 in PostmasterMain (argc=4, argv=0xbffff9c4) at postmaster.c:685 #25 0x80cefad in main (argc=4, argv=0xbffff9c4) at main.c:171 #26 0x400e92eb in __libc_start_main (main=0x80cee90 <main>, argc=4, ubp_av=0xbffff9c4, init=0x80653ec <_init>, fini=0x8148c8c <_fini>, rtld_fini=0x4000c130 <_dl_fini>, stack_end=0xbffff9bc) at ../sysdeps/generic/libc-start.c:129 p debug_query_string output: $1 = 0x820caa0 "DELETE FROM session WHERE session_key = 'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'" -Warren >pgsql-bugs@postgresql.org writes: >> Program terminated with signal 11, Segmentation fault. >> (Omitted symbol loading) >> #0 ExecEvalVar (variable=0x8239888, econtext=0x8239ed8, >>isNull=0x823b7c4 "") at execQual.c:323 >> 323 heapTuple = slot->val; > >This isn't enough to say anything useful, but please send the results of > bt >and > p debug_query_string >in this same corefile.
Warren Volz <wrv@po.cwru.edu> writes: > #14 0x80c0369 in EvalPlanQual (estate=0x822ea18, rti=4, > tid=0xbfffed28) at execMain.c:1870 The fact that the crash is down inside EvalPlanQual makes me think that you are seeing the same problem recently reported by Barry Lind (see pgsql-hackers archives for 5-Nov). The only way to get into EvalPlanQual is to be trying to update/delete a tuple that was already updated or deleted by a concurrent transaction that hasn't yet committed. So the problem is not reproducible on a single-backend basis --- but I think it might be fairly easy to reproduce if you manually put two backends through the paces of your application. Something like Session 1 Session 2 begin; update foo where ... begin; update foo where ... [ blocks waiting for session 1 ] end; [ crash? ] EvalPlanQual does work in simple test cases, so there's some extra ingredient needed to cause the problem --- but what? If you can get a reproducible test sequence then I'm sure we can find and fix the error. Barry hasn't come back with a test case yet, so please work on it... regards, tom lane
Warren Volz <wrv@po.cwru.edu> writes: > p debug_query_string output: > $1 = 0x820caa0 "DELETE FROM session WHERE session_key = > 'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'" BTW, do you have rules or triggers on that table? The backtrace shows the system executing a nestloop join, which implies at least a two-table query, which says that there's more going on than meets the eye here. It would be useful to see the full schema for this table and any related tables, as well as an EXPLAIN VERBOSE for the above query. regards, tom lane
Tom, Yeah, there is a trigger. Here are the relevant bits of the schema, followed by the output of EXPLAIN VERBOSE on a similar query. Thanks, Warren -- -- Table structure for table 'session' -- CREATE TABLE session ( session_key CHAR(40) PRIMARY KEY, sis_user_id INTEGER NOT NULL REFERENCES sis_user (sis_user_id), last_access_time DATETIME NOT NULL DEFAULT TEXT 'now' ); -- A rule for keeping the last_visit updated. CREATE RULE session_del AS ON DELETE TO session DO UPDATE sis_user SET last_visit = OLD.last_access_time WHERE OLD.sis_user_id = sis_user.sis_user_id; CREATE TABLE sis_user ( sis_user_id INTEGER PRIMARY KEY REFERENCES member (member_id), last_name CHAR(30), cwrunet_id CHAR(10) UNIQUE, email_address CHAR(80) NOT NULL UNIQUE, description VARCHAR(128), field_of_study VARCHAR(80), favorite_quote VARCHAR(128), last_visit DATETIME NOT NULL DEFAULT TEXT 'now', second_to_last_visit DATETIME NOT NULL DEFAULT TEXT 'now', email_bounced SMALLINT NOT NULL DEFAULT 0, on_vacation_until DATETIME, password CHAR(13), privacy INTEGER NOT NULL, last_ip_address CHAR(15) DEFAULT '0.0.0.0', checked_mail BOOLEAN NOT NULL DEFAULT FALSE, num_messages INTEGER DEFAULT 0 ); sis=> explain verbose delete from session where session_key='ys!aoUC3kAF2UJ46vz6pSIz1UpsfuLps71YwaT+P'; NOTICE: QUERY DUMP: { NESTLOOP :startup_cost 0.00 :total_cost 35.23 :rows 35 :width 145 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname sis_user_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 1}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype 1042 :restypmod 34 :resname last_name :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 2 :vartype 1042 :vartypmod 34 :varlevelsup 0 :varnoold 4 :varoattno 2}} { TARGETENTRY :resdom { RESDOM :resno 3 :restype 1042 :restypmod 14 :resname cwrunet_id :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 3 :vartype 1042 :vartypmod 14 :varlevelsup 0 :varnoold 4 :varoattno 3}} { TARGETENTRY :resdom { RESDOM :resno 4 :restype 1042 :restypmod 84 :resname email_address :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 4 :vartype 1042 :vartypmod 84 :varlevelsup 0 :varnoold 4 :varoattno 4}} { TARGETENTRY :resdom { RESDOM :resno 5 :restype 1043 :restypmod 132 :resname description :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 5 :vartype 1043 :vartypmod 132 :varlevelsup 0 :varnoold 4 :varoattno 5}} { TARGETENTRY :resdom { RESDOM :resno 6 :restype 1043 :restypmod 84 :resname field_of_study :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 6 :vartype 1043 :vartypmod 84 :varlevelsup 0 :varnoold 4 :varoattno 6}} { TARGETENTRY :resdom { RESDOM :resno 7 :restype 1043 :restypmod 132 :resname favorite_quote :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 7 :vartype 1043 :vartypmod 132 :varlevelsup 0 :varnoold 4 :varoattno 7}} { TARGETENTRY :resdom { RESDOM :resno 8 :restype 1184 :restypmod -1 :resname last_visit :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65001 :varattno 1 :vartype 1184 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3}} { TARGETENTRY :resdom { RESDOM :resno 9 :restype 1184 :restypmod -1 :resname second_to_last_visit :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 8 :vartype 1184 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 9}} { TARGETENTRY :resdom { RESDOM :resno 10 :restype 21 :restypmod -1 :resname email_bounced :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 9 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 10}} { TARGETENTRY :resdom { RESDOM :resno 11 :restype 1184 :restypmod -1 :resname on_vacation_until :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 10 :vartype 1184 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 11}} { TARGETENTRY :resdom { RESDOM :resno 12 :restype 1042 :restypmod 17 :resname password :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 11 :vartype 1042 :vartypmod 17 :varlevelsup 0 :varnoold 4 :varoattno 12}} { TARGETENTRY :resdom { RESDOM :resno 13 :restype 23 :restypmod -1 :resname privacy :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 12 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 13}} { TARGETENTRY :resdom { RESDOM :resno 14 :restype 1042 :restypmod 19 :resname last_ip_address :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 13 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 4 :varoattno 14}} { TARGETENTRY :resdom { RESDOM :resno 15 :restype 16 :restypmod -1 :resname checked_mail :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 14 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 15}} { TARGETENTRY :resdom { RESDOM :resno 16 :restype 23 :restypmod -1 :resname num_messages :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000 :varattno 15 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 16}} { TARGETENTRY :resdom { RESDOM :resno 17 :restype 27 :restypmod -1 :resname ctid :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk true } :expr { VAR :varno 65000 :varattno 16 :vartype 27 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno -1}}) :qpqual <> :lefttree { SEQSCAN :startup_cost 0.00 :total_cost 1.43 :rows 1 :width 12 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1184 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 3 :vartype 1184 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype 23 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2}}) :qpqual ({ EXPR :typeOid 16 :opType op :oper { OPER :opno 1054 :opid 1048 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 44 :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST :consttype 1042 :constlen -1 :constbyval false :constisnull false :constvalue 44 [ 44 0 0 0 121 115 33 97 111 85 67 51 107 65 70 50 85 74 52 54 118 122 54 112 83 73 122 49 85 112 115 102 117 76 112 115 55 49 89 119 97 84 43 80 ] })}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 } :righttree { INDEXSCAN :startup_cost 0.00 :total_cost 33.36 :rows 35 :width 133 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 1}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype 1042 :restypmod 34 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 2 :vartype 1042 :vartypmod 34 :varlevelsup 0 :varnoold 4 :varoattno 2}} { TARGETENTRY :resdom { RESDOM :resno 3 :restype 1042 :restypmod 14 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 3 :vartype 1042 :vartypmod 14 :varlevelsup 0 :varnoold 4 :varoattno 3}} { TARGETENTRY :resdom { RESDOM :resno 4 :restype 1042 :restypmod 84 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 4 :vartype 1042 :vartypmod 84 :varlevelsup 0 :varnoold 4 :varoattno 4}} { TARGETENTRY :resdom { RESDOM :resno 5 :restype 1043 :restypmod 132 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 5 :vartype 1043 :vartypmod 132 :varlevelsup 0 :varnoold 4 :varoattno 5}} { TARGETENTRY :resdom { RESDOM :resno 6 :restype 1043 :restypmod 84 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 6 :vartype 1043 :vartypmod 84 :varlevelsup 0 :varnoold 4 :varoattno 6}} { TARGETENTRY :resdom { RESDOM :resno 7 :restype 1043 :restypmod 132 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 7 :vartype 1043 :vartypmod 132 :varlevelsup 0 :varnoold 4 :varoattno 7}} { TARGETENTRY :resdom { RESDOM :resno 8 :restype 1184 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 9 :vartype 1184 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 9}} { TARGETENTRY :resdom { RESDOM :resno 9 :restype 21 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 10 :vartype 21 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 10}} { TARGETENTRY :resdom { RESDOM :resno 10 :restype 1184 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 11 :vartype 1184 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 11}} { TARGETENTRY :resdom { RESDOM :resno 11 :restype 1042 :restypmod 17 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 12 :vartype 1042 :vartypmod 17 :varlevelsup 0 :varnoold 4 :varoattno 12}} { TARGETENTRY :resdom { RESDOM :resno 12 :restype 23 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 13 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 13}} { TARGETENTRY :resdom { RESDOM :resno 13 :restype 1042 :restypmod 19 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 14 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 4 :varoattno 14}} { TARGETENTRY :resdom { RESDOM :resno 14 :restype 16 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 15 :vartype 16 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 15}} { TARGETENTRY :resdom { RESDOM :resno 15 :restype 23 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno 16 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 16}} { TARGETENTRY :resdom { RESDOM :resno 16 :restype 27 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4 :varattno -1 :vartype 27 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno -1}}) :qpqual <> :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 4 :indxid ( 183804) :indxqual (({ EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16 } :args ({ VAR :varno 4 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 1} { VAR :varno 65001 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2})})) :indxqualorig (({ EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16 } :args ({ VAR :varno 65001 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2} { VAR :varno 4 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 4 :varoattno 1})})) :indxorderdir 0 } :extprm () :locprm () :initplan <> :nprm 0 :jointype 0 :joinqual <>} NOTICE: QUERY PLAN: Nested Loop (cost=0.00..35.23 rows=35 width=145) -> Seq Scan on session (cost=0.00..1.43 rows=1 width=12) -> Index Scan using sis_user_pkey on sis_user (cost=0.00..33.36 rows=35 width=133) NOTICE: QUERY DUMP: { SEQSCAN :startup_cost 0.00 :total_cost 1.43 :rows 1 :width 6 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 27 :restypmod -1 :resname ctid :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk true } :expr { VAR :varno 1 :varattno -1 :vartype 27 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno -1}}) :qpqual ({ EXPR :typeOid 16 :opType op :oper { OPER :opno 1054 :opid 1048 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 44 :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST :consttype 1042 :constlen -1 :constbyval false :constisnull false :constvalue 44 [ 44 0 0 0 121 115 33 97 111 85 67 51 107 65 70 50 85 74 52 54 118 122 54 112 83 73 122 49 85 112 115 102 117 76 112 115 55 49 89 119 97 84 43 80 ] })}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 } NOTICE: QUERY PLAN: Seq Scan on session (cost=0.00..1.43 rows=1 width=6) EXPLAIN At 10:53 PM -0500 11/11/01, Tom Lane wrote: >Warren Volz <wrv@po.cwru.edu> writes: >> p debug_query_string output: > >> $1 = 0x820caa0 "DELETE FROM session WHERE session_key = >> 'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'" > >BTW, do you have rules or triggers on that table? The backtrace shows >the system executing a nestloop join, which implies at least a two-table >query, which says that there's more going on than meets the eye here. >It would be useful to see the full schema for this table and any related >tables, as well as an EXPLAIN VERBOSE for the above query. > > regards, tom lane
Okay, I've extracted a reproducible test case from Warren's info: Setup: CREATE TABLE sis_user ( sis_user_id INTEGER PRIMARY KEY, last_visit DATETIME NOT NULL DEFAULT TEXT 'now'); CREATE TABLE session ( session_key CHAR(40) PRIMARY KEY, sis_user_id INTEGER NOT NULL REFERENCES sis_user(sis_user_id), last_access_time DATETIME NOT NULL DEFAULT TEXT 'now'); CREATE RULE session_del AS ON DELETE TO session DO UPDATE sis_user SET last_visit = OLD.last_access_time WHERE OLD.sis_user_id = sis_user.sis_user_id; insert into sis_user values(1); In session 1, do: insert into session values('zzz', 1); begin; delete from session where session_key = 'zzz'; In session 2, do: delete from session where session_key = 'zzz'; Back to session 1: end; Session 2 crashes. Crash happens in current sources as well as 7.1. The crash does NOT happen if the foreign key reference is removed from session.sis_user_id. I'm not sure at this point whether the rule is a crucial element or not. The problem appears to be generation of an invalid plan tree: the Var being decoded has varno = OUTER, which offhand I believe it should not have in an indexscan's qual tree. I'm tired and am going to bed soon, but perhaps someone else would like to poke at this before I get back to it... regards, tom lane
I believe I have found and fixed the crash-in-EvalPlanQual problem you two reported. The patch for current sources is attached --- it should work in 7.1.* too, although line numbers might be off a bit. regards, tom lane *** src/backend/executor/nodeIndexscan.c.orig Mon Oct 29 15:30:53 2001 --- src/backend/executor/nodeIndexscan.c Mon Nov 12 12:18:06 2001 *************** *** 339,348 **** { /* * If we are being passed an outer tuple, save it for runtime key ! * calc */ if (exprCtxt != NULL) econtext->ecxt_outertuple = exprCtxt->ecxt_outertuple; /* * Reset the runtime-key context so we don't leak memory as each --- 339,355 ---- { /* * If we are being passed an outer tuple, save it for runtime key ! * calc. We also need to link it into the "regular" per-tuple ! * econtext, so it can be used during indexqualorig evaluations. */ if (exprCtxt != NULL) + { + ExprContext *stdecontext; + econtext->ecxt_outertuple = exprCtxt->ecxt_outertuple; + stdecontext = node->scan.scanstate->cstate.cs_ExprContext; + stdecontext->ecxt_outertuple = exprCtxt->ecxt_outertuple; + } /* * Reset the runtime-key context so we don't leak memory as each
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > BTW I have a question about this example. > Session 2 deletes no row because *session 1* already > deleted the rows but the session_del rule seems to > be invoked. Is it a right behavior ? Certainly the rule should be invoked --- whether it finds any rows cannot be determined until it's invoked, anyhow. Remember a rule is not a trigger, it is a transform applied to the query before we begin execution. regards, tom lane
Tom Lane wrote: > > Okay, I've extracted a reproducible test case from Warren's info: > > Setup: > > CREATE TABLE sis_user ( > sis_user_id INTEGER PRIMARY KEY, > last_visit DATETIME NOT NULL DEFAULT TEXT 'now'); > > CREATE TABLE session ( > session_key CHAR(40) PRIMARY KEY, > sis_user_id INTEGER NOT NULL REFERENCES sis_user(sis_user_id), > last_access_time DATETIME NOT NULL DEFAULT TEXT 'now'); > > CREATE RULE session_del AS ON DELETE TO session DO > UPDATE sis_user SET last_visit = OLD.last_access_time > WHERE OLD.sis_user_id = sis_user.sis_user_id; > > insert into sis_user values(1); > > In session 1, do: > > insert into session values('zzz', 1); > begin; > delete from session where session_key = 'zzz'; > > In session 2, do: > > delete from session where session_key = 'zzz'; > > Back to session 1: > > end; Thanks for your quick fix as usual. BTW I have a question about this example. Session 2 deletes no row because *session 1* already deleted the rows but the session_del rule seems to be invoked. Is it a right behavior ? regards, Hiroshi Inoue
Tom, Thanks for figuring this out. I applied the patch and the test case no longer causes the backend to crash. I will let you know if I have more problems with this bug. -Warren At 12:23 PM -0500 11/12/01, Tom Lane wrote: >I believe I have found and fixed the crash-in-EvalPlanQual problem you >two reported. The patch for current sources is attached --- it should >work in 7.1.* too, although line numbers might be off a bit.
Tom Lane wrote: > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > BTW I have a question about this example. > > Session 2 deletes no row because *session 1* already > > deleted the rows but the session_del rule seems to > > be invoked. Is it a right behavior ? > > Certainly the rule should be invoked --- whether it finds any rows > cannot be determined until it's invoked, anyhow. Remember a rule > is not a trigger, it is a transform applied to the query before > we begin execution. If it's right, my way is to not use our rule system. regards, Hiroshi Inoue
Tom Lane wrote: > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > BTW I have a question about this example. > > Session 2 deletes no row because *session 1* already > > deleted the rows but the session_del rule seems to > > be invoked. Is it a right behavior ? > > Certainly the rule should be invoked --- whether it finds any rows > cannot be determined until it's invoked, anyhow. Remember a rule > is not a trigger, it is a transform applied to the query before > we begin execution. If it's > > regards, tom lane
Tom Lane wrote: > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > BTW I have a question about this example. > > Session 2 deletes no row because *session 1* already > > deleted the rows but the session_del rule seems to > > be invoked. Is it a right behavior ? > > Certainly the rule should be invoked --- whether it finds any rows > cannot be determined until it's invoked, anyhow. Remember a rule > is not a trigger, it is a transform applied to the query before > we begin execution. Hmm I may have misunderstood your explanation. What I meant is Session 2 invokes the session_del rule and really updates a sis_user row by the rule though it deletes no session row. regards, Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > What I meant is > Session 2 invokes the session_del rule and really > updates a sis_user row by the rule though it deletes > no session row. Hmmm ... that's an ugly thought, isn't it? And I'm not sure there's anything we can do to defend against it. If both sessions are executing the UPDATE at the same time, then neither can possibly know that the other is about to do a DELETE. So the UPDATE will happen twice, which is harmless in the given scenario but would be decidedly not so if the UPDATE were changing some sort of total or balance. Perhaps a safer way to write such a rule would be ... ON DELETE DO SELECT * FROM target_table WHERE key = OLD.key FOR UPDATE; UPDATE other_table SET balance = balance + OLD.quantity WHERE something-or-other; I haven't experimented with this but it seems that the FOR UPDATE ought to provide the necessary interlock to ensure that only one transaction does the UPDATE for a particular target row. If this does work, is there a way to make the FOR UPDATE lock happen implicitly in the rule mechanism? Should we even try? It's a lot of overhead that may not always be needed. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > What I meant is > > Session 2 invokes the session_del rule and really > > updates a sis_user row by the rule though it deletes > > no session row. > > Hmmm ... that's an ugly thought, isn't it? And I'm not sure there's > anything we can do to defend against it. If both sessions are executing > the UPDATE at the same time, then neither can possibly know that the > other is about to do a DELETE. So the UPDATE will happen twice, which > is harmless in the given scenario but would be decidedly not so if the > UPDATE were changing some sort of total or balance. Yes it seems a pretty serious problem. The problem is that session 2 sees a not yet deleted( by session 1) session row and an already updated( by session 1) sis_user row at the same time. There's no such snapshot that could see both rows. How do you think about the following query ? update sis_user set last_visit = session.last_access_time where sis_user_id = session.sis_user_id and session.session_key = 'zzz'; UPDATE acquires row level locks on the target sis_user rows but doesn't acuiqre any row level lock on the related session rows. Could it guarantee the consistency of the query ? regards, Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > The problem is that session 2 sees a not yet deleted( by session 1) > session row and an already updated( by session 1) sis_user row at > the same time. There's no such snapshot that could see both rows. Running in SERIALIZABLE mode would guarantee consistency, no? (Session 2 would be forced to roll back and try again, and when it tried again it would see the session row already gone.) I doubt that there is any lesser answer that guarantees consistency across multiple queries. regards, tom lane
Tom Lane wrote: > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > The problem is that session 2 sees a not yet deleted( by session 1) > > session row and an already updated( by session 1) sis_user row at > > the same time. There's no such snapshot that could see both rows. > > Running in SERIALIZABLE mode would guarantee consistency, no? > (Session 2 would be forced to roll back and try again, and when > it tried again it would see the session row already gone.) Yes but there would be no one who uses SERIALIZABLE mode because of such reason. In READ COMMITTED mode, PG system couldn't abort easily and it makes the implementation of READ COMMITTED mode much harder than that of SERIALIZABLE. It's impossible to implement READ COMMITTED mode without changing snapshots appropriately and acquire appropriate locks carefully. regards, Hiroshi Inoue