Thread: Segfaults and assertion failures with not too extraordinary views and queries
I have been attempting to migrate my application from 8.1 to 8.2.3. In doing so, I found some queries would always cause the postgres backend to die with a segfault. I was advised to rebuild with -- enable-debug --enable-cassert, and so I did. The same query would now cause an assertion failure instead of segfaulting. I reduced a dump of my database as much as possible and arrived at this: -------------------------------------------- SET client_min_messages = warning; CREATE SCHEMA private; CREATE TABLE private.orderitem ( objectid integer ); CREATE FUNCTION public.i_have_global_priv() RETURNS boolean AS $$ select true; $$ LANGUAGE sql STABLE SECURITY DEFINER; -- removing security definer avoids the problem CREATE TABLE private.orderitemproduct ( objectid integer NOT NULL ); CREATE VIEW public.orderitemproduct AS SELECT orderitemproduct.objectid FROM private.orderitemproduct WHERE i_have_global_priv(); -- not calling i_have_global_priv avoids the problem CREATE VIEW public.orderitem_with_prices AS SELECT 1 FROM private.orderitem LEFT JOIN orderitemproduct USING (objectid); -- removing either pkey avoids the problem ALTER TABLE ONLY private.orderitem ADD CONSTRAINT orderitem_pkey PRIMARY KEY (objectid); ALTER TABLE ONLY private.orderitemproduct ADD CONSTRAINT orderitemproduct_pkey PRIMARY KEY (objectid); SELECT * FROM orderitem_with_prices; -- succeeds SELECT * FROM orderitem_with_prices limit 1; -- fails, output below -------------------------------------------- Apparently, even asking "EXPLAIN SELECT * FROM orderitem_with_prices limit 1;" causes the same problem. The server log will contain this: TRAP: FailedAssertion("!(!(eflags & 0x0008))", File: "nodeResult.c", Line: 183) LOG: server process (PID 12838) was terminated by signal 6 LOG: terminating any other active server processes LOG: all server processes terminated; reinitializing LOG: database system was interrupted at 2007-02-14 17:00:32 EST LOG: checkpoint record is at 0/4AA38710 LOG: redo record is at 0/4AA38710; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 0/12285; next OID: 2457841 LOG: next MultiXactId: 1; next MultiXactOffset: 0 LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 0/4AA38758 LOG: unexpected pageaddr 0/43A54000 in log file 0, segment 74, offset 10829824 LOG: redo done at 0/4AA53B24 LOG: database system is ready gcc provides the wisdom: Program received signal SIGABRT, Aborted. 0x9004796c in kill () (gdb) bt #0 0x9004796c in kill () #1 0x9012dc14 in abort () #2 0x00206fa0 in ExceptionalCondition (conditionName=0x2 <Address 0x2 out of bounds>, errorType=0x25 <Address 0x25 out of bounds>, fileName=0x8 <Address 0x8 out of bounds>, lineNumber=80) at assert.c:51 #3 0x000f8c18 in ExecInitResult (node=0xe48740, estate=0x204901c, eflags=8) at nodeResult.c:183 #4 0x000e60f8 in ExecInitNode (node=0xe48740, estate=0x204901c, eflags=8) at execProcnode.c:141 #5 0x000f7aa0 in ExecInitMergeJoin (node=0xe4885c, estate=0x204901c, eflags=0) at nodeMergejoin.c:1539 #6 0x000e61c8 in ExecInitNode (node=0xe4885c, estate=0x204901c, eflags=0) at execProcnode.c:212 #7 0x000fa64c in ExecInitLimit (node=0xe488e8, estate=0x204901c, eflags=0) at nodeLimit.c:337 #8 0x000e6258 in ExecInitNode (node=0xe488e8, estate=0x204901c, eflags=0) at execProcnode.c:260 #9 0x000e5274 in ExecutorStart (queryDesc=0x2045c48, eflags=0) at execMain.c:628 #10 0x001863b8 in PortalStart (portal=0x204701c, params=0x0, snapshot=0x0) at pquery.c:426 #11 0x00182388 in exec_simple_query (query_string=0x202da1c "select * from orderitem_with_prices limit 1;") at postgres.c:902 #12 0x00183c98 in PostgresMain (argc=4, argv=0x2011790, username=0x20118a8 "postgres") at postgres.c:3424 #13 0x00154880 in ServerLoop () at postmaster.c:2931 #14 0x00155d38 in PostmasterMain (argc=3, argv=0x1900750) at postmaster.c:963 #15 0x001094fc in main (argc=3, argv=0x1900780) at main.c:188 pg_config tells me that i run: BINDIR = /usr/local/pgsql/bin DOCDIR = /usr/local/pgsql/doc INCLUDEDIR = /usr/local/pgsql/include PKGINCLUDEDIR = /usr/local/pgsql/include INCLUDEDIR-SERVER = /usr/local/pgsql/include/server LIBDIR = /usr/local/pgsql/lib PKGLIBDIR = /usr/local/pgsql/lib LOCALEDIR = MANDIR = /usr/local/pgsql/man SHAREDIR = /usr/local/pgsql/share SYSCONFDIR = /usr/local/pgsql/etc PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk CONFIGURE = '--with-python' '--with-openssl' '--enable-debug' '-- enable-cassert' CC = gcc -no-cpp-precomp CPPFLAGS = CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline - Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g CFLAGS_SL = LDFLAGS = LDFLAGS_SL = LIBS = -lpgport -lssl -lcrypto -lz -lreadline -lm VERSION = PostgreSQL 8.2.3 uname adds: Darwin Coding-Mac.local 8.8.0 Darwin Kernel Version 8.8.0: Fri Sep 8 17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC Power Macintosh powerpc
Re: Segfaults and assertion failures with not too extraordinary views and queries
From
Tom Lane
Date:
Phil Frost <phil@macprofessionals.com> writes: > I have been attempting to migrate my application from 8.1 to 8.2.3. > In doing so, I found some queries would always cause the postgres > backend to die with a segfault. I was advised to rebuild with -- > enable-debug --enable-cassert, and so I did. The same query would now > cause an assertion failure instead of segfaulting. Hm, I see the assert failure, but this example doesn't seem to crash when asserts are off, and I'd not expect it to: it should either work or elog(ERROR) in ExecRestrPos. So maybe you've found more than one issue. Can you get a stack trace from a case that causes a non-assert core dump? (You don't need to rebuild, just set debug_assertions = 0 while testing.) regards, tom lane
Re: Segfaults and assertion failures with not too extraordinary views and queries
From
Tom Lane
Date:
I wrote: > Hm, I see the assert failure, but this example doesn't seem to crash > when asserts are off, and I'd not expect it to: it should either work or > elog(ERROR) in ExecRestrPos. So maybe you've found more than one issue. The attached patch fixes the failure you exhibited, but I still don't see how this problem would lead to a non-assert crash --- what you should get is elog(ERROR, "unrecognized node type") from ExecRestrPos. So please try some more test cases and see if there's another problem lurking. BTW, the problem was due to code added in response to an earlier gripe of yours (allowing gating conditions to be pushed further down in the join tree). Maybe you're the only person stressing that particular aspect of the planner ... regards, tom lane
Attachment
Re: Segfaults and assertion failures with not too extraordinary views and queries
From
Phil Frost
Date:
Agh...I was afraid of that. What I've found so far is at <http://pgfoundry.org/pipermail/veil- general/2007-February/000056.html>, and the rest of thread in general. Obviously some of these problems are veil's, but for the test I sent previously I had deleted veil.so so it couldn't be blamed. I'll explore the problem more today and see if I can get a backtrace with a debug version and debug_assertions off. On Feb 14, 2007, at 5:49 PM, Tom Lane wrote: > Phil Frost <phil@macprofessionals.com> writes: >> I have been attempting to migrate my application from 8.1 to 8.2.3. >> In doing so, I found some queries would always cause the postgres >> backend to die with a segfault. I was advised to rebuild with -- >> enable-debug --enable-cassert, and so I did. The same query would now >> cause an assertion failure instead of segfaulting. > > Hm, I see the assert failure, but this example doesn't seem to crash > when asserts are off, and I'd not expect it to: it should either > work or > elog(ERROR) in ExecRestrPos. So maybe you've found more than one > issue. > Can you get a stack trace from a case that causes a non-assert core > dump? (You don't need to rebuild, just set debug_assertions = 0 while > testing.) > > regards, tom lane >
Re: Segfaults and assertion failures with not too extraordinary views and queries
From
Phil Frost
Date:
Yeah, maybe I am the only one doing this to the planner. The optimizations which caused this error will be greatly appreciated when I stop getting segfaults, however :) I rebuilt my pg with the attached patch and it did indeed resolve the assertion failure the previous test produced. I tinkered some more and got this patched build to segfault with a slightly different script, below. Part of the funkiness seems to be in that the view is recursive; orderitem_with_prices calls sum_of_subitem_total_taxable_prices, which queries orderitem_with_prices. I would not be suprised if no one else is doing this, since I think the only way to get it done under normal conditions is to create an empty function first (since the view doesn't exist), make the view, then replace the function; or to set check_function_bodies = false. As weird as it is though, it worked in 8.1 and a segfault is bad in any case. ------------------------------------- SET check_function_bodies = false; SET client_min_messages = warning; CREATE SCHEMA private; CREATE TABLE private.orderitem ( objectid integer, showsubitems boolean, longdescription character varying, showsubitemprices boolean, quantity numeric, sortorder real, superitem integer, notes character varying, name character varying, "order" integer ); CREATE FUNCTION public.i_have_global_priv() RETURNS boolean AS $$ select true $$ LANGUAGE sql STABLE SECURITY DEFINER; CREATE VIEW public.orderitem AS SELECT orderitem.objectid, orderitem.superitem FROM private.orderitem WHERE i_have_global_priv(); CREATE FUNCTION public.sum_of_subitem_total_taxable_prices(integer) RETURNS numeric AS $_$ select 1.0 from orderitem_with_prices where superitem = $1 -- changing this to just "select 1.0" avoids the problem $_$ LANGUAGE sql STABLE STRICT; CREATE TABLE private.orderitemproduct ( objectid integer ); CREATE VIEW public.orderitemproduct AS SELECT orderitemproduct.objectid FROM private.orderitemproduct WHERE i_have_global_priv(); CREATE VIEW public.orderitem_with_prices AS SELECT orderitem.objectid, orderitem.superitem, sum_of_subitem_total_taxable_prices(orderitem.objectid) AS taxablepriceeach FROM (orderitem LEFT JOIN orderitemproduct USING (objectid)); insert into private.orderitem(objectid) select generate_series(1, 6000); insert into private.orderitem(objectid, superitem) select generate_series(6001, 12000), generate_series(1, 6000); insert into private.orderitemproduct(objectid) select generate_series (1, 6000); ALTER TABLE ONLY private.orderitem ADD CONSTRAINT orderitem_pkey PRIMARY KEY (objectid); ALTER TABLE ONLY private.orderitemproduct ADD CONSTRAINT orderitemproduct_pkey PRIMARY KEY (objectid); -------------------------------- To produce the segfault, I do this: vacuum analyze; set enable_seqscan = 0; prepare crash as select * from orderitem_with_prices limit 1; explain execute crash; execute crash; The error seems to depend on the plan chosen; here it won't fail if I don't vacuum. The plan generated for me is: QUERY PLAN ------------------------------------------------------------------------ --------------------------------------------------- Limit (cost=0.01..0.06 rows=1 width=8) -> Result (cost=0.01..680.51 rows=12000 width=8) One-Time Filter: i_have_global_priv() -> Merge Left Join (cost=0.01..650.51 rows=12000 width=8) Merge Cond: (private.orderitem.objectid = private.orderitemproduct.objectid) -> Index Scan using orderitem_pkey on orderitem (cost=0.00..361.25 rows=12000 width=8) -> Result (cost=0.00..184.25 rows=6000 width=4) One-Time Filter: i_have_global_priv() -> Index Scan using orderitemproduct_pkey on orderitemproduct (cost=0.00..184.25 rows=6000 width=4) I get this backtrace: Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x0000000c ExecEvalVar (exprstate=0xed0aec, econtext=0xed0b08, isNull=0xbfffc5c8 "????", isDone=0x0) at execQual.c:496 496 TupleDesc slot_tupdesc = slot- >tts_tupleDescriptor; (gdb) bt #0 ExecEvalVar (exprstate=0xed0aec, econtext=0xed0b08, isNull=0xbfffc5c8 "????", isDone=0x0) at execQual.c:496 #1 0x000e9304 in ExecEvalExprSwitchContext (expression=0xed0aec, econtext=0xed0b08, isNull=0xbfffc5c8 "????", isDone=0x0) at execQual.c:3302 #2 0x000f5750 in ExecIndexEvalRuntimeKeys (econtext=0xed0b08, runtimeKeys=0xed0b08, numRuntimeKeys=1) at nodeIndexscan.c:249 #3 0x000f5810 in ExecIndexReScan (node=0xed0458, exprCtxt=0xed0b08) at nodeIndexscan.c:202 #4 0x000e2040 in ExecReScan (node=0xed0458, exprCtxt=0x20692ec) at execAmi.c:125 #5 0x000e1ff0 in ExecReScan (node=0x206a5a8, exprCtxt=0x20692ec) at execAmi.c:105 #6 0x000f7f88 in ExecNestLoop (node=0x2069260) at nodeNestloop.c:162 #7 0x000e59e0 in ExecProcNode (node=0x2069260) at execProcnode.c:382 #8 0x000f8a5c in ExecResult (node=0x2068cb4) at nodeResult.c:130 #9 0x000e5950 in ExecProcNode (node=0x2068cb4) at execProcnode.c:334 #10 0x000e405c in ExecutorRun (queryDesc=0xeccef4, direction=ForwardScanDirection, count=1) at execMain.c:1082 #11 0x000eee44 in postquel_getnext (es=0xece2b8) at functions.c:359 #12 0x000eff98 in fmgr_sql (fcinfo=0xbfffcdf8) at functions.c:460 #13 0x000e8fe8 in ExecMakeFunctionResult (fcache=0x2072794, econtext=0x2072664, isNull=0xec4fca "", isDone=0xec5030) at execQual.c:1269 #14 0x000e99b0 in ExecProject (projInfo=0x2072bec, isDone=0xbfffd108) at execQual.c:4119 #15 0x000f8a90 in ExecResult (node=0x20725d8) at nodeResult.c:157 #16 0x000e5950 in ExecProcNode (node=0x20725d8) at execProcnode.c:334 #17 0x000fa268 in ExecLimit (node=0x20724b4) at nodeLimit.c:84 #18 0x000e5a80 in ExecProcNode (node=0x20724b4) at execProcnode.c:425 #19 0x000e405c in ExecutorRun (queryDesc=0x205f048, direction=ForwardScanDirection, count=0) at execMain.c:1082 #20 0x0018567c in PortalRunSelect (portal=0x20466a8, forward=1 '\001', count=0, dest=0x204603c) at pquery.c:831 #21 0x00186ef8 in PortalRun (portal=0x20466a8, count=2147483647, dest=0x204603c, altdest=0x204603c, completionTag=0x0) at pquery.c:656 #22 0x000bc7b4 in ExecuteQuery (stmt=0x2031cf0, params=0x0, dest=0x204603c, completionTag=0x0) at prepare.c:216 #23 0x00185d60 in PortalRunUtility (portal=0x204661c, query=0x2031d98, dest=0x204603c, completionTag=0x0) at pquery.c:1063 #24 0x00186900 in FillPortalStore (portal=0x204661c) at pquery.c:937 #25 0x00186fac in PortalRun (portal=0x204661c, count=2147483647, dest=0x2031d4c, altdest=0x2031d4c, completionTag=0xbfffdc5a "") at pquery.c:679 #26 0x001823c8 in exec_simple_query (query_string=0x2031a1c "execute crash;") at postgres.c:939 #27 0x00183c54 in PostgresMain (argc=4, argv=0x201544c, username=0x2015424 "pfrost") at postgres.c:3424 #28 0x0015483c in ServerLoop () at postmaster.c:2931 #29 0x00155cf4 in PostmasterMain (argc=3, argv=0x1900750) at postmaster.c:963 #30 0x001094b8 in main (argc=3, argv=0x1900780) at main.c:188 On Feb 14, 2007, at 10:13 PM, Tom Lane wrote: > I wrote: >> Hm, I see the assert failure, but this example doesn't seem to crash >> when asserts are off, and I'd not expect it to: it should either >> work or >> elog(ERROR) in ExecRestrPos. So maybe you've found more than one >> issue. > > The attached patch fixes the failure you exhibited, but I still don't > see how this problem would lead to a non-assert crash --- what you > should get is elog(ERROR, "unrecognized node type") from ExecRestrPos. > So please try some more test cases and see if there's another problem > lurking. > > BTW, the problem was due to code added in response to an earlier > gripe of > yours (allowing gating conditions to be pushed further down in the > join > tree). Maybe you're the only person stressing that particular aspect > of the planner ... > > regards, tom lane
Re: Segfaults and assertion failures with not too extraordinary views and queries
From
Tom Lane
Date:
Phil Frost <phil@macprofessionals.com> writes: > I rebuilt my pg with the attached patch and it did indeed resolve the > assertion failure the previous test produced. I tinkered some more > and got this patched build to segfault with a slightly different > script, below. Sigh ... I looked through nodeResult to see if there were any other bits of pass-through functionality that were missing, but I did not recognize the need for this hack ... probably should have, seeing that I had to do the same thing to Append last year. regards, tom lane
Attachment
Re: Segfaults and assertion failures with not too extraordinary views and queries
From
Phil Frost
Date:
Great, this seems to solve the problem. Thanks for the prompt attention. On Feb 15, 2007, at 10:52 PM, Tom Lane wrote: > Phil Frost <phil@macprofessionals.com> writes: >> I rebuilt my pg with the attached patch and it did indeed resolve the >> assertion failure the previous test produced. I tinkered some more >> and got this patched build to segfault with a slightly different >> script, below. > > Sigh ... I looked through nodeResult to see if there were any other > bits > of pass-through functionality that were missing, but I did not > recognize > the need for this hack ... probably should have, seeing that I had > to do > the same thing to Append last year.