Thread: Segfaults and assertion failures with not too extraordinary views and queries

Segfaults and assertion failures with not too extraordinary views and queries

From
Phil Frost
Date:
I have been attempting to migrate my application from 8.1 to 8.2.3.
In doing so, I found some queries would always cause the postgres
backend to die with a segfault. I was advised to rebuild with --
enable-debug --enable-cassert, and so I did. The same query would now
cause an assertion failure instead of segfaulting. I reduced a dump
of my database as much as possible and arrived at this:


--------------------------------------------

SET client_min_messages = warning;


CREATE SCHEMA private;


CREATE TABLE private.orderitem (
     objectid integer
);


CREATE FUNCTION public.i_have_global_priv() RETURNS boolean
     AS $$
     select true;
$$
     LANGUAGE sql STABLE SECURITY DEFINER;       -- removing security
definer avoids the problem


CREATE TABLE private.orderitemproduct (
     objectid integer NOT NULL
);


CREATE VIEW public.orderitemproduct AS
     SELECT orderitemproduct.objectid FROM private.orderitemproduct
WHERE i_have_global_priv();  -- not calling i_have_global_priv avoids
the problem


CREATE VIEW public.orderitem_with_prices AS
     SELECT 1 FROM private.orderitem LEFT JOIN orderitemproduct USING
(objectid);


-- removing either pkey avoids the problem

ALTER TABLE ONLY private.orderitem
     ADD CONSTRAINT orderitem_pkey PRIMARY KEY (objectid);

ALTER TABLE ONLY private.orderitemproduct
     ADD CONSTRAINT orderitemproduct_pkey PRIMARY KEY (objectid);


SELECT * FROM orderitem_with_prices; -- succeeds
SELECT * FROM orderitem_with_prices limit 1; -- fails, output below

--------------------------------------------

Apparently, even asking "EXPLAIN SELECT * FROM orderitem_with_prices
limit 1;" causes the same problem. The server log will contain this:

TRAP: FailedAssertion("!(!(eflags & 0x0008))", File: "nodeResult.c",
Line: 183)
LOG:  server process (PID 12838) was terminated by signal 6
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2007-02-14 17:00:32 EST
LOG:  checkpoint record is at 0/4AA38710
LOG:  redo record is at 0/4AA38710; undo record is at 0/0; shutdown
FALSE
LOG:  next transaction ID: 0/12285; next OID: 2457841
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system was not properly shut down; automatic recovery
in progress
LOG:  redo starts at 0/4AA38758
LOG:  unexpected pageaddr 0/43A54000 in log file 0, segment 74,
offset 10829824
LOG:  redo done at 0/4AA53B24
LOG:  database system is ready


gcc provides the wisdom:


Program received signal SIGABRT, Aborted.
0x9004796c in kill ()
(gdb) bt
#0  0x9004796c in kill ()
#1  0x9012dc14 in abort ()
#2  0x00206fa0 in ExceptionalCondition (conditionName=0x2 <Address
0x2 out of bounds>, errorType=0x25 <Address 0x25 out of bounds>,
fileName=0x8 <Address 0x8 out of bounds>, lineNumber=80) at assert.c:51
#3  0x000f8c18 in ExecInitResult (node=0xe48740, estate=0x204901c,
eflags=8) at nodeResult.c:183
#4  0x000e60f8 in ExecInitNode (node=0xe48740, estate=0x204901c,
eflags=8) at execProcnode.c:141
#5  0x000f7aa0 in ExecInitMergeJoin (node=0xe4885c, estate=0x204901c,
eflags=0) at nodeMergejoin.c:1539
#6  0x000e61c8 in ExecInitNode (node=0xe4885c, estate=0x204901c,
eflags=0) at execProcnode.c:212
#7  0x000fa64c in ExecInitLimit (node=0xe488e8, estate=0x204901c,
eflags=0) at nodeLimit.c:337
#8  0x000e6258 in ExecInitNode (node=0xe488e8, estate=0x204901c,
eflags=0) at execProcnode.c:260
#9  0x000e5274 in ExecutorStart (queryDesc=0x2045c48, eflags=0) at
execMain.c:628
#10 0x001863b8 in PortalStart (portal=0x204701c, params=0x0,
snapshot=0x0) at pquery.c:426
#11 0x00182388 in exec_simple_query (query_string=0x202da1c "select *
from orderitem_with_prices limit 1;") at postgres.c:902
#12 0x00183c98 in PostgresMain (argc=4, argv=0x2011790,
username=0x20118a8 "postgres") at postgres.c:3424
#13 0x00154880 in ServerLoop () at postmaster.c:2931
#14 0x00155d38 in PostmasterMain (argc=3, argv=0x1900750) at
postmaster.c:963
#15 0x001094fc in main (argc=3, argv=0x1900780) at main.c:188


pg_config tells me that i run:


BINDIR = /usr/local/pgsql/bin
DOCDIR = /usr/local/pgsql/doc
INCLUDEDIR = /usr/local/pgsql/include
PKGINCLUDEDIR = /usr/local/pgsql/include
INCLUDEDIR-SERVER = /usr/local/pgsql/include/server
LIBDIR = /usr/local/pgsql/lib
PKGLIBDIR = /usr/local/pgsql/lib
LOCALEDIR =
MANDIR = /usr/local/pgsql/man
SHAREDIR = /usr/local/pgsql/share
SYSCONFDIR = /usr/local/pgsql/etc
PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-python' '--with-openssl' '--enable-debug' '--
enable-cassert'
CC = gcc -no-cpp-precomp
CPPFLAGS =
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -
Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g
CFLAGS_SL =
LDFLAGS =
LDFLAGS_SL =
LIBS = -lpgport -lssl -lcrypto -lz -lreadline -lm
VERSION = PostgreSQL 8.2.3


uname adds:


Darwin Coding-Mac.local 8.8.0 Darwin Kernel Version 8.8.0: Fri Sep  8
17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC Power
Macintosh powerpc
Phil Frost <phil@macprofessionals.com> writes:
> I have been attempting to migrate my application from 8.1 to 8.2.3.
> In doing so, I found some queries would always cause the postgres
> backend to die with a segfault. I was advised to rebuild with --
> enable-debug --enable-cassert, and so I did. The same query would now
> cause an assertion failure instead of segfaulting.

Hm, I see the assert failure, but this example doesn't seem to crash
when asserts are off, and I'd not expect it to: it should either work or
elog(ERROR) in ExecRestrPos.  So maybe you've found more than one issue.
Can you get a stack trace from a case that causes a non-assert core
dump?  (You don't need to rebuild, just set debug_assertions = 0 while
testing.)

            regards, tom lane
I wrote:
> Hm, I see the assert failure, but this example doesn't seem to crash
> when asserts are off, and I'd not expect it to: it should either work or
> elog(ERROR) in ExecRestrPos.  So maybe you've found more than one issue.

The attached patch fixes the failure you exhibited, but I still don't
see how this problem would lead to a non-assert crash --- what you
should get is elog(ERROR, "unrecognized node type") from ExecRestrPos.
So please try some more test cases and see if there's another problem
lurking.

BTW, the problem was due to code added in response to an earlier gripe of
yours (allowing gating conditions to be pushed further down in the join
tree).  Maybe you're the only person stressing that particular aspect
of the planner ...

            regards, tom lane


Attachment
Agh...I was afraid of that.

What I've found so far is at <http://pgfoundry.org/pipermail/veil-
general/2007-February/000056.html>, and the rest of thread in
general. Obviously some of these problems are veil's, but for the
test I sent previously I had deleted veil.so so it couldn't be
blamed. I'll explore the problem more today and see if I can get a
backtrace with a debug version and debug_assertions off.

On Feb 14, 2007, at 5:49 PM, Tom Lane wrote:

> Phil Frost <phil@macprofessionals.com> writes:
>> I have been attempting to migrate my application from 8.1 to 8.2.3.
>> In doing so, I found some queries would always cause the postgres
>> backend to die with a segfault. I was advised to rebuild with --
>> enable-debug --enable-cassert, and so I did. The same query would now
>> cause an assertion failure instead of segfaulting.
>
> Hm, I see the assert failure, but this example doesn't seem to crash
> when asserts are off, and I'd not expect it to: it should either
> work or
> elog(ERROR) in ExecRestrPos.  So maybe you've found more than one
> issue.
> Can you get a stack trace from a case that causes a non-assert core
> dump?  (You don't need to rebuild, just set debug_assertions = 0 while
> testing.)
>
>             regards, tom lane
>
Yeah, maybe I am the only one doing this to the planner. The
optimizations which caused this error will be greatly appreciated
when I stop getting segfaults, however :)

I rebuilt my pg with the attached patch and it did indeed resolve the
assertion failure the previous test produced. I tinkered some more
and got this patched build to segfault with a slightly different
script, below.

Part of the funkiness seems to be in that the view is recursive;
orderitem_with_prices calls sum_of_subitem_total_taxable_prices,
which queries orderitem_with_prices. I would not be suprised if no
one else is doing this, since I think the only way to get it done
under normal conditions is to create an empty function first (since
the view doesn't exist), make the view, then replace the function; or
to set check_function_bodies = false. As weird as it is though, it
worked in 8.1 and a segfault is bad in any case.

-------------------------------------

SET check_function_bodies = false;
SET client_min_messages = warning;


CREATE SCHEMA private;


CREATE TABLE private.orderitem (
     objectid integer,
     showsubitems boolean,
     longdescription character varying,
     showsubitemprices boolean,
     quantity numeric,
     sortorder real,
     superitem integer,
     notes character varying,
     name character varying,
     "order" integer
);


CREATE FUNCTION public.i_have_global_priv() RETURNS boolean
     AS $$
     select true
$$
     LANGUAGE sql STABLE SECURITY DEFINER;


CREATE VIEW public.orderitem AS
     SELECT orderitem.objectid,
     orderitem.superitem
     FROM private.orderitem WHERE i_have_global_priv();


CREATE FUNCTION public.sum_of_subitem_total_taxable_prices(integer)
RETURNS numeric
     AS $_$
     select 1.0 from orderitem_with_prices where superitem = $1  --
changing this to just "select 1.0" avoids the problem
$_$
     LANGUAGE sql STABLE STRICT;



CREATE TABLE private.orderitemproduct (
     objectid integer
);


CREATE VIEW public.orderitemproduct AS
     SELECT orderitemproduct.objectid
     FROM private.orderitemproduct WHERE i_have_global_priv();


CREATE VIEW public.orderitem_with_prices AS
     SELECT orderitem.objectid,
     orderitem.superitem,
     sum_of_subitem_total_taxable_prices(orderitem.objectid) AS
taxablepriceeach
     FROM (orderitem LEFT JOIN orderitemproduct USING (objectid));


insert into private.orderitem(objectid) select generate_series(1, 6000);
insert into private.orderitem(objectid, superitem) select
generate_series(6001, 12000), generate_series(1, 6000);

insert into private.orderitemproduct(objectid) select generate_series
(1, 6000);


ALTER TABLE ONLY private.orderitem
     ADD CONSTRAINT orderitem_pkey PRIMARY KEY (objectid);

ALTER TABLE ONLY private.orderitemproduct
     ADD CONSTRAINT orderitemproduct_pkey PRIMARY KEY (objectid);

--------------------------------

To produce the segfault, I do this:

vacuum analyze;
set enable_seqscan = 0;
prepare crash as select * from orderitem_with_prices limit 1;
explain execute crash;
execute crash;

The error seems to depend on the plan chosen; here it won't fail if I
don't vacuum. The plan generated for me is:

                                                         QUERY PLAN
------------------------------------------------------------------------
---------------------------------------------------
Limit  (cost=0.01..0.06 rows=1 width=8)
    ->  Result  (cost=0.01..680.51 rows=12000 width=8)
          One-Time Filter: i_have_global_priv()
          ->  Merge Left Join  (cost=0.01..650.51 rows=12000 width=8)
                Merge Cond: (private.orderitem.objectid =
private.orderitemproduct.objectid)
                ->  Index Scan using orderitem_pkey on orderitem
(cost=0.00..361.25 rows=12000 width=8)
                ->  Result  (cost=0.00..184.25 rows=6000 width=4)
                      One-Time Filter: i_have_global_priv()
                      ->  Index Scan using orderitemproduct_pkey on
orderitemproduct  (cost=0.00..184.25 rows=6000 width=4)

I get this backtrace:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000c
ExecEvalVar (exprstate=0xed0aec, econtext=0xed0b08, isNull=0xbfffc5c8
"????", isDone=0x0) at execQual.c:496
496                             TupleDesc       slot_tupdesc = slot-
 >tts_tupleDescriptor;
(gdb) bt
#0  ExecEvalVar (exprstate=0xed0aec, econtext=0xed0b08,
isNull=0xbfffc5c8 "????", isDone=0x0) at execQual.c:496
#1  0x000e9304 in ExecEvalExprSwitchContext (expression=0xed0aec,
econtext=0xed0b08, isNull=0xbfffc5c8 "????", isDone=0x0) at
execQual.c:3302
#2  0x000f5750 in ExecIndexEvalRuntimeKeys (econtext=0xed0b08,
runtimeKeys=0xed0b08, numRuntimeKeys=1) at nodeIndexscan.c:249
#3  0x000f5810 in ExecIndexReScan (node=0xed0458, exprCtxt=0xed0b08)
at nodeIndexscan.c:202
#4  0x000e2040 in ExecReScan (node=0xed0458, exprCtxt=0x20692ec) at
execAmi.c:125
#5  0x000e1ff0 in ExecReScan (node=0x206a5a8, exprCtxt=0x20692ec) at
execAmi.c:105
#6  0x000f7f88 in ExecNestLoop (node=0x2069260) at nodeNestloop.c:162
#7  0x000e59e0 in ExecProcNode (node=0x2069260) at execProcnode.c:382
#8  0x000f8a5c in ExecResult (node=0x2068cb4) at nodeResult.c:130
#9  0x000e5950 in ExecProcNode (node=0x2068cb4) at execProcnode.c:334
#10 0x000e405c in ExecutorRun (queryDesc=0xeccef4,
direction=ForwardScanDirection, count=1) at execMain.c:1082
#11 0x000eee44 in postquel_getnext (es=0xece2b8) at functions.c:359
#12 0x000eff98 in fmgr_sql (fcinfo=0xbfffcdf8) at functions.c:460
#13 0x000e8fe8 in ExecMakeFunctionResult (fcache=0x2072794,
econtext=0x2072664, isNull=0xec4fca "", isDone=0xec5030) at
execQual.c:1269
#14 0x000e99b0 in ExecProject (projInfo=0x2072bec, isDone=0xbfffd108)
at execQual.c:4119
#15 0x000f8a90 in ExecResult (node=0x20725d8) at nodeResult.c:157
#16 0x000e5950 in ExecProcNode (node=0x20725d8) at execProcnode.c:334
#17 0x000fa268 in ExecLimit (node=0x20724b4) at nodeLimit.c:84
#18 0x000e5a80 in ExecProcNode (node=0x20724b4) at execProcnode.c:425
#19 0x000e405c in ExecutorRun (queryDesc=0x205f048,
direction=ForwardScanDirection, count=0) at execMain.c:1082
#20 0x0018567c in PortalRunSelect (portal=0x20466a8, forward=1
'\001', count=0, dest=0x204603c) at pquery.c:831
#21 0x00186ef8 in PortalRun (portal=0x20466a8, count=2147483647,
dest=0x204603c, altdest=0x204603c, completionTag=0x0) at pquery.c:656
#22 0x000bc7b4 in ExecuteQuery (stmt=0x2031cf0, params=0x0,
dest=0x204603c, completionTag=0x0) at prepare.c:216
#23 0x00185d60 in PortalRunUtility (portal=0x204661c,
query=0x2031d98, dest=0x204603c, completionTag=0x0) at pquery.c:1063
#24 0x00186900 in FillPortalStore (portal=0x204661c) at pquery.c:937
#25 0x00186fac in PortalRun (portal=0x204661c, count=2147483647,
dest=0x2031d4c, altdest=0x2031d4c, completionTag=0xbfffdc5a "") at
pquery.c:679
#26 0x001823c8 in exec_simple_query (query_string=0x2031a1c "execute
crash;") at postgres.c:939
#27 0x00183c54 in PostgresMain (argc=4, argv=0x201544c,
username=0x2015424 "pfrost") at postgres.c:3424
#28 0x0015483c in ServerLoop () at postmaster.c:2931
#29 0x00155cf4 in PostmasterMain (argc=3, argv=0x1900750) at
postmaster.c:963
#30 0x001094b8 in main (argc=3, argv=0x1900780) at main.c:188



On Feb 14, 2007, at 10:13 PM, Tom Lane wrote:

> I wrote:
>> Hm, I see the assert failure, but this example doesn't seem to crash
>> when asserts are off, and I'd not expect it to: it should either
>> work or
>> elog(ERROR) in ExecRestrPos.  So maybe you've found more than one
>> issue.
>
> The attached patch fixes the failure you exhibited, but I still don't
> see how this problem would lead to a non-assert crash --- what you
> should get is elog(ERROR, "unrecognized node type") from ExecRestrPos.
> So please try some more test cases and see if there's another problem
> lurking.
>
> BTW, the problem was due to code added in response to an earlier
> gripe of
> yours (allowing gating conditions to be pushed further down in the
> join
> tree).  Maybe you're the only person stressing that particular aspect
> of the planner ...
>
>             regards, tom lane
Phil Frost <phil@macprofessionals.com> writes:
> I rebuilt my pg with the attached patch and it did indeed resolve the
> assertion failure the previous test produced. I tinkered some more
> and got this patched build to segfault with a slightly different
> script, below.

Sigh ... I looked through nodeResult to see if there were any other bits
of pass-through functionality that were missing, but I did not recognize
the need for this hack ... probably should have, seeing that I had to do
the same thing to Append last year.

            regards, tom lane


Attachment
Great, this seems to solve the problem. Thanks for the prompt attention.


On Feb 15, 2007, at 10:52 PM, Tom Lane wrote:

> Phil Frost <phil@macprofessionals.com> writes:
>> I rebuilt my pg with the attached patch and it did indeed resolve the
>> assertion failure the previous test produced. I tinkered some more
>> and got this patched build to segfault with a slightly different
>> script, below.
>
> Sigh ... I looked through nodeResult to see if there were any other
> bits
> of pass-through functionality that were missing, but I did not
> recognize
> the need for this hack ... probably should have, seeing that I had
> to do
> the same thing to Append last year.