Thread: tripping an assert in 8.1.6

tripping an assert in 8.1.6

From
Brian Hurt
Date:
Hello all.  It seems I'm tripping an assert in 8.1.6- the assert on line 
219 of  src/backend/executor/execScan.c (found by running gdb on a core 
dump).  This is on x86 and Redhat Linux (forget which version).  Note 
that if I recompile 8.1.6 with asserts turned off the query completes 
just fine.  I'm trying to put together an example which reproduces the 
problem without requiring half our company's data- that should follow soon.

The gdb backtrace is:

> #0  0xffffe410 in __kernel_vsyscall ()
> (gdb) bt
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0xb7d2dee9 in raise () from /lib/libc.so.6
> #2  0xb7d2f4f1 in abort () from /lib/libc.so.6
> #3  0x0824f931 in ExceptionalCondition (conditionName=Variable 
> "conditionName" is not available.
> ) at assert.c:51
> #4  0x081537ac in ExecAssignScanProjectionInfo (node=0x8426bec)
>     at execScan.c:219
> #5  0x08161339 in ExecInitSubqueryScan (node=0x8412de4, estate=0x8426ad4)
>     at nodeSubqueryscan.c:212
> #6  0x0814e0e4 in ExecInitNode (node=0x8412de4, estate=0x8426ad4)
>     at execProcnode.c:179
> #7  0x0814c554 in ExecutorStart (queryDesc=0x842554c, explainOnly=1 
> '\001')
>     at execMain.c:618
> #8  0x081193f5 in ExplainOnePlan (queryDesc=0x842554c, stmt=0x839afe4,
>     tstate=0x83cbdac) at explain.c:243
> #9  0x081198ac in ExplainOneQuery (query=0x83b88e4, stmt=0x839afe4,
>     tstate=0x83cbdac) at explain.c:214
> #10 0x08119a92 in ExplainQuery (stmt=0x839afe4, dest=0x83b8a54)
>     at explain.c:121
> #11 0x081da391 in PortalRunUtility (portal=0x83b67b4, query=0x839b07c,
>     dest=0x83b8a54, completionTag=0x0) at pquery.c:987
> #12 0x081db6dc in PortalRun (portal=0x83b67b4, count=2147483647,
>     dest=0x839b030, altdest=0x839b030, completionTag=0xbf9efee8 "")
>     at pquery.c:637
> #13 0x081d713c in exec_simple_query (
>     query_string=0x839a26c "explain SELECT action, bloomberg_code, 
> composite_bloomberg_code, reuters_code, cusip_code, sedol_code, 
> isin_code FROM vw_ca_generic_actions WHERE (action_date >= 
> '20070122'::date) AND (action_date <= "...)
>     at postgres.c:1004
> #14 0x081d8bd3 in PostgresMain (argc=4, argv=0x83593f0,
>     username=0x83593b8 "postgres") at postgres.c:3232
> #15 0x081aca37 in ServerLoop () at postmaster.c:2865
> #16 0x081ad936 in PostmasterMain (argc=3, argv=0x8358560) at 
> postmaster.c:941
> #17 0x0816c1c9 in main (argc=3, argv=Cannot access memory at address 
> 0x1515
> ) at main.c:265
>

This is mainly a "heads up- bug incomming" message.  Thanks.

Brian



Re: tripping an assert in 8.1.6 (more info)

From
Brian Hurt
Date:
More info on that assert I've hit.

Compile 8.1.6 with configuration options:

./configure --with-perl --enable-debug --enable-cassert

(not sure if --perl is relevent or not, I think not).

This is on Fedora Core 5 on x86-32.

Execute, on a fresh database, the following sql, to recreate the bug:

CREATE TABLE foo
(    some_data VARCHAR(32)
,    row_date DATE
);

CREATE TABLE bar
(    some_data VARCHAR(32)
,    row_date DATE
);

CREATE TABLE quux
(    some_data VARCHAR(32)
,    more_data VARCHAR(32)
);


CREATE OR REPLACE VIEW bazz AS
SELECT       ('bar: ' || bar.row_date) :: TEXT AS action,       quux.more_data AS more_data,       bar.row_date AS
row_date
FROM       bar       JOIN quux       ON bar.some_data = quux.some_data
UNION ALL
SELECT       ('foo: ' || foo.row_date) :: TEXT AS action,       quux.more_data AS more_data,       foo.row_date AS
row_date
FROM       foo       JOIN quux       ON foo.some_data = quux.some_data
;

SELECT action, more_data FROM bazz;




Re: tripping an assert in 8.1.6 (more info)

From
Brian Hurt
Date:
I forgot to mention: core dumps available upon request (obviously I 
don't want to post them to the list).

Brian



Re: tripping an assert in 8.1.6 (more info)

From
Tom Lane
Date:
Brian Hurt <bhurt@janestcapital.com> writes:
> Execute, on a fresh database, the following sql, to recreate the bug:

Hmm, I thought that stack trace looked a bit familiar --- we seem to
have fixed the problem as of 8.2.  Unfortunately I can't recall what
the relevant change was exactly; time for some digging in the CVS logs.
        regards, tom lane


Re: tripping an assert in 8.1.6 (more info)

From
Brian Hurt
Date:
Tom Lane wrote:<br /><blockquote cite="mid4176.1169583282@sss.pgh.pa.us" type="cite"><pre wrap="">Brian Hurt <a
class="moz-txt-link-rfc2396E"href="mailto:bhurt@janestcapital.com"><bhurt@janestcapital.com></a> writes:
</pre><blockquotetype="cite"><pre wrap="">Execute, on a fresh database, the following sql, to recreate the bug:
</pre></blockquote><prewrap="">
 
Hmm, I thought that stack trace looked a bit familiar --- we seem to
have fixed the problem as of 8.2.  Unfortunately I can't recall what
the relevant change was exactly; time for some digging in the CVS logs.
</pre></blockquote><br /> Any hope of getting the fix back-ported into the 8.1 tree?<br /><br /> Brian<br /><br />

Re: tripping an assert in 8.1.6 (more info)

From
Tom Lane
Date:
Brian Hurt <bhurt@janestcapital.com> writes:
> Tom Lane wrote:
>> Hmm, I thought that stack trace looked a bit familiar --- we seem to
>> have fixed the problem as of 8.2.  Unfortunately I can't recall what
>> the relevant change was exactly; time for some digging in the CVS logs.

> Any hope of getting the fix back-ported into the 8.1 tree?

The bug I was remembering is this one:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00313.php
http://archives.postgresql.org/pgsql-committers/2006-12/msg00055.php
which happens to fail at the same place but the cause is quite different
--- the fix was in code that didn't even exist in 8.1.

What seems to be happening here is that since 8.1 did not try to track
typmod in UNION trees, the view's column is declared as having type
varchar with typmod -1, and so the upper-level Var for the column has
that too.  But because 8.1 strips out "unnecessary" SubqueryScan nodes,
that Var ends up just one level above the outputs of the MergeJoins,
which show the correct typmod from the underlying tables.  So that
Assert gets upset.  (You don't see the bug in 8.0 because it didn't
have that optimization.)

I'm tempted to suggest that we just remove the Assert on vartypmod in
the 8.1 branch.  The Assert on vartype is doing as much as is really
important to check, and I don't want to disable the trivial_subqueryscan
optimization, which seems the only other low-risk fix.
        regards, tom lane


Re: tripping an assert in 8.1.6 (more info)

From
Tom Lane
Date:
I wrote:
> I'm tempted to suggest that we just remove the Assert on vartypmod in
> the 8.1 branch.  The Assert on vartype is doing as much as is really
> important to check, and I don't want to disable the trivial_subqueryscan
> optimization, which seems the only other low-risk fix.

After further thought I've developed a modified version of Brian's case
that crashes 8.2 and HEAD but not 8.1 --- it turns the situation around
by having the view fall back to typmod -1.  So what I'm now thinking
is that the real problem is that an Append path generates its Vars by
copying the first input, and in these sorts of situations that might not
produce the correct typmod.  Back to the drawing board ...
        regards, tom lane


CREATE TABLE foo
(x_data varchar(32),row_date date
);

CREATE TABLE bar
(x_data varchar(36),row_date date
);

CREATE OR REPLACE VIEW bazz AS
SELECT       ('bar: ' || bar.row_date) :: TEXT AS action,       bar.x_data AS more_data,row_date
FROM       bar
UNION ALL
SELECT       ('foo: ' || foo.row_date) :: TEXT AS action,       foo.x_data AS more_data,row_date
FROM       foo
;

SELECT action, more_data FROM bazz;


Re: tripping an assert in 8.1.6 (more info)

From
Tom Lane
Date:
I wrote:
> After further thought I've developed a modified version of Brian's case
> that crashes 8.2 and HEAD but not 8.1 --- it turns the situation around
> by having the view fall back to typmod -1.  So what I'm now thinking
> is that the real problem is that an Append path generates its Vars by
> copying the first input, and in these sorts of situations that might not
> produce the correct typmod.  Back to the drawing board ...

Nope, that's not right either.  What it seems to boil down to is what I
said the first time: removal of trivial subqueryscans can leave us with
a plan that uses an above-a-union Var to describe a value from one of
the union inputs.  Some other 8.2 changes moved the subqueryscans
around, which is why the cases that manifest are a bit different, but
in the end it's the same issue.

I've concluded that this isn't really wrong; it's certainly not worth
disabling the trivial-subqueryscan optimization for, and I think getting
the planner to avoid the problem without that would be tricky.
Accordingly, I've applied the attached patch that just relaxes the
Assert.
        regards, tom lane


Index: execScan.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/execScan.c,v
retrieving revision 1.39
diff -c -r1.39 execScan.c
*** execScan.c    5 Jan 2007 22:19:27 -0000    1.39
--- execScan.c    24 Jan 2007 01:03:03 -0000
***************
*** 215,222 ****             return false;        /* out of order */         if (att_tup->attisdropped)
returnfalse;        /* table contains dropped columns */         Assert(var->vartype == att_tup->atttypid);
 
!         Assert(var->vartypmod == att_tup->atttypmod);          tlist_item = lnext(tlist_item);     }
--- 215,232 ----             return false;        /* out of order */         if (att_tup->attisdropped)
returnfalse;        /* table contains dropped columns */
 
+         /*
+          * Note: usually the Var's type should match the tupdesc exactly,
+          * but in situations involving unions of columns that have different
+          * typmods, the Var may have come from above the union and hence have
+          * typmod -1.  This is a legitimate situation since the Var still
+          * describes the column, just not as exactly as the tupdesc does.
+          * We could change the planner to prevent it, but it'd then insert
+          * projection steps just to convert from specific typmod to typmod -1,
+          * which is pretty silly.
+          */         Assert(var->vartype == att_tup->atttypid);
!         Assert(var->vartypmod == att_tup->atttypmod || var->vartypmod == -1);          tlist_item =
lnext(tlist_item);    }