Thread: tripping an assert in 8.1.6
Hello all. It seems I'm tripping an assert in 8.1.6- the assert on line 219 of src/backend/executor/execScan.c (found by running gdb on a core dump). This is on x86 and Redhat Linux (forget which version). Note that if I recompile 8.1.6 with asserts turned off the query completes just fine. I'm trying to put together an example which reproduces the problem without requiring half our company's data- that should follow soon. The gdb backtrace is: > #0 0xffffe410 in __kernel_vsyscall () > (gdb) bt > #0 0xffffe410 in __kernel_vsyscall () > #1 0xb7d2dee9 in raise () from /lib/libc.so.6 > #2 0xb7d2f4f1 in abort () from /lib/libc.so.6 > #3 0x0824f931 in ExceptionalCondition (conditionName=Variable > "conditionName" is not available. > ) at assert.c:51 > #4 0x081537ac in ExecAssignScanProjectionInfo (node=0x8426bec) > at execScan.c:219 > #5 0x08161339 in ExecInitSubqueryScan (node=0x8412de4, estate=0x8426ad4) > at nodeSubqueryscan.c:212 > #6 0x0814e0e4 in ExecInitNode (node=0x8412de4, estate=0x8426ad4) > at execProcnode.c:179 > #7 0x0814c554 in ExecutorStart (queryDesc=0x842554c, explainOnly=1 > '\001') > at execMain.c:618 > #8 0x081193f5 in ExplainOnePlan (queryDesc=0x842554c, stmt=0x839afe4, > tstate=0x83cbdac) at explain.c:243 > #9 0x081198ac in ExplainOneQuery (query=0x83b88e4, stmt=0x839afe4, > tstate=0x83cbdac) at explain.c:214 > #10 0x08119a92 in ExplainQuery (stmt=0x839afe4, dest=0x83b8a54) > at explain.c:121 > #11 0x081da391 in PortalRunUtility (portal=0x83b67b4, query=0x839b07c, > dest=0x83b8a54, completionTag=0x0) at pquery.c:987 > #12 0x081db6dc in PortalRun (portal=0x83b67b4, count=2147483647, > dest=0x839b030, altdest=0x839b030, completionTag=0xbf9efee8 "") > at pquery.c:637 > #13 0x081d713c in exec_simple_query ( > query_string=0x839a26c "explain SELECT action, bloomberg_code, > composite_bloomberg_code, reuters_code, cusip_code, sedol_code, > isin_code FROM vw_ca_generic_actions WHERE (action_date >= > '20070122'::date) AND (action_date <= "...) > at postgres.c:1004 > #14 0x081d8bd3 in PostgresMain (argc=4, argv=0x83593f0, > username=0x83593b8 "postgres") at postgres.c:3232 > #15 0x081aca37 in ServerLoop () at postmaster.c:2865 > #16 0x081ad936 in PostmasterMain (argc=3, argv=0x8358560) at > postmaster.c:941 > #17 0x0816c1c9 in main (argc=3, argv=Cannot access memory at address > 0x1515 > ) at main.c:265 > This is mainly a "heads up- bug incomming" message. Thanks. Brian
More info on that assert I've hit. Compile 8.1.6 with configuration options: ./configure --with-perl --enable-debug --enable-cassert (not sure if --perl is relevent or not, I think not). This is on Fedora Core 5 on x86-32. Execute, on a fresh database, the following sql, to recreate the bug: CREATE TABLE foo ( some_data VARCHAR(32) , row_date DATE ); CREATE TABLE bar ( some_data VARCHAR(32) , row_date DATE ); CREATE TABLE quux ( some_data VARCHAR(32) , more_data VARCHAR(32) ); CREATE OR REPLACE VIEW bazz AS SELECT ('bar: ' || bar.row_date) :: TEXT AS action, quux.more_data AS more_data, bar.row_date AS row_date FROM bar JOIN quux ON bar.some_data = quux.some_data UNION ALL SELECT ('foo: ' || foo.row_date) :: TEXT AS action, quux.more_data AS more_data, foo.row_date AS row_date FROM foo JOIN quux ON foo.some_data = quux.some_data ; SELECT action, more_data FROM bazz;
I forgot to mention: core dumps available upon request (obviously I don't want to post them to the list). Brian
Brian Hurt <bhurt@janestcapital.com> writes: > Execute, on a fresh database, the following sql, to recreate the bug: Hmm, I thought that stack trace looked a bit familiar --- we seem to have fixed the problem as of 8.2. Unfortunately I can't recall what the relevant change was exactly; time for some digging in the CVS logs. regards, tom lane
Tom Lane wrote:<br /><blockquote cite="mid4176.1169583282@sss.pgh.pa.us" type="cite"><pre wrap="">Brian Hurt <a class="moz-txt-link-rfc2396E"href="mailto:bhurt@janestcapital.com"><bhurt@janestcapital.com></a> writes: </pre><blockquotetype="cite"><pre wrap="">Execute, on a fresh database, the following sql, to recreate the bug: </pre></blockquote><prewrap=""> Hmm, I thought that stack trace looked a bit familiar --- we seem to have fixed the problem as of 8.2. Unfortunately I can't recall what the relevant change was exactly; time for some digging in the CVS logs. </pre></blockquote><br /> Any hope of getting the fix back-ported into the 8.1 tree?<br /><br /> Brian<br /><br />
Brian Hurt <bhurt@janestcapital.com> writes: > Tom Lane wrote: >> Hmm, I thought that stack trace looked a bit familiar --- we seem to >> have fixed the problem as of 8.2. Unfortunately I can't recall what >> the relevant change was exactly; time for some digging in the CVS logs. > Any hope of getting the fix back-ported into the 8.1 tree? The bug I was remembering is this one: http://archives.postgresql.org/pgsql-hackers/2006-12/msg00313.php http://archives.postgresql.org/pgsql-committers/2006-12/msg00055.php which happens to fail at the same place but the cause is quite different --- the fix was in code that didn't even exist in 8.1. What seems to be happening here is that since 8.1 did not try to track typmod in UNION trees, the view's column is declared as having type varchar with typmod -1, and so the upper-level Var for the column has that too. But because 8.1 strips out "unnecessary" SubqueryScan nodes, that Var ends up just one level above the outputs of the MergeJoins, which show the correct typmod from the underlying tables. So that Assert gets upset. (You don't see the bug in 8.0 because it didn't have that optimization.) I'm tempted to suggest that we just remove the Assert on vartypmod in the 8.1 branch. The Assert on vartype is doing as much as is really important to check, and I don't want to disable the trivial_subqueryscan optimization, which seems the only other low-risk fix. regards, tom lane
I wrote: > I'm tempted to suggest that we just remove the Assert on vartypmod in > the 8.1 branch. The Assert on vartype is doing as much as is really > important to check, and I don't want to disable the trivial_subqueryscan > optimization, which seems the only other low-risk fix. After further thought I've developed a modified version of Brian's case that crashes 8.2 and HEAD but not 8.1 --- it turns the situation around by having the view fall back to typmod -1. So what I'm now thinking is that the real problem is that an Append path generates its Vars by copying the first input, and in these sorts of situations that might not produce the correct typmod. Back to the drawing board ... regards, tom lane CREATE TABLE foo (x_data varchar(32),row_date date ); CREATE TABLE bar (x_data varchar(36),row_date date ); CREATE OR REPLACE VIEW bazz AS SELECT ('bar: ' || bar.row_date) :: TEXT AS action, bar.x_data AS more_data,row_date FROM bar UNION ALL SELECT ('foo: ' || foo.row_date) :: TEXT AS action, foo.x_data AS more_data,row_date FROM foo ; SELECT action, more_data FROM bazz;
I wrote: > After further thought I've developed a modified version of Brian's case > that crashes 8.2 and HEAD but not 8.1 --- it turns the situation around > by having the view fall back to typmod -1. So what I'm now thinking > is that the real problem is that an Append path generates its Vars by > copying the first input, and in these sorts of situations that might not > produce the correct typmod. Back to the drawing board ... Nope, that's not right either. What it seems to boil down to is what I said the first time: removal of trivial subqueryscans can leave us with a plan that uses an above-a-union Var to describe a value from one of the union inputs. Some other 8.2 changes moved the subqueryscans around, which is why the cases that manifest are a bit different, but in the end it's the same issue. I've concluded that this isn't really wrong; it's certainly not worth disabling the trivial-subqueryscan optimization for, and I think getting the planner to avoid the problem without that would be tricky. Accordingly, I've applied the attached patch that just relaxes the Assert. regards, tom lane Index: execScan.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/executor/execScan.c,v retrieving revision 1.39 diff -c -r1.39 execScan.c *** execScan.c 5 Jan 2007 22:19:27 -0000 1.39 --- execScan.c 24 Jan 2007 01:03:03 -0000 *************** *** 215,222 **** return false; /* out of order */ if (att_tup->attisdropped) returnfalse; /* table contains dropped columns */ Assert(var->vartype == att_tup->atttypid); ! Assert(var->vartypmod == att_tup->atttypmod); tlist_item = lnext(tlist_item); } --- 215,232 ---- return false; /* out of order */ if (att_tup->attisdropped) returnfalse; /* table contains dropped columns */ + /* + * Note: usually the Var's type should match the tupdesc exactly, + * but in situations involving unions of columns that have different + * typmods, the Var may have come from above the union and hence have + * typmod -1. This is a legitimate situation since the Var still + * describes the column, just not as exactly as the tupdesc does. + * We could change the planner to prevent it, but it'd then insert + * projection steps just to convert from specific typmod to typmod -1, + * which is pretty silly. + */ Assert(var->vartype == att_tup->atttypid); ! Assert(var->vartypmod == att_tup->atttypmod || var->vartypmod == -1); tlist_item = lnext(tlist_item); }