Thread: [BUG] views and functions on relations
In latest 7.1 (checked out 2 days ago from CVS), I see following behaviour: create table foo(x int4); create function xx(foo) returns int4 as ' return 0;' language 'plpgsql'; create view tv2 as select xx(foo) from foo; users=# \d tv2 ERROR: cache lookup of attribute 0 in relation 21747 failed (21747 is table oid for foo) HOWEVER, 'select * from tv2' succeeds (sometimes). Sometimes it fails with the same error (cache lookup failed). I think the above should be enough to reproduce this bug. Any hints? -alex
Alex Pilosov <alex@pilosoft.com> writes: > users=# \d tv2 > ERROR: cache lookup of attribute 0 in relation 21747 failed Confirmed here. Too tired to chase it further tonight, though. > HOWEVER, 'select * from tv2' succeeds (sometimes). Sometimes it fails with > the same error (cache lookup failed). Couldn't reproduce this failure --- can you work out a sequence that makes it happen? regards, tom lane
Alex Pilosov <alex@pilosoft.com> writes: > In latest 7.1 (checked out 2 days ago from CVS), I see following > behaviour: > create table foo(x int4); > create function xx(foo) returns int4 as ' return 0;' language 'plpgsql'; > create view tv2 as select xx(foo) from foo; > users=# \d tv2 > ERROR: cache lookup of attribute 0 in relation 21747 failed Okay, this is a simple oversight in ruleutils.c: the rule dumper doesn't have logic to handle whole-tuple function arguments, such as (foo) in the above example. Will fix. > HOWEVER, 'select * from tv2' succeeds (sometimes). Sometimes it fails with > the same error (cache lookup failed). The ruleutils.c bug cannot explain this however, since ruleutils won't even be invoked. Can you find a sequence to reproduce it? regards, tom lane
On Wed, 18 Apr 2001, Tom Lane wrote: > The ruleutils.c bug cannot explain this however, since ruleutils won't > even be invoked. Can you find a sequence to reproduce it? Sorry, I was mistaken. The error I get for select is this: ERROR: cache lookup for type 0 failed This is a far harder to trigger bug, and actually, it doesn't happen in this simple case (oops), and the only test case I have involves 2 tables and 3 stored procedures. It is not related to views at all, just doing the underlying select causes the problem. Taking out _any_ stored procedure from the query removes the problem. FWIW, this is what I see in server error log: ERROR: cache lookup for type 0 failed DEBUG: Last error occured while executing PL/pgSQL function cust_name DEBUG: while putting call arguments to local variables And this is the query: SELECT cust_name(a) FROM customers AS a, addresses AS b WHERE b.cust_id=a.cust_id and b.oid=get_billing_record(a.cust_id) and cust_balance(a.cust_id)>0 Removing either get_billing_record or cust_balance conditions or cust_name selection leaves the problem. Unfortunately, each function is very long, and involves lots of tables and it'd make no sense to post this all to the list, so I'm going to try to narrow down the problem more to get a good reproducible result, but if the above helps any in diagnostic, it'd be great ;)
On Wed, 18 Apr 2001, Alex Pilosov wrote: > This is a far harder to trigger bug, and actually, it doesn't happen in > this simple case (oops), and the only test case I have involves 2 tables > and 3 stored procedures. It is not related to views at all, just doing the > underlying select causes the problem. Taking out _any_ stored procedure > from the query removes the problem. Oh yes. One thing I forgot: It all worked in 7.0 and it only broke after upgrading to 7.1 -alex
Here's more info on the bug: background: function cust_name(customers) returns varchar; Query in question: SELECT cust_name(a) FROM customers AS a, addresses AS b WHERE b.cust_id=a.cust_id and b.oid=get_billing_record(a.cust_id) and cust_balance(a.cust_id)>0 First, my idea of what's happening: Tuple in question contains the row from 'customers' table. Something (when the query is evaluated, before cust_name function is called) sets the tupdesc->natts=0, however, everything else in that tupdesc is right (all the attrs are present and have correct values and atttypes), and tuple->t_data->t_natts is correct (12). When SPI_getbinval is called, it checks tuple->t_data->t_natts, and works OK, but, however, when SPI_gettypeid is called, it checks tupledesc->nattrs, and returns 0. Question: Should SPI_gettypeid look at tuple->t_data->t_natts (to do that, it needs to be passed tuple along with tupdesc)? Or some other code should be fixed to properly set tupledesc->nattrs? NOTE: when I removed the check in SPI_gettypeid, it _also_ fixed the '\d viewname' bug, so these two bugs are related (i.e. you cannot see \d because nattrs is set incorrectly). You may have more luck tracing the code which improperly sets nattrs than me... Hoping for proper fix, -alex traceback: #0 elog (lev=-1, fmt=0x45d4b340 "cache lookup for type %u failed") at elog.c:119 #1 0x45d4693e in exec_cast_value (value=1791, valtype=0, reqtype=23, reqinput=0x82bfdb0, reqtypelem=0, reqtypmod=-1, isnull=0xbfffeb6f"") at pl_exec.c:2682 #2 0x45d45f19 in exec_assign_value (estate=0xbfffec40, target=0x82cdd88, value=1791, valtype=0, isNull=0xbfffeb6f "")at pl_exec.c:2173 #3 0x45d4687a in exec_move_row (estate=0xbfffec40, rec=0x0, row=0x82bfcc8, tup=0x827a170, tupdesc=0x827a130) at pl_exec.c:2629 #4 0x45d43e64 in plpgsql_exec_function (func=0x82b3188, fcinfo=0x828e364) at pl_exec.c:331 #5 0x45d41f57 in plpgsql_call_handler (fcinfo=0x828e364) at pl_handler.c:128 #6 0x80b78ad in ExecMakeFunctionResult (fcache=0x828e350, arguments=0x826eb28, econtext=0x826fc98, isNull=0xbfffed37 "", isDone=0xbfffed68) at execQual.c:796 #7 0x80b794e in ExecEvalFunc (funcClause=0x826ead8, econtext=0x826fc98, isNull=0xbfffed37 "", isDone=0xbfffed68) at execQual.c:890 #8 0x80b7d1c in ExecEvalExpr (expression=0x826ead8, econtext=0x826fc98, isNull=0xbfffed37 "", isDone=0xbfffed68) at execQual.c:1215 #9 0x80b7fbb in ExecTargetList (targetlist=0x826e6a0, nodomains=19, targettype=0x8284620, values=0x8285100, econtext=0x826fc98, isDone=0xbfffef08) at execQual.c:1536 #10 0x80b8215 in ExecProject (projInfo=0x82850d8, isDone=0xbfffef08) at execQual.c:1764 #11 0x80bcd9a in ExecNestLoop (node=0x826e5c0) at nodeNestloop.c:245 #12 0x80b6b76 in ExecProcNode (node=0x826e5c0, parent=0x826e5c0) at execProcnode.c:297 #13 0x80b5eee in ExecutePlan (estate=0x826f770, plan=0x826e5c0, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, destfunc=0x8285de0) at execMain.c:973 #14 0x80b5463 in ExecutorRun (queryDesc=0x826f758, estate=0x826f770, feature=3, count=0) at execMain.c:233 #15 0x80f76b3 in ProcessQuery (parsetree=0x82433e8, plan=0x826e5c0, dest=Remote) at pquery.c:295 #16 0x80f62bb in pg_exec_query_string ( query_string=0x8243090 "select * from outstanding_balances;", dest=Remote, parse_context=0x8218730) at postgres.c:810 #17 0x80f71e6 in PostgresMain (argc=4, argv=0xbffff1e0, real_argc=8, real_argv=0xbffffaf4, username=0x81cbf69 "sw") atpostgres.c:1908 #18 0x80e14c3 in DoBackend (port=0x81cbd00) at postmaster.c:2111 #19 0x80e10ac in BackendStartup (port=0x81cbd00) at postmaster.c:1894 #20 0x80e0436 in ServerLoop () at postmaster.c:992 #21 0x80dfe63 in PostmasterMain (argc=8, argv=0xbffffaf4) at postmaster.c:682 #22 0x80c4055 in main (argc=8, argv=0xbffffaf4) at main.c:151
Alex Pilosov <alex@pilosoft.com> writes: > Something (when the query is evaluated, before cust_name function is > called) sets the tupdesc->natts=0, Ugh. You verified the natts is wrong in the tupdesc? > Question: Should SPI_gettypeid look at tuple->t_data->t_natts (to do that, > it needs to be passed tuple along with tupdesc)? > Or some other code should be fixed to properly set tupledesc->nattrs? The tupdesc natts *must* match the actual tuple, else all sorts of things will go wrong. I don't think SPI_gettypeid is broken. > NOTE: when I removed the check in SPI_gettypeid, it _also_ fixed the '\d > viewname' bug, so these two bugs are related (i.e. you cannot see \d > because nattrs is set incorrectly). That seems moderately unlikely, since \d doesn't depend on SPI... > You may have more luck tracing the > code which improperly sets nattrs than me... Hard to do without a working (failing ;-)) example to look at. Have you had any luck reducing your example? Alternatively, would you be willing to give me telnet or ssh access to your machine, and I'll look at the problem in situ? regards, tom lane
Alex Pilosov <alex@pilosoft.com> writes: > Something (when the query is evaluated, before cust_name function is > called) sets the tupdesc->natts=0, FWIW, I have just looked through all the code that sets natts fields, and I don't believe that any of it can set a tupdesc's natts field to zero. Therefore the zeroing must be an accidental stomp of some kind. Since natts is the first field in a tupdesc, it seems plausible that this might happen if some bit of code misinterprets a tupdesc pointer as something else. However, that makes the odds of finding the problem by staring at code even lower. I really need to get after this with a debugger... BTW, are you building with --enable-cassert? If not I strongly recommend it for chasing this sort of problem. regards, tom lane
Alex Pilosov <alex@pilosoft.com> writes: > Here's more info on the bug: > background: function cust_name(customers) returns varchar; > Query in question: > SELECT > cust_name(a) > FROM customers AS a, addresses AS b > WHERE > b.cust_id=a.cust_id > and b.oid=get_billing_record(a.cust_id) > and cust_balance(a.cust_id)>0 I think I see the problem. Is your query being executed via a mergejoin plan with an explicit sort on customers? Does the failure go away if you force a nestloop join? regards, tom lane