Thread: [BUG] views and functions on relations

[BUG] views and functions on relations

From
Alex Pilosov
Date:
In latest 7.1 (checked out 2 days ago from CVS), I see following
behaviour:

create table foo(x int4);
create function xx(foo) returns int4 as ' return 0;' language 'plpgsql';
create view tv2 as select xx(foo) from foo;

users=# \d tv2
ERROR:  cache lookup of attribute 0 in relation 21747 failed

(21747 is table oid for foo)

HOWEVER, 'select * from tv2' succeeds (sometimes). Sometimes it fails with
the same error (cache lookup failed).

I think the above should be enough to reproduce this bug.  Any hints? 

-alex




Re: [BUG] views and functions on relations

From
Tom Lane
Date:
Alex Pilosov <alex@pilosoft.com> writes:
> users=# \d tv2
> ERROR:  cache lookup of attribute 0 in relation 21747 failed

Confirmed here.  Too tired to chase it further tonight, though.

> HOWEVER, 'select * from tv2' succeeds (sometimes). Sometimes it fails with
> the same error (cache lookup failed).

Couldn't reproduce this failure --- can you work out a sequence that
makes it happen?
        regards, tom lane


Re: [BUG] views and functions on relations

From
Tom Lane
Date:
Alex Pilosov <alex@pilosoft.com> writes:
> In latest 7.1 (checked out 2 days ago from CVS), I see following
> behaviour:

> create table foo(x int4);
> create function xx(foo) returns int4 as ' return 0;' language 'plpgsql';
> create view tv2 as select xx(foo) from foo;

> users=# \d tv2
> ERROR:  cache lookup of attribute 0 in relation 21747 failed

Okay, this is a simple oversight in ruleutils.c: the rule dumper doesn't
have logic to handle whole-tuple function arguments, such as (foo) in
the above example.  Will fix.

> HOWEVER, 'select * from tv2' succeeds (sometimes). Sometimes it fails with
> the same error (cache lookup failed).

The ruleutils.c bug cannot explain this however, since ruleutils won't
even be invoked.  Can you find a sequence to reproduce it?
        regards, tom lane


Re: [BUG] views and functions on relations

From
Alex Pilosov
Date:
On Wed, 18 Apr 2001, Tom Lane wrote:

> The ruleutils.c bug cannot explain this however, since ruleutils won't
> even be invoked.  Can you find a sequence to reproduce it?
Sorry, I was mistaken. The error I get for select is this:
ERROR:  cache lookup for type 0 failed

This is a far harder to trigger bug, and actually, it doesn't happen in
this simple case (oops), and the only test case I have involves 2 tables
and 3 stored procedures. It is not related to views at all, just doing the
underlying select causes the problem. Taking out _any_ stored procedure
from the query removes the problem. 

FWIW, this is what I see in server error log:

ERROR:  cache lookup for type 0 failed
DEBUG:  Last error occured while executing PL/pgSQL function cust_name
DEBUG:  while putting call arguments to local variables

And this is the query:
SELECT
cust_name(a)
FROM customers AS a, addresses AS b
WHERE
b.cust_id=a.cust_id
and b.oid=get_billing_record(a.cust_id)
and cust_balance(a.cust_id)>0

Removing either get_billing_record or cust_balance conditions or cust_name
selection leaves the problem. Unfortunately, each function is very long,
and involves lots of tables and it'd make no sense to post this all to the
list, so I'm going to try to narrow down the problem more to get a good
reproducible result, but if the above helps any in diagnostic, it'd be
great ;)




Re: [BUG] views and functions on relations

From
Alex Pilosov
Date:
On Wed, 18 Apr 2001, Alex Pilosov wrote:

> This is a far harder to trigger bug, and actually, it doesn't happen in
> this simple case (oops), and the only test case I have involves 2 tables
> and 3 stored procedures. It is not related to views at all, just doing the
> underlying select causes the problem. Taking out _any_ stored procedure
> from the query removes the problem. 
Oh yes. One thing I forgot: It all worked in 7.0 and it only broke after
upgrading to 7.1

-alex



Re: [BUG] views and functions on relations

From
Alex Pilosov
Date:
Here's more info on the bug:

background: function cust_name(customers) returns varchar;
Query in question:

SELECT
cust_name(a)
FROM customers AS a, addresses AS b
WHERE
b.cust_id=a.cust_id
and b.oid=get_billing_record(a.cust_id)
and cust_balance(a.cust_id)>0

First, my idea of what's happening:

Tuple in question contains the row from 'customers' table.

Something (when the query is evaluated, before cust_name function is
called) sets the tupdesc->natts=0, however, everything else in that
tupdesc is right (all the attrs are present and have correct values and
atttypes), and tuple->t_data->t_natts is correct (12).

When SPI_getbinval is called, it checks tuple->t_data->t_natts, and works
OK, but, however, when SPI_gettypeid is called, it checks
tupledesc->nattrs, and returns 0. 

Question: Should SPI_gettypeid look at tuple->t_data->t_natts (to do that,
it needs to be passed tuple along with tupdesc)? 
Or some other code should be fixed to properly set tupledesc->nattrs?

NOTE: when I removed the check in SPI_gettypeid, it _also_ fixed the '\d
viewname' bug, so these two bugs are related (i.e. you cannot see \d
because nattrs is set incorrectly). You may have more luck tracing the
code which improperly sets nattrs than me...

Hoping for proper fix, 


-alex

traceback:
#0  elog (lev=-1, fmt=0x45d4b340 "cache lookup for type %u failed")   at elog.c:119
#1  0x45d4693e in exec_cast_value (value=1791, valtype=0, reqtype=23,   reqinput=0x82bfdb0, reqtypelem=0, reqtypmod=-1,
isnull=0xbfffeb6f"")   at pl_exec.c:2682
 
#2  0x45d45f19 in exec_assign_value (estate=0xbfffec40, target=0x82cdd88,   value=1791, valtype=0, isNull=0xbfffeb6f
"")at pl_exec.c:2173
 
#3  0x45d4687a in exec_move_row (estate=0xbfffec40, rec=0x0,
row=0x82bfcc8,   tup=0x827a170, tupdesc=0x827a130) at pl_exec.c:2629
#4  0x45d43e64 in plpgsql_exec_function (func=0x82b3188, fcinfo=0x828e364)   at pl_exec.c:331
#5  0x45d41f57 in plpgsql_call_handler (fcinfo=0x828e364) at
pl_handler.c:128
#6  0x80b78ad in ExecMakeFunctionResult (fcache=0x828e350,   arguments=0x826eb28, econtext=0x826fc98, isNull=0xbfffed37
"",  isDone=0xbfffed68) at execQual.c:796
 
#7  0x80b794e in ExecEvalFunc (funcClause=0x826ead8, econtext=0x826fc98,   isNull=0xbfffed37 "", isDone=0xbfffed68) at
execQual.c:890
#8  0x80b7d1c in ExecEvalExpr (expression=0x826ead8, econtext=0x826fc98,   isNull=0xbfffed37 "", isDone=0xbfffed68) at
execQual.c:1215
#9  0x80b7fbb in ExecTargetList (targetlist=0x826e6a0, nodomains=19,   targettype=0x8284620, values=0x8285100,
econtext=0x826fc98,  isDone=0xbfffef08) at execQual.c:1536
 
#10 0x80b8215 in ExecProject (projInfo=0x82850d8, isDone=0xbfffef08)   at execQual.c:1764
#11 0x80bcd9a in ExecNestLoop (node=0x826e5c0) at nodeNestloop.c:245
#12 0x80b6b76 in ExecProcNode (node=0x826e5c0, parent=0x826e5c0)   at execProcnode.c:297
#13 0x80b5eee in ExecutePlan (estate=0x826f770, plan=0x826e5c0,   operation=CMD_SELECT, numberTuples=0,
direction=ForwardScanDirection,  destfunc=0x8285de0) at execMain.c:973
 
#14 0x80b5463 in ExecutorRun (queryDesc=0x826f758, estate=0x826f770,   feature=3, count=0) at execMain.c:233
#15 0x80f76b3 in ProcessQuery (parsetree=0x82433e8, plan=0x826e5c0,   dest=Remote) at pquery.c:295
#16 0x80f62bb in pg_exec_query_string (   query_string=0x8243090 "select * from outstanding_balances;",
dest=Remote,   parse_context=0x8218730) at postgres.c:810
#17 0x80f71e6 in PostgresMain (argc=4, argv=0xbffff1e0, real_argc=8,   real_argv=0xbffffaf4, username=0x81cbf69 "sw")
atpostgres.c:1908
 
#18 0x80e14c3 in DoBackend (port=0x81cbd00) at postmaster.c:2111
#19 0x80e10ac in BackendStartup (port=0x81cbd00) at postmaster.c:1894
#20 0x80e0436 in ServerLoop () at postmaster.c:992
#21 0x80dfe63 in PostmasterMain (argc=8, argv=0xbffffaf4) at
postmaster.c:682
#22 0x80c4055 in main (argc=8, argv=0xbffffaf4) at main.c:151







Re: [BUG] views and functions on relations

From
Tom Lane
Date:
Alex Pilosov <alex@pilosoft.com> writes:
> Something (when the query is evaluated, before cust_name function is
> called) sets the tupdesc->natts=0,

Ugh.  You verified the natts is wrong in the tupdesc?

> Question: Should SPI_gettypeid look at tuple->t_data->t_natts (to do that,
> it needs to be passed tuple along with tupdesc)? 
> Or some other code should be fixed to properly set tupledesc->nattrs?

The tupdesc natts *must* match the actual tuple, else all sorts of
things will go wrong.  I don't think SPI_gettypeid is broken.

> NOTE: when I removed the check in SPI_gettypeid, it _also_ fixed the '\d
> viewname' bug, so these two bugs are related (i.e. you cannot see \d
> because nattrs is set incorrectly).

That seems moderately unlikely, since \d doesn't depend on SPI...

> You may have more luck tracing the
> code which improperly sets nattrs than me...

Hard to do without a working (failing ;-)) example to look at.
Have you had any luck reducing your example?  Alternatively,
would you be willing to give me telnet or ssh access to your
machine, and I'll look at the problem in situ?
        regards, tom lane


Re: [BUG] views and functions on relations

From
Tom Lane
Date:
Alex Pilosov <alex@pilosoft.com> writes:
> Something (when the query is evaluated, before cust_name function is
> called) sets the tupdesc->natts=0,

FWIW, I have just looked through all the code that sets natts fields,
and I don't believe that any of it can set a tupdesc's natts field to
zero.  Therefore the zeroing must be an accidental stomp of some kind.
Since natts is the first field in a tupdesc, it seems plausible that
this might happen if some bit of code misinterprets a tupdesc pointer
as something else.  However, that makes the odds of finding the problem
by staring at code even lower.  I really need to get after this with
a debugger...

BTW, are you building with --enable-cassert?  If not I strongly recommend
it for chasing this sort of problem.
        regards, tom lane


Re: [BUG] views and functions on relations

From
Tom Lane
Date:
Alex Pilosov <alex@pilosoft.com> writes:
> Here's more info on the bug:
> background: function cust_name(customers) returns varchar;
> Query in question:

> SELECT
> cust_name(a)
> FROM customers AS a, addresses AS b
> WHERE
> b.cust_id=a.cust_id
> and b.oid=get_billing_record(a.cust_id)
> and cust_balance(a.cust_id)>0


I think I see the problem.  Is your query being executed via a mergejoin
plan with an explicit sort on customers?  Does the failure go away if
you force a nestloop join?
        regards, tom lane