Thread: "ERROR: cache lookup failed for function %d" while dropping function from another session

"ERROR: cache lookup failed for function %d" while dropping function from another session

From
Mahendra Singh Thalor
Date:
Hi,
I am able to hit "ERROR:  cache lookup failed for function %d" while I am dropping function from another session.

Reproducible steps are as(I tried on current head d8a75b1308b133502ae3):

Step 1) Fire below query from session-1 to create a function:

create function fun1(int) returns record as ' DECLARE retval RECORD; BEGIN SELECT INTO retval 5, 10, 15; RETURN retval;
END;' language plpgsql;

Step 2) Attach PID of session-1 into gdb and set a breakpoint into internal_get_result_type and then press c.

(gdb) b internal_get_result_type Breakpoint 1 at 0x5565dd4f3ce4: file funcapi.c, line 325. (gdb) c
Continuing.

Step 3) Fire below select command from session-1:

SELECT * FROM fun1(1500) AS (a int, b int, c int);

Step 3) Open session-2 and drop the function:

psql (15devel) Type "help" for help. postgres=# drop function fun1; DROP FUNCTION
postgres=#

Step 4) Press 'c' in gdb for 2 times and we will get an error in the 1st session.

(gdb) c Continuing. Breakpoint 1, internal_get_result_type (funcid=16384, call_expr=0x5565de885af0, rsinfo=0x0, resultTypeId=0x7ffc94572608, resultTupleDesc=0x7ffc94572628) at funcapi.c:325 (gdb) c
Continuing.

postgres=# SELECT * FROM fun1(1500) AS (a int, b int, c int);
ERROR: cache lookup failed for function 16384

Call stack of error:
Breakpoint 2, errfinish (filename=0x55e0b8907c6f "funcapi.c", lineno=336, funcname=0x55e0b89083a0 <__func__.23015> "internal_get_result_type") at elog.c:515
515 ErrorData  *edata = &errordata[errordata_stack_depth];
(gdb) bt
#0  errfinish (filename=0x55e0b8907c6f "funcapi.c", lineno=336, funcname=0x55e0b89083a0 <__func__.23015> "internal_get_result_type") at elog.c:515
#1  0x000055e0b86abdaf in internal_get_result_type (funcid=16384, call_expr=0x55e0ba9e2ad0, rsinfo=0x0, resultTypeId=0x7ffcb7ca4918, resultTupleDesc=0x7ffcb7ca4938)
    at funcapi.c:336
#2  0x000055e0b86ab6d0 in get_expr_result_type (expr=0x55e0ba9e2ad0, resultTypeId=0x7ffcb7ca4918, resultTupleDesc=0x7ffcb7ca4938) at funcapi.c:230
#3  0x000055e0b7fef027 in ExecInitFunctionScan (node=0x55e0ba9e3cd0, estate=0x55e0ba9d2fe0, eflags=16) at nodeFunctionscan.c:370
#4  0x000055e0b7fbb408 in ExecInitNode (node=0x55e0ba9e3cd0, estate=0x55e0ba9d2fe0, eflags=16) at execProcnode.c:254
#5  0x000055e0b7faa1d5 in InitPlan (queryDesc=0x55e0ba926f50, eflags=16) at execMain.c:936
#6  0x000055e0b7fa84b6 in standard_ExecutorStart (queryDesc=0x55e0ba926f50, eflags=16) at execMain.c:263
#7  0x000055e0b7fa7fc7 in ExecutorStart (queryDesc=0x55e0ba926f50, eflags=0) at execMain.c:143
#8  0x000055e0b83ba8de in PortalStart (portal=0x55e0ba968870, params=0x0, eflags=0, snapshot=0x0) at pquery.c:512
#9  0x000055e0b83ae112 in exec_simple_query (query_string=0x55e0ba906760 "SELECT * FROM fun1(1500) AS (a int, b int, c int);") at postgres.c:1175
#10 0x000055e0b83b7a86 in PostgresMain (argc=1, argv=0x7ffcb7ca4f70, dbname=0x55e0ba932770 "postgres", username=0x55e0ba932748 "mahendrathalor") at postgres.c:4488
#11 0x000055e0b825161b in BackendRun (port=0x55e0ba929730) at postmaster.c:4519
#12 0x000055e0b8250971 in BackendStartup (port=0x55e0ba929730) at postmaster.c:4241
#13 0x000055e0b8248a1b in ServerLoop () at postmaster.c:1758
#14 0x000055e0b8247c18 in PostmasterMain (argc=3, argv=0x55e0ba8fee70) at postmaster.c:1430
#15 0x000055e0b808dbe9 in main (argc=3, argv=0x55e0ba8fee70) at main.c:199

 cache lookup failed errors are never an expected behavior(Thanks Robert Hass for your opinion) so I think we should fix this error.

I haven't debugged it so I will debug it and will post my findings in the coming days.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
On Mon, 9 Aug 2021 at 11:07, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> Hi,
> I am able to hit "ERROR:  cache lookup failed for function %d" while I am dropping function from another session.
>
> Reproducible steps are as(I tried on current head d8a75b1308b133502ae3):
>
> Step 1) Fire below query from session-1 to create a function:
>
> create function fun1(int) returns record as ' DECLARE retval RECORD; BEGIN SELECT INTO retval 5, 10, 15; RETURN retval;
> END;' language plpgsql;
>
> Step 2) Attach PID of session-1 into gdb and set a breakpoint into internal_get_result_type and then press c.
>
> (gdb) b internal_get_result_type Breakpoint 1 at 0x5565dd4f3ce4: file funcapi.c, line 325. (gdb) c
> Continuing.
>
> Step 3) Fire below select command from session-1:
>
> SELECT * FROM fun1(1500) AS (a int, b int, c int);
>
> Step 3) Open session-2 and drop the function:
>
> psql (15devel) Type "help" for help. postgres=# drop function fun1; DROP FUNCTION
> postgres=#
>
> Step 4) Press 'c' in gdb for 2 times and we will get an error in the 1st session.
>
> (gdb) c Continuing. Breakpoint 1, internal_get_result_type (funcid=16384, call_expr=0x5565de885af0, rsinfo=0x0, resultTypeId=0x7ffc94572608, resultTupleDesc=0x7ffc94572628) at funcapi.c:325 (gdb) c
> Continuing.
>
> postgres=# SELECT * FROM fun1(1500) AS (a int, b int, c int);
> ERROR: cache lookup failed for function 16384
>
> Call stack of error:
>>
>> Breakpoint 2, errfinish (filename=0x55e0b8907c6f "funcapi.c", lineno=336, funcname=0x55e0b89083a0 <__func__.23015> "internal_get_result_type") at elog.c:515
>> 515 ErrorData  *edata = &errordata[errordata_stack_depth];
>> (gdb) bt
>> #0  errfinish (filename=0x55e0b8907c6f "funcapi.c", lineno=336, funcname=0x55e0b89083a0 <__func__.23015> "internal_get_result_type") at elog.c:515
>> #1  0x000055e0b86abdaf in internal_get_result_type (funcid=16384, call_expr=0x55e0ba9e2ad0, rsinfo=0x0, resultTypeId=0x7ffcb7ca4918, resultTupleDesc=0x7ffcb7ca4938)
>>     at funcapi.c:336
>> #2  0x000055e0b86ab6d0 in get_expr_result_type (expr=0x55e0ba9e2ad0, resultTypeId=0x7ffcb7ca4918, resultTupleDesc=0x7ffcb7ca4938) at funcapi.c:230
>> #3  0x000055e0b7fef027 in ExecInitFunctionScan (node=0x55e0ba9e3cd0, estate=0x55e0ba9d2fe0, eflags=16) at nodeFunctionscan.c:370
>> #4  0x000055e0b7fbb408 in ExecInitNode (node=0x55e0ba9e3cd0, estate=0x55e0ba9d2fe0, eflags=16) at execProcnode.c:254
>> #5  0x000055e0b7faa1d5 in InitPlan (queryDesc=0x55e0ba926f50, eflags=16) at execMain.c:936
>> #6  0x000055e0b7fa84b6 in standard_ExecutorStart (queryDesc=0x55e0ba926f50, eflags=16) at execMain.c:263
>> #7  0x000055e0b7fa7fc7 in ExecutorStart (queryDesc=0x55e0ba926f50, eflags=0) at execMain.c:143
>> #8  0x000055e0b83ba8de in PortalStart (portal=0x55e0ba968870, params=0x0, eflags=0, snapshot=0x0) at pquery.c:512
>> #9  0x000055e0b83ae112 in exec_simple_query (query_string=0x55e0ba906760 "SELECT * FROM fun1(1500) AS (a int, b int, c int);") at postgres.c:1175
>> #10 0x000055e0b83b7a86 in PostgresMain (argc=1, argv=0x7ffcb7ca4f70, dbname=0x55e0ba932770 "postgres", username=0x55e0ba932748 "mahendrathalor") at postgres.c:4488
>> #11 0x000055e0b825161b in BackendRun (port=0x55e0ba929730) at postmaster.c:4519
>> #12 0x000055e0b8250971 in BackendStartup (port=0x55e0ba929730) at postmaster.c:4241
>> #13 0x000055e0b8248a1b in ServerLoop () at postmaster.c:1758
>> #14 0x000055e0b8247c18 in PostmasterMain (argc=3, argv=0x55e0ba8fee70) at postmaster.c:1430
>> #15 0x000055e0b808dbe9 in main (argc=3, argv=0x55e0ba8fee70) at main.c:199
>
>
>  cache lookup failed errors are never an expected behavior(Thanks Robert Hass for your opinion) so I think we should fix this error.
>
> I haven't debugged it so I will debug it and will post my findings in the coming days.
>

I googled it and found that we already discussed this issue in another thread.

Link: https://www.postgresql.org/message-id/flat/NX26BD7O.SQ6NCQDL.Z4KK4FBY%40HXCHZXMY.EBBFSJQH.OUMOBSJN

I reproduced this issue locally (to avoid race conditions, I just had the
function drop itself ;-)) and traced the site of the failure to this
bit in plpgsql function exit:

/*
* Need to look up the expected result type. XXX would be
* better to cache the tupdesc instead of repeating
* get_call_result_type(), but the only easy place to save it
* is in the PLpgSQL_function struct, and that's too
* long-lived: composite types could change during the
* existence of a PLpgSQL_function.
*/
switch (get_call_result_type(fcinfo, &resultTypeId, &tupdesc))

The catalog access inside get_call_result_type is fairly essential,
because this function has OUT parameters, so its pg_proc row is the
ultimate source of truth about what it returns. We could imagine
caching the info earlier during function execution, but as the comment
says, that has its own failure modes ... and they're more important
ones in practice. So I'm afraid there's not much to be done to
improve this.

regards, tom lane

 
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com