Thread: Function to blame?

Function to blame?

From
Michael Guerin
Date:
Hi  All,

  I'm trying to resolve an issue with the database crashing when under a
heavy load.  The error is hard to reproduce and happens once or twice a
month.  It's also been happening since Postgresql v7 and just happened
again on v8.   The database always crashes in the same way and always
involves the function below.   The log starts to fill with these messages:

ERROR:  duplicate key violates unique constraint
"pg_class_relname_nsp_index"
ERROR:  relation 1823358704 deleted while still in use
ERROR:  duplicate key violates unique constraint
"pg_class_relname_nsp_index"
ERROR:  duplicate key violates unique constraint
"pg_class_relname_nsp_index"
ERROR:  duplicate key violates unique constraint
"pg_class_relname_nsp_index"
CONTEXT:  SQL statement "create temp table tmp_children ( uniqid bigint,
memberid bigint, membertype varchar(50), ownerid smallint, tag
varchar(50), level int4 )"
PL/pgSQL function "fngetcompositeids2" line 14 at SQL statement
ERROR:  duplicate key violates unique constraint
"pg_class_relname_nsp_index"
CONTEXT:  SQL statement "create temp table tmp_children ( uniqid bigint,
memberid bigint, membertype varchar(50), ownerid smallint, tag
varchar(50), level int4 )"
PL/pgSQL function "fngetcompositeids2" line 14 at SQL statement
ERROR:  duplicate key violates unique constraint
"pg_class_relname_nsp_index"
CONTEXT:  SQL statement "create temp table tmp_children ( uniqid bigint,
memberid bigint, membertype varchar(50), ownerid smallint, tag
varchar(50), level int4 )"
PL/pgSQL function "fngetcompositeids2" line 14 at SQL statement
ERROR:  duplicate key violates unique constraint
"pg_class_relname_nsp_index"

Followed by db restart.. Then these messages:

ERROR:  cache lookup failed for relation 1823360958
ERROR:  cache lookup failed for relation 1823360958
ERROR:  cache lookup failed for relation 1823360958
ERROR:  cache lookup failed for relation 1823360958
ERROR:  cache lookup failed for relation 1823360958
ERROR:  cache lookup failed for relation 1823360958
ERROR:  cache lookup failed for relation 1823360958
ERROR:  cache lookup failed for relation 1823360958


or these errors

LOG:  unexpected EOF on client connection
ERROR:  duplicate key violates unique constraint
"pg_type_typname_nsp_index"
ERROR:  duplicate key violates unique constraint
"pg_type_typname_nsp_index"
CONTEXT:  SQL statement "create temp table tmp_children ( uniqid bigint,
memberid bigint, membertype varchar(50), ownerid smallint, tag varc
har(50), level int4 )"
PL/pgSQL function "fngetcompositeids2" line 14 at SQL statement
ERROR:  duplicate key violates unique constraint
"pg_type_typname_nsp_index"
CONTEXT:  SQL statement "create temp table tmp_children ( uniqid bigint,
memberid bigint, membertype varchar(50), ownerid smallint, tag varc
har(50), level int4 )"
PL/pgSQL function "fngetcompositeids2" line 14 at SQL statement
.....
ERROR:  cache lookup failed for type 813612037
FATAL:  cache lookup failed for type 813612037





These errors don't seem to be recoverable.  When this happens I
re-indexed all user & system tables, drop and recreate user types &
functions, and vacuum the database.  None of these steps get rid of the
error and none report any problems, only a dump and restore seems to
work.  Anything else I should try?

Other info:

We're using the Suse 2.6 kernel and a Reiser file system, in case there are any known problems with this setup. I
believethe kernel has the latest patches applied. 


 I've included the schema for the table involved and the function code
to see if I'm doing something in the function that I ought to do
better.  The one table involved is self-referential and the function
loops around to return all the rows involved, typical result sets have
at most 4 levels .  If it's helpful  I can create  a small dataset to
see the function work.

--------------------
CREATE TYPE object_composite_row_3 AS
   (uniqid int8,
    memberid int8,
    membertype varchar(50),
    ownerid int2,
    tag varchar(50),
    "level" int4);

CREATE TABLE tblobjectcomposite
(
  uniqid int8,
  "type" varchar(50),
  memberid int8,
  tag varchar(50),
  membertype varchar(50),
  ownerid int2
)

CREATE OR REPLACE FUNCTION fngetcompositeids2(int8)  RETURNS SETOF
object_composite_row_3 AS

declare
    r object_composite_row_3%rowtype;
    pid alias for $1;
    x   int;
    qry text;
    retval int;
    rec record;

 begin
    x := 1;

   create temp table tmp_children (
       uniqid bigint,
       memberid bigint,
       membertype varchar(50),
       ownerid smallint,
       tag varchar(50),
       level int4 );

   EXECUTE 'insert into tmp_children
             select uniqId,memberId,memberType,ownerid,tag, ' || x || '
as level
             from tblObjectComposite
             where uniqid = ' || pid || '::bigint;';

   qry:= 'select count(*) as numrows from (select * from tmp_children
where level = ' || x || ' limit 1) as x;';
   FOR rec IN EXECUTE qry LOOP
     retVal:= rec.numrows;
   END LOOP;

   while (retVal > 0) loop
       x := x + 1;
       EXECUTE 'insert into tmp_children
                 select oc.uniqid,oc.memberid,
oc.membertype,oc.ownerid,oc.tag, ' || x  ||
               ' from tblobjectcomposite oc, tmp_children tmp
                  where  oc.uniqid = tmp.memberid and level = (' ||
(x-1) || ');';

       qry:= 'select count(*) as numrows from (select * from
tmp_children where level = ' || x || ' limit 1) as x;';
       FOR rec IN EXECUTE qry LOOP
         retVal:= rec.numrows;
       END LOOP;
   end loop;

   FOR r IN EXECUTE '
     select uniqId,memberId,memberType,ownerid,tag,level from
tmp_children order by level' LOOP
     RETURN NEXT r;
   END LOOP;

   drop table tmp_children;

   return;
 END;

  LANGUAGE 'plpgsql' VOLATILE;


thanks
michael



Re: Function to blame?

From
Tom Lane
Date:
Michael Guerin <guerin@rentec.com> writes:
>   I'm trying to resolve an issue with the database crashing when under a
> heavy load.  The error is hard to reproduce and happens once or twice a
> month.  It's also been happening since Postgresql v7 and just happened
> again on v8.   The database always crashes in the same way and always
> involves the function below.   The log starts to fill with these messages:

Hmm, I suppose these messages are coming from several different
backends?  It would be useful to adjust your logging configuration so
we can tell which messages are coming from which sessions.  (See
log_pid or log_line_prefix depending on PG version.)

My first guess is that something is getting confused about which
temporary-table schema belongs to which backend, but it's hard to see
how ... and some attempts at creating a race condition during backend
entry/exit didn't show a problem, so that theory could be all wet.

It'd be helpful if you could develop a self-contained test case (even
one that takes a lot of repetitions to hit the error).

>  I've included the schema for the table involved and the function code
> to see if I'm doing something in the function that I ought to do
> better.

You should be thinking in terms of how to make the error reproducible,
not how to avoid it ;-).  This is surely a backend bug of some sort.

            regards, tom lane

Re: Function to blame?

From
Michael Guerin
Date:
Hi Tom,

    It looks like I can reproduce this error, I just want to try a few
more things.   Where should I post the snippet and schema when I'm ready?

-Michael

>You should be thinking in terms of how to make the error reproducible,
>not how to avoid it ;-).  This is surely a backend bug of some sort.
>
>            regards, tom lane
>
>
>


Re: Function to blame?

From
Tom Lane
Date:
Michael Guerin <guerin@rentec.com> writes:
>     It looks like I can reproduce this error, I just want to try a few
> more things.   Where should I post the snippet and schema when I'm ready?

pgsql-bugs is good.

            regards, tom lane