Re: [GENERAL] cache lookup of relation 165058647 failed - Mailing list pgsql-bugs

From Sean Chittenden
Subject Re: [GENERAL] cache lookup of relation 165058647 failed
Date
Msg-id 91B18B5E-9D8A-11D8-8912-000A95C705DC@chittenden.org
Responses Re: [GENERAL] cache lookup of relation 165058647 failed
Re: [GENERAL] cache lookup of relation 165058647 failed
List pgsql-bugs
> I've found out that this error occurs in:
>  dependency.c file
>
> 2004-04-26 11:09:34 ERROR:  dependency.c 1621: cache lookup of relation
> 149064743 failed
> 2004-04-26 11:09:34 ERROR:  Relation "tmp_table1" does not exist
> 2004-04-26 11:09:34 ERROR:  Relation "tmp_table1" does not exist
>
> in getRelationDescription(StringInfo buffer, Oid relid) function.
>
> Any ideas what can cause these errors?

<aol>Me too.</aol>

But I suspect that it's a race condition with the new background
writer code.  I've started testing a new database design and was able
to reproduce this on my laptop nearly 90% of the time, but only about
10% of the time on my production databases, until I figured out what
the difference was: fsync.

fsync was causing enough of a slowdown that SearchSysCache() was
finding the tuple, whereas with fsync = false, it wasn't able to find
it.  But, to prove that it wasn't fsync itself (I use fsync = false on
my laptop to save my poor drive), I threw in a sleep between my tests,
and I'm able to get things to work 100% of the time by adding a sleep.
The following fails 90% of the time with fsync = false, and only 10%
of the time with fsync = true.

% psql -f test-begin.sql template1 && psql -f test_enterprise_class.sql
&& psql -f test-end1.sql template1 && psql -f test-end2.sql template1

But, if I change the command to:

% psql -f test-begin.sql template1 && psql -f test_enterprise_class.sql
&& psql -f test-end1.sql template1 && sleep 1 && psql -f test-end2.sql
template1

I have no problems with cache lookup failures.  As for what happens in
those commands, I'm:

-- 1) Dropping the test database and re-creating it
-- 2) In a different connection, loading a rather large schema as the dba
-- 3) Connecting again and creating a temp table
-- 4) Connecting a second time and checking to see if the temp table exists

The sleep comes at step 3.5 in the above sequence of operations.

*boom*  Here's a snippet of my terminal (the first thing I do after
BEGINning a transaction is create a temp table if it doesn't exist):

## BEGIN ##
[snip]
[...]
COMMIT
You are now connected to database "test" as user "usr".
BEGIN
psql:test-end2.sql:3: ERROR:  cache lookup failed for relation 398033
CONTEXT:  SQL query "SELECT  TRUE FROM pg_catalog.pg_class c LEFT JOIN
pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname =
'tmptbl'::TEXT AND c.relkind = 'r'::TEXT AND
pg_catalog.pg_table_is_visible(c.oid)"
PL/pgSQL function "create_tmptbl" line 2 at perform
PL/pgSQL function "check_or_populate_func" line 8 at assignment
PL/pgSQL function "setuid_wrapper_func" line 5 at return
## END ##

What's really bothering me is I can push the up arrow on the console,
run the exact same thing (including dropping the database), and it'll
work sometimes.  Very disturbing.  As I said, I'm *very* suspicious of
the background writer goo that Jan added simply because I can't think
of anything else that'd have this problem.

I've run each of those commands 100 times now, with and without the
sleep 1.  With the sleep 1, it's worked 100% of the time.  Jan, any bit
of code that comes to mind?
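
For anyone who wants to repeat the 100-run comparison, here is a
minimal sketch of how it could be scripted.  It is not the exact script
used for the numbers above.  The function names count_failures and
run_sequence are made up for illustration, and -v ON_ERROR_STOP=1 is an
added assumption: without it, psql -f exits 0 even when a statement in
the file raises an ERROR, so the shell could not count the failures.

```shell
#!/bin/sh
# Hypothetical harness: run a command N times, print how many runs failed.
# Usage: count_failures N cmd [args...]
count_failures() {
    n=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # A non-zero exit status counts as one failed run.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# One pass of the four-step sequence from this message.  The optional
# first argument is the pause, in seconds, before the final check.
# ON_ERROR_STOP=1 (an assumption, not in the original commands) makes
# psql exit non-zero when a statement in the file fails.
run_sequence() {
    pause=${1:-0}
    psql -v ON_ERROR_STOP=1 -f test-begin.sql template1 &&
    psql -v ON_ERROR_STOP=1 -f test_enterprise_class.sql &&
    psql -v ON_ERROR_STOP=1 -f test-end1.sql template1 &&
    sleep "$pause" &&
    psql -v ON_ERROR_STOP=1 -f test-end2.sql template1
}

# Example (commented out so that sourcing this file runs nothing):
#   count_failures 100 run_sequence 0   # without the sleep
#   count_failures 100 run_sequence 1   # with "sleep 1"
```

Comparing the two counts for the same value of fsync should show
whether the pause alone accounts for the difference.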

All of my bgwriter_* settings are set to their defaults.

-sc

--
Sean Chittenden
