Re: [GENERAL] cache lookup of relation 165058647 failed - Mailing list pgsql-bugs

From Sean Chittenden
Subject Re: [GENERAL] cache lookup of relation 165058647 failed
Msg-id 816FE1CE-9ED4-11D8-B669-000A95C705DC@chittenden.org
In response to Re: [GENERAL] cache lookup of relation 165058647 failed  (Sean Chittenden <sean@chittenden.org>)
Responses Re: [GENERAL] cache lookup of relation 165058647 failed
List pgsql-bugs
>>> I've found out that this error occurs in the
>>> dependency.c file:
>>>
>>> 2004-04-26 11:09:34 ERROR:  dependency.c 1621: cache lookup of relation 149064743 failed
>>> 2004-04-26 11:09:34 ERROR:  Relation "tmp_table1" does not exist
>>> 2004-04-26 11:09:34 ERROR:  Relation "tmp_table1" does not exist
>>>
>>> in getRelationDescription(StringInfo buffer, Oid relid) function.
>>>
>>> Any ideas what can cause these errors?
>> <aol>Me too.</aol>
>> But I suspect that it's a race condition with the new
>> background writer code.  I've started testing a new database design
>> and was able to reproduce this on my laptop nearly 90% of the time,
>> but could only reproduce it about 10% of the time on my production
>> databases, until I figured out what the difference was: fsync.
>
> temp tables don't use the shared buffer cache, how can this be related
> to the BG writer?

Don't the system catalogs use the shared buffer cache?

BEGIN;
SELECT create_temp_table_func();  -- Inserts a row into pg_class via
                                  -- CREATE TEMP TABLE
-- Do other stuff
COMMIT;       -- After the commit, the row is now visible to other
              -- backends
-- disconnect -- If the delay between the disconnect and reconnect is
              -- small enough,
-- reconnect  -- it's as though there is a race condition that makes
              -- the function pg_table_is_visible() raise the
              -- "cache lookup of relation" error.
BEGIN;
SELECT create_temp_table_func();  -- Before the CREATE TEMP TABLE, I call
                                  /* SELECT TRUE FROM pg_catalog.pg_class c
                                     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
                                     WHERE c.relname = ''footmp''::TEXT AND
                                     c.relkind = ''r''::TEXT AND
                                     pg_catalog.pg_table_is_visible(c.oid); */
                                  -- But the query fails

My guess was that the series of events went something like:

proc 0) COMMITs, and the row in pg_class is committed
proc 1) bgwriter code removes a page from the cache
proc 2) queries for the page  [*]
proc 1) writes it to disk
proc 2) queries for the page  [*]
proc 1) syncs the fd

[*] proc 2 queries for the page at either of these points

In 7.4, there is no bgwriter or background process mucking with the cache,
which is why this works 100% of the time.  In 7.5, however, there's a
200ms gap in which a race condition appears and pg_table_is_visible()
fails its PointerIsValid() check.  If I put a sleep in, the sleep gives
the bgwriter enough time to flush the pages to disk, so that the
queries for the page happen after the fd has been sync()'ed.
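
As a rough illustration of the guessed sequence above, and of why a sleep
would hide it, here is a toy Python model.  It is not PostgreSQL's buffer
manager: the ToyStore class, its methods, and the rule that an evicted page
stays invisible until the sync step are all assumptions made only to mirror
the guess, and the guess itself may well be wrong.

```python
# Toy model of the guessed race. NOT PostgreSQL code; every name here is
# hypothetical and exists only to mirror the proc 0/1/2 sequence.

class ToyStore:
    def __init__(self):
        self.cache = {}      # stands in for the shared buffer cache
        self.in_flight = {}  # pages evicted but not yet synced
        self.disk = {}       # pages a reader can fetch from disk

    def commit(self, page, row):
        # proc 0: the committed pg_class row lives on a cached page.
        self.cache[page] = row

    def evict(self, page):
        # proc 1: the page leaves the cache before it is safely on disk.
        self.in_flight[page] = self.cache.pop(page)

    def write(self, page):
        # proc 1: assumption of this model: a written but unsynced page
        # is still not visible to readers.
        assert page in self.in_flight

    def sync(self, page):
        # proc 1: only now does the page become readable from disk.
        self.disk[page] = self.in_flight.pop(page)

    def lookup(self, page):
        # proc 2: check the cache first, then fall back to disk.
        if page in self.cache:
            return self.cache[page]
        return self.disk.get(page)


def lookup_after(steps_done, page="pg_class page", row="tmp_table1 row"):
    """Run the first `steps_done` steps, then let proc 2 look the page up."""
    store = ToyStore()
    steps = [
        lambda: store.commit(page, row),
        lambda: store.evict(page),
        lambda: store.write(page),
        lambda: store.sync(page),
    ]
    for step in steps[:steps_done]:
        step()
    return store.lookup(page)


if __name__ == "__main__":
    for n, name in enumerate(["commit", "evict", "write", "sync"], start=1):
        print(f"lookup after {name}: {lookup_after(n)!r}")
```

In this model the lookup only comes back empty at the two [*] points (after
the evict and after the write); waiting until the sync has finished, which
is what the sleep amounts to, makes the row visible again.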

I have no other clue as to why this would be happening though, so
believe me when I say, I could very well be quite wrong.... but this is
my best, quasi-educated/grep(1)'ed guess.

-sc

--
Sean Chittenden
