Re: [GENERAL] cache lookup of relation 165058647 failed - Mailing list pgsql-bugs
From | Sean Chittenden |
---|---|
Subject | Re: [GENERAL] cache lookup of relation 165058647 failed |
Date | |
Msg-id | 816FE1CE-9ED4-11D8-B669-000A95C705DC@chittenden.org |
In response to | Re: [GENERAL] cache lookup of relation 165058647 failed (Sean Chittenden <sean@chittenden.org>) |
Responses |
Re: [GENERAL] cache lookup of relation 165058647 failed
|
List | pgsql-bugs |
>>> I'v find out that this error occurs in:
>>> dependency.c file
>>>
>>> 2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>>
>>> in getRelationDescription(StringInfo buffer, Oid relid) function.
>>>
>>> Any ideas what can cause this errors.
>>
>> <aol>Me too.</aol>
>>
>> But, I am suspecting that it's a race condition with the new
>> background writer code. I've started testing a new database design
>> and was able to reproduce this on my laptop nearly 90% of the time,
>> but could only reproduce it about 10% of the time on my production
>> databases until I figured out what the difference was, fsync.
>
> temp tables don't use the shared buffer cache, how can this be related
> to the BG writer?

Don't the system catalogs use the shared buffer cache?

BEGIN;
SELECT create_temp_table_func(); -- Inserts a row into pg_class via CREATE TEMP TABLE
-- Do other stuff
COMMIT; -- After the commit, the row is now visible to other backends
-- disconnect

-- If the delay between the disconnect and reconnect is small enough
-- reconnect
-- It's as though there is a race condition that allows the function
-- pg_table_is_visible() to assert the "cache lookup of relation"
-- error.
BEGIN;
SELECT create_temp_table_func();
-- Before the CREATE TEMP TABLE, I call
/*
SELECT TRUE
  FROM pg_catalog.pg_class c
  LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relname = ''footmp''::TEXT
   AND c.relkind = ''r''::TEXT
   AND pg_catalog.pg_table_is_visible(c.oid);
*/
-- But the query fails

My guess was that the series of events went something like:

proc 0) COMMITs and the row in pg_class is committed
proc 1) bgwriter code removes a page from the cache
proc 2) queries for the page [*]
proc 1) writes it to disk
proc 2) queries for the page [*]
proc 1) syncs the fd

[*] proc 2 queries for the page at either of these points

In 7.4, there is no bgwriter or background process mucking with the
cache, which is why this works 100% of the time. In 7.5, however,
there's a 200ms gap where a race condition appears and
pg_table_is_visible() fails its PointerIsValid() check. If I put a
sleep in, the sleep gives the bgwriter enough time to commit the pages
to disk so that the queries for the page happen after the fd's been
sync()'ed.

I have no other clue as to why this would be happening though, so
believe me when I say, I could very well be quite wrong.... but this
is my best, quasi-educated/grep(1)'ed guess.

-sc

--
Sean Chittenden
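For readers trying to reproduce the report: the message does not include the body of create_temp_table_func(), so the following is only a minimal sketch of what such a function might look like, assuming it runs the visibility check quoted above and then creates the temp table when the check finds nothing. The table name footmp comes from the quoted query; the column definition, the return type, and the use of PERFORM and EXECUTE are assumptions made for illustration, not the original code.

```sql
-- Hypothetical reconstruction; not the function from the original report.
-- The body is written as a quoted string with doubled single quotes, which
-- matches the ''footmp'' quoting seen in the query quoted in the message.
CREATE OR REPLACE FUNCTION create_temp_table_func() RETURNS BOOLEAN AS '
BEGIN
    -- Visibility check from the message: is a table named footmp already
    -- visible in this session search path?
    PERFORM TRUE
       FROM pg_catalog.pg_class c
       LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
      WHERE c.relname = ''footmp''::TEXT
        AND c.relkind = ''r''::TEXT
        AND pg_catalog.pg_table_is_visible(c.oid);

    -- PERFORM sets FOUND when the query returns at least one row.
    IF NOT FOUND THEN
        -- Assumed column list; the report does not show the table definition.
        EXECUTE ''CREATE TEMP TABLE footmp (id INTEGER)'';
    END IF;

    RETURN TRUE;
END;
' LANGUAGE 'plpgsql';
```

With a function of this shape, the sequence described in the message would be: call it inside a transaction, commit, disconnect, reconnect quickly, and call it again, at which point the PERFORM step is where the "cache lookup of relation" error would surface.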