Re: [GENERAL] cache lookup of relation 165058647 failed - Mailing list pgsql-bugs
From | Jan Wieck |
---|---|
Subject | Re: [GENERAL] cache lookup of relation 165058647 failed |
Date | |
Msg-id | 4099B143.8020006@Yahoo.com |
In response to | Re: [GENERAL] cache lookup of relation 165058647 failed (Sean Chittenden <sean@chittenden.org>) |
List | pgsql-bugs |
Sean Chittenden wrote:
>>>> I've found out that this error occurs in:
>>>> dependency.c file
>>>>
>>>> 2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
>>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>>>
>>>> in getRelationDescription(StringInfo buffer, Oid relid) function.
>>>>
>>>> Any ideas what can cause these errors?
>>> <aol>Me too.</aol>
>>> But I suspect that it's a race condition with the new
>>> background writer code. I've started testing a new database design
>>> and was able to reproduce this on my laptop nearly 90% of the time,
>>> but could only reproduce it about 10% of the time on my production
>>> databases until I figured out what the difference was: fsync.
>>
>> Temp tables don't use the shared buffer cache; how can this be related
>> to the BG writer?
>
> Don't the system catalogs use the shared buffer cache?
>
> BEGIN;
> SELECT create_temp_table_func(); -- Inserts a row into pg_class via CREATE TEMP TABLE
> -- Do other stuff
> COMMIT; -- After the commit, the row is now visible to other backends
> -- disconnect -- If the delay between the disconnect and reconnect is small enough
> -- reconnect  -- It's as though there is a race condition that allows the function
>               -- pg_table_is_visible() to assert the "cache lookup of relation"
>               -- error.
> BEGIN;
> SELECT create_temp_table_func(); -- Before the CREATE TEMP TABLE, I call
> /* SELECT TRUE FROM pg_catalog.pg_class c
>      LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
>    WHERE c.relname = ''footmp''::TEXT AND
>      c.relkind = ''r''::TEXT AND
>      pg_catalog.pg_table_is_visible(c.oid); */
> -- But the query fails
>
> My guess was that the series of events went something like:
>
> proc 0) COMMITs and the row in pg_class is committed
> proc 1) bgwriter code removes a page from the cache
> proc 2) queries for the page [*]
> proc 1) writes it to disk
> proc 2) queries for the page [*]
> proc 1) syncs the fd
>
> [*] proc 2 queries for the page at either of these points
>
> In 7.4, there is no bgwriter or background process mucking with cache,

Except for the checkpoint process, which does exactly the same as the
bgwriter does, and ALL concurrent backends whenever they feel the need to
evict a dirty buffer. If it makes a difference whether a pg_class page is
dirty in the buffer or copied out to disk with respect to the visibility
rules of the tuples contained in it, then the whole thing is a way larger
bug than the one in MIB.

First of all, committed or not, a temp object from one session should
NEVER be visible in any other.


Jan

> which is why this works 100% of the time. In 7.5, however, there's a
> 200ms gap where a race condition appears and pg_table_is_visible()
> fails its PointerIsValid() check. If I put a sleep in, the sleep gives
> the bgwriter enough time to commit the pages to disk so that the
> queries for the page happen after the fd's been sync()'ed.
>
> I have no other clue as to why this would be happening, though, so
> believe me when I say I could very well be quite wrong... but this is
> my best, quasi-educated/grep(1)'ed guess.
>
> -sc
>

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
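To make the reply's two claims concrete, here is a toy model in Python, offered only as a sketch under simplifying assumptions. Everything in it is hypothetical and is not PostgreSQL's actual code or data structures: the names `BufferPool`, `CatalogTuple`, `tuple_visible`, `table_is_visible` and `scan`, the one-page "catalog", the simplified snapshot (just a set of in-progress xids), and the temp namespaces `pg_temp_1`/`pg_temp_2`. It shows (1) that MVCC-style visibility is computed from transaction status and the reader's snapshot, so the answer is the same whether the pg_class page is dirty in shared buffers, flushed, or evicted and re-read; and (2) that, in this model, a namespace check against the backend's search path is what keeps one session's temp table out of another session's `pg_table_is_visible()`-style filter.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence, Set


# Hypothetical toy catalog row, NOT PostgreSQL's HeapTupleHeader: it keeps only
# the relation name, its namespace, and the inserting/deleting transaction ids.
@dataclass(frozen=True)
class CatalogTuple:
    relname: str
    relnamespace: str
    xmin: int                   # transaction that inserted the row
    xmax: Optional[int] = None  # transaction that deleted the row, if any


@dataclass(frozen=True)
class Snapshot:
    in_progress: FrozenSet[int]  # xids still running when the snapshot was taken


class BufferPool:
    """Toy shared buffers in front of a toy 'disk' (both plain dicts)."""

    def __init__(self) -> None:
        self.disk: Dict[int, List[CatalogTuple]] = {}
        self.buffers: Dict[int, List[CatalogTuple]] = {}
        self.dirty: Set[int] = set()

    def write(self, page: int, tup: CatalogTuple) -> None:
        # Change only the in-memory copy and mark it dirty; no disk I/O yet.
        if page not in self.buffers:
            self.buffers[page] = list(self.disk.get(page, []))
        self.buffers[page].append(tup)
        self.dirty.add(page)

    def flush(self, page: int) -> None:
        # Roughly what a checkpoint, the bgwriter, or an evicting backend does.
        if page in self.dirty:
            self.disk[page] = list(self.buffers[page])
            self.dirty.discard(page)

    def evict(self, page: int) -> None:
        self.flush(page)
        self.buffers.pop(page, None)

    def read(self, page: int) -> List[CatalogTuple]:
        # Serve from the buffer if resident, otherwise load the disk copy.
        if page not in self.buffers:
            self.buffers[page] = list(self.disk.get(page, []))
        return list(self.buffers[page])


def xid_visible(xid: int, committed: Set[int], snap: Snapshot) -> bool:
    return xid in committed and xid not in snap.in_progress


def tuple_visible(t: CatalogTuple, committed: Set[int], snap: Snapshot) -> bool:
    # Depends only on transaction status and the snapshot, never on whether
    # the page currently lives in a buffer or on disk.
    if not xid_visible(t.xmin, committed, snap):
        return False
    return t.xmax is None or not xid_visible(t.xmax, committed, snap)


def table_is_visible(t: CatalogTuple, search_path: Sequence[str]) -> bool:
    # Simplified: the real pg_table_is_visible() also checks that no relation
    # of the same name appears earlier in the search path.
    return t.relnamespace in search_path


def scan(pool: BufferPool, page: int, committed: Set[int], snap: Snapshot,
         search_path: Sequence[str]) -> List[str]:
    return [t.relname for t in pool.read(page)
            if tuple_visible(t, committed, snap)
            and table_is_visible(t, search_path)]


pool = BufferPool()
committed: Set[int] = set()
pool.write(0, CatalogTuple("footmp", "pg_temp_1", xmin=100))  # CREATE TEMP TABLE
committed.add(100)                                             # COMMIT

snap = Snapshot(in_progress=frozenset())
path_a = ["pg_temp_1", "pg_catalog", "public"]  # the creating session
path_b = ["pg_temp_2", "pg_catalog", "public"]  # some other session

before = scan(pool, 0, committed, snap, path_a)       # page dirty in buffers
pool.flush(0)
after_flush = scan(pool, 0, committed, snap, path_a)  # page also on disk
pool.evict(0)
after_evict = scan(pool, 0, committed, snap, path_a)  # re-read from disk
print(before, after_flush, after_evict)  # ['footmp'] ['footmp'] ['footmp']
print(scan(pool, 0, committed, snap, path_b))  # []
```

The point of the design is that `scan()` never consults `dirty` or `disk` directly: if the result could change when a page is flushed or evicted, that would be the far larger bug the reply describes, not a timing quirk of the bgwriter.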