Re: 9.4 and reproducible "ERROR: could not read block 0 in file..." - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: 9.4 and reproducible "ERROR: could not read block 0 in file..." |
Date | |
Msg-id | 15824.1419970310@sss.pgh.pa.us Whole thread Raw |
In response to | 9.4 and reproducible "ERROR: could not read block 0 in file..." (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: 9.4 and reproducible "ERROR: could not read block 0 in
file..."
|
List | pgsql-bugs |
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > Looked into this based on a report via IRC, and constructed the > following minimal test case. The original reporter found it in an > attempt to migrate to 9.4 from an earlier version. > (Yes, I know the function is behaving badly in that it is declared > immutable while nevertheless accessing the table. I'm sure this is > related to the problem, but (a) it presumably worked before, for > sufficient values of "worked", and (b) if it's going to be disallowed, > I think it needs to fail more cleanly than this.) For small values of "worked", perhaps. This variant script fails at least as far back as 8.4: create table ftst (id integer, v text); create or replace function actual_v(integer) returns int language plpgsql immutable as $f$ declare res text; begin select v into res from ftst f where f.id=$1; return 1/($1 - 30); end; $f$; insert into ftst select i, 'value '||i from generate_series(1,60) i; create index ftst_id on ftst (id); create index ftst_trouble on ftst (actual_v(id)); -- ERROR: division by zero select * from ftst; -- ERROR: could not open relation with OID 40457 What seems to be happening is that 1. After creation of the first index, we have a relcache entry whose rd_indexlist is marked invalid, quite properly (no one has requested the index list since the first index was added). 2. If a function called in the index expression tries to plan any query involving the table, we build a new indexlist --- in which the index-being-built is included, since it already has visible pg_class and pg_index entries. 3. If index build fails later on, the indexlist built in step 2 is never flushed, so it is still there and will cause the planner to try to access the bogus index. The cause of #3 is that we don't register a pending relcache invalidation event for the table until the index_update_stats call near the end of index_build() --- which will never be reached if we get errors while computing the index entries. The new behavior in 9.3 and up (not only 9.4) is that the planner calls _bt_getrootheight() on every index it thinks is valid, and that results in the "could not read block 0" failure, since in fact the btree metapage ain't there yet. So there are two separate bugs/symptoms here, but I think we can fix both of them with one change: we need the new index to appear invalid for queries/updates until we're done building it. We could implement that (in 9.2 and later) by creating the pg_index row with indislive = false and updating it to true after the build completes; but that's a tad annoying since it means every index build leaves a dead pg_index row behind. (Unless we do the update in-place, which I think would work but it's still pretty ugly.) Or we could hack things up by having RelationGetIndexList consult some internal-to-the-backend state so that it knows which index builds are in-progress and leaves those indexes out of the indexlist. That seems a bit messy too, but it would avoid one catalog update, and would stand a better chance of being back-patchable to before 9.2. regards, tom lane
pgsql-bugs by date: