Re: 9.4 and reproducible "ERROR: could not read block 0 in file..." - Mailing list pgsql-bugs

From Tom Lane
Subject Re: 9.4 and reproducible "ERROR: could not read block 0 in file..."
Date
Msg-id 15824.1419970310@sss.pgh.pa.us
Whole thread Raw
In response to 9.4 and reproducible "ERROR: could not read block 0 in file..."  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: 9.4 and reproducible "ERROR: could not read block 0 in file..."
List pgsql-bugs
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Looked into this based on a report via IRC, and constructed the
> following minimal test case. The original reporter found it in an
> attempt to migrate to 9.4 from an earlier version.

> (Yes, I know the function is behaving badly in that it is declared
> immutable while nevertheless accessing the table. I'm sure this is
> related to the problem, but (a) it presumably worked before, for
> sufficient values of "worked", and (b) if it's going to be disallowed,
> I think it needs to fail more cleanly than this.)

For small values of "worked", perhaps.  This variant script fails at
least as far back as 8.4:

create table ftst (id integer, v text);
create or replace function actual_v(integer) returns int
  language plpgsql immutable
  as $f$
    declare
      res text;
    begin
      select v into res from ftst f where f.id=$1;
      return 1/($1 - 30);
    end;
  $f$;
insert into ftst select i, 'value '||i from generate_series(1,60) i;
create index ftst_id on ftst (id);
create index ftst_trouble on ftst (actual_v(id));
-- ERROR:  division by zero
select * from ftst;
-- ERROR:  could not open relation with OID 40457

What seems to be happening is that

1. After creation of the first index, we have a relcache entry whose
rd_indexlist is marked invalid, quite properly (no one has requested
the index list since the first index was added).

2. If a function called in the index expression tries to plan any
query involving the table, we build a new indexlist --- in which
the index-being-built is included, since it already has visible
pg_class and pg_index entries.

3. If index build fails later on, the indexlist built in step 2 is
never flushed, so it is still there and will cause the planner to
try to access the bogus index.

The cause of #3 is that we don't register a pending relcache
invalidation event for the table until the index_update_stats call
near the end of index_build() --- which will never be reached
if we get errors while computing the index entries.

The new behavior in 9.3 and up (not only 9.4) is that the planner
calls _bt_getrootheight() on every index it thinks is valid, and
that results in the "could not read block 0" failure, since in
fact the btree metapage ain't there yet.

So there are two separate bugs/symptoms here, but I think we can
fix both of them with one change: we need the new index to appear
invalid for queries/updates until we're done building it.  We
could implement that (in 9.2 and later) by creating the pg_index
row with indislive = false and updating it to true after the build
completes; but that's a tad annoying since it means every index build
leaves a dead pg_index row behind.  (Unless we do the update in-place,
which I think would work but it's still pretty ugly.)

Or we could hack things up by having RelationGetIndexList consult some
internal-to-the-backend state so that it knows which index builds are
in-progress and leaves those indexes out of the indexlist.  That seems
a bit messy too, but it would avoid one catalog update, and would stand
a better chance of being back-patchable to before 9.2.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Andrew Gierth
Date:
Subject: 9.4 and reproducible "ERROR: could not read block 0 in file..."
Next
From: mail@bwe.im
Date:
Subject: BUG #12379: pgbench should hint to pgbench -i