Bug in amcheck? - Mailing list pgsql-hackers

From: Konstantin Knizhnik
Subject: Bug in amcheck?
Date:
Msg-id: 33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru
List: pgsql-hackers
Hi hackers.

We see the following error reported by amcheck (I have added a dump of the
page opaque data) when it runs concurrently with autovacuum and autovacuum
gets cancelled:


ERROR:  mismatch between parent key and child high key in index "pg_attribute_relid_attnam_index"
DETAIL:  Target block=274, target opaque->flags=0, child block=427, child opaque=11, target page lsn=1/484A8FC8.
CONTEXT:  SQL statement "SELECT bt_index_parent_check(indexrelid, true, true) from pg_index"

So the child page has the BTP_HALF_DEAD bit set.
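To see why the dump points to a half-dead child, the flag value can be decoded against the btpo_flags bits from nbtree.h. This sketch assumes the opaque value was printed in hexadecimal (0x11); the btp_names table and decode_btpo_flags() are illustrative helpers written for this purpose, not amcheck code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* btpo_flags bits, values as in PostgreSQL's src/include/access/nbtree.h */
#define BTP_LEAF             (1 << 0)
#define BTP_ROOT             (1 << 1)
#define BTP_DELETED          (1 << 2)
#define BTP_META             (1 << 3)
#define BTP_HALF_DEAD        (1 << 4)
#define BTP_SPLIT_END        (1 << 5)
#define BTP_HAS_GARBAGE      (1 << 6)
#define BTP_INCOMPLETE_SPLIT (1 << 7)
#define BTP_HAS_FULLXID      (1 << 8)

/* Illustrative lookup table: one entry per flag bit, in bit order. */
static const struct
{
    unsigned    bit;
    const char *name;
} btp_names[] = {
    {BTP_LEAF, "BTP_LEAF"},
    {BTP_ROOT, "BTP_ROOT"},
    {BTP_DELETED, "BTP_DELETED"},
    {BTP_META, "BTP_META"},
    {BTP_HALF_DEAD, "BTP_HALF_DEAD"},
    {BTP_SPLIT_END, "BTP_SPLIT_END"},
    {BTP_HAS_GARBAGE, "BTP_HAS_GARBAGE"},
    {BTP_INCOMPLETE_SPLIT, "BTP_INCOMPLETE_SPLIT"},
    {BTP_HAS_FULLXID, "BTP_HAS_FULLXID"},
};

/*
 * Hypothetical helper: write the names of the bits set in 'flags', joined
 * by '|', into 'buf' (capacity 'len' bytes, always NUL-terminated, output
 * truncated if it does not fit).  Unknown bits are ignored.
 */
static void
decode_btpo_flags(unsigned flags, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(btp_names) / sizeof(btp_names[0]); i++)
    {
        if ((flags & btp_names[i].bit) == 0)
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, btp_names[i].name, len - strlen(buf) - 1);
    }
}
```

Decoding 0x11 gives "BTP_LEAF|BTP_HALF_DEAD". Read as decimal, 11 would decode to "BTP_LEAF|BTP_ROOT|BTP_META", which is not a plausible combination for the child of a level-1 page, so the hex reading seems the likely one.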
Autovacuum was interrupted at this point in _bt_pagedel:

         /*
          * Check here, as calling loops will have locks held, preventing
          * interrupts from being processed.
          */
         CHECK_FOR_INTERRUPTS();
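For context, nbtree deletes a page in two stages: the leaf is first marked half-dead (_bt_mark_page_halfdead) and only later unlinked from its siblings (_bt_unlink_halfdead_page). An interrupt accepted after the first stage leaves a page with BTP_HALF_DEAD set in the tree. The toy model below is a simplified sketch of that window, not PostgreSQL code: ToyPage, mark_half_dead(), unlink_half_dead() and toy_pagedel() are hypothetical names, and only the flag values are copied from nbtree.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as in nbtree.h (only the ones this model needs). */
#define BTP_LEAF      (1 << 0)
#define BTP_DELETED   (1 << 2)
#define BTP_HALF_DEAD (1 << 4)

/* Hypothetical toy page: its flags and whether it is still in the sibling chain. */
typedef struct ToyPage
{
    unsigned    flags;
    bool        linked;
} ToyPage;

/* Stage 1, modeled on _bt_mark_page_halfdead(): page stays linked but is flagged. */
static void
mark_half_dead(ToyPage *page)
{
    page->flags |= BTP_HALF_DEAD;
}

/* Stage 2, modeled on _bt_unlink_halfdead_page(): only a half-dead page is unlinked. */
static void
unlink_half_dead(ToyPage *page)
{
    assert(page->flags & BTP_HALF_DEAD);
    page->flags = (page->flags & ~(unsigned) BTP_HALF_DEAD) | BTP_DELETED;
    page->linked = false;
}

/*
 * Run both stages.  'interrupted' simulates a cancel being accepted by
 * CHECK_FOR_INTERRUPTS() after stage 1, so stage 2 never runs.
 */
static void
toy_pagedel(ToyPage *page, bool interrupted)
{
    mark_half_dead(page);
    if (interrupted)
        return;             /* page is left half-dead, still in the tree */
    unlink_half_dead(page);
}
```

In the interrupted case the page ends up with BTP_LEAF | BTP_HALF_DEAD, i.e. 0x11, the same value as in the dump above. Presumably a later VACUUM completes the deletion, but until then bt_index_parent_check() can walk into the half-dead page.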

Reproducing it is not easy.
First, I added a sleep here:

         /*
          * Check here, as calling loops will have locks held, preventing
          * interrupts from being processed.
          */
         pg_usleep(10000);    /* added for the repro: 10 ms delay widens the window */
         CHECK_FOR_INTERRUPTS();

Then I created two procedures:

create or replace procedure create_tables(tables integer, partitions integer) as $$
declare
    i integer;
    j integer;
begin
    for i in 1..tables
    loop
        -- recreate the partitioned table t_<i> (dropping it also drops its partitions)
        execute 'DROP TABLE IF EXISTS t_' || i;
        execute 'CREATE TABLE t_' || i || '(pk integer) partition by range (pk)';
        -- one partition p_<i>_<j> per key value j
        for j in 1..partitions
        loop
            execute 'create table p_'||i||'_'||j||' partition of t_'||i||' for values from ('||j||') to ('||(j + 1)||')';
        end loop;
        -- one row per partition
        execute 'insert into t_'||i||' values (generate_series(1,'||partitions||'))';
    end loop;
end;
$$ language plpgsql;

and

create or replace procedure run_amcheck() as $$
begin
    loop
        -- only check while at least one autovacuum worker is active
        if (select count(*) from pg_stat_activity where backend_type='autovacuum worker') > 0
        then
            raise notice 'Run amcheck!';
            -- arguments: heapallindexed => true, rootdescend => true
            perform bt_index_parent_check(indexrelid, true, true) from pg_index;
        end if;
        perform pg_sleep(1);
    end loop;
end;
$$ language plpgsql;

Then I run run_amcheck() concurrently with pgbench executing the following
script:

call create_tables(2,1000);
select pg_sleep(2);

If the problem does not reproduce, cancel run_amcheck() and restart it.


The backtrace (PG16) is as follows:

   * frame #0: 0x00000001017b6aac amcheck.dylib`bt_child_highkey_check(state=0x000000010c846318, target_downlinkoffnum=37, loaded_child="\U00000001", target_level=1) at verify_nbtree.c:2146:23
     frame #1: 0x00000001017b7fd8 amcheck.dylib`bt_child_check(state=0x000000010c846318, targetkey=0x000000013c01c448, downlinkoffnum=37) at verify_nbtree.c:2262:2
     frame #2: 0x00000001017b5f4c amcheck.dylib`bt_target_page_check(state=0x000000010c846318) at verify_nbtree.c:1623:4
     frame #3: 0x00000001017b3908 amcheck.dylib`bt_check_level_from_leftmost(state=0x000000010c846318, level=(level = 1, leftmost = 3, istruerootlevel = false)) at verify_nbtree.c:859:3
     frame #4: 0x00000001017b24e8 amcheck.dylib`bt_check_every_level(rel=0x0000000140074f18, heaprel=0x0000000130070148, heapkeyspace=true, readonly=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:603:13
     frame #5: 0x00000001017b198c amcheck.dylib`bt_index_check_internal(indrelid=2674, parentcheck=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:362:3
     frame #6: 0x00000001017b1a78 amcheck.dylib`bt_index_parent_check(fcinfo=0x000000010c83b040) at verify_nbtree.c:242:2


I wonder if we should add a P_ISHALFDEAD(opaque) check for the child page?
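One way to picture the proposal: skip the parent key / child high key comparison when the child is half-dead, since a page in the middle of deletion is not guaranteed to have a high key matching the parent key. This is a sketch under stated assumptions, not a patch to verify_nbtree.c: P_ISHALFDEAD() uses the same test as in nbtree.h but is applied to a toy struct, child_highkey_comparable() and report_highkey_mismatch() are hypothetical, and integer keys stand in for index tuples.

```c
#include <assert.h>
#include <stdbool.h>

#define BTP_LEAF      (1 << 0)
#define BTP_HALF_DEAD (1 << 4)

/* Hypothetical stand-in for BTPageOpaqueData, reduced to the flags field. */
typedef struct ToyOpaque
{
    unsigned short btpo_flags;
} ToyOpaque;

/* Same test as the P_ISHALFDEAD() macro in nbtree.h. */
#define P_ISHALFDEAD(opaque) (((opaque)->btpo_flags & BTP_HALF_DEAD) != 0)

/*
 * Hypothetical: is it meaningful to compare this child's high key with the
 * parent key?  The proposal answers "no" for a half-dead child.
 */
static bool
child_highkey_comparable(const ToyOpaque *child)
{
    return !P_ISHALFDEAD(child);
}

/*
 * Hypothetical stand-in for the check that raised the error.
 * Returns true when a mismatch should be reported.
 */
static bool
report_highkey_mismatch(const ToyOpaque *child, int parent_key, int child_highkey)
{
    if (!child_highkey_comparable(child))
        return false;       /* half-dead child: skip the comparison */
    return parent_key != child_highkey;
}
```

With this guard, a live leaf whose high key differs from the parent key is still reported, while the same mismatch on a page flagged 0x11 is skipped.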




