Re: pg_verify_checksums failure with hash indexes - Mailing list pgsql-hackers

From Yugo Nagata
Subject Re: pg_verify_checksums failure with hash indexes
Date
Msg-id 20180829182526.9a4ab0c3deec9f81c439195b@sraoss.co.jp
In response to Re: pg_verify_checksums failure with hash indexes  (Michael Banck <michael.banck@credativ.de>)
List pgsql-hackers
On Tue, 28 Aug 2018 15:02:56 +0200
Michael Banck <michael.banck@credativ.de> wrote:

> Hi,
> 
> On Tue, Aug 28, 2018 at 11:21:34AM +0200, Peter Eisentraut wrote:
> > This is reproducible with PG11 and PG12:
> > 
> > initdb -k data
> > postgres -D data
> > 
> > make installcheck
> > # shut down postgres with Ctrl-C
> > 
> > pg_verify_checksums data
> > 
> > pg_verify_checksums: checksum verification failed in file
> > "data/base/16384/28647", block 65: calculated checksum DC70 but expected 0
> > pg_verify_checksums: checksum verification failed in file
> > "data/base/16384/28649", block 65: calculated checksum 89D8 but expected 0
> > pg_verify_checksums: checksum verification failed in file
> > "data/base/16384/28648", block 65: calculated checksum 9636 but expected 0
> > Checksum scan completed
> > Data checksum version: 1
> > Files scanned:  2493
> > Blocks scanned: 13172
> > Bad checksums:  3
> > 
> > The files in question correspond to
> > 
> > hash_i4_index
> > hash_name_index
> > hash_txt_index
> > 
> > Discuss. ;-)
> 
> I took a look at hash_name_index, assuming the others are similar.
> 
> Page 65 is the last page, pageinspect barfs on it as well:
> 
> regression=# SELECT get_raw_page('hash_name_index', 'main', 65);
> WARNING:  page verification failed, calculated checksum 18066 but expected 0
> ERROR:  invalid page in block 65 of relation base/16384/28638
> 
> The pages before that one from page 35 on are empty:
> 
> regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 1));
>     lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>  0/422D890 |     8807 |     0 |   664 |  5616 |    8176 |     8192 |       4 |         0
> (1 row)
> [...]
> regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 34));
>     lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
> -----------+----------+-------+-------+-------+---------+----------+---------+-----------
>  0/422C690 |    18153 |     0 |   580 |  5952 |    8176 |     8192 |       4 |         0
> (1 row)
> regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 35));
>  lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
> -----+----------+-------+-------+-------+---------+----------+---------+-----------
>  0/0 |        0 |     0 |     0 |     0 |       0 |        0 |       0 |         0
> [...]
> regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 64));
>  lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
> -----+----------+-------+-------+-------+---------+----------+---------+-----------
>  0/0 |        0 |     0 |     0 |     0 |       0 |        0 |       0 |         0
> regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 65));
> WARNING:  page verification failed, calculated checksum 18066 but expected 0
> ERROR:  invalid page in block 65 of relation base/16384/28638
> 
> Running pg_filedump on the last two pages gives the following (I am not
> sure whether the "Invalid header information." errors are legitimate,
> nor whether the checksum failure on block 64 is):
> 
> mba@fock:~/[...]postgresql/build/src/test/regress$ ~/tmp/bin/pg_filedump -R 64 65 -k -f tmp_check/data/base/16384/28638
> 
> --8<--
> *******************************************************************
> * PostgreSQL File/Block Formatted Dump Utility - Version 10.1
> *
> * File: tmp_check/data/base/16384/28638
> * Options used: -R 64 65 -k -f 
> *
> * Dump created on: Tue Aug 28 14:53:37 2018
> *******************************************************************
> 
> Block   64 ********************************************************
> <Header> -----
>  Block Offset: 0x00080000         Offsets: Lower       0 (0x0000)
>  Block: Size    0  Version    0            Upper       0 (0x0000)
>  LSN:  logid      0 recoff 0x00000000      Special     0 (0x0000)
>  Items:    0                      Free Space:    0
>  Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
>  Length (including item array): 24
> 
>  Error: Invalid header information.
> 
>  Error: checksum failure: calculated 0xc66a.
> 
>   0000: 00000000 00000000 00000000 00000000  ................
>   0010: 00000000 00000000                    ........        
> 
> <Data> ------ 
>  Empty block - no items listed 
> 
> <Special Section> -----
>  Error: Invalid special section encountered.
>  Error: Special section points off page. Unable to dump contents.
> 
> Block   65 ********************************************************
> <Header> -----
>  Block Offset: 0x00082000         Offsets: Lower      24 (0x0018)
>  Block: Size 8192  Version    4            Upper    8176 (0x1ff0)
>  LSN:  logid      0 recoff 0x04229c20      Special  8176 (0x1ff0)
>  Items:    0                      Free Space: 8152
>  Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
>  Length (including item array): 24
> 
>  Error: checksum failure: calculated 0x4692.
> 
>   0000: 00000000 209c2204 00000000 1800f01f  .... .".........
>   0010: f01f0420 00000000                    ... ....        
> 
> <Data> ------ 
>  Empty block - no items listed 
> 
> <Special Section> -----
>  Hash Index Section:
>   Flags: 0x0000 ()
>   Bucket Number: 0xffffffff
>   Blocks: Previous (-1)  Next (-1)
> 
>   1ff0: ffffffff ffffffff ffffffff 000080ff  ................
> 
> 
> *** End of Requested Range Encountered. Last Block Read: 65 ***
> --8<--
> 
> So it seems there is some data on the last page, which makes the zero
> checksum bogus, but I don't know anything about hash indexes. Also maybe
> those empty pages are not initialized correctly? Or maybe the "Invalid
> special section encountered" error means pg_filedump cannot handle hash
> indexes completely.

I saw the same thing in the hash_i4_index case using pageinspect with
checksums disabled. The last page (block 65) has some data in its header.

regression=# select * from page_header(get_raw_page('hash_i4_index',65));
    lsn    | checksum | flags | lower | upper | special | pagesize | version | prune_xid 
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
 0/939FE48 |        0 |     0 |    24 |  8176 |    8176 |     8192 |       4 |         0
(1 row)

Looking at the checksum verification code, each page is first tested with
PageIsNew(), and if it is new its checksum is not verified, because new pages
are assumed to have no checksum.  PageIsNew() decides whether a page is new
by testing whether pd_upper is zero.  For some reason, the last page has a
non-zero pd_upper but no checksum, so the checksum verification fails.
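
To make the check concrete, here is a minimal Python sketch of the rule
described above. It is not the PostgreSQL code. The header layout, the
little-endian byte order, and the helper names parse_header() and
check_page() are assumptions made for illustration. The calculated checksum
is passed in rather than computed; the values used are the ones pg_filedump
reported in the quoted output.

```python
import struct
from collections import namedtuple

# Assumed layout of the 24-byte page header on a little-endian machine:
# pd_lsn (two uint32), then pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (uint16 each), then pd_prune_xid (uint32).
HEADER = struct.Struct("<IIHHHHHHI")
PageHeader = namedtuple(
    "PageHeader",
    "xlogid xrecoff pd_checksum pd_flags pd_lower pd_upper "
    "pd_special pd_pagesize_version pd_prune_xid",
)


def parse_header(page: bytes) -> PageHeader:
    """Decode the first 24 bytes of a page (hypothetical helper)."""
    return PageHeader(*HEADER.unpack_from(page, 0))


def check_page(page: bytes, calculated: int) -> str:
    """Apply the skip-if-new rule; 'calculated' is supplied by the caller."""
    hdr = parse_header(page)
    if hdr.pd_upper == 0:  # the PageIsNew() test described above
        return "skipped"
    return "ok" if hdr.pd_checksum == calculated else "failed"


# Header bytes of block 65, copied from the quoted pg_filedump output.
block65 = bytes.fromhex("00000000209c2204000000001800f01ff01f042000000000")
# An all-zero header, like blocks 35 to 64.
block64 = bytes(24)

hdr = parse_header(block65)
print(hdr.pd_lower, hdr.pd_upper, hdr.pd_special)  # 24 8176 8176
print(check_page(block64, 0xC66A))  # skipped
print(check_page(block65, 0x4692))  # failed
```

Block 64 is skipped because its pd_upper is zero, while block 65 has
pd_upper = 8176 and a stored checksum of 0, so it is compared against the
calculated value and fails.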

It is not clear to me why the last page has header information, but after
some code investigation, I think it happens in _hash_alloc_buckets().  When
a hash table is expanded, smgrextend() adds some blocks to the file.  At that
time, it seems that a page that has header information is written to the end
of the file (in mdextend()).
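
A rough simulation of that suspicion, as a sketch only. It assumes the
extension writes just the final block of the new range and never writes the
blocks in between; this is inferred from the page_header output above, not
from the actual _hash_alloc_buckets() or mdextend() code. make_page(),
extend_to() and classify() are hypothetical helpers, the header layout is the
assumed little-endian one, and the checksum value 1 on the existing pages is
a placeholder.

```python
import io
import struct

BLCKSZ = 8192
# Assumed 24-byte page header layout (little-endian).
HEADER = struct.Struct("<IIHHHHHHI")


def make_page(checksum: int) -> bytes:
    """Build an empty, initialized page with the given stored checksum."""
    page = bytearray(BLCKSZ)
    special = BLCKSZ - 16  # 8176, as in the page_header output for block 65
    HEADER.pack_into(page, 0, 0, 0, checksum, 0,
                     HEADER.size, special, special, BLCKSZ | 4, 0)
    return bytes(page)


def extend_to(f, lastblock: int) -> None:
    """Write only the last block; the gap before it is never written."""
    f.seek(lastblock * BLCKSZ)
    f.write(make_page(checksum=0))  # header is set, checksum is left at 0


def classify(f) -> list:
    """Label every block by what its header looks like."""
    f.seek(0)
    kinds = []
    while True:
        page = f.read(BLCKSZ)
        if not page:
            break
        _, _, pd_checksum, _, _, pd_upper, _, _, _ = HEADER.unpack_from(page, 0)
        if pd_upper == 0:
            kinds.append("new")
        elif pd_checksum == 0:
            kinds.append("initialized, checksum 0")
        else:
            kinds.append("has checksum")
    return kinds


f = io.BytesIO()
for _ in range(35):  # blocks 0..34 already exist
    f.write(make_page(checksum=1))  # placeholder checksum
extend_to(f, lastblock=65)

kinds = classify(f)
print(len(kinds))  # 66
print(kinds[34], "|", kinds[35], "|", kinds[64], "|", kinds[65])
# has checksum | new | new | initialized, checksum 0
```

Under these assumptions the result has the same shape as the regression
database: blocks 35 to 64 read back as all zeros and are treated as new,
while block 65 has a valid-looking header with a stored checksum of 0.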

I'm not sure how to fix this for now, but I thought it was worth sharing my
analysis of this issue.

Regards,
-- 
Yugo Nagata <nagata@sraoss.co.jp>

