Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: new heapcheck contrib module
Msg-id E9D7ECC3-5538-4B14-AD4C-75931B6BEE22@enterprisedb.com
In response to Re: new heapcheck contrib module  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers

> On Oct 22, 2020, at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mark Dilger <mark.dilger@enterprisedb.com> writes:
>> Ahh, crud.  It's because
>>     syswrite($fh, '\x77\x77\x77\x77', 500)
>> is wrong twice.  The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such.  It should have been double-quoted.
>
> Argh.  So we really have, using same test except
>
>     memcpy(&lp, "\\x77", sizeof(lp));
>
> little endian:    off = 785c, flags = 2, len = 1b9b
> big endian:    off = 2e3c, flags = 0, len = 3737
>
> which explains the apparent LP_DEAD result.
>
> I'm not particularly on board with your suggestion of "well, if it works
> sometimes then it's okay".  Then we have no idea of what we really tested.
>
>             regards, tom lane

Ok, I've pruned it down to something you may like better.  Instead of just checking that *some* corruption occurs, it
checks the returned corruption against an expected regex, and if it fails to match, you should see in the logs what you
got vs. what you expected.

It only corrupts the first two line pointers, the first one with 0x77777777 and the second one with 0xAAAAAAAA, which
are consciously chosen to be bitwise reverses of each other and just strings of alternating bits rather than anything
that could have a more complicated interpretation.

On my little-endian Mac, the 0x77777777 value creates a line pointer which redirects to an invalid offset 0x7777, which
gets reported as decimal 30583 in the corruption report, "line pointer redirection to item at offset 30583 exceeds
maximum offset 38".  The test is indifferent to whether the corruption it is looking for is reported relative to the
first line pointer or the second one, so if endianness matters, it may be the 0xAAAAAAAA that results in that
corruption message.  I don't have a machine handy to test that.  It would be nice to determine the minimum amount of
paranoia necessary to make this portable and not commit the rest.




—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



