Re: Perl modules for testing/viewing/corrupting/repairing your heapfiles - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Perl modules for testing/viewing/corrupting/repairing your heapfiles
Date
Msg-id 913D6F73-8337-4FDA-B11E-EFFCA20E1A44@enterprisedb.com
In response to Re: Perl modules for testing/viewing/corrupting/repairing your heap files  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Perl modules for testing/viewing/corrupting/repairing your heap files  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers

> On Apr 14, 2020, at 6:17 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> Recently, as part of testing something else, I had need of a tool to create
>> surgically precise corruption within heap pages.  I wanted to make the
>> corruption from within TAP tests, so I wrote the tool as a set of perl modules.
>
> There is also pg_hexedit:
>
> https://github.com/petergeoghegan/pg_hexedit

I steered away from software released under the GPL, such as pg_hexedit, owing to difficulties in getting anything I
develop accepted.  (That's a hard enough problem without licensing issues.)  I'm not taking a political stand for or
against the GPL here, just a pragmatic position that I wouldn't be able to integrate pg_hexedit into a postgres
submission.

(Thanks for writing pg_hexedit, BTW.  I'm not criticizing it.)

The purpose of these perl modules is not the viewing of files, but the intentional and targeted corruption of files
from within TAP tests.  There are limited examples of tests in the postgres source tree that intentionally corrupt
files, and as I read them, they employ a blunt force trauma approach:

In src/bin/pg_basebackup/t/010_pg_basebackup.pl:

> # induce corruption
> system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
> open $file, '+<', "$pgdata/$file_corrupt1";
> seek($file, $pageheader_size, 0);
> syswrite($file, "\0\0\0\0\0\0\0\0\0");
> close $file;
> system_or_bail 'pg_ctl', '-D', $pgdata, 'start';

In src/bin/pg_checksums/t/002_actions.pl:

>     # Time to create some corruption
>     open my $file, '+<', "$pgdata/$file_corrupted";
>     seek($file, $pageheader_size, 0);
>     syswrite($file, "\0\0\0\0\0\0\0\0\0");
>     close $file;

These blunt force trauma tests are fine, as far as they go.  But I wanted to be able to do things like

        # Corrupt the tuple to look like it has lots of attributes, some of
        # them null.  This falsely creates the impression that the t_bits
        # array is longer than just one byte, but t_hoff still says otherwise.
        $tup->{HEAP_HASNULL} = 1;
        $tup->{HEAP_NATTS_MASK} = 0x3FF;
        $tup->{t_bits} = 0xAA;

or

        # Same as above, but this time t_hoff plays along
        $tup->{HEAP_HASNULL} = 1;
        $tup->{HEAP_NATTS_MASK} = 0x3FF;
        $tup->{t_bits} = 0xAA;
        $tup->{t_hoff} = 32;

That's hard to do from a TAP test without modules like this, as you have to calculate by hand the offsets where you're
going to write the corruption, and the bit pattern you are going to write to that location.  Even if you do all that,
nobody else is likely going to be able to read and maintain your tests.
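
To illustrate the by-hand arithmetic, here is a minimal sketch of the second example written with raw pack/unpack.
It is not taken from the submitted modules.  The helper names (tuple_offset, corrupt_tuple_by_hand) are hypothetical,
and the offsets assume a little-endian build, the default 8kB block size, a 24-byte page header, and the usual
compiler layout of the ItemIdData bitfields.  It works on an in-memory page image rather than a relation file.

```perl
use strict;
use warnings;

# Assumed on-disk layout; all of these are assumptions, see the note above.
use constant {
    PAGE_HEADER_SIZE => 24,        # size of the page header
    ITEMID_SIZE      => 4,         # size of one line pointer
    OFF_INFOMASK2    => 18,        # t_infomask2 offset within the tuple header
    OFF_INFOMASK     => 20,        # t_infomask offset
    OFF_HOFF         => 22,        # t_hoff offset
    OFF_BITS         => 23,        # first byte of t_bits
    HEAP_HASNULL     => 0x0001,    # flag bit in t_infomask
    HEAP_NATTS_MASK  => 0x07FF,    # attribute-count bits in t_infomask2
};

# Byte offset within the page of the tuple behind line pointer $lp (1-based).
sub tuple_offset {
    my ($page, $lp) = @_;
    my $pos    = PAGE_HEADER_SIZE + ($lp - 1) * ITEMID_SIZE;
    my $itemid = unpack('V', substr($page, $pos, ITEMID_SIZE));
    return $itemid & 0x7FFF;       # lp_off assumed to be the low 15 bits
}

# Return a copy of $page whose tuple $lp claims 0x3FF attributes, has
# HEAP_HASNULL set, and has 0xAA in t_bits.  If $new_hoff is defined,
# t_hoff is overwritten as well.
sub corrupt_tuple_by_hand {
    my ($page, $lp, $new_hoff) = @_;
    my $tup = tuple_offset($page, $lp);

    my ($infomask2, $infomask) =
        unpack('v v', substr($page, $tup + OFF_INFOMASK2, 4));
    $infomask |= HEAP_HASNULL;
    $infomask2 = ($infomask2 & (0xFFFF ^ HEAP_NATTS_MASK)) | 0x3FF;

    substr($page, $tup + OFF_INFOMASK2, 4, pack('v v', $infomask2, $infomask));
    substr($page, $tup + OFF_BITS, 1, pack('C', 0xAA));
    substr($page, $tup + OFF_HOFF, 1, pack('C', $new_hoff))
        if defined $new_hoff;
    return $page;
}

# Build a fake page: one normal line pointer aimed at a 32-byte tuple
# placed at the end of the block, with natts = 2 and t_hoff = 24.
my $page    = "\0" x 8192;
my $tup_off = 8192 - 32;
substr($page, PAGE_HEADER_SIZE, 4, pack('V', $tup_off | (1 << 15) | (32 << 17)));
substr($page, $tup_off + OFF_INFOMASK2, 5, pack('v v C', 2, 0x0800, 24));

my $bad = corrupt_tuple_by_hand($page, 1, 32);
printf "t_infomask2=0x%04x t_infomask=0x%04x t_hoff=%d t_bits=0x%02x\n",
    unpack('v v C C', substr($bad, $tup_off + OFF_INFOMASK2, 6));
```

Every constant in that sketch is something a test author would otherwise have to look up and keep in sync with the
headers, which is the maintenance burden described above.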

I'd like an easy way from within TAP tests to selectively corrupt files, to test whether various parts of the system
fail gracefully in the presence of corruption.  What happens when a child partition is corrupted?  Does that impact
queries that only access other partitions?  What kinds of corruption cause pg_upgrade to fail? ...to expand the scope of
the corruption?  What happens to logical replication when there is corruption on the primary? ...on the standby?  What
kinds of corruption cause a query to return data from neighboring tuples that the querying role has no permission to
view?  What happens when a NAS is only intermittently corrupt?

The modules I've submitted thus far are incomplete for this purpose.  They don't yet handle toast tables, btree, hash,
gist, gin, fsm, or vm, and I might be forgetting a few other things in the list.  Before I go and implement all of that,
I thought perhaps others would express preferences about how this should all work, even stuff like, "Don't bother
implementing that in perl, as I'm reimplementing the entire testing structure in COBOL", or similarly unexpected
feedback.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





