Perl modules for testing/viewing/corrupting/repairing your heap files - Mailing list pgsql-hackers

From Mark Dilger
Subject Perl modules for testing/viewing/corrupting/repairing your heap files
Date
Msg-id 5475CA8E-9F68-4632-85A4-18AB64BB5723@enterprisedb.com
Responses Re: Perl modules for testing/viewing/corrupting/repairing your heapfiles  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: Perl modules for testing/viewing/corrupting/repairing your heap files  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hackers,

Recently, as part of testing something else, I had need of a tool to create
surgically precise corruption within heap pages.  I wanted to make the
corruption from within TAP tests, so I wrote the tool as a set of perl modules.

The modules allow you to "tie" a perl array to a heap file, in essence treating
the file as an array of heap pages.  Each page within the file manifests as
a tied perl hash, where each page header field is an element of the hash,
and the tuples in the page are an array of tied hashes, with each tuple
header field as an element of that tied hash.
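
To make the tied-hash idea concrete, here is a minimal sketch of how page header fields could be exposed as hash elements over a raw page buffer. It is not the HeapPage module from this post: the package name `SketchPageHeader`, the field-to-offset table, and the little-endian assumption are illustrative only. Only `pd_lsn_xrecoff` and `pd_checksum` appear as key names in the post; the other keys follow PostgreSQL's page header field names and are assumptions about how the module names them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch class; NOT the HeapPage module described in this post.
package SketchPageHeader;

# Field name => [byte offset, pack template] for the 24-byte page header.
# Assumes a little-endian platform.
my %LAYOUT = (
    pd_lsn_xlogid       => [ 0,  'V' ],
    pd_lsn_xrecoff      => [ 4,  'V' ],
    pd_checksum         => [ 8,  'v' ],
    pd_flags            => [ 10, 'v' ],
    pd_lower            => [ 12, 'v' ],
    pd_upper            => [ 14, 'v' ],
    pd_special          => [ 16, 'v' ],
    pd_pagesize_version => [ 18, 'v' ],
    pd_prune_xid        => [ 20, 'V' ],
);
my %WIDTH = (V => 4, v => 2);

# Keys in on-disk order, plus a successor map used for iteration.
my @ORDER = sort { $LAYOUT{$a}[0] <=> $LAYOUT{$b}[0] } keys %LAYOUT;
my %NEXT;
@NEXT{ @ORDER[0 .. $#ORDER - 1] } = @ORDER[1 .. $#ORDER];

sub TIEHASH {
    my ($class, $bufref) = @_;
    return bless { buf => $bufref }, $class;
}

# Reading an element decodes the field from the page bytes.
sub FETCH {
    my ($self, $key) = @_;
    my $f = $LAYOUT{$key} or return undef;
    my ($off, $tmpl) = @$f;
    return unpack($tmpl, substr(${ $self->{buf} }, $off, $WIDTH{$tmpl}));
}

# Writing an element encodes the value back into the page bytes in place.
sub STORE {
    my ($self, $key, $value) = @_;
    my $f = $LAYOUT{$key} or die "unknown page header field: $key";
    my ($off, $tmpl) = @$f;
    substr(${ $self->{buf} }, $off, $WIDTH{$tmpl}, pack($tmpl, $value));
    return $value;
}

sub EXISTS   { exists $LAYOUT{ $_[1] } }
sub FIRSTKEY { $ORDER[0] }
sub NEXTKEY  { $NEXT{ $_[1] } }

package main;

my $page = "\0" x 8192;              # an all-zero stand-in for a heap page
tie my %hdr, 'SketchPageHeader', \$page;

$hdr{pd_lower} = 24;                 # STORE packs into the buffer
$hdr{pd_lsn_xrecoff}++;              # FETCH, increment, then STORE

printf "%-20s %d\n", $_, $hdr{$_} for keys %hdr;
```

Keeping the bytes as the single source of truth means there is no separate decoded copy to keep in sync; every read goes through `unpack` and every write through `pack`.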

This is all done in pure perl.  There is no XS (eXternal Subroutine) component
to this.

The body of each tuple (everything beyond the tuple header) is treated merely as
binary data.  I haven't done any work to decode it into perl data structures
equivalent to integer, text, timestamp, etc., nor have I needed that
functionality as yet.  That seems doable as an extension of this work, at least
if the caller passes tuple descriptor type information into the `tie @file`
command.

Stuff like the following example works in the implementation already completed.
Note in particular that the file is bound in O_RDWR mode.  That means it all
gets written back to the underlying file and truly updates (corrupts) your
data.  It all also works in O_RDONLY mode, in which case the updates are made
to a copy of the data in perl's memory, but none of it goes back to disk.  Of course,
nothing forces you to update anything.  You could use this to read the fields from
the file/page/tuple without making modifications.
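
The difference between the two modes can be sketched with a small tied array over a file. This is an assumption-laden illustration, not the HeapFile implementation: the package name `SketchPageFile` is hypothetical, pages are plain byte strings rather than tied hashes, and the 16-byte "pages" in a temporary file exist only to keep the demo small. The `path`, `pagesize`, and `mode` arguments mirror the ones used in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch class; NOT the HeapFile module described in this post.
package SketchPageFile;
use Fcntl qw(O_RDONLY O_RDWR SEEK_SET);

sub TIEARRAY {
    my ($class, %arg) = @_;
    my $mode = defined $arg{mode} ? $arg{mode} : O_RDONLY;
    sysopen(my $fh, $arg{path}, $mode) or die "cannot open $arg{path}: $!";
    binmode($fh);
    return bless {
        fh       => $fh,
        pagesize => $arg{pagesize} || 8192,
        writable => ($mode & O_RDWR) ? 1 : 0,
        cache    => {},                 # in-memory copies of touched pages
    }, $class;
}

# Number of whole pages in the file.
sub FETCHSIZE {
    my ($self) = @_;
    return int((-s $self->{fh}) / $self->{pagesize});
}

sub FETCH {
    my ($self, $i) = @_;
    return $self->{cache}{$i} if exists $self->{cache}{$i};
    sysseek($self->{fh}, $i * $self->{pagesize}, SEEK_SET) or die "seek: $!";
    my $n = sysread($self->{fh}, my $buf, $self->{pagesize});
    die "short read on page $i" unless defined $n && $n == $self->{pagesize};
    return $self->{cache}{$i} = $buf;
}

sub STORE {
    my ($self, $i, $bytes) = @_;
    die "page must be $self->{pagesize} bytes"
        unless length($bytes) == $self->{pagesize};
    $self->{cache}{$i} = $bytes;        # always visible to later FETCHes
    return unless $self->{writable};    # read-only: change stays in memory
    sysseek($self->{fh}, $i * $self->{pagesize}, SEEK_SET) or die "seek: $!";
    my $n = syswrite($self->{fh}, $bytes);
    die "short write on page $i" unless defined $n && $n == length $bytes;
    return;
}

sub UNTIE { my ($self) = @_; close $self->{fh}; return }

package main;
use Fcntl qw(O_RDONLY O_RDWR);
use File::Temp qw(tempfile);

# Build a throwaway file holding two 16-byte "pages".
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} ("A" x 16), ("B" x 16);
close $tmp;

# Read-only: the update is visible through the array but never hits disk.
tie my @ro, 'SketchPageFile', path => $path, pagesize => 16, mode => O_RDONLY;
$ro[0] = "X" x 16;
my $ro_view = $ro[0];
untie @ro;

# Read-write: the update is written through to the file.
tie my @rw, 'SketchPageFile', path => $path, pagesize => 16, mode => O_RDWR;
my $page0_after_ro = $rw[0];
$rw[1] = "Y" x 16;
untie @rw;

open(my $in, '<:raw', $path) or die "cannot reopen $path: $!";
my $raw = do { local $/; <$in> };
close $in;
print "$raw\n";
```

Only the second tie changes the file; the first leaves page 0 untouched on disk even though the tied array reported the new value while it was live.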

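    #!/usr/bin/perl

    use HeapTuple;
    use HeapPage;
    use HeapFile;
    use Fcntl;    # provides O_RDWR and O_RDONLY

    my @file;
    # O_RDWR: every change below is written back to the relation file.
    tie @file, 'HeapFile', path => 'base/12925/3599', pagesize => 8192, mode => O_RDWR;
    for my $page (@file)
    {
        # Page header fields are elements of the page hash.
        $page->{pd_lsn_xrecoff}++;
        print $page->{pd_checksum}, "\n";
        for my $tuple (@{$page->{'tuples'}})
        {
            # Tuple header fields and infomask bits are elements of the tuple hash.
            $tuple->{HEAP_COMBOCID} = 1 if ($tuple->{HEAP_HASNULL});
            $tuple->{t_xmin} = $tuple->{t_xmax} if $tuple->{HEAP_XMAX_COMMITTED};
        }
    }
    untie @file;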

In my TAP test usage of these modules, I tend to fall into the pattern of:

    my $node = get_new_node('master');
    $node->init;
    my $pgdata = $node->data_dir;
    $node->safe_psql('postgres', 'create table public.test (bar text)');
    my $path = join('/', $pgdata, $node->safe_psql(
        'postgres', "SELECT pg_relation_filepath('public.test')"));
    $node->stop;

    my @file;
    tie @file, 'HeapFile', path => $path, pagesize => 8192, mode => O_RDWR;
    # do some corruption

    $node->start;
    # do some queries against the corrupt table, see what happens

For kicks, I just ran this one-liner and got many screenfuls of data.  I'll just include
the tail end:

    perl -e 'use HeapFile; tie @file, "HeapFile", path => "pgdata/base/12925/1255"; print(scalar(%$_)) for(@file);'

BODY AS HEX               ===>  PRINTABLE ASCII
ff 0f 06 00 00 00 00 00   ===>  . . . . . . . .
47 20 00 00 46 06 46 43   ===>  q 2 . . p l p g
49 47 06 05 3f 3d 06 06   ===>  s q l _ c a l l
05 44 3d 06 40 06 41 48   ===>  _ h a n d l e r
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 50 03 00 00 00 00   ===>  . . . ? . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
42 00 00 00 00 4c 4b 00   ===>  f . . . . v u .
00 00 00 00 00 08 00 00   ===>  . . . . . . . .
3c 00 00 00 01 00 00 00   ===>  ` . . . . . . .
00 00 00 00 01 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
02 46 06 46 43 49 47 06   ===>  + p l p g s q l
05 3f 3d 06 06 05 44 3d   ===>  _ c a l l _ h a
06 40 06 41 48 15 18 06   ===>  n d l e r ! $ l
45 3e 40 45 48 02 46 06   ===>  i b d i r / p l
46 43 49 47 06            ===>  p g s q l
b6 01 00 00            t_xmin: 438
00 00 00 00            t_xmax: 0
02 00 00 00          t_field3: 2
00 00                   bi_hi: 0
50 00                   bi_lo: 80
06 00                ip_posid: 6
1d 00             t_infomask2: 29
                        Natts: 29
            HEAP_KEYS_UPDATED: 0
             HEAP_HOT_UPDATED: 0
              HEAP_ONLY_TUPLE: 0
03 0b              t_infomask: 2819
                 HEAP_HASNULL: 1
             HEAP_HASVARWIDTH: 1
             HEAP_HASEXTERNAL: 0
              HEAP_HASOID_OLD: 0
        HEAP_XMAX_KEYSHR_LOCK: 0
                HEAP_COMBOCID: 0
          HEAP_XMAX_EXCL_LOCK: 0
          HEAP_XMAX_LOCK_ONLY: 0
          HEAP_XMIN_COMMITTED: 1
            HEAP_XMIN_INVALID: 1
          HEAP_XMAX_COMMITTED: 0
            HEAP_XMAX_INVALID: 1
           HEAP_XMAX_IS_MULTI: 0
                 HEAP_UPDATED: 0
               HEAP_MOVED_OFF: 0
                HEAP_MOVED_IN: 0
20                     t_hoff: 32
ffff0f06        NULL_BITFIELD: 11111111111111111111000001100
                      OID_OLD:

BODY AS HEX               ===>  PRINTABLE ASCII
ff 0f 06 00 00 00 00 00   ===>  . . . . . . . .
48 20 00 00 46 06 46 43   ===>  r 2 . . p l p g
49 47 06 05 45 06 06 45   ===>  s q l _ i n l i
06 41 05 44 3d 06 40 06   ===>  n e _ h a n d l
41 48 00 00 00 00 00 00   ===>  e r . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 50 03 00 00 00 00   ===>  . . . ? . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
42 00 00 01 00 4c 4b 00   ===>  f . . . . v u .
01 00 00 00 00 08 00 00   ===>  . . . . . . . .
46 00 00 00 01 00 00 00   ===>  p . . . . . . .
00 00 00 00 01 00 00 00   ===>  . . . . . . . .
01 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 08 00 00 02 46 06 46   ===>  . . . . / p l p
43 49 47 06 05 45 06 06   ===>  g s q l _ i n l
45 06 41 05 44 3d 06 40   ===>  i n e _ h a n d
06 41 48 15 18 06 45 3e   ===>  l e r ! $ l i b
40 45 48 02 46 06 46 43   ===>  d i r / p l p g
49 47 06                  ===>  s q l
b6 01 00 00            t_xmin: 438
00 00 00 00            t_xmax: 0
03 00 00 00          t_field3: 3
00 00                   bi_hi: 0
50 00                   bi_lo: 80
07 00                ip_posid: 7
1d 00             t_infomask2: 29
                        Natts: 29
            HEAP_KEYS_UPDATED: 0
             HEAP_HOT_UPDATED: 0
              HEAP_ONLY_TUPLE: 0
03 0b              t_infomask: 2819
                 HEAP_HASNULL: 1
             HEAP_HASVARWIDTH: 1
             HEAP_HASEXTERNAL: 0
              HEAP_HASOID_OLD: 0
        HEAP_XMAX_KEYSHR_LOCK: 0
                HEAP_COMBOCID: 0
          HEAP_XMAX_EXCL_LOCK: 0
          HEAP_XMAX_LOCK_ONLY: 0
          HEAP_XMIN_COMMITTED: 1
            HEAP_XMIN_INVALID: 1
          HEAP_XMAX_COMMITTED: 0
            HEAP_XMAX_INVALID: 1
           HEAP_XMAX_IS_MULTI: 0
                 HEAP_UPDATED: 0
               HEAP_MOVED_OFF: 0
                HEAP_MOVED_IN: 0
20                     t_hoff: 32
ffff0f06        NULL_BITFIELD: 11111111111111111111000001100
                      OID_OLD:

BODY AS HEX               ===>  PRINTABLE ASCII
ff 0f 06 00 00 00 00 00   ===>  . . . . . . . .
49 20 00 00 46 06 46 43   ===>  s 2 . . p l p g
49 47 06 05 4c 3d 06 45   ===>  s q l _ v a l i
40 3d 4a 06 48 00 00 00   ===>  d a t o r . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
00 00 50 03 00 00 00 00   ===>  . . . ? . . . .
00 00 00 00 00 00 00 00   ===>  . . . . . . . .
42 00 00 01 00 4c 4b 00   ===>  f . . . . v u .
01 00 00 00 00 08 00 00   ===>  . . . . . . . .
46 00 00 00 01 00 00 00   ===>  p . . . . . . .
00 00 00 00 01 00 00 00   ===>  . . . . . . . .
01 00 00 00 00 00 00 00   ===>  . . . . . . . .
01 00 00 00 19 46 06 46   ===>  . . . . % p l p
43 49 47 06 05 4c 3d 06   ===>  g s q l _ v a l
45 40 3d 4a 06 48 15 18   ===>  i d a t o r ! $
06 45 3e 40 45 48 02 46   ===>  l i b d i r / p
06 46 43 49 47 06         ===>  l p g s q l
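
For readers who want to check the header fields in the dump by hand, the following sketch decodes the 23 fixed header bytes of the first tuple shown above using `unpack`. It is independent of the modules in this post. The byte values are copied from the dump; the little-endian layout and the flag bit values are my reading of PostgreSQL's `htup_details.h` and are worth checking against the header for your version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header bytes copied from the first tuple dump above, in on-disk order.
my $header = pack('H*', join '', qw(
    b6010000 00000000 02000000
    0000 5000 0600
    1d00 030b 20
));

# Assumes little-endian storage: three uint32, five uint16, one uint8.
my %t;
@t{qw(t_xmin t_xmax t_field3 bi_hi bi_lo ip_posid
      t_infomask2 t_infomask t_hoff)} = unpack('V V V v v v v v C', $header);

# t_infomask bits (assumed values, as in htup_details.h).
my @INFOMASK = (
    [ HEAP_HASNULL          => 0x0001 ],
    [ HEAP_HASVARWIDTH      => 0x0002 ],
    [ HEAP_HASEXTERNAL      => 0x0004 ],
    [ HEAP_HASOID_OLD       => 0x0008 ],
    [ HEAP_XMAX_KEYSHR_LOCK => 0x0010 ],
    [ HEAP_COMBOCID         => 0x0020 ],
    [ HEAP_XMAX_EXCL_LOCK   => 0x0040 ],
    [ HEAP_XMAX_LOCK_ONLY   => 0x0080 ],
    [ HEAP_XMIN_COMMITTED   => 0x0100 ],
    [ HEAP_XMIN_INVALID     => 0x0200 ],
    [ HEAP_XMAX_COMMITTED   => 0x0400 ],
    [ HEAP_XMAX_INVALID     => 0x0800 ],
    [ HEAP_XMAX_IS_MULTI    => 0x1000 ],
    [ HEAP_UPDATED          => 0x2000 ],
    [ HEAP_MOVED_OFF        => 0x4000 ],
    [ HEAP_MOVED_IN         => 0x8000 ],
);

# t_infomask2: low 11 bits are the attribute count, high bits are flags.
my @INFOMASK2 = (
    [ HEAP_KEYS_UPDATED => 0x2000 ],
    [ HEAP_HOT_UPDATED  => 0x4000 ],
    [ HEAP_ONLY_TUPLE   => 0x8000 ],
);
my $natts = $t{t_infomask2} & 0x07FF;

# Collapse each mask test to 0 or 1, matching the dump's presentation.
my %flag;
$flag{ $_->[0] } = ($t{t_infomask}  & $_->[1]) ? 1 : 0 for @INFOMASK;
$flag{ $_->[0] } = ($t{t_infomask2} & $_->[1]) ? 1 : 0 for @INFOMASK2;

printf "%30s: %d\n", $_, $t{$_}
    for qw(t_xmin t_xmax t_field3 bi_hi bi_lo ip_posid t_infomask2);
printf "%30s: %d\n", 'Natts', $natts;
printf "%30s: %d\n", $_->[0], $flag{ $_->[0] } for @INFOMASK2;
printf "%30s: %d\n", 't_infomask', $t{t_infomask};
printf "%30s: %d\n", $_->[0], $flag{ $_->[0] } for @INFOMASK;
printf "%30s: %d\n", 't_hoff', $t{t_hoff};
```

Under these assumptions the decoded values agree with the dump: `t_infomask` 2819 is 0x0b03, which sets `HEAP_HASNULL`, `HEAP_HASVARWIDTH`, `HEAP_XMIN_COMMITTED`, `HEAP_XMIN_INVALID`, and `HEAP_XMAX_INVALID`, and leaves the other bits clear.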



Is there any interest in this stuff, and if so, where should it live?  I'm happy to
reorganize this a bit if there is general interest in such a submission.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





