Home > mailing lists

Re: pg_rawdump - Mailing list pgsql-hackers

From	Stephen R. van den Berg
Subject	Re: pg_rawdump
Date	October 22, 2010 00:59:31
Msg-id	20101022005913.GC27839@cuci.nl Whole thread Raw
In response to	Re: pg_rawdump (Greg Stark <gsstark@mit.edu>)
List	pgsql-hackers

Tree view

Greg Stark wrote:
>On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> For the recovery information I'd like to reserve:
>> identifier: 00: table OID
>> ? ? ? ? ? ?01: table layout

>So here's a proposal for something that could maybe be implemented. I
>think I'm leaning against this currently as there are just too many
>caveats even for this limited functionality. I'm more inclined to go
>with the idea someone else proposed of dumping an audit log of all DDL
>on the table or after any DDL dumping the create table statements as
>pg_dump would generate them would to a separate fork.

Yes, but that has two distinct downsides I'd like to avoid:
- It can grow uncontrollably in the case of someone using alter table on a (too) regular basis.
- It separates the data from the tablefile it pertains to (which could complicate recovery (a lot)).

>In this meta data object put:

>table oid
>table name
>number of columns
>array of typlen for those columns (-1 for varlena and size for fixed length)

[...]

>Including the type brings in a new set of complications. Even if you
>assume the built-in typoids never change there are going to be typoids
>that correspond to user defined types. Without the names of those
>types the oids are pretty useless.

>Just the typoids would put you over 8k in the worst case and the names
>would put you into the realm of needing arbitrary numbers of blocks
>for even average cases. Simiarly including the column names would
>require potentially many blocks.

All valid points/concerns.
But, let's approach this from the side of the forensics analist instead,
and see what information typically really would be needed (for argument's
sake, let's call the unfortunate sod that lost the catalog to his database
"the customer"):

The customer usually still has access to the developer, or some developer
documentation which documents which columns are used for what.  It would
most likely document most columns (especially the column names, to a
lesser extent, the column types), but might be lacking some of the
more recent changes which (unfortunately) were done on the live
database using alter table, and hadn't been documented properly yet
(I'm talking about typical real world cases I've encountered).

Which means that analist would primarily be concerned with getting back
the information of the column types and the column names.  The next
step would be to know about slight deviations from the documented
columns (dropped or added columns since the last revision of the docs).

Next are custom types.  Custom types are likely to be better documented,
hence the column name would usually be enough to recover the definition
of a custom type from the docs.

Assuming the above assumptions to be true in the common case, I would
propose to implement something along the following lines:

Entries for the special area in tables:

0007 00 tableoid        To be sprinkled in every            megabyte or so.

nnnn 01 00 cc tablename...    cc = number of columns

nnnn 01 01 aaaa bbbb cccc ...   aaaa/bbbb/cccc are column widths            FFFF for varlena widths
nnnn 01 02 aaaa bbbb cccc ...   aaaa/bbbb/cccc are typeoids per            column (is 16-bits wide enough
forthe common cases?  If not            disregard my suggestion and make            these 32-bit wide each)

nnnn 01 03 nn col1name nn col2name ...            nn = length of the column name            that follows
0000                End of filled special area, no            need to parse beyond here in the            current
page.

Whereas the 01 special area types could be present in any number
of pages.
If they are present, they shall be present starting at the
first page of the file, and possibly will be present in the
next following page(s), until a page is encountered without
them.

Multiple occurrences of 01 01, 01 02, or 01 03 shall be concatenated
to form the complete informationset.
-- 
Stephen.

pgsql-hackers by date:

From: Bruce Momjian
Date: 22 October 2010, 00:58:51
Subject: Re: Simplifying replication

From: "Stephen R. van den Berg"
Date: 22 October 2010, 01:03:48
Subject: Re: pg_rawdump

Re: pg_rawdump - Mailing list pgsql-hackers

Previous

Next