Re: pg_rawdump - Mailing list pgsql-hackers
From: Stephen R. van den Berg
Subject: Re: pg_rawdump
Msg-id: 20101022005913.GC27839@cuci.nl
In response to: Re: pg_rawdump (Greg Stark <gsstark@mit.edu>)
List: pgsql-hackers
Greg Stark wrote:
>On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> For the recovery information I'd like to reserve:
>> identifier: 00: table OID
>>             01: table layout

>So here's a proposal for something that could maybe be implemented. I
>think I'm leaning against this currently as there are just too many
>caveats even for this limited functionality. I'm more inclined to go
>with the idea someone else proposed of dumping an audit log of all DDL
>on the table or, after any DDL, dumping the create table statements as
>pg_dump would generate them to a separate fork.

Yes, but that has two distinct downsides I'd like to avoid:

- It can grow uncontrollably when someone uses ALTER TABLE on a (too)
  regular basis.
- It separates the metadata from the table file it pertains to (which
  could complicate recovery (a lot)).

>In this meta data object put:
>table oid
>table name
>number of columns
>array of typlen for those columns (-1 for varlena and size for fixed length)
[...]
>Including the type brings in a new set of complications. Even if you
>assume the built-in typoids never change, there are going to be typoids
>that correspond to user-defined types. Without the names of those
>types the oids are pretty useless.

>Just the typoids would put you over 8k in the worst case, and the names
>would put you into the realm of needing arbitrary numbers of blocks for
>even average cases. Similarly, including the column names would require
>potentially many blocks.

All valid points/concerns.

But let's approach this from the side of the forensics analyst instead,
and see what information would typically really be needed (for
argument's sake, let's call the unfortunate sod that lost the catalog to
his database "the customer"):

The customer usually still has access to the developer, or to some
developer documentation which records which columns are used for what.
That documentation most likely covers most columns (especially the
column names; to a lesser extent, the column types), but might be
missing some of the more recent changes which (unfortunately) were made
on the live database using ALTER TABLE and hadn't been documented
properly yet (I'm talking about typical real-world cases I've
encountered).

Which means that the analyst would primarily be concerned with getting
back the column types and the column names. The next step would be to
know about slight deviations from the documented columns (columns
dropped or added since the last revision of the docs). Next are custom
types. Custom types are likely to be better documented, hence the column
name would usually be enough to recover the definition of a custom type
from the docs.

Assuming the above to be true in the common case, I would propose to
implement something along the following lines.

Entries for the special area in tables:

0007 00 tableoid               To be sprinkled in every megabyte or so.
nnnn 01 00 cc tablename...     cc = number of columns
nnnn 01 01 aaaa bbbb cccc ...  aaaa/bbbb/cccc are column widths;
                               FFFF for varlena widths
nnnn 01 02 aaaa bbbb cccc ...  aaaa/bbbb/cccc are type oids per column
                               (is 16 bits wide enough for the common
                               cases? If not, disregard my suggestion
                               and make these 32 bits wide each)
nnnn 01 03 nn col1name nn col2name ...
                               nn = length of the column name that
                               follows
0000                           End of filled special area; no need to
                               parse beyond here in the current page.
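To make this concrete, here's a rough sketch (in C) of how a recovery
tool could walk the filled special area of a single page under this
proposal. It relies on details the proposal doesn't pin down: I'm
assuming nnnn is a 16-bit total entry length that includes the length
field itself, and that integers are stored in native byte order; all
identifiers are made up for illustration.

/*
 * Rough sketch only.  Assumes: nnnn is a 16-bit total entry length that
 * includes the length field itself, integers are in native byte order,
 * and p/avail describe the filled special area of one page.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void
walk_special_area(const uint8_t *p, size_t avail)
{
    while (avail >= 2)
    {
        uint16_t len;

        memcpy(&len, p, 2);
        if (len == 0)                   /* 0000: end of filled area */
            break;
        if (len < 3 || len > avail)     /* corrupt entry: give up   */
            break;

        if (p[2] == 0x00 && len == 7)   /* 00: table oid (2+1+4)    */
        {
            uint32_t tableoid;

            memcpy(&tableoid, p + 3, 4);
            printf("table oid %u\n", tableoid);
        }
        else if (p[2] == 0x01 && len >= 4)  /* 01 xx: layout chunk  */
        {
            /* p[3] is 00 (colcount + tablename), 01 (column widths),
             * 02 (type oids) or 03 (column names); payload follows. */
            printf("layout chunk %02x, %u payload bytes\n",
                   p[3], (unsigned) (len - 4));
        }
        /* unknown entry types are simply skipped by their length */

        p += len;
        avail -= len;
    }
}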
The 01 special-area types could be present in any number of pages. If
they are present, they shall be present starting at the first page of
the file, and possibly in the next following page(s), until a page is
encountered without them. Multiple occurrences of 01 01, 01 02, or
01 03 shall be concatenated to form the complete information set.
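A recovery tool would then collect those chunks page by page, roughly
as below. Again just a sketch: read_page() and special_area_of() are
made-up helpers (in a real page the special area would be located via
the page header's pd_special offset), the caller must zero-initialize
the chunks, and error handling is mostly omitted.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

/* Hypothetical: read page blkno into buf; return false at EOF. */
extern bool read_page(FILE *fp, unsigned blkno, uint8_t *buf);
/* Hypothetical: locate the filled special area within a page. */
extern const uint8_t *special_area_of(const uint8_t *page, size_t *avail);

/* Growable buffer, one per 01-xx subtype. */
typedef struct { uint8_t *data; size_t len; } Chunk;

static void
chunk_append(Chunk *c, const uint8_t *src, size_t n)
{
    c->data = realloc(c->data, c->len + n);
    memcpy(c->data + c->len, src, n);
    c->len += n;
}

/* Fill chunks[0..3] with the concatenated 01 00 .. 01 03 payloads. */
static void
collect_layout(FILE *fp, Chunk chunks[4])
{
    uint8_t page[BLCKSZ];

    for (unsigned blkno = 0; read_page(fp, blkno, page); blkno++)
    {
        size_t          avail;
        const uint8_t  *p = special_area_of(page, &avail);
        bool            seen_01 = false;

        while (avail >= 2)
        {
            uint16_t len;

            memcpy(&len, p, 2);
            if (len == 0)               /* 0000: end of filled area */
                break;
            if (len < 3 || len > avail) /* corrupt entry: give up   */
                break;
            if (p[2] == 0x01 && len >= 4 && p[3] <= 0x03)
            {
                chunk_append(&chunks[p[3]], p + 4, len - 4);
                seen_01 = true;
            }
            p += len;
            avail -= len;
        }

        /* 01 entries run from the first page through consecutive
         * pages; stop at the first page that carries none of them. */
        if (!seen_01)
            break;
    }
}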
-- 
Stephen.