Thread: pg_rawdump
I spent yesterday writing a new tool, pg_rawdump (which will be released as open source in due course), which takes the table files of an arbitrary PostgreSQL database and is able to transform those back into tables (including TOAST values). In the course of doing this (a customer needed it because he only had a copy of those files; the pg_clog etc. dirs were lost), I noticed that there are two problems which complicate recovery (needlessly):

a. It takes some guesswork to find out which file corresponds with which table.
b. In order to recover the table content, it is (obviously) necessary to correlate with the original table definition, which obviously is lost with the rest of the information.

In order to simplify recovery at this point (enormously), it would have been very helpful (at almost negligible cost) to have the name of the table, the name of the columns, and the types of the columns available.

Why don't we insert that data into the first page of a regular table file, in the special data area?

I'd be willing to create a patch for that (should be pretty easy), if nobody considers it to be a bad idea.
--
Stephen.
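For context on the "special data area" mentioned above: every page already records the offset of its special space in the page header, and on plain heap pages that space is currently empty. A rough standalone sketch of how a recovery tool in the spirit of pg_rawdump could inspect it; the struct mirrors PageHeaderData from src/include/storage/bufpage.h, assumes the default 8k block size, must match the server version being recovered, and is illustrative only:

    #include <stdint.h>
    #include <stdio.h>

    #define BLCKSZ 8192                 /* default PostgreSQL block size */

    /* Mirrors the fixed part of PageHeaderData (storage/bufpage.h);
     * field order must match the server version being recovered. */
    typedef struct PageHeader {
        uint64_t pd_lsn;                /* LSN of last change to the page */
        uint16_t pd_checksum;           /* pd_tli on older servers */
        uint16_t pd_flags;
        uint16_t pd_lower;              /* offset to start of free space */
        uint16_t pd_upper;              /* offset to end of free space */
        uint16_t pd_special;            /* offset to start of special space */
        uint16_t pd_pagesize_version;
        uint32_t pd_prune_xid;
    } PageHeader;

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <table-file>\n", argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1], "rb");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }

        union { PageHeader hdr; unsigned char bytes[BLCKSZ]; } page;
        if (fread(page.bytes, 1, BLCKSZ, f) != BLCKSZ) {
            fprintf(stderr, "short read on first page\n");
            fclose(f);
            return 1;
        }
        fclose(f);

        /* On today's heap pages pd_special normally equals BLCKSZ (empty
         * special space); the proposal would carve the metadata out of it. */
        printf("pd_lower=%d pd_upper=%d pd_special=%d (special space: %d bytes)\n",
               page.hdr.pd_lower, page.hdr.pd_upper, page.hdr.pd_special,
               BLCKSZ - page.hdr.pd_special);
        return 0;
    }

On current heap pages the reported special space will be 0 bytes; the proposal in this thread is about reserving part of it.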
On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> In order to simplify recovery at this point (enormously), it would
> have been very helpful (at almost negligible cost), to have the name
> of the table, the name of the columns, and the types of the
> columns available.
>
> Why don't we insert that data into the first page of a regular table
> file, in the special data area?
>
> I'd be willing to create a patch for that (should be pretty easy),
> if nobody considers it to be a bad idea.

There isn't necessarily one value for these attributes. You can rename columns and that rename may succeed and commit or fail and rollback. You can drop or add columns and some rows will have or not have the added columns at all. You could even add a column, insert some rows, then abort -- all in a transaction. So some (aborted) rows will have extra columns that aren't even present in the current table definition.

All this isn't to say the idea you're presenting is impossible or a bad idea. If this meta information was only a hint for forensic purposes and you take into account these caveats it might still be useful. But I'm not sure how useful. I mean, you can't really decipher everything properly without the data in the catalog -- and you have to premise this on the idea that you've lost everything in the catalog but not the data in other tables. Which seems like a narrow use case.

--
greg
Greg Stark wrote:
>On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> In order to simplify recovery at this point (enormously), it would
>> have been very helpful (at almost negligible cost), to have the name
>> of the table, the name of the columns, and the types of the
>> columns available.
>> Why don't we insert that data into the first page of a regular table
>> file, in the special data area?
>> I'd be willing to create a patch for that (should be pretty easy),
>> if nobody considers it to be a bad idea.

>There isn't necessarily one value for these attributes. You can
>rename columns and that rename may succeed and commit or fail and
>rollback. You can drop or add columns and some rows will have or not
>have the added columns at all. You could even add a column, insert
>some rows, then abort -- all in a transaction. So some (aborted) rows
>will have extra columns that aren't even present in the current table
>definition.

True.

>All this isn't to say the idea you're presenting is impossible or a
>bad idea. If this meta information was only a hint for forensic
>purposes and you take into account these caveats it might still be
>useful.

This is exactly what I meant it for. It would contain the most recent ALTER TABLE state (give or take some delay due to cache flushes).

>But I'm not sure how useful. I mean, you can't really decipher
>everything properly without the data in the catalog -- and you have to
>premise this on the idea that you've lost everything in the catalog
>but not the data in other tables. Which seems like a narrow use case.

It happens, more often than you'd think. My client had it, I've seen numerous google hits which show the same.
--
Stephen.
Be braver. You cannot cross a chasm in two small jumps.

"Stephen R. van den Berg" <srb@cuci.nl> writes:
> In order to simplify recovery at this point (enormously), it would
> have been very helpful (at almost negligible cost), to have the name
> of the table, the name of the columns, and the types of the
> columns available.
> Why don't we insert that data into the first page of a regular table
> file, in the special data area?

(1) it wouldn't necessarily fit

(2) what are you going to do to maintain it during ALTER TABLE?

(3) if there are any custom types involved, you're still lost.

regards, tom lane
On Tue, Oct 19, 2010 at 6:13 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> Greg Stark wrote:
>
>> But I'm not sure how useful. I mean, you can't really decipher
>> everything properly without the data in the catalog -- and you have to
>> premise this on the idea that you've lost everything in the catalog
>> but not the data in other tables. Which seems like a narrow use case.
>
> It happens, more often than you'd think. My client had it, I've
> seen numerous google hits which show the same.

It happened to us recently when a customer had disk issues. We were able to get the table files back through forensics, but the control files were not in good shape and the cluster wouldn't start. A tool like the one Stephen is proposing would most likely have helped us recover at least some or most of the data, I would hope.

Roberto
Tom Lane wrote:
>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>> In order to simplify recovery at this point (enormously), it would
>> have been very helpful (at almost negligible cost), to have the name
>> of the table, the name of the columns, and the types of the
>> columns available.
>> Why don't we insert that data into the first page of a regular table
>> file, in the special data area?

>(1) it wouldn't necessarily fit

Three viable options (pick any one, depending on other criteria):
a. If it doesn't fit at first, drop the column names and try again.
b. If necessary, extend it into the special data area of the following page (repeat until you have stored everything).
c. Simply put in what fits and discard the rest.

Please note that the information in there is purely informational and of a best-effort nature. It is not required for regular operation. It should have close to no performance impact in the normal use case. It is meant to help with forensics if the catalog is damaged or lost and you still want to attempt to recover most of the data contained in this table file.

>(2) what are you going to do to maintain it during ALTER TABLE?

Simply pick a point in time where it will be blatantly overwritten: either at reception of the command, or at commit time, whatever is more convenient to implement. It's a best-effort service; it is not a problem if we get it wrong sometimes due to wildly overlapping alter table/commit/rollback sequences.

>(3) if there are any custom types involved, you're still lost.

Yes, this is not a complete solution; it's a valid attempt at making forensics a lot easier in the common case at virtually no cost to the running system.
--
Stephen.
Life is that brief interlude between nothingness and eternity.
Roberto Mello wrote:
>On Tue, Oct 19, 2010 at 6:13 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> Greg Stark wrote:
>>> premise this on the idea that you've lost everything in the catalog
>>> but not the data in other tables. Which seems like a narrow use case.
>> It happens, more often than you'd think. My client had it, I've
>> seen numerous google hits which show the same.
>It happened to us recently when a customer had disk issues, and we

It usually happens when there are disk issues; that's exactly what it is for.

>A tool like Stephen is proposing would most likely have helped us
>recover at least some or most of the data, I would hope.

Well, because the customer could recreate (within reason) the original table definitions, we were able to recover all of his data (12 tables, including some toasted/compressed ones). It's just that matching table and file, and subsequently figuring out some missing columns which may have been added/removed later, can be rather time-consuming and could be made a lot easier (not necessarily perfect) if that information had been present in the first page of a file.
--
Stephen.
Life is that brief interlude between nothingness and eternity.

"Stephen R. van den Berg" <srb@cuci.nl> writes:
> It's just that matching table and file, and subsequently figuring out
> some missing columns which may have been added/removed later,
> can be rather time-consuming and could be made a lot easier (not necessarily
> perfect) if that information had been present in the first page of
> a file.

So you've already moved the goalposts from what was claimed in your prior message. If the data is not maintained (with 100% reliability) during ALTER TABLE, how are you going to do something like "figure out missing columns"?

I can see the potential usefulness of a self-documenting table storage format, but this proposal isn't that; it's just an unreliable kluge.

regards, tom lane
Tom Lane wrote:
>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>> It's just that matching table and file, and subsequently figuring out
>> some missing columns which may have been added/removed later,
>> can be rather time-consuming and could be made a lot easier (not necessarily
>> perfect) if that information had been present in the first page of
>> a file.

>So you've already moved the goalposts from what was claimed in your
>prior message. If the data is not maintained (with 100% reliability)
>during ALTER TABLE, how are you going to do something like "figure out
>missing columns"?

Most alter table operations are well thought through and rarely undone (at least not on production databases). This means that most tables can be restored.

>I can see the potential usefulness of a self-documenting table storage
>format, but this proposal isn't that; it's just an unreliable kluge.

Restoring tables/databases from table storage only is, by definition, an unreliable kludge. I'm not opposed to making the definition storage more robust, but, since the records in the table have already lost their relation to the pg_clog records, and therefore it *already* is uncertain which records were deleted and/or have the wrong number of columns, it seems to be a needless waste of time and energy to provide more reliable information about the column structure.

I know for a fact that those who have lost data in such a way, and are faced with the option of having this "unreliable kludgy information" available now, or waiting a few years/months until a reliable solution is present, would (in every single case) opt for the former and get at least some (if not all) of their data back in a shorter amount of time.
--
Stephen.
Life is that brief interlude between nothingness and eternity.
On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Stephen R. van den Berg" <srb@cuci.nl> writes:
>> It's just that matching table and file, and subsequently figuring out
>> some missing columns which may have been added/removed later,
>> can be rather time-consuming and could be made a lot easier (not necessarily
>> perfect) if that information had been present in the first page of
>> a file.
>
> So you've already moved the goalposts from what was claimed in your
> prior message. If the data is not maintained (with 100% reliability)
> during ALTER TABLE, how are you going to do something like "figure out
> missing columns"?
>
> I can see the potential usefulness of a self-documenting table storage
> format, but this proposal isn't that; it's just an unreliable kluge.

If we're looking to have any sort of "out of catalog" documentation of table storage format, what about just having a new relation fork that just "appends" each and every change made to the table formats, including ones rolled back, etc.? Make this relation fork append-only, and dump a completely new set of metadata to it on each and every ALTER TABLE.

This fork would never need to be read by PG, so a relation fork might even be too much. All you really need is a file you can tie to a relation and blindly append "data" to on create/alter statements.

Sure, it will have more information than *needed*, but I can't see it ever growing too big, and people doing forensics rarely complain about having *too much* information available.

--
Aidan Van Dyk    Create like a god,
aidan@highrise.ca    command like a king,
http://www.highrise.ca/    work like a slave.
Aidan Van Dyk <aidan@highrise.ca> writes:
> On Wed, Oct 20, 2010 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I can see the potential usefulness of a self-documenting table storage
>> format, but this proposal isn't that; it's just an unreliable kluge.

> If we're looking to have any sort of "out of catalog" documentation of
> table storage format, what about just having a new relation fork that
> just "appends" each and every change made to the table formats,
> including ones rolled back, etc.

I thought of that too, but I'm not sure if it's going to help enough. The trouble is that the information is only tied to the table itself via file names. In a fsck-recovery scenario, you may not have the correct file names. (One of the multiple problems with Stephen's proposal is that the metadata would only be physically tied to the first segment of a large table, and unless you can trust the file names you won't be able to match it to the other segments.)

[ thinks for a bit... ] Perhaps we could stick some sort of unique ID into tables, which could be correlated to the same unique ID appearing in a metadata fork.

[ thinks some more... ] Of course, at this point you have to start asking why the metadata fork should be thought to be any more reliable than the system catalogs.

> Make this relation fork append only,
> and dump a completely new set of metadata to it each and every ALTER
> TABLE.

You can bet that there'll be somebody out there who whines because their app does lots and lots of repeated ALTER TABLEs, and so the metadata fork grows forever. I think we'd have to be a bit smarter than this.

regards, tom lane
On Wed, Oct 20, 2010 at 1:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ thinks for a bit... ] Perhaps we could stick some sort of unique
> ID into tables, which could be correlated to the same unique ID
> appearing in a metadata fork.

It would be awfully nice if the table name was in the first few bytes of the segments ;-)

> [ thinks some more... ] Of course, at this point you have to start
> asking why the metadata fork should be thought to be any more reliable
> than the system catalogs.

It's not. But it's about data duplication. If you've lost some files, others may be found in lost+found, etc. Or maybe you lost only the "main" tablespace, and you have your big-honking-files from another tablespace around, etc. The more copies of the data around, the more chance you have of being able to get something usable from it.

But we already have WAL archive as an external safe backup copy of *everything*. Maybe the cost of those extra forks/duplicates is too much. Maybe it would be cheaper to try and parse the WAL archive, and just specifically rebuild the system catalogs for the couple of times people actually need this type of data? Or maybe a query/function that ultra-paranoid types can run daily, etc., which would show the system catalog information about table storage format that they could save safely, instead of munging GB of WAL logs when they want it...

> You can bet that there'll be somebody out there who whines because their
> app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
> grows forever. I think we'd have to be a bit smarter than this.

Well, for bad habits, we have an equally bad solution ;-)

    BEGIN;
    ALTER TABLE bad_design RENAME TO bad_design_too_old;
    CREATE TABLE bad_design (LIKE bad_design_too_old
        INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);
    INSERT INTO bad_design SELECT * FROM bad_design_too_old;
    DROP TABLE bad_design_too_old;
    COMMIT;

It's all about how much extra stuff we want "around" to make forensic/reconstruction-type work easier when they can't go to the documented, tried, tested, "normal restore from backup/WAL". None? Or as much as possible? And what are the tradeoffs?

--
Aidan Van Dyk    Create like a god,
aidan@highrise.ca    command like a king,
http://www.highrise.ca/    work like a slave.
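The query/function idea above needs no server changes at all. A minimal libpq sketch of such a catalog-snapshot client (illustrative only, not an existing tool; it keys off pg_class, pg_attribute and pg_type and takes its connection settings from the usual PG* environment variables):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Print table OID, schema, table, column, type name, typlen and NOT NULL
     * for every ordinary user table, so the output can be stored off-site. */
    int main(void)
    {
        PGconn *conn = PQconnectdb("");     /* PG* environment variables */
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        const char *sql =
            "SELECT c.oid, n.nspname, c.relname, a.attnum, a.attname, "
            "       t.typname, a.attlen, a.attnotnull "
            "FROM pg_class c "
            "JOIN pg_namespace n ON n.oid = c.relnamespace "
            "JOIN pg_attribute a ON a.attrelid = c.oid "
            "JOIN pg_type t ON t.oid = a.atttypid "
            "WHERE c.relkind = 'r' AND a.attnum > 0 AND NOT a.attisdropped "
            "ORDER BY c.oid, a.attnum";

        PGresult *res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_TUPLES_OK) {
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
            PQclear(res);
            PQfinish(conn);
            return 1;
        }

        for (int i = 0; i < PQntuples(res); i++)
            for (int j = 0; j < PQnfields(res); j++)
                printf("%s%s", PQgetvalue(res, i, j),
                       j == PQnfields(res) - 1 ? "\n" : "\t");

        PQclear(res);
        PQfinish(conn);
        return 0;
    }

Run from cron, the output can be kept away from the cluster and consulted if the catalogs are ever lost.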
Tom Lane wrote:
>Aidan Van Dyk <aidan@highrise.ca> writes:
>> If we're looking to have any sort of "out of catalog" documentation of
>> table storage format, what about just having a new relation fork that
>> just "appends" each and every change made to the table formats,
>> including ones rolled back, etc.

I presume that means that all table changes are appended to a single central file per database? That would be a bad idea, because in the typical problem scenario, losing this new catalogue basically creates the same problem. It would be preferable to keep the information tied in with the actual table (file) it concerns.

>[ thinks for a bit... ] Perhaps we could stick some sort of unique
>ID into tables, which could be correlated to the same unique ID
>appearing in a metadata fork.

Ideal would be: put the table OID inside the header of each page (either in the official header, or in the special area). This way even lost blocks can be correlated to the same table. I'd still vote for the latest known table definition in the first page. It's by no means perfect, but it will help 99% of all salvage attempts by an order of magnitude.

>[ thinks some more... ] Of course, at this point you have to start
>asking why the metadata fork should be thought to be any more reliable
>than the system catalogs.

Quite. Which is why I wanted the best-effort latest version of the table description in the first page of the table file instead.

>> Make this relation fork append only,
>> and dump a completely new set of metadata to it each and every ALTER
>> TABLE.

>You can bet that there'll be somebody out there who whines because their
>app does lots and lots of repeated ALTER TABLEs, and so the metadata fork
>grows forever. I think we'd have to be a bit smarter than this.

Which means we come full circle and have to conclude that doing anything comprehensive is too invasive for normal operations; best-effort is all a forensics operation wants or can hope for.
--
Stephen.
Life is that brief interlude between nothingness and eternity.
On Wed, Oct 20, 2010 at 5:30 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> Ideal would be: put the table OID inside the header of each page
> (either in the official header, or in the special area).
> This way even lost blocks can be correlated to the same table.
> I'd still vote for the latest known table definition in the first
> page. It's by no means perfect, but it will help 99% of all
> salvage attempts by an order of magnitude.

I don't think we should shrink the amount of usable space by 4 bytes per block and break on-disk compatibility just to make it easier to reconstruct corrupted tables. I'm pretty dubious about the proposal to stuff an otherwise-useless metapage in every heap, too. If you have many small tables, you just doubled your disk utilization - worse than that, maybe, if some of them are empty. If we needed a metapage anyway and had extra space to play with, stuffing some useful forensic information in there might be worthwhile, but I have a hard time thinking that forensics alone is a sufficient justification for such a change.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
>On Wed, Oct 20, 2010 at 5:30 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> Ideal would be: put the table OID inside the header of each page
>> (either in the official header, or in the special area).
>> This way even lost blocks can be correlated to the same table.
>> I'd still vote for the latest known table definition in the first
>> page. It's by no means perfect, but it will help 99% of all
>> salvage attempts by an order of magnitude.

> and break on-disk compatibility just to make it easier to

If it's inserted in the "special" area, it will not break any compatibility.

>I don't think we should shrink the amount of usable space by 4 bytes
>per block

Instead of putting it in every page, it could be inserted (say) once every megabyte (if done in the special area) to avoid excessive overhead.

>I'm pretty dubious about the proposal to stuff an otherwise-useless
>metapage in every heap, too.

The information is supposed to go in the special area, so it will not be an extra page.

> If you have many small tables, you just
>doubled your disk utilization

For small tables, the table description typically is small as well, so in the common case it all will still fit in one page.

>- worse than that, maybe, if some of
>them are empty.

An empty table does not contain any critical information which needs to be restored (by definition :-); so the code that inserts the table definition in the special area could easily be instructed *not* to write out this information unless the table actually has entries.

> If we needed a metapage anyway and had extra space to
>play with, stuffing some useful forensic information in there might be
>worthwhile, but I have a hard time thinking that forensics alone is a
>sufficient justification for such a change.

The "change" could easily be made backward compatible with all on-disk formats which support the special area pointer (I'm not sure how far back that is).
--
Stephen.
For children with short attention spans: boomerangs that don't come back.

"Stephen R. van den Berg" <srb@cuci.nl> writes:
> Robert Haas wrote:
>> and break on-disk compatibility just to make it easier to
> If it's inserted in the "special" area, it will not break any
> compatibility.

I'll tell you what I really don't like about this proposal: we discuss some scheme or other for taking over the "special space" in heap pages at least once a year. None of them have been particularly compelling so far, but one may come along that is; especially given that we're now trying to maintain on-disk compatibility across versions. So I think the opportunity cost of assigning a use to that space is mighty high. I don't find this idea important enough to justify foreclosing future uses for the special space.

The real bottom line is this: if you care enough about your data to be willing to expend a large amount of effort on manual recovery attempts, why didn't you have a decent backup scheme in place? There are way too many scenarios where you'll have no hope of doing any such manual recovery anyway.

regards, tom lane
Tom Lane wrote:
>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>> If it's inserted in the "special" area, it will not break any
>> compatibility.

>I'll tell you what I really don't like about this proposal: we discuss
>some scheme or other for taking over the "special space" in heap pages
>at least once a year. None of them have been particularly compelling
>so far, but one may come along that is; especially given that we're now
>trying to maintain on-disk compatibility across versions. So I think
>the opportunity cost of assigning a use to that space is mighty high.
>I don't find this idea important enough to justify foreclosing future
>uses for the special space.

Well, I had (of course) thought of that, and the classical solution to this is to specify a certain attribute-based format in order not to cannibalise the space and block it for other future uses. I.e. in the special area, we could start using something like: a 2-byte field length (including the length field), a 1-byte identifier, then the field content.

For the recovery information I'd like to reserve:
identifier: 00: table OID
            01: table layout

>The real bottom line is this: if you care enough about your data to
>be willing to expend a large amount of effort on manual recovery
>attempts, why didn't you have a decent backup scheme in place?

Two obvious valid answers would be: stupidity and/or ignorance, sometimes a strain of bad luck. I know it is a sad state of affairs, but not all users of PostgreSQL are equally knowledgeable/intelligent/responsible.

>There are way too many scenarios where you'll have no hope of doing
>any such manual recovery anyway.

True. It's all a matter of statistics. Judging by the number of reports I find by googling net-history, I'd have to conclude that the proposed extra information would have helped more than half of them.
--
Stephen.
On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> For the recovery information I'd like to reserve:
> identifier: 00: table OID
>             01: table layout

So here's a proposal for something that could maybe be implemented. I think I'm leaning against this currently, as there are just too many caveats even for this limited functionality. I'm more inclined to go with the idea someone else proposed of dumping an audit log of all DDL on the table or, after any DDL, dumping the CREATE TABLE statements as pg_dump would generate them to a separate fork. But here goes, just to give you an idea of what I think is doable and how limited it would be.

In this metadata object put:
table oid
table name
number of columns
array of typlen for those columns (-1 for varlena and size for fixed length)

That would have a maximum size of just over 6k, which is too large for a 4k-BLCKSZ database but fits on default 8k databases. It would have enough information to be able to find all the table columns, but not to understand how to interpret the contents, nor what their names or types are.

Including the type brings in a new set of complications. Even if you assume the built-in type OIDs never change, there are going to be type OIDs that correspond to user-defined types. Without the names of those types the OIDs are pretty useless. Just the type OIDs would put you over 8k in the worst case, and the names would put you into the realm of needing arbitrary numbers of blocks for even average cases. Similarly, including the column names would require potentially many blocks.

--
greg
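For a rough feel of that 6k estimate: assuming NAMEDATALEN of 64 and the 1600-attribute limit (MaxHeapAttributeNumber), a worst-case fixed layout of the object described above works out as follows (hypothetical struct, purely to show the arithmetic):

    #include <stdint.h>
    #include <stdio.h>

    #define NAMEDATALEN 64              /* PostgreSQL identifier length */
    #define MAX_COLUMNS 1600            /* MaxHeapAttributeNumber */

    /* Hypothetical worst-case metadata record along the lines described above. */
    typedef struct TableMeta {
        uint32_t table_oid;
        char     table_name[NAMEDATALEN];
        uint32_t ncolumns;
        int32_t  typlen[MAX_COLUMNS];   /* -1 for varlena, else fixed width */
    } TableMeta;

    int main(void)
    {
        /* 4 + 64 + 4 + 1600 * 4 = 6472 bytes: fits an 8k page, not a 4k one. */
        printf("worst-case size: %zu bytes\n", sizeof(TableMeta));
        return 0;
    }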
"Stephen R. van den Berg" <srb@cuci.nl> writes: > Tom Lane wrote: >> There are way too many scenarios where you'll have no hope of doing >> any such manual recovery anyway. > True. It's all a matter of statistics. Judging by the number of reports > I find by googling net-history, I'd have to conclude that the proposed > extra information would have helped more than half of them. Uh, no, it would have helped whatever minuscule fraction of them had the tools and the expertise to make use of the information. This should not be confused with a magic fix-it button. If you've lost your system catalogs you're facing a whole lot of work that will at best recover an untrustworthy version of some of your data. Most of the data-loss reports I've seen appeared to come from people who wouldn't be capable of doing such recovery work even if better tools were available. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Most of the data-loss reports I've seen appeared to come from
> people who wouldn't be capable of doing such recovery work even if
> better tools were available.

No doubt; but the recovery work often winds up in the hands of people with more skills than those responsible for the loss. Whoever comes in to help with recovery is looking for every bit of traction they can get. You'd be amazed at some of the odd straws people can grasp to help recover data.

I don't know how big the special area is, but if enough room could be carved out to include even the relation ID or filename, it might be a big help to someone. I'm pretty skeptical about including information about attributes, though.

-Kevin
On Thu, Oct 21, 2010 at 5:21 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Most of the data-loss reports I've seen appeared to come from
>> people who wouldn't be capable of doing such recovery work even if
>> better tools were available.
>
> No doubt; but the recovery work often winds up in the hands of
> people with more skills than those responsible for the loss.
> Whoever comes in to help with recovery is looking for every bit of
> traction they can get. You'd be amazed at some of the odd straws
> people can grasp to help recover data.
>
> I don't know how big the special area is, but if enough room could
> be carved out to include even the relation ID or filename, it might
> be a big help to someone. I'm pretty skeptical about including
> information about attributes, though.

Unfortunately, the use case for the relation ID or filename is much thinner than the case for the column definitions. You're less likely to lose that information, and if you do lose it you can probably guess by file size or by running strings on the data files. The really hard thing is to remember just exactly what columns you had in there, in what order... and oh yeah, there was that column we dropped. But like you, I'm pretty dubious about trying to store all that stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner wrote:
>Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Most of the data-loss reports I've seen appeared to come from
>> people who wouldn't be capable of doing such recovery work even if
>> better tools were available.

>No doubt; but the recovery work often winds up in the hands of
>people with more skills than those responsible for the loss.
>Whoever comes in to help with recovery is looking for every bit of
>traction they can get. You'd be amazed at some of the odd straws
>people can grasp to help recover data.

And, obviously, the amount of time spent (by the professional) to recover the data goes down a lot when more information is available; which reduces cost and shortens the downtime for the sad sod that lost the data in the first place.
--
Stephen.
Greg Stark wrote:
> On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> > In order to simplify recovery at this point (enormously), it would
> > have been very helpful (at almost negligible cost), to have the name
> > of the table, the name of the columns, and the types of the
> > columns available.
> >
> > Why don't we insert that data into the first page of a regular table
> > file, in the special data area?
> >
> > I'd be willing to create a patch for that (should be pretty easy),
> > if nobody considers it to be a bad idea.
>
> There isn't necessarily one value for these attributes. You can
> rename columns and that rename may succeed and commit or fail and
> rollback. You can drop or add columns and some rows will have or not
> have the added columns at all. You could even add a column, insert
> some rows, then abort -- all in a transaction. So some (aborted) rows
> will have extra columns that aren't even present in the current table
> definition.
>
> All this isn't to say the idea you're presenting is impossible or a
> bad idea. If this meta information was only a hint for forensic
> purposes and you take into account these caveats it might still be
> useful. But I'm not sure how useful. I mean, you can't really decipher
> everything properly without the data in the catalog -- and you have to
> premise this on the idea that you've lost everything in the catalog
> but not the data in other tables. Which seems like a narrow use case.

I was thinking we could dump a flat file every 15 minutes into each database directory that had recovery-useful information. It wouldn't be perfect, but would probably be sufficient for most forensics.

--
Bruce Momjian <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ It's impossible for everything to be true. +
Greg Stark wrote:
>On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> For the recovery information I'd like to reserve:
>> identifier: 00: table OID
>>             01: table layout

>So here's a proposal for something that could maybe be implemented. I
>think I'm leaning against this currently as there are just too many
>caveats even for this limited functionality. I'm more inclined to go
>with the idea someone else proposed of dumping an audit log of all DDL
>on the table or, after any DDL, dumping the CREATE TABLE statements as
>pg_dump would generate them to a separate fork.

Yes, but that has two distinct downsides I'd like to avoid:
- It can grow uncontrollably if someone uses ALTER TABLE on a (too) regular basis.
- It separates the data from the table file it pertains to (which could complicate recovery (a lot)).

>In this metadata object put:
>table oid
>table name
>number of columns
>array of typlen for those columns (-1 for varlena and size for fixed length)
[...]
>Including the type brings in a new set of complications. Even if you
>assume the built-in type OIDs never change, there are going to be type OIDs
>that correspond to user-defined types. Without the names of those
>types the OIDs are pretty useless.
>Just the type OIDs would put you over 8k in the worst case, and the names
>would put you into the realm of needing arbitrary numbers of blocks
>for even average cases. Similarly, including the column names would
>require potentially many blocks.

All valid points/concerns. But let's approach this from the side of the forensic analyst instead, and see what information typically really would be needed (for argument's sake, let's call the unfortunate sod that lost the catalog to his database "the customer"):

The customer usually still has access to the developer, or to some developer documentation which documents which columns are used for what. It would most likely document most columns (especially the column names and, to a lesser extent, the column types), but might be lacking some of the more recent changes which (unfortunately) were done on the live database using ALTER TABLE and hadn't been documented properly yet (I'm talking about typical real-world cases I've encountered).

Which means that the analyst would primarily be concerned with getting back the information of the column types and the column names. The next step would be to know about slight deviations from the documented columns (dropped or added columns since the last revision of the docs). Next are custom types. Custom types are likely to be better documented, hence the column name would usually be enough to recover the definition of a custom type from the docs.

Assuming the above assumptions to be true in the common case, I would propose to implement something along the following lines.

Entries for the special area in tables:

0007 00 tableoid
    To be sprinkled in every megabyte or so.
nnnn 01 00 cc tablename...
    cc = number of columns
nnnn 01 01 aaaa bbbb cccc ...
    aaaa/bbbb/cccc are column widths, FFFF for varlena widths
nnnn 01 02 aaaa bbbb cccc ...
    aaaa/bbbb/cccc are type OIDs per column (is 16 bits wide enough for the
    common cases? If not, disregard my suggestion and make these 32 bits wide each)
nnnn 01 03 nn col1name nn col2name ...
    nn = length of the column name that follows
0000
    End of filled special area; no need to parse beyond here in the current page.

The 01 special area types could be present in any number of pages. If they are present, they shall be present starting at the first page of the file, and possibly will be present in the following page(s), until a page is encountered without them. Multiple occurrences of 01 01, 01 02, or 01 03 shall be concatenated to form the complete information set.
--
Stephen.
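A rough parsing sketch for the entries proposed above; the function and the assumption that the entries start at the beginning of the special area are illustrative only -- nothing like this exists in PostgreSQL today:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Walk the proposed recovery entries in a page's special area:
     * 2-byte total length (including the length field itself), 1-byte
     * identifier, then the field content; a 0000 length ends the list. */
    static void scan_recovery_entries(const unsigned char *special, size_t avail)
    {
        size_t off = 0;
        while (off + 3 <= avail) {
            uint16_t len;
            memcpy(&len, special + off, 2);
            if (len < 3 || off + len > avail)
                break;                          /* end marker or corrupt entry */
            uint8_t id = special[off + 2];
            const unsigned char *payload = special + off + 3;
            size_t payload_len = len - 3;
            if (id == 0x00 && payload_len >= 4) {
                uint32_t oid;
                memcpy(&oid, payload, 4);       /* 00: table OID */
                printf("table oid: %u\n", oid);
            } else if (id == 0x01 && payload_len >= 1) {
                /* 01: table layout subrecord (00 name, 01 widths, 02 type
                 * OIDs, 03 column names); concatenate across pages. */
                printf("layout subrecord %d, %zu bytes\n",
                       payload[0], payload_len - 1);
            } else {
                printf("unknown entry %d (%zu bytes)\n", id, payload_len);
            }
            off += len;
        }
    }

    int main(void)
    {
        /* Tiny self-test: one table-OID entry followed by the 0000 end marker. */
        unsigned char buf[16];
        uint16_t len = 7;
        uint32_t oid = 16384;
        memset(buf, 0, sizeof(buf));
        memcpy(buf, &len, 2);
        buf[2] = 0x00;                          /* identifier: table OID */
        memcpy(buf + 3, &oid, 4);
        scan_recovery_entries(buf, sizeof(buf));
        return 0;
    }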
Bruce Momjian wrote:
>Greg Stark wrote:
>> On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> > In order to simplify recovery at this point (enormously), it would
>> > have been very helpful (at almost negligible cost), to have the name
>> > of the table, the name of the columns, and the types of the
>> > columns available.
>> > Why don't we insert that data into the first page of a regular table
>> > file, in the special data area?

>I was thinking we could dump a flat file every 15 minutes into each
>database directory that had recovery-useful information. It wouldn't be
>perfect, but would probably be sufficient for most forensics.

It would definitely be better than the current state. But it still disconnects the information from the files it belongs to (a bit). From a cost/benefit ratio point of view, I'd still prefer to interlace the information into the table files (which also scales better in case of numerous tables).
--
Stephen.
In relation to the talk and discussions at FOSDEM regarding helping data recovery, I searched the archives for the old thread after I performed my last recovery; for reference:

http://www.postgresql.org/message-id/20101019201223.GA15684@cuci.nl

I haven't checked yet if the proposed space there is still available in the current disk format, but it might serve as a starter and reminder to get the discussion going once more.
--
Stephen.
On 2/3/15 1:43 AM, Stephen R. van den Berg wrote:
> In relation to the talk and discussions at FOSDEM regarding
> helping data recovery, I searched the archives for the
> old thread after I performed my last recovery; for reference:
>
> http://www.postgresql.org/message-id/20101019201223.GA15684@cuci.nl
>
> I haven't checked yet if the proposed space there is still
> available in the current disk format, but it might serve as
> a starter and reminder to get the discussion going once more.

Rather than trying to wedge this into a heap page, ISTM it'd be better to use a fork. Presumably you'd be storing regular tuples that have the essential data from pg_class, pg_attribute (and maybe pg_type).

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby wrote:
>Rather than trying to wedge this into a heap page, ISTM it'd be
>better to use a fork. Presumably you'd be storing regular tuples
>that have the essential data from pg_class, pg_attribute (and maybe
>pg_type).

Well, three things:
- The information preferably is present in the same file with the pages; ideally up front, but reachable through a pointer within the file is acceptable.
- The information ideally is repeated at the beginning of each segment (separate file).
- The information can be in tuple form, but it is acceptable (or even preferable?) to store it as a cleartext SQL CREATE TABLE statement.

Not quite sure which one I'd prefer at this point. The tuple version might be easier to do machine recovery with. But in any case it needs the database name, schema name, table name, column names, column types, and the NULL constraint per column.

The reason putting it in a separate fork is not ideal, is that when doing recovery you might not have more than just the data files, and thus it is not possible to determine which fork file belongs to which data file (the filenames might be lost, as well as the directory association).
--
Stephen.
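For the tuple-ish variant, a sketch of a per-segment record carrying exactly the fields listed above; the names and fixed-width layout are hypothetical, and a real version would need versioning and variable-length fields rather than fixed NAMEDATALEN strings:

    #include <stdint.h>
    #include <stdio.h>

    #define NAMEDATALEN 64                  /* PostgreSQL identifier length */

    /* One column: name, type (by name, so custom types stay identifiable),
     * and whether it is declared NOT NULL. */
    typedef struct ColumnMeta {
        char     col_name[NAMEDATALEN];
        char     type_name[NAMEDATALEN];
        int32_t  typlen;                    /* -1 for varlena */
        uint8_t  not_null;
    } ColumnMeta;

    /* Recovery header repeated at the beginning of every segment file. */
    typedef struct SegmentMeta {
        uint32_t   table_oid;
        char       database_name[NAMEDATALEN];
        char       schema_name[NAMEDATALEN];
        char       table_name[NAMEDATALEN];
        uint32_t   ncolumns;
        ColumnMeta columns[];               /* ncolumns entries follow */
    } SegmentMeta;

    int main(void)
    {
        printf("fixed part: %zu bytes, per column: %zu bytes\n",
               sizeof(SegmentMeta), sizeof(ColumnMeta));
        return 0;
    }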