Hi.
I need a little help on the format of the postgres tables.
I've got this wonderfully corrupted database where just about everything is
fubar. I've tried a number of things to get it back using postgres and
related tools with no success. It looks like most of the data is there, but
there may be a small amount of corruption that's causing all kinds of
problems.
I've broken down and begun developing a tool to allow examination of the
data within the table files. This could actually be useful for recovering
and undoing changes (at least until the row-reuse code goes into
production).
I've been hacking the file format and trying to find stuff in the source and
docs as much as possible, but here goes...
a) tuples cannot span multiple pages (yet).
b) the data is not platform independent??? I.e., the data from a Sun looks
different from the data from an Intel box?
For every page, I see that the first 2 words hold the offsets of the end of
the tuple pointers and the beginning of the tuple data.
What are the next 2 words used for? In all my cases they appear to be set to
0x2000.
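
A minimal Python sketch of reading those four words, taking the layout described above at face value. The field names (`ptr_end`, `data_start`, `unknown1`, `unknown2`), the 16-bit word size, and the 8192-byte page size (0x2000 is 8192) are assumptions made for illustration, not taken from the postgres source. Byte order is a parameter because of question (b).

```python
import struct

PAGE_SIZE = 8192  # assumed page size; note that 0x2000 == 8192


def read_page_header(page, byte_order="<"):
    """Unpack the first four 16-bit words of a page.

    byte_order is "<" for little-endian (Intel) or ">" for big-endian (Sun).
    """
    ptr_end, data_start, unknown1, unknown2 = struct.unpack_from(
        byte_order + "4H", page, 0
    )
    return {
        "ptr_end": ptr_end,        # end of the tuple pointers
        "data_start": data_start,  # beginning of the tuple data
        "unknown1": unknown1,      # meaning unknown; observed as 0x2000
        "unknown2": unknown2,      # meaning unknown; observed as 0x2000
    }


def header_looks_sane(header, page_size=PAGE_SIZE):
    """Cheap consistency check for spotting damaged pages."""
    return 8 <= header["ptr_end"] <= header["data_start"] <= page_size
```

Reading the same bytes with the wrong byte order will usually fail the sanity check, which gives a rough way to tell which kind of machine wrote a file.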
Following that I find the 2 word tuple pointers.
The first is the transaction id that, if committed, gives this tuple
visibility???
The second word appears to be the offset in the page where the tuple can be
found.
Are these tuple pointers always stored in order of last to first? Or should
I be loading and sorting them according to offset?
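
One way to sidestep the ordering question is to read every pointer and sort by offset explicitly. The Python sketch below assumes each pointer is two 16-bit words, that the pointer array starts right after the four header words (byte 8), and that it ends at the offset in the first header word. The first pointer word is kept as an uninterpreted `word0`, since its meaning is the open question. The extents are only a guess: each tuple is taken to run up to the next offset (or the page end), so they may include padding.

```python
import struct

HEADER_BYTES = 8   # four 16-bit header words (assumed)
POINTER_BYTES = 4  # two 16-bit words per tuple pointer (assumed)


def read_tuple_pointers(page, byte_order="<"):
    """Return (index, word0, offset) for each pointer, in on-disk order."""
    (ptr_end,) = struct.unpack_from(byte_order + "H", page, 0)
    end = min(ptr_end, len(page))  # do not run off a damaged page
    pointers = []
    positions = range(HEADER_BYTES, end - POINTER_BYTES + 1, POINTER_BYTES)
    for index, pos in enumerate(positions):
        word0, offset = struct.unpack_from(byte_order + "2H", page, pos)
        pointers.append((index, word0, offset))
    return pointers


def tuple_extents(page, byte_order="<"):
    """Sort pointers by offset and guess (index, start, end) for each tuple."""
    pointers = sorted(read_tuple_pointers(page, byte_order), key=lambda p: p[2])
    bounds = [p[2] for p in pointers] + [len(page)]
    return [
        (index, offset, bounds[i + 1])
        for i, (index, _word0, offset) in enumerate(pointers)
    ]
```

Keeping the on-disk index next to each offset makes it easy to see, page by page, whether the pointers really are stored last to first.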
Now on to the tuple data... I have my tool to the point where it extracts
all the tuple data from the table, but I haven't been able to find the place
in the postgres source that explains the format. I assume a tuple contains a
number of attributes (referencing pg_attribute). Those not found in the
tuple would be assumed to be NULL.
Since I'm ignoring transaction ids right now, I'm planning on extracting all
the tuples and ordering them by oid so you can see all the committed and
uncommitted changes. I may even make it look good once I've recovered my
data...
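
The ordering step could look something like the Python sketch below. The record shape `(oid, page_no, slot, raw)` is hypothetical: it assumes the oid has already been pulled out of each tuple, which depends on the tuple format I'm asking about. Within one oid the versions are kept in file order, which may or may not match the order the changes were made.

```python
from itertools import groupby
from operator import itemgetter


def versions_by_oid(records):
    """Group (oid, page_no, slot, raw) records by oid.

    Returns a dict mapping each oid to the raw bytes of every version
    found, sorted by page number and then slot.
    """
    ordered = sorted(records, key=itemgetter(0, 1, 2))
    return {
        oid: [record[3] for record in group]
        for oid, group in groupby(ordered, key=itemgetter(0))
    }
```

An oid that maps to more than one entry would be a row that was updated, with the old and new versions both still in the file.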
-Michael