Thread: Documentation on page files

Documentation on page files

From
Martijn van Oosterhout
Date:
[Please CC any replies, I'm subscribed nomail]

Hi,

Chapter 7 of the Developers guide in about the Page Format on disk and it's
a little out of date not to mention somewhat incomplete.

1. Is there documentation elsewhere (other than the source)?

2. If not, would patches be accepted to correct the situation? I've been
looking into it a bit recently so I think I may be able to whip something
useful up.

Thanks in advance,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America


Re: Documentation on page files

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Chapter 7 of the Developers guide in about the Page Format on disk and it's
> a little out of date not to mention somewhat incomplete.

Indeed, this seems to have very little relation to reality :-(.
I didn't even realize that we had such a description in the SGML docs.
It's obviously not been updated for many years.  I'm not sure if the
"continuation" mechanism it describes ever existed at all, but it sure
hasn't been there since the code left Berkeley.

> 1. Is there documentation elsewhere (other than the source)?

Not that I can think of.  The most accurate information seems to be in
src/include/storage/bufpage.h; AFAICT all the comments in that file are
up-to-date.  In addition to this it'd be worth pulling out some
description of the "special space" structures used by the various index
access methods.

> 2. If not, would patches be accepted to correct the situation?

Go for it.
        regards, tom lane


Re: Documentation on page files

From
Curt Sampson
Date:
On Sat, 20 Apr 2002, Tom Lane wrote:

> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Chapter 7 of the Developers guide in about the Page Format on disk and it's
> > a little out of date not to mention somewhat incomplete.
>
> Indeed, this seems to have very little relation to reality :-(.

I dunno, it seems to be not too bad to me, though woefully incomplete.
I too was considering writing an updated version of this.

> I'm not sure if the
> "continuation" mechanism it describes ever existed at all, but it sure
> hasn't been there since the code left Berkeley.

Yeah, I was wondering about that. This has been replaced by TOAST, right?

> > 2. If not, would patches be accepted to correct the situation?
>
> Go for it.

Yes, please! I'd be happy to review and updated version.

One thing that would be good, since this is a developers' guide,
would be to include references to the source files and dates from
which the information comes. That way one could see if updates are
necessary by doing a diff on those files between the given date
and the head, to see what changes have been made since the description
was written. Also good would be to have the data structures explicitly
named so that when one dives into the source, one already has a
good idea of what one's looking at.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 



Re: Documentation on page files

From
Martijn van Oosterhout
Date:
On Sun, Apr 21, 2002 at 03:46:07PM +0900, Curt Sampson wrote:
> On Sat, 20 Apr 2002, Tom Lane wrote:
> 
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > 2. If not, would patches be accepted to correct the situation?
> >
> > Go for it.
> 
> Yes, please! I'd be happy to review and updated version.

Ok, my first attempt can be seen here:

http://svana.org/kleptog/pgsql/page.sgml.txt

I don't know whatever SGML format this is using, so the layout is not great,
but the information should be accurate. I used it to create a program to
dump the datafiles directly without the postmaster :).

I'll submit a proper patch once we have something useful.

> One thing that would be good, since this is a developers' guide,
> would be to include references to the source files and dates from
> which the information comes. That way one could see if updates are
> necessary by doing a diff on those files between the given date
> and the head, to see what changes have been made since the description
> was written. Also good would be to have the data structures explicitly
> named so that when one dives into the source, one already has a
> good idea of what one's looking at.

Well, I have included the names of the structures involved. Do you think
it's worth adding filenames given that TAGS makes tracking them down easily
enough? I can put in dates if you like.

Issues to be dealt with:
- Do I need to say more about TOAST?
- Indexes?
- Split into sections
- How much detail is enough/too much?

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America


Re: Documentation on page files

From
"Ross J. Reedstrom"
Date:
On Sun, Apr 21, 2002 at 07:28:32PM +1000, Martijn van Oosterhout wrote:
> 
> http://svana.org/kleptog/pgsql/page.sgml.txt
> 
> I don't know whatever SGML format this is using, so the layout is not great,
> but the information should be accurate. I used it to create a program to
> dump the datafiles directly without the postmaster :).

Excellent - since this is a FRP (Frequently Requested Program) how do you
feel about dumping it in contrib? Even if it's hardcoded for your particular
table structure, it could serve as a starting point for some poor DBA
who's got to recover from a lost xlog, for example.

Ross


Re: Documentation on page files

From
Patrick Macdonald
Date:
Martijn,

It may be useful to look at the pg_filedump utility located
at http://sources.redhat.com/rhdb/tools.html 

This utility dumps out information at the page level and is
commented to help the user understand the format/content of
PostgreSQL heap/index/control files.

Cheers,
Patrick
-----------------
Patrick Macdonald
Red Hat Database


Tom Lane wrote:
> 
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Chapter 7 of the Developers guide in about the Page Format on disk and it's
> > a little out of date not to mention somewhat incomplete.
> 
> Indeed, this seems to have very little relation to reality :-(.
> I didn't even realize that we had such a description in the SGML docs.
> It's obviously not been updated for many years.  I'm not sure if the
> "continuation" mechanism it describes ever existed at all, but it sure
> hasn't been there since the code left Berkeley.
> 
> > 1. Is there documentation elsewhere (other than the source)?
> 
> Not that I can think of.  The most accurate information seems to be in
> src/include/storage/bufpage.h; AFAICT all the comments in that file are
> up-to-date.  In addition to this it'd be worth pulling out some
> description of the "special space" structures used by the various index
> access methods.
> 
> > 2. If not, would patches be accepted to correct the situation?
> 
> Go for it.
> 
>                         regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html


Re: Documentation on page files

From
Martijn van Oosterhout
Date:
On Mon, Apr 22, 2002 at 11:14:36AM -0500, Ross J. Reedstrom wrote:
> On Sun, Apr 21, 2002 at 07:28:32PM +1000, Martijn van Oosterhout wrote:
> > 
> > http://svana.org/kleptog/pgsql/page.sgml.txt
> > 
> > I don't know whatever SGML format this is using, so the layout is not great,
> > but the information should be accurate. I used it to create a program to
> > dump the datafiles directly without the postmaster :).
> 
> Excellent - since this is a FRP (Frequently Requested Program) how do you
> feel about dumping it in contrib? Even if it's hardcoded for your particular
> table structure, it could serve as a starting point for some poor DBA
> who's got to recover from a lost xlog, for example.

Actually, it reads the table structure from the catalog. It also will find
the right files to open. It reads files from both PG 6.5 and 7.2 although it
shouldn't be too hard to make work for other versions. And if you people
don't reorder the first few fields in pg_attribute, it will work for all
future versions too.

The dumping is more of an extra, the original idea was to check for errors
in the datafiles. Hence the working name of "pgfsck". At the moment the
dumping dumps only tuples where xmax == 0 but I'm not sure if that's
correct.

It doesn't handle compressed tuples nor toasted ones, though thats more
advanced really. And ofcourse outputing data in human readable format has to
be added for each type. I only started writing it on Sunday, so let me give
it a usable interface and I'll let people try it out.

-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America


Re: Documentation on page files

From
Hannu Krosing
Date:
On Tue, 2002-04-23 at 01:29, Martijn van Oosterhout wrote:
> 
> The dumping is more of an extra, the original idea was to check for errors
> in the datafiles. Hence the working name of "pgfsck". At the moment the
> dumping dumps only tuples where xmax == 0 but I'm not sure if that's
> correct.

AFAIK it is not. As Tom once explained me, it is ok for tuples xmax to
be !=0 and still have a valid tuple. The validity is determined by some
bits in tuple header.


But I think the most useful behaviour should be to dump system fields
too, so mildly knowledgeable sysadmin can import the dump and do the
right thing afterwards (like restore data as it was before transaction
nr 7000)

-------------
Hannu



Re: Documentation on page files

From
Martijn van Oosterhout
Date:
On Tue, Apr 23, 2002 at 09:15:22AM +0200, Hannu Krosing wrote:
> On Tue, 2002-04-23 at 01:29, Martijn van Oosterhout wrote:
> > 
> > The dumping is more of an extra, the original idea was to check for errors
> > in the datafiles. Hence the working name of "pgfsck". At the moment the
> > dumping dumps only tuples where xmax == 0 but I'm not sure if that's
> > correct.
> 
> AFAIK it is not. As Tom once explained me, it is ok for tuples xmax to
> be !=0 and still have a valid tuple. The validity is determined by some
> bits in tuple header.

Well, from my thinking about how you would use these fields in a logical
way, it seems it's possible for xmax to be non-zero if the transaction
numbered xmax was not committed. But in that case (unless it was a delete)
there would be a newer tuple with the same oid but xmax == 0 (and this
uncommitted transaction as xmin).

The problem is that inside the DB, you have a current transaction plus a
list of committed transactions. Externally, you have no idea, so xmax == 0
is as valid a view as any other. This would have the effect of dumping out
whatever would be visible if every transaction were committed.

I think. If anyone knows a good document on MVCC implementations, let me
know.

> But I think the most useful behaviour should be to dump system fields
> too, so mildly knowledgeable sysadmin can import the dump and do the
> right thing afterwards (like restore data as it was before transaction
> nr 7000)

Well, i didn't think you could have statements of the form:

insert into table (xmin,xmax,cmin,cmax,...) values (...);

So you would have to leave it as a comment. In which case someone would have
to go and by hand work out what would be in or out. I can make it an option
but I don't think it would be particularly useful. Maybe
--pretend-uncommitted <xact>

Just a thought, if I did a "delete from table" accedently, and stopped the
postmaster and twiddled the xlog for that transaction, would that have the
effect of undeleting those tuples?
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America


Re: Documentation on page files

From
Curt Sampson
Date:
On Tue, 23 Apr 2002, Martijn van Oosterhout wrote:

> > But I think the most useful behaviour should be to dump system fields
> > too, so mildly knowledgeable sysadmin can import the dump and do the
> > right thing afterwards (like restore data as it was before transaction
> > nr 7000)
>
> Well, i didn't think you could have statements of the form:
>
> insert into table (xmin,xmax,cmin,cmax,...) values (...);
>
> So you would have to leave it as a comment. In which case someone would have
> to go and by hand work out what would be in or out. I can make it an option
> but I don't think it would be particularly useful.

What we really want is to be able to take that output, edit it,
and put it into another program that creates the page files anew.
Or maybe just generate new data ourselves and run it into that
program. Then, if we can take a tablespace we've created this way
and attach it to an already-running postgres system, we've got
really, really fast import/export capability.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 



Re: Documentation on page files

From
Hannu Krosing
Date:
On Tue, 2002-04-23 at 12:52, Martijn van Oosterhout wrote:

> Well, from my thinking about how you would use these fields in a logical
> way, it seems it's possible for xmax to be non-zero if the transaction
> numbered xmax was not committed. But in that case (unless it was a delete)
> there would be a newer tuple with the same oid but xmax == 0 (and this
> uncommitted transaction as xmin).

Unless it was an uncommitted DELETE and not UPDATE.

> The problem is that inside the DB, you have a current transaction plus a
> list of committed transactions. Externally, you have no idea, so xmax == 0
> is as valid a view as any other. This would have the effect of dumping out
> whatever would be visible if every transaction were committed.

IIRC there are some bits that determine the commit status of tuple.

> I think. If anyone knows a good document on MVCC implementations, let me
> know.
> 
> > But I think the most useful behaviour should be to dump system fields
> > too, so mildly knowledgeable sysadmin can import the dump and do the
> > right thing afterwards (like restore data as it was before transaction
> > nr 7000)
> 
> Well, i didn't think you could have statements of the form:
> 
> insert into table (xmin,xmax,cmin,cmax,...) values (...);

but you can have
insert into newtable values (...);

So you are free to name your xmin,... whatever you like

> So you would have to leave it as a comment. In which case someone would have
> to go and by hand work out what would be in or out. I can make it an option
> but I don't think it would be particularly useful.

I have written a small python script that does the above, and I did
write it because I needed it, so it must have some use ;)

After inserting the data to database I was then able to select all but
latest (before delete) version of each tuple.

--------------
Hannu





Re: Documentation on page files

From
Bruce Momjian
Date:
Added to CVS.

---------------------------------------------------------------------------

Martijn van Oosterhout wrote:
> On Sun, Apr 21, 2002 at 03:46:07PM +0900, Curt Sampson wrote:
> > On Sat, 20 Apr 2002, Tom Lane wrote:
> > 
> > > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > > 2. If not, would patches be accepted to correct the situation?
> > >
> > > Go for it.
> > 
> > Yes, please! I'd be happy to review and updated version.
> 
> Ok, my first attempt can be seen here:
> 
> http://svana.org/kleptog/pgsql/page.sgml.txt
> 
> I don't know whatever SGML format this is using, so the layout is not great,
> but the information should be accurate. I used it to create a program to
> dump the datafiles directly without the postmaster :).
> 
> I'll submit a proper patch once we have something useful.
> 
> > One thing that would be good, since this is a developers' guide,
> > would be to include references to the source files and dates from
> > which the information comes. That way one could see if updates are
> > necessary by doing a diff on those files between the given date
> > and the head, to see what changes have been made since the description
> > was written. Also good would be to have the data structures explicitly
> > named so that when one dives into the source, one already has a
> > good idea of what one's looking at.
> 
> Well, I have included the names of the structures involved. Do you think
> it's worth adding filenames given that TAGS makes tracking them down easily
> enough? I can put in dates if you like.
> 
> Issues to be dealt with:
> - Do I need to say more about TOAST?
> - Indexes?
> - Split into sections
> - How much detail is enough/too much?
> 
> Have a nice day,
> -- 
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > Canada, Mexico, and Australia form the Axis of Nations That
> > Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026