Thread: Documentation on page files
[Please CC any replies, I'm subscribed nomail] Hi, Chapter 7 of the Developers guide in about the Page Format on disk and it's a little out of date not to mention somewhat incomplete. 1. Is there documentation elsewhere (other than the source)? 2. If not, would patches be accepted to correct the situation? I've been looking into it a bit recently so I think I may be able to whip something useful up. Thanks in advance, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Canada, Mexico, and Australia form the Axis of Nations That > Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
Martijn van Oosterhout <kleptog@svana.org> writes: > Chapter 7 of the Developers guide in about the Page Format on disk and it's > a little out of date not to mention somewhat incomplete. Indeed, this seems to have very little relation to reality :-(. I didn't even realize that we had such a description in the SGML docs. It's obviously not been updated for many years. I'm not sure if the "continuation" mechanism it describes ever existed at all, but it sure hasn't been there since the code left Berkeley. > 1. Is there documentation elsewhere (other than the source)? Not that I can think of. The most accurate information seems to be in src/include/storage/bufpage.h; AFAICT all the comments in that file are up-to-date. In addition to this it'd be worth pulling out some description of the "special space" structures used by the various index access methods. > 2. If not, would patches be accepted to correct the situation? Go for it. regards, tom lane
On Sat, 20 Apr 2002, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Chapter 7 of the Developers guide in about the Page Format on disk and it's > > a little out of date not to mention somewhat incomplete. > > Indeed, this seems to have very little relation to reality :-(. I dunno, it seems to be not too bad to me, though woefully incomplete. I too was considering writing an updated version of this. > I'm not sure if the > "continuation" mechanism it describes ever existed at all, but it sure > hasn't been there since the code left Berkeley. Yeah, I was wondering about that. This has been replaced by TOAST, right? > > 2. If not, would patches be accepted to correct the situation? > > Go for it. Yes, please! I'd be happy to review and updated version. One thing that would be good, since this is a developers' guide, would be to include references to the source files and dates from which the information comes. That way one could see if updates are necessary by doing a diff on those files between the given date and the head, to see what changes have been made since the description was written. Also good would be to have the data structures explicitly named so that when one dives into the source, one already has a good idea of what one's looking at. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
On Sun, Apr 21, 2002 at 03:46:07PM +0900, Curt Sampson wrote: > On Sat, 20 Apr 2002, Tom Lane wrote: > > > Martijn van Oosterhout <kleptog@svana.org> writes: > > > 2. If not, would patches be accepted to correct the situation? > > > > Go for it. > > Yes, please! I'd be happy to review and updated version. Ok, my first attempt can be seen here: http://svana.org/kleptog/pgsql/page.sgml.txt I don't know whatever SGML format this is using, so the layout is not great, but the information should be accurate. I used it to create a program to dump the datafiles directly without the postmaster :). I'll submit a proper patch once we have something useful. > One thing that would be good, since this is a developers' guide, > would be to include references to the source files and dates from > which the information comes. That way one could see if updates are > necessary by doing a diff on those files between the given date > and the head, to see what changes have been made since the description > was written. Also good would be to have the data structures explicitly > named so that when one dives into the source, one already has a > good idea of what one's looking at. Well, I have included the names of the structures involved. Do you think it's worth adding filenames given that TAGS makes tracking them down easily enough? I can put in dates if you like. Issues to be dealt with: - Do I need to say more about TOAST? - Indexes? - Split into sections - How much detail is enough/too much? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Canada, Mexico, and Australia form the Axis of Nations That > Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
On Sun, Apr 21, 2002 at 07:28:32PM +1000, Martijn van Oosterhout wrote: > > http://svana.org/kleptog/pgsql/page.sgml.txt > > I don't know whatever SGML format this is using, so the layout is not great, > but the information should be accurate. I used it to create a program to > dump the datafiles directly without the postmaster :). Excellent - since this is a FRP (Frequently Requested Program) how do you feel about dumping it in contrib? Even if it's hardcoded for your particular table structure, it could serve as a starting point for some poor DBA who's got to recover from a lost xlog, for example. Ross
Martijn, It may be useful to look at the pg_filedump utility located at http://sources.redhat.com/rhdb/tools.html This utility dumps out information at the page level and is commented to help the user understand the format/content of PostgreSQL heap/index/control files. Cheers, Patrick ----------------- Patrick Macdonald Red Hat Database Tom Lane wrote: > > Martijn van Oosterhout <kleptog@svana.org> writes: > > Chapter 7 of the Developers guide in about the Page Format on disk and it's > > a little out of date not to mention somewhat incomplete. > > Indeed, this seems to have very little relation to reality :-(. > I didn't even realize that we had such a description in the SGML docs. > It's obviously not been updated for many years. I'm not sure if the > "continuation" mechanism it describes ever existed at all, but it sure > hasn't been there since the code left Berkeley. > > > 1. Is there documentation elsewhere (other than the source)? > > Not that I can think of. The most accurate information seems to be in > src/include/storage/bufpage.h; AFAICT all the comments in that file are > up-to-date. In addition to this it'd be worth pulling out some > description of the "special space" structures used by the various index > access methods. > > > 2. If not, would patches be accepted to correct the situation? > > Go for it. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html
On Mon, Apr 22, 2002 at 11:14:36AM -0500, Ross J. Reedstrom wrote: > On Sun, Apr 21, 2002 at 07:28:32PM +1000, Martijn van Oosterhout wrote: > > > > http://svana.org/kleptog/pgsql/page.sgml.txt > > > > I don't know whatever SGML format this is using, so the layout is not great, > > but the information should be accurate. I used it to create a program to > > dump the datafiles directly without the postmaster :). > > Excellent - since this is a FRP (Frequently Requested Program) how do you > feel about dumping it in contrib? Even if it's hardcoded for your particular > table structure, it could serve as a starting point for some poor DBA > who's got to recover from a lost xlog, for example. Actually, it reads the table structure from the catalog. It also will find the right files to open. It reads files from both PG 6.5 and 7.2 although it shouldn't be too hard to make work for other versions. And if you people don't reorder the first few fields in pg_attribute, it will work for all future versions too. The dumping is more of an extra, the original idea was to check for errors in the datafiles. Hence the working name of "pgfsck". At the moment the dumping dumps only tuples where xmax == 0 but I'm not sure if that's correct. It doesn't handle compressed tuples nor toasted ones, though thats more advanced really. And ofcourse outputing data in human readable format has to be added for each type. I only started writing it on Sunday, so let me give it a usable interface and I'll let people try it out. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Canada, Mexico, and Australia form the Axis of Nations That > Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
On Tue, 2002-04-23 at 01:29, Martijn van Oosterhout wrote: > > The dumping is more of an extra, the original idea was to check for errors > in the datafiles. Hence the working name of "pgfsck". At the moment the > dumping dumps only tuples where xmax == 0 but I'm not sure if that's > correct. AFAIK it is not. As Tom once explained me, it is ok for tuples xmax to be !=0 and still have a valid tuple. The validity is determined by some bits in tuple header. But I think the most useful behaviour should be to dump system fields too, so mildly knowledgeable sysadmin can import the dump and do the right thing afterwards (like restore data as it was before transaction nr 7000) ------------- Hannu
On Tue, Apr 23, 2002 at 09:15:22AM +0200, Hannu Krosing wrote: > On Tue, 2002-04-23 at 01:29, Martijn van Oosterhout wrote: > > > > The dumping is more of an extra, the original idea was to check for errors > > in the datafiles. Hence the working name of "pgfsck". At the moment the > > dumping dumps only tuples where xmax == 0 but I'm not sure if that's > > correct. > > AFAIK it is not. As Tom once explained me, it is ok for tuples xmax to > be !=0 and still have a valid tuple. The validity is determined by some > bits in tuple header. Well, from my thinking about how you would use these fields in a logical way, it seems it's possible for xmax to be non-zero if the transaction numbered xmax was not committed. But in that case (unless it was a delete) there would be a newer tuple with the same oid but xmax == 0 (and this uncommitted transaction as xmin). The problem is that inside the DB, you have a current transaction plus a list of committed transactions. Externally, you have no idea, so xmax == 0 is as valid a view as any other. This would have the effect of dumping out whatever would be visible if every transaction were committed. I think. If anyone knows a good document on MVCC implementations, let me know. > But I think the most useful behaviour should be to dump system fields > too, so mildly knowledgeable sysadmin can import the dump and do the > right thing afterwards (like restore data as it was before transaction > nr 7000) Well, i didn't think you could have statements of the form: insert into table (xmin,xmax,cmin,cmax,...) values (...); So you would have to leave it as a comment. In which case someone would have to go and by hand work out what would be in or out. I can make it an option but I don't think it would be particularly useful. Maybe --pretend-uncommitted <xact> Just a thought, if I did a "delete from table" accedently, and stopped the postmaster and twiddled the xlog for that transaction, would that have the effect of undeleting those tuples? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Canada, Mexico, and Australia form the Axis of Nations That > Are Actually Quite Nice But Secretly Have Nasty Thoughts About America
On Tue, 23 Apr 2002, Martijn van Oosterhout wrote: > > But I think the most useful behaviour should be to dump system fields > > too, so mildly knowledgeable sysadmin can import the dump and do the > > right thing afterwards (like restore data as it was before transaction > > nr 7000) > > Well, i didn't think you could have statements of the form: > > insert into table (xmin,xmax,cmin,cmax,...) values (...); > > So you would have to leave it as a comment. In which case someone would have > to go and by hand work out what would be in or out. I can make it an option > but I don't think it would be particularly useful. What we really want is to be able to take that output, edit it, and put it into another program that creates the page files anew. Or maybe just generate new data ourselves and run it into that program. Then, if we can take a tablespace we've created this way and attach it to an already-running postgres system, we've got really, really fast import/export capability. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
On Tue, 2002-04-23 at 12:52, Martijn van Oosterhout wrote: > Well, from my thinking about how you would use these fields in a logical > way, it seems it's possible for xmax to be non-zero if the transaction > numbered xmax was not committed. But in that case (unless it was a delete) > there would be a newer tuple with the same oid but xmax == 0 (and this > uncommitted transaction as xmin). Unless it was an uncommitted DELETE and not UPDATE. > The problem is that inside the DB, you have a current transaction plus a > list of committed transactions. Externally, you have no idea, so xmax == 0 > is as valid a view as any other. This would have the effect of dumping out > whatever would be visible if every transaction were committed. IIRC there are some bits that determine the commit status of tuple. > I think. If anyone knows a good document on MVCC implementations, let me > know. > > > But I think the most useful behaviour should be to dump system fields > > too, so mildly knowledgeable sysadmin can import the dump and do the > > right thing afterwards (like restore data as it was before transaction > > nr 7000) > > Well, i didn't think you could have statements of the form: > > insert into table (xmin,xmax,cmin,cmax,...) values (...); but you can have insert into newtable values (...); So you are free to name your xmin,... whatever you like > So you would have to leave it as a comment. In which case someone would have > to go and by hand work out what would be in or out. I can make it an option > but I don't think it would be particularly useful. I have written a small python script that does the above, and I did write it because I needed it, so it must have some use ;) After inserting the data to database I was then able to select all but latest (before delete) version of each tuple. -------------- Hannu
Added to CVS. --------------------------------------------------------------------------- Martijn van Oosterhout wrote: > On Sun, Apr 21, 2002 at 03:46:07PM +0900, Curt Sampson wrote: > > On Sat, 20 Apr 2002, Tom Lane wrote: > > > > > Martijn van Oosterhout <kleptog@svana.org> writes: > > > > 2. If not, would patches be accepted to correct the situation? > > > > > > Go for it. > > > > Yes, please! I'd be happy to review and updated version. > > Ok, my first attempt can be seen here: > > http://svana.org/kleptog/pgsql/page.sgml.txt > > I don't know whatever SGML format this is using, so the layout is not great, > but the information should be accurate. I used it to create a program to > dump the datafiles directly without the postmaster :). > > I'll submit a proper patch once we have something useful. > > > One thing that would be good, since this is a developers' guide, > > would be to include references to the source files and dates from > > which the information comes. That way one could see if updates are > > necessary by doing a diff on those files between the given date > > and the head, to see what changes have been made since the description > > was written. Also good would be to have the data structures explicitly > > named so that when one dives into the source, one already has a > > good idea of what one's looking at. > > Well, I have included the names of the structures involved. Do you think > it's worth adding filenames given that TAGS makes tracking them down easily > enough? I can put in dates if you like. > > Issues to be dealt with: > - Do I need to say more about TOAST? > - Indexes? > - Split into sections > - How much detail is enough/too much? > > Have a nice day, > -- > Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > > Canada, Mexico, and Australia form the Axis of Nations That > > Are Actually Quite Nice But Secretly Have Nasty Thoughts About America > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026