Info on Data Storage - Mailing list pgsql-hackers
From | Thomas Lockhart |
---|---|
Subject | Info on Data Storage |
Date | |
Msg-id | 376C399B.93BE95D5@alumni.caltech.edu |
In response to | Re: [HACKERS] Savepoints... (Bruce Momjian <maillist@candle.pha.pa.us>) |
List | pgsql-hackers |
istm that this discussion and the one on the 1GB limit on table segments could form the basis for a missing chapter on "Data Storage" in the Admin Guide. Would someone (other than Vadim, who we need to keep coding! :) please keep following this and related threads and extract the info for the Admin Guide chapter? It doesn't need to be very long, perhaps just suggesting how to calculate table storage size, discussing upper limits (e.g. 32-bit OID), and describing the table segmentation scheme.

There is already a chapter (with more detail than the AG needs) in the Developer's Guide which should be updated too.

Anyway, both chapters are enclosed (the originals are also in doc/src/sgml/{storage,page}.sgml). All we really need is the info, and I can do the markup if whoever picks this up doesn't feel comfortable with trying the SGML markup.

Volunteers appreciated...

- Thomas

> > > To have them I need to add tuple id (6 bytes) to heap tuple
> > > header. Are there objections? Though it's not good to increase
> > > tuple header size, subj is, imho, very nice feature...
> > Gee, that's a lot of overhead. We would go from 40 bytes ->46 bytes.
> 40? offsetof(HeapTupleHeaderData, t_bits) is 31...
> Well, seems that we can remove 5 bytes from tuple header.
> 1. t_hoff (1 byte) may be computed - no reason to store it.
> 2. we need in both t_cmin and t_cmax only when tuple is updated
> by the same xaction as it was inserted - in such cases we
> can put delete command id (t_cmax) to t_xmax and set
> flag HEAP_XMAX_THE_SAME (as t_xmin), in all other cases
> we will overwrite insert command id with delete command id
> (no one is interested in t_cmin of committed insert xaction)
> -> yet another 4 bytes (sizeof command id).
> If now we'll add 6 bytes to header then
> offsetof(HeapTupleHeaderData, t_bits) will be 32 and for
> no-nulls tuples there will be no difference at all
> (with/without additional 6 bytes), due to double alignment
> of header.
> So, the choice is: new feature or more compact
> (than current) header for tuples with nulls.

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

<Chapter Id="storage">
<Title>Disk Storage</Title>

<Para>
This section needs to be written. Some information is in the FAQ. Volunteers?
- thomas 1998-01-11
</Para>

</Chapter>

<chapter id="page">
<title>Page Files</title>

<abstract>
<para>
A description of the database file default page format.
</para>
</abstract>

<para>
This section provides an overview of the page format used by <productname>Postgres</productname>
classes. User-defined access methods need not use this page format.
</para>

<para>
In the following explanation, a
<firstterm>byte</firstterm>
is assumed to contain 8 bits. In addition, the term
<firstterm>item</firstterm>
refers to data which is stored in <productname>Postgres</productname> classes.
</para>

<sect1>
<title>Page Structure</title>

<para>
The following table shows how pages in both normal <productname>Postgres</productname> classes
and <productname>Postgres</productname> index
classes (e.g., a B-tree index) are structured.

<table tocentry="1">
<title>Sample Page Layout</title>
<titleabbrev>Page Layout</titleabbrev>
<tgroup cols="1">
<thead>
<row>
<entry>
Item
</entry>
<entry>
Description
</entry>
</row>
</thead>

<tbody>
<row>
<entry>
itemPointerData
</entry>
</row>
<row>
<entry>
filler
</entry>
</row>
<row>
<entry>
itemData...
</entry>
</row>
<row>
<entry>
Unallocated Space
</entry>
</row>
<row>
<entry>
ItemContinuationData
</entry>
</row>
<row>
<entry>
Special Space
</entry>
</row>
<row>
<entry>
``ItemData 2''
</entry>
</row>
<row>
<entry>
``ItemData 1''
</entry>
</row>
<row>
<entry>
ItemIdData
</entry>
</row>
<row>
<entry>
PageHeaderData
</entry>
</row>
</tbody>
</tgroup>
</table>
</para>

<!--
.\" Running
.\" .q .../bin/dumpbpages
.\" or
.\" .q .../src/support/dumpbpages
.\" as the postgres superuser
.\" with the file paths associated with
.\" (heap or B-tree index) classes,
.\" .q .../data/base/<database-name>/<class-name>,
.\" will display the page structure used by the classes.
.\" Specifying the
.\" .q -r
.\" flag will cause the classes to be
.\" treated as heap classes and for more information to be displayed.
-->

<para>
The first 8 bytes of each page consist of a page header
(PageHeaderData).
Within the header, the first three 2-byte integer fields
(<firstterm>lower</firstterm>,
<firstterm>upper</firstterm>,
and
<firstterm>special</firstterm>)
represent byte offsets to the start of unallocated space, to the end
of unallocated space, and to the start of <firstterm>special space</firstterm>.
Special space is a region at the end of the page which is allocated at
page initialization time and which contains information specific to an
access method. The last 2 bytes of the page header,
<firstterm>opaque</firstterm>,
encode the page size and information on the internal fragmentation of
the page. Page size is stored in each page because frames in the
buffer pool may be subdivided into equal-sized pages on a frame-by-frame
basis within a class. The internal fragmentation information is
used to aid in determining when page reorganization should occur.
</para>

<para>
Following the page header are item identifiers
(<firstterm>ItemIdData</firstterm>).
New item identifiers are allocated from the first four bytes of
unallocated space.
Because an item identifier is never moved until it
is freed, its index may be used to indicate the location of an item on
a page. In fact, every pointer to an item
(<firstterm>ItemPointer</firstterm>)
created by <productname>Postgres</productname> consists of a frame number and an index of an item
identifier. An item identifier contains a byte-offset to the start of
an item, its length in bytes, and a set of attribute bits which affect
its interpretation.
</para>

<para>
The items themselves are stored in space allocated backwards from
the end of unallocated space. Usually, the items are not interpreted.
However, when the item is too long to be placed on a single page or
when fragmentation of the item is desired, the item is divided and
each piece is handled as a distinct item in the following manner. The
first through the next to last piece are placed in an item
continuation structure
(<firstterm>ItemContinuationData</firstterm>).
This structure contains
itemPointerData
which points to the next piece and the piece itself. The last piece
is handled normally.
</para>
</sect1>

<sect1>
<title>Files</title>

<para>
<variablelist>
<varlistentry>
<term>
<filename>data/</filename>
</term>
<listitem>
<para>
Location of shared (global) database files.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<filename>data/base/</filename>
</term>
<listitem>
<para>
Location of local database files.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</sect1>

<sect1>
<title>Bugs</title>

<para>
The page format may change in the future to provide more efficient
access to large objects.
</para>

<para>
This section contains insufficient detail to be of any assistance in
writing a new access method.
</para>
</sect1>
</chapter>
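The page header fields described in the "Page Structure" section of the enclosed chapter can be sketched in C, roughly as follows. This is only an illustration, not the actual Postgres source: the struct and function names (SketchPageHeader, sketch_page_init, sketch_add_item, and so on) are hypothetical, the encoding of the opaque field is not modeled, and the 8192-byte page size used in the note after the code is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 8-byte page header described in the text:
 * three 2-byte offsets followed by a 2-byte opaque field. The field
 * names follow the prose, not necessarily the real Postgres headers. */
typedef struct SketchPageHeader {
    uint16_t lower;   /* byte offset to the start of unallocated space */
    uint16_t upper;   /* byte offset to the end of unallocated space */
    uint16_t special; /* byte offset to the start of special space */
    uint16_t opaque;  /* page size and fragmentation info (not modeled) */
} SketchPageHeader;

/* Per the text, each new item identifier takes the first four bytes
 * of unallocated space. */
enum { SKETCH_ITEM_ID_SIZE = 4 };

/* Set up an empty page: identifiers start right after the header, and
 * item data will grow backwards from the start of special space. */
void sketch_page_init(SketchPageHeader *h, uint16_t page_size,
                      uint16_t special_size)
{
    h->lower = (uint16_t)sizeof(SketchPageHeader);
    h->special = (uint16_t)(page_size - special_size);
    h->upper = h->special;
    h->opaque = 0;
}

/* Bytes of unallocated space between the identifiers and the items. */
size_t sketch_free_space(const SketchPageHeader *h)
{
    assert(h->lower <= h->upper);
    return (size_t)(h->upper - h->lower);
}

/* Number of item identifiers currently allocated after the header. */
size_t sketch_item_count(const SketchPageHeader *h)
{
    return (h->lower - sizeof(SketchPageHeader)) / SKETCH_ITEM_ID_SIZE;
}

/* Reserve room for one item of len bytes. The identifier is taken from
 * the front of unallocated space and the data from the back. Returns
 * the byte offset of the item data, or 0 if the item does not fit
 * (offset 0 is never valid because the header lives there). */
uint16_t sketch_add_item(SketchPageHeader *h, uint16_t len)
{
    if (sketch_free_space(h) < (size_t)len + SKETCH_ITEM_ID_SIZE)
        return 0;
    h->lower += SKETCH_ITEM_ID_SIZE;
    h->upper -= len;
    return h->upper;
}
```

Under these assumptions, an 8192-byte page with no special space starts with 8192 - 8 = 8184 free bytes. Adding one 100-byte item consumes 104 bytes (4 for the identifier, 100 for the data) and places the item at byte offset 8092.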