Re: On-disk Tuple Size - Mailing list pgsql-hackers
From | Curt Sampson |
---|---|
Subject | Re: On-disk Tuple Size |
Date | |
Msg-id | Pine.NEB.4.43.0204211557120.6249-100000@angelic.cynic.net Whole thread Raw |
In response to | Re: On-disk Tuple Size (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: On-disk Tuple Size
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-hackers |
On Sat, 20 Apr 2002, Tom Lane wrote: > >> I believe we do want to distinguish three states: live tuple, dead > >> tuple, and empty space. Otherwise there will be cases where you're > >> forced to move data immediately to collapse empty space, when there's > >> not a good reason to except that your representation can't cope. > > > I don't understand this. > > I thought more about this in the shower this morning, and realized the > fundamental drawback of the scheme you are suggesting: it requires the > line pointers and physical storage to be in the same order. > ...But in any case line pointer order and physical storage order are > tied together.) I thought that for a while too. As you point out, you want an list of line pointers ordered by address in the block to build your map of space after the free space in the middle of the page because you get the end of the current block from the start address of the next block. (We know where the free space in the middle is from the end of the line pointer array.) However, there's no reason it has to be stored in this order on the disk. You can build a sorted list of line pointers in a separate area of memory after you read the page. Yes, this uses a bit more CPU, but I think it's going to be a pretty trivial amount. It's a short list, and since you're touching the data anyway, it's going to be in the CPU cache. The real cost you'll pay is in the time to access the area of memory where you're storing the sorted list of line pointers. But the potential saving here is up to 5% in I/O costs (due to using less disk space). > The three states of a line pointer that I referred to are live > (pointing at a good tuple), dead (pointing at storage that used > to contain a good tuple, doesn't anymore, but hasn't been compacted > out yet), and empty (doesn't point at storage at all; the space it > used to describe has been merged into the middle-of-the-page free > pool). Right. I now realize that we still do still need the three states, which are in my case: live: points to tuple data in use free space: points to unused space in the page, i.e., a dead tuple. unused: a line pointer that doesn't point to anything at all. > ISTM a pointers-only representation can handle the live and > dead cases nicely, but the empty case is going to be a real headache. This doesn't need a separate flag, since we can just have the line pointer point to something obviously invalid, such as the page header. (0 seems quite convenient for this.) In the header, we need a count of the number of line pointers (line_id_count above), but we can drop the beginning/end of free space pointers, since we know that data space starts after the last line pointer, and ends at the beginnning of special space. So here's an example of a page layout. Sizes are arbitrary ones I picked for the sake of the example, except for the line_id sizes. Address Size Item 0 24 page header (line_id_count = 6) 24 2 line_id: 7751 (free space 1) 26 2 line_id: 7800 (tuple 1) 28 2 line_id: 0 (unused) 30 2 line_id: 7600 (tuple 2) 32 2 line_id: 8000 (tuple 3) 34 2 line_id: 7941(free space 2) 36 7564 free space in the middle of the page 7600 150 tuple 2 7750 50 free space 1 7800 100 tuple 1 7940 60 free space 2 8000 96 tuple 3 8096 96 special space Note above that the free space pointers have the LSB set to indicate that they point to free space, not tuples. So the first line_id actually points to 7750. When I do an insert, the first thing I do is scan for a free line pointer. Finding a free one at 28, I decide to re-use that. Then I look for the smallest block of free space that will hold the data that I need to insert. If it fits, exactly, I use it. If not, I need to extend the line pointer array by one and make that point to the remaining free space in the block of free space I used. If a big enough block of free space doesn't exist, I compact the page and try again. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
pgsql-hackers by date: