Re: Index Scans become Seq Scans after VACUUM ANALYSE - Mailing list pgsql-hackers
From | Curt Sampson |
---|---|
Subject | Re: Index Scans become Seq Scans after VACUUM ANALYSE |
Date | |
Msg-id | Pine.NEB.4.43.0206211106390.437-100000@angelic.cynic.net Whole thread Raw |
In response to | Re: Index Scans become Seq Scans after VACUUM ANALYSE (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: Index Scans become Seq Scans after VACUUM ANALYSE
|
List | pgsql-hackers |
On Thu, 20 Jun 2002, Bruce Momjian wrote: > > MS SQL Server has an interesting way of dealing with this. They have a > > "torn" bit in each 512-byte chunk of a page, and this bit is set the > > same for each chunk. When they are about to write out a page, they first > > flip all of the torn bits and then do the write. If the write does not > > complete due to a system crash or whatever, this can be detected later > > because the torn bits won't match across the entire page. > > I was wondering, how does knowing the block is corrupt help MS SQL? I'm trying to recall, but I can't off hand. I'll have to look it up in my Inside SQL Server book, which is at home right now, unfortunately. I'll bring the book into work and let you know the details later. > Right now, we write changed pages to WAL, then later write them to disk. Ah. You write the entire page? MS writes only the changed tuple. And DB2, in fact, goes one better and writes only the part of the tuple up to the change, IIRC. Thus, if you put smaller and/or more frequently changed columns first, you'll have smaller logs. > I have always been looking for a way to prevent these WAL writes. The > 512-byte bit seems interesting, but how does it help? Well, this would at least let you reduce the write to the 512-byte chunk that changed, rather than writing the entire 8K page. > And how does the bit help them with partial block writes? Is the bit at > the end of the block? Is that reliable? The bit is somewhere within every 512 byte "disk page" within the 8192 byte "filesystem/database page." So an 8KB page is divided up like this: | <----------------------- 8 Kb ----------------------> | | 512b | 512b | 512b | 512b | 512b | 512b | 512b | 512b | Thus, the tear bits start out like this: | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | After a successful write of the entire page, you have this: | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | If the write is unsuccessful, you end up with something like this: | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | And now you know which parts of your page got written, and which parts didn't. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're alllight. --XTC
pgsql-hackers by date: