Re: Index Scans become Seq Scans after VACUUM ANALYSE - Mailing list pgsql-hackers

From J. R. Nield
Subject Re: Index Scans become Seq Scans after VACUUM ANALYSE
Date
Msg-id 1024784514.1793.242.camel@localhost.localdomain
In response to Re: Index Scans become Seq Scans after VACUUM ANALYSE  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Index Scans become Seq Scans after VACUUM ANALYSE
List pgsql-hackers
On Thu, 2002-06-20 at 21:58, Bruce Momjian wrote:
> I was wondering, how does knowing the block is corrupt help MS SQL? 
> Right now, we write changed pages to WAL, then later write them to disk.
> I have always been looking for a way to prevent these WAL writes.  The
> 512-byte bit seems interesting, but how does it help?
> 
> And how does the bit help them with partial block writes?  Is the bit at
> the end of the block?  Is that reliable?
> 

My understanding of this is as follows:

1) On most commercial systems, if you get a corrupted block (from
partial write or whatever) you need to restore the file(s) from the most
recent backup, and replay the log from the log archive (usually only the
damaged files will be written to during replay). 

2) If you can't deal with the downtime to recover the file, then EMC,
Sun, or IBM will sell you an expensive disk array with an NVRAM cache
that will do atomic writes. Some plain-vanilla SCSI disks are also
capable of atomic writes, though usually they don't use NVRAM to do it. 

The database must then make sure that each page write gets translated
into exactly one SCSI-level write. This is one reason why Oracle and
Sybase recommend that you use raw disk partitions for high availability.
Some operating systems support this through the filesystem, but it is OS
dependent. I think Solaris 7 and 8 have support for this, but I'm not sure.
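
As a rough illustration of the database-side half of that requirement,
here is a minimal Python sketch that hands each page to the OS as one
whole, page-aligned write call. The names (PAGE_SIZE, write_page_once,
demo_single_write) and the 8192-byte page size are assumptions made for
the example, not PostgreSQL code. Note that this only guarantees one
system call per page; whether the kernel and driver turn it into a
single SCSI command is outside the program's control, which is the OS
dependence mentioned above.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size for this sketch


def write_page_once(fd, page_no, page):
    """Write one full page with a single positioned write call."""
    if len(page) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")
    # One pwrite() at a page-aligned offset; no seek/write pair, no splitting.
    written = os.pwrite(fd, page, page_no * PAGE_SIZE)
    if written != PAGE_SIZE:
        # A short write means the page reached the OS in more than one piece.
        raise OSError("short write: page was not written in one call")
    os.fsync(fd)  # ask the OS to push the page to stable storage


def demo_single_write():
    """Write page 3 of a scratch file and read it back."""
    with tempfile.TemporaryFile() as f:
        page = b"x" * PAGE_SIZE
        write_page_once(f.fileno(), 3, page)
        return os.pread(f.fileno(), PAGE_SIZE, 3 * PAGE_SIZE) == page
```

On a raw partition the same call would go straight to the device
instead of through a filesystem, which removes one layer that might
split or reorder the write.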

PostgreSQL has trouble because it can neither archive logs for replay,
nor use raw disk partitions.


One other point:

Page pre-image logging is fundamentally the same as what Jim Gray's
book[1] would call "careful writes". I don't believe they should be in
the XLOG, because we never need to keep the pre-images after we're sure
the buffer has made it to the disk. Instead, we should have the buffer
IO routines implement ping-pong writes of some kind if we want
protection from partial writes.
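
To make the ping-pong idea concrete, here is a minimal Python sketch,
not an actual buffer manager design. It assumes each logical page gets
two physical slots, and that each slot carries a version number and a
CRC32 of its contents. A write always goes to the slot that does NOT
hold the newest intact copy, so a partial write can only damage the
copy being replaced. All names here (encode_slot, read_page,
write_page, demo_torn_write) and the on-disk layout are hypothetical.

```python
import os
import struct
import tempfile
import zlib

PAGE_SIZE = 8192                  # assumed slot size
HEADER = struct.Struct("<QI")     # version (u64), CRC32 of the body (u32)
PAYLOAD_SIZE = PAGE_SIZE - HEADER.size


def encode_slot(version, payload):
    """Build one slot image: header, then payload zero-padded to size."""
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    if len(body) != PAYLOAD_SIZE:
        raise ValueError("payload too large for one page")
    return HEADER.pack(version, zlib.crc32(body)) + body


def decode_slot(raw):
    """Return (version, body) if the slot is intact, else None."""
    if len(raw) != PAGE_SIZE:
        return None
    version, crc = HEADER.unpack_from(raw)
    body = raw[HEADER.size:]
    if version == 0 or zlib.crc32(body) != crc:
        return None
    return version, body


def slot_offset(page_no, slot):
    """Slots 0 and 1 of a logical page sit next to each other on disk."""
    return (2 * page_no + slot) * PAGE_SIZE


def read_page(f, page_no):
    """Return (version, body, slot) for the newest intact copy, or None."""
    best = None
    for slot in (0, 1):
        f.seek(slot_offset(page_no, slot))
        decoded = decode_slot(f.read(PAGE_SIZE))
        if decoded and (best is None or decoded[0] > best[0]):
            best = (decoded[0], decoded[1], slot)
    return best


def write_page(f, page_no, payload):
    """Write a new version into the slot not holding the newest copy."""
    current = read_page(f, page_no)
    if current is None:
        version, target = 1, 0
    else:
        version, target = current[0] + 1, 1 - current[2]
    f.seek(slot_offset(page_no, target))
    f.write(encode_slot(version, payload))
    f.flush()
    os.fsync(f.fileno())


def demo_torn_write():
    """Simulate a partial write of the newest copy and read the page back."""
    with tempfile.TemporaryFile() as f:
        write_page(f, 0, b"old contents")   # version 1, slot 0
        write_page(f, 0, b"new contents")   # version 2, slot 1
        newest = read_page(f, 0)
        # Pretend only the first half of the newest slot reached the disk.
        f.seek(slot_offset(0, newest[2]) + PAGE_SIZE // 2)
        f.write(b"\xff" * (PAGE_SIZE // 2))
        f.flush()
        version, body, _ = read_page(f, 0)
        return version, body.rstrip(b"\0")
```

In this sketch demo_torn_write() falls back to the older copy, because
the damaged slot fails its checksum. The cost is twice the disk space
per page, but nothing has to be kept once the new copy is known to be
on disk, which is the reason for keeping this out of the XLOG.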


Does any of this make sense?



;jrnield


[1] Gray, J. and Reuter, A. (1993). "Transaction Processing: Concepts and Techniques". Morgan Kaufmann.

-- 
J. R. Nield
jrnield@usol.com




