Re: Index Scans become Seq Scans after VACUUM ANALYSE - Mailing list pgsql-hackers

From	J. R. Nield
Subject	Re: Index Scans become Seq Scans after VACUUM ANALYSE
Date	June 22, 2002 18:28:41
Msg-id	1024784514.1793.242.camel@localhost.localdomain
In response to	Re: Index Scans become Seq Scans after VACUUM ANALYSE (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses	Re: Index Scans become Seq Scans after VACUUM ANALYSE
List	pgsql-hackers

On Thu, 2002-06-20 at 21:58, Bruce Momjian wrote:
> I was wondering, how does knowing the block is corrupt help MS SQL? 
> Right now, we write changed pages to WAL, then later write them to disk.
> I have always been looking for a way to prevent these WAL writes.  The
> 512-byte bit seems interesting, but how does it help?
> 
> And how does the bit help them with partial block writes?  Is the bit at
> the end of the block?  Is that reliable?
> 

My understanding of this is as follows:

1) On most commercial systems, if you get a corrupted block (from
partial write or whatever) you need to restore the file(s) from the most
recent backup, and replay the log from the log archive (usually only the
damaged files will be written to during replay). 

2) If you can't deal with the downtime to recover the file, then EMC,
Sun, or IBM will sell you an expensive disk array with an NVRAM cache
that will do atomic writes. Some plain-vanilla SCSI disks are also
capable of atomic writes, though usually they don't use NVRAM to do it. 

The database must then make sure that each page-write gets translated
into exactly one SCSI-level write. This is one reason why ORACLE and
Sybase recommend that you use raw disk partitions for high availability.
Some operating systems support this through the filesystem, but it is OS
dependent. I think Solaris 7 & 8 have support for this, but I'm not sure.

PostgreSQL has trouble because it can neither archive logs for replay,
nor use raw disk partitions.

One other point:

Page pre-image logging is fundamentally the same as what Jim Gray's
book[1] would call "careful writes". I don't believe they should be in
the XLOG, because we never need to keep the pre-images after we're sure
the buffer has made it to the disk. Instead, we should have the buffer
IO routines implement ping-pong writes of some kind if we want
protection from partial writes.

Does any of this make sense?

;jrnield

[1] Gray, J. and Reuter, A. (1993). "Transaction Processing: Concepts and Techniques". Morgan Kaufmann.

-- 
J. R. Nield
jrnield@usol.com
