Re: Index Scans become Seq Scans after VACUUM ANALYSE - Mailing list pgsql-hackers

From: Curt Sampson
Subject: Re: Index Scans become Seq Scans after VACUUM ANALYSE
Date:
Msg-id: Pine.NEB.4.43.0204261140060.449-100000@angelic.cynic.net
In response to: Re: Index Scans become Seq Scans after VACUUM ANALYSE (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses: Re: Index Scans become Seq Scans after VACUUM ANALYSE (Bruce Momjian <pgman@candle.pha.pa.us>)
List: pgsql-hackers
On Thu, 25 Apr 2002, Bruce Momjian wrote:

> Actually, this brings up a different point.  We use 8k blocks now
> because at the time PostgreSQL was developed, it used BSD file systems,
> and those prefer 8k blocks, and there was some concept that an 8k write
> was atomic, though with 512 byte disk blocks, that was incorrect.  (We
> knew that at the time too, but we didn't have any options, so we just
> hoped.)

MS SQL Server has an interesting way of dealing with this. They have a
"torn" bit in each 512-byte chunk of a page, and this bit is set the
same for each chunk. When they are about to write out a page, they first
flip all of the torn bits and then do the write. If the write does not
complete due to a system crash or whatever, this can be detected later
because the torn bits won't match across the entire page.
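Something like this, roughly (just a sketch to illustrate the idea, not
their actual code; the placement of the bit within the sector is my
invention, and the real SQL Server page layout differs):

#define PAGE_SIZE   8192
#define SECTOR_SIZE 512
#define N_SECTORS   (PAGE_SIZE / SECTOR_SIZE)

typedef struct
{
    unsigned char data[SECTOR_SIZE - 1];
    unsigned char torn_bit;     /* invented location, for illustration */
} Sector;

typedef struct
{
    Sector sectors[N_SECTORS];
} Page;

/* Flip every sector's torn bit before handing the page to write(). */
static void
stamp_torn_bits(Page *page)
{
    unsigned char new_bit = !page->sectors[0].torn_bit;
    int     i;

    for (i = 0; i < N_SECTORS; i++)
        page->sectors[i].torn_bit = new_bit;
}

/* After reading the page back: any disagreement means a torn write. */
static int
page_is_torn(const Page *page)
{
    int     i;

    for (i = 1; i < N_SECTORS; i++)
        if (page->sectors[i].torn_bit != page->sectors[0].torn_bit)
            return 1;
    return 0;
}

The nice thing about the scheme is that it costs one bit per sector
rather than a full checksum, though of course it only detects torn
writes, not other corruption.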

> Now, with larger RAM and disk sizes, it may be time to consider larger
> page sizes, like 32k pages.  That reduces the granularity of the cache,
> but it may have other performance advantages that would be worth it.

It really depends on the block size your underlying layer is using.
Reading less than that is never useful as you pay for that entire
block anyway. (E.g., on an FFS filesystem with 8K blocks, the OS
always reads 8K even if you ask for only 4K.)

On the other hand, reading more does have a tangible cost, as you
saw from the benchmark I posted; reading 16K on my system cost 20%
more than reading 8K, and used twice the buffer space. If I'm doing
lots of really random reads, this would result in a performance
loss (due to doing more I/O, and having less chance that the next
item I want is in the buffer cache).
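For reference, the benchmark was essentially of this shape (a
reconstruction of the general idea, not the exact program I posted):
time a bunch of random block-aligned reads of a given size from a file
much larger than RAM.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define N_READS 10000

int
main(int argc, char **argv)
{
    int             fd, i;
    long            bufsize, nblocks;
    char           *buf;
    off_t           filesize;
    struct timeval  tv0, tv1;
    double          secs;

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s file read-size\n", argv[0]);
        exit(1);
    }
    bufsize = atol(argv[2]);
    buf = malloc((size_t) bufsize);

    if ((fd = open(argv[1], O_RDONLY)) < 0)
    {
        perror(argv[1]);
        exit(1);
    }
    filesize = lseek(fd, 0, SEEK_END);
    nblocks = filesize / bufsize;

    gettimeofday(&tv0, NULL);
    for (i = 0; i < N_READS; i++)
    {
        /* Seek to a random block-aligned offset and read one block. */
        off_t   pos = (off_t) (random() % nblocks) * bufsize;

        lseek(fd, pos, SEEK_SET);
        if (read(fd, buf, (size_t) bufsize) != bufsize)
        {
            perror("read");
            exit(1);
        }
    }
    gettimeofday(&tv1, NULL);

    secs = (tv1.tv_sec - tv0.tv_sec) +
           (tv1.tv_usec - tv0.tv_usec) / 1000000.0;
    printf("%d random reads of %ld bytes: %.2f seconds\n",
           N_READS, bufsize, secs);
    return 0;
}

Run with 16384 vs. 8192 it showed the 20% difference above; run with
4096 vs. 8192 on an 8K-block FFS it should show essentially no
difference, which is the "you pay for the whole block anyway" effect.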

For some reason I thought we had the ability to change the block
size that postgres uses on a table-by-table basis, but I can't find
anything in the docs about that. Maybe I just got that impression
from seeing some support for it in the code. But this would be a
nice feature to add for those cases where a larger block size would
help.

But I think that 8K is a pretty good default, and that 32K blocks
would result in a quite noticeable performance reduction for apps
that do a lot of random I/O.

> What people are actually suggesting with the read-ahead for sequential
> scans is basically a larger block size for sequential scans than for
> index scans.  While this makes sense, it may be better to just increase
> the block size overall.

I don't think so, because the smaller block size is definitely
better for random I/O.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
 


