Re: Index Scans become Seq Scans after VACUUM ANALYSE - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Index Scans become Seq Scans after VACUUM ANALYSE
Msg-id 200204251534.g3PFYSt16747@candle.pha.pa.us
In response to Re: Index Scans become Seq Scans after VACUUM ANALYSE  (Curt Sampson <cjs@cynic.net>)
Responses Re: Index Scans become Seq Scans after VACUUM ANALYSE  (Curt Sampson <cjs@cynic.net>)
List pgsql-hackers
Actually, this brings up a different point.  We use 8k blocks now
because, when PostgreSQL was developed, it ran on BSD file systems,
which prefer 8k blocks, and there was some notion that an 8k write was
atomic.  With 512-byte disk blocks, that was incorrect.  (We knew that
at the time too, but we didn't have any options, so we just hoped.)

In fact, we now write pre-modified pages to WAL specifically because we
can't be sure an 8k page write to disk will be atomic.  Part of the page
may make it to disk, and part may not.
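The torn-page problem can be modeled in a few lines.  This is a toy
simulation, not PostgreSQL code.  It assumes that an 8k page is written
as sixteen independent 512-byte sectors, that a crash can stop the write
after any sector, and that recovery restores the page image saved in the
log and then redoes the change.  The function names are hypothetical.

```python
from typing import Optional

SECTOR = 512                       # assumed disk sector size
PAGE = 8192                        # 8k page, as in the text
SECTORS_PER_PAGE = PAGE // SECTOR  # 16 sectors per page


def write_page(disk: bytearray, offset: int, page: bytes,
               crash_after: Optional[int] = None) -> None:
    """Write one page sector by sector; optionally 'crash' partway through."""
    for i in range(SECTORS_PER_PAGE):
        if crash_after is not None and i >= crash_after:
            return  # power lost: the remaining sectors keep their old contents
        lo = offset + i * SECTOR
        disk[lo:lo + SECTOR] = page[i * SECTOR:(i + 1) * SECTOR]


def is_torn(disk: bytearray, offset: int, old: bytes, new: bytes) -> bool:
    """True if the on-disk page matches neither the old nor the new version."""
    on_disk = bytes(disk[offset:offset + PAGE])
    return on_disk != old and on_disk != new


def recover(disk: bytearray, offset: int, logged_image: bytes,
            redo_page: bytes) -> None:
    """Hypothetical recovery: restore the logged image, then redo the change.

    Simplification: the 'redo' here just rewrites the whole new page.
    """
    disk[offset:offset + PAGE] = logged_image
    write_page(disk, offset, redo_page)


if __name__ == "__main__":
    old, new = bytes(PAGE), b"\x01" * PAGE
    disk = bytearray(old)
    logged = old                             # page image saved in the log first
    write_page(disk, 0, new, crash_after=5)  # crash after 5 of 16 sectors
    print(is_torn(disk, 0, old, new))        # True
    recover(disk, 0, logged, new)
    print(bytes(disk) == new)                # True
```

Without the logged image, nothing on disk says which sectors are old and
which are new; with it, recovery does not need to know.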

Now, with larger RAM and disk sizes, it may be time to consider larger
page sizes, like 32k pages.  That makes the cache coarser-grained, since
a cache of a given size holds fewer pages, but it may have other
performance advantages that would be worth it.
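The granularity trade-off is simple arithmetic.  The sketch below assumes
a hypothetical 64 MB buffer cache and a 100-byte tuple fetched by a
random index lookup; both numbers are illustrative only.

```python
KB = 1024
MB = 1024 * KB


def cache_slots(cache_bytes: int, page_bytes: int) -> int:
    """Number of distinct pages a cache of the given size can hold."""
    return cache_bytes // page_bytes


def unused_bytes_per_fetch(page_bytes: int, tuple_bytes: int) -> int:
    """Bytes read but not needed when a random lookup wants one tuple."""
    return page_bytes - tuple_bytes


cache = 64 * MB  # hypothetical cache size, for illustration only
for page in (8 * KB, 32 * KB):
    print(f"{page // KB:>2}K pages: {cache_slots(cache, page):5d} slots, "
          f"{unused_bytes_per_fetch(page, 100):5d} bytes unused per 100-byte tuple")
```

Going from 8k to 32k pages cuts the number of cache slots from 8192 to
2048, and each random fetch drags in four times as much data.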

What people are actually suggesting with the read-ahead for sequential
scans is basically a larger block size for sequential scans than for
index scans.  While this makes sense, it may be better to just increase
the block size overall.
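Read-ahead for sequential scans amounts to reading in a bigger unit than
the page.  A minimal sketch, assuming a Unix system (`os.pread` is not
available on Windows), an 8k page, and a hypothetical 64k (8-page)
read-ahead unit for sequential scans, counts the read calls needed for
each approach.

```python
import os
import tempfile
from typing import Tuple

PAGE = 8192
SEQ_CHUNK = 8 * PAGE  # hypothetical 64k read-ahead unit


def scan(fd: int, file_size: int, chunk: int) -> Tuple[int, int]:
    """Read the whole file in chunk-sized pread calls; return (calls, bytes)."""
    calls = total = offset = 0
    while offset < file_size:
        data = os.pread(fd, chunk, offset)
        if not data:
            break  # unexpected EOF
        calls += 1
        total += len(data)
        offset += len(data)
    return calls, total


def demo(pages: int = 128) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Scan a temporary 'relation' page by page, then in read-ahead chunks."""
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(pages * PAGE))
        f.flush()  # make the buffered data visible to pread on the fd
        size = pages * PAGE
        return scan(f.fileno(), size, PAGE), scan(f.fileno(), size, SEQ_CHUNK)


if __name__ == "__main__":
    print(demo())  # ((128, 1048576), (16, 1048576))
```

Both scans read the same megabyte; the chunked one does it in one eighth
as many requests, while index scans can keep fetching single pages.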

---------------------------------------------------------------------------

Curt Sampson wrote:
> On Wed, 24 Apr 2002, Michael Loftis wrote:
> 
> > A block-sized read will not be broken up.  But if you're reading in a
> > size bigger than the underlying system's block size, then it can get
> > broken up.
> 
> In which operating systems, and under what circumstances?
> 
> I'll agree that some OSs may not coalesce adjacent reads into a
> single read, but even so, submitting a bunch of single reads for
> consecutive blocks is going to be much, much faster than if other,
> random I/O occurred between those reads.
> 
> > If the underlying
> > block size is 8KB and you dump 4MB down on it, the OS may (and in many
> > cases does) decide to write part of it, do a read on a nearby sector,
> > then write the rest.  This happens when doing long writes that end up
> > spanning block groups because the inodes must be allocated.
> 
> Um...we're talking about 64K vs 8K reads here, not 4 MB reads. I am
> certainly not suggesting Postgres ever submit 4 MB read requests to the OS.
> 
> I agree that any single-chunk reads or writes that cause non-adjacent
> disk blocks to be accessed may be broken up. But in my sense,
> they're "broken up" anyway, in that you have no choice but to take
> a performance hit.
> 
> > Further, large read requests can of course be re-ordered by hardware.
> > ...The OS also tags ICP, which can be re-ordered on block-sized chunks.
> 
> Right. All the more reason to read in larger chunks when we know what we
> need in advance, because that will give the OS, controllers, etc. more
> advance information, and let them do the reads more efficiently.
> 
> cjs
> -- 
> Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
>     Don't you know, in this new Dark Age, we're all light.  --XTC
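
Curt's point about consecutive blocks is that they are cheap to merge
into fewer, larger requests.  A minimal sketch of that merge, assuming
block numbers can be grouped into (start, count) runs capped at 8 blocks
(64K with 8K pages, the read size discussed above).  `coalesce` is a
hypothetical helper, not a PostgreSQL or operating-system API.

```python
from typing import Iterable, List, Tuple


def coalesce(blocks: Iterable[int], max_run: int = 8) -> List[Tuple[int, int]]:
    """Merge block numbers into (start, count) runs of consecutive blocks.

    Duplicates are dropped, blocks are sorted, and no run exceeds max_run.
    """
    runs: List[Tuple[int, int]] = []
    for b in sorted(set(blocks)):
        if runs:
            start, count = runs[-1]
            if b == start + count and count < max_run:
                runs[-1] = (start, count + 1)  # extend the current run
                continue
        runs.append((b, 1))  # gap or cap reached: start a new run
    return runs


if __name__ == "__main__":
    print(coalesce([3, 4, 5, 6, 10, 11, 20]))  # [(3, 4), (10, 2), (20, 1)]
    print(coalesce(range(20)))                 # [(0, 8), (8, 8), (16, 4)]
```

Seven single-block requests become three; a random access pattern, by
contrast, leaves nothing to merge.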

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

