Re: Hardware/OS recommendations for large databases ( - Mailing list pgsql-performance

From Greg Stark
Subject Re: Hardware/OS recommendations for large databases (
Date
Msg-id 87d5ksm1tb.fsf@stark.xeocode.com
In response to Re: Hardware/OS recommendations for large databases (  (Alan Stange <stange@rentec.com>)
List pgsql-performance
Alan Stange <stange@rentec.com> writes:

> For sequential scans, you do have a background reader.  It's the kernel.  As
> long as you don't issue a seek() between read() calls, the kernel will get the
> hint about sequential IO and begin to perform a read ahead for you.  This is
> where the above analysis isn't quite right:  while postgresql is processing the
> returned data from the read() call, the kernel has also issued reads as part of
> the read ahead, keeping the device busy while the cpu is busy.  (I'm assuming
> these details for Linux; Solaris/UFS does work this way).  Issue one seek on
> the file and the read ahead algorithm will back off for a while.   This was my
> point about some descriptions of how the system works not being sensible.

Well that's certainly the hope. But we don't know that this is actually as
effective as you assume it is. It's awfully hard in the kernel to make much
more than a vague educated guess about what kind of readahead would actually
help.

This is especially true when a file isn't really being accessed in a
sequential fashion, as may well happen in Postgres if, for example, multiple
backends are reading the same file. And as you pointed out, it doesn't help at
all for random access index scans.
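
To make the "educated guess" point concrete: an application that knows it is
about to scan a file sequentially can tell the kernel so, rather than relying
on the kernel to infer it from the read() pattern. The sketch below is only an
illustration, not anything Postgres does in this discussion. The function name
sequential_scan and the 8kB block size are assumptions made for the example,
and posix_fadvise() is purely advisory, so the kernel is free to ignore it.

```python
import os

BLOCK_SIZE = 8192  # assumed block size for the example (8kB)


def sequential_scan(path, block_size=BLOCK_SIZE):
    """Read a file front to back, returning (blocks_read, bytes_read).

    Hypothetical helper: it declares the access pattern up front so the
    kernel does not have to guess it from consecutive read() calls.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Advisory hint only. Offset 0 with length 0 covers the whole file.
        # Not every platform provides posix_fadvise, hence the guard.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)

        blocks = 0
        total = 0
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:  # empty bytes object means end of file
                break
            blocks += 1
            total += len(chunk)
        return blocks, total
    finally:
        os.close(fd)
```

Whether the hint changes anything depends on the kernel and filesystem, and it
does nothing for the multiple-backends or index-scan cases above.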

> If your goal is sequential IO, then one must use larger block sizes.   No one
> would use 8KB IO for achieving high sequential IO rates.   Simply put, read()
> is about the slowest way to get 8KB of data.     Switching to 32KB blocks
> reduces all the system call overhead by a large margin.  Larger blocks would be
> better still, up to the stripe size of your mirror.   (Of course, you're using
> a mirror and not raid5 if you care about performance.)

Switching to 32kB blocks throughout Postgres has pros but also major cons, not
the least of which is *extra* I/O for random access read patterns. One of the
possible advantages of the suggestions that were made, the ones you're shouting
down, would actually be the ability to use 32kB scatter/gather reads without
necessarily switching block sizes.
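
For anyone unfamiliar with scatter/gather reads, here is a minimal sketch of
the idea using readv(): one system call fills four separate 8kB page buffers,
so 32kB is transferred per call while each page keeps its own buffer. This is
an illustration under stated assumptions, not a description of Postgres code.
The names read_pages_vectored, PAGE_SIZE and PAGES_PER_READ are made up for
the example.

```python
import os

PAGE_SIZE = 8192      # page size stays at 8kB (assumed for the example)
PAGES_PER_READ = 4    # 4 x 8kB = 32kB transferred per system call


def read_pages_vectored(fd, n_pages=PAGES_PER_READ, page_size=PAGE_SIZE):
    """Fill n_pages independent page buffers with a single readv() call.

    Returns (buffers, bytes_read). The buffers are separate objects and
    need not be adjacent in memory, which is the point of scatter/gather:
    consecutive file blocks can land in whatever buffers happen to be free.
    """
    buffers = [bytearray(page_size) for _ in range(n_pages)]
    # readv fills the buffers in order and returns the total byte count,
    # which can be short at end of file.
    bytes_read = os.readv(fd, buffers)
    return buffers, bytes_read
```

A random access reader can still fetch a single 8kB page with a plain read(),
so the larger transfer is only paid for when it is actually wanted.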

(Incidentally, your parenthetical comment is a bit confused. By "mirror" I
imagine you're referring to raid1+0 since mirrors alone, aka raid1, aren't a
popular way to improve performance. But raid5 actually performs better than
raid1+0 for sequential reads.)

> Does postgresql use the platform specific memcpy() in libc? Some care might
> be needed to ensure that the memory blocks within postgresql are all
> properly aligned to make sure that one isn't ping-ponging cache lines around
> (usually done by padding the buffer sizes by an extra 32 bytes or L1 line
> size). Whatever you do, all the usual high performance computing tricks
> should be used prior to considering any rewriting of major code sections.

So your philosophy is to worry about microoptimizations before worrying about
architectural issues?


--
greg
