I want to understand how Postgres organizes data and handles IO
operations so that I can better optimize a Postgres database server.
I am looking for answers to the specific questions below and pointers
to where this is documented.

How does Postgres organize its data on disk? For example, is it
grouped together on the disk, or is it prone to being spread out over
the disk? Does vacuum reorganize the data? (I am seeking to minimize
disk head movement.)

How does Postgres handle sequential IO? Does it treat it specially,
such as by issuing large IO operations that span block boundaries?

How does Postgres handle direct IO (operations directly to disk,
bypassing the buffer cache)? Will it issue multiple asynchronous IO
operations?

Is Postgres always one process per client, or can it spawn additional
processes to parallelize some operations, such as a nested-loop join?

Is there a recommended file system to use for Postgres data, such as
ext2 or another non-journaling FS?