Re: Read/Write block sizes - Mailing list pgsql-performance

From Jeffrey W. Baker
Subject Re: Read/Write block sizes
Date
Msg-id 1124864572.11270.16.camel@noodles
Whole thread Raw
In response to Re: Read/Write block sizes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Read/Write block sizes
List pgsql-performance
On Wed, 2005-08-24 at 01:56 -0400, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > On Wed, 2005-08-24 at 17:20 +1200, Guy Thornley wrote:
> >> Dont forget that already in postgres, you have a process per connection, and
> >> all the processes take care of their own I/O.
>
> > That's the problem.  Instead you want 1 or 4 or 10 i/o slaves
> > coordinating the I/O of all the backends optimally.  For instance, with
> > synchronous scanning.
>
> And why exactly are we going to do a better job of I/O scheduling than
> the OS itself can do?
...
> There are some things we could do to reduce the impedance between us and
> the OS --- for instance, the upthread criticism that a seqscan asks the
> OS for only 8K at a time is fair enough.  But that doesn't translate
> to a conclusion that we should schedule the I/O instead of the OS.

Synchronous scanning is a fairly huge and obvious win.  If you have two
processes 180 degrees out-of-phase in a linear read, neither process is
going to get anywhere near the throughput they would get from a single
scan.

I think you're being deliberately obtuse with regards to file I/O and
the operating system.  The OS isn't magical.  It has to strike a balance
between a reasonable read latency and a reasonable throughput.  As far
as the kernel is concerned, a busy postgresql server is
indistinguishable from 100 unrelated activities.  All backends will be
served equally, even if in this case "equally" means "quite badly all
around."

An I/O slave process could be a big win in Postgres for many kinds of
reads.  Instead of opening and reading files the backends would connect
to the I/O slave and request the file be read.  If a scan of that file
were already underway, the new backends would be attached.  Otherwise a
new scan would commence.  In either case, the slave process can issue
(sometimes non-dependant) reads well ahead of the needs of the backend.
You may think the OS can do this for you but it can't.  On postgres
knows that it needs the whole file from beginning to end.  The OS can
only guess.

Ask me sometime about my replacement for GNU sort.  It uses the same
sorting algorithm, but it's an order of magnitude faster due to better
I/O strategy.  Someday, in my infinite spare time, I hope to demonstrate
that kind of improvement with a patch to pg.

-jwb


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Read/Write block sizes
Next
From: PFC
Date:
Subject: Re: Read/Write block sizes