Home > mailing lists

Re: Read/Write block sizes - Mailing list pgsql-performance

From	Jeffrey W. Baker
Subject	Re: Read/Write block sizes
Date	August 24, 2005 03:22:06
Msg-id	1124864572.11270.16.camel@noodles Whole thread
In response to	Re: Read/Write block sizes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Read/Write block sizes
List	pgsql-performance

Tree view

On Wed, 2005-08-24 at 01:56 -0400, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > On Wed, 2005-08-24 at 17:20 +1200, Guy Thornley wrote:
> >> Dont forget that already in postgres, you have a process per connection, and
> >> all the processes take care of their own I/O.
>
> > That's the problem.  Instead you want 1 or 4 or 10 i/o slaves
> > coordinating the I/O of all the backends optimally.  For instance, with
> > synchronous scanning.
>
> And why exactly are we going to do a better job of I/O scheduling than
> the OS itself can do?
...
> There are some things we could do to reduce the impedance between us and
> the OS --- for instance, the upthread criticism that a seqscan asks the
> OS for only 8K at a time is fair enough.  But that doesn't translate
> to a conclusion that we should schedule the I/O instead of the OS.

Synchronous scanning is a fairly huge and obvious win.  If you have two
processes 180 degrees out-of-phase in a linear read, neither process is
going to get anywhere near the throughput they would get from a single
scan.

I think you're being deliberately obtuse with regards to file I/O and
the operating system.  The OS isn't magical.  It has to strike a balance
between a reasonable read latency and a reasonable throughput.  As far
as the kernel is concerned, a busy postgresql server is
indistinguishable from 100 unrelated activities.  All backends will be
served equally, even if in this case "equally" means "quite badly all
around."

An I/O slave process could be a big win in Postgres for many kinds of
reads.  Instead of opening and reading files the backends would connect
to the I/O slave and request the file be read.  If a scan of that file
were already underway, the new backends would be attached.  Otherwise a
new scan would commence.  In either case, the slave process can issue
(sometimes non-dependant) reads well ahead of the needs of the backend.
You may think the OS can do this for you but it can't.  On postgres
knows that it needs the whole file from beginning to end.  The OS can
only guess.

Ask me sometime about my replacement for GNU sort.  It uses the same
sorting algorithm, but it's an order of magnitude faster due to better
I/O strategy.  Someday, in my infinite spare time, I hope to demonstrate
that kind of improvement with a patch to pg.

-jwb

pgsql-performance by date:

From: Tom Lane
Date: 24 August 2005, 02:57:33
Subject: Re: Read/Write block sizes

From: PFC
Date: 24 August 2005, 07:35:15
Subject: Re: Read/Write block sizes

Re: Read/Write block sizes - Mailing list pgsql-performance

Previous

Next