Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans - Mailing list pgsql-general

From Gerhard Wiesinger
Subject Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Date
Msg-id alpine.LFD.2.00.0910022046470.22077@bbs.intern
Whole thread Raw
In response to Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
List pgsql-general
On Fri, 2 Oct 2009, Greg Smith wrote:

> On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:
>
>> I think this is one of the most critical performance showstopper of
>> PostgreSQL on the I/O side.
>
> I wish, this is an easy problem compared to the real important ones that need
> to be resolved.  Situations where the OS is capable of faster sequential I/O
> performance than PostgreSQL appears to deliver doing reads are often caused
> by something other than what the person doing said benchmarking believes they
> are.  For example, the last time I thought I had a smoking gun situation just
> like the one you're describing, it turns out the background operation I
> didn't know was going on that slowed things down were hint bit updates:
> http://wiki.postgresql.org/wiki/Hint_Bits
>
> Background checkpoints can also cause this, typically if you set
> checkpoint_segments really high and watch when they're happening you can
> avoid that interfering with results too.
>
> It's hard to isolate out the cause of issues like this.  Since most people
> seem to get something close to real disk speed from sequential scans when
> measured properly, I would suggest starting with the assumption there's
> something wrong with your test case rather than PostgreSQL.  The best way to
> do that is to construct a test case others can run that shows the same
> problem on other systems using the database itself.  The easiest way to build
> one of those is using generate_series to create some bogus test data, SELECT
> everything in there with \timing on, and then use the size of the relation on
> disk to estimate MB/s.
>
> Regardless, it's easy enough to build PostgreSQL with larger block sizes if
> you think that really matters for your situation.  You're never going to see
> that in the mainstream version though, because there are plenty of downsides
> to using larger blocks.  And since the database doesn't actually know where
> on disk things are at, it's not really in a good position to make decisions
> about I/O scheduling anyway.  More on that below.
>
>> What's the current status of the patch of Gregory Stark? Any timeframes to
>> integrate?
>
> There needs to be a fairly major rearchitecting of how PostgreSQL handles
> incoming disk I/O for that to go anywhere else, and I don't believe that's
> expected to be ready in the near future.
>
>> Does it also work for sequence scans? Any plans for a generic "multi block
>> read count" solution?
>
> There was a similar patch for sequential scans submitted by someone else
> based on that work.  It was claimed to help performance on a Linux system
> with a rather poor disk I/O setup.  No one else was able to replicate any
> performance improvement using the patch though.  As far as I've been able to
> tell, the read-ahead logic being done by the Linux kernel and in some
> hardware is already doing this sort of optimization for you on that OS,
> whether or not your app knows enough to recognize it's sequentially scanning
> the disk it's working against.

I forgot to mention:
Larger blocksizes also reduce IOPS (I/Os per second) which might be a
critial threshold on storage systems (e.g. Fibre Channel systems). You
would get e.g. the throughput from the storage with large block sizes
(less IOPS) but with small block sizes the IOPS limit is reached and
throughput performance goes down.

Example:
With 100MB/s and 8k blocks you need 12500 IOPS which is a lot (e.g. at
least 90 disks with 140 IOPS)!
When blocks can be read with e.g. 128k block size 781 IOPS are sufficient
(6 disks are sufficient)!

So this makes a major difference.

Ciao,
Gerhard

--
http://www.wiesinger.com/

pgsql-general by date:

Previous
From: Gerhard Wiesinger
Date:
Subject: Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Next
From: Gerhard Wiesinger
Date:
Subject: Re: Limit of bgwriter_lru_maxpages of max. 1000?