Re: Need help with 8.4 Performance Testing - Mailing list pgsql-performance

From Gregory Stark
Subject Re: Need help with 8.4 Performance Testing
Date
Msg-id 873agwj1sr.fsf@oxford.xeocode.com
In response to Re: Need help with 8.4 Performance Testing  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
Scott Carey <scott@richrelevance.com> writes:

> And as far as I can tell, even after the 8.4 fadvise patch, all I/O is in
> block_size chunks. (hopefully I am wrong)
>...
> In addition to the fadvise patch, postgres needs to merge adjacent I/O's
> into larger ones to reduce the overhead. It only really needs to merge up to
> sizes of about 128k or 256k, and gain an 8x to 16x drop in syscall overhead,
> and additionally potentially save code trips down the shared buffer
> management code paths. At least, that's my guess; I haven't looked at any
> code and could be wrong.

There are a lot of assumptions here that I would be interested in seeing
experiments to back up.

FWIW when I was doing testing of posix_fadvise I did a *lot* of experiments
though only on a couple of systems: one with a 3-drive array and one with a
15-drive array, both running Linux. I could sometimes speed up the sequential
scan by about 10% but not consistently. It was never more than about 15% shy
of the highest throughput from dd. And incidentally the throughput from dd
didn't seem to depend much at all on the blocksize.

On your system do "dd bs=8k" and "dd bs=128k" really show an 8x performance
difference?
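For reference, that's easy enough to check directly: time dd reads of the
same file at both block sizes. (A sketch, not a rigorous benchmark -- the
file path and 256MB size are arbitrary, and for a true disk test you'd drop
the page cache between runs; cached reads mostly measure syscall overhead,
which is the question here anyway.)

```shell
#!/bin/sh
# Sketch: compare dd read throughput at 8k vs 128k block sizes.
# /tmp/ddtest and the 256MB size are arbitrary choices for illustration.
FILE=/tmp/ddtest
dd if=/dev/zero of=$FILE bs=1M count=256 2>/dev/null

# Read back with both block sizes; dd reports elapsed time and
# throughput on stderr. To measure the disk rather than the cache,
# run "echo 1 > /proc/sys/vm/drop_caches" as root between reads.
dd if=$FILE of=/dev/null bs=8k
dd if=$FILE of=/dev/null bs=128k

rm -f $FILE
```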

In short, at least from the evidence available, this all seems like it might
be holdover beliefs from the olden days of sysadmining where syscalls were
much slower and OS filesystem caches much dumber.

I'm still interested in looking into it but I'll have to see actual vmstat or
iostat output while it's happening, preferably some oprofile results too. And
how many drives do you actually need to get into this situation? Also, what is
the output of "vacuum verbose" on the table?
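By vmstat/iostat output I just mean the standard interval samples captured
while the query runs -- something like the following (the 5-second interval
and the output filenames are arbitrary):

```shell
#!/bin/sh
# Sketch: sample vmstat and extended iostat every 5 seconds into files
# while the workload runs; intervals and filenames are arbitrary.
vmstat 5 > vmstat.out &
VMSTAT_PID=$!
iostat -x 5 > iostat.out &
IOSTAT_PID=$!

# ... run the sequential scan here, e.g. from psql ...

# Stop the samplers once the query finishes.
kill $VMSTAT_PID $IOSTAT_PID 2>/dev/null
```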


> Additionally, the "If your operating system has any reasonable caching
> itself" comment earlier in this conversation --- Linux (2.6.18, Centos 5.2)
> does NOT. I can easily make it spend 100% CPU in system time trying to
> figure out what to do with the system cache for an hour. Just do large
> seqscans with memory pressure from work_mem or other forces that the OS will
> not deem 'idle'. Once the requested memory is ~75% of the system total, it
> will freak out. Linux simply will not give up that last 25% or so of the RAM
> for anything but page cache.

This seems like just a misconfigured system. Linux and most Unixen definitely
expect to have a substantial portion of RAM dedicated to disk cache. Keep in
mind all your executable pages count towards this page cache too. You can
adjust this to some extent with the "swappiness" variable in Linux -- but I
doubt you'll be happy with the results regardless.
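The knob I mean is the standard vm.swappiness sysctl; checking and lowering
it looks like this (the value 10 is just an example, and the change is not
persistent across reboots unless you also add it to /etc/sysctl.conf):

```shell
# Check the current swappiness (the default is 60 on most Linux kernels).
cat /proc/sys/vm/swappiness

# Lower it so the kernel is less eager to swap out process memory in
# favor of page cache. Requires root; 10 is just an example value.
sysctl -w vm.swappiness=10
# or equivalently:
# echo 10 > /proc/sys/vm/swappiness
```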

> The other way around (small shared_buffers, let the OS do it) hurts
> performance overall quite a bit -- randomly accessed pages get pushed out to
> the OS cache more often, and the OS tosses those out when a big seqscan
> occurs, resulting in a lot more random access from disk and more disk bound
> periods of time. Great wonder, this operating system caching, eh?

How do you observe this?


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
