Thread: Effective Cache Size

Effective Cache Size

From
"Nick Barr"
Date:
Hi,

I have been following a thread on this list, "Inconsistent performance",
and had a few questions, especially about the bits concerning
effective_cache_size. I have read some of the docs and some other
threads on this setting, and it seems to be used by the planner to
choose between a sequential and an index scan. So it will not
necessarily increase performance by itself, but should instead steer
the planner toward the optimal plan. Is this correct?
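
For example, I gather the setting can be changed per-session, so one can
watch the planner flip between plans directly (a sketch; the table and
query here are made up):

-- with a tiny assumed cache, random index fetches look expensive
SET effective_cache_size = 1000;
EXPLAIN SELECT * FROM orders WHERE order_date > '2003-01-01';

-- with ~1.9GB assumed cached (counted in 8KB pages), an index scan may win
SET effective_cache_size = 240000;
EXPLAIN SELECT * FROM orders WHERE order_date > '2003-01-01';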

Not that we are suffering massive performance issues at the moment,
but our database is expected to grow considerably over the next couple
of years, both in terms of load and size.

Also what would an appropriate setting be?

From what I read of Scott Marlowe's email, and from the information
below, I reckon it should be somewhere in the region of 240,000.

Danger maths ahead. Beware!!!!

<maths>
  141816K  buff
+ 1781764K cached
-----------------
  1923580K total

effective_cache_size = 1923580 / 8 = 240447.5
</maths>
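
Or computed on the box itself (a sketch that assumes the classic procps
"free" layout, with buffers and cached as the last two columns, in KB):

# buffers + cached, in KB, divided by PostgreSQL's 8KB page size
free | awk '/^Mem:/ { printf "effective_cache_size = %d\n", ($6 + $7) / 8 }'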

Here is some information on the server in question. If any more
information is required then please say. It is a dedicated PG machine
with no other services hosted on it. As you can see from the
uptime, its load average is 0.00, and it is currently so chilled it's
almost frozen!!!!! That will change though :-(


Hardware
========
Dual PIII 1.4GHz
2GB RAM
1TB SAN with hardware RAID 5 over 1Gbps Fibre Channel.


OS
==
Linux webbasedth5 2.4.18-18.7.xsmp #1 SMP Wed Nov 13 19:01:42 EST 2002
i686
Red Hat Linux release 7.3 (Valhalla)


PG
==
PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96


Database
========
This includes all indexes and tables. I can provide more information on
how this is chopped up if needed.

Size   : 1,141.305 MB
Tuples : 13,416,397


Uptime
======
11:15am  up 197 days, 16:50,  1 user,  load average: 0.00, 0.00, 0.00


Top
===
Mem:  2064836K av, 2018648K used,   46188K free,       0K shrd,  141816K buff
Swap: 2096472K av,    4656K used, 2091816K free                 1781764K cached


Postgresql.conf (all defaults except)
=====================================
max_connections = 1000
shared_buffers = 16000 (125 MB)
max_fsm_relations = 5000
max_fsm_pages = 500000
vacuum_mem = 65535



Kind Regards,

Nick Barr




Re: Effective Cache Size

From
"Shridhar Daithankar"
Date:
On 17 Sep 2003 at 11:48, Nick Barr wrote:

> Hi,
>
> I have been following a thread on this list, "Inconsistent performance",
> and had a few questions, especially about the bits concerning
> effective_cache_size. I have read some of the docs and some other
> threads on this setting, and it seems to be used by the planner to
> choose between a sequential and an index scan. So it will not
> necessarily increase performance by itself, but should instead steer
> the planner toward the optimal plan. Is this correct?

That is correct.

> Danger maths ahead. Beware!!!!
>
> <maths>
>   141816K  buff
> + 1781764K cached
> -----------------
>   1923580K total
>
> effective_cache_size = 1923580 / 8 = 240447.5
> </maths>

That would be a bit too aggressive. I would say set it around 200K to
leave room for odd stuff.
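
In postgresql.conf that would be something like (the value counts 8KB
disk pages, so 200,000 pages is roughly 1.5GB):

# leave headroom below the ~1.9GB the kernel currently shows cached
effective_cache_size = 200000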

The rest of your configuration seems fine. Of course, the latest
version of PostgreSQL is always good though..

Bye
 Shridhar

--
Power is danger.        -- The Centurion, "Balance of Terror", stardate 1709.2


inferior SCSI performance

From
Michael Adler
Date:
I have been experimenting with a new Seagate Cheetah 10k-RPM SCSI to
compare with a cheaper Seagate Barracuda 7200-RPM IDE (each in a
single-drive configuration). The Cheetah definitely dominates the generic
IO tests such as bonnie++, but fares poorly with pgbench (and other
postgresql operations).

I don't understand why switching to a SCSI drive in an otherwise identical
setup would so seriously degrade performance. I would have expected the
opposite. vmstat does not reveal (to me) any bottlenecks in the SCSI
configuration.

The only difference between the two test scenarios is that I stopped the
postmaster, copied the data dir to the other drive and put a symlink to
point to the new path. I ran the tests several times, so these are not
flukes.

Can anyone explain why this might be happening and how to better leverage
this 10k drive?

thanks,
Mike Adler


System info:

Box is a Dell 600SC with Adaptec 39160 SCSI controller.
Linux 2.4.18-bf2.4
CPU: Intel(R) Celeron(R) CPU 2.00GHz stepping 09
Memory: 512684k/524224k available (1783k kernel code, 11156k reserved,
549k data, 280k init, 0k highmem)

postgresql.conf settings:
shared_buffers = 10000
random_page_cost = 0.3
sort_mem = 4096

##################################################
TEST 1:

IDE  Seagate Barracuda
hde: ST340014A, ATA DISK drive
hde: 78165360 sectors (40021 MB) w/2048KiB Cache, CHS=77545/16/63

bonnie++ -f:
Version 1.02b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dellmar          1G           27001  10 11863   4           20867   3 161.7   0

sample vmstat 1 output during bonnie++:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0   9332   4456   5056 467728   0   0 20864     0  429   698   0   5  95
 0  1  1   9332   4380   5056 467728   0   0  5248 27056  361   207   1   4  95
 0  1  1   9332   4376   5056 467728   0   0   384 26936  338    55   0   0 100
 0  1  0   9332   4416   5064 468368   0   0 10112  9764  385   350   0   4  96
 1  0  0   9332   4408   5056 468120   0   0 20608     0  427   684   1   7  92
 1  0  0   9332   4392   5056 467864   0   0 20992     0  431   692   0   5  95


pgbench:
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 2
number of transactions per client: 400
number of transactions actually processed: 800/800
tps = 110.213013(including connections establishing)
tps = 110.563927(excluding connections establishing)

sample "vmstat 1" output during pgbench:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 2  0  0    160   4348  50032 419320   0   0   240  3432  514  3849  34   7  59
 0  2  0    160   4392  50764 418544   0   0   224  3348  500  3701  33   6  61
 2  0  0    160   4364  51652 417688   0   0   240  3908  573  4411  43   8  50
 2  0  0    160   4364  52508 416832   0   0   160  3708  548  4273  44   8  49
 1  1  1    160   4420  53332 415944   0   0   160  3604  541  4174  40  13  48
 0  1  1    160   4420  54160 415120   0   0   104  3552  526  4048  42  14  45
 1  0  0    160   4964  54720 414576   0   0   128  4328  645  5819  69   7  24



########################################################
TEST 2:

SCSI Drive Seagate Cheetah 10k.6
  Vendor: SEAGATE   Model: ST336607LW        Rev: DS08

bonnie++ -f:
Version 1.02b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dellmar          1G           40249  14 21177   7           46620   7 365.8   0

sample vmstat 1 output during bonnie++:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  1   8916   4400   1844 467216   0   0   384 42348  475    80   0   0 100
 0  1  1   8916   4392   1844 467216   0   0   512 46420  472   103   0   2  98
 1  0  0   8916   4364   1852 469392   0   0  7168 26552  507   268   0   3  97
 1  0  0   8916   4452   1868 469392   0   0 28544 12312  658   947   1  15  84
 1  0  0   8916   4416   1860 468888   0   0 47744     4  850  1534   0  18  82
 1  0  0   8916   4436   1796 468312   0   0 48384     0  859  1555   0  19  81
 1  0  0   8916   4452   1744 467724   0   0 48640     0  863  1569   2  20  78


pgbench (sounds thrashy):
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 2
number of transactions per client: 400
number of transactions actually processed: 800/800
tps = 33.274922(including connections establishing)
tps = 33.307125(excluding connections establishing)

sample "vmstat 1" output during pgbench:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  1    160   4356  36568 432772   0   0     0  1120  232  1325  12   2  86
 0  1  1    160   4452  36592 432652   0   0     0  1108  229  1295  14   2  84
 0  1  1    160   4428  36616 432652   0   0     0  1168  234  1370   9   4  87
 0  1  1    160   4392  36636 432668   0   0     0  1120  231  1303  12   3  85
 0  1  0    160   4364  36664 432668   0   0     0  1084  230  1361  16   5  79
 0  1  0    160   4456  36696 432548   0   0     0  1196  234  1300  13   2  85


Mike Adler

Re: inferior SCSI performance

From
Tom Lane
Date:
Michael Adler <adler@pobox.com> writes:
> I have been experimenting with a new Seagate Cheetah 10k-RPM SCSI to
> compare with a cheaper Seagate Barracuda 7200-RPM IDE (each in a
> single-drive configuration). The Cheetah definitely dominates the generic
> IO tests such as bonnie++, but fares poorly with pgbench (and other
> postgresql operations).

It's fairly common for ATA drives to be configured to lie about write
completion (ie, claim write-complete as soon as data is accepted into
their onboard RAM buffer), whereas SCSI drives usually report write
complete only when the data is actually down to disk.  The performance
differential may thus be coming at the expense of reliability.  If you
run Postgres with fsync off, does the differential go away?
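
(For the test that is a one-line postgresql.conf change; just don't run
production that way, since a crash may then corrupt the database:)

# test only -- drops the guarantee that commits reach the platters
fsync = false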

            regards, tom lane

Re: inferior SCSI performance

From
Michael Adler
Date:

On Wed, 17 Sep 2003, Tom Lane wrote:
> Michael Adler <adler@pobox.com> writes:
> > I have been experimenting with a new Seagate Cheetah 10k-RPM SCSI to
> > compare with a cheaper Seagate Barracuda 7200-RPM IDE (each in a
> > single-drive configuration). The Cheetah definitely dominates the generic
> > IO tests such as bonnie++, but fares poorly with pgbench (and other
> > postgresql operations).
>
> It's fairly common for ATA drives to be configured to lie about write
> completion (ie, claim write-complete as soon as data is accepted into
> their onboard RAM buffer), whereas SCSI drives usually report write
> complete only when the data is actually down to disk.  The performance
> differential may thus be coming at the expense of reliability.  If you
> run Postgres with fsync off, does the differential go away?

Yes, they both perform equally at about 190 tps with fsync off.

The culprit turns out to be write-caching on the IDE drive. It is enabled
by default, but can be disabled with "hdparm -W0 /dev/hdx". After it is
disabled, the tps figures are proportional to RPM.
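
The sequence, for anyone who wants to reproduce it (substitute your own
device for /dev/hda):

# turn the on-drive write-back cache off; takes effect immediately
# but does not survive a reboot, so rerun it from an init script
hdparm -W0 /dev/hda

# turn it back on
hdparm -W1 /dev/hda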

There's a (2001) Linux kernel thread on this if anyone is interested:
http://www.ussg.iu.edu/hypermail/linux/kernel/0103.0/0331.html

So the question is whether it is ever sensible to use write-caching and
expect comparable persistence.

Thanks,

Michael Adler

Re: inferior SCSI performance

From
Andrew Sullivan
Date:
On Wed, Sep 17, 2003 at 04:46:00PM -0400, Michael Adler wrote:
> So the question is whether it is ever sensible to use write-caching and
> expect comparable persistence.

Yes.  If and only if you have a battery-backed cache.  I know of no
IDE drives that have that, but there's nothing about the spec which
makes it impossible.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: inferior SCSI performance

From
"scott.marlowe"
Date:
On Wed, 1 Oct 2003, Andrew Sullivan wrote:

> On Wed, Sep 17, 2003 at 04:46:00PM -0400, Michael Adler wrote:
> > So the question is whether it is ever sensible to use write-caching and
> > expect comparable persistence.
>
> Yes.  If and only if you have a battery-backed cache.  I know of no
> IDE drives that have that, but there's nothing about the spec which
> makes it impossible.

FYI, on a Dual PIV2800 with 2 gig ram and a single UDMA 80 gig hard drive,
I went from 420 tps to 22 tps when I disabled write caching.  WOW.  A factor
of about 20 times slower.  (pgbench -c 4 -t 100)
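
(That was essentially the following, with "bench" as a scratch database
and /dev/hda standing in for the real device:)

pgbench -i -s 1 bench        # build the TPC-B-style tables
pgbench -c 4 -t 100 bench    # write cache on  -> ~420 tps
hdparm -W0 /dev/hda          # disable the drive's write cache
pgbench -c 4 -t 100 bench    # write cache off -> ~22 tps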


Re: inferior SCSI performance

From
Andrew Sullivan
Date:
On Wed, Oct 01, 2003 at 07:14:32AM -0600, scott.marlowe wrote:
> FYI, on a Dual PIV2800 with 2 gig ram and a single UDMA 80 gig hard drive,
> I went from 420 tps to 22 tps when I disabled write caching.  WOW.  A factor
> of about 20 times slower.  (pgbench -c 4 -t 100)

That's completely consistent with tests Chris Browne has done here on
cache-enabled and cache-disabled boxes that we have.

It's a _really_ big difference.  The combination of battery-backed
write cache on your controller plus a real good UPS is quite possibly
the number one thing you can do to improve performance.  For what
it's worth, I can't see how this is something special about Postgres:
even raw-filesystem type systems have to make sure the disk actually
has the data, and a write cache is bound to be a big help for that.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: inferior SCSI performance

From
Christopher Browne
Date:
andrew@libertyrms.info (Andrew Sullivan) writes:
> On Wed, Oct 01, 2003 at 07:14:32AM -0600, scott.marlowe wrote:
>> FYI, on a Dual PIV2800 with 2 gig ram and a single UDMA 80 gig hard drive,
>> I went from 420 tps to 22 tps when I disabled write caching.  WOW.  A factor
>> of about 20 times slower.  (pgbench -c 4 -t 100)
>
> That's completely consistent with tests Chris Browne has done here on
> cache-enabled and cache-disabled boxes that we have.
>
> It's a _really_ big difference.  The combination of battery-backed
> write cache on your controller plus a real good UPS is quite possibly
> the number one thing you can do to improve performance.  For what
> it's worth, I can't see how this is something special about Postgres:
> even raw-filesystem type systems have to make sure the disk actually
> has the data, and a write cache is bound to be a big help for that.

Indeed.

When I ran the tests, I found that JFS was preferable to XFS and ext3
on Linux on the machine with the big battery-backed cache.  (With the
side effect that it was getting, yes, probably about 20x the
performance of systems without the cache.)

The FS-related result appeared surprising, as the "stories" I had
heard suggested that JFS hadn't been particularly heavily tuned on
Linux, whereas XFS was supposed to be the "speed demon."

It is entirely possible that the result I saw was one that would
reverse partially or even totally on a system LACKING that cache.  XFS
might "play better" when we're cacheless; the (perhaps only fabled)
demerits of JFS being more than totally hidden if we add the cache.

What I find disappointing is that it isn't possible to get SSD cards
that are relatively inexpensive.  A similarly fabulous performance
increase _ought_ to be attainable if you could stick pg_xlog and
pg_clog on a 256MB (or bigger!) battery-backed SSD, ideally one that
plugs into a PCI slot.

This should have the further benefit of diminishing the amount of
mechanical activity going on, as WAL activity would no longer involve
any mechanical disk operations.
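
(Pointing the WAL at such a device is just a symlink on this vintage of
PostgreSQL; the paths here are made up:)

# stop the cluster, move pg_xlog to the SSD, leave a symlink behind
pg_ctl stop -D /var/lib/pgsql/data
mv /var/lib/pgsql/data/pg_xlog /mnt/ssd/pg_xlog
ln -s /mnt/ssd/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl start -D /var/lib/pgsql/data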

Unfortunately, while there are companies hawking SSDs, they are in the
"you'll have to talk to our salescritter for pricing" category, which
means that they must be ferociously expensive.  :-(.
--
output = ("cbbrowne" "@" "libertyrms.info")
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Re: inferior SCSI performance

From
Hannu Krosing
Date:
Christopher Browne kirjutas K, 01.10.2003 kell 19:21:

>
> The FS-related result appeared surprising, as the "stories" I had
> heard suggested that JFS hadn't been particularly heavily tuned on
> Linux, whereas XFS was supposed to be the "speed demon."

Gentoo Linux recommends XFS only with SAN + fibre channel + a good UPS,
for anything but database use ;)

> It is entirely possible that the result I saw was one that would
> reverse partially or even totally on a system LACKING that cache.  XFS
> might "play better" when we're cacheless; the (perhaps only fabled)
> demerits of JFS being more than totally hidden if we add the cache.
>
> What I find disappointing is that it isn't possible to get SSD cards
> that are relatively inexpensive.  A similarly fabulous performance
> increase _ought_ to be attainable if you could stick pg_xlog and
> pg_clog on a 256MB (or bigger!) battery-backed SSD, ideally one that
> plugs into a PCI slot.

For really cheap, and for small transaction volumes, you could experiment
with USB 2.0 memory sticks (some of them claim 34MB/s write speed), perhaps
in striped/mirrored configuration. You would just need something in the
driver layer counting writes, to alert you when you are reaching the x00k
"writes" limit and have to plug in new sticks :)

OTOH, articles I found through quick googling suggest only 2.4MB/s write
and 7MB/s read speeds for USB 2.0 memory sticks, so the 34MB/s is probably
just sales pitch and refers to bus speed, not actual write speed ;(

> Unfortunately, while there are companies hawking SSDs, they are in the
> "you'll have to talk to our salescritter for pricing" category, which
> means that they must be ferociously expensive.  :-(.

The cheapest I found, one with external backup power, was ~$1.8k for a
2GB PCI device:

http://www.cdw.com/shop/search/Results.aspx?key=platypus&x=0&y=0

An external 16GB one with battery backup and
write-to-small-IDE-drives-on-power-failure was ~$25k.

-----------------
Hannu

Re: inferior SCSI performance

From
Greg Stark
Date:
> > Unfortunately, while there are companies hawking SSDs, they are in the
> > "you'll have to talk to our salescritter for pricing" category, which
> > means that they must be ferociously expensive.  :-(.
>
> The cheapest I found, one with external backup power, was ~$1.8k for a
> 2GB PCI device:
>
> http://www.cdw.com/shop/search/Results.aspx?key=platypus&x=0&y=0

That is pretty neat.

> An external 16GB one with battery backup and
> write-to-small-IDE-drives-on-power-failure was ~$25k.

And they scale up from there. This company has one that goes up to 1TB of
RAM. 4.5kW power consumption? I'd hate to see what kind of heat that thing
generates.

http://www.imperialtech.com/pdf/MRSpec_021803.pdf

I have no idea what price the monster unit is, but someone described the price
of one of the *lesser* units as "made me physically ill". So I can only imagine.

--
greg

Re: inferior SCSI performance

From
Ang Chin Han
Date:
Andrew Sullivan wrote:

> Yes.  If and only if you have a battery-backed cache.  I know of no
> IDE drives that have that, but there's nothing about the spec which
> makes it impossible.

http://www.ussg.iu.edu/hypermail/linux/kernel/0103.0/1084.html

Relevant section:
<quote>
Maybe that is why there is a vender disk-cache dump zone on the edge of
the platters...just maybe you need to buy your drives from somebody that
does this and has a predictive sector stretcher as the energy from the
inertia by the DC three-phase motor executes the dump.

Ever wondered why modern drives have open collectors on the databuss?
Maybe to disconnect the power draw so that the motor now generator
provides the needed power to complete the data dump...
</quote>

This SEEMS to imply that at least some IDE drives have enough power left
after the power goes off to flush the write-cached data to disk.

The rest of the email is not very reassuring, though note that it is
two years old.

Anyone want to test? :)

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
GNU/Linux
   4:30pm  up 280 days,  8:00,  8 users,  load average: 6.05, 6.01, 6.02


Re: inferior SCSI performance

From
Vivek Khera
Date:
>>>>> "CB" == Christopher Browne <cbbrowne@libertyrms.info> writes:

CB> Unfortunately, while there are companies hawking SSDs, they are in the
CB> "you'll have to talk to our salescritter for pricing" category, which
CB> means that they must be ferociously expensive.  :-(.

You ain't kidding.  Unfortunately, one of the major vendors (Imperial
Technology) just went belly up, so pricing probably won't get any
better anytime soon.

Perhaps one of these days I'll try that experiment on my SSD... ;-)


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/