Thread: postgresql +AMD64 +big address spaces - does it work?

postgresql +AMD64 +big address spaces - does it work?

From: "Andy B"
If I bought one of these boxes/OS combos as a postgresql database server,
would postgresql be able to make the best use of it with a huge (e.g. 40GB)
database?

Box:     HP ProLiant DL585, with 4 AMD64 CPUs and 64GB of RAM (other
vendor options also exist)
OS:      SUSE Linux Enterprise Server 8 for AMD64

(links to product info at bottom)

e.g. Could postgresql make use of most of this address space to map a huge
database (say 40GB) all paged into RAM at once?

----
According to SUSE, a process running on such a setup can individually
address 0.5 terabytes of address space, and with the hardware above, up to
64GB (minus a bit!) of that address space could be paged into physical
memory at any one time.
----


According to the docs I could find, I just need to do the following (a
rough sketch of both steps is below):

1. Set postgresql.conf->shared_buffers to a value like (40GB / 8KB).

2. Check that /proc/sys/kernel/shmmax is big enough, or increase it
to > 40GB.
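
For what it's worth, a rough sketch of what those two steps might look like
on Linux, assuming the default 8KB block size (the shmmax value is just the
40GB figure expressed in bytes, rounded up):

    # step 1: in postgresql.conf, 40GB / 8KB = 5,242,880 pages
    #   shared_buffers = 5242880
    # step 2: raise the kernel shared memory limit above 40GB
    sysctl -w kernel.shmmax=44000000000
    cat /proc/sys/kernel/shmmax     # confirm the new value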

From experience that's a long way from knowing it will work. Does anyone
have any experience at all with such big memory configurations?


Many thanks,
Andy

______
My references:
Suse: www.suse.com/amd64 and
www.suse.com/en/business/products/server/sles/sles8_amd64.pdf
HP:   http://h18004.www1.hp.com/products/servers/proliantdl585/index.html







Re: postgresql +AMD64 +big address spaces - does it work?

From: Tom Lane
"Andy B" <abhousehuntRE-M--O--V-E@blueyonder.co.uk> writes:
> 1. set postgresql.conf->shared_buffers to a value like (40GB/8KB),

That is the very last thing you want to do (even if you could, which you
can't).  Let the OS manage the memory --- the kernel can do just fine at
caching large files in RAM.  I've never seen any evidence that it helps
to set shared_buffers higher than a few tens of thousands.
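
For a sense of scale (a sketch, assuming the default 8KB block size),
"a few tens of thousands" of buffers works out to only a couple of hundred
megabytes:

    # e.g. in postgresql.conf:
    #   shared_buffers = 20000
    echo $((20000 * 8192))    # = 163840000 bytes, roughly 160MB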

See the pgsql-performance archives for much more on this subject.

            regards, tom lane

Re: postgresql +AMD64 +big address spaces - does it work?

From: "Scott Marlowe"
On Thu, 2004-07-01 at 06:41, Andy B wrote:
> If I bought one of these boxes/OS combos as a postgresql database server,
> would postgresql be able to make the best use of it with a huge (e.g. 40GB)
> database?
>
> Box:     HP ProLiant DL585, with  4 AMD64 CPUs and 64GB of RAM. (other
> vendor options also exist)
> OS:      SUSE enterprise 8 linux for AMD
>
> (links to product info at bottom)
>
> e.g. Could postgresql make use of most of this address space to map a huge
> database (say 40GB) all paged into RAM at once?
>
> ----
> According to Suse, a process running on such a setup can individually
> address 0.5 terabytes of address space, and at any one point, with the
> hardware above, up to 64GB (minus a bit!) of this address space could be
> paged into physical memory at any one time.
> ----
>
>
> According to the docs I could find, I just need to do the following:
>
> 1. set postgresql.conf->shared_buffers to a value like (40GB/8KB),
>
> 2. Check cat /proc/sys/kernel/shmmax is big enough, or successfully increase
> it to  > 40GB
>
> From experience that's a long way from knowing it will work. Does anyone
> have any experience at all with such big memory configurations?

Sort of.  PostgreSQL could use however much of it you decided to give it
as a buffer for individual queries, and the newer caching algo should
handle large amounts of data well.

However, PostgreSQL isn't currently in the business, so to speak, of
holding all the data you've accessed in the last X minutes in a great
big buffer.  The reason for this is that PostgreSQL uses shared memory,
and on many architectures shared memory is good for letting individual
backends communicate quickly, but it's not as fast at managing really
huge chunks of data.

So, PostgreSQL is designed to let your kernel do all this.  The buffers
in PostgreSQL are designed to hold the current working set, no more.
The buffers in the kernel cache are designed to handle gigs of data, and
to handle it quickly.  So, if it's in the kernel cache, then PostgreSQL IS
capable of utilizing the extra memory, in the sense that the kernel is
using it for cache.

All modern flavors of unix have pretty good caching built in, with
little tuning needed.
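
One rough way to see that happening on Linux (a sketch; the output format
varies a bit between versions of the free utility):

    # the "cached" column is the kernel page cache --- memory the kernel
    # is already using to hold recently-read file data, including your
    # PostgreSQL data files
    free -m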


Re: postgresql +AMD64 +big address spaces - does it work?

From: "Glen Parker"
> "Andy B" <abhousehuntRE-M--O--V-E@blueyonder.co.uk> writes:
> > 1. set postgresql.conf->shared_buffers to a value like (40GB/8KB),
>
> That is the very last thing you want to do (even if you could, which you
> can't).  Let the OS manage the memory --- the kernel can do just fine at
> caching large files in RAM.  I've never seen any evidence that it helps
> to set shared_buffers higher than a few tens of thousands.

My experience is the same; keep shared_buffers fairly small.

But I'm curious why we *can't* do 40GB on a 64 bit machine?  I'm testing an
AMD64 + FedoraCore2 + PG7.4 installation right now and I am expecting to
throw lots of memory at it soon...

Glen Parker


Re: postgresql +AMD64 +big address spaces - does it work?

From: Tom Lane
"Glen Parker" <glenebob@nwlink.com> writes:
> But I'm curious why we *can't* do 40GB on a 64 bit machine?

Well, the "can't" part is simple: our code to calculate the size of our
shared-memory request uses int32 arithmetic.  So we definitely can't go
higher than 4GB shared memory, and probably not higher than 2GB.
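
The arithmetic on the 40GB figure from the original post shows why (a
sketch):

    # 40GB / 8KB pages = 5,242,880 buffers requested
    echo $((40 * 1024 * 1024 * 1024 / 8192))    # 5242880
    # size of the resulting shared memory request, in bytes:
    echo $((5242880 * 8192))                    # 42949672960
    # ...which is far beyond the 2147483647 (~2GB) that a signed
    # 32-bit integer can represent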

This could be fixed if anyone was sufficiently motivated, although you'd
have to be careful about maintaining portability to machines that don't
think size_t is 8 bytes, or don't have any 8-byte integer type at all.

However, given the complete lack of evidence that it's useful to boost
shared_buffers above the few-hundred-meg range, I can't see anyone
spending time on doing it...

The more interesting part of this discussion is why there isn't any such
evidence.  You can find this question beat to death in the PG list
archives, but a quick sketch is:

1. PG's algorithms for managing its buffers are not obviously better
than those commonly found in kernels.  In particular there are several
situations in which PG does linear scans of all available buffers.
This is OK for NBuffers counts up to some thousands, but we would need
some serious work to make NBuffers in the millions work well.

2. The whole argument for buffering disk pages in memory is very
dependent on the assumption that your buffers are actually in memory.
However on most Unixen there is *no* guarantee that the shared memory
we request from the kernel will not get swapped out --- and in fact the
larger the shmem request we make, the more likely this will happen.
A disk buffer that gets swapped out to swap space is completely
counterproductive, as it costs you at least double the I/O work
compared to just re-reading the original file.  So in practice it's
better to keep the shared-buffer arena small enough that all of it is
"hot" (heavily used) and not likely to get seen as a swap candidate by
the kernel's VM manager.
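
A sketch of one way to watch for that on Linux: the si/so columns from
vmstat are pages swapped in and out per second, and sustained non-zero
values while the database is busy suggest the shared-buffer arena is being
treated as a swap candidate.

    # sample memory and swap activity every 5 seconds
    vmstat 5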

3. A large fixed-size shared-buffer arena is the worst of all possible
worlds in terms of dynamic memory management.  The real-world situation
is that RAM has to be shared among PG shared buffers, private memory of
PG backend processes, and (usually) workspace of other non-Postgres
processes that are running on the server machine.  The kernel is in a
far better position than we are to evaluate these competing demands and
make the right adjustments to changing situations.  The kernel can
easily drop cached disk pages from its buffers to allocate more RAM to
process workspace, or the reverse when process demands drop; but we
can't change the size of the shared-buffer arena on the fly.

Bottom line: let the kernel manage as much memory as possible.

            regards, tom lane

Re: postgresql +AMD64 +big address spaces - does it work?

From: Chris Browne
"Andy B" <abhousehuntRE-M--O--V-E@blueyonder.co.uk> writes:
>> I get the feeling that, regardless of 64-bit support or not, the
>> *concept* of a database which just happens to all fit easily within RAM
>> isn't one that gets the thumbs down...
>
> Oops, I meant to say '*is*' one that gets the thumbs down...

No, to the contrary, having all the data fit easily within RAM is
quite desirable.  One of my coworkers is investigating the
entertaining possibility of hooking up SSDs to some of our systems to
entirely eliminate disk I/O for WAL.  (The fun part then is to see
what more can be done with another 3GB of space on the SSD that can
eliminate a bunch more I/O  :-)...)

The thing that gets "thumbs down" is the notion of trying to _force_
that to take place by maximizing shared buffer size, based on the
assumption that such a strategy is optimal for the purpose.

By all means, get a big AMD box with plenty of RAM; just _don't_
assume that tweaking shared buffers is the "majick trick" that will
make it all work 50x faster.
--
output = reverse("moc.enworbbc" "@" "enworbbc")
http://cbbrowne.com/info/internet.html
"A lot  of people come to  this newsgroup and  do nothing but complain
about Lisp.   I think maybe they  are such heavy complainers that they
think they read comp.lain.lisp." -- Erik Naggum

Re: postgresql +AMD64 +big address spaces - does it work?

From: Christopher Petrilli
On Tue, 06 Jul 2004 15:27:54 -0400, Chris Browne <cbbrowne@acm.org> wrote:
> "Andy B" <abhousehuntRE-M--O--V-E@blueyonder.co.uk> writes:
> >> I get the feeling that, regardless of 64-bit support or not, the
> >> *concept* of a database which just happens to all fit easily within RAM
> >> isn't one that gets the thumbs down...
> >
> > Oops, I meant to say '*is*' one that gets the thumbs down...
>
> No, to the contrary, having all the data fit easily within RAM is
> quite desirable.  One of my coworkers is investigating the
> entertaining possibility of hooking up SSDs to some of our systems to
> entirely eliminate disk I/O for WAL.  (The fun part then is to see
> what more can be done with another 3GB of space on the SSD that can
> eliminate a bunch more I/O  :-)...)

My experience in tuning Oracle and DB2 is that temp tablespaces are
what belong there --- the things that are used when doing joins of large
tables.  Honestly, though, it'll depend on your exact application
model.  I've also had good luck putting indexes on SSD, because often
you do 3-4 hits on the index before a single seek on the data table,
so the reduction in access time is proportionally larger.  I've not
done the numbers recently, but when 10K RPM drives were "the big
thing," I had excellent luck leaving my data tables on high-capacity
7200 RPM drives and moving the indexes to 10K drives.

This is also a place where the trade-offs between duplexing (mirroring
with read-balancing) and RAID-5 can become huge.

Chris
--
| Christopher Petrilli
| petrilli@gmail.com

Re: postgresql +AMD64 +big address spaces - does it work?

From: "Andy B"
Ok - just to end this thread, I think I understand what I was missing.

I'll stop this thread, and just comment on my first thread.

Thank you everyone who helped



Re: postgresql +AMD64 +big address spaces - does it work?

From: "Andy B"
> I get the feeling that, regardless of 64-bit support or not, the
> *concept* of a database which just happens to all fit easily within RAM
> isn't one that gets the thumbs down...

Oops, I meant to say '*is*' one that gets the thumbs down...



Re: postgresql +AMD64 +big address spaces - does it work?

From: "Andy B"
Hello again and thanks to everyone for the replies so far.

Tom, and all, I hear what you are all saying, and furthermore, in cases
where the amount of RAM is much smaller than the database size, I agree
totally. However, I'm *only* talking about a particular scenario which, till
now, has really only been true of small databases, and that is:
--------------------------------
RAM - (total size of database)  = (easily enough RAM for everything else to
run smoothly on the server).
--------------------------------

I get the feeling that, regardless of 64-bit support or not, the
*concept* of a database which just happens to all fit easily within RAM
isn't one that gets the thumbs down...

...and I think I must be missing something!

If you bear with me, could I just give you a better idea of where I'm coming
from, because I think I may have missed an important concept, or might be
able to sway you. (I don't mind which way this goes, other than my server
dreams will be in tatters if you're right and I'm confused, so be gentle
with me!)

Before I start, just to repeat that though I have only been using
postgresql for 18 months and haven't run a huge database before, I do have
a large amount of experience designing memory-intensive, multi-threaded
real-time applications running in a unix virtual memory environment
(specifically SGI IRIX boxes), and am very comfortable with the postgresql
implementation model. (And I did choose it over MySQL, so I am a fan.)

Ok!

---------------------------
Where I'm coming from!
---------------------------

I am developing a postgresql based national website, which hopefully will
have a million or two users if successful (though hopefully not all at once
:) )

The details aren't important, other than to say that the functionality I'm
planning is quite advanced - it's more of a web application than a website
as such, and the database is going to get an absolute hammering. Most
importantly, a large percentage of the data within the database is going to
be 'active' (i.e. the target of queries) on a minute by minute basis.

Luckily for me, it isn't due to go live just yet :)

There is now 'proper' 64-bit support in Linux for both Intel and AMD, and
we've already seen a few linux boxes with half a terabyte of RAM, shareable
as a single address space (e.g. boxes like the Altix 3000 from SGI). Even
commodity server boxes (such as the HP DL585) now provide the environment
for multiple threads to share a single huge address space backed by up to
64GB of RAM, accessible by 4 CPUs. I'm not fully up to speed on the hardware
side, but this is only going one way: up!

So, with this new hardware and 64 bit OSs,  it is theoretically possible to
dramatically reduce the IO bottlenecks on a database server by simply
sticking in enough RAM so that the database ends up in RAM, leaving most of
the IO being 'O' for syncing updates.

We're talking about a high-performance database setup here.

Surely this is a very desirable setup for someone like me?
____

Here's how it could work:

1. The postgresql postmaster (is that right?) - the process responsible for
setting up the shared buffer cache - does what it normally does, except
that it works with 64-bit addressing, so your cache can be 40MB or 200GB.
(CPUs are limited to 64GB currently in most 64-bit processor systems, but
this will change.)

2. You set up the buffer cache size so that it can easily hold *the entire
database*. (Or the active part... the decision of how big is subject to all
the current guidelines. The only difference is that there is no 2GB limit.)

e.g. on my 64GB system, my entire database is 16GB and I instruct postgresql
to set up a 20GB shared buffer cache, in order to allow headroom for
updates. (See the sketch after this list for the numbers.)

3. At the postgresql level, database pages ('buffers?') get cached, and stay
in the buffer cache, because there is enough space for everything to reside
in the postgresql cache and more.

4. Additionally, at the virtual memory level, Linux isn't going to kick
pages of this huge shared buffer cache out to disk, because there's plenty
of RAM kicking around for everything else, and in fact, the target for such
a machine is for it to be running at close to zero swap activity. We're
talking about a high performance system after all, and swap activity is the
last thing you want in a high performance setup.

Note that Linux is still managing *all the memory*. I'm not interfering at
all with virtual memory. (Though IRIX's mpin() is very handy... I must find
out if Linux has a similar call)

5. Assuming your database *does* easily fit in the shared buffer cache,
queries will run like the wind, limited only by CPU availability and the
performance of the front end web servers + connections to deliver queries
and accept the results.

6. You get more bang for your buck, especially in systems with a heavy load
of concurrent postgresql backend activity, which would normally put enormous
strain on the IO system.

7. An extra benefit is that the database application developer can
contemplate query plans which, under the small cache, big database scenario
are just not sensible in a high load environment. (Can you tell that I've
already done this?!)
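
Purely for illustration, here is roughly what the numbers in point 2 would
translate to, assuming 8KB pages (and setting aside, for the moment, the
32-bit size calculation Tom described):

    # 20GB / 8KB = 2,621,440 buffers
    #   shared_buffers = 2621440      # in postgresql.conf
    # and shmmax would need to sit comfortably above 20GB:
    sysctl -w kernel.shmmax=22000000000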
-------------------------------------------------

Anyway - current implementation aside, what theoretically is stopping you
all from jumping with joy at the prospect of running nationwide databases
from RAM?

Thanks for your input so far, and if you think I should move this thread
elsewhere, I'd be happy to.

I'd also be interested in helping with making the changes to allow the
shared buffer cache to be 64 bit friendly, though I've never helped on an
open source project, and my previous mistakes have only affected hundreds of
people, not millions ;)

Regards,
Andy

-------------------------------------------------

> 3. A large fixed-size shared-buffer arena is the worst of all possible
> worlds in terms of dynamic memory management.  The real-world situation
> is that RAM has to be shared among PG shared buffers, private memory of
> PG backend processes, and (usually) workspace of other non-Postgres
> processes that are running on the server machine.  The kernel is in a
> far better position than we are to evaluate these competing demands and
> make the right adjustments to changing situations.  The kernel can
> easily drop cached disk pages from its buffers to allocate more RAM to
> process workspace, or the reverse when process demands drop; but we
> can't change the size of the shared-buffer arena on the fly.
>
> Bottom line: let the kernel manage as much memory as possible.