Thread: Splitting queries across servers

Splitting queries across servers

From
"Max"
Date:
Hello,

Our postgresql database is getting too big to be handled by one server. We
need the database to be in RAM and cannot afford swapping. At the moment,
we're using only 3GB or RAM, however our business growth is going to drive
this number into the double digits zone, maybe triple digits.

What are our options ?

I am a little bit ignorant in this part of RDBM. Can we distribute our
tables across multiple servers (read server+postgres) and perform
distributed SQL queries ? If so, how does that work ? I am totally
resourceless!

Thank you in advance

Max.



Re: Splitting queries across servers

From
"Frank D. Engel, Jr."
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Wow.  Your needs really *are* demanding.  You might need to look into a
high-end server to meet those kinds of needs - Sun has some *very*
expensive hardware that can handle over 1/2TB of RAM, for example, and
Cray makes supercomputers (even more *very* expensive) which can handle
somewhere around 32 TB of RAM...

You're sure you can't possibly put up with even a small amount of
swapping?


On Jan 26, 2005, at 11:00 AM, Max wrote:

> Hello,
>
> Our postgresql database is getting too big to be handled by one
> server. We
> need the database to be in RAM and cannot afford swapping. At the
> moment,
> we're using only 3GB or RAM, however our business growth is going to
> drive
> this number into the double digits zone, maybe triple digits.
>
> What are our options ?
>
> I am a little bit ignorant in this part of RDBM. Can we distribute our
> tables across multiple servers (read server+postgres) and perform
> distributed SQL queries ? If so, how does that work ? I am totally
> resourceless!
>
> Thank you in advance
>
> Max.
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>       joining column's datatypes do not match
>
>
- -----------------------------------------------------------
Frank D. Engel, Jr.  <fde101@fjrhome.net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.
$
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFB98Wx7aqtWrR9cZoRAtYNAJ0SHP+Y65TasCxvUxd9RjTXSJqyOgCdEi6X
VfX0V8TKg4JEPPy6eJyYPVM=
=zAYr
-----END PGP SIGNATURE-----



___________________________________________________________
$0 Web Hosting with up to 120MB web space, 1000 MB Transfer
10 Personalized POP and Web E-mail Accounts, and much more.
Signup at www.doteasy.com


Re: Splitting queries across servers

From
John Sidney-Woollett
Date:
We configured a slony1 cluster with a master and slave (both 7.4.6), and
used the slave to serve read-only queries thereby offloading some of the
work from the master database.

This worked well for us.

You could also take a look at pg_pool to distribute your load - but I
haven't actually used it, so can't give any advice...

Hope that helps.

John Sidney-Woollett

Max wrote:
> Hello,
>
> Our postgresql database is getting too big to be handled by one server. We
> need the database to be in RAM and cannot afford swapping. At the moment,
> we're using only 3GB or RAM, however our business growth is going to drive
> this number into the double digits zone, maybe triple digits.
>
> What are our options ?
>
> I am a little bit ignorant in this part of RDBM. Can we distribute our
> tables across multiple servers (read server+postgres) and perform
> distributed SQL queries ? If so, how does that work ? I am totally
> resourceless!
>
> Thank you in advance
>
> Max.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

Re: Splitting queries across servers

From
"Max"
Date:
>
> Wow.  Your needs really *are* demanding.  You might need to look into a
> high-end server to meet those kinds of needs - Sun has some *very*
> expensive hardware that can handle over 1/2TB of RAM, for example, and
> Cray makes supercomputers (even more *very* expensive) which can handle
> somewhere around 32 TB of RAM...

Well, that's too much money. We'd like to stay away from these options.

>
> You're sure you can't possibly put up with even a small amount of
> swapping?

Nop. Too many users. Xmas shopping brought us down for a few minutes, and
the rest of the time we were too slow.


>
>
> On Jan 26, 2005, at 11:00 AM, Max wrote:
>
> > Hello,
> >
> > Our postgresql database is getting too big to be handled by one
> > server. We
> > need the database to be in RAM and cannot afford swapping. At the
> > moment,
> > we're using only 3GB or RAM, however our business growth is going to
> > drive
> > this number into the double digits zone, maybe triple digits.
> >
> > What are our options ?
> >
> > I am a little bit ignorant in this part of RDBM. Can we distribute our
> > tables across multiple servers (read server+postgres) and perform
> > distributed SQL queries ? If so, how does that work ? I am totally
> > resourceless!
> >
> > Thank you in advance
> >
> > Max.
> >
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 9: the planner will ignore your desire to choose an index scan if
> > your
> >       joining column's datatypes do not match
> >
> >
> - -----------------------------------------------------------
> Frank D. Engel, Jr.  <fde101@fjrhome.net>
>
> $ ln -s /usr/share/kjvbible /usr/manual
> $ true | cat /usr/manual | grep "John 3:16"
> John 3:16 For God so loved the world, that he gave his only begotten
> Son, that whosoever believeth in him should not perish, but have
> everlasting life.
> $
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (Darwin)
>
> iD8DBQFB98Wx7aqtWrR9cZoRAtYNAJ0SHP+Y65TasCxvUxd9RjTXSJqyOgCdEi6X
> VfX0V8TKg4JEPPy6eJyYPVM=
> =zAYr
> -----END PGP SIGNATURE-----
>
>
>
> ___________________________________________________________
> $0 Web Hosting with up to 120MB web space, 1000 MB Transfer
> 10 Personalized POP and Web E-mail Accounts, and much more.
> Signup at www.doteasy.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>



Re: Splitting queries across servers

From
"Max"
Date:
>
> We configured a slony1 cluster with a master and slave (both 7.4.6), and
> used the slave to serve read-only queries thereby offloading some of the
> work from the master database.
>
> This worked well for us.
>
> You could also take a look at pg_pool to distribute your load - but I
> haven't actually used it, so can't give any advice...

thanks, I am going to look into this.

>
> Hope that helps.
>
> John Sidney-Woollett
>
> Max wrote:
> > Hello,
> >
> > Our postgresql database is getting too big to be handled by one
> server. We
> > need the database to be in RAM and cannot afford swapping. At
> the moment,
> > we're using only 3GB or RAM, however our business growth is
> going to drive
> > this number into the double digits zone, maybe triple digits.
> >
> > What are our options ?
> >
> > I am a little bit ignorant in this part of RDBM. Can we distribute our
> > tables across multiple servers (read server+postgres) and perform
> > distributed SQL queries ? If so, how does that work ? I am totally
> > resourceless!
> >
> > Thank you in advance
> >
> > Max.
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: the planner will ignore your desire to choose an index
> scan if your
> >       joining column's datatypes do not match
>



Re: Splitting queries across servers

From
William Yu
Date:
With a Quad Opteron (4 memory slots per CPU), you could put 64GB of RAM
onto a single machine using 4GB DIMMs in every slot.

The other option is to explore static memory storage. It's probably too
expensive to put your entire DB onto such a device but moving just the
WAL there would give you a pretty decent jump in write performance. (I
ran some tests way back simulating this by using a RAMDISK to store the
WAL files and got ~60% increase in an update-heavy mix.)


Max wrote:
> Hello,
>
> Our postgresql database is getting too big to be handled by one server. We
> need the database to be in RAM and cannot afford swapping. At the moment,
> we're using only 3GB or RAM, however our business growth is going to drive
> this number into the double digits zone, maybe triple digits.
>
> What are our options ?
>
> I am a little bit ignorant in this part of RDBM. Can we distribute our
> tables across multiple servers (read server+postgres) and perform
> distributed SQL queries ? If so, how does that work ? I am totally
> resourceless!
>
> Thank you in advance
>
> Max.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>

Re: Splitting queries across servers

From
"Dann Corbit"
Date:
Isn't putting the WAL in memory dangerous in case of a power failure?
I would think that RI would be compromised.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of William Yu
Sent: Thursday, January 27, 2005 12:41 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Splitting queries across servers

With a Quad Opteron (4 memory slots per CPU), you could put 64GB of RAM
onto a single machine using 4GB DIMMs in every slot.

The other option is to explore static memory storage. It's probably too
expensive to put your entire DB onto such a device but moving just the
WAL there would give you a pretty decent jump in write performance. (I
ran some tests way back simulating this by using a RAMDISK to store the
WAL files and got ~60% increase in an update-heavy mix.)

Re: Splitting queries across servers

From
William Yu
Date:
Very dangerous. I only did it as a test to guestimate what kind of
performance I might get if I got a static ram hard drive to put the WAL
onto.


Dann Corbit wrote:
> Isn't putting the WAL in memory dangerous in case of a power failure?
> I would think that RI would be compromised.
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of William Yu
> Sent: Thursday, January 27, 2005 12:41 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Splitting queries across servers
>
> With a Quad Opteron (4 memory slots per CPU), you could put 64GB of RAM
> onto a single machine using 4GB DIMMs in every slot.
>
> The other option is to explore static memory storage. It's probably too
> expensive to put your entire DB onto such a device but moving just the
> WAL there would give you a pretty decent jump in write performance. (I
> ran some tests way back simulating this by using a RAMDISK to store the
> WAL files and got ~60% increase in an update-heavy mix.)


Re: Splitting queries across servers

From
"Max"
Date:
>
>
> With a Quad Opteron (4 memory slots per CPU), you could put 64GB of RAM
> onto a single machine using 4GB DIMMs in every slot.

We were talking about a similar solution today, but not quite as good as
this. You really got me thinking now. 64Gb.. hm that would solve many
issues. what do you think the cost would be for such a beast ? Any idea ?

Max




Re: Splitting queries across servers

From
Mike Rylander
Date:
On Thu, 27 Jan 2005 20:17:29 -0800, Max <maxdl@adelphia.net> wrote:
> >
> >
> > With a Quad Opteron (4 memory slots per CPU), you could put 64GB of RAM
> > onto a single machine using 4GB DIMMs in every slot.
>
> We were talking about a similar solution today, but not quite as good as
> this. You really got me thinking now. 64Gb.. hm that would solve many
> issues. what do you think the cost would be for such a beast ? Any idea ?

Depending on the vendor and the storage subsystem, between $20k and
$50k, and if you start with less memory it will be much less.  I can
attest to the fact that 4 x Opterons work VERY nicely. ;)

>
> Max
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>


--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: Splitting queries across servers

From
William Yu
Date:
Without memory, the prices are roughly:

4x846 = $5500
4x848 = $6500
4x850 = $8000

Memory costs would be:

16GB (1GB DIMMs) = $3000
32GB (2GB DIMMs) = $7500
64GB (4GB DIMMs) = $24000
128GB (8GB DIMMs) = $60000

The prices on the 4GB & 8GB DIMMs are bleeding edge of course. I see a
wide range of for them on Froogle. You will want your vendor of course
to pick memory is proper for the MB and pay whatever premium they'll
charge ontop of it. (Looking at the Tyan website, they only list 2GB
DIMMs as supported for their Quad Opteron MBs but whether that's because
those specs were done before >= 4GB DIMMs were available or an actual
hardware limitation, I dunno.)




Max wrote:
>>
>>With a Quad Opteron (4 memory slots per CPU), you could put 64GB of RAM
>>onto a single machine using 4GB DIMMs in every slot.
>
>
> We were talking about a similar solution today, but not quite as good as
> this. You really got me thinking now. 64Gb.. hm that would solve many
> issues. what do you think the cost would be for such a beast ? Any idea ?
>
> Max
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

Re: Splitting queries across servers

From
"Dann Corbit"
Date:
Suppose that you currently need 16 GB to cache everything now.
I would install (perhaps) 32 GB ram for the initial configuration.

The price of memory drops exponentially, and so waiting for the price to
drop will give a much lower expense for the cost of the RAM.

The reason to double the ram is the expense of upgrading in terms of
labor and downtime for the computer.  That can be very significant.  So
if we double the ram, that should give one or (hopefully) two years
safety margin.

If the database is expected to grow exponentially fast, then that is
another issue.   In such a case, if it can be cost justified, put on the
largest memory volume that is possible given your financial limitations.

In any case, plan to completely replace everything (CPU, memory, disk)
every 5 years or less.

All the value is really in the data.  Because hardware increases compute
power exponentially over time, that also means that the value of the
hardware you bought decreases exponentially over time.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of William Yu
Sent: Friday, January 28, 2005 11:36 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Splitting queries across servers

Without memory, the prices are roughly:

4x846 = $5500
4x848 = $6500
4x850 = $8000

Memory costs would be:

16GB (1GB DIMMs) = $3000
32GB (2GB DIMMs) = $7500
64GB (4GB DIMMs) = $24000
128GB (8GB DIMMs) = $60000

The prices on the 4GB & 8GB DIMMs are bleeding edge of course. I see a
wide range of for them on Froogle. You will want your vendor of course
to pick memory is proper for the MB and pay whatever premium they'll
charge ontop of it. (Looking at the Tyan website, they only list 2GB
DIMMs as supported for their Quad Opteron MBs but whether that's because

those specs were done before >= 4GB DIMMs were available or an actual
hardware limitation, I dunno.)




Max wrote:
>>
>>With a Quad Opteron (4 memory slots per CPU), you could put 64GB of
RAM
>>onto a single machine using 4GB DIMMs in every slot.
>
>
> We were talking about a similar solution today, but not quite as good
as
> this. You really got me thinking now. 64Gb.. hm that would solve many
> issues. what do you think the cost would be for such a beast ? Any
idea ?
>
> Max
>
>
>
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: Splitting queries across servers

From
"Max"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mike Rylander
> Sent: Friday, January 28, 2005 4:02 AM
> To: Max; PgSql General
> Subject: Re: [GENERAL] Splitting queries across servers
>
>
> On Thu, 27 Jan 2005 20:17:29 -0800, Max <maxdl@adelphia.net> wrote:
> > >
> > >
> > > With a Quad Opteron (4 memory slots per CPU), you could put
> 64GB of RAM
> > > onto a single machine using 4GB DIMMs in every slot.
> >
> > We were talking about a similar solution today, but not quite as good as
> > this. You really got me thinking now. 64Gb.. hm that would solve many
> > issues. what do you think the cost would be for such a beast ?
> Any idea ?
>
> Depending on the vendor and the storage subsystem, between $20k and
> $50k, and if you start with less memory it will be much less.  I can
> attest to the fact that 4 x Opterons work VERY nicely. ;)
>

Thanks.

That would be quite an investment. Interesting. I wonder if we should put
windows or linux on it, since postgresql 8.0 works on windows.

What OS are you running on your quad opteron ?

Anyone's run some postgres performance tests between two machines same
config different OS ?

Max



Re: Splitting queries across servers

From
"Max"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of William Yu
> Sent: Friday, January 28, 2005 11:36 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Splitting queries across servers
>
>
> Without memory, the prices are roughly:
>
> 4x846 = $5500
> 4x848 = $6500
> 4x850 = $8000
>
> Memory costs would be:
>
> 16GB (1GB DIMMs) = $3000
> 32GB (2GB DIMMs) = $7500
> 64GB (4GB DIMMs) = $24000
> 128GB (8GB DIMMs) = $60000
>
> The prices on the 4GB & 8GB DIMMs are bleeding edge of course. I see a
> wide range of for them on Froogle. You will want your vendor of course
> to pick memory is proper for the MB and pay whatever premium they'll
> charge ontop of it. (Looking at the Tyan website, they only list 2GB
> DIMMs as supported for their Quad Opteron MBs but whether that's because
> those specs were done before >= 4GB DIMMs were available or an actual
> hardware limitation, I dunno.)
>

Thank you! That's very thorough. hm... we're talking 35K then. That could be
in my budget.

>
>
>
> Max wrote:
> >>
> >>With a Quad Opteron (4 memory slots per CPU), you could put 64GB of RAM
> >>onto a single machine using 4GB DIMMs in every slot.
> >
> >
> > We were talking about a similar solution today, but not quite as good as
> > this. You really got me thinking now. 64Gb.. hm that would solve many
> > issues. what do you think the cost would be for such a beast ?
> Any idea ?
> >
> > Max
> >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 8: explain analyze is your friend
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>



Re: Splitting queries across servers

From
"Max"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Dann Corbit
> Sent: Friday, January 28, 2005 12:01 PM
> To: William Yu; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Splitting queries across servers
>
>
> Suppose that you currently need 16 GB to cache everything now.
> I would install (perhaps) 32 GB ram for the initial configuration.
>

Good point. Adding memory as I need it.

> The price of memory drops exponentially, and so waiting for the price to
> drop will give a much lower expense for the cost of the RAM.
>
> The reason to double the ram is the expense of upgrading in terms of
> labor and downtime for the computer.  That can be very significant.  So
> if we double the ram, that should give one or (hopefully) two years
> safety margin.

Downtime is a big deal, however I am planning on using replication with
pgpool.

> If the database is expected to grow exponentially fast, then that is
> another issue.   In such a case, if it can be cost justified, put on the
> largest memory volume that is possible given your financial limitations.

We can't really forecast the growing curve. My bet is that we have a short
term (6 months) need of 32 GB, so I'll just double that and it should give
us visibility for about a year. I hope!

I just realized I never asked that question: What is the maximum size of a
postgresql DB. Can it be anything ?

Max



Re: Splitting queries across servers

From
"Max"
Date:
>
> > -----Original Message-----
> > From: pgsql-general-owner@postgresql.org
> > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mike Rylander
> > Sent: Friday, January 28, 2005 4:02 AM
> > To: Max; PgSql General
> > Subject: Re: [GENERAL] Splitting queries across servers
> >
> >
> > On Thu, 27 Jan 2005 20:17:29 -0800, Max <maxdl@adelphia.net> wrote:
> > > >
> > > >
> > > > With a Quad Opteron (4 memory slots per CPU), you could put
> > 64GB of RAM
> > > > onto a single machine using 4GB DIMMs in every slot.
> > >
> > > We were talking about a similar solution today, but not quite
> as good as
> > > this. You really got me thinking now. 64Gb.. hm that would solve many
> > > issues. what do you think the cost would be for such a beast ?
> > Any idea ?
> >
> > Depending on the vendor and the storage subsystem, between $20k and
> > $50k, and if you start with less memory it will be much less.  I can
> > attest to the fact that 4 x Opterons work VERY nicely. ;)
> >
>
> Thanks.
>
> That would be quite an investment. Interesting. I wonder if we should put
> windows or linux on it, since postgresql 8.0 works on windows.
>
> What OS are you running on your quad opteron ?
>
> Anyone's run some postgres performance tests between two machines same
> config different OS ?
>
> Max
>

I want to mention that this last question is not invitation to flame about
windows against linux ;)
Personal preferences set aside, I am looking at performance consideration.

thanks!

Max



Re: Splitting queries across servers

From
Mike Rylander
Date:
On Sat, 29 Jan 2005 09:55:15 -0800, Max <maxdl@adelphia.net> wrote:
>
>
> > -----Original Message-----
> > From: pgsql-general-owner@postgresql.org
> > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mike Rylander
> > Sent: Friday, January 28, 2005 4:02 AM
> > To: Max; PgSql General
> > Subject: Re: [GENERAL] Splitting queries across servers
> >
> >
> > On Thu, 27 Jan 2005 20:17:29 -0800, Max <maxdl@adelphia.net> wrote:
> > > >
> > > >
> > > > With a Quad Opteron (4 memory slots per CPU), you could put
> > 64GB of RAM
> > > > onto a single machine using 4GB DIMMs in every slot.
> > >
> > > We were talking about a similar solution today, but not quite as good as
> > > this. You really got me thinking now. 64Gb.. hm that would solve many
> > > issues. what do you think the cost would be for such a beast ?
> > Any idea ?
> >
> > Depending on the vendor and the storage subsystem, between $20k and
> > $50k, and if you start with less memory it will be much less.  I can
> > attest to the fact that 4 x Opterons work VERY nicely. ;)
> >
>
> Thanks.
>
> That would be quite an investment. Interesting. I wonder if we should put
> windows or linux on it, since postgresql 8.0 works on windows.

To the best of my knowledge there is no 64-bit Windows for AMD64, but
I've not been forced to use Windows in 5 or 6 years...

>
> What OS are you running on your quad opteron ?

We are using the x86-64 build of Gentoo Linux and compiling our own
copy of PG 8.0.


--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: Splitting queries across servers

From
"Max"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mike Rylander
> Sent: Saturday, January 29, 2005 11:01 AM
> To: Max
> Cc: PgSql General
> Subject: Re: [GENERAL] Splitting queries across servers
>
>
> On Sat, 29 Jan 2005 09:55:15 -0800, Max <maxdl@adelphia.net> wrote:
> >
> >
> > > -----Original Message-----
> > > From: pgsql-general-owner@postgresql.org
> > > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Mike Rylander
> > > Sent: Friday, January 28, 2005 4:02 AM
> > > To: Max; PgSql General
> > > Subject: Re: [GENERAL] Splitting queries across servers
> > >
> > >
> > > On Thu, 27 Jan 2005 20:17:29 -0800, Max <maxdl@adelphia.net> wrote:
> > > > >
> > > > >
> > > > > With a Quad Opteron (4 memory slots per CPU), you could put
> > > 64GB of RAM
> > > > > onto a single machine using 4GB DIMMs in every slot.
> > > >
> > > > We were talking about a similar solution today, but not
> quite as good as
> > > > this. You really got me thinking now. 64Gb.. hm that would
> solve many
> > > > issues. what do you think the cost would be for such a beast ?
> > > Any idea ?
> > >
> > > Depending on the vendor and the storage subsystem, between $20k and
> > > $50k, and if you start with less memory it will be much less.  I can
> > > attest to the fact that 4 x Opterons work VERY nicely. ;)
> > >
> >
> > Thanks.
> >
> > That would be quite an investment. Interesting. I wonder if we
> should put
> > windows or linux on it, since postgresql 8.0 works on windows.
>
> To the best of my knowledge there is no 64-bit Windows for AMD64, but
> I've not been forced to use Windows in 5 or 6 years...
>

Actually winXP and win server 2003 do support AMD64, it's intel 64 that's
not supported yet.

> >
> > What OS are you running on your quad opteron ?
>
> We are using the x86-64 build of Gentoo Linux and compiling our own
> copy of PG 8.0.

Thanks for reminding me that 64bit translates to: recompile everything you
need!

I think this is exactly the choice of configuration we are going to make.
Someone just reminded me that windows and linux come down to the same
performance, but that the real overhead is on maintenance. It's true that
linux implies less downtime, and you can always upgrade. On top of that,
there's more granularity control on linux.

Thanks for the info.

Max

>
>
> --
> Mike Rylander
> mrylander@gmail.com
> GPLS -- PINES Development
> Database Developer
> http://open-ils.org
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>



Re: Splitting queries across servers

From
William Yu
Date:
Max wrote:
> That would be quite an investment. Interesting. I wonder if we should put
> windows or linux on it, since postgresql 8.0 works on windows.
>
> What OS are you running on your quad opteron ?

Well here's one thing that you can bank on. You can get 64-bit Linux now
while 64-bit Windows is still beta. (The word is that 64-bit Windows is
pretty stable except for the lack of drivers but I'd still feel queazy
about spending this much money to run a beta OS.)

And you absolutely want a 64-bit OS with that much memory. Even on our
development server with just 3GB of RAM, we see quite big jumps in
performance after moving from 32-bit Linux to 64-bit Linux. I'd hate to
think about 64GB of RAM being swapped in and out using PAE.

Re: Splitting queries across servers

From
PFC
Date:
> Thanks for reminding me that 64bit translates to: recompile everything
> you
> need!
>
> I think this is exactly the choice of configuration we are going to make.
> Someone just reminded me that windows and linux come down to the same
> performance, but that the real overhead is on maintenance. It's true that
> linux implies less downtime, and you can always upgrade. On top of that,
> there's more granularity control on linux.

    I won't say anything on linux vs windows performance (although I sure do
have an opinion), but simply keep in mind that postgres on window is young
and postgres on linux is vary mature ; therefore one is likely to have a
lot more performance refinements than the other.

    Also Linux is not that harder to administer (especially gentoo). After
all, all you want is a DB server, you'll install few software. Not like if
you needed 50 different apps.

    In your case recompiling everything is also a bonus because you'll be
sure that everything is 64-bit optimized down to the last driver.
    I wonder how long your killer machine will take to compile the whole OS.
And Postgres too... could you post that ?

    If you are concerned about security, you can compile everything with the
stack protector (Propolice SSP) enabled. It's extremely advisable. All the
other options like grsec et al are nice but require careful thinking,
propolice just needs to be activated. If you don't know what it is I'll
post docs.

Re: Splitting queries across servers

From
"Max"
Date:
>
>     I won't say anything on linux vs windows performance
> (although I sure do
> have an opinion), but simply keep in mind that postgres on window
> is young
> and postgres on linux is vary mature ; therefore one is likely to have a
> lot more performance refinements than the other.

This is a good point.

>
>     In your case recompiling everything is also a bonus because
> you'll be
> sure that everything is 64-bit optimized down to the last driver.
>     I wonder how long your killer machine will take to compile
> the whole OS.
> And Postgres too... could you post that ?

You probably want to ask Mike Rylander (who posted before) because he's
already done it.

>     If you are concerned about security, you can compile
> everything with the
> stack protector (Propolice SSP) enabled. It's extremely
> advisable. All the
> other options like grsec et al are nice but require careful thinking,
> propolice just needs to be activated. If you don't know what it is I'll
> post docs.
>

I have never heard of Propolice SSP. What is it ? Any relation to the honey
'Propolys'. just kidding.

Max




Re: Splitting queries across servers

From
"Max"
Date:
> And you absolutely want a 64-bit OS with that much memory. Even on our
> development server with just 3GB of RAM, we see quite big jumps in
> performance after moving from 32-bit Linux to 64-bit Linux. I'd hate to
> think about 64GB of RAM being swapped in and out using PAE.
>

What's PAE ?

Here's a question for the developers: what's the memory consumption
difference when you move a 3GB database from a 32 bit machine to a 64 ?
given that the whole thing must be in RAM.

Isn't the whole data taking up more RAM because pointers are now 64 bits
instead of 32 ?

Max



Re: Splitting queries across servers

From
PFC
Date:
> I have never heard of Propolice SSP. What is it ? Any relation to the
> honey
> 'Propolys'. just kidding.
>
> Max

    The name says little although I like it.

http://www.gentoo.org/proj/en/hardened/

    I was out of date -- Propolice has been renamed PaX.
    The hardened project has many parts, you should read the help on
grsecurity, but PaX is very interesting :

-------------------------------------------------------------------
 from http://www.gentoo.org/proj/en/hardened/docs/pax-howto.xml :

What is PaX?

PaX is a patch to the Linux kernel that provides hardening in two ways.

The first, ASLR (Address Space Layout Randomization) provides a means to
randomize the addressing scheme of all data loaded into memory. When an
application is built as a PIE (Position Independent Executable), PaX is
able to also randomize the addresses of the application base in addition.

The second protection provided by PaX is non-executable memory. This
prevents a common form of attack where executable code is inserted into
memory by an attacker. More information on PaX can be found throughout
this guide, but the homepage can be found at http://pax.grsecurity.net.

At run time, when a buffer is created, SSP adds a secret random value, the
canary, to the end of the buffer. When the function returns, SSP makes
sure that the canary is still intact. If an attacker were to perform a
buffer overflow, he would overwrite this value and trigger that stack
smashing handler.
-------------------------------------------------------------------
    For instance, imagine you have a version of Samba with the latest
unpatched hole. An attacker can enter. Now if you have PaX all he can do
is crash the process, and his intrusion attempt is detected and logged.

    It's not the final cure for everything, but it covers unpatched holes.

Re: Splitting queries across servers

From
Mike Rylander
Date:
On Sat, 29 Jan 2005 21:18:38 +0100, PFC <lists@boutiquenumerique.com> wrote:

[snip]

>         I wonder how long your killer machine will take to compile the whole OS.
> And Postgres too... could you post that ?

Well, I can't rebuild the whole OS (I'm using the box ... ) but:

postgresql-8.0.0beta4 $ time make -j 5
   ... lots of output ...
real    0m41.274s
user    1m36.315s
sys     0m15.451s

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: Splitting queries across servers

From
"Max"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of PFC
> Sent: Saturday, January 29, 2005 1:25 PM
> To: Max; PgSql General
> Subject: Re: [GENERAL] Splitting queries across servers
>
>
>
> > I have never heard of Propolice SSP. What is it ? Any relation to the
> > honey
> > 'Propolys'. just kidding.
> >
> > Max
>
>     The name says little although I like it.
>
> http://www.gentoo.org/proj/en/hardened/
>
>     I was out of date -- Propolice has been renamed PaX.
>     The hardened project has many parts, you should read the help on
> grsecurity, but PaX is very interesting :
>
> -------------------------------------------------------------------
>  from http://www.gentoo.org/proj/en/hardened/docs/pax-howto.xml :
>


This stuf is really cool. Thanks for the info.

Max


Re: Splitting queries across servers

From
PFC
Date:
> postgresql-8.0.0beta4 $ time make -j 5
>    ... lots of output ...
> real    0m41.274s
> user    1m36.315s
> sys     0m15.451s
>

    Yikes.

Re: Splitting queries across servers

From
William Yu
Date:
Max wrote:
>>And you absolutely want a 64-bit OS with that much memory. Even on our
>>development server with just 3GB of RAM, we see quite big jumps in
>>performance after moving from 32-bit Linux to 64-bit Linux. I'd hate to
>>think about 64GB of RAM being swapped in and out using PAE.
>>
>
>
> What's PAE ?

32-bit OS can only address up to 4GB of RAM. Of that 4GB, operating
systems usually split up the address range as 2GB for userspace and 2GB
for kernel. Depending on what OS/version you use, you might be able to
change this to 3/1 -- and I even recall a wacky mode under Linux that
can get you 4/4 (probably using PAE to swap both areas in and out).

So if you have more than 2/3/4GB of memory, the OS has to create a
"window" under the 4GB space and "swap" memory > 4GB in/out. It's not
really an actual swap of course -- that would absolutely cripple
performance to be unusable. It's more like an moving window that can
point to memory areas > 4GB. Still, there is a performance hit. If
process one needs memory at 6-7GB and process two needs it at 10-11GB,
there's going to be some contention for that window.

> Here's a question for the developers: what's the memory consumption
> difference when you move a 3GB database from a 32 bit machine to a 64 ?
> given that the whole thing must be in RAM.
>
> Isn't the whole data taking up more RAM because pointers are now 64 bits
> instead of 32 ?

Yes and no. Yes in that the memory used directly by Postgres takes twice
the amount of space. No in that the optimal setup for Postgres is to
have small settings for shared_buffers and leave the lion's share of
memory for the OS cache.

I have a bunch of servers -- both 32-bit and 64-bit. The 32-bit servers
@ 10K shared_buffers use 72MB of RAM. The 64-bit servers @ 10K
shared_buffers use 144MB of RAM. If these servers only had 256MB, it
would be a big deal. But since they're all in the gigabytes, the
difference between 3500MB of cache versus 3428 is rather insignificant.

Now if files on 64-bit OSes took up twice the space as files on 32-bit
OSes, the cache could only hold half the amount of data. But as far as I
can see, they don't. I don't think even the blocksizes are twice the
amount because most filesystems under Linux can already store files > 4GB.

Re: Splitting queries across servers

From
Tom Lane
Date:
"Max" <maxdl@adelphia.net> writes:
> Here's a question for the developers: what's the memory consumption
> difference when you move a 3GB database from a 32 bit machine to a 64 ?

Not a lot.

> Isn't the whole data taking up more RAM because pointers are now 64 bits
> instead of 32 ?

There are no pointers in on-disk data, so there's no difference at all
in file sizes, and thus none in shared-buffer requirements --- at least
not from the size of pointers.  (64-bit machines tend to have MAXALIGN
of 8 bytes instead of 4, which implies more alignment padding at the
ends of rows.  But that's usually a few-percent kind of cost, not a
factor of 2.)

The pointer size difference has a larger influence on Postgres' other
internal data structures, such as the shared lock table.  But by
comparison to the shared disk buffers, those generally don't amount
to anything.

            regards, tom lane

Re: Splitting queries across servers

From
"Guy Rouillier"
Date:
Max wrote:
> Actually winXP and win server 2003 do support AMD64, it's intel 64
> that's not supported yet.

To be accurate, Windows AMD64 support is still in beta.  There has been
a IA64 production release for quite some time, though Microsoft recently
announced that they were discontinuing it.

--
Guy Rouillier

Re: Splitting queries across servers

From
Chris
Date:
On Wed, 26 Jan 2005 08:00:36 -0800, Max <maxdl@adelphia.net> wrote:
> Hello,
>
> Our postgresql database is getting too big to be handled by one server. We
> need the database to be in RAM and cannot afford swapping. At the moment,
> we're using only 3GB or RAM, however our business growth is going to drive
> this number into the double digits zone, maybe triple digits.
>
> What are our options ?
>
Depending on what your queries are like, you might take a look at
memcached (www.danga.com).  It's a distributed memory cache that uses
simple key/value storage, and it's extremely fast and uses very little
cpu.

Chris