Thread: Hardware recommendations to scale to silly load

Hardware recommendations to scale to silly load

From
matt
Date:
I'm wondering if the good people out there could perhaps give me some
pointers on suitable hardware to solve an upcoming performance issue.
I've never really dealt with these kinds of loads before, so any
experience you guys have would be invaluable.  Apologies in advance for
the amount of info below...

My app is likely to come under some serious load in the next 6 months,
but the increase will be broadly predictable, so there is time to throw
hardware at the problem.

Currently I have a ~1GB DB, with the largest (and most commonly accessed
and updated) two tables having 150,000 and 50,000 rows.

A typical user interaction with the system involves about 15
single-table selects, 5 selects with joins or subqueries, 3 inserts, and
3 updates.  The current hardware probably (based on benchmarking and
profiling) tops out at about 300 inserts/updates *or* 2500 selects per
second.

There are multiple indexes on each table that updates & inserts happen
on.  These indexes are necessary to provide adequate select performance.

Current hardware/software:
Quad 700MHz PIII Xeon/1MB cache
3GB RAM
RAID 10 over 4 18GB/10,000rpm drives
128MB battery backed controller cache with write-back enabled
Redhat 7.3, kernel 2.4.20
Postgres 7.2.3 (stock redhat issue)

I need to increase the overall performance by a factor of 10, while at
the same time the DB size increases by a factor of 50.  e.g. 3000
inserts/updates or 25,000 selects per second, over a 25GB database with
most used tables of 5,000,000 and 1,000,000 rows.

Notably, the data is very time-sensitive, so the active dataset at any
hour is almost certainly going to be more on the order of 5GB than 25GB
(plus I'll want all the indexes in RAM of course).

Also, and importantly, the load comes but one hour per week, so buying a
Starfire isn't a real option, as it'd just sit idle the rest of the
time.  I'm particularly interested in keeping the cost down, as I'm a
shareholder in the company!

So what do I need?  Can anyone who has (or has ever had) that kind of
load in production offer any pointers, anecdotes, etc?  Any theoretical
musings also more than welcome.  Comments upon my sanity will be
referred to my doctor.

If the best price/performance option is a second hand 32-cpu Alpha
running VMS I'd be happy to go that way ;-)

Many thanks for reading this far.

Matt




Re: Hardware recommendations to scale to silly load

From
Bill Moran
Date:
matt wrote:
>>Are you sure?  Have you tested the overall application to see if possibly
>>you gain more on insert performance than you lose on select performance?
>
> Unfortunately dropping any of the indexes results in much worse select
> performance that is not remotely clawed back by the improvement in
> insert performance.

Bummer.  It was just a thought: never assume dropping indexes will hurt
performance.  But, since you've obviously tested ...

> Actually there doesn't really seem to *be* that much improvement in
> insert performance when going from 3 indexes to 2.  I guess indexes must
> be fairly cheap for PG to maintain?

Don't know how "cheap" they are.

I have an app that does large batch updates. I found that if I dropped
the indexes, did the updates and recreated the indexes, it was faster
than doing the updates while the indexes were intact.

It doesn't sound like your app can use that approach, but I thought I'd
throw it out there.
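
For the record, the pattern looks roughly like this (the table, index and
column names are made up for illustration; the real DDL obviously depends
on your schema):

BEGIN;
DROP INDEX accounts_balance_idx;
DROP INDEX accounts_updated_idx;

-- the big batch update runs without any index maintenance overhead
UPDATE accounts
   SET balance = balance * 1.01
 WHERE updated < '2003-08-01';

CREATE INDEX accounts_balance_idx ON accounts (balance);
CREATE INDEX accounts_updated_idx ON accounts (updated);
COMMIT;

Doing it all in one transaction means readers never see the table without
its indexes, though they will queue up behind the locks while it runs.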

>>It's possible that compiling Postgres manually with proper optimizations
>>could yield some improvements, as well as building a custom kernel in
>>Redhat.
>>
>>Also, you don't mention which filesystem you're using:
>>http://www.potentialtech.com/wmoran/postgresql.php
>
> Yeah, I can imagine getting 5% extra from a slim kernel and
> super-optimised PG.
>
> The FS is ext3, metadata journaling (the default), mounted noatime.

ext3 is more reliable than ext2, but it's about 1.1x slower.  You can squeeze
a little extra performance out of Reiser or JFS if you're not willing to take
the risk of ext2; either way, it's a pretty minor improvement.

Does noatime make much difference on a PostgreSQL database?  I haven't
tested that yet.

>>But if you're in the situation where you have more time than money,
>>you may find that an overall audit of your app is worthwhile.  Consider
>>taking parts that are in perl (for example) and recoding them into C
>>(that is, unless you've already identified that all the bottlenecks are
>>at the PostgreSQL server)
>
> I can pretty cheaply add more CPU horsepower for the app servers, as
> they scale horizontally, so I can chuck in a couple (or 3, or 4, or ...)
> more dual-cpu boxen with a gig of ram and tell the load balancer about
> them.  The problem with the DB is that that approach simply won't work -
> the box just has to get bigger!

Can you split it onto multiple boxes?  Some database layouts lend themselves
to this, others don't.  Obviously you can't do joins from one server to
another, so you may lose more in multiple queries than you gain by having
multiple servers.  It's worth looking into though.

I know my answers aren't quite the ones you were looking for, but my
experience is that many people try to solve poor application design
by simply throwing bigger hardware at the problem.  It appears as though
you've already done your homework, though.

Hope this has been _some_ help.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: Hardware recommendations to scale to silly load

From
matt
Date:
> You probably, more than anything, should look at some kind of
> superfast, external storage array

Yeah, I think that's going to be a given.  Low end EMC FibreChannel
boxes can do around 20,000 IOs/sec, which is probably close to good
enough.

You mentioned using multiple RAID controllers as a boost - presumably
the trick here is to split the various elements (WAL, tables, indexes)
across different controllers using symlinks or suchlike?  Can I feasibly
split the DB tables across 5 or more controllers?

> > Also, and importantly, the load comes but one hour per week, so buying a
> > Starfire isn't a real option, as it'd just sit idle the rest of the
> > time.  I'm particularly interested in keeping the cost down, as I'm a
> > shareholder in the company!
>
> Interesting.  If you can't spread the load out, can you batch some parts
> of it?  Or is the whole thing interactive therefore needing to all be
> done in real time at once?

All interactive I'm afraid.  It's a micropayment system that's going to
be used here in the UK to do online voting for a popular TV programme.
The phone voting system has a hard limit of [redacted] million votes per
hour, and the producers would like to be able to tell people to vote
online if the phone lines are busy.  They can vote online anyway, but we
expect the average viewer to have to make 10 calls just to get through
during peak times, so the attraction is obvious.

> whether you like it or not, you're gonna need heavy iron if you need to do
> this all in one hour once a week.

Yeah, I need to rent a Starfire for a month later this year, anybody got
one lying around?  Near London?

> Actually, I've seen stuff like that going on Ebay pretty cheap lately.  I
> saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going
> for $24,000 a month ago.  Put Linux or BSD on it and Postgresql should
> fly.

Jeez, and I thought I was joking about the Starfire.  Even Slowaris
would be OK on one of them.

The financial issue is that there's just not that much money in the
micropayments game for bursty sales.  If I was doing these loads
*continuously* then I wouldn't be working, I'd be in the Maldives :-)

I'm also looking at renting equipment, or even trying out IBM/HP's
'on-demand' offerings.




Re: Hardware recommendations to scale to silly load

From
"scott.marlowe"
Date:
On 27 Aug 2003, matt wrote:

> I'm wondering if the good people out there could perhaps give me some
> pointers on suitable hardware to solve an upcoming performance issue.
> I've never really dealt with these kinds of loads before, so any
> experience you guys have would be invaluable.  Apologies in advance for
> the amount of info below...
>
> My app is likely to come under some serious load in the next 6 months,
> but the increase will be broadly predictable, so there is time to throw
> hardware at the problem.
>
> Currently I have a ~1GB DB, with the largest (and most commonly accessed
> and updated) two tables having 150,000 and 50,000 rows.
>
> A typical user interaction with the system involves about 15
> single-table selects, 5 selects with joins or subqueries, 3 inserts, and
> 3 updates.  The current hardware probably (based on benchmarking and
> profiling) tops out at about 300 inserts/updates *or* 2500 selects per
> second.
>
> There are multiple indexes on each table that updates & inserts happen
> on.  These indexes are necessary to provide adequate select performance.
>
> Current hardware/software:
> Quad 700MHz PIII Xeon/1MB cache
> 3GB RAM
> RAID 10 over 4 18GB/10,000rpm drives
> 128MB battery backed controller cache with write-back enabled
> Redhat 7.3, kernel 2.4.20
> Postgres 7.2.3 (stock redhat issue)
>
> I need to increase the overall performance by a factor of 10, while at
> the same time the DB size increases by a factor of 50.  e.g. 3000
> inserts/updates or 25,000 selects per second, over a 25GB database with
> most used tables of 5,000,000 and 1,000,000 rows.

It will likely take a combination of optimizing your database structure /
methods and increasing your hardware / OS performance.

You probably, more than anything, should look at some kind of
superfast, external storage array that has dozens of drives, and a large
battery backed cache.  You may be able to approximate this yourself with
just a few dual channel Ultra 320 SCSI cards and a couple dozen hard
drives.  The more spindles you throw at a database, generally speaking,
the more parallel load it can handle.

You may find that once you get to 10 or 20 drives, RAID 5 or 5+0 or 0+5
will be outrunning 1+0/0+1 due to fewer writes.

You likely want to look at the fastest CPUs with the fastest memory you
can afford.  Those 700MHz Xeons are likely using PC133 memory, which is
painfully slow compared to the newer stuff, which pumps data out at 4 to 8
times the rate of the older memory.

Maybe an SGI Altix could do this?  Have you looked at them?  They're not
cheap, but they do look to be quite fast, and can scale to 64 CPUs if need
be.  Their interbox communication fabric is faster than most CPUs' front
side busses.

> Notably, the data is very time-sensitive, so the active dataset at any
> hour is almost certainly going to be more on the order of 5GB than 25GB
> (plus I'll want all the indexes in RAM of course).
>
> Also, and importantly, the load comes but one hour per week, so buying a
> Starfire isn't a real option, as it'd just sit idle the rest of the
> time.  I'm particularly interested in keeping the cost down, as I'm a
> shareholder in the company!

Interesting.  If you can't spread the load out, can you batch some parts
of it?  Or is the whole thing interactive therefore needing to all be
done in real time at once?

> So what do I need?

whether you like it or not, you're gonna need heavy iron if you need to do
this all in one hour once a week.

> Can anyone who has (or has ever had) that kind of
> load in production offer any pointers, anecdotes, etc?  Any theoretical
> musings also more than welcome.  Comments upon my sanity will be
> referred to my doctor.
>
> If the best price/performance option is a second hand 32-cpu Alpha
> running VMS I'd be happy to go that way ;-)

Actually, I've seen stuff like that going on Ebay pretty cheap lately.  I
saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going
for $24,000 a month ago.  Put Linux or BSD on it and Postgresql should
fly.


Re: Hardware recommendations to scale to silly load

From
matt
Date:
> Don't know how "cheap" they are.
>
> I have an app that does large batch updates. I found that if I dropped
> the indexes, did the updates and recreated the indexes, it was faster
> than doing the updates while the indexes were intact.

Yeah, unfortunately it's not batch work, but real time financial work.
If I drop all the indexes my select performance goes through the floor,
as you'd expect.

> Does noatime make much difference on a PostgreSQL database?  I haven't
> tested that yet.

Yup, it does.  In fact it should probably be in the standard install
documentation (unless someone has a reason why it shouldn't).  Who
*cares* when PG last looked at the tables?  If 'nomtime' was available
that would probably be a good thing too.

> Can you split it onto multiple boxes?  Some database layouts lend themselves
> to this, others don't.  Obviously you can't do joins from one server to
> another, so you may lose more in multiple queries than you gain by having
> multiple servers.  It's worth looking into though.

I'm considering that.  There are some tables which I might be able to
split out.  There may even be some things I can pull from the DB
altogether (session info in particular, so long as I can reliably send a
given user's requests to the same app server each time, bearing in mind
I can't see the cookies too easily because 50% of the requests are over
SSL)

> I know my answers aren't quite the ones you were looking for, but my
> experience is that many people try to solve poor application design
> by simply throwing bigger hardware at the problem.  It appears as though
> you've already done your homework, though.

Well, I *hope* that's the case!  The core issue is simply that we have
to deal with an insane load for 1 hour a week, and there's just no
avoiding it.

Maybe I can get Sun/HP/IBM to lend some gear (it's a pretty high-profile
site).


Improving simple textsearch?

From
Fabian Kreitner
Date:
Hi,

can anyone point me to information regarding this please?

Objective is to find entries that match one (or more) supplied strings in
two tables. The first has about 20.000 entries with 1 varchar field to
check, the other about 40.000 with 5 varchar fields to check. The currently
used sequential scan is getting too expensive.

Thanks,
   Fabian


Re: Hardware recommendations to scale to silly load

From
Christopher Browne
Date:
Martha Stewart called it a Good Thing when matt@ymogen.net (matt) wrote:
> I'm also looking at renting equipment, or even trying out IBM/HP's
> 'on-demand' offerings.

You're assuming that this is likely to lead to REAL savings, and that
seems unlikely.

During the recent power outage in the NorthEast, people looking for
generators and fuel were paying _premium_ prices, not discounted
prices.

If your hardware requirement leads to someone having to buy hardware
to support your peak load, then _someone_ has to pay the capital cost,
and that someone is unlikely to be IBM or HP.  "Peak demand" equipment
is likely to attract pretty "peaked" prices.

If you can find someone who needs the hardware during the day, but who
_never_ needs it during your needful hours, then there might be an
arrangement to be had, assuming the "someone else" trusts you to use
what's, at other times, their hardware, and assuming you trust them
with the financial information you're managing.
--
select 'cbbrowne' || '@' || 'ntlug.org';
http://www3.sympatico.ca/cbbrowne/linux.html
Rules of the   Evil Overlord #170.  "I  will  be  an equal-opportunity
despot and make sure that terror and oppression is distributed fairly,
not just against one particular  group that  will  form the core of  a
rebellion." <http://www.eviloverlord.com/>

Re: Hardware recommendations to scale to silly load

From
matt
Date:
> Are you sure?  Have you tested the overall application to see if possibly
> you gain more on insert performance than you lose on select performance?

Unfortunately dropping any of the indexes results in much worse select
performance that is not remotely clawed back by the improvement in
insert performance.

Actually there doesn't really seem to *be* that much improvement in
insert performance when going from 3 indexes to 2.  I guess indexes must
be fairly cheap for PG to maintain?

> It's possible that compiling Postgres manually with proper optimizations
> could yield some improvements, as well as building a custom kernel in
> Redhat.
>
> Also, you don't mention which filesystem you're using:
> http://www.potentialtech.com/wmoran/postgresql.php

Yeah, I can imagine getting 5% extra from a slim kernel and
super-optimised PG.

The FS is ext3, metadata journaling (the default), mounted noatime.

> But if you're in the situation where you have more time than money,
> you may find that an overall audit of your app is worthwhile.  Consider
> taking parts that are in perl (for example) and recoding them into C
> (that is, unless you've already identified that all the bottlenecks are
> at the PostgreSQL server)

I can pretty cheaply add more CPU horsepower for the app servers, as
they scale horizontally, so I can chuck in a couple (or 3, or 4, or ...)
more dual-cpu boxen with a gig of ram and tell the load balancer about
them.  The problem with the DB is that that approach simply won't work -
the box just has to get bigger!

> I doubt if the suggestions I've made are going to get you 10x, but they
> may get you 2x, and then you only need the hardware to do 5x.

It all helps :-)  A few percent here, a few percent there, pretty soon
you're talking serious improvements...

Thanks

Matt


Re: Hardware recommendations to scale to silly load

From
Andrew Sullivan
Date:
On Wed, Aug 27, 2003 at 02:35:13AM +0100, matt wrote:

> I need to increase the overall performance by a factor of 10, while at
> the same time the DB size increases by a factor of 50.  e.g. 3000
> inserts/updates or 25,000 selects per second, over a 25GB database with
> most used tables of 5,000,000 and 1,000,000 rows.

Your problem is mostly going to be disk related.  You can only get as many
tuples in there per second as your disk rotates per second.  I
suspect what you need is really expensive disk hardware (sorry to
tell you that) set up as RAID 1+0 on fibre channel or something.
3000 write transactions per second is probably too much to ask for
any standard hardware.

But given that you are batching this once a week, and trying to avoid
big expenses, are you sure this is the right approach?  Perhaps you
should consider a redesign using COPY and such?

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Hardware recommendations to scale to silly load

From
Christopher Browne
Date:
After takin a swig o' Arrakan spice grog, scott.marlowe@ihs.com
("scott.marlowe") belched out... :-):
> whether you like it or not, you're gonna need heavy iron if you need
> to do this all in one hour once a week.

The other thing worth considering is trying to see if there is a way
of partitioning the workload across multiple hosts.

At the point that you start going past hardware that is
"over-the-counter commodity" stuff, the premiums start getting pretty
high.  Dual-CPU Intel boxes are pretty cheap compared to buncha-CPU
Sparc boxes.

If some sort of segmentation of the workload can be done, whether by
area code, postal code, or perhaps the last couple digits of the
caller's phone number, or even a "round robin," it's likely to be a
lot cheaper to get an array of 4 Dual-Xeon boxes with 8 disk drives
apiece than a Sun/HP/IBM box with 16 CPUs.
--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/linuxxian.html
"Show me... show me... show me... COMPUTERS!"

Re: Hardware recommendations to scale to silly load

From
Bill Moran
Date:
Christopher Browne wrote:
> Martha Stewart called it a Good Thing when matt@ymogen.net (matt) wrote:
>
>>I'm also looking at renting equipment, or even trying out IBM/HP's
>>'on-demand' offerings.
>
> You're assuming that this is likely to lead to REAL savings, and that
> seems unlikely.
>
> During the recent power outage in the NorthEast, people looking for
> generators and fuel were paying _premium_ prices, not discounted
> prices.
>
> If your hardware requirement leads to someone having to buy hardware
> to support your peak load, then _someone_ has to pay the capital cost,
> and that someone is unlikely to be IBM or HP.  "Peak demand" equipment
> is likely to attract pretty "peaked" prices.
>
> If you can find someone who needs the hardware during the day, but who
> _never_ needs it during your needful hours, then there might be an
> arrangement to be had, assuming the "someone else" trusts you to use
> what's, at other times, their hardware, and assuming you trust them
> with the financial information you're managing.

I hadn't considered this, but that's not a bad idea.

With FreeBSD, you have jails, which allow multiple users to share
hardware without having to worry about user A looking at user B's
stuff.  Does such a paradigm exist on any heavy iron?  I have no
idea where you'd go to find this kind of "co-op" server leasing,
but it sure sounds like it could work.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: Hardware recommendations to scale to silly load

From
Ron Johnson
Date:
On Tue, 2003-08-26 at 20:35, matt wrote:
> I'm wondering if the good people out there could perhaps give me some
> pointers on suitable hardware to solve an upcoming performance issue.
> I've never really dealt with these kinds of loads before, so any
> experience you guys have would be invaluable.  Apologies in advance for
> the amount of info below...
>
> My app is likely to come under some serious load in the next 6 months,
> but the increase will be broadly predictable, so there is time to throw
> hardware at the problem.
>
> Currently I have a ~1GB DB, with the largest (and most commonly accessed
> and updated) two tables having 150,000 and 50,000 rows.
>
> A typical user interaction with the system involves about 15
> single-table selects, 5 selects with joins or subqueries, 3 inserts, and
> 3 updates.  The current hardware probably (based on benchmarking and
> profiling) tops out at about 300 inserts/updates *or* 2500 selects per
> second.
>
> There are multiple indexes on each table that updates & inserts happen
> on.  These indexes are necessary to provide adequate select performance.
>
> Current hardware/software:
> Quad 700MHz PIII Xeon/1MB cache
> 3GB RAM
> RAID 10 over 4 18GB/10,000rpm drives
> 128MB battery backed controller cache with write-back enabled

Much more cache needed.  Say 512MB per controller?

> Redhat 7.3, kernel 2.4.20
> Postgres 7.2.3 (stock redhat issue)

Upgrade to Pg 7.3.4!

> I need to increase the overall performance by a factor of 10, while at
> the same time the DB size increases by a factor of 50.  e.g. 3000

Are you *sure* about that????  3K updates/inserts per second xlates
to 10,800,000 per hour.  That, my friend, is a WHOLE HECK OF A LOT!

> inserts/updates or 25,000 selects per second, over a 25GB database with

Likewise: 90,000,000 selects per hour.

> most used tables of 5,000,000 and 1,000,000 rows.
>
> Notably, the data is very time-sensitive, so the active dataset at any

During the 1 hour surge, will SELECTs at 10 minutes after the
hour depend on INSERTs at 5 minutes after the hour?

If not, maybe you could pump the INSERT/UPDATE records into
flat files, to be processed after the 1-hour surge is complete.
That may reduce the h/w requirements.
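
For example (purely hypothetical file and table names), loading the INSERT
side back in afterwards could be little more than:

-- staging table matching the flat-file layout
CREATE TABLE votes_staging (
    account_id  integer,
    amount      numeric(10,2),
    voted_at    timestamp
);

-- bulk-load whatever the app appended to the flat file during the hour
COPY votes_staging FROM '/var/spool/votes/surge.dat';

-- then fold it into the real tables in one pass
INSERT INTO votes (account_id, amount, voted_at)
SELECT account_id, amount, voted_at
  FROM votes_staging;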

> hour is almost certainly going to be more on the order of 5GB than 25GB
> (plus I'll want all the indexes in RAM of course).
>
> Also, and importantly, the load comes but one hour per week, so buying a

Only one hour out of 168?????  May I ask what kind of app it is?

> Starfire isn't a real option, as it'd just sit idle the rest of the
> time.  I'm particularly interested in keeping the cost down, as I'm a
> shareholder in the company!

What a fun exercise.  Ok, let's see:
Postgres 7.3.4
RH AS 2.1
12GB RAM
motherboard with 64 bit 66MHz PCI slots
4 - Xeon 3.0GHz (1MB cache) CPUs
8 - 36GB 15K RPM as RAID10 on a 64 bit 66MHz U320 controller
    having 512MB cache (for database)
2 - 36GB 15K RPM as RAID1 on a 64 bit 66MHz U320 controller
    having 512MB cache (for OS, swap, WAL files)
1 - library tape drive plugged into the OS' SCSI controller.  I
    prefer DLT, but that's my DEC bias.
1 - 1000 volt UPS.

If you know when the flood will be coming, you could perform
SELECT * FROM ... WHERE statements on an indexed field, to
pull the relevant data into Linux's buffers.
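
Something as simple as this, say (table and column invented; a plain
SELECT * does the job too, count(*) just avoids shipping the rows back to
the client):

-- an hour or two before the surge: touch the rows the show will need
SELECT count(*)
  FROM accounts
 WHERE last_active > now() - interval '7 days';

Run against an indexed field, that drags the relevant pages into the
kernel's buffer cache so the first real queries of the night aren't
waiting on the disks.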

Yes, the 8 disks are capacity-overkill, but the 8 high-speed
spindles are what you're looking for.

> So what do I need?  Can anyone who has (or has ever had) that kind of
> load in production offer any pointers, anecdotes, etc?  Any theoretical
> musings also more than welcome.  Comments upon my sanity will be
> referred to my doctor.
>
> If the best price/performance option is a second hand 32-cpu Alpha
> running VMS I'd be happy to go that way ;-)

I'd love to work on a GS320!  You may even pick one up for a million
or 2.  The license costs for VMS & Rdb would eat you, though.

Rdb *does* have ways, though, using large buffers and hashed indexes,
with the table tuples stored on the same page as the hashed index
keys, to make such accesses *blazingly* fast.

> Many thanks for reading this far.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"A C program is like a fast dance on a newly waxed dance floor
by people carrying razors."
Waldi Ravens


Re: Hardware recommendations to scale to silly load

From
Bill Moran
Date:
matt wrote:
> I'm wondering if the good people out there could perhaps give me some
> pointers on suitable hardware to solve an upcoming performance issue.
> I've never really dealt with these kinds of loads before, so any
> experience you guys have would be invaluable.  Apologies in advance for
> the amount of info below...
>
> My app is likely to come under some serious load in the next 6 months,
> but the increase will be broadly predictable, so there is time to throw
> hardware at the problem.
>
> Currently I have a ~1GB DB, with the largest (and most commonly accessed
> and updated) two tables having 150,000 and 50,000 rows.
>
> A typical user interaction with the system involves about 15
> single-table selects, 5 selects with joins or subqueries, 3 inserts, and
> 3 updates.  The current hardware probably (based on benchmarking and
> profiling) tops out at about 300 inserts/updates *or* 2500 selects per
> second.
>
> There are multiple indexes on each table that updates & inserts happen
> on.  These indexes are necessary to provide adequate select performance.

Are you sure?  Have you tested the overall application to see if possibly
you gain more on insert performance than you lose on select performance?

(Hey, you asked for musings ...)

> Current hardware/software:
> Quad 700MHz PIII Xeon/1MB cache
> 3GB RAM
> RAID 10 over 4 18GB/10,000rpm drives
> 128MB battery backed controller cache with write-back enabled
> Redhat 7.3, kernel 2.4.20
> Postgres 7.2.3 (stock redhat issue)

It's possible that compiling Postgres manually with proper optimizations
could yield some improvements, as well as building a custom kernel in
Redhat.

Also, you don't mention which filesystem you're using:
http://www.potentialtech.com/wmoran/postgresql.php

> I need to increase the overall performance by a factor of 10, while at
> the same time the DB size increases by a factor of 50.  e.g. 3000
> inserts/updates or 25,000 selects per second, over a 25GB database with
> most used tables of 5,000,000 and 1,000,000 rows.
>
> Notably, the data is very time-sensitive, so the active dataset at any
> hour is almost certainly going to be more on the order of 5GB than 25GB
> (plus I'll want all the indexes in RAM of course).
>
> Also, and importantly, the load comes but one hour per week, so buying a
> Starfire isn't a real option, as it'd just sit idle the rest of the
> time.  I'm particularly interested in keeping the cost down, as I'm a
> shareholder in the company!

I can't say for sure without looking at your application overall, but
many applications I've seen could be optimized.  It's usually a few
seconds here and there that take hours to find and tweak.

But if you're in the situation where you have more time than money,
you may find that an overall audit of your app is worthwhile.  Consider
taking parts that are in perl (for example) and recoding them into C
(that is, unless you've already identified that all the bottlenecks are
at the PostgreSQL server)

I doubt if the suggestions I've made are going to get you 10x, but they
may get you 2x, and then you only need the hardware to do 5x.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: Hardware recommendations to scale to silly load

From
Ron Johnson
Date:
On Wed, 2003-08-27 at 21:26, Bill Moran wrote:
> Christopher Browne wrote:
> > Martha Stewart called it a Good Thing when matt@ymogen.net (matt) wrote:
[snip]
> With FreeBSD, you have jails, which allow multiple users to share
> hardware without having to worry about user A looking at user B's
> stuff.  Does such a paradigm exist on any heavy iron?  I have no

IBM invented the idea (or maybe stole it) back in the '70s.  The
VM hypervisor was designed as a conversion tool, to let customers
run both OS/MVS and DOS/VSE, to aid in converting from VSE to MVS.

Customers, the cheap, uncooperative beasts, liked VSE, but also liked
VM, since it let them have, for example, a dev, test, and production
"systems" all on the same piece of h/w, thus saving them oodles of
money in h/w costs and maintenance fees.

Yes, yes, the modern term for this is "server consolidation", and
VMware does the same thing, 30 years after dinosaur customers had
it on boxen that academics, analysts and "young whippersnappers"
said were supposed to be extinct 20 years ago.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"Knowledge should be free for all."
Harcourt Fenton Mudd, Star Trek:TOS, "I, Mudd"


Re: Hardware recommendations to scale to silly load

From
Tomka Gergely
Date:
On the radiant day of 2003-08-27, matt wrote:

> Yeah, I can imagine getting 5% extra from a slim kernel and
> super-optimised PG.

Hm, about 20%, but that's only for correctness's sake - 20% won't help you either :(

> The FS is ext3, metadata journaling (the default), mounted noatime.

Worst fs under linux :) Try xfs.

--
Tomka Gergely
"S most - vajon barbárok nélkül mi lesz velünk?
Ők mégiscsak megoldás voltak valahogy..."


Re: Hardware recommendations to scale to silly load

From
Tomka Gergely
Date:
On the radiant day of 2003-08-27, Bill Moran wrote:

> With FreeBSD, you have jails, which allow multiple users to share
> hardware without having to worry about user A looking at user B's
> stuff.  Does such a paradigm exist on any heavy iron?  I have no

Of course. All IBM hw can do this, because Linux runs on all IBM hw, and
Linux has more than one way to do it :)

--
Tomka Gergely
"S most - vajon barbárok nélkül mi lesz velünk?
Ők mégiscsak megoldás voltak valahogy..."


Re: Hardware recommendations to scale to silly load

From
matt
Date:
> Are you *sure* about that????  3K updates/inserts per second xlates
> to 10,800,000 per hour.  That, my friend, is a WHOLE HECK OF A LOT!

Yup, I know!

> During the 1 hour surge, will SELECTs at 10 minutes after the
> hour depend on INSERTs at 5 minutes after the hour?

Yes, they do.  It's a payments system, so things like account balances
and purchase histories have to be updated in real time.

> Only one hour out of 168?????  May I ask what kind of app it is?

Online voting for an unnamed TV show...

> > If the best price/performance option is a second hand 32-cpu Alpha
> > running VMS I'd be happy to go that way ;-)
>
> I'd love to work on a GS320!  You may even pick one up for a million
> or 2.  The license costs for VMS & Rdb would eat you, though.

You'd be amazed how little they do go for actually :-)




Re: Hardware recommendations to scale to silly load

From
Ron Johnson
Date:
On Thu, 2003-08-28 at 03:17, matt wrote:
> > Are you *sure* about that????  3K updates/inserts per second xlates
> > to 10,800,000 per hour.  That, my friend, is a WHOLE HECK OF A LOT!
>
> Yup, I know!
>
> > During the 1 hour surge, will SELECTs at 10 minutes after the
> > hour depend on INSERTs at 5 minutes after the hour?
>
> Yes, they do.  It's a payments system, so things like account balances
> and purchase histories have to be updated in real time.
>
> > Only one hour out of 168?????  May I ask what kind of app it is?
>
> Online voting for an unnamed TV show...
>
> > > If the best price/performance option is a second hand 32-cpu Alpha
> > > running VMS I'd be happy to go that way ;-)
> >
> > I'd love to work on a GS320!  You may even pick one up for a million
> > or 2.  The license costs for VMS & Rdb would eat you, though.
>
> You'd be amazed how little they do go for actually :-)

Then that's what I'd do.  VMS, Rdb, (your favorite 3GL language).
Presumably the SELECT statements will be direct lookup instead
of range retrieval?  If so, then I'd create a *large* amount of
GLOBAL BUFFERS, many MIXED AREAs, tables PLACED VIA HASHED INDEXES
so that the index nodes are on the same page as the corresponding
tuples.  Thus, 1 disk I/O gets both the relevant index key, plus
the tuple.  (Each I/O reads 3 pages into GBs [Global Buffers], so
that if a later statement needs a record nearby, it's already in
RAM.)

With fast storage controllers (dual-redundant, with 512MB each)
you could even use RAID5, and your app may not even know the diffie.
Of course, since the requirements are *so* extreme, better still
stick to RAID10.

I know that a certain pharmaceutical company had a similar situation,
where test results would flood in every morning.  A certain North-
eastern US wireless phone company needed to record every time every
phone call was started and stopped.

The technique I described is how both of these high-volume apps
solved The Need For Speed.

With VMS 7.3 and Rdb 7.1.04 and, oh, 16GB RAM, a carefully crafted
stored procedure run an hour or 2 before the show could pull the
necessary 5GB slice of the DB into GBs, and you'd reduce the I/O
load during the show itself.

Sorry it's not PostgreSQL, but I *know* that Rdb+VMS could handle
the task...

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"You ask us the same question every day, and we give you the
same answer every day. Someday, we hope that you will believe us..."
U.S. Secretary of Defense Donald Rumsfeld, to a reporter


Re: Hardware recommendations to scale to silly load

From
Chris Bowlby
Date:
On Tue, 2003-08-26 at 23:59, Ron Johnson wrote:

> What a fun exercise.  Ok, let's see:
> Postgres 7.3.4
> RH AS 2.1
> 12GB RAM
> motherboard with 64 bit 66MHz PCI slots
> 4 - Xeon 3.0GHz (1MB cache) CPUs
> 8 - 36GB 15K RPM as RAID10 on a 64 bit 66MHz U320 controller
>     having 512MB cache (for database)
> 2 - 36GB 15K RPM as RAID1 on a 64 bit 66MHz U320 controller
>     having 512MB cache (for OS, swap, WAL files)
> 1 - library tape drive plugged into the OS' SCSI controller.  I
>     prefer DLT, but that's my DEC bias.
> 1 - 1000 volt UPS.

 Be careful here: we've seen that with hyper-threaded P4 Xeons, a system
with very high disk I/O becomes sluggish and slow. But after disabling the
hyper-threading itself, our system flew..

--
Chris Bowlby <excalibur@hub.org>
Hub.Org Networking Services


Re: Hardware recommendations to scale to silly load

From
"Shridhar Daithankar"
Date:
On 28 Aug 2003 at 11:05, Chris Bowlby wrote:

> On Tue, 2003-08-26 at 23:59, Ron Johnson wrote:
>
> > What a fun exercise.  Ok, let's see:
> > Postgres 7.3.4
> > RH AS 2.1
> > 12GB RAM
> > motherboard with 64 bit 66MHz PCI slots
> > 4 - Xeon 3.0GHz (1MB cache) CPUs
> > 8 - 36GB 15K RPM as RAID10 on a 64 bit 66MHz U320 controller
> >     having 512MB cache (for database)
> > 2 - 36GB 15K RPM as RAID1 on a 64 bit 66MHz U320 controller
> >     having 512MB cache (for OS, swap, WAL files)
> > 1 - library tape drive plugged into the OS' SCSI controller.  I
> >     prefer DLT, but that's my DEC bias.
> > 1 - 1000 volt UPS.
>
>  Be careful here: we've seen that with hyper-threaded P4 Xeons, a system
> with very high disk I/O becomes sluggish and slow. But after disabling the
> hyper-threading itself, our system flew..

Anybody have Opteron working? How's the performance?

Bye
 Shridhar

--
A father doesn't destroy his children.        -- Lt. Carolyn Palamas, "Who Mourns for
Adonais?",           stardate 3468.1.


Re: Hardware recommendations to scale to silly load

From
Vivek Khera
Date:
sm> On 27 Aug 2003, matt wrote:

>> My app is likely to come under some serious load in the next 6 months,
>> but the increase will be broadly predictable, so there is time to throw
>> hardware at the problem.
>>
>> Currently I have a ~1GB DB, with the largest (and most commonly accessed
>> and updated) two tables having 150,000 and 50,000 rows.

Just how big do you expect your DB to grow?  For a 1GB disk-space
database, I'd probably just splurge for an SSD hooked up either via
SCSI or FibreChannel.  Heck, up to about 5Gb or so it is not that
expensive (about $25k) and adding another 5Gb should set you back
probably another $20k.  I use an SSD from Imperial Technology
( http://www.imperialtech.com/ ) for mail spools.  My database is way
too big for my budget to put in SSD.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Hardware recommendations to scale to silly load

From
Rod Taylor
Date:
> I need to increase the overall performance by a factor of 10, while at
> the same time the DB size increases by a factor of 50.  e.g. 3000
> inserts/updates or 25,000 selects per second, over a 25GB database with
> most used tables of 5,000,000 and 1,000,000 rows.

Ok.. I would be surprised if you needed much more actual CPU power. I
suspect they're mostly idle waiting on data -- especially with a Quad
Xeon (shared memory bus is it not?).

I'd be looking to get your hands on a large pSeries machine from IBM or
perhaps an 8-way Opteron (not that hard to come by today, should be easy
in the near future). The key is low latency ram tied to a chip rather
than a centralized bus -- a 3800 SunFire would do too ;).

Write performance won't matter very much. 3000 inserts/second isn't high
-- some additional battery backed write cache may be useful but not
overly important with enough ram to hold the complete dataset. I suspect
those are slow due to things like foreign keys -- which of course are
selects.

> Notably, the data is very time-sensitive, so the active dataset at any
> hour is almost certainly going to be more on the order of 5GB than 25GB
> (plus I'll want all the indexes in RAM of course).

Very good. Find yourself 8GB to 12GB ram and you should be fine. In this
case, additional ram will keep the system from hitting the disk for
writes as well.

You may want to play around with checkpoints. Prevention of a checkpoint
during this hour will help prevent peaks. Be warned though, WAL will
grow very large, and recovery time should a crash occur could be
painful.

You say the data is very time sensitive -- how time sensitive? Are the
selects all based on this week's data? A copy of the database on a second
machine (say your Quad Xeon) for static per client data would be very
useful to reduce needless load. I assume the application servers have
already cached any static global data by this point.

Finally, upgrade to 7.4. Do use prepared statements. Do limit the number
of connections any given application server is allowed (especially for
short transactions). 3 PostgreSQL processes per CPU (where the box limit
is not Disk) seems to be about right -- your OS may vary.
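
To be concrete about the prepared statement part, I mean something along
these lines (the query and names are invented; yours will differ):

-- parse and plan once per connection...
PREPARE get_balance (integer) AS
    SELECT balance FROM accounts WHERE account_id = $1;

-- ...then each request only pays for execution
EXECUTE get_balance(42);

For short, repetitive transactions the parse/plan time you save is a
noticeable chunk of the total.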

Pre-calculate anything you can. Are the $ amounts for a transaction
generally the same? Do you tend to have repeat clients? Great --
make your current clients' transactions a day in advance. Now you have a
pair of selects and 1 update (mark it with the time the client actually
approved it). If the client doesn't approve of the pre-calculated
transaction, throw it away at some later time.
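
Roughly this shape, with invented table and column names:

-- the day before: pre-build a pending transaction for each likely client
INSERT INTO pending_txn (client_id, amount, status)
SELECT client_id, usual_amount, 'pending'
  FROM clients
 WHERE repeat_customer;

-- during the surge: approval becomes a single cheap update
UPDATE pending_txn
   SET status = 'approved', approved_at = now()
 WHERE client_id = 42
   AND status = 'pending';

Anything still 'pending' at the end of the night gets thrown away by a
cleanup job later.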


Re: Hardware recommendations to scale to silly load

From
"scott.marlowe"
Date:
On 27 Aug 2003, matt wrote:

> > You probably, more than anything, should look at some kind of
> > superfast, external storage array
>
> Yeah, I think that's going to be a given.  Low end EMC FibreChannel
> boxes can do around 20,000 IOs/sec, which is probably close to good
> enough.
>
> You mentioned using multiple RAID controllers as a boost - presumably
> the trick here is to split the various elements (WAL, tables, indexes)
> across different controllers using symlinks or suchlike?  Can I feasibly
> split the DB tables across 5 or more controllers?

I'm not sure I'd split the tables by hand right up front.  Try getting as
many hard drives as you can afford hooked up at once, and then try
different ways of partitioning them.  I'm guessing that making two fairly
good sized 1+0 sets, one for data and one for WAL might be the best
answer.

> > Actually, I've seen stuff like that going on Ebay pretty cheap lately.  I
> > saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going
> > for $24,000 a month ago.  Put Linux or BSD on it and Postgresql should
> > fly.
>
> Jeez, and I thought I was joking about the Starfire.  Even Slowaris
> would be OK on one of them.
>
> The financial issue is that there's just not that much money in the
> micropayments game for bursty sales.  If I was doing these loads
> *continuously* then I wouldn't be working, I'd be in the Maldives :-)

$24,000 isn't that much for a server really, and if you can leverage this
one "sale" to get more, then it would likely pay for itself over time.

If you have problems keeping up with load, it will be harder to get more
customers, so you kinda wanna do this as well as possible the first time.




Re: Hardware recommendations to scale to silly load

From
Andrew Sullivan
Date:
On Wed, Aug 27, 2003 at 05:49:25PM +0100, matt wrote:
>
> I'm also looking at renting equipment, or even trying out IBM/HP's
> 'on-demand' offerings.

To handle that kind of load, you're not going to be able to do it
with cheap hardware.  Renting may be your answer.

a

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Hardware recommendations to scale to silly load

From
"Matt Clark"
Date:
> Just how big do you expect your DB to grow?  For a 1GB disk-space
> database, I'd probably just splurge for an SSD hooked up either via
> SCSI or FibreChannel.  Heck, up to about 5Gb or so it is not that
> expensive (about $25k) and adding another 5Gb should set you back
> probably another $20k.  I use an SSD from Imperial Technology
> ( http://www.imperialtech.com/ ) for mail spools.  My database is way
> too big for my budget to put in SSD.

I may well be able to split some tables that aren't used in joins into a separate DB, and could well use an SSD for
those.

In fact two of the inserts per user interaction could be split off, and they're not financially important tables, so
fsync=false
could be enabled for those, in which case an SSD might be overkill...

The whole thing will definitely *not* fit in an SSD for a sensible price, but the WAL might well!






Re: Hardware recommendations to scale to silly load

From
"Matt Clark"
Date:
> Ok.. I would be surprised if you needed much more actual CPU power. I
> suspect they're mostly idle waiting on data -- especially with a Quad
> Xeon (shared memory bus is it not?).

In reality the CPUs get pegged: about 65% PG and 35% system.  But I agree that memory throughput and latency is an
issue.

> Write performance won't matter very much. 3000 inserts/second isn't high
> -- some additional battery backed write cache may be useful but not
> overly important with enough ram to hold the complete dataset. I suspect
> those are slow due to things like foreign keys -- which of course are
> selects.

3000 inserts/sec isn't high when they're inside one transaction, but if each is inside its own transaction then that's
3000
commits/second.

> case, additional ram will keep the system from hitting the disk for
> writes as well.

How does that work?

> You may want to play around with checkpoints. Prevention of a checkpoint
> during this hour will help prevent peaks. Be warned though, WAL will
> grow very large, and recovery time should a crash occur could be
> painful.

Good point.  I'll have a think about that.




Re: Hardware recommendations to scale to silly load

From
Rod Taylor
Date:
On Thu, 2003-08-28 at 12:37, Matt Clark wrote:
> > Ok.. I would be surprised if you needed much more actual CPU power. I
> > suspect they're mostly idle waiting on data -- especially with a Quad
> > Xeon (shared memory bus is it not?).
>
> In reality the CPUs get pegged: about 65% PG and 35% system.  But I agree
> that memory throughput and latency is an issue.

Is "system" in this case dealing with disk activity or process switches?

Usually the 65% includes the CPU waiting on a request for data from main
memory. Since you will be moving a lot of data through the CPU, the L1 /
L2 cache doesn't help too much (even large cache), but low latency high
bandwidth memory will make a significant difference.  CPUs not having to
wait on other CPUs doing a memory fetch will make an even larger
difference (dedicated memory bus per CPU).

Good memory is the big ticket item. Sun CPUs are not better than Intel
CPUs, for simple DB interaction. It's the additional memory bandwidth
that makes them shine.  Incidentally, Suns are quite slow with PG for
calculation intensive work on a small dataset.

> > Write performance won't matter very much. 3000 inserts/second isn't high
> > -- some additional battery backed write cache may be useful but not
> > overly important with enough ram to hold the complete dataset. I suspect
> > those are slow due to things like foreign keys -- which of course are
> > selects.
>
> 3000 inserts/sec isn't high when they're inside one transaction, but if
> each is inside its own transaction then that's 3000 commits/second.

Still not anything to concern yourself with.  WAL on battery backed
write cache (with a good controller) will more than suffice -- boils
down to the same as if fsync was disabled. You might want to try putting
it onto its own controller, but I don't think you will see much of a
change.  20k WAL operations / sec would be something to worry about.

> > case, additional ram will keep the system from hitting the disk for
> > writes as well.
>
> How does that work?

Simple.  Your OS will buffer writes in memory until they are required to
hit disk (fsync or similar).  Modify the appropriate sysctl to inform
the OS it can use more than 10% (10% is the FreeBSD default I believe)
of the memory for writes.  Buffering 4GB of work in memory (WAL logs
will ensure this is crash safe) will nearly eliminate I/O.

When the OS is no longer busy, it will filter the writes from ram back
to disk. Visibly, there is no change to the user aside from a speed
increase.

> > You may want to play around with checkpoints. Prevention of a checkpoint
> > during this hour will help prevent peaks. Be warned though, WAL will
> > grow very large, and recovery time should a crash occur could be
> > painful.
>
> Good point.  I'll have a think about that.

This is more important with a larger buffer. A checkpoint informs the OS
to dump the buffer to disk so it can guarantee it hit hardware (thus
allowing PG to remove / recycle WAL files).


I do think your best bet is to segregate the DB.  Splitting by read / write,
by user location, or by the first 4 digits of the credit card -- anything
will make for a much better system.
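
To illustrate the card-prefix idea only (hypothetical table, and the CHECK
constraint is just a safety net I'm adding for the example -- the actual
routing would live in the application or the load balancer):

-- on shard A, which takes card prefixes 0000-4999
CREATE TABLE payments (
    card_prefix  integer       NOT NULL CHECK (card_prefix BETWEEN 0 AND 4999),
    account_id   integer       NOT NULL,
    amount       numeric(10,2) NOT NULL,
    created_at   timestamp     NOT NULL DEFAULT now()
);

-- shard B gets the same table with CHECK (card_prefix BETWEEN 5000 AND 9999)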

Keep a master with all of the data that can take the full week to
process it.


Re: Hardware recommendations to scale to silly load

From
William Yu
Date:
Shridhar Daithankar wrote:
>> Be careful here: we've seen that with hyper-threaded P4 Xeons, a system
>> with very high disk I/O becomes sluggish and slow. But after disabling the
>> hyper-threading itself, our system flew..
>
> Anybody have Opteron working? How's the performance?

Yes. I'm using a 2x 1.8GHz Opteron system w/ 8GB of RAM. Right now, I'm
still using 32-bit Linux -- I'm letting others be the 64-bit guinea
pigs. :) I probably will get a cheapie 1x Opteron machine first and test
the 64-bit kernel/libraries thoroughly before rolling it out to production.

As for performance, the scaling is magnificent -- even when just using
PAE instead of 64-bit addressing. At low transaction counts, it's only
~75% faster than the 2x Athlon 1800+ MP it replaced. But once the
transactions start coming in, the gap is as high as 5x. My w-a-g: since
each CPU has an integrated memory controller, you avoid memory bus
contention which is probably the major bottleneck as transaction load
increases. (I've seen several Opteron vs Xeon comparisons where
single-connection tests are par for both CPUs but heavy-load tests favor
the Opteron by a wide margin.) I suspect the 4X comparisons would tilt
even more towards AMD's favor.

We should see a boost when we move to 64-bit Linux and hopefully another
one when NUMA for Linux is production-stable.


Re: Hardware recommendations to scale to silly load

From
"Christopher Kings-Lynne"
Date:
> We should see a boost when we move to 64-bit Linux and hopefully another
> one when NUMA for Linux is production-stable.

Assuming SCO doesn't make them remove it :P

Chris


Re: Hardware recommendations to scale to silly load

From
"Shridhar Daithankar"
Date:
On 29 Aug 2003 at 0:05, William Yu wrote:

> Shridhar Daithankar wrote:
> >> Be careful here: we've seen that with hyper-threaded P4 Xeons, a system
> >> with very high disk I/O becomes sluggish and slow. But after disabling the
> >> hyper-threading itself, our system flew..
> >
> > Anybody have Opteron working? How's the performance?
>
> Yes. I'm using a 2x 1.8GHz Opteron system w/ 8GB of RAM. Right now, I'm
> still using 32-bit Linux -- I'm letting others be the 64-bit guinea
> pigs. :) I probably will get a cheapie 1x Opteron machine first and test
> the 64-bit kernel/libraries thoroughly before rolling it out to production.

Just a guess here, but does a precompiled postgresql for x86 vs. an x86-64
optimized one make a difference?

Opteron is one place on earth where you can watch the difference between 32
and 64 bit on the same machine. Can be handy at times..

>
> As for performance, the scaling is magnificent -- even when just using
> PAE instead of 64-bit addressing. At low transaction counts, it's only
> ~75% faster than the 2x Athlon 1800+ MP it replaced. But once the
> transactions start coming in, the gap is as high as 5x. My w-a-g: since
> each CPU has an integrated memory controller, you avoid memory bus
> contention which is probably the major bottleneck as transaction load
> increases. (I've seen several Opteron vs Xeon comparisons where
> single-connection tests are par for both CPUs but heavy-load tests favor
> the Opteron by a wide margin.) I suspect the 4X comparisons would tilt
> even more towards AMD's favor.

I am sure. But in a 64-bit environment, Xeon is not the competition. It's
PA-RISC 8700, UltraSparcs, the Power series and, if possible, Itanium.

I would still expect AMD to compete comfortably given the high clock speed.
But the chipsets need to be competent as well..

I still remember a product I worked on, where a single-CPU PA-RISC 8700 with
a single SCSI disc edged out a quad-CPU Xeon with a SCSI RAID controller
running Windows, in terms of scalability, while running Oracle.

I am not sure if it was a Windows vs. HP-UX issue, but in the end the HP
machine was a lot better than the Windows machine. The Windows machine shot
ahead under light load and dropped dead equally fast as the load rose..

> We should see a boost when we move to 64-bit Linux and hopefully another
> one when NUMA for Linux is production-stable.

Getting 2.6 running now is the answer to making it stable fast.. :-)  Of
course, only if you have spare hardware..

Bye
 Shridhar

--
briefcase, n:    A trial where the jury gets together and forms a lynching party.


Re: Hardware recommendations to scale to silly load

From
Ron Johnson
Date:
On Fri, 2003-08-29 at 03:18, Shridhar Daithankar wrote:
> On 29 Aug 2003 at 0:05, William Yu wrote:
>
> > Shridhar Daithankar wrote:
[snip]
> > As for performance, the scaling is magnificent -- even when just using
> > PAE instead of 64-bit addressing. At low transaction counts, it's only
> > ~75% faster than the 2x Athlon 1800+ MP it replaced. But once the
> > transactions start coming in, the gap is as high as 5x. My w-a-g: since
> > each CPU has an integrated memory controller, you avoid memory bus
> > contention which is probably the major bottleneck as transaction load
> > increases. (I've seen several Opteron vs Xeon comparisons where
> > single-connection tests are par for both CPUs but heavy-load tests favor
> > the Opteron by a wide margin.) I suspect the 4X comparisons would tilt
> > even more towards AMD's favor.
>
> I am sure. But in a 64-bit environment, Xeon is not the competition. It's
> PA-RISC 8700, UltraSparcs, the Power series and, if possible, Itanium.

IMO, Opti will compete in *both* markets.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"Adventure is a sign of incompetence"
Stephanson, great polar explorer


Force table to be permanently in cache?

From
Fabian Kreitner
Date:
Hi everyone,

I have a sql request which on first invocation completes in ~12sec but then
drops to ~3sec on the following runs. The 3 seconds would be acceptable, but
how can I make sure that the data is cached at all times? Is it simply
enough to set shared_buffers high enough to hold the entire database (and
have enough RAM installed, of course)? The OS is Linux in this case.


Nested Loop  (cost=0.00..11.44 rows=1 width=362) (actual
time=247.83..12643.96 rows=14700 loops=1)
   ->  Index Scan using suchec_testa on suchec  (cost=0.00..6.02 rows=1
width=23) (actual time=69.91..902.68 rows=42223 loops=1)
   ->  Index Scan using idx_dokument on dokument d  (cost=0.00..5.41 rows=1
width=339) (actual time=0.26..0.26 rows=0 loops=42223)
Total runtime: 12662.64 msec


Nested Loop  (cost=0.00..11.44 rows=1 width=362) (actual time=1.18..2829.79
rows=14700 loops=1)
   ->  Index Scan using suchec_testa on suchec  (cost=0.00..6.02 rows=1
width=23) (actual time=0.51..661.75 rows=42223 loops=1)
   ->  Index Scan using idx_dokument on dokument d  (cost=0.00..5.41 rows=1
width=339) (actual time=0.04..0.04 rows=0 loops=42223)
Total runtime: 2846.63 msec


Re: Hardware recommendations to scale to silly load

From
Andrew Sullivan
Date:
On Fri, Aug 29, 2003 at 12:05:03AM -0700, William Yu wrote:
> We should see a boost when we move to 64-bit Linux and hopefully another
> one when NUMA for Linux is production-stable.

According to the people who've worked with SGIs, NUMA actually seems
to make things worse.  It has something to do with how the shared
memory is handled.  You'll want to dig through the -general or
-hackers archives from somewhere between 9 and 14 months ago, IIRC.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Force table to be permanently in cache?

From
Andrew Sullivan
Date:
On Fri, Aug 29, 2003 at 02:52:10PM +0200, Fabian Kreitner wrote:
> Hi everyone,
>
> I have a SQL request which on first invocation completes in ~12sec but then
> drops to ~3sec on the following runs. The 3 seconds would be acceptable, but
> how can I make sure that the data is cached at all times? Is it simply
> enough to set shared_buffers high enough to hold the entire database (and
> have enough RAM installed, of course)? The OS is Linux in this case.

If the table gets hit often enough, then it'll be in your filesystem
cache anyway.  See the many discussions of sizing shared_buffers in
the archives of this list for thoughts on how big that should be.
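
If you do want to warm the cache explicitly -- say right after a restart, or
from cron shortly before you need the fast response -- a low-tech sketch is
just to read the tables once (table names taken from your EXPLAIN output;
note this only pulls in the heap pages, the indexes get faulted in on first
use):

    SELECT count(*) FROM dokument;
    SELECT count(*) FROM suchec;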

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Hardware recommendations to scale to silly load

From
William Yu
Date:
Shridhar Daithankar wrote:
> Just a guess here, but does a precompiled postgresql for x86 vs. an x86-64
> optimized one make a difference?
>
> Opteron is one place on earth you can watch the difference between 32/64
> bit on the same machine. Can be handy at times...

I don't know yet. I tried building a 64-bit kernel and my eyes glazed
over trying to figure out how to create the cross-platform GCC compiler
that's first needed to build the kernel. Then I read that all the libraries &
drivers also needed to be 64-bit compiled, and at that point gave up the
ghost. I'll wait until a 64-bit Redhat distro is available before I test
the 64-bit capabilities.

The preview SuSE 64-bit Linux used in most of the Opteron rollout tests
has MySQL precompiled as 64-bit, and under that DB, 64-bit added an extra
~25% performance (compared to a 32-bit SuSE install). My guess is half
of the performance comes from eliminating the PAE swapping.

> I am sure. But in a 64-bit environment, Xeon is not the competition. It's PA-RISC
> 8700, UltraSparcs, the Power series and possibly Itanium.

Well, just because the Opteron is 64-bit doesn't mean it's direct
competition for the high-end RISC chips. Yes, if you're looking at the
discrete CPU itself, it appears they could compete -- the SpecINT scores
place the Opteron near the top of the list. But big companies also need
the infrastructure, management tools and top-end scalability. If you
just have to have the million dollar machines (128x Itanium2 servers or
whatever), AMD is nowhere close to competing unless Beowulf clusters fit
your needs.

In terms of infrastructure, scalability, mindshare and pricing, Xeon is
most certainly Opteron's main competition. We're talking <$10K servers
versus $50K+ servers (assuming you actually want performance instead of
having a single pokey UltraSparc CPU in a box). And yes, just because
Opteron is a better performing server platform than Xeon doesn't mean a
corporate fuddy-duddy still won't buy Xeon due to the $1B spent by Intel
on marketing.

>>We should see a boost when we move to 64-bit Linux and hopefully another
>>one when NUMA for Linux is production-stable.
>
> Getting a 2.6 running now is the answer to make it stable fast..:-) Of course
> if you have spare hardware..

My office is a pigsty of spare hardware lying around. :) We're like pigs
rolling around in the mud.


Re: Hardware recommendations to scale to silly load

From
Ron Johnson
Date:
On Fri, 2003-08-29 at 11:33, William Yu wrote:
> Shridhar Daithankar wrote:
[snip]
> > I am sure. But in a 64-bit environment, Xeon is not the competition. It's PA-RISC
> > 8700, UltraSparcs, the Power series and possibly Itanium.
>
> Well, just because the Opteron is 64-bit doesn't mean it's direct
> competition for the high-end RISC chips. Yes, if you're looking at the
> discrete CPU itself, it appears they could compete -- the SpecINT scores
> place the Opteron near the top of the list. But big companies also need
> the infrastructure, management tools and top-end scalability. If you
> just have to have the million dollar machines (128x Itanium2 servers or
> whatever), AMD is nowhere close to competing unless Beowulf clusters fit
> your needs.

With the proper motherboards and chipsets, it can definitely compete.

What's so special about Itanic-2 that it can be engineered to be
put in 128x boxes and run VMS and high-end Unix, but Opti can't?
Nothing.  If a company with enough engineering talent wants to do
it, it can happen.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"For me and windows it became a matter of easy to start with,
and becoming increasingly difficult to be productive as time
went on, and if something went wrong very difficult to fix,
compared to linux's large over head setting up and learning the
system with ease of use and the increase in productivity
becoming larger the longer I use the system."
Rohan Nicholls , The Netherlands


Re: Hardware recommendations to scale to silly load

From
Bruce Momjian
Date:
matt wrote:
> > Are you *sure* about that????  3K updates/inserts per second xlates
> > to 10,800,000 per hour.  That, my friend, is a WHOLE HECK OF A LOT!
>
> Yup, I know!

Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI
drive I can do 4k inserts/second if I turn fsync off.  If you have a
battery-backed controller, you should be able to do the same.  (You will
not need to turn fsync off --- fsync will just be fast because of the
disk drive RAM).

Am I missing something?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Hardware recommendations to scale to silly load

From
Ron Johnson
Date:
On Fri, 2003-08-29 at 21:44, Bruce Momjian wrote:
> matt wrote:
> > > Are you *sure* about that????  3K updates/inserts per second xlates
> > > to 10,800,000 per hour.  That, my friend, is a WHOLE HECK OF A LOT!
> >
> > Yup, I know!
>
> Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI
> drive I can do 4k inserts/second if I turn fsync off.  If you have a
> battery-backed controller, you should be able to do the same.  (You will
> not need to turn fsync off --- fsync will just be fast because of the
> disk drive RAM).
>
> Am I missing something?

Is that
    FOR I BETWEEN 1 AND 4000
        BEGIN
           INSERT
        COMMIT
or
    BEGIN
        INSERT
        <snip 3998 inserts>
        INSERT
    COMMIT;
or
    COPY

I get the impression that Matt will need to do 25,000 of these per
second:
    SELECT <blah>
    IF <some circumstance that happens about 1/8th of the time>
        BEGIN
            INSERT
               or
            UPDATE
        COMMIT;

He says his current h/w peaks at 1/10th that rate.

My question is: is that current peak rate ("300 inserts/updates
*or* 2500 selects") based upon 1 connection, or many connections?
With 4 CPUs and a 4-disk RAID10, I wouldn't be surprised if 4 concurrent
connections give the optimum speed.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

Great Inventors of our time:
Al Gore -> Internet
Sun Microsystems -> Clusters


Selecting random rows efficiently

From
Richard Jones
Date:
Hi,
i have a table of around 3 million rows from which i regularly (twice a second
at the moment) need to select a random row from

currently i'm doing "order by rand() limit 1" - but i suspect this is
responsible for the large load on my db server - i guess that PG is doing far
too much work just to pick one row.

one way i can think of is to read in all the primary keys from my table, and
select one of the keys at random then directly fetch that row.

are there any other ways to do this? i need to keep the load down :)

Thanks,
Richard

Re: Selecting random rows efficiently

From
Jeff
Date:
On Sat, 30 Aug 2003, Richard Jones wrote:

> Hi,
> i have a table of around 3 million rows from which i regularly (twice a second
> at the moment) need to select a random row from
>
> currently i'm doing "order by rand() limit 1" - but i suspect this is
> responsible for the large load on my db server - i guess that PG is doing far
> too much work just to pick one row.
>

If you have an int id (aka serial) column then it is simple - just pick a
random number between 1 and currval('id_seq')...

or offset rand() limit 1 perhaps?

since you want random there is no need to bother with an order, and that'll
save a sort.
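
A sketch of that idea, assuming the ids really are dense from 1 to the
maximum (table name is invented; max(id) stands in for currval(), which is
only defined after a nextval() in the same session -- in practice you'd
cache the max or fetch it with ORDER BY id DESC LIMIT 1):

    SELECT *
      FROM tracks
     WHERE id = (SELECT floor(random() * max(id))::int + 1 FROM tracks);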

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: Selecting random rows efficiently

From
Richard Jones
Date:
On Saturday 30 August 2003 1:08 pm, you wrote:
> On Sat, 30 Aug 2003, Richard Jones wrote:
> > Hi,
> > i have a table of around 3 million rows from which i regularly (twice a
> > second at the moment) need to select a random row from
> >
> > currently i'm doing "order by rand() limit 1" - but i suspect this is
> > responsible for the large load on my db server - i guess that PG is doing
> > far too much work just to pick one row.
>
> If you have an int id (aka serial) column then it is simple - just pick a
> random number between 1 and currval('id_seq')...
>
> or offset rand() limit 1 perhaps?
>
> since you want random ther eis no need to bother with an order and that'll
> save a sort.


Yes, the pkey is a SERIAL but the problem is that the sequence is rather
sparse

for example, it goes something like 1 -> 5000  then 100000->100000 and then
2000000->upwards

this is due to chunks being deleted etc..

if i pick a random number for the key it will not be a random enough
distribution, because the sequence is sparse... sometimes it will pick a key
that doesn't exist.

i'm currently reading all the keys into an array and selecting randoms from
there - but this is no good long-term as i need to refresh the array of keys
to take into account newly added rows to the table (daily)

i was hoping there was some trickery with sequences that would allow me to
easily pick a random valid sequence number..?

Thanks,
Rich.


Re: Hardware recommendations to scale to silly load

From
Rod Taylor
Date:
> My question is: is that current peak rate ("300 inserts/updates
> *or* 2500 selects") based upon 1 connection, or many connections?
> With 4 CPUs and a 4-disk RAID10, I wouldn't be surprised if 4 concurrent
> connections give the optimum speed.

Optimum number of active workers is probably between 10 and 16.  4 doing
math, 4 doing a dma transfer of data, and 4 to be available the instant
one of the other 8 completes.

On FreeBSD it seems to work that way when there is a mix of activity
with the database.


Re: Selecting random rows efficiently

From
Rod Taylor
Date:
> i was hoping there was some trickery with sequences that would allow me to
> easily pick a random valid sequence number..?

I would suggest renumbering the data.

ALTER SEQUENCE ... RESTART WITH 1;
UPDATE table SET pkey = DEFAULT;

Of course, PostgreSQL may have trouble with that update due to
evaluation of the unique constraint immediately -- so drop the primary
key first, and add it back after.
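
Roughly, with invented names (any foreign keys referencing the table would
need to be dropped and recreated around this as well, and a VACUUM afterwards
is a good idea since the UPDATE rewrites every row):

    ALTER TABLE tracks DROP CONSTRAINT tracks_pkey;
    ALTER SEQUENCE tracks_id_seq RESTART WITH 1;
    UPDATE tracks SET id = DEFAULT;
    ALTER TABLE tracks ADD PRIMARY KEY (id);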


Re: Selecting random rows efficiently

From
Ron Johnson
Date:
On Sat, 2003-08-30 at 08:09, Richard Jones wrote:
> Hi,
> i have a table of around 3 million rows from which i regularly (twice a second
> at the moment) need to select a random row from
>
> currently i'm doing "order by rand() limit 1" - but i suspect this is
> responsible for the large load on my db server - i guess that PG is doing far
> too much work just to pick one row.

What datatype is the key you're selecting by?

Also, where is rand() defined?  Is that a UDF?

Could it be that there is a type mismatch?

> one way i can think of is to read in all the primary keys from my table, and
> select one of the keys at random then directly fetch that row.
>
> are there any other ways to do this? i need to keep the load down :)
>
> Thanks,
> Richard

Are you really in Micronesia?

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"The greatest dangers to liberty lurk in insidious encroachment
by men of zeal, well-meaning, but without understanding."
Justice Louis Brandeis, dissenting, Olmstead v US (1928)


Re: Selecting random rows efficiently

From
Tom Lane
Date:
Richard Jones <rj@last.fm> writes:
>>> i have a table of around 3 million rows from which i regularly (twice a
>>> second at the moment) need to select a random row from

> i was hoping there was some trickery with sequences that would allow me to
> easily pick a random valid sequence number..?

There is no magic bullet here, but if you expect that requirement to
persist then it is worth your trouble to expend effort on a real
solution.  A real solution in my mind would look like

1. Add a column "random_id float8 default random()".  The idea here
   is that you assign a random ID to each row as it is created.

2. Add an index on the above column.

3. Your query now looks like

    SELECT * FROM table WHERE random_id >= random()
    ORDER BY random_id LIMIT 1;

This gives you a plan on the order of

 Limit  (cost=0.00..0.17 rows=1 width=8)
   ->  Index Scan using fooi on foo  (cost=0.00..57.00 rows=334 width=8)
         Filter: (random_id >= random())

which is fast and gives a genuinely random result row.  At least up
until you have enough rows that there start being duplicate random_ids,
which AFAIK would be 2 billion rows with a decent random()
implementation.  If you're concerned about that, you could periodically
re-randomize with
    UPDATE table SET random_id = random();
so that any rows that were "hidden" because they had a duplicate
random_id have another shot at being choosable.  But with only a few mil
rows I don't think you need to worry.
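
Spelled out, steps 1 and 2 might look like this (table and index names are
invented):

    ALTER TABLE tracks ADD COLUMN random_id float8;
    ALTER TABLE tracks ALTER COLUMN random_id SET DEFAULT random();
    UPDATE tracks SET random_id = random();    -- backfill existing rows
    CREATE INDEX tracks_random_id_idx ON tracks (random_id);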

            regards, tom lane

Re: Selecting random rows efficiently

From
Tom Lane
Date:
I said:
> 3. Your query now looks like
>     SELECT * FROM table WHERE random_id >= random()
>     ORDER BY random_id LIMIT 1;

Correction: the above won't give quite the right query because random()
is marked as a volatile function.  You can hide the random() call inside
a user-defined function that you (misleadingly) mark stable, or you can
just stick it into a sub-select:

regression=# explain select * from foo WHERE random_id >= (select random())
regression-# ORDER BY random_id LIMIT 1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=0.01..0.15 rows=1 width=8)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Index Scan using fooi on foo  (cost=0.00..45.50 rows=334 width=8)
         Index Cond: (random_id >= $0)
(5 rows)

This technique is probably safer against future planner changes,
however:

regression=# create function oneshot_random() returns float8 as
regression-# 'select random()' language sql stable;
CREATE FUNCTION
regression=# explain select * from foo WHERE random_id >= oneshot_random()
regression-# ORDER BY random_id LIMIT 1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=0.00..0.14 rows=1 width=8)
   ->  Index Scan using fooi on foo  (cost=0.00..46.33 rows=334 width=8)
         Index Cond: (random_id >= oneshot_random())
(3 rows)

The point here is that an indexscan boundary condition has to use stable
or immutable functions.  By marking oneshot_random() stable, you
essentially say that it's okay to evaluate it only once per query,
rather than once at each row.

            regards, tom lane

Re: Hardware recommendations to scale to silly load

From
"Matt Clark"
Date:
>     SELECT <blah>
>     IF <some circumstance that happens about 1/8th of the time>
>         BEGIN
>             INSERT
>                or
>             UPDATE
>         COMMIT;
>
> He says his current h/w peaks at 1/10th that rate.
>
> My question is: is that current peak rate ("300 inserts/updates
> *or* 2500 selects") based upon 1 connection, or many connections?
> With 4 CPUs and a 4-disk RAID10, I wouldn't be surprised if 4 concurrent
> connections give the optimum speed.

Well it's more like each user interaction looks like:

    SELECT
    SELECT
    SELECT
    SELECT
    SELECT
    SELECT
    INSERT
    SELECT
    SELECT
    SELECT
    SELECT
    INSERT
    SELECT
    SELECT
    SELECT
    UPDATE
    SELECT
    SELECT
    UPDATE
    SELECT

And concurrency is very high, because it's a web app, and each httpd has one connection to PG, and there can be
hundreds of active httpd processes.  Some kind of connection pooling scheme might be in order when there are that
many active clients.  Any views?



Re: Hardware recommendations to scale to silly load

From
"Matt Clark"
Date:
> Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI
> drive I can do 4k inserts/second if I turn fsync off.  If you have a
> battery-backed controller, you should be able to do the same.  (You will
> not need to turn fsync off --- fsync will just be fast because of the
> disk drive RAM).
>
> Am I missing something?

I think Ron asked this, but I will too, is that 4k inserts in one transaction or 4k transactions each with one insert?

fsync is very much faster (as are all random writes) with the write-back cache, but I'd hazard a guess that it's
still not nearly as fast as turning fsync off altogether.  I'll do a test perhaps...




Re: Selecting random rows efficiently

From
Ron Johnson
Date:
On Sat, 2003-08-30 at 09:01, Rod Taylor wrote:
> > i was hoping there was some trickery with sequences that would allow me to
> > easily pick a random valid sequence number..?
>
> I would suggest renumbering the data.
>
> ALTER SEQUENCE ... RESTART WITH 1;
> UPDATE table SET pkey = DEFAULT;
>
> Of course, PostgreSQL may have trouble with that update due to
> evaluation of the unique constraint immediately -- so drop the primary
> key first, and add it back after.

And if there are child tables, they'd all have to be updated, too.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"Whatever may be the moral ambiguities of the so-called
demoratic nations and however serious may be their failure to
conform perfectly to their democratic ideals, it is sheer moral
perversity to equate the inconsistencies of a democratic
civilization with the brutalities which modern tyrannical states
practice."
Reinhold Nieburhr, ca. 1940


Re: Selecting random rows efficiently

From
"Russell Garrett"
Date:
Considering that we'd have to index the random field too, it'd be neater in
the long term to re-number the primary key. Although, being a primary key,
that's foreign-keyed from absolutely everywhere, so that'd probably take an
amusingly long time.

...and no we're not from Micronesia, we're from ever so slightly less exotic
London. Though Micronesia might be nice...

Russ (also from last.fm but without the fancy address)

pgsql-performance-owner@postgresql.org wrote:
> On Sat, 2003-08-30 at 09:01, Rod Taylor wrote:
>>> i was hoping there was some trickery with sequences that would
>>> allow me to easily pick a random valid sequence number..?
>>
>> I would suggest renumbering the data.
>>
>> ALTER SEQUENCE ... RESTART WITH 1;
>> UPDATE table SET pkey = DEFAULT;
>>
>> Of course, PostgreSQL may have trouble with that update due to
>> evaluation of the unique constraint immediately -- so drop the
>> primary key first, and add it back after.
>
> And if there are child tables, they'd all have to be updated, too.



Re: Hardware recommendations to scale to silly load

From
Vivek Khera
Date:
>>>>> "AS" == Andrew Sullivan <andrew@libertyrms.info> writes:

AS> On Fri, Aug 29, 2003 at 12:05:03AM -0700, William Yu wrote:
>> We should see a boost when we move to 64-bit Linux and hopefully another
>> one when NUMA for Linux is production-stable.

AS> According to the people who've worked with SGIs, NUMA actually seems
AS> to make things worse.  It has something to do with how the shared
AS> memory is handled.  You'll want to dig through the -general or
AS> -hackers archives from somewhere between 9 and 14 months ago, IIRC.

I knew my PhD research would one day be good for *something* ...

The basic premise of NUMA is that you can isolate which data belongs
to which processor and put that on memory pages that are local/closer
to it.  In practice, this is harder than it sounds as it requires very
detailed knowledge of the application's data access patterns, and how
memory is allocated by the OS and standard libraries.  Often you end
up with pages that have data that should be local to two different
processors, and that data keeps being migrated (if your NUMA OS
supports page migration) between the two processors or one of them
just gets slow access.

I can't imagine it benefiting postgres given its globally shared
buffers.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Hardware recommendations to scale to silly load

From
Vivek Khera
Date:
>>>>> "MC" == Matt Clark <matt@ymogen.net> writes:

MC> And concurrency is very high, because it's a web app, and each
MC> httpd has one connection to PG, and there can be hundreds of
MC> active httpd processes.  Some kind of connection pooling scheme
MC> might be in order when there are that many active clients.  Any

One thing you really should do (don't know if you already do it...) is
have your web setup split into a front-end proxy and a back-end application
server.  There are lots of docs on how to do this for mod_perl, but it
can apply to just about any backend technology that is pooling the
connections.

With a setup like this, my front-end web server typically has about
100 to 150 connections, and the backend doing the dynamic work (and
accessing the database) has peaked at 60 or so.  Usually the backend
sits at about 25.

The small front-end processes get to deal with your dialup customers
trickling down the data, since the front end buffers your backend for you.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Hardware recommendations to scale to silly load

From
Greg Stark
Date:
Vivek Khera <khera@kcilink.com> writes:

> The front-end small processes get to deal with your dialup customers
> trickling down the data since it buffers your backend for you.

Huh. Well, I used to think this. But I think I was wrong. I used to have
apache proxy servers running in front of the mod_perl apache servers. The
proxy servers handled image and static html requests, and proxied any dynamic
content to the mod_perl servers.

In fact most web pages are only a few kilobytes, and you can easily configure
the kernel buffers on the sockets to be 32kb or more. So the proxies would
only come into play when there was a really large dynamic document, something
that should probably never happen on a high volume web site anyways.

I think the main source of the benefit people see from this setup is the
static content. For that you get a bigger kick out of separating the static
content onto entirely separate servers, preferably something slim like thttpd
and just exposing the mod_perl/php/whatever servers directly.

The one thing I worry about exposing the dynamic servers directly is
susceptibility to dos or ddos attacks. Since all someone has to do to tie up
your precious heavyweight apache slot is make a connection, one machine could
easily tie up your whole web site. That would be a bit harder if you had
hundreds of slots available. Of course even so it's not hard.

--
greg

Re: Hardware recommendations to scale to silly load

From
Ron Johnson
Date:
On Tue, 2003-09-02 at 11:01, Vivek Khera wrote:
> >>>>> "AS" == Andrew Sullivan <andrew@libertyrms.info> writes:
>
> AS> On Fri, Aug 29, 2003 at 12:05:03AM -0700, William Yu wrote:
> >> We should see a boost when we move to 64-bit Linux and hopefully another
> >> one when NUMA for Linux is production-stable.
>
> AS> According to the people who've worked with SGIs, NUMA actually seems
> AS> to make things worse.  It has something to do with how the shared
> AS> memory is handled.  You'll want to dig through the -general or
> AS> -hackers archives from somewhere between 9 and 14 months ago, IIRC.
>
> I knew my PhD research would one day be good for *something* ...
>
> The basic premise of NUMA is that you can isolate which data belongs
> to which processor and put that on memory pages that are local/closer
> to it.  In practice, this is harder than it sounds as it requires very
> detailed knowledge of the application's data access patterns, and how
> memory is allocated by the OS and standard libraries.  Often you end
> up with pages that have data that should be local to two different
> processors, and that data keeps being migrated (if your NUMA OS
> supports page migration) between the two processors or one of them
> just gets slow access.
>
> I can't imagine it benefiting postgres given its globally shared
> buffers.

Opteron is supposed to have screaming fast inter-CPU memory xfer
(HyperTransport does inter-CPU as well as CPU-RAM transport).

That's supposed to help with scaling, and PostgreSQL really may take
advantage of that with, say, 16-32 processors?

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"Knowledge should be free for all."
Harcourt Fenton Mudd, Star Trek:TOS, "I, Mudd"


Re: Selecting random rows efficiently

From
"scott.marlowe"
Date:
Can you just create an extra serial column and make sure that one is
always in order with no holes in it (maintained by a nightly process, etc.)?

If so, then something like this truly flies:

select * from accounts where aid = (select cast(floor(random()*100000)+1 as int));

My times on it on a 100,000 row table are < 1 millisecond.

Note that you have to have a hole free sequence AND know how many rows
there are, but if you can meet those needs, this is screamingly fast.
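
One way to maintain such a hole-free column (invented names; the column and
index are created once, the renumbering runs nightly, ideally followed by a
VACUUM since the UPDATE rewrites every row):

    -- one-time setup
    ALTER TABLE accounts ADD COLUMN dense_id int;
    CREATE INDEX accounts_dense_id_idx ON accounts (dense_id);

    -- nightly renumbering
    CREATE TEMP SEQUENCE dense_seq;
    UPDATE accounts SET dense_id = nextval('dense_seq');
    SELECT max(dense_id) FROM accounts;  -- row count to plug into the random pick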

On Sat, 30 Aug 2003, Russell Garrett wrote:

> Considering that we'd have to index the random field too, it'd be neater in
> the long term to re-number the primary key. Although, being a primary key,
> that's foreign-keyed from absolutely everywhere, so that'd probably take an
> amusingly long time.
>
> ...and no we're not from Micronesia, we're from ever so slightly less exotic
> London. Though Micronesia might be nice...
>
> Russ (also from last.fm but without the fancy address)
>
> pgsql-performance-owner@postgresql.org wrote:
> > On Sat, 2003-08-30 at 09:01, Rod Taylor wrote:
> >>> i was hoping there was some trickery with sequences that would
> >>> allow me to easily pick a random valid sequence number..?
> >>
> >> I would suggest renumbering the data.
> >>
> >> ALTER SEQUENCE ... RESTART WITH 1;
> >> UPDATE table SET pkey = DEFAULT;
> >>
> >> Of course, PostgreSQL may have trouble with that update due to
> >> evaluation of the unique constraint immediately -- so drop the
> >> primary key first, and add it back after.
> >
> > And if there are child tables, they'd all have to be updated, too.


Re: Hardware recommendations to scale to silly load

From
Bruce Momjian
Date:
Matt Clark wrote:
> > Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI
> > drive I can do 4k inserts/second if I turn fsync off.  If you have a
> > battery-backed controller, you should be able to do the same.  (You will
> > not need to turn fsync off --- fsync will just be fast because of the
> > disk drive RAM).
> >
> > Am I missing something?
>
> I think Ron asked this, but I will too, is that 4k inserts in
> one transaction or 4k transactions each with one insert?
>
> fsync is very much faster (as are all random writes) with the
> write-back cache, but I'd hazard a guess that it's still not
> nearly as fast as turning fsync off altogether.  I'll do a test
> perhaps...

Sorry to be replying late.  Here is what I found.

fsync on
        Inserts all in one transaction         3700 inserts/second
        Inserts in separate transactions        870 inserts/second

fsync off
        Inserts all in one transaction         3700 inserts/second
        Inserts all in one transaction         2500 inserts/second

ECPG test program attached.

--

  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
/*
 *    Thread test program
 *    by Philip Yarra
 */


#include <stdio.h>
#include <stdlib.h>

void        ins1(void);

EXEC SQL BEGIN DECLARE SECTION;
char       *dbname;
int        iterations = 10;
EXEC SQL END DECLARE SECTION;

int
main(int argc, char *argv[])
{

    if (argc < 2 || argc > 3)
    {
        fprintf(stderr, "Usage: %s dbname [iterations]\n", argv[0]);
        return 1;
    }
    dbname = argv[1];

    if (argc == 3)
        iterations = atoi(argv[2]);
    if (iterations % 2 != 0)
    {
        fprintf(stderr, "iterations must be an even number\n");
        return 1;
    }

    EXEC SQL CONNECT TO:dbname AS test0;

    /* DROP might fail */
    EXEC SQL AT test0 DROP TABLE test_thread;
    EXEC SQL AT test0 COMMIT WORK;
    EXEC SQL AT test0 CREATE TABLE test_thread(message TEXT);
    EXEC SQL AT test0 COMMIT WORK;
    EXEC SQL DISCONNECT test0;

    ins1();

    return 0;
}

void
ins1(void)
{
    int            i;
    EXEC SQL WHENEVER sqlerror sqlprint;
    EXEC SQL CONNECT TO:dbname AS test1;
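    /* AUTOCOMMIT ON makes each INSERT below its own transaction; for the
     * all-in-one-transaction case, autocommit would be off and the COMMIT
     * after the loop uncommented. */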
    EXEC SQL AT test1 SET AUTOCOMMIT TO ON;

    for (i = 0; i < iterations; i++)
        EXEC SQL AT test1 INSERT INTO test_thread VALUES('thread1');
//    EXEC SQL AT test1 COMMIT WORK;

    EXEC SQL DISCONNECT test1;

    printf("thread 1 : done!\n");
}



Re: Hardware recommendations to scale to silly load

From
"Gregory S. Williamson"
Date:
Nitpicking --

Perhaps the 4th data line is meant to be:
        Inserts in separate transactions         2500 inserts/second
                   ^^^^^^^^^^^^^^^^^^^^^^^
??


Greg Williamson

-----Original Message-----
From:    Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent:    Tue 9/9/2003 8:25 PM
To:    Matt Clark
Cc:    Ron Johnson; PgSQL Performance ML
Subject:    Re: [PERFORM] Hardware recommendations to scale to silly load

Matt Clark wrote:
> > Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI
> > drive I can do 4k inserts/second if I turn fsync off.  If you have a
> > battery-backed controller, you should be able to do the same.  (You will
> > not need to turn fsync off --- fsync will just be fast because of the
> > disk drive RAM).
> >
> > Am I missing something?
>
> I think Ron asked this, but I will too, is that 4k inserts in
> one transaction or 4k transactions each with one insert?
>
> fsync is very much faster (as are all random writes) with the
> write-back cache, but I'd hazard a guess that it's still not
> nearly as fast as turning fsync off altogether.  I'll do a test
> perhaps...

Sorry to be replying late.  Here is what I found.

fsync on
        Inserts all in one transaction         3700 inserts/second
        Inserts in separate transactions        870 inserts/second

fsync off
        Inserts all in one transaction         3700 inserts/second
        Inserts all in one transaction         2500 inserts/second

ECPG test program attached.

--

  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073




Re: Hardware recommendations to scale to silly load

From
Bruce Momjian
Date:
Gregory S. Williamson wrote:
> Nitpicking --
>
> Perhaps the 4th data line is meant to be:
>         Inserts in separate transactions         2500 inserts/second
>                    ^^^^^^^^^^^^^^^^^^^^^^^


Oh, yes, sorry.  It is:

> Sorry to be replying late.  Here is what I found.
>
> fsync on
>         Inserts all in one transaction         3700 inserts/second
>         Inserts in separate transactions        870 inserts/second
>
> fsync off
>         Inserts all in one transaction         3700 inserts/second
>         Inserts in separate transactions       2500 inserts/second

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073