Thread: Has anyone run on the new G5 yet

Has anyone run on the new G5 yet

From
Sean Shanny
Date:
To all,

We are building a data warehouse composed essentially of click stream
data.  The DB is growing fairly quickly, as is to be expected, and is
currently at 90GB for one month's data.  The idea is to keep 6 months of
detailed data online and then start aggregating older data into summary
tables.  We have 2 fact tables currently, one with about 68 million rows
and the other with about 210 million rows.  There are numerous dimension
tables ranging from a dozen rows to millions.
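
As a rough illustration of what the retention plan above implies for
disk space, here is a minimal Python sketch.  The 90GB/month and 6-month
figures come from the post; the growth-rate parameter and the function
name are hypothetical, added only to show how quickly the total moves if
traffic keeps increasing.

```python
# Back-of-the-envelope sizing from the figures in the post above.
# Assumption (not from the post): monthly_growth is a made-up knob used
# only to illustrate the effect of rising traffic.

GB_PER_MONTH = 90    # current size of one month's detailed data
MONTHS_ONLINE = 6    # months of detailed data kept before aggregation


def detailed_data_gb(gb_per_month, months, monthly_growth=0.0):
    """Total size of the detailed data kept online.

    monthly_growth=0.10 means each month's data is 10% larger than the
    previous month's.
    """
    total = 0.0
    month_size = float(gb_per_month)
    for _ in range(months):
        total += month_size
        month_size *= 1.0 + monthly_growth
    return total


if __name__ == "__main__":
    flat = detailed_data_gb(GB_PER_MONTH, MONTHS_ONLINE)
    growing = detailed_data_gb(GB_PER_MONTH, MONTHS_ONLINE, monthly_growth=0.10)
    print(f"flat traffic:       {flat:.0f} GB")
    print(f"10% monthly growth: {growing:.0f} GB")
```

With flat traffic this is simply 6 x 90GB; indexes, summary tables, and
ETL scratch space are not included.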

We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in
hyper-threading mode, 4GB of RAM, and 5 SCSI drives in a RAID 0
configuration on an Adaptec PERC3/Di.  I believe they are 10k drives.
The file system is ext3.  We are running RH9 Linux, kernel
2.4.20-20.9SMP, with bigmem turned on.  This box is used only for the
warehouse.  All the ETL work is done on this machine as well.  The DB
version is PostgreSQL 7.4.

Obviously, we are running into issues with I/O saturation.  Since this
thing is only going to get bigger, we are looking for some advice on how
to accommodate DBs of this size.

First question: do we gain anything in terms of performance by moving to
the RH Enterprise version of Linux, mainly in the I/O realm, as we are
not CPU bound at all?  Second, and more radical: has anyone run
PostgreSQL on the new Apple G5 with an XRaid system?  This seems like a
great value combination: fast CPU, wide bus, Fibre Channel I/O, 2.5TB,
all for ~17k.

I keep seeing references to terabyte PostgreSQL installations.  I was
wondering if anyone on this list is in charge of one of those and can
offer some advice on how to position ourselves hardware-wise.

Thanks.

--sean


Re: Has anyone run on the new G5 yet

From
Sean Shanny
Date:
I should also add that we have already done a ton of tuning based on the
archives of this list so we are not starting from scratch here.

Thanks.

--sean



Re: Has anyone run on the new G5 yet

From
Eric Soroos
Date:
Sean

> Second, and more radical: has anyone run PostgreSQL on the new Apple
> G5 with an XRaid system?  This seems like a great value combination:
> fast CPU, wide bus, Fibre Channel I/O, 2.5TB, all for ~17k.
>
> I keep seeing references to terabyte PostgreSQL installations.  I was
> wondering if anyone on this list is in charge of one of those and can
> offer some advice on how to position ourselves hardware-wise.

From my (admittedly low-end) OS X experience, you just don't have the
filesystem options on OS X that you have on Linux: the noatime mount
option, the choice of filesystem types, and the RAID options.  I also
feel that the software stack is a bit more mature and tested on the
Linux side of things.

I doubt that the G5 hardware is that much faster than what you have
right now.  The RAID hardware might be a good deal for you even on a
Linux platform.  There are reports of it 'just working' with x86 Linux
hardware.

eric


Re: Has anyone run on the new G5 yet

From
"Fred Moyer"
Date:
> Obviously, we are running into issues with I/O saturation.  Since this
> thing is only going to get bigger, we are looking for some advice on
> how to accommodate DBs of this size.
<snip>
> Second, and more radical: has anyone run
> PostgreSQL on the new Apple G5 with an XRaid system?  This seems like
> a great value combination: fast CPU, wide bus, Fibre Channel I/O,
> 2.5TB, all for ~17k.
<snip>
If you are going for I/O performance, you are best off with one of the
Xserve competitors listed at http://www.apple.com/xserve/raid/.  The
Xserve is based on IDE drives, which have a higher seek time (say 8.9 ms)
compared to SCSI (3.6 ms for a Seagate Cheetah).  For small random
read/write operations (like databases), SCSI will give you a noticeable
improvement in performance over IDE drives.  Also make sure to get as
many drives as possible; more spindles equals better I/O performance.
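
To put the two seek times quoted above in perspective, here is a rough
Python sketch that turns an average seek time into an approximate
random-I/O ceiling per spindle.  The 8.9 ms and 3.6 ms figures are the
ones from this message; the spindle speeds used for rotational latency
are illustrative assumptions, not something stated in the thread.

```python
# Rough per-spindle random I/O ceiling from average access time.
# The seek times are the ones quoted above; the RPM values are
# illustrative assumptions, not taken from the thread.


def random_iops(avg_seek_ms, rpm=None):
    """Estimate random I/O operations per second for a single drive.

    If rpm is given, the average rotational latency (half a revolution)
    is added to the seek time before converting to operations/second.
    """
    access_ms = avg_seek_ms
    if rpm is not None:
        access_ms += 0.5 * 60_000.0 / rpm
    return 1000.0 / access_ms


if __name__ == "__main__":
    print(f"IDE,  seek only:           {random_iops(8.9):6.1f} ops/s")
    print(f"SCSI, seek only:           {random_iops(3.6):6.1f} ops/s")
    print(f"IDE,  7200 rpm (assumed):  {random_iops(8.9, 7200):6.1f} ops/s")
    print(f"SCSI, 15000 rpm (assumed): {random_iops(3.6, 15000):6.1f} ops/s")
```

This ignores controller caches, command queueing, and transfer time, so
treat it as an order-of-magnitude comparison only.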

> I keep seeing references to terabyte PostgreSQL installations.  I was
> wondering if anyone on this list is in charge of one of those and can
> offer some advice on how to position ourselves hardware-wise.

I've gone to about half-terabyte size, and all I can say is you should
plan for at least one quarter to one half of a rack of drive space
(assuming 14 drives per 4U, that's 42 to 84 drives).  Do yourself a
favor and get more rather than less; you will really appreciate it.  I
averaged about 2 MB/s per drive, going by the RAID controller stats, on
a 14-drive array during I/O-bound seek and update operations in 2 RAID
10 arrays (half xlogs and half data).  That comes out to around 2 hours
for a terabyte with 70 drives, assuming constant scaling.  You may be
able to get more or less depending on your setup and query workload.
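
The "2 hours for a terabyte" estimate above can be reproduced with a
few lines of Python.  This is only a sketch of the arithmetic: it reads
the per-drive rate as 2 megabytes per second, assumes throughput scales
linearly with drive count (the "constant scaling" in the message), and
takes 1 TB as 1,000,000 MB.  The function name is made up for the
example.

```python
# Scaling the observed per-drive rate to a larger array.
# Assumptions: linear scaling with drive count, 1 TB = 1,000,000 MB.

MB_PER_TB = 1_000_000


def hours_to_process(total_mb, drives, mb_per_sec_per_drive):
    """Hours needed to push total_mb through the array at the given rate."""
    aggregate_mb_per_sec = drives * mb_per_sec_per_drive
    return total_mb / aggregate_mb_per_sec / 3600.0


if __name__ == "__main__":
    for drives in (14, 42, 70, 84):
        hours = hours_to_process(MB_PER_TB, drives, 2.0)
        print(f"{drives:3d} drives: {hours:5.1f} hours per terabyte")
```

At 70 drives the aggregate is 140 MB/s, which works out to just under 2
hours per terabyte; the same rate on a single 14-drive array would take
roughly five times as long.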


Re: Has anyone run on the new G5 yet

From
Gaetano Mendola
Date:
Sean Shanny wrote:

> We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in
> hyper-threading mode, 4GB of RAM, and 5 SCSI drives in a RAID 0
> configuration on an Adaptec PERC3/Di.  I believe they are 10k drives.
> The file system is ext3.  We are running RH9 Linux, kernel
> 2.4.20-20.9SMP, with bigmem turned on.  This box is used only for the
> warehouse.  All the ETL work is done on this machine as well.  The DB
> version is PostgreSQL 7.4.

Are you seeing any improvement from hyper-threading?


Regards
Gaetano Mendola


Re: Has anyone run on the new G5 yet

From
Sean Shanny
Date:
Gaetano,

I don't believe we have ever run the system without it turned on.
Another switch to fiddle with. :-)

--sean



Re: Has anyone run on the new G5 yet

From
Paul Tuckfield
Date:
(hope I'm posting this correctly)

You wrote:

>First question: do we gain anything in terms of performance by moving
>to the RH Enterprise version of Linux, mainly in the I/O realm, as we
>are not CPU bound at all?  Second, and more radical: has anyone run
>PostgreSQL on the new Apple G5 with an XRaid system?  This seems like a
>great value combination: fast CPU, wide bus, Fibre Channel I/O, 2.5TB,
>all for ~17k.

Wow, funny coincidence: I've got a pair of dual Xeons w. 8G + 14-disk
FC-AL arrays, and an Xserve with an XRaid that I've been screwing around
with.  If you have specific tests you'd like to see, let me know.

--- so, for the truly IO bound, here's my recent messin' around summary:

In the not-so-structured tests I've done, I've been disappointed with
Red Hat AS 2.1 I/O throughput.  I've had difficulty driving a lot of I/O
through my dual FC-AL channels: I can only get one going at 60M/sec, and
when I drive I/O to the second, I still see only about 60M/sec combined.
And when it does get that high, it uses about 30% CPU on a dual Xeon
hyperthreaded box, all in sys (by vmstat).  Something is very wrong
there, and the only thing I can conclude is that I'm serializing in the
driver somehow (qla2200 driver), so parallel channels do the same as
one, and interrupt madness drives the CPU up just to do this contentious
I/O.

This contrasts with the Red Hat 9 I just installed on a similar box,
which got 170M/sec on 2 FC-AL channels, and the expected 5-6% CPU.

The above testing was dd straight from /dev/rawX devices, so no buffer
cache confusion there.
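
For anyone wanting to run a comparable sequential-read check, here is a
minimal Python sketch of the same idea as the dd test.  It is not the
command used above: the function name and parameters are made up for
illustration, and unlike reading from a raw device, reading a regular
file or an ordinary block device this way can be served from the buffer
cache, which would inflate the number.

```python
# Minimal sequential-read throughput check, similar in spirit to a dd
# read test.  Caveat: unlike a raw device, reads here may be satisfied
# from the OS buffer cache.
import sys
import time


def sequential_read(path, block_size=1024 * 1024, max_bytes=None):
    """Read path front to back; return (bytes_read, MB_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering on top of the OS.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    if len(sys.argv) > 1:
        nbytes, mb_per_sec = sequential_read(sys.argv[1])
        print(f"read {nbytes} bytes at {mb_per_sec:.1f} MB/s")
```

Running one copy per channel at the same time and comparing the
combined rate against a single run is one way to look for the kind of
serialization described above.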

I also had problems getting Red Hat AS to bind to my newer qla2300
adapters at all, whereas they bound fine under RH9.

Red Hat makes the claim of finer-grained locks/semaphores in the qla and
AIC drivers in RH AS, but my tests seem to show that the 2 FC-AL ports
were serializing against each other in the kernel under RH AS, and not
so under RH9.  Maybe I'm using the wrong driver under AS.  Eh.

So, short story long, it seems like you're better off with RH9.  But
again, before you lay out serious coin for an Xserve or others, if you
have specific tests you want to see, I'll take a little time to contrast
with the Xserve.  One of the Xeons also has an aic7x SCSI controller w.
4 drives, so it might match your rig better.

I also did some token testing on the Xserve I have, which I believe may
only have one processor (how do you tell on OS X?); the XRaid has 5
spindles in it.  I did a cursory build of Postgres on it and also an I/O
test (to the filesystem) and saw about 90M/sec.  Dunno if it has dual
paths (if you guys know how to tell, let me know).


The biggest problem I've had in the past w. Linux in general is that it
seems to make poor VM choices under heavy filesystem I/O.  I don't
really get exactly where it's going wrong, but I've had numerous
experiences on older systems where bursty I/O would seem to cause paging
on the box (pageout of pieces of the Oracle SGA shared memory), which is
a performance disaster.  It seems to happen even when the shared memory
was sized reasonably below the size of physical RAM, presumably because
Linux is too aggressive in allocating filesystem cache (?).  Anyway, it
seems to make decisions based on a desire for zippy workstation
performance and gets burned on throughput on database servers.  I'm
guessing this may be an issue for you when doing heavy I/O.  Thing is,
it'll look like you're I/O bound, kind of, because you're thrashing.


Re: Has anyone run on the new G5 yet

From
William Yu
Date:
Sean Shanny wrote:
>
> First question: do we gain anything in terms of performance by moving
> to the RH Enterprise version of Linux, mainly in the I/O realm, as we
> are not CPU bound at all?  Second, and more radical: has anyone run
> PostgreSQL on the new Apple G5 with an XRaid system?  This seems like a
> great value combination: fast CPU, wide bus, Fibre Channel I/O, 2.5TB,
> all for ~17k.

Seems like a great value but until Apple produces a G5 that supports
ECC, I'd pass on them.


Re: Has anyone run on the new G5 yet

From
Bruce Momjian
Date:
Paul Tuckfield wrote:
> The biggest problem I've had in the past w. Linux in general is that it
> seems to make poor VM choices under heavy filesystem I/O.  I don't
> really get exactly where it's going wrong, but I've had numerous
> experiences on older systems where bursty I/O would seem to cause paging
> on the box (pageout of pieces of the Oracle SGA shared memory), which is
> a performance disaster.  It seems to happen even when the shared memory
> was sized reasonably below the size of physical RAM, presumably because
> Linux is too aggressive in allocating filesystem cache (?).  Anyway, it
> seems to make decisions based on a desire for zippy workstation
> performance and gets burned on throughput on database servers.  I'm
> guessing this may be an issue for you when doing heavy I/O.  Thing is,
> it'll look like you're I/O bound, kind of, because you're thrashing.

This is not surprising.  There has always been an issue with dynamic
buffer cache systems contending with memory used by processes.  It takes
a long time to get the balance right, and still there might be cases
where it gets things wrong.  Isn't there a Linux option to lock shared
memory into RAM?  If so, we should document this in our manuals, but
right now, there is no mention of it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073