Thread: Requirements for a database server

Requirements for a database server

From
Robert.Farrugia@go.com.mt
Date: Wed, 18 Jul 2001 07:52:37 +0200
Hi,

I have been using postgres for the last year now.  The database has grown
from a mere few MBs to over 100GB of data, and is expected to top 300GB by the
end of the year.  Lately, performance of queries, inserts, and updates has
grown steadily worse as the dataset has grown larger, even though most
queries are backed by indexes, and vacuuming the database has become a
nightmare.  For this reason we were thinking of upgrading the system
hardware.

Is there anyone out there who has databases of this size?  Could you point me
to some basic requirements for this sort of database (processors, RAM,
storage, etc.)?  Also, would it be worthwhile to migrate to another database
system which is more robust with this amount of data (such as Oracle)?

Thanks
Robert


Re: Requirements for a database server

From
Robert.Farrugia@go.com.mt
Date:
Currently we are using a PowerEdge 2400 server with 1GB RAM and dual processors.
We have 5 x 36GB SCSI disks in RAID 5 for the database and one 36GB disk
for the WAL files, logging, OS, etc.

Postgres configuration is set to
sort_mem = 65536
fsync = false
shared_buffers = 1000

The database consists of 6 main tables (plus some additional ones,
which are not that large, but are used in joins with the main tables).
The largest table holds almost 15GB of data (spread over 15 files).  It
has four main indexes comprising another 13GB of data.

As the operating system, we're using Red Hat 7.1 with the SMP kernel.

Regards
Robert




                    "Andy Samuel"
                    <andysamuel@geoc        To:     <Robert.Farrugia@go.com.mt>
                    ities.com>              cc:
                                            Subject:     Re: [ADMIN] Requirements for a database server
                    18/07/2001 10:06






Robert

The size of your db makes me very interested.  May I know your current
computer configuration (processor, memory, etc.)?  And also the
configuration of your PostgreSQL.
Please feel free to post them on the ADMIN list, since they will be very
useful to others.

Best regards
Andy



Re: Requirements for a database server

From
"Ross J. Reedstrom"
Date: 18/07/2001 17:10
Robert -
You mention having a PostgreSQL database in production for around a year.
May I ask what version it is? I'm guessing you're still at 6.5.X,
in which case, I'd suggest an upgrade to 7.1.2, before doing anything
more drastic.  Not only are there bug fixes, many improving stability
of the system, there are also performance enhancements, particularly in
scalability. Without knowing more details of your schema and applications
(typical queries, table sizes, etc.) we can't be of much more help.

BTW, we've had many reports of multi-gigabyte systems performing well,
and recently a report of a multi-TERA-byte system!

There have been lots of vacuum changes, as well: Bruce and Tom have made
VACUUM much friendlier to a live system since 6.5.x.

Ross


Re: Requirements for a database server

From
Tom Lane
Date: 18/07/2001 17:48
Robert.Farrugia@go.com.mt writes:
> I have been using postgres for the last year now.  The database has grown
> from a mere few MBs to over 100GB data and expected to top up 300GB by the
> end of the year.  Lately performance of queries, inserts, updates has
> continued to grow worse as the dataset has grown larger, even though most
> queries have indexes on them, while vacuuming the database has become a
> nightmare.

Have you tried dropping and rebuilding the indexes?

Currently, PG doesn't reclaim dead space in indexes very effectively,
so the indexes on a frequently-updated table tend to grow without bound.
(I may or may not be able to fix this for 7.2 --- it's next on my
to-look-at list, but no promises.)  In the meantime, an occasional
rebuild may help restore performance.

BTW, the vacuuming issue is pretty well fixed for 7.2 ...

            regards, tom lane

Re: Requirements for a database server

From
Robert.Farrugia@go.com.mt
Date:
I'm using the latest version, 7.1.2 (although I started with 7.0 and then
migrated to 7.1).

The problem is that we have a few main tables which continue to grow, as
opposed to having a large number of tables.  As I see it, the more data you
have in a table, the longer a query will take, roughly in proportion (even
though you have indexes).  A good solution I found is to use a
divide-and-conquer approach.  Normally the queries depend on fixed periods
of time (days or months).  What I do is run a query for one hour, then the
next hour, and so on over the period, and collate the results together.  In
this case I normally use JDBC.  This is because queries for small periods
of time are a lot faster than for whole weeks or months (I think it has to
do with the amount of data to go through).
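
Robert's hour-slicing trick can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 rather than his actual PostgreSQL/JDBC setup, and the `calls` table and its columns are invented for the example; the real schema is not shown in the thread.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical table standing in for one of the large tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (ts TEXT, duration INTEGER)")
conn.executemany("INSERT INTO calls VALUES (?, ?)",
                 [("2001-07-18 09:30:00", 60),
                  ("2001-07-18 10:15:00", 120),
                  ("2001-07-18 11:45:00", 30)])

def total_duration(conn, start, end, step=timedelta(hours=1)):
    """Aggregate over [start, end) one hour-slice at a time,
    collating the partial results on the client side."""
    total = 0
    t = start
    while t < end:
        t_next = min(t + step, end)
        (part,) = conn.execute(
            "SELECT COALESCE(SUM(duration), 0) FROM calls"
            " WHERE ts >= ? AND ts < ?",
            (t.isoformat(sep=" "), t_next.isoformat(sep=" "))).fetchone()
        total += part
        t = t_next
    return total

start = datetime(2001, 7, 18)
print(total_duration(conn, start, start + timedelta(days=1)))  # 210
```

Each one-hour window touches far fewer pages than a whole-month scan, which is consistent with Robert's observation that small time ranges run disproportionately faster.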

Another thing I noticed is that if you do a query which uses only two
fields of an index, the index is often not used (a sequential scan is used
instead).  I do not know if this is a bug or just how it works.  It could
also be that I do not vacuum very often, since it takes so much time.  As
regards the improvement over the vacuums of 6.5.x, I concur.  With version
7.0, vacuum used to crash a lot, but now it is more stable (keeping fingers
crossed, vacuum has never crashed with 7.1.2).  I don't know if there is a
way to VACUUM ANALYZE a table without doing the actual vacuum, so that it
would not lock the tables.

On another note, what system hardware do you use for storage of terabytes
of data?  We were considering external hard-disk racks (Fibre Channel).
Which RAID level is most useful in these cases, 0+1 or 5?  Does the OS make
a difference in these cases, e.g. Linux as against Solaris?

Thanks a lot
Robert




                    "Ross J.
                    Reedstrom"           To:     Robert.Farrugia@go.com.mt
                    <reedstrm@ric        cc:     pgsql-admin@postgresql.org
                    e.edu>               Subject:     Re: [ADMIN] Requirements for a database server

                    18/07/2001
                    17:10






Robert -
You mention having a PostgreSQL database in production for around a year.
May I ask what version it is? I'm guessing you're still at 6.5.X,
in which case, I'd suggest an upgrade to 7.1.2, before doing anything
more drastic.  Not only are there bug fixes, many improving stability
of the system, there are also performance enhancements, particularly in
scalability. Without knowing more details of your schema and applications
(typical queries, table sizes, etc.) we can't be of much more help.

BTW, we've had  many reports of multi-gigabyte systems performing well,
and recently a report of a multi-TERA-byte system!

There have been lots of vacuum changes, as well: Bruce and Tom have made
VACUUM much friendlier to a live system since 6.5.x.

Ross

On Wed, Jul 18, 2001 at 07:52:37AM +0200, Robert.Farrugia@go.com.mt wrote:
> Hi,
>
> I have been using postgres for the last year now.  The database has grown
> from a mere few MBs to over 100GB data and expected to top up 300GB by
the
> end of the year.  Lately performance of queries, inserts, updates has
> continued to grow worse as the dataset has grown larger, even though most
> queries have indexes on them, while vacuuming the database has become a
> nightmare.  For this reason we were thinking of upgrading the system
> hardware.
>
> Is there anyone out there who has databases of this sort ?  Can he point
me
> out some basic requirements for these sort of databases (processors, ram,
> storage, etc) ?  Also would it be worthwile to migrate to another
database
> system which is more robust with this kind of amount of data (such as
> oracle) ?
>
> Thanks
> Robert





Re: Requirements for a database server

From
Robert.Farrugia@go.com.mt
Date: Thu, 19 Jul 2001 07:38:25 +0200
Since I have a number of heavy inserts per day, dropping the indexes and
recreating them could be an option.  The only problem I see is that
recreating them takes a lot of time, during which queries will be
tremendously slow.

Regards
Robert





Re: Requirements for a database server

From
Ragnar Kjørstad
Date:
On Thu, Jul 19, 2001 at 07:38:25AM +0200, Robert.Farrugia@go.com.mt wrote:
> Since I have a number of heavy inserts per day, dropping the indexes and
> recreating them could be an option.   Only problem I see is that to
> recreate them they take alot of time, at which queries will be tremendously
> slow.

What about first creating new indexes (with different names), and then
deleting the old ones?  Would that work?


--
Ragnar Kjorstad
Big Storage

Re: Requirements for a database server

From
Tom Lane
Date: 19/07/2001 20:08
Ragnar Kjørstad <postgres@ragnark.vestdata.no> writes:
> On Thu, Jul 19, 2001 at 07:38:25AM +0200, Robert.Farrugia@go.com.mt wrote:
>> Since I have a number of heavy inserts per day, dropping the indexes and
>> recreating them could be an option.   Only problem I see is that to
>> recreate them they take alot of time, at which queries will be tremendously
>> slow.

> What about first creating new indexes (with a different name), and then
> delete the old one? will that work?

Should work fine as far as read-only queries go.  CREATE INDEX locks out
writes, however, so those will be blocked in any case...

            regards, tom lane
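
Ragnar's suggestion, as Tom confirms, amounts to a build-then-drop swap. A minimal sketch using Python's sqlite3 with invented table and index names; on PostgreSQL the equivalent CREATE INDEX / DROP INDEX statements would be run via psql, with Tom's caveat that CREATE INDEX still blocks writes while it runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (ts TEXT, duration INTEGER)")
conn.execute("CREATE INDEX calls_ts_idx ON calls (ts)")  # the bloated index

# 1. Build a replacement under a new name; read-only queries
#    can keep using the old index while this runs.
conn.execute("CREATE INDEX calls_ts_idx_new ON calls (ts)")

# 2. Drop the old index only once the new one is in place, so
#    there is never a moment with no index on the column.
conn.execute("DROP INDEX calls_ts_idx")

names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(names)  # ['calls_ts_idx_new']
```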

Re: Requirements for a database server

From
Robert.Farrugia@go.com.mt
Date:
I will try and do this.  Will let you know how it went.

Regards
Robert





Re: Requirements for a database server

From
Ron Chmara
Date: 22/07/2001 09:06
On Tuesday, July 17, 2001, at 10:52 PM, Robert.Farrugia@go.com.mt wrote:
> Hi,
> I have been using postgres for the last year now.  The database has grown
> from a mere few MBs to over 100GB data and expected to top up 300GB by the
> end of the year.

Interesting. The largest PG data set I've ever seen was 87GB, but it was
only 2ndNF...

I'm trying to ask a question politely, which is: "Can you improve
efficiency by removing repetitive data, such as text or varchar fields?"

As I read your post, I thought about a project I'm working on, where we
took 180GB and dropped it to 100MB, simply because their db design stored a
lot of data that was repetitive. To put it another way, the English
language only has around 20,000 words, so even normal text fields can be
optimized at some point with numeric keys. :-)

In their case, they were storing every text "GET" string received by their
web server, and running LIKE searches on it (no, this is not a joke). When
their queries went from >2 hours to <1 second, they were very confused. :-)

> Lately performance of queries, inserts, updates has
> continued to grow worse as the dataset has grown larger, even though most
> queries have indexes on them, while vacuuming the database has become a
> nightmare.  For this reason we were thinking of upgrading the system
> hardware.

That's one option. RAID controllers are cheap these days... my standard db
box has at *least* 128MB of RAID cache on it, which helps, depending on
table design, of course. Dell makes some nice multi-RAID systems. If your
table design exceeds the amount of RAID cache, you may want to redesign
your tables, or look into a mainframe or two.

> Is there anyone out there who has databases of this sort ?  Can he point me
> out some basic requirements for these sort of databases (processors, ram,
> storage, etc) ?  Also would it be worthwile to migrate to another database
> system which is more robust with this kind of amount of data (such as
> oracle) ?

We test-migrated the above-mentioned project to a testing platform with
Oracle, and then realized the $45K Oracle license could be better spent on
a ground-up redesign of the db, and saved $20K by the end of the project.

Of course, your case may be different, but in many cases, once a full data
set is gathered, there can be numerous optimizations found within the db
design that *weren't* apparent before.

Using more efficient libraries (such as compiled C instead of JDBC), or
optimizing queries (avoiding the evils of table scans), can improve
performance 1000%. Fields which used to be "unique enough" may have 10, 50,
or 100 duplicate entries in them, all text/char/varchar, and can be reduced
to higher-speed int() keys. Queries that are built on complex, multi-data
fields can be split into more rows or tables (example: if you often query
on the first letter of a name, make that a column in the table, and query
that). Compound data such as "email_address" can easily be broken down even
further, into username and domain tables with a join between them (so all
aol.com email addresses come back lightning fast). Even basic things, like
stripping off a leading alpha character from a product code, can boost
performance and reduce storage and throughput issues.
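
Ron's "username and domain tables" idea, i.e. replacing a repetitive varchar with a small integer key plus a join, can be sketched as follows, using Python's sqlite3 with invented table and column names purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE addresses (username TEXT,
                            domain_id INTEGER REFERENCES domains(id));
""")

def add_address(conn, email):
    """Store user@domain as (username, small integer domain key)."""
    user, domain = email.split("@", 1)
    conn.execute("INSERT OR IGNORE INTO domains (name) VALUES (?)",
                 (domain,))
    (did,) = conn.execute("SELECT id FROM domains WHERE name = ?",
                          (domain,)).fetchone()
    conn.execute("INSERT INTO addresses VALUES (?, ?)", (user, did))

for e in ["ron@opus1.com", "alice@aol.com", "bob@aol.com"]:
    add_address(conn, e)

# All aol.com addresses via an integer join instead of a LIKE scan:
rows = conn.execute("""
    SELECT a.username FROM addresses a
    JOIN domains d ON d.id = a.domain_id
    WHERE d.name = 'aol.com' ORDER BY a.username
""").fetchall()
print([u for (u,) in rows])  # ['alice', 'bob']
```

The repeated domain string is stored once, and the equality match on `d.name` can use the unique index on `domains.name` rather than scanning every address row.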

Of course, if you've already done all of that, you're down to raw
performance.

Some general notes:
In most cases, SMP with postgres helps on concurrent queries, but doesn't
do a lot for single queries.
180/SCSI is the only way to go. IDE just doesn't work in the
best-performance cases.
Rip the OS down to the bare essentials. Don't even think of using a stock
OS kernel if you don't have to.
If you have a front end for inserts and another front end for queries, it
may help to reduce the front-end overhead. Often there's a bottleneck which
is assumed to be the db engine, when it's really the access method.

HTH,
-Ronabop

--2D426F70|759328624|00101101010000100110111101110000
ron@opus1.com, 520-326-6109, http://www.opus1.com/ron/
The opinions expressed in this email are not necessarily those of myself,
my employers, or any of the other little voices in my head.

Re: Requirements for a database server

From
Robert.Farrugia@go.com.mt
Date:
Hi,

As regards repetitive data, this is a bit tricky.  The dataset contains
some varchars with repetitive data, but due to the number of different
values, it defeats the purpose of putting them into a separate table.
Unfortunately most of the fields in the table are numbers, which cannot be
optimized any further.  What I will do over the next few weeks is try to
redesign the table structures better without dropping any data.  Thanks
for the hints.

Unfortunately for us, we have bought the PowerEdge 2400, which suffers from
RAID speed problems.

Thanks
Robert





Re: Requirements for a database server

From
Robert.Farrugia@go.com.mt
Date:
Tom,

I have just rebuilt some of the indexes, with the result that space has
been recouped (about 15GB).  Thanks for the tip.

I'm trying to vacuum the tables, but the pg_xlog is getting very big
(filling up the hard drive).  As an estimate, how much space do you need to
vacuum a 30GB table (data + indexes)?

Thanks
Robert





Re: Requirements for a database server

From
Tom Lane
Date:
Robert.Farrugia@go.com.mt writes:
> I'm trying to vacuum the tables, but the pg_xlog is getting very big
> (filling up the hard drive).  As an estimate, how much space do you need to
> vacuum a 30GB table (data + indexes)?

Not a lot, if you apply the previously posted patch to truncate the xlog
a little more enthusiastically.  I'm too tired to look up the URL yet
again, but it's in the patches archives from mid-June or so.

            regards, tom lane