Thread: Is It Too Big? - Customer Data Warehouse Growth Projection

Is It Too Big? - Customer Data Warehouse Growth Projection

From:
Mark Kirkwood
Date:
I have a customer using PostgreSQL for a data warehouse. Their current
setup is:

A 5 GB database with a 50 MB data load each week.

They are running PostgreSQL 7.1.3 on Red Hat 7.2. The platform is a dual
P1000 with 768 MB RAM and 5 x 18 GB 15,000 RPM SCSI disks in RAID 5.

Their growth projections are:
1 month:   30 GB database with a 300 MB weekly load
3 months:  60 GB database with a 600 MB weekly load
4 months:  300 GB database with a 3000 MB weekly load

I would appreciate any suggestions on how to plan for this growth.

- Clearly hardware will need to be looked at (can we cope if we continue
to use Intel-based platforms?).
- Also software: are we okay using PostgreSQL for a 300 GB database (is
7.2 aimed at an undertaking of this size)? And how about the OS (can we
keep using Linux or FreeBSD)?


I am informed that the expected number of users is low, so the major
challenges are big queries and the weekly loads. Most of the space used
by the database will be in one very big table and one big table.

Thanks in advance

Mark



Re: Is It Too Big? - Customer Data Warehouse Growth Projection

From:
Tom Lane
Date:
Mark Kirkwood <markir@slingshot.co.nz> writes:
> I would appreciate any suggestions on how to plan for this growth.

> - Clearly hardware will need to be looked at (can we cope if we continue
> to use Intel-based platforms?).
> - Also software: are we okay using PostgreSQL for a 300 GB database (is
> 7.2 aimed at an undertaking of this size)? And how about the OS (can we
> keep using Linux or FreeBSD)?

7.2 is aimed more at 24x7 operation; as far as size of database goes,
I wouldn't think it would be much better than 7.1.  Probably the major
issue for you is that PG doesn't have any provision for spreading
a database across multiple filesystems --- so you will have to put the
whole DB on one honkin' big RAID array, or use an LVM layer to spread
the filesystem across multiple drives in software.
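
For example, a rough sketch of striping with Linux LVM (the device
names and sizes here are hypothetical, not a recommendation):

    # turn the raw partitions into LVM physical volumes
    pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1
    # collect them into a single volume group
    vgcreate pgvg /dev/sdb1 /dev/sdc1 /dev/sdd1
    # carve out a logical volume striped across all three drives
    lvcreate -i 3 -I 64 -L 300G -n pgdata pgvg
    # build a filesystem on it and mount it as the PG data area
    mke2fs /dev/pgvg/pgdata
    mount /dev/pgvg/pgdata /usr/local/pgsql/data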

I seem to recall having heard that Linux has some ~100GB restriction on
the size of a single filesystem, which could be a problem.  That might
be obsolete information, but be sure to check max filesystem size for
whichever kernel you select.

> I am informed that the expected number of users is low, so the major
> challenges are big queries and the weekly loads. Most of the space used
> by the database will be in one very big table and one big table.

PG's hard limit on the size of a single table is 2 billion pages, which
is 16TB with the default page size and 64TB if you compile with BLCKSZ
set to 32K.  The nonstandard choice might be a good option for this DB.
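
If you do go the 32K route, note that BLCKSZ is a compile-time constant;
a rough sketch of the rebuild (the exact header it lives in may vary
between versions, so check your source tree):

    # configure the source tree as usual
    ./configure --prefix=/usr/local/pgsql
    # edit src/include/config.h, changing
    #     #define BLCKSZ 8192
    # to
    #     #define BLCKSZ 32768
    make && make install
    # a cluster initialized under one block size cannot be read by a
    # server built with another, so re-run initdb afterwards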

The prospect of having to dump this database for backup seems a tad
daunting.  What are you using for a backup device?

            regards, tom lane

Re: Is It Too Big? - Customer Data Warehouse Growth

From:
Steve Wampler
Date:
Tom Lane wrote:

> I seem to recall having heard that Linux has some ~100GB restriction on
> the size of a single filesystem, which could be a problem.  That might
> be obsolete information, but be sure to check max filesystem size for
> whichever kernel you select.

I don't think Linux does, though some filesystems might:

->uname -a
Linux tapycer 2.4.5 #2 SMP Mon Jun 18 16:09:41 MST 2001 i686 unknown
->df
Filesystem           1k-blocks      Used Available Use% Mounted on
...
/dev/pool/pool0      366529628   1318912 365210716   0% /mnt/san1
/dev/pool/pool1       96442588   7028276  89414312   7% /mnt/san2
/dev/pool/pool2      145972564  21280320 124692244  15% /mnt/san3

These are all GFS filesystems (www.sistina.com) located on a SAN.

--
Steve Wampler -- swampler@noao.edu
O sibile, si ergo.  Fortibus es enaro.
Nobile, demis trux.  Demis phulla causan dux.

Re: Is It Too Big? - Customer Data Warehouse Growth

From:
Neil Conway
Date:
On Thu, 2002-01-17 at 09:33, Tom Lane wrote:
> I seem to recall having heard that Linux has some ~100GB restriction on
> the size of a single filesystem, which could be a problem.  That might
> be obsolete information, but be sure to check max filesystem size for
> whichever kernel you select.

No, ext2 (Linux's default filesystem) can use partitions up to 4TB in
size. For a system this large, something like XFS might be a good choice
(it's been ported to Linux by SGI) -- IIRC, it supports filesystems
larger than a petabyte.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


Re: Is It Too Big? - Customer Data Warehouse Growth

From:
Mark Kirkwood
Date:
On Fri, 2002-01-18 at 03:33, Tom Lane wrote:

> The prospect of having to dump this database for backup seems a tad
> daunting.  What are you using for a backup device?
>
The backup scenario is currently:

pg_dump to local disk, then copy the archive to another machine and put
it on a DLT (ALT? I have to check) tape.

I don't think the dump-to-disk-plus-copy approach is going to be
sustainable, but the customer likes the idea of the backup being
immediately available.

regards

Mark




Re: Is It Too Big? - Customer Data Warehouse Growth

From:
Mark Kirkwood
Date:
On Fri, 2002-01-18 at 03:57, Jim Buttafuoco wrote:
> Mark/All,
>
> I currently have 4 databases at ~800 GB running Red Hat Linux 6.2 and
> Postgres 7.0.X (yes, this is an old version, but the customer is happy).
> The application is a data warehouse. We moved from a 3-node (16 Alpha
> CPUs, 12 GB memory) Oracle parallel server to the 4 Intel (dual 733 MHz
> each) systems. Our customer is very happy with both the cost of the
> Linux/Postgres system and the performance. Each system has an exact
> copy of the database (using a custom rserv to keep the systems in sync).
>
> Jim
Jim,

Thanks for the information - very encouraging; I had hoped we could keep
using Intel + PostgreSQL.

I have a couple more questions :-)

What do you use to back these DBs up?
What sort of I/O system and RAID level are you using?


Thanks in advance

Mark




Re: Is It Too Big? - Customer Data Warehouse Growth

From:
"Command Prompt, Inc."
Date:
Hello,

Use pg_dump over the wire. Set up a machine that has a 100 Mbit card in
it and pull the dump straight to another machine for processing.
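
Something along these lines on the backup machine (the host, database,
and path names here are just placeholders):

    # pull the dump across the network and compress it locally
    pg_dump -h dbserver warehouse | gzip > /backup/warehouse.dump.gz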

J

On 19 Jan 2002, Mark Kirkwood wrote:

> On Fri, 2002-01-18 at 03:33, Tom Lane wrote:
>
> > The prospect of having to dump this database for backup seems a tad
> > daunting.  What are you using for a backup device?
> >
> The backup scenario is currently:
>
> pg_dump to local disk, then copy the archive to another machine and put
> it on a DLT (ALT? I have to check) tape.
>
> I don't think the dump-to-disk-plus-copy approach is going to be
> sustainable, but the customer likes the idea of the backup being
> immediately available.
>
> regards
>
> Mark

--
by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/


Re: Is It Too Big? - Customer Data Warehouse Growth

From:
mordicus
Date:
On Saturday 19 January 2002 at 03:44, Mark Kirkwood wrote:
> On Fri, 2002-01-18 at 03:33, Tom Lane wrote:
> > The prospect of having to dump this database for backup seems a tad
> > daunting.  What are you using for a backup device?
>
> The backup scenario is currently:
>
> pg_dump to local disk, then copy the archive to another machine and put
> it on a DLT (ALT? I have to check) tape.
>

DLT tape, very good choice :)

Install pg_dump on the backup machine so you don't have to copy from
machine A to machine B.
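
Run on machine B, something like this could even skip the intermediate
file and stream straight to the tape drive (the host, database, and tape
device names are placeholders):

    # dump over the network, compress, and write directly to the DLT drive
    pg_dump -h machineA warehouse | gzip | dd of=/dev/nst0 bs=64k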