Postgres for a "data warehouse", 5-10 TB - Mailing list pgsql-performance

From Igor Chudov
Subject Postgres for a "data warehouse", 5-10 TB
Date
Msg-id CAMhtkAbK7saRO_Tayb1nS3YLWN=Wns=Ci47qs-3YbX0T2+hkpA@mail.gmail.com
Whole thread Raw
Responses Re: Postgres for a "data warehouse", 5-10 TB
Re: Postgres for a "data warehouse", 5-10 TB
Re: Postgres for a "data warehouse", 5-10 TB
List pgsql-performance
I have been a MySQL user for years, including owning a few
multi-gigabyte databases for my websites, and using it to host
algebra.com (about 12 GB database).

I have had my ups and downs with MySQL. The ups were ease of use and
decent performance for small databases such as algebra.com. The downs
were things like twenty hour REPAIR TABLE operations on a 35 GB
table, etc.

Right now I have a personal (one user) project to create a 5-10
Terabyte data warehouse. The largest table will consume the most space
and will take, perhaps, 200,000,000 rows.

I want to use it to obtain valuable business intelligence and to make
money.

I expect it to grow, never shrink, and to be accessed via batch
queries. I do not care for batch queries to be super fast, for example
an hour per query would be just fine.

However, while an hour is fine, two weeks per query is NOT fine.

I have a server with about 18 TB of storage and 48 GB of RAM, and 12
CPU cores.

My initial plan was to use MySQL, InnoDB, and deal with problems as
they arise. Perhaps, say, I would implement my own joining
procedures.

After reading some disparaging stuff about InnoDB performance on large
datasets, however, I am getting cold feet. I have a general feeling
that, perhaps, I will not be able to succeed with MySQL, or, perhaps,
with either MySQL and Postgres.

I do not know much about Postgres, but I am very eager to learn and
see if I can use it for my purposes more effectively than MySQL.

I cannot shell out $47,000 per CPU for Oracle for this project.

To be more specific, the batch queries that I would do, I hope,
would either use small JOINS of a small dataset to a large dataset, or
just SELECTS from one big table.

So... Can Postgres support a 5-10 TB database with the use pattern
stated above?

Thanks!

i

pgsql-performance by date:

Previous
From: Hany ABOU-GHOURY
Date:
Subject: Databases optimization
Next
From: Scott Marlowe
Date:
Subject: Re: Postgres for a "data warehouse", 5-10 TB