Hi,
On 01/22/2018 09:46 PM, Alban Hertroys wrote:
> Hi all,
>
> At work we are in the process of setting up a data-warehouse using PG
> 10. I'm looking for a suitable server, but I hardly know anything
> about server-grade hardware.
>
> Hence, we're looking for some advice, preferably with some
> opportunity to discuss our situation and possibly things we did not
> take into account etc. A face to talk to would be appreciated. Who
> provides that in or near the eastern border of the Netherlands?
>
Coincidentally, there's a big conference (FOSDEM) in that area next
week, with a smaller PostgreSQL conference (FOSDEM PgDay) the day
before. Might be a good opportunity to talk to PostgreSQL people.
If you're looking for actual consulting, there are multiple companies
that might help you with this (including ours).
> More details:
>
> We're planning to deploy on bare-metal hardware, with a fallback
> server with similar or lesser specs for emergencies and upgrades and
> perhaps some (read-only) load balancing of different kinds of loads.
>
> The server will be accessed for reporting and ETL (or ELT) mostly.
> Both reporting servers (test/devel and production) are configured for
> at most 40 agents, so that's max 40 connections each to the warehouse
> for now. So far, we haven't reached that number in real loads, but
> reports are requested ~40,000 times a month (we measure HTTP requests
> minus static content).
>
> We will also be doing ETL of (very) remote (SAP) tables to the
> warehouse server; in what we got so far in our limited test
> environment we have tables of over 30GB, most of which is from the
> last 4 to 5 years.
>
That's nice, but it does not really tell us how much work that means for
the database :-( Those queries might be touching a tiny subset of the
data, or they might touch the whole data set. That will have a significant
impact on the hardware requirements.
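To illustrate why the request count alone says little, here is a rough
sketch in Python that turns the quoted ~40,000 requests per month into an
average rate. The 30-day month and the function name are assumptions made
for illustration, and an average hides peaks and says nothing about how
much data each report reads.

```python
def average_request_rate(requests_per_month, days_per_month=30):
    """Return (requests per day, requests per hour), averaged over the month.

    days_per_month=30 is an assumption, not a figure from this thread.
    """
    per_day = requests_per_month / days_per_month
    per_hour = per_day / 24
    return per_day, per_hour


if __name__ == "__main__":
    per_day, per_hour = average_request_rate(40_000)
    print(f"about {per_day:.0f} requests/day, {per_hour:.1f} requests/hour")
```

That works out to fewer than one request per minute on average, so the
cost per report (rows scanned, joins, sorts) matters far more than the
number of reports.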
> The biggy though is that we also plan to store factory process
> measurements on this server (temperatures, pressures, etc. at 5s
> intervals).
So, time series data. I wonder if timescale [1] would be appropriate
here (I've never used it, but it seems to be designed for exactly this
use case). And it's built on PostgreSQL.
[1] http://www.timescale.com/
> Part of one factory has already been writing that data to
> a different server, but that's already 4.3 billion records (140GB)
> for about a year of measuring and that's not even half of the
> factory. We will be required to retain 10-15 years of data from
> several factories (on the short term, at least 2). The expectancy is
> that this will grow to ~15TB for our factory alone.
>
Storing this amount of data is not that difficult - the DL360 machines
can handle 40TB+ for example. The question is how intensive the data
access and processing will be, so that you can pick the right storage
configuration, size the amount of RAM etc.
FWIW it makes no sense to size this for 10-15 years from the get go,
because (a) you won't get it right anyway (things change over time), and
(b) you're unlikely to keep the same hardware for 10+ years.
Get smaller but more powerful hardware, and plan to replace it in a couple
of years with newer machines.
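To put rough numbers on this, here is a back-of-the-envelope sketch in
Python using only the figures quoted in this thread. The decimal units
(1 GB = 1e9 bytes), the function names and the 3-year hardware horizon are
assumptions for illustration, and the 140GB is treated as total on-disk
size, which may or may not include indexes.

```python
# Figures quoted in this thread (decimal units assumed: 1 GB = 1e9 bytes).
ROWS_PER_YEAR = 4.3e9     # rows written in about one year
BYTES_PER_YEAR = 140e9    # size of that year of data
TARGET_TB = 15.0          # expected eventual size for one factory
RETENTION_YEARS = 15      # upper end of the 10-15 year retention


def bytes_per_row(total_bytes, rows):
    """Average on-disk bytes per stored measurement."""
    return total_bytes / rows


def yearly_growth_tb(target_tb, years):
    """Average growth per year implied by a target size and a retention period."""
    return target_tb / years


def size_after_tb(years, tb_per_year):
    """Projected size after a number of years of linear growth."""
    return years * tb_per_year


if __name__ == "__main__":
    row = bytes_per_row(BYTES_PER_YEAR, ROWS_PER_YEAR)
    growth = yearly_growth_tb(TARGET_TB, RETENTION_YEARS)
    horizon = 3  # hypothetical hardware replacement cycle, in years
    print(f"average row size: {row:.1f} bytes")
    print(f"implied growth: {growth:.1f} TB/year")
    print(f"size after {horizon} years: {size_after_tb(horizon, growth):.1f} TB")
```

Under these assumptions the first hardware generation only has to hold a
few TB of measurements, not the full 15TB.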
> We also want to keep our options for growth of this data warehouse
> open. There are some lab databases, for example, that currently exist
> as two separate brand database servers (with different major versions
> of the lab software, so there are design differences as well), that
> aren't exactly small either.
>
> I have been drooling over those shiny new AMD Epyc processors, which
> look certainly adequate with a truckload of memory and a good RAID-10
> array and some SSD(s) for the WAL, but it's really hard to figure out
> how many cores and memory we need. Sure, 2 7601's at 64 cores and 4TB
> of memory (if that's even for sale) would probably do the trick, but
> even I think that might be going a little overboard ;)
>
Well, you can either find out what your actual needs are (by looking at
the current system and extrapolating from it in some way) and size the
hardware accordingly. Or you can squeeze as much money from the
management as possible, and buy the shiniest stuff possible.
> Oh yeah, apparently we're married to HP or something… At least, IT
> management told me to look at their offerings.
>
I'd say ProLiant machines are pretty solid.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services