Re: Rearchitecting for storage - Mailing list pgsql-general

From: Andy Colson
Subject: Re: Rearchitecting for storage
Msg-id: ed3320ff-5a6f-c718-9bea-9db50502e3fa@squeakycode.net
In response to: Rearchitecting for storage  (Matthew Pounsett <matt@conundrum.com>)
Responses: Re: Rearchitecting for storage  (Matthew Pounsett <matt@conundrum.com>)
List: pgsql-general

On 7/18/19 8:44 AM, Matthew Pounsett wrote:
> 
> I've recently inherited a database that is dangerously close to outgrowing the available storage on its existing
> hardware. I'm looking for (pointers to) advice on scaling the storage in a financially constrained not-for-profit.
 
> 
> The current size of the DB's data directory is just shy of 23TB.  When I received the machine it's on, it was
> configured with 18x3TB drives in RAID10 (9x 2-drive mirrors striped together) for about 28TB of available storage.  As a
> short term measure I've reconfigured them into RAID50 (3x 6-drive RAID5 arrays).  This is obviously a poor choice for
> performance, but it'll get us through until we figure out what to do about upgrading/replacing the hardware.  The host
> is constrained to 24x3TB drives, so we can't get much of an upgrade by just adding/replacing disks.
 
> 
> One of my anticipated requirements for any replacement we design is that I should be able to do upgrades of Postgres
> for up to five years without needing major upgrades to the hardware.  My understanding of the standard upgrade process
> is that this requires that the data directory be smaller than the free storage (so that there is room to hold two copies
> of the data directory simultaneously).  I haven't got detailed growth statistics yet, but given that the DB has grown to
> 23TB in 5 years, I should assume that it could double in the next five years, requiring 100TB of available storage to be
> able to do updates.
 
> 
> This seems to be right on the cusp of what is possible to fit in a single chassis with a RAID10 configuration (at
> least, with commodity hardware), which means we're looking at pretty high cost:performance ratio.  I'd like to see if we
> can find designs that get that ratio down a bit, or a lot, but I'm a general sysadmin, and the detailed effects on those
> choices are outside of my limited DBA experience.
 
> 
> Are there good documents out there on sizing hardware for this sort of mid-range storage requirement, that is neither
> big data, nor "small data" able to fit on a single host?   I'm hoping for an overview of the tradeoffs between single
> head, dual-head setups with a JBOD array, or whatever else is advisable to consider these days.  Corrections of any poor
> assumptions exposed above are also quite welcome. :)
 
> 
> Thanks in advance for any assistance!
> 
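
For reference, the storage figures in the quoted message can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: the function names are made up for this example, capacities are nominal drive sizes (no filesystem or controller overhead), and the "two copies" rule is taken from the quoted message as stated.

```python
def raid10_usable_tb(drives, drive_tb):
    """Nominal usable capacity of RAID10 built from 2-drive mirrors."""
    return (drives // 2) * drive_tb


def raid50_usable_tb(groups, drives_per_group, drive_tb):
    """Nominal usable capacity of RAID50: each RAID5 group gives up one drive to parity."""
    return groups * (drives_per_group - 1) * drive_tb


def upgrade_headroom_tb(current_tb, growth_factor, copies=2):
    """Storage needed to hold `copies` full copies of the projected data directory."""
    return current_tb * growth_factor * copies


# Figures from the quoted message: 18 x 3TB drives, 23TB today, assumed to double in five years.
raid10 = raid10_usable_tb(18, 3)       # 9 mirrors x 3TB
raid50 = raid50_usable_tb(3, 6, 3)     # 3 groups x 5 data drives x 3TB
needed = upgrade_headroom_tb(23, 2)    # 23TB x 2 (growth) x 2 (copies)
print(raid10, raid50, needed)
```

Under these assumptions the nominal numbers come out to 27TB for the original RAID10 layout (the message says "about 28TB"), 45TB for the RAID50 layout, and 92TB for two copies of a doubled database, which is where the roughly 100TB requirement comes from.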

Now might be a good time to consider splitting the database onto multiple computers.  That might be simpler with a
mid-range database, and then your plan for the future is "add more computers".
 

-Andy


