Rearchitecting for storage - Mailing list pgsql-general

From Matthew Pounsett
Subject Rearchitecting for storage
Msg-id CAAiTEH-442wghJmt=Yw2bnUvxNW-nwTMoFLqgLrTZ1CmZxiZJA@mail.gmail.com
Responses Re: Rearchitecting for storage  (Kenneth Marshall <ktm@rice.edu>)
Re: Rearchitecting for storage  (Andy Colson <andy@squeakycode.net>)
Re: Rearchitecting for storage  (Matthew Pounsett <matt@conundrum.com>)
List pgsql-general

I've recently inherited a database that is dangerously close to outgrowing the available storage on its existing hardware.  I'm looking for (pointers to) advice on scaling the storage in a financially constrained not-for-profit.

The current size of the DB's data directory is just shy of 23TB.  When I received the machine it's on, it was configured with 18x3TB drives in RAID10 (9x 2-drive mirrors striped together) for about 28TB of available storage.  As a short-term measure I've reconfigured them into RAID50 (3x 6-drive RAID5 arrays).  This is obviously a poor choice for performance, but it'll get us through until we figure out what to do about upgrading/replacing the hardware.  The host is constrained to 24x3TB drives, so we can't get much of an upgrade by just adding/replacing disks.
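
As a rough check on the capacity figures above, here is a minimal sketch of the nominal usable-capacity arithmetic for the two layouts. The function names are made up for illustration, and the results ignore filesystem overhead, hot spares, and the TB/TiB distinction, so they will not exactly match what the controller or OS reports.

```python
def raid10_usable_tb(drives, drive_tb):
    """Nominal usable capacity of striped 2-way mirrors: half the raw capacity."""
    if drives < 2 or drives % 2:
        raise ValueError("RAID10 with 2-way mirrors needs an even number of drives")
    return (drives // 2) * drive_tb


def raid50_usable_tb(groups, drives_per_group, drive_tb):
    """Nominal usable capacity of striped RAID5 groups.

    Each RAID5 group gives up one drive's worth of space to parity.
    """
    if drives_per_group < 3:
        raise ValueError("each RAID5 group needs at least 3 drives")
    return groups * (drives_per_group - 1) * drive_tb


# The two layouts described above, with 18 drives of 3TB each.
print(raid10_usable_tb(18, 3))     # 27
print(raid50_usable_tb(3, 6, 3))   # 45
```

On the same 18 drives, the RAID50 layout trades write performance and rebuild safety for roughly two thirds more nominal space.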

One of my anticipated requirements for any replacement we design is that I should be able to do upgrades of Postgres for up to five years without needing major upgrades to the hardware.  My understanding of the standard upgrade process is that this requires that the data directory be smaller than the free storage (so that there is room to hold two copies of the data directory simultaneously).  I haven't got detailed growth statistics yet, but given that the DB has grown to 23TB in 5 years, I should assume that it could double in the next five years, to about 46TB, requiring roughly 100TB of available storage to be able to do upgrades.
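
The sizing estimate above can be written out as a small calculation. This is only a sketch: the function name is hypothetical, and treating growth as a single multiplier over the period is an assumption, not something measured from the actual database.

```python
def required_storage_tb(current_tb, growth_factor=2, copies=2):
    """Storage needed to hold `copies` full copies of the data directory
    after it has grown by `growth_factor`.

    copies=2 reflects the assumption that an upgrade needs the old and
    new data directories on disk at the same time.
    """
    if current_tb < 0 or growth_factor <= 0 or copies < 1:
        raise ValueError("sizes and factors must be positive")
    projected_tb = current_tb * growth_factor
    return projected_tb * copies


# 23TB today, assumed to double in five years, two copies during an upgrade.
print(required_storage_tb(23))  # 92
```

The result of 92TB is what gets rounded up to the 100TB figure, which leaves only modest headroom for WAL, temporary files, and growth beyond the doubling assumption.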

This seems to be right on the cusp of what is possible to fit in a single chassis with a RAID10 configuration (at least, with commodity hardware), which means we're looking at a pretty high cost:performance ratio.  I'd like to see if we can find designs that get that ratio down a bit, or a lot, but I'm a general sysadmin, and the detailed effects of those choices are outside of my limited DBA experience.

Are there good documents out there on sizing hardware for this sort of mid-range storage requirement, which is neither big data nor "small data" that fits comfortably on a single host?  I'm hoping for an overview of the tradeoffs between single-head setups, dual-head setups with a JBOD array, or whatever else is advisable to consider these days.  Corrections of any poor assumptions exposed above are also quite welcome. :)

Thanks in advance for any assistance!



