Thread: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server

Hey,

I have a large dataset of > 100TB which would be very expensive to store solely on SSD drives.

I have access to a server which has 2x 3.84TB NVMe SSDs and a large array of HDDs (8 x 22TB).

Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.

What would be the de facto way to use SSD and HDD together, so that commonly used data would be fast to
access and old data would eventually be stored only in compressed format on the HDDs?

I was initially looking into building a zpool using ZFS with raidz3 and zstd compression for my HDDs, but I’m unsure how
to add the SSDs into this equation. I figured this is probably a common scenario, so I wanted to ask for opinions here.

Thanks in advance,
Onni Hakala




On Wed, 16 Oct 2024 17:06:24 +0300
Onni Hakala <onni@fyff.ee> wrote:

> Hey,

Hi,

> I have a large dataset of > 100TB which would be very expensive to
> store solely into SSD drives.
>
> I have access to a server which has 2x 3.84TB NVME SSD disks and
> large array of HDD drives 8 x 22TB.
>
> Most of the data that I have in my dataset is very rarely accessed
> and is stored only for archival purposes.

I can't answer the rest, but with the configuration you describe you might
see some slowdown, because HDD speed is not constant: throughput drops as
the disks fill up.


A test on a 2TB/7200RPM disk shows that, at the beginning of the disk, it is
quite fast :
========================================================================
hdparm --offset 1g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1 GB): 504 MB in  3.00 seconds =
 167.97 MB/sec


@ 50%, there's even a small gain (NB : annoyingly, hdparm displays GB where
it in fact uses GiB) :
=======================================================================
hdparm --offset 931g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 931 GB): 520 MB in  3.01 seconds =
 172.73 MB/sec


@ 70% of the disk, the loss is only 11% :
=========================================
hdparm --offset 1304g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1304 GB): 440 MB in  3.01 seconds =
 146.38 MB/sec


@ 80% of the disk the read speed has already lost 21% :
=======================================================
hdparm --offset 1490g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1490 GB): 390 MB in  3.01 seconds =
129.68 MB/sec


but @ 90% of the disk, the loss climbs to 30.5% :
================================================
hdparm --offset 1676g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1676 GB): 344 MB in  3.01 seconds =
 114.45 MB/sec


So, for good response times, it is best never to fill your disks beyond
about 75%. Keeping that headroom is also good practice to avoid other
problems, especially when using a COW FS, such as ZFS.

Jean-Yves

--



Onni,

Firstly, it looks like you have a solution (a fixed number and type of disks) looking for a problem. It's better to
consider what the right mix of disks is for your application and server.

To understand the best physical layout, you need to know the logical access patterns to your data. When you access
your data, are you doing random queries, index scans, table scans, or computing aggregates? Which queries are most
important, and what response time are you targeting?
Once you have an idea of your logical access patterns, you map that to your physical layout.

Postgres tablespaces are useful here, as they allow you to place tables and indexes on different physical devices. For example,
it may be better to put small, frequently used tables on the SSDs, or just put frequently used indexes on the SSDs.
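
A rough, untested sketch of that idea (the tablespace paths, table and index names below are only placeholders):

-- tablespace on the NVMe mirror (example mount point)
CREATE TABLESPACE fast_ssd LOCATION '/mnt/nvme/pgdata';

-- tablespace on the HDD array (example mount point)
CREATE TABLESPACE slow_hdd LOCATION '/mnt/hdd/pgdata';

-- put a small, hot table on the SSDs
ALTER TABLE hot_lookup SET TABLESPACE fast_ssd;

-- or leave a big table on the HDDs but move a heavily used index to the SSDs
ALTER INDEX big_table_created_at_idx SET TABLESPACE fast_ssd;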

There are many factors to consider when planning the physical storage layout. There could be a need for AI apps to do
just that!

-Tim


> On 10/16/2024 7:06 AM PDT Onni Hakala <onni@fyff.ee> wrote:
>
>
> Hey,
>
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
>
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
>
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
>
> What would be the de-facto way to use both SSD and HDD together in a way use where commonly used data would be fast
> to access and old data would eventually only be stored in compressed format in the HDDs?
>
> I was initially looking into building zpool using zfs with raidz3 and zstd compression for my HDDs but I’m unsure how
> to add the SSDs into this equation and I thought that this is probably a common scenario and wanted to ask opinions from
> here.
>
> Thanks in advance,
> Onni Hakala



On Wed, 2024-10-16 at 17:06 +0300, Onni Hakala wrote:
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
>
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
>
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
>
> What would be the de-facto way to use both SSD and HDD together in a way use where commonly used
> data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?
>
> I was initially looking into building zpool using zfs with raidz3 and zstd compression for my HDDs
> but I’m unsure how to add the SSDs into this equation and I thought that this is probably a common
> scenario and wanted to ask opinions from here.

The traditional way to handle that would be to create an extra tablespace on a file system built
on the slow disks.  You'd move tables that are no longer hot to that slow tablespace.
If you cannot move the whole table, use partitioning and move old partitions to the slow
tablespace.
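
For illustration, a minimal sketch of that approach (the tablespace path, table and partition names are made up):

-- tablespace on the HDD array for cold data (example path)
CREATE TABLESPACE archive_hdd LOCATION '/mnt/hdd/pgdata';

-- range-partitioned table; new partitions stay on the default (fast) tablespace
CREATE TABLE measurements (
    recorded_at timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2023 PARTITION OF measurements
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE measurements_2024 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- once a partition has gone cold, move it to the slow tablespace
-- (its indexes are moved separately with ALTER INDEX ... SET TABLESPACE)
ALTER TABLE measurements_2023 SET TABLESPACE archive_hdd;

Note that moving a partition like this takes an ACCESS EXCLUSIVE lock and rewrites the files, so it is best done during a quiet period.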

Yours,
Laurenz Albe