Thread: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
From
Onni Hakala
Date:
Hey,

I have a large dataset of > 100TB which would be very expensive to store solely on SSD drives.

I have access to a server which has 2x 3.84TB NVMe SSD disks and a large array of HDD drives, 8 x 22TB.

Most of the data in my dataset is very rarely accessed and is stored only for archival purposes.

What would be the de-facto way to use both SSD and HDD together so that commonly used data would be fast to access and old data would eventually only be stored in compressed format on the HDDs?

I was initially looking into building a zpool using ZFS with raidz3 and zstd compression for my HDDs, but I'm unsure how to add the SSDs into this equation. I thought that this is probably a common scenario and wanted to ask opinions from here.

Thanks in advance,
Onni Hakala
Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
From
Bzzzz
Date:
On Wed, 16 Oct 2024 17:06:24 +0300
Onni Hakala <onni@fyff.ee> wrote:

> Hey,

Hi,

> I have a large dataset of > 100TB which would be very expensive to
> store solely into SSD drives.
>
> I have access to a server which has 2x 3.84TB NVME SSD disks and
> large array of HDD drives 8 x 22TB.
>
> Most of the data that I have in my dataset is very rarely accessed
> and is stored only for archival purposes.

I can't answer the rest, but with the configuration you describe you
might see some slowdown, because HDD speed is not what you think it is
once the disks start to fill up. A test on a 2TB/7200RPM disk shows
that at the beginning of the disk it is quite fast:

========================================================================
hdparm --offset 1g --direct -t /dev/sdb1

/dev/sdb1:
 geometry = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1 GB): 504 MB in 3.00 seconds = 167.97 MB/sec

At 50%, there's even a small gain (NB: this is a PITA, hdparm displays
GB where it in fact uses GiB :( ):

=======================================================================
hdparm --offset 931g --direct -t /dev/sdb1

/dev/sdb1:
 geometry = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 931 GB): 520 MB in 3.01 seconds = 172.73 MB/sec

At 70% of the disk, the loss is only 11%:

=========================================
hdparm --offset 1304g --direct -t /dev/sdb1

/dev/sdb1:
 geometry = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1304 GB): 440 MB in 3.01 seconds = 146.38 MB/sec

At 80% of the disk, the read speed has already lost 21%:

=======================================================
hdparm --offset 1490g --direct -t /dev/sdb1

/dev/sdb1:
 geometry = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1490 GB): 390 MB in 3.01 seconds = 129.68 MB/sec

But at 90% of the disk, the loss climbs to 30.5%:

================================================
hdparm --offset 1676g --direct -t /dev/sdb1

/dev/sdb1:
 geometry = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1676 GB): 344 MB in 3.01 seconds = 114.45 MB/sec

So, for good response times, it is best to never fill your disks beyond
75%. That is also always good practice to avoid other inconvenience,
especially when using a COW FS such as ZFS.

Jean-Yves
--
Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
From
TIM CHILD
Date:
Onni,

Firstly, it looks like you have a solution (a fixed number and type of disks) looking for a problem. It's better to consider what the right mix of disks is for your application on the server.

To understand the best physical layout, you need to know the logical access patterns to your data. When you access your data, are you doing random queries, index scans, table scans, or computing aggregates? Which queries are most important, and what response time are you targeting?

Once you have an idea of your logical access patterns, you map that to your physical layout. Postgres tablespaces are useful here, as they allow you to locate tables and indexes on different physical devices. For example, it will be better to put small, frequently used tables on the SSDs. Or just put frequently used indexes on the SSDs.

There are many factors to consider when planning the physical storage layout. There could be a need for AI apps to do just that!

-Tim

> On 10/16/2024 7:06 AM PDT Onni Hakala <onni@fyff.ee> wrote:
>
> Hey,
>
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
>
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
>
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
>
> What would be the de-facto way to use both SSD and HDD together in a way use where commonly used data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?
>
> I was initially looking into building zpool using zfs with raidz3 and zstd compression for my HDDs but I’m unsure how to add the SSDs into this equation and I thought that this is probably a common scenario and wanted to ask opinions from here.
>
> Thanks in advance,
> Onni Hakala
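As a minimal sketch of the tablespace approach Tim describes (mount points, tablespace names, and table names below are hypothetical):

```sql
-- Assumes the SSDs are mounted at /mnt/nvme and the HDD array at /mnt/hdd.
CREATE TABLESPACE fast_ssd LOCATION '/mnt/nvme/pgdata';
CREATE TABLESPACE slow_hdd LOCATION '/mnt/hdd/pgdata';

-- Small, frequently used table on the SSDs:
CREATE TABLE hot_lookup (id int PRIMARY KEY, label text)
    TABLESPACE fast_ssd;

-- Or keep a large archival table on the HDDs while placing a
-- frequently used index for it on the SSDs:
CREATE TABLE archive_events (id bigint, payload jsonb)
    TABLESPACE slow_hdd;
CREATE INDEX archive_events_id_idx ON archive_events (id)
    TABLESPACE fast_ssd;
```

Note that the directories must already exist and be owned by the postgres OS user before `CREATE TABLESPACE` will accept them.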
Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
From
Laurenz Albe
Date:
On Wed, 2024-10-16 at 17:06 +0300, Onni Hakala wrote:
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
>
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
>
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
>
> What would be the de-facto way to use both SSD and HDD together in a way use where commonly used
> data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?
>
> I was initially looking into building zpool using zfs with raidz3 and zstd compression for my HDDs
> but I’m unsure how to add the SSDs into this equation and I thought that this is probably a common
> scenario and wanted to ask opinions from here.

The traditional way to handle that would be to create an extra tablespace on a
file system built on the slow disks. You'd move tables that are no longer hot
to that slow tablespace.

If you cannot move the whole table, use partitioning and move old partitions
to the slow tablespace.

Yours,
Laurenz Albe
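The partitioning approach above could be sketched like this (table and tablespace names are hypothetical; it assumes tablespaces `fast_ssd` and `slow_hdd` have already been created on the respective devices):

```sql
-- Range-partitioned table; each partition can live in its own tablespace.
CREATE TABLE measurements (
    ts      timestamptz NOT NULL,
    payload jsonb
) PARTITION BY RANGE (ts);

-- Hot, recent partitions start out on the SSDs:
CREATE TABLE measurements_2024_09 PARTITION OF measurements
    FOR VALUES FROM ('2024-09-01') TO ('2024-10-01')
    TABLESPACE fast_ssd;
CREATE TABLE measurements_2024_10 PARTITION OF measurements
    FOR VALUES FROM ('2024-10-01') TO ('2024-11-01')
    TABLESPACE fast_ssd;

-- When a partition goes cold, move just that partition to the HDDs.
-- This rewrites the partition's files and holds an exclusive lock on it
-- while it runs, so schedule it for a quiet period:
ALTER TABLE measurements_2024_09 SET TABLESPACE slow_hdd;
```

Queries against `measurements` are unaffected by where each partition lives; the planner simply reads the relevant partitions from whichever device holds them.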