Re: Hardware for writing/updating 12,000,000 rows per hour - Mailing list pgsql-general

From Neil
Subject Re: Hardware for writing/updating 12,000,000 rows per hour
Date
Msg-id 52DC320F-D44A-4CB1-9E36-0F7FC15680DA@fairwindsoft.com
In response to Re: Hardware for writing/updating 12,000,000 rows per hour  (Arya F <arya6000@gmail.com>)
Responses RE: Hardware for writing/updating 12,000,000 rows per hour  (farjad.farid <farjad.farid@checknetworks.com>)
List pgsql-general

On Jul 27, 2019, at 11:55 AM, Arya F <arya6000@gmail.com> wrote:

On Sat, Jul 27, 2019 at 11:49 AM farjad.farid <farjad.farid@checknetworks.com> wrote:
With this kind of design requirements it is worth considering hardware "failure & recovery". Even SSDs can and do fail.

It is not just a matter of speed. RAID disks of some kind, depending on the budget, are worth the effort.

-----Original Message-----
From: Alvaro Herrera <alvherre@2ndquadrant.com>
Sent: 2019 July 26 22:39
To: Arya F <arya6000@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Ron <ronljohnsonjr@gmail.com>; pgsql-general@lists.postgresql.org
Subject: Re: Hardware for writing/updating 12,000,000 rows per hour

On 2019-Jul-26, Arya F wrote:

> I think I can modify my application to do a batch update. Right now
> the server has an HDD and it really can't handle a lot of updates and
> inserts per second. Would changing to a regular SSD be able to easily
> do 3000 updates per second?

That's a pretty hard question in isolation -- you need to consider how many indexes there are to update, whether the updated columns are indexed or not, what the datatypes are, how much locality of access you'll have ... I'm probably missing some other important factors.  (Of course, you'll have to tune various PG server settings to find your sweet spot.)

I suggest that you should be measuring instead of trying to guess.  A reasonably cheap way is to rent a machine somewhere with the type of hardware you think you'll need, and run your workload there for long enough, making sure to carefully observe important metrics such as table size, accumulated bloat, checkpoint regime, overall I/O activity, and so on.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi Farjad,

I was thinking of using physical or logical replication. Or is RAID a must if I don't want to lose data?

Arya,

Answering the question of what hardware would work better requires a lot of thought. I would suggest that you hire someone who can do the calculations to get you in the ballpark and lay out the other pros and cons.  Then test with rented or cloud equipment, like Álvaro suggested.  The following are some basic back-of-the-envelope calculations to show you the thought process; the numbers presented here are probably way off and need to be based on your own hardware and research.
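
For the measuring itself, something as small as this can give you a first updates-per-second number (a sketch, assuming psycopg2 is installed and a preloaded scratch table bench_t(id int primary key, payload text) exists; the DSN, table name, and row counts are all hypothetical):

    import time
    import psycopg2

    conn = psycopg2.connect("dbname=bench")    # hypothetical DSN
    cur = conn.cursor()

    N = 30_000                                 # updates to time
    start = time.monotonic()
    for i in range(N):
        # ~2 kB payload against a preloaded 10,000-row table
        cur.execute("UPDATE bench_t SET payload = %s WHERE id = %s",
                    ("x" * 2000, i % 10_000))
        if i % 1_000 == 999:                   # commit in batches of 1,000
            conn.commit()
    conn.commit()
    print(f"{N / (time.monotonic() - start):.0f} updates/sec")

Run it against the rented hardware with your real schema and access pattern; the batch size and hot-set size will move the result a lot, which is exactly the point of measuring.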

The first thing you need to understand is how much data you are going to be moving around.  You mentioned ’12,000,000 writes/updates per hour’ and put the size of the tuples getting updated at about 2 kB.  That is about 3,333 rows per second (as someone else already noted), which translates to about 3,333 * 2,000 bytes = 6.6 MB/sec of basic data movement if they are all inserts, not including index updates, replication, vacuum, etc.  If they are all updates you can double that.  So let's say they are half updates: data movement is 2x for the updates and 1x for the inserts, which brings it to 9.9 MB/sec.  Let's say the index keys total 200 bytes; then we have 3,333 * 200 * 1.5 (half updates, half inserts) = 0.99 MB/sec.  If you have an existing system, all of these things can be measured.

This brings the total to a minimum of 10.9 MB/sec of sustained operation, not including WAL, OS, vacuum processing, etc., and it assumes the data is processed evenly over time, which it never is.  That will start to tax a standard HDD, since a single drive has to handle the OS, the WAL (roughly the same I/O as the database), and the database itself, which puts sustained usage above 22 MB/sec against a drive rated at a little over 40 MB/sec (and most drives do not perform at their rating).  Because the load is not even over time, you also need to allow for peaks and multiply this rate by some factor (impossible to predict without knowing your access pattern).  So the fact that your HDD is not keeping up already suggests there is more going on than you have described, and that a single hard drive is nowhere near adequate.
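
Putting that arithmetic in one place (a back-of-the-envelope sketch; the 2 kB tuples, 200-byte keys, and 50/50 insert/update split are the assumptions used above, not measurements):

    rows_per_sec = 12_000_000 / 3600          # ~3,333 rows/sec
    tuple_bytes  = 2_000                      # ~2 kB per tuple (assumed)
    key_bytes    = 200                        # total index key size (assumed)
    io_factor    = 1.5                        # half inserts (1x) + half updates (2x)

    heap_mb  = rows_per_sec * tuple_bytes * io_factor / 1e6   # ~10 MB/sec
    index_mb = rows_per_sec * key_bytes  * io_factor / 1e6    # ~1 MB/sec
    base_mb  = heap_mb + index_mb             # the ~10.9 MB/sec above, give or take rounding

    wal_mb = base_mb                          # WAL roughly doubles it
    print(f"base {base_mb:.1f} MB/s, with WAL ~{base_mb + wal_mb:.0f} MB/s")
    print(f"data volume ~{base_mb * 3600 / 1e3:.0f} GB/hour")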

Now let's think about architecture.

Best practice would say to put the OS on one drive, the WAL on another drive, and the database on a third.  The reason is that you can probably get 40+ MB/sec on each HDD; server-grade hardware with 15,000 RPM HDDs would perform better still.  If you only have one drive, then the OS, the WAL, and the database are competing for that one resource.  With three drives, depending on the I/O architecture (not a laptop), you can probably get three times the I/O throughput, or 40+ MB/sec on each HDD in our contrived example.
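
One quick way to confirm the separation actually happened (a sketch; the data directory path is hypothetical, and on Linux two paths report different st_dev values when they live on different block devices):

    import os

    data_dir = "/var/lib/postgresql/data"                          # hypothetical data directory
    wal_dir  = os.path.realpath(os.path.join(data_dir, "pg_wal"))  # often a symlink when relocated

    # Same st_dev means pg_wal shares a device with the heap and competes for I/O.
    if os.stat(data_dir).st_dev == os.stat(wal_dir).st_dev:
        print("WAL and data share one device; consider a separate drive for pg_wal")
    else:
        print("WAL is on its own device")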

The other major help is memory. If you can fit the whole database in memory, or at least the part that is used regularly, then slow I/O from an HDD is not so important for update operations.  But again, that depends on the access patterns: if you are updating the same tuples over and over, they can stay in memory most of the time, which reduces database HDD I/O.  You still have WAL I/O.  Let's say the hot data is about 39 GB/hour (10.9 MB/sec * 3600); holding most of that in memory would probably require 48 GB or 64 GB of main memory.  Of course there are many reasons why this could work with less or need more.  Additional memory might allow you to keep using HDDs, depending on access patterns.  Sometimes more memory is cheaper than more or faster disks.
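
Spelling out that memory estimate (a sketch; the headroom factors for the OS, shared_buffers, and per-connection memory are assumptions, not rules):

    sustained_mb_per_sec = 10.9                     # from the I/O estimate above
    hot_set_gb = sustained_mb_per_sec * 3600 / 1e3  # ~39 GB touched per hour

    # Leave headroom for the OS page cache, shared_buffers, and backend memory.
    for headroom in (1.25, 1.6):                    # assumed factors
        print(f"~{hot_set_gb * headroom:.0f} GB RAM at {headroom}x headroom")

Both factors land near the 48 GB to 64 GB range above.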

Now about SSDs.  SSDs should easily give you 6 to 10 times the throughput, provided the computer is designed for SSDs.  This means that, based on the back-of-the-envelope calculations here, you might get by with one SSD.  However, from a reliability standpoint I don't think anyone would recommend that.

The other issue you need to consider is how many CPU cores you should have.  This won't affect I/O throughput, but it will determine how many simultaneous connections can be processing at one time.  It might be that CPU starvation is limiting the data rate on your current hardware, so with more CPUs the sustained or peak data rates might go up, depending on how you arrived at the ’12,000,000 writes/updates per hour’.
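
As a rough illustration of that sizing (a sketch; the per-connection rate below is a pure assumption you would have to measure, e.g. with a harness like the one earlier in this message):

    import math

    target_rows_per_sec = 12_000_000 / 3600   # ~3,333 rows/sec
    per_conn_rate = 500                        # assumed updates/sec one backend can sustain

    conns = math.ceil(target_rows_per_sec / per_conn_rate)
    print(f"~{conns} connections busy at any moment")   # ~7 under these assumptions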

None of the ideas here considers reliability, which would determine whether the three drives are bare, mirrored, RAID, or JBODs.  Also note that any form of RAID can reduce throughput and introduce its own reliability problems if not correctly engineered.  Data loss is a completely different set of issues and can be handled in many ways.

RAID may provide reliability if your storage fails, but it may not provide reliability to the customer if the computer itself fails, the power supply fails, or the network fails.  If customer-facing reliability is critical, then I would expect you to have duplicate hardware with failover.  In that case, for example, it is not completely crazy to run the OS drive as a single bare drive, no RAID: if it fails, the system just switches to the failover hardware while you fix it.  If you cannot afford that kind of redundancy, then you might have a single system with redundant power supplies, mirrored drives, etc.  It all depends on your risk tolerance, the amount of money you have to spend, and the technical knowledge you have to manage the system.  That last part is very important when a failure occurs: you don't want to be learning while you are in a failure or recovery situation, and the more complex your high-availability setup is, the more knowledge you need.

If you are unsure about any of this, which is pretty basic system administration and hardware design territory, then you should get someone with experience to help you.

Neil





