Thread: Conundrum with scaling out of bottleneck with hot standby, PgPool-II, etc.

Conundrum with scaling out of bottleneck with hot standby, PgPool-II, etc.

From
Gunther Schadow
Date:
Hi all, I have a general question on scaling PostgreSQL for unlimited 
throughput, based on some experience.

TL;DR: My question is this: given that the workload on any 
secondary/standby database server is almost the same as that of the 
master database server, is there any point in bothering with PgPool-II 
to route query activity over the hot standbys, or is it better to just 
increase the power of the master database system instead? Is there any 
trick that really gets to massive scalability with the database?

Background: Let's say we're building a massive NSA citizen-surveillance 
system where we process every call and every email of every person on 
earth to find dissidents, and we link this to their financial 
transactions and travel logs (airline bookings, Uber rides) to find 
some bogus charges that FBI agents could use to put any dissident in 
prison as soon as possible. So we need a system that scales infinitely.

OK, I'm kidding (but do keep thinking about the adverse effects of our 
IT work); the point is a massive data processing system. We know that 
parallelizing all workflows is the key to keeping up. Processing the 
speech, emails, messages, and other transactions is mostly doable by 
throwing hardware at it to parallelize the workflows.

There's always going to be one common bottleneck: the database.

In my experience of parallelizing workflow processes, I can hammer my 
PostgreSQL database, and all I can do to keep up is broaden the IO 
pipeline and watch the balance of CPU and IO, keeping both near 100%: 
as I add more CPU bandwidth I add more IO bandwidth, and so on, to keep 
the gigabytes flowing and the CPUs churning. But it's much harder with 
the database than with the message transformations (natural language 
understanding, data extraction, image processing, etc.).

I have set up a hot standby database, which I thought would simply keep 
pace with the master and which I could use to run queries while the 
insert, update, and delete operations all go against the master db. 
What I discovered is that the stress on the hot standby system is 
significant just from keeping up! Replaying those logs takes 
significant resources, so much so that if I use less powerful hardware 
for the secondary, it tends to fall behind and ultimately bails out 
because it cannot process the log stream.
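
For reference, a minimal way to quantify that falling-behind (assuming 
streaming replication and PostgreSQL 10 or later for these function and 
column names) is to compare WAL positions:

    -- On the primary: per-standby lag as seen by the walsender.
    SELECT application_name,
           state,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
           replay_lag
    FROM   pg_stat_replication;

    -- On the standby: how stale the data visible to read queries is.
    SELECT now() - pg_last_xact_replay_timestamp() AS approximate_staleness;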

So, if my secondary is already so busy just keeping up to date with the 
master db, and I cannot use significantly smaller hardware, how can I 
put a lot of extra query load on these secondary systems? My argument 
is GENERAL, not "show me your schema", etc.; I am talking about 
principles. I read somewhere that you need to dimension these 
secondaries / standby servers to about the same capacity as the master 
server. That means the standby servers are about as busy as the master 
server, and that means that as you scale this up, the scaling is 
actually quite inefficient: I have to copy all that data, while the 
receiving end is as busy receiving it as the master server is 
processing the actual transactions.
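
For concreteness, the standby-side knobs that govern how WAL replay and 
read queries compete for the same box look roughly like this (the 
values are placeholders to illustrate the trade-off, not 
recommendations); none of it removes the replay work itself:

    # postgresql.conf on the standby
    hot_standby = on                    # allow read-only queries during recovery
    max_standby_streaming_delay = 30s   # how long replay may wait on conflicting
                                        # queries before cancelling them; raising
                                        # it favors queries at the cost of lag
    hot_standby_feedback = on           # tell the primary which rows standby
                                        # queries still need, so vacuum keeps
                                        # them; costs some bloat on the primary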

Doesn't that mean it's better to just scale up the master system as 
much as possible, and treat the standby servers only as a means of 
fault tolerance, never as a way to actually improve performance? In 
other words, there is no real benefit in running read/query-only 
workloads on the secondaries and routing updates to the primary, 
because the background replication workload is repeated on every 
standby server and is not significantly less than the workload on the 
master server.

Put differently: isn't there a way to replicate that is more efficient? 
Or are there hard limits? Again, I'm talking principles.

For example, if I just make exact disk copies of the data tables at the 
SCSI bus level (like RAID-1), replicating the block writes while I 
distribute the block reads over the RAID-1 spindles, then again most of 
my disks are still occupied with the write traffic, because they all 
must write everything while only the read activity can be distributed. 
I suppose I could use some tricks to avoid seek time by scheduling 
reads to those disks that are currently writing to the same cylinder 
(I know that's moot with SSDs, but there are locality issues even for 
DDR RAM access, so the principle still holds). I suppose I could tweak 
the mirrors to track the master writes with a slight delay, so as to 
allow some potential to reorganize blocks and write contiguous blocks, 
or blocks that go to the same track. But this type of write scheduling 
is what OSs already do out of a cache.

So, my question is: isn't there any trick that really gets to massive 
scalability with the database? Should I even bother with PgPool-II to 
route query activity over hot standbys? I can buy two boxes with n CPUs 
and disk volume each to run as master and slave, or I can spend the 
same money on a single system with twice the CPU cores and a twice as 
wide IO path and disks. Why would I do anything other than just beef up 
that master db server?
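
For concreteness, the Pgpool-II side of that routing would look roughly 
like this; a sketch only, since the hostnames and weights are invented 
and Pgpool-II 4.2 renamed the clustering-mode settings:

    # pgpool.conf (sketch)
    backend_clustering_mode = 'streaming_replication'  # 4.2+; older releases use
                                                        # master_slave_mode/sub_mode
    load_balance_mode = on             # spread SELECTs across backends

    backend_hostname0 = 'primary.example.com'
    backend_port0     = 5432
    backend_weight0   = 0              # keep load-balanced reads off the primary

    backend_hostname1 = 'standby1.example.com'
    backend_port1     = 5432
    backend_weight1   = 1              # read-only traffic lands here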

regards,
-Gunther





Re: Conundrum with scaling out of bottleneck with hot standby, PgPool-II, etc.

From
Marco Colli
Date:
 Hello,

I have asked myself the same question over the past years.

I think that the question boils down to: 
How can I achieve unlimited database scalability? 
Is it possible to have linear scalability (i.e. throughput increases proportionally to the number of nodes)?

The answer is "sharding". It can be a custom solution or a database that supports it automatically. In this way you can actually split the data across multiple nodes, and the client contacts only the servers relevant for that data (based on a shard key). See also https://kubernetes-rails.com/#conclusion about databases. Take a look at how Cassandra, MongoDB, CouchDB and Redis Cluster work, for example: however, there are huge limitations / drawbacks that come along with their unlimited-scalability strategies.
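
To make "based on a shard key" concrete, here is a minimal sketch of a hand-rolled version in stock PostgreSQL, using declarative hash partitioning plus postgres_fdw; the table, server and credential names are invented, the remote node must already have a matching calls_p1 table, and dedicated sharding systems automate far more than this:

    -- On the coordinator: a hash-partitioned parent table (PostgreSQL 11+).
    CREATE TABLE calls (
        caller_id  bigint      NOT NULL,
        called_at  timestamptz NOT NULL,
        transcript text
    ) PARTITION BY HASH (caller_id);

    -- One partition stays local ...
    CREATE TABLE calls_p0 PARTITION OF calls
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);

    -- ... the other lives on a second node, reached through postgres_fdw.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.example.com', dbname 'calls');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
        OPTIONS (user 'app', password 'secret');
    CREATE FOREIGN TABLE calls_p1 PARTITION OF calls
        FOR VALUES WITH (MODULUS 2, REMAINDER 1)
        SERVER shard1
        OPTIONS (table_name 'calls_p1');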

As for hot standbys, those are only useful if you have a relatively small number of writes compared to reads. With those slave nodes you only scale the *read* throughput.

Hope it helps,
Marco Colli







Re: Conundrum with scaling out of bottleneck with hot standby, PgPool-II, etc.

From
Bruce Momjian
Date:
On Wed, Dec 23, 2020 at 07:34:01PM +0100, Marco Colli wrote:
>  Hello,
> 
> I asked the same question to myself in the past years.
> 
> I think that the question boils down to: 
> How can I achieve unlimited database scalability? 
> Is it possible to have linear scalability (i.e. throughput increases
> proportionally to the number of nodes)?
> 
> The answer is "sharding". It can be a custom solution or a database that
> supports it automatically. In this way you can actually split data across
> multiple nodes and the client contacts only the relevant servers for that data
> (based on a shard key). See also https://kubernetes-rails.com/#conclusion about
> database. Take a look at how Cassandra, MongoDB, CouchDB and Redis Cluster work
> for example: however there are huge limitations / drawbacks that come along
> with their unlimited-scalability strategies.
> 
> For hot standbys, those are only useful if you have a relatively small number
> of writes compared to reads. With that slave nodes you only scale the *read*
> throughput.

Agreed.  There are really two parameters:

1.  percentage of reads vs writes
2.  number of standbys

If you have a high value for #1, it makes sense to use pgpool, but
having only one standby doesn't buy you much;  add three, and you will
see an impact.  Second, if writes are high, only scaling up the primary
or adding sharding will help you.  It is kind of an odd calculus, but it
makes sense.
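
Back-of-envelope only, with a made-up number: if WAL replay ate, say, 
40% of a standby-sized box, each standby would add about 0.6 of a box 
of extra read capacity, which is why one standby is underwhelming and 
three are noticeable.

    -- Illustrative arithmetic only; 0.40 is an assumed replay overhead,
    -- not a measurement.
    SELECT n AS standbys,
           round(n * (1 - 0.40), 2) AS extra_read_capacity_in_boxes
    FROM generate_series(1, 4) AS n;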

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee