Re: [SPAM] Re: Architectural question - Mailing list pgsql-performance

From: Moreno Andreo
Subject: Re: [SPAM] Re: Architectural question
Msg-id: 56CB1DE7.8090204@evolu-s.it
In response to: Re: Architectural question (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List: pgsql-performance
On 18/02/2016 21:33, Jim Nasby wrote:

Before we go on, I should say that I have been working with PostgreSQL
for about 10 years now, but while "leave everything as it is" worked in
the past, only in the last 15 months have I begun to research and study
how to improve my server's performance, so I'm still quite a novice as a
DBA (but a novice who reads a lot of documentation when needed :-) ).
So if some of my questions sound "strange" or "noobish" to you, that's
the reason.

> On 2/11/16 12:06 PM, Moreno Andreo wrote:
>> Now, the actual question, is:
>> Having a VM that can be upgraded with a click on a panel and a reboot,
>> and that the server fault is only related to a OS failure, should I keep
>> a single-server solution (but I fear that I/O throughput will become
>> even more inadequate) or is it convenient to migrate in a 2-server
>> system? And, in case of 2-server configuration, what would you
>> recommend?
>
> Much of that depends on your disaster recovery strategy.
I'm planning a cron job that backs up the data (data only) overnight
(I was thinking of something like pg_dumpall) and takes a snapshot of
the whole server over the weekend (if I'm not wrong, VMware allows live
snapshots), so if something bad happens, I'll recover the snapshot from
the last save and restore all databases from the latest backup.
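Sketched as a crontab fragment (the schedule, path and postgres user are assumptions, not from the thread; the weekly VM snapshot is handled outside PostgreSQL):

```
# /etc/crontab fragment (sketch): nightly logical dump at 02:30.
# pg_dumpall captures all 350 databases plus roles and tablespace
# definitions; % must be escaped as \% inside crontab entries.
30 2 * * * postgres pg_dumpall | gzip > /var/backups/pg/all-$(date +\%Y\%m\%d).sql.gz
```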

>
>> Scenario 1:
>> Given 350 databases, I split them in 2, 175 on server 1 and 175 on
>> server 2, having pgBouncer to resolve the connections and each server
>> has its own workload
>>
>> Scenario 2:
>> Server 1 -> Master, Server 2 -> Slave (Replicated with Slony or...?),
>> Server 1 for writes, Server 2 for reads
>
> Personally I'd do kind of a hybrid at this point.
>
> First, I'd split the masters across both servers, with a way to easily
> fail over if one of the servers dies.
>
> Next, I'd get streaming replication setup so that the half with
> masters on A have replicas on B and vice-versa. That way you can
> easily recover from one server or the other failing.
>
> Depending on your needs, you could use synchronous replication as
> part of that setup. You can even do that at a per-transaction level,
> so maybe you use sync rep most of the time, and just turn it off when
> inserting or updating BLOBS.
This sounds good: when everything is OK we have I/O split across the
two servers, and a small delay in synchronizing blobs should not be a
big deal, even if something bad happens (because of the XLOG), right?
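The per-transaction toggle Jim mentions can be sketched like this (assuming synchronous streaming replication is already configured via synchronous_standby_names):

```sql
-- Sketch: opt a single transaction out of synchronous replication,
-- e.g. when inserting large BLOBs; other transactions still wait
-- for the standby to confirm.
BEGIN;
SET LOCAL synchronous_commit = local;  -- reverts at COMMIT/ROLLBACK
-- ... INSERT/UPDATE the BLOB data here ...
COMMIT;
```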

>
>> Last thing: should blobs (or the whole database directory itself) go in
>> a different partition, to optimize performance, or in VM environment
>> this is not a concern anymore?
>
> First: IMO concerns about blobs in the database are almost always
> overblown.
Everywhere I've worked, people say "BLOBs are slow", so I treated this
as another point to analyze while designing the server architecture. If
you say not to worry about it, then I won't.

> 30GB of blobs on modern hardware really isn't a big deal, and there's
> a *lot* to be said for not having to write the extra code to manage
> all that by hand.
What do you mean? Extra code?

>
> When it comes to your disk layout, the first things I'd look at would be:
>
> - Move the temporary statistics directory to a RAM disk
> - Move pg_xlog to its own partition
So I need another vDisk, not that big, for pg_xlog?
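A sketch of those two moves (the mount point, size and paths are assumptions; on the PostgreSQL 9.x releases current at the time, the relevant setting is stats_temp_directory):

```
# /etc/fstab: a small tmpfs (RAM disk) for the statistics temp files
tmpfs  /var/run/pg_stats_tmp  tmpfs  size=64M,uid=postgres,gid=postgres  0 0

# postgresql.conf: point the stats collector at the tmpfs
stats_temp_directory = '/var/run/pg_stats_tmp'

# pg_xlog on its own vDisk, via symlink (with the server stopped and
# the existing directory moved first):
#   mv $PGDATA/pg_xlog /mnt/xlogdisk/pg_xlog
#   ln -s /mnt/xlogdisk/pg_xlog $PGDATA/pg_xlog
```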

> Those don't always help, but frequently they do. And when they do, it
> usually makes a big difference.
>
> Beyond that, there might be some advantage to putting blobs on their
> own tablespace. Hard to say without trying it.
I'm thinking about it, because while most of the blobs are < 1 MB, some
reach 20, 50 or even 100 megabytes, and I'm quite concerned about the
overall performance of the whole system when these have to be sent to
whoever is requesting them (even on modern hardware, extracting 100
megs is not that fast...).
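If it turns out to be worth trying, a dedicated tablespace could be sketched as follows (the path and table name are hypothetical; note this applies to blobs stored in bytea columns, since large objects created via the lo_* interface live in the pg_largeobject system catalog, which cannot simply be relocated):

```sql
-- Sketch: put the blob-bearing table on its own disk
-- (path and table name are assumptions)
CREATE TABLESPACE blobspace LOCATION '/mnt/blobdisk/tblspc';
ALTER TABLE documents SET TABLESPACE blobspace;
```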

So my ideas are clearer now, but the first step is to decide whether a
single server is enough (my budget would be happier, because the
machines at GCP seem very good, but quite expensive...) or whether two
servers with pgBouncer would be better, and where to put pgBouncer... :-)

Thanks
Moreno



