Thread: Architecture setup for multiple dwh databases: multi master, master slave, ...

Architecture setup for multiple dwh databases: multi master, master slave, ...

From

Bert

Date:

02 July 2015, 14:22:25

Dear all,

We are preparing for a complete overhaul of our PostgreSQL cluster.

We are running multiple dwh database on the system. During the day our curent setup is perfect, the read only queries are nicely spread over the 2 servers.

Our main issue with our current setup is during the night: A lot of data movement / transformation and vacuum systems are running on the datbases. We see that the storage systems on the master are much more a constraint than on the slave. (io wait is much higher on the master server). Also are backups are taking quite long, although we can still live with this. At the moment our biggest database is around 550GB data, and in total around 900Gb data. We have a buffer cache hit ratio on both servers of around 95%.

At the moment we are running on the following setup:

2 servers with the following setup:

* system: rhel6.6

* pg version: 9.2.13

* hardware:
            cpu: 2x Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz;
            ram: 128GB ram
            disk: 12x 7500RPM disk (raid 10);

network: 4Gb/s (bonded)

* disk: 500G xlog partition; 1.5TB data partition. both ext4

* streaming replication (master slave)

* haproxy to balance the ro-queries over the 2 servers

In our new setup we will have also new hardware:

cpu: 2x Intel(R) Xeon(R) CPU E5-2643 v3 6C @ 3.4GHz 20MB Cache 2133MHz 135W

ram: 128GB ram (8x 16GB TruDDR4 Memory (2Rx4, 1.2V) PC4-17000 CL15 2133MHz LP RDIMM)

disks: 11x 1.2TB 10K disks + 1x 400GB cache (is a raid controller feature)

raid controller: ServeRAID M5100 Series with 1GB flash cache; ssd caching enabler
network: 10GBs

Since we have a high buffer hit ratio we didn't add extra ram. But the rest of the setup is build for better performance. Due to the price difference between 10K disks and professional SSD disks we chose not to install SSD disks, except one for caching.

But we are still not sure on how to install those servers; except that we will run PostgreSQL 9.4.

Extra features we want to have is 'online upgrades'. Which is possible with the logicial replication feature.

But we are not sure the that logical replication is the best way to do replication for us, since we have commits of data of sometimes more than 1GB. My understanding of the various replication options in PGSQL is that streaming replication will be more performant for those cases?

Since are running multiple 'ETL's' at the same time; we were thinking about installing 2 PGSQL instances on the 2 servers, and run one instance as master, and one as slave on the both servers. But I'm not sure if that is a super idea, because the 2 instances might want to fight for the available memory?

Anyone who can shed a light on this?

This is the architecture we would like to keep for the next 5 years.

wkr,

Bert

Bert Desmet
0477/305361