Re: PostGreSQL Replication and question on maintenance - Mailing list pgsql-general

From Soto Cuevas Manuel Alejandro
Subject Re: PostGreSQL Replication and question on maintenance
Date
Msg-id 354C83E1-421C-4C49-9505-01E821471C79@entel.cl
Whole thread Raw
In response to Re: PostGreSQL Replication and question on maintenance  (github kran <githubkran@gmail.com>)
Responses Re: PostGreSQL Replication and question on maintenance
Re: PostGreSQL Replication and question on maintenance
List pgsql-general
Taking advantage of the instance, what software do you recommend to do reverse engineering of postgresql?

Thank you 

Manuel

El 16-11-2019, a la(s) 10:36, github kran <githubkran@gmail.com> escribió:



ANEXO ENTEL Este es un correo externo: Verifique remitente, No haga clic en vínculos. Si recibes un correo malicioso avísanos a sioc@entel.cl . 

Any reply on this please ?.

On Fri, Nov 15, 2019 at 9:10 AM github kran <githubkran@gmail.com> wrote:


On Thu, Nov 14, 2019 at 11:42 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
  these numbers looks crazy high - how much memory has your server - more than 1TB? 

The cluster got 244 GB of RAM and storage capacity it has is 64 TB. 
 

pá 15. 11. 2019 v 6:26 odesílatel github kran <githubkran@gmail.com> napsal:

Hello postGreSQL Community ,

 

Hope everyone is doing great !!.


Background

We use PostgreSQL Version 10.6 version and heavily use PostgreSQL for our day to day activities to write and read data. We have 2 clusters running PostgreSQL engine , one cluster

keeps data up to 60 days and another cluster retains data beyond 1 year. The data is partitioned close to a week( ~evry 5 days a partition) and we have around 5 partitions per month per each table and we have 2 tables primarily so that will be 10 tables a week. So in the cluster-1 we have around  20 partitions and in cluster-2 we have around 160 partitions ( data from 2018). We also want to keep the data for up to 2 years in the cluster-2 to serve the data needs of the customer and so far we reached upto 1 year of maintaining this data.

 

Current activity

We have a custom weekly migration DB script job that moves data from 1 cluster to another cluster what it does is the below things.

1) COPY command to copy the data from cluster-1 and split that data into binary files

2) Writing the binary data into the cluster-2 table

3) Creating indexes after the data is copied.

 

Problem what we have right now.

When the migration activity runs(weekly) from past 2 times , we saw the cluster read replica instance has restarted as it fallen behind the master(writer instance). Everything

after that worked seamlessly but we want to avoid the replica getting restarted. To avoid from restart we started doing smaller binary files and copy those files to the cluster-2

instead of writing 1 big file of 450 million records. We were successful in the recent migration as the reader instance didn’t restart after we split 1 big file into multiple files to copy the data over but did restart after the indexes are created on the new table as it could be write intensive.

 

DB parameters set on migration job

work_mem set to 8 GB  and maintenace_work_mem=32 GB.




Indexes per table = 3

total indexes for 2 tables = 5

 

DB size

Cluster-2 = 8.6 TB

Cluster-1 = 3.6 TB

Peak Table relational rows = 400 - 480 million rows

Average table relational rows = 300 - 350 million rows.

Per table size = 90 -95 GB , per table index size is about 45 GB

 

Questions

1) Can we decrease the maintenace_work_mem to 16 GB and will it slow down the writes to the cluster , with that the reader instance can sync the data slowly ?.

2) Based on the above use case what are your recommendations to keep the data longer up to 2 years ?

3) What other recommendations you recommend ?.

 

 

Appreciate your replies.


THanks
githubkran



Este correo es dirigido solo a las personas que aparecen como destinatarios y
puede contener información confidencial o privilegiada. Si usted recibió este correo
por error, por favor notificar inmediatamente al emisor y elimine el correo original.
Cualquier otro uso de este correo esta prohibido.

*********************************************************************************************

This message is for the designated recipient only and may contain privileged or
confidential information. If you have received it in error, please notify the sender
immediately and delete the original. Any other use of the email is prohibited.

pgsql-general by date:

Previous
From: Andrei Zhidenkov
Date:
Subject: Re: Partitioning large table (140GB)
Next
From: Pavel Stehule
Date:
Subject: Re: PostGreSQL Replication and question on maintenance