Home > mailing lists

[GENERAL] Streaming replication bandwith per table - Mailing list pgsql-general

From	Maeldron T.
Subject	[GENERAL] Streaming replication bandwith per table
Date	June 20, 2017 03:50:33
Msg-id	CAKatfSkgjn2-ZWxHNQJ6-6QC0aewxrMU5qYspfX8NThaV-oRGw@mail.gmail.com Whole thread
Responses	Re: [GENERAL] Streaming replication bandwith per table
List	pgsql-general

Tree view

Hello,

tl;dr

Streaming replication generates too much traffic to set it up between different regions for financial reasons. The streaming replication would cost more than every other hosting expense altogether (including every the traffic, even though it’s web and huge amount of emails).

Is there a way to see in the log how much bandwidth is used per table?

Details:

My only idea is to improve the design.

I believe the source of the issue is tables involved in many to many relations with frequent updates.

The structure varies. Sometimes it’s:

table_a_id_1 | table_a_id_2

other times:

table_b_id_1 | table_b_id_2 | float value

and:

integer value | table_c_id

It’s simple, but the performance is a key factor.

Depending on the relation (whether it had an additional value or not), the expected usage, and my mood, I implemented them using either classical pairs or arrays with gin indices:

table_a_id_1 | [table_a_id_2, table_a_id_7, table_a_id_9,...]

and:

integer value | [table_a_id_1, table_a_id_5, table_a_id_6, ...]

There are millions of records in the tables with arrays. The "normal" pairs have tens of million and above. One table could have about 40 billion values in theory (it doesn’t but it’s growing).

I can guess which tables are problematic and what to change, but:

* It’s difficult to simulate real-life usage

* The usage patterns are different from table to table

* If I’m wrong, I waste time and resources (and downtime) to make it even worse

I know the updates on the arrays cost much more (it’s also a performance issue) but the table takes magnitudes less space this way. I even considered jsonb when there are also float values for each pair.

What to change in the design depends on the real-life use. How can I measure the bandwidth usage per table in streaming replication? I don’t see a way to emulate it with realistic results.

PS: except the updates on the arrays, the performance itself is satisfying for each table. It’s only the bandwidth usage that would hurt

pgsql-general by date:

From: Peter Geoghegan
Date: 20 June 2017, 03:20:03
Subject: Re: [GENERAL] effective_io_concurrency increasing

From: Peter Eisentraut
Date: 20 June 2017, 04:06:05
Subject: Re: [GENERAL] Streaming replication bandwith per table

[GENERAL] Streaming replication bandwith per table - Mailing list pgsql-general

Previous

Next