Report reorder buffer size - Mailing list pgsql-hackers
| From | Ashutosh Bapat |
|---|---|
| Subject | Report reorder buffer size |
| Date | |
| Msg-id | CAExHW5sX4a4GEMo99wcPALHPG9XM+-3ovAyPJ=ZnZAA=ux_EVw@mail.gmail.com |
| Responses | Re: Report reorder buffer size |
| List | pgsql-hackers |
Hi All,

The question of how to tune logical_decoding_work_mem has come up a few times in the context of customer cases. If it is too low, changes are spilled to disk, slowing down replication and consuming disk space. In the case of streamed transactions, it won't consume space on the primary, but it will consume resources (transactional resources/disk space) on the replica. If the decoding memory is too high, it may cause memory pressure, especially when there are many WAL senders and large amounts of changes are frequently queued in the reorder buffers.

To tune logical_decoding_work_mem optimally, we need a measure of the size of the changes belonging to the transactions queued in the reorder buffer, whether those changes are spilled to disk, held in memory, or sent downstream as part of an unfinished streamed transaction. Currently there is no way to determine that. pg_stat_replication_slots reports cumulative statistics about spilled or streamed transactions, which helps to know whether the logical decoding work memory is sufficient to hold all the changes. But it does not provide a way to estimate an optimal value, since it doesn't capture the state of the reorder buffer at a given point in time.

With the attached WIP patch, we can report, at a given point in time, the total size of the reorder buffer, i.e. the sum of the sizes of all the changes belonging to the transactions active in the reorder buffer, whether those changes are in memory, on disk, or sent downstream as part of an unfinished streamed transaction. By sampling this value at regular intervals, one can observe the reorder buffer trendline, as shown in the attached plots. In the plots, the Y axis is the total size of the reorder buffer and the X axis is the time of day. rbsize_trendlines_pgbench.png shows the trendline when just regular pgbench is run at 3K TPS, and rbsize_trendlines_large_transaction_spikes.png shows the trendline when there are long and large transactions in addition to pgbench.

Using these trendlines, a user may decide to keep logical_decoding_work_mem at a minimum and thus let all the large transactions spill to disk or be streamed, or they may decide to set it to 100MB, 200MB, or 500MB, reducing or eliminating the disk spill or streaming. This makes it easy to quantify the trade-offs in tuning logical_decoding_work_mem. Tracking the total size of the reorder buffer also helps to quantify the load on a WAL sender. It is an additional metric that can be useful when debugging WAL buildup, slow replication, etc. For example, the plot clearly and directly shows when the WAL sender started processing the large transaction, when it finished, and so on.
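To illustrate how the metric might be consumed, here is a rough sketch. The existing cumulative counters in pg_stat_replication_slots are shown first for contrast; the point-in-time column name (reorder_buffer_size) and its placement in pg_stat_replication are only placeholders, since both are open questions for this WIP patch (see below).

    -- Existing cumulative counters: they tell us that spilling/streaming
    -- happened, but not how large the reorder buffer is right now.
    SELECT slot_name, spill_txns, spill_bytes, stream_txns, stream_bytes
    FROM pg_stat_replication_slots;

    -- Hypothetical point-in-time sample with the WIP patch applied;
    -- repeat it (e.g. with \watch 10 in psql) to build the trendline.
    SELECT now() AS sample_time, application_name, reorder_buffer_size
    FROM pg_stat_replication;

Based on the peaks in such a trendline, logical_decoding_work_mem can then either be left small (letting large transactions spill or stream) or raised to a value that covers the observed peaks.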
The patch is not complete. There are several things still to be worked on:

1. Whether the metric should be reported in pg_stat_replication or pg_stat_replication_slots. Given that it's a point-in-time metric, it fits pg_stat_replication better, but that view does not report logical decoding activity happening in backends other than the WAL senders. That is true of the other metrics in that view as well. pg_stat_replication_slots, however, covers all users of logical decoding. For now I have used pg_stat_replication, but it can be changed based on the discussion here.

2. Is the size of the reorder buffer enough, or do we also want to track the size of changes on disk and the size of changes sent downstream as part of unfinished streamed transactions separately? Also the number of transactions being tracked by the reorder buffer? If we are going to report that many statistics about the contents of the reorder buffer, is it better to have a separate view, pg_stat_reorder_buffer, for them?

3. The patch doesn't cover all the places where the reorder buffer's total size should be maintained. I need to find all such places and add the required maths.

But before I work on those, I would like to know others' opinions on the usefulness and acceptability of the idea.

FWIW, attached is a TAP test to test the changes.

--
Best Wishes,
Ashutosh Bapat