Re: Add memory_limit_hits to pg_stat_replication_slots - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Add memory_limit_hits to pg_stat_replication_slots |
Date | |
Msg-id | aONnKBgNIUJdINSP@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
In response to | Re: Add memory_limit_hits to pg_stat_replication_slots (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Add memory_limit_hits to pg_stat_replication_slots
|
List | pgsql-hackers |
Hi, On Mon, Oct 06, 2025 at 10:50:52AM +0530, Ashutosh Bapat wrote: > On Fri, Oct 3, 2025 at 11:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Oct 3, 2025 at 9:26 AM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > On Fri, Oct 3, 2025 at 6:45 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > On Fri, Oct 03, 2025 at 05:19:42PM +0530, Ashutosh Bapat wrote: > > > > > + bool memory_limit_reached = (rb->size >= logical_decoding_work_mem * > > > > > (Size) 1024); > > > > > + > > > > > + if (memory_limit_reached) > > > > > + rb->memExceededCount += 1; > > > > > > > > Thanks for looking at it! > > > > > > > > > If the memory limit is hit but no transaction was serialized, the > > > > > stats won't be updated since UpdateDecodingStats() won't be called. We > > > > > need to call UpdateDecodingStats() in ReorderBufferCheckMemoryLimit() > > > > > if no transaction was streamed or spilled. > > > > > > > > I did some testing and the stats are reported because UpdateDecodingStats() is > > > > also called in DecodeCommit(), DecodeAbort() and DecodePrepare() (in addition > > > > to ReorderBufferSerializeTXN() and ReorderBufferStreamTXN()). That's also why > > > > ,for example, total_txns is reported even if no transaction was streamed or > > > > spilled. > > > > > > In a very pathological case, where all transactions happen to be > > > aborted while decoding and yet memory limit is hit many times, nothing > > > will be reported till first committed transaction after it is decoded. > > > Which may never happen. I didn't find a call stack where by > > > UpdateDecodingStats could be reached from > > > ReorderBufferCheckAndTruncateAbortedTXN(). > > > > The more we report the status frequently, the less chances we lose the > > statistics in case of logical decoding being interrupted but the more > > overheads we have to update the statistics. I personally prefer not to > > call UpdateDecodingStats() frequently since pgstat_report_replslot() > > always flush the statistics. If the transaction is serialized or > > streamed, we can update the memExceededCount together with other > > statistics such as streamBytes and spillBytes. But if we can free > > enough memory only by truncating already-aborted transactions, we need > > to rely on the next committed/aborted/prepared transaction to update > > the statistics. Indeed, there is cases when committed/aborted/prepared would not be called right after ReorderBufferCheckAndTruncateAbortedTXN(). > So how about calling UpdateDecodingStats() only in > > case where we only truncate aborted transactions and the memory usage > > gets lower than the limit? > > Yes, that's what my intention was. I also think it makes sense. > > I've attached the patch that implements this idea with a small > > refactoring. Thanks! > It also has the change to the regression test results > > we've discussed. -SELECT slot_name, spill_txns, spill_count, mem_exceeded_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase'; +SELECT slot_name, spill_txns, spill_count, mem_exceeded_count > 0 as mem_exceeded_count FROM pg_stat_replication_slots WHEREslot_name = 'regression_slot_stats4_twophase'; slot_name | spill_txns | spill_count | mem_exceeded_count ---------------------------------+------------+-------------+-------------------- - regression_slot_stats4_twophase | 0 | 0 | 1 + regression_slot_stats4_twophase | 0 | 0 | t Could we also imagine that there are other activities enough to reach the memory limit and transactions are not aborted, meaning spill_txns and/or spill_count are > 0? In that case we may want to get rid of this test instead (as checking spill_txns >=0 and spill_count >=0 would not really reflect the intend of this test). > The change looks good to me. > > Given Andres's comment, in a nearby thread, about being cautious about > adding useless statistics, I think this one needs a bit more > discussion. In the proposal email Bertant wrote > > Please find attached a patch to $SUBJECT, to report the number of times the > > logical_decoding_work_mem has been reached. > > > > With such a counter one could get a ratio like total_txns/memory_limit_hits. > > > > That could help to see if reaching logical_decoding_work_mem is rare or > > frequent enough. If frequent, then maybe there is a need to adjust > > logical_decoding_work_mem. > > > > I agree with the goal that we need a metric to decide whether > exceeding logical decoding work mem is frequent or not. > > > Based on my simple example above, one could say that it might be possible to get > > the same with: > > > > (spill_count - spill_txns) + (stream_count - stream_txns) > > > > but that doesn't appear to be the case with a more complicated example (277 vs 247): > > > > slot_name | spill_txns | spill_count | total_txns | stream_txns | stream_count | memory_limit_hits | (spc-spct)+(strc-strt) > > --------------+------------+-------------+------------+-------------+--------------+-------------------+------------------------ > > logical_slot | 405 | 552 | 19 | 5 | 105 | 277 | 247 > > (1 row) > > Is there another case where this difference is going to lead to a wrong conclusion? Yeah, for example when the ratio aborted/committed is high, we could get things like: slot_name | spill_txns | spill_count | stream_txns | stream_count | total_txns | mem_exceeded_count | (spc-spct)+(strc-strt) --------------+------------+-------------+-------------+--------------+------------+--------------------+------------------------ logical_slot | 1 | 2 | 0 | 0 | 192 | 244 | 1 Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: