Thread: RFC: Detailed reorder buffer stats dumps
Hi all
I'm thinking of piggy-backing on the approach used in the "Get memory contexts of an arbitrary backend process" patch in order to provide access to detailed reorder buffer content statistics from walsenders on request.
Right now the reorder buffer is mostly a black-box. I mostly rely on gdb or on dynamic probes (perf, systemtap) to see what it's doing. I intend a patch soon to add a couple of fields to struct WalSnd to report some very coarse reorder buffer stats - at least oldest buffered xid, number of buffered txns, total bytes of buffered txns in memory, total bytes of buffered txns spilled to disk.
But sometimes what I really want is details on the txns that're in the reorder buffer, and that's not feasible to export via always-enabled reporting like struct WalSnd. So I'm thinking that the same approach used for the memory context stats patch might work well for asking the walsender for a detailed dump of reorder buffer contents. Something like a per-buffered-txn table of txn topxid, start-lsn, most recent change lsn, number of changes, number of subxids, number of invalidations, number of catalog changes, buffer size in memory, buffer size spilled to disk.
Anyone drastically opposed to the idea?
(I know I have to finish up with the LWLock tracepoint patchset first, this is a RFC at this stage).
--
On Wed, May 5, 2021 at 4:03 PM Craig Ringer <craig@2ndquadrant.com> wrote: > > Hi all > > I'm thinking of piggy-backing on the approach used in the "Get memory contexts of an arbitrary backend process" patch inorder to provide access to detailed reorder buffer content statistics from walsenders on request. > > Right now the reorder buffer is mostly a black-box. I mostly rely on gdb or on dynamic probes (perf, systemtap) to seewhat it's doing. I intend a patch soon to add a couple of fields to struct WalSnd to report some very coarse reorder bufferstats - at least oldest buffered xid, number of buffered txns, total bytes of buffered txns in memory, total bytesof buffered txns spilled to disk. > > But sometimes what I really want is details on the txns that're in the reorder buffer, and that's not feasible to exportvia always-enabled reporting like struct WalSnd. So I'm thinking that the same approach used for the memory contextstats patch might work well for asking the walsender for a detailed dump of reorder buffer contents. Something likea per-buffered-txn table of txn topxid, start-lsn, most recent change lsn, number of changes, number of subxids, numberof invalidations, number of catalog changes, buffer size in memory, buffer size spilled to disk. > +1. Will be really useful to troubleshoot what's going on in the ReorderBuffer. If we put that in WalSnd it will not be useful for the connections which are using normal backends to get logical changes through built-in functions. -- Best Wishes, Ashutosh Bapat
Hi, On 2021-05-05 18:33:27 +0800, Craig Ringer wrote: > I'm thinking of piggy-backing on the approach used in the "Get memory > contexts of an arbitrary backend process" patch in order to provide access > to detailed reorder buffer content statistics from walsenders on request. > > Right now the reorder buffer is mostly a black-box. I mostly rely on gdb or > on dynamic probes (perf, systemtap) to see what it's doing. I intend a > patch soon to add a couple of fields to struct WalSnd to report some very > coarse reorder buffer stats - at least oldest buffered xid, number of > buffered txns, total bytes of buffered txns in memory, total bytes of > buffered txns spilled to disk. > > But sometimes what I really want is details on the txns that're in the > reorder buffer, and that's not feasible to export via always-enabled > reporting like struct WalSnd. So I'm thinking that the same approach used > for the memory context stats patch might work well for asking the walsender > for a detailed dump of reorder buffer contents. Something like a > per-buffered-txn table of txn topxid, start-lsn, most recent change lsn, > number of changes, number of subxids, number of invalidations, number of > catalog changes, buffer size in memory, buffer size spilled to disk. > > Anyone drastically opposed to the idea? I am doubtful. The likelihood of ending with effectively unused code seems very substantial here. Greetings, Andres Freund
On Thu, 6 May 2021 at 02:28, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-05-05 18:33:27 +0800, Craig Ringer wrote:
> I'm thinking of piggy-backing on the approach used in the "Get memory
> contexts of an arbitrary backend process" patch in order to provide access
> to detailed reorder buffer content statistics from walsenders on request.
>
> Right now the reorder buffer is mostly a black-box. I mostly rely on gdb or
> on dynamic probes (perf, systemtap) to see what it's doing. I intend a
> patch soon to add a couple of fields to struct WalSnd to report some very
> coarse reorder buffer stats - at least oldest buffered xid, number of
> buffered txns, total bytes of buffered txns in memory, total bytes of
> buffered txns spilled to disk.
>
> But sometimes what I really want is details on the txns that're in the
> reorder buffer, and that's not feasible to export via always-enabled
> reporting like struct WalSnd. So I'm thinking that the same approach used
> for the memory context stats patch might work well for asking the walsender
> for a detailed dump of reorder buffer contents. Something like a
> per-buffered-txn table of txn topxid, start-lsn, most recent change lsn,
> number of changes, number of subxids, number of invalidations, number of
> catalog changes, buffer size in memory, buffer size spilled to disk.
>
> Anyone drastically opposed to the idea?
I am doubtful. The likelihood of ending with effectively unused code
seems very substantial here.
I can't rule that out, but the motivation for this proposal isn't development convenience. It's production support and operations. The reorder buffer is a black box right now, and when you're trying to answer the questions "what is the walsender doing," "is meaningful progress being made," and "what is slowing down replication" it's ... not easy.
I currently rely on some fairly hairy gdb scripts, which I'm not keen on running on production systems if I can avoid it.
I'm far from set on the approach of asking the walsender to dump a reorder buffer state summary to a file. But I don't think the current state of affairs is much fun for production use. Especially since we prevent the pg_stat_replication sent_lsn from going backwards, so reorder buffering can cause replication to appear to completely cease to progress for long periods unless you identify the socket and monitor traffic on it, or you intrude on the process with gdb.
At the least it'd be helpful to have pg_stat_replication (or a new related auxiliary view like pg_stat_logical_decoding) show
- walsender total bytes sent this session
- number of txns processed this txn
- number txns filtered out by output plugin this session
- oldest xid in reorder buffer
- reorder buffer number of txns
- reorder buffer total size (in-memory and total inc spilled)
- reorderbuffercommit current xid, last change lsn, total buffered size of current tx, total bytes of buffer processed so far within the current txn, and commit lsn if known, only when currently streaming a txn from reorderbuffercommit
That way it'd be possible to tell if a logical walsender is currently processing a commit and get a much better sense of its progress within the commit.
Perhaps output plugins could do some of this and expose their own custom views. But then each plugin would have to add its own. Plus they don't get a particularly good view into the reorder buffer state; they'd have a hard time maintaining good running stats.
Some basic monitoring exposed for logical decoding and reorder buffering would help a lot. Does that sound more palatable?
If so, I'd probably still want to be able to hook a few places in logical decoding to allow an extension to instrument it when greater insight into the inner workings is required.
On Thu, May 6, 2021 at 9:54 AM Craig Ringer <craig@2ndquadrant.com> wrote: > > On Thu, 6 May 2021 at 02:28, Andres Freund <andres@anarazel.de> wrote: >> >> Hi, >> >> On 2021-05-05 18:33:27 +0800, Craig Ringer wrote: >> > I'm thinking of piggy-backing on the approach used in the "Get memory >> > contexts of an arbitrary backend process" patch in order to provide access >> > to detailed reorder buffer content statistics from walsenders on request. >> > >> > Right now the reorder buffer is mostly a black-box. I mostly rely on gdb or >> > on dynamic probes (perf, systemtap) to see what it's doing. I intend a >> > patch soon to add a couple of fields to struct WalSnd to report some very >> > coarse reorder buffer stats - at least oldest buffered xid, number of >> > buffered txns, total bytes of buffered txns in memory, total bytes of >> > buffered txns spilled to disk. >> > >> > But sometimes what I really want is details on the txns that're in the >> > reorder buffer, and that's not feasible to export via always-enabled >> > reporting like struct WalSnd. So I'm thinking that the same approach used >> > for the memory context stats patch might work well for asking the walsender >> > for a detailed dump of reorder buffer contents. Something like a >> > per-buffered-txn table of txn topxid, start-lsn, most recent change lsn, >> > number of changes, number of subxids, number of invalidations, number of >> > catalog changes, buffer size in memory, buffer size spilled to disk. >> > >> > Anyone drastically opposed to the idea? >> >> I am doubtful. The likelihood of ending with effectively unused code >> seems very substantial here. > > > I can't rule that out, but the motivation for this proposal isn't development convenience. It's production support andoperations. The reorder buffer is a black box right now, and when you're trying to answer the questions "what is the walsenderdoing," "is meaningful progress being made," and "what is slowing down replication" it's ... not easy. > > I currently rely on some fairly hairy gdb scripts, which I'm not keen on running on production systems if I can avoid it. > > I'm far from set on the approach of asking the walsender to dump a reorder buffer state summary to a file. But I don'tthink the current state of affairs is much fun for production use. Especially since we prevent the pg_stat_replicationsent_lsn from going backwards, so reorder buffering can cause replication to appear to completely ceaseto progress for long periods unless you identify the socket and monitor traffic on it, or you intrude on the processwith gdb. > > At the least it'd be helpful to have pg_stat_replication (or a new related auxiliary view like pg_stat_logical_decoding)show > > - walsender total bytes sent this session > - number of txns processed this txn > You might be able to derive some of the above sorts of stats from the newly added pg_stat_replication_slots [1]. > - number txns filtered out by output plugin this session > - oldest xid in reorder buffer > - reorder buffer number of txns > - reorder buffer total size (in-memory and total inc spilled) > - reorderbuffercommit current xid, last change lsn, total buffered size of current tx, total bytes of buffer processedso far within the current txn, and commit lsn if known, only when currently streaming a txn from reorderbuffercommit > > That way it'd be possible to tell if a logical walsender is currently processing a commit and get a much better sense ofits progress within the commit. > > Perhaps output plugins could do some of this and expose their own custom views. But then each plugin would have to addits own. Plus they don't get a particularly good view into the reorder buffer state; they'd have a hard time maintaininggood running stats. > > Some basic monitoring exposed for logical decoding and reorder buffering would help a lot. Does that sound more palatable? > Can't we think of enhancing existing views or introduce a new view to provide such information? [1] - https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-SLOTS-VIEW -- With Regards, Amit Kapila.
On Thu, 6 May 2021 at 19:40, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, May 6, 2021 at 9:54 AM Craig Ringer <craig@2ndquadrant.com> wrote:
>
> At the least it'd be helpful to have pg_stat_replication (or a new related auxiliary view like pg_stat_logical_decoding) show
>
> - walsender total bytes sent this session
> - number of txns processed this txn
>
You might be able to derive some of the above sorts of stats from the
newly added pg_stat_replication_slots [1].
That's a huge improvement that I managed to totally miss the discussion of and work on. Thanks. It'll be a significant help.
'
> - number txns filtered out by output plugin this session
> - oldest xid in reorder buffer
> - reorder buffer number of txns
> - reorder buffer total size (in-memory and total inc spilled)
> - reorderbuffercommit current xid, last change lsn, total buffered size of current tx, total bytes of buffer processed so far within the current txn, and commit lsn if known, only when currently streaming a txn from reorderbuffercommit
These are less statistical in nature, and more about the current activity of the walsender and logical decoding state. I'm not sure if it'd make much sense to tack them on to pg_stat_replication_slots, especially as that'd also mean they were quite delayed.
But it probably isn't worth the effort of exposing this info in a new view.
With the higher level info now available in pg_stat_replication_slots, I think I might look into exposing these finer details via trace markers for use with perf / systemtap / etc instead.
A digression, but: It's a real shame that such tools don't give us a way to read specific tagged regions of memory with the same ease they let us probe function calls though. You generally need gdb to read the value of a global, or a moderately funky systemtap script. There's no convenient equivalent to SDT markers (TRACE_FOO) to tag variables. Wouldn't it be nice if we could
perf watch postgres:walsender_reorderbuffer_oldest_xid
or something like that?