Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers
From: Melanie Plageman
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Msg-id: CAAKRu_avP+b+KEbCe1K0mdKhbJ5gVx8YzV37p+ukCGkqFY1HJA@mail.gmail.com
In response to: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On Tue, Aug 3, 2021 at 2:13 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-08-02 18:25:56 -0400, Melanie Plageman wrote:
> > Thanks for the feedback!
> >
> > I agree it makes sense to count strategy writes separately.
> >
> > I thought about this some more, and I don't know if it makes sense to
> > only count "avoidable" strategy writes.
> >
> > This would mean that a backend writing out a buffer from the strategy
> > ring when no clean shared buffers (as well as no clean strategy
> > buffers) are available would not count that write as a strategy write
> > (even though it is writing out a buffer from its strategy ring). But,
> > it obviously doesn't make sense to count it as a regular buffer being
> > written out. So, I plan to change this code.
>
> What do you mean with "no clean shared buffers ... are available"?

I think I was talking about the scenario in which a backend using a
strategy does not find a clean buffer in the strategy ring, goes to look
in the freelist for a clean shared buffer, and doesn't find one. I was
probably talking in circles up there. I think the current patch counts
the right writes in the right way, though.

> > The most substantial missing piece of the patch right now is
> > persisting the data across reboots.
> >
> > The two places in the code I can see to persist the buffer action
> > stats data are:
> > 1) using the stats collector code (like in
> >    pgstat_read/write_statsfiles())
> > 2) using a before_shmem_exit() hook which writes the data structure
> >    to a file, and then reading from it when making the shared memory
> >    array initially
>
> I think it's pretty clear that we should go for 1. Having two
> mechanisms for persisting stats data is a bad idea.

New version uses the stats collector.

> > Also, I'm unsure how writing the buffer action stats out in
> > pgstat_write_statsfiles() will work, since I think that backends can
> > update their buffer action stats after we would have already
> > persisted the data from the BufferActionStatsArray -- causing us to
> > lose those updates.
>
> I was thinking it'd work differently. Whenever a connection ends, it
> reports its data up to pgstats.c (otherwise we'd lose those stats). By
> the time shutdown happens, they all need to have already reported their
> stats - so we don't need to do anything to get the data to pgstats.c
> during shutdown time.

When you say "whenever a connection ends", what part of the code are you
referring to specifically?

Also, when you say "shutdown", do you mean a backend shutting down or
all backends shutting down (including postmaster) -- like pg_ctl stop?

> > And, I don't think I can use pgstat_read_statsfiles() since the
> > BufferActionStatsArray should have the data from the file as soon as
> > the view containing the buffer action stats can be queried. Thus, it
> > seems like I would need to read the file while initializing the array
> > in CreateBufferActionStatsCounters().
>
> Why would backends need to read that data back?

To get totals across restarts -- but that doesn't matter now that I am
using the stats collector.
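Since the new version reports up to the stats collector, the reporting
at process exit could look roughly like this, modeled on the existing
pgstat_send_bgwriter(). This is only a sketch: the message struct, the
PGSTAT_MTYPE_BUFFERACTIONS message type, and the function name are all
invented for illustration.

    typedef struct PgStat_MsgBufferActions
    {
        PgStat_MsgHdr   m_hdr;
        BackendType     m_backend_type;
        PgStat_Counter  m_allocs;
        PgStat_Counter  m_extends;
        PgStat_Counter  m_fsyncs;
        PgStat_Counter  m_writes;
        PgStat_Counter  m_writes_strat;
    } PgStat_MsgBufferActions;

    /*
     * Send this backend's accumulated buffer action counters to the
     * stats collector. Called once during backend shutdown, so nothing
     * is left unreported by the time the postmaster shuts down.
     */
    static void
    pgstat_send_buffer_actions(void)
    {
        PgStat_MsgBufferActions msg;
        volatile PgBackendStatus *beentry = MyBEEntry;

        pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_BUFFERACTIONS);
        msg.m_backend_type = MyBackendType;
        msg.m_allocs = pg_atomic_read_u64(&beentry->buffer_action_stats.allocs);
        msg.m_extends = pg_atomic_read_u64(&beentry->buffer_action_stats.extends);
        msg.m_fsyncs = pg_atomic_read_u64(&beentry->buffer_action_stats.fsyncs);
        msg.m_writes = pg_atomic_read_u64(&beentry->buffer_action_stats.writes);
        msg.m_writes_strat = pg_atomic_read_u64(&beentry->buffer_action_stats.writes_strat);
        pgstat_send(&msg, sizeof(msg));
    }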
> > diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
> > index 55f6e3711d..96cac0a74e 100644
> > --- a/src/backend/catalog/system_views.sql
> > +++ b/src/backend/catalog/system_views.sql
> > @@ -1067,9 +1067,6 @@ CREATE VIEW pg_stat_bgwriter AS
> >          pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint,
> >          pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean,
> >          pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean,
> > -        pg_stat_get_buf_written_backend() AS buffers_backend,
> > -        pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
> > -        pg_stat_get_buf_alloc() AS buffers_alloc,
> >          pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
>
> Material for a separate patch, not this. But if we're going to break
> monitoring queries anyway, I think we should consider also renaming
> maxwritten_clean (and perhaps a few others), because nobody understands
> what that is supposed to mean.

Do you mean I shouldn't remove anything from the pg_stat_bgwriter view?

> > @@ -1089,10 +1077,6 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
> >
> >       LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
> >
> > -     /* Count all backend writes regardless of if they fit in the queue */
> > -     if (!AmBackgroundWriterProcess())
> > -             CheckpointerShmem->num_backend_writes++;
> > -
> >       /*
> >        * If the checkpointer isn't running or the request queue is full, the
> >        * backend will have to perform its own fsync request. But before forcing
> > @@ -1106,8 +1090,10 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
> >        * Count the subset of writes where backends have to do their own
> >        * fsync
> >        */
> > +     /* TODO: should we count fsyncs for all types of procs? */
> >       if (!AmBackgroundWriterProcess())
> > -             CheckpointerShmem->num_backend_fsync++;
> > +             pgstat_increment_buffer_action(BA_Fsync);
> > +
>
> Yes, I think that'd make sense. Now that we can disambiguate the
> different types of syncs between procs, I don't see a point of having a
> process-type filter here. We just lose data...

Done

> > /* don't set checksum for all-zero page */
> > @@ -1229,11 +1234,60 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
> >                       if (XLogNeedsFlush(lsn) &&
> >                               StrategyRejectBuffer(strategy, buf))
> >                       {
> > +                             /*
> > +                              * Unset the strat write flag, as we will not be writing
> > +                              * this particular buffer from our ring out and may end
> > +                              * up having to find a buffer from main shared buffers,
> > +                              * which, if it is dirty, we may have to write out, which
> > +                              * could have been prevented by checkpointing and background
> > +                              * writing
> > +                              */
> > +                             StrategyUnChooseBufferFromRing(strategy);
> > +
> >                               /* Drop lock/pin and loop around for another buffer */
> >                               LWLockRelease(BufferDescriptorGetContentLock(buf));
> >                               UnpinBuffer(buf, true);
> >                               continue;
> >                       }
>
> Could we combine this with StrategyRejectBuffer()? It seems a bit
> wasteful to have two function calls into freelist.c when the second
> happens exactly when the first returns true?
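Something like the following, perhaps -- a sketch based on the current
StrategyRejectBuffer() in freelist.c, assuming the patch's "from ring"
flag is (or can be) the existing current_was_in_ring field; only the
flag-clearing at the end is the new part:

    bool
    StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf)
    {
        /* We only do this in bulkread mode */
        if (strategy->btype != BAS_BULKREAD)
            return false;

        /* Don't muck with behavior of normal buffer-replacement strategy */
        if (!strategy->current_was_in_ring ||
            strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
            return false;

        /*
         * Remove the dirty buffer from the ring; necessary to prevent an
         * infinite loop if all ring members are dirty.
         */
        strategy->buffers[strategy->current] = InvalidBuffer;

        /*
         * The caller will now look for a victim in main shared buffers,
         * so any resulting write should no longer be counted as a
         * strategy write -- clear the flag here instead of making a
         * second call into freelist.c.
         */
        strategy->current_was_in_ring = false;

        return true;
    }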
> > +
> > +                     /*
> > +                      * TODO: there is certainly a better way to write this
> > +                      * logic
> > +                      */
> > +
> > +                     /*
> > +                      * The dirty buffer that will be written out was selected
> > +                      * from the ring and we did not bother checking the
> > +                      * freelist or doing a clock sweep to look for a clean
> > +                      * buffer to use, thus, this write will be counted as a
> > +                      * strategy write -- one that may be unnecessary without
> > +                      * a strategy
> > +                      */
> > +                     if (StrategyIsBufferFromRing(strategy))
> > +                     {
> > +                             pgstat_increment_buffer_action(BA_Write_Strat);
> > +                     }
> > +
> > +                     /*
> > +                      * If the dirty buffer was one we grabbed from the
> > +                      * freelist or through a clock sweep, it could have been
> > +                      * written out by bgwriter or checkpointer, thus, we will
> > +                      * count it as a regular write
> > +                      */
> > +                     else
> > +                             pgstat_increment_buffer_action(BA_Write);
>
> It seems this would be better solved by having a "bool *from_ring" or
> GetBufferSource* parameter to StrategyGetBuffer().

I've addressed both of these in the new version.

> > @@ -2895,6 +2948,20 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
> >       /*
> >        * bufToWrite is either the shared buffer or a copy, as appropriate.
> >        */
> > +
> > +     /*
> > +      * TODO: consider that if we did not need to distinguish between a buffer
> > +      * flushed that was grabbed from the ring buffer and written out as part
> > +      * of a strategy which was not from main Shared Buffers (and thus
> > +      * preventable by bgwriter or checkpointer), then we could move all calls
> > +      * to pgstat_increment_buffer_action() here except for the one for
> > +      * extends, which would remain in ReadBuffer_common() before smgrextend()
> > +      * (unless we decide to start counting other extends). That includes the
> > +      * call to count buffers written by bgwriter and checkpointer which go
> > +      * through FlushBuffer() but not BufferAlloc(). That would make it
> > +      * simpler. Perhaps instead we can find somewhere else to indicate that
> > +      * the buffer is from the ring of buffers to reuse.
> > +      */
> >       smgrwrite(reln,
> >                 buf->tag.forkNum,
> >                 buf->tag.blockNum,
>
> Can we just add a parameter to FlushBuffer indicating what the source of
> the write is?

I just noticed this comment now, so I'll address that in the next
version. I rebased today and noticed merge conflicts, so it looks like
v5 will be on its way soon anyway.

> > @@ -247,7 +257,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
> >        * the rate of buffer consumption. Note that buffers recycled by a
> >        * strategy object are intentionally not counted here.
> >        */
> > -     pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, 1);
> > +     pgstat_increment_buffer_action(BA_Alloc);
> >
> >       /*
> >        * First check, without acquiring the lock, whether there's buffers in the
>
> > @@ -411,11 +421,6 @@ StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
> >        */
> >               *complete_passes += nextVictimBuffer / NBuffers;
> >       }
> > -
> > -     if (num_buf_alloc)
> > -     {
> > -             *num_buf_alloc = pg_atomic_exchange_u32(&StrategyControl->numBufferAllocs, 0);
> > -     }
> >       SpinLockRelease(&StrategyControl->buffer_strategy_lock);
> >       return result;
> > }
>
> Hm. Isn't bgwriter using the *num_buf_alloc value to pace its activity?
> I suspect this patch shouldn't get rid of numBufferAllocs at the same
> time as overhauling the stats stuff. Perhaps we don't need both - but
> it's not obvious that that's the case / how we can make that work.

I initially meant to add a function to the patch like
pg_stat_get_buffer_actions(), but one that took a BufferActionType and
BackendType as parameters and returned a single value: the number of
buffer actions of that type performed by backends of that type.

Let's say I defined it like this:

    uint64
    pg_stat_get_backend_buffer_actions_stats(BackendType backend_type,
                                             BufferActionType ba_type);

Then, I intended to use that in StrategySyncStart() to set num_buf_alloc
to the value returned by
pg_stat_get_backend_buffer_actions_stats(B_BG_WRITER, BA_Alloc) minus
the value of StrategyControl->numBufferAllocs, and then to add that
difference back into StrategyControl->numBufferAllocs.

I think that would have the same behavior as current, though I'm not
sure if the performance would end up being better or worse. It wouldn't
be atomically incrementing StrategyControl->numBufferAllocs, but it
would do a few more atomic operations in StrategySyncStart() than
before. Also, we would do all the work done by
pg_stat_get_buffer_actions() in StrategySyncStart(). But that is called
comparatively infrequently, right?
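In code, that idea would look roughly like this inside
StrategySyncStart(), still under buffer_strategy_lock (a sketch only;
pg_stat_get_backend_buffer_actions_stats() is the hypothetical function
above):

    if (num_buf_alloc)
    {
        uint64      total_allocs;
        uint32      val;

        /* everything the stats machinery has counted so far */
        total_allocs =
            pg_stat_get_backend_buffer_actions_stats(B_BG_WRITER, BA_Alloc);

        /*
         * numBufferAllocs now tracks how many allocs have already been
         * reported to the bgwriter, rather than being reset to zero on
         * each call.
         */
        val = (uint32) total_allocs -
            pg_atomic_read_u32(&StrategyControl->numBufferAllocs);

        *num_buf_alloc = val;
        pg_atomic_fetch_add_u32(&StrategyControl->numBufferAllocs, val);
    }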
> > +void
> > +pgstat_increment_buffer_action(BufferActionType ba_type)
> > +{
> > +     volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > +     if (!beentry || !pgstat_track_activities)
> > +             return;
> > +
> > +     if (ba_type == BA_Alloc)
> > +             pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.allocs, 1);
> > +     else if (ba_type == BA_Extend)
> > +             pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.extends, 1);
> > +     else if (ba_type == BA_Fsync)
> > +             pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.fsyncs, 1);
> > +     else if (ba_type == BA_Write)
> > +             pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.writes, 1);
> > +     else if (ba_type == BA_Write_Strat)
> > +             pg_atomic_add_fetch_u64(&beentry->buffer_action_stats.writes_strat, 1);
> > +}
>
> I don't think we want to use atomic increments here - they're *slow*.
> And there only ever can be a single writer to a backend's stats. So just
> doing something like
>   pg_atomic_write_u64(&var, pg_atomic_read_u64(&var) + 1)
> should do the trick.

Done
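So the per-backend counters are now bumped with a plain read-then-write,
along these lines (a sketch; the helper name is made up):

    static inline void
    pgstat_ba_increment(volatile pg_atomic_uint64 *counter)
    {
        /*
         * Only the owning backend ever writes its own counters, so an
         * unlocked read plus an atomic write is enough -- much cheaper
         * than an atomic read-modify-write, and readers still see a
         * consistent (if slightly stale) 64-bit value.
         */
        pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
    }

    /* e.g., at the call site: */
    if (ba_type == BA_Alloc)
        pgstat_ba_increment(&beentry->buffer_action_stats.allocs);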
> > +/*
> > + * Called for a single backend at the time of death to persist its I/O stats
> > + */
> > +void
> > +pgstat_record_dead_backend_buffer_actions(void)
> > +{
> > +     volatile PgBackendBufferActionStats *ba_stats;
> > +     volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > +     if (beentry->st_procpid != 0)
> > +             return;
> > +
> > +     // TODO: is this correct? could there be a data race? do I need a lock?
> > +     ba_stats = &BufferActionStatsArray[beentry->st_backendType];
> > +     pg_atomic_add_fetch_u64(&ba_stats->allocs, pg_atomic_read_u64(&beentry->buffer_action_stats.allocs));
> > +     pg_atomic_add_fetch_u64(&ba_stats->extends, pg_atomic_read_u64(&beentry->buffer_action_stats.extends));
> > +     pg_atomic_add_fetch_u64(&ba_stats->fsyncs, pg_atomic_read_u64(&beentry->buffer_action_stats.fsyncs));
> > +     pg_atomic_add_fetch_u64(&ba_stats->writes, pg_atomic_read_u64(&beentry->buffer_action_stats.writes));
> > +     pg_atomic_add_fetch_u64(&ba_stats->writes_strat, pg_atomic_read_u64(&beentry->buffer_action_stats.writes_strat));
> > +}
>
> I don't see a race, FWIW.
>
> This is where I propose that we instead report the values up to the
> stats collector, instead of having a separate array that we need to
> persist

Changed

> > +/*
> > + * Fill the provided values array with the accumulated counts of buffer actions
> > + * taken by all backends of type backend_type (input parameter), both alive and
> > + * dead. This is currently only used by pg_stat_get_buffer_actions() to create
> > + * the rows in the pg_stat_buffer_actions system view.
> > + */
> > +void
> > +pgstat_recount_all_buffer_actions(BackendType backend_type, Datum *values)
> > +{
> > +     int i;
> > +     volatile PgBackendStatus *beentry;
> > +
> > +     /*
> > +      * Add stats from all exited backends
> > +      */
> > +     values[BA_Alloc] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].allocs);
> > +     values[BA_Extend] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].extends);
> > +     values[BA_Fsync] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].fsyncs);
> > +     values[BA_Write] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].writes);
> > +     values[BA_Write_Strat] = pg_atomic_read_u64(&BufferActionStatsArray[backend_type].writes_strat);
> > +
> > +     /*
> > +      * Loop through all live backends and count their buffer actions
> > +      */
> > +     // TODO: see note in pg_stat_get_buffer_actions() about inefficiency of this method
> > +
> > +     beentry = BackendStatusArray;
> > +     for (i = 1; i <= MaxBackends; i++)
> > +     {
> > +             /* Don't count dead backends. They should already be counted */
> > +             if (beentry->st_procpid == 0)
> > +                     continue;
> > +             if (beentry->st_backendType != backend_type)
> > +                     continue;
> > +
> > +             values[BA_Alloc] += pg_atomic_read_u64(&beentry->buffer_action_stats.allocs);
> > +             values[BA_Extend] += pg_atomic_read_u64(&beentry->buffer_action_stats.extends);
> > +             values[BA_Fsync] += pg_atomic_read_u64(&beentry->buffer_action_stats.fsyncs);
> > +             values[BA_Write] += pg_atomic_read_u64(&beentry->buffer_action_stats.writes);
> > +             values[BA_Write_Strat] += pg_atomic_read_u64(&beentry->buffer_action_stats.writes_strat);
> > +
> > +             beentry++;
> > +     }
> > +}
>
> It seems to make a bit more sense to have this sum up the stats for all
> backend types at once.

Changed.

> > +     /*
> > +      * Currently, the only supported backend types for stats are the following.
> > +      * If this were to change, pg_proc.dat would need to be changed as well
> > +      * to reflect the new expected number of rows.
> > +      */
> > +     Datum values[BUFFER_ACTION_NUM_TYPES];
> > +     bool nulls[BUFFER_ACTION_NUM_TYPES];
>
> Ah ;)

I just went ahead and made a row for each backend type.
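With a row per backend type, the summing can then be done in a single
pass over the backend status array, conceptually something like this (a
sketch only; the two-dimensional values layout and BACKEND_NUM_TYPES are
invented here for illustration):

    void
    pgstat_recount_all_buffer_actions(Datum (*values)[BUFFER_ACTION_NUM_TYPES])
    {
        volatile PgBackendStatus *beentry = BackendStatusArray;

        /* start from the totals already recorded for exited backends */
        for (int bt = 0; bt < BACKEND_NUM_TYPES; bt++)
        {
            values[bt][BA_Alloc] =
                pg_atomic_read_u64(&BufferActionStatsArray[bt].allocs);
            /* ... likewise for extends, fsyncs, writes, writes_strat ... */
        }

        /* one pass over live backends, bucketing by backend type */
        for (int i = 1; i <= MaxBackends; i++, beentry++)
        {
            int bt = beentry->st_backendType;

            /* skip empty slots; exited backends were folded in above */
            if (beentry->st_procpid == 0)
                continue;

            values[bt][BA_Alloc] +=
                pg_atomic_read_u64(&beentry->buffer_action_stats.allocs);
            /* ... likewise for the other counters ... */
        }
    }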
- Melanie