Thread: Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Andres Freund
Date:
Hi, On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote: > 1- It is time consuming to evict all shared buffers one by one using > the pg_buffercache_evict() function. This is really useful for benchmarking. When testing IO heavy workloads it turns out to be a lot lower noise to measure with shared buffers already "touched" before taking actual measurements. The first time a buffer is used the kernel needs to actually initialize the memory for the buffer, which can take a substantial amount of time. Which obviously can hide many performance differences. And it adds a lot of noise. > 2- It would be good to have a function to mark buffers as dirty to > test different scenarios. This is important to be able to measure checkpointer performance. Otherwise one has to load data from scratch. That's bad for three reasons: 1) After re-loading the data from scratch the data can be laid-out differently on-disk, which can have substantial performance impacts. By re-dirtying data one instead can measure the performance effects of a change with the same on-disk layaout. 2) It takes a lot of time to reload data from scratch. 3) The first write to data has substantially different performance characteristics than later writes, but often isn't the most relevant factor for real-world performance test. > So, this patchset extends pg_buffercache with 3 functions: > > 1- pg_buffercache_evict_all(): This is very similar to the already > existing pg_buffercache_evict() function. The difference is > pg_buffercache_evict_all() does not take an argument. Instead it just > loops over the shared buffers and tries to evict all of them. It > returns the number of buffers evicted and flushed. I do think that'd be rather useful. Perhaps it's also worth having a version that just evicts a specific relation? > 2- pg_buffercache_mark_dirty(): This function takes a buffer id as an > argument and tries to mark this buffer as dirty. Returns true on > success. This returns false if the buffer is already dirty. Do you > think this makes sense or do you prefer it to return true if the > buffer is already dirty? I don't really have feelings about that ;) > From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001 > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > Date: Fri, 20 Dec 2024 14:06:47 +0300 > Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing > > This new function provides a mechanism to evict all shared buffers at > once. It is designed to address the inefficiency of repeatedly calling > pg_buffercache_evict() for each individual buffer, which can be > time-consuming when dealing with large shared buffer pools > (e.g., ~790ms vs. ~270ms for 16GB of shared buffers). > */ > bool > -EvictUnpinnedBuffer(Buffer buf) > +EvictUnpinnedBuffer(Buffer buf, bool *flushed) > { > BufferDesc *desc; > uint32 buf_state; > bool result; > > + *flushed = false; > + > /* Make sure we can pin the buffer. */ > ResourceOwnerEnlarge(CurrentResourceOwner); > ReservePrivateRefCountEntry(); > @@ -6134,6 +6136,7 @@ EvictUnpinnedBuffer(Buffer buf) > LWLockAcquire(BufferDescriptorGetContentLock(desc), LW_SHARED); > FlushBuffer(desc, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL); > LWLockRelease(BufferDescriptorGetContentLock(desc)); > + *flushed = true; > } > > /* This will return false if it becomes dirty or someone else pins > it. */ I don't think *flushed is necessarily accurate this way, as it might detect that it doesn't need to flush the data (via StartBufferIO() returning false). > + */ > +Datum > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > +{ > + Datum result; > + TupleDesc tupledesc; > + HeapTuple tuple; > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > + bool nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > + > + int32 buffers_evicted = 0; > + int32 buffers_flushed = 0; > + bool flushed; > + > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE) > + elog(ERROR, "return type must be a row type"); > + > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + errmsg("must be superuser to use pg_buffercache_evict_all function"))); > + > + for (int buf = 1; buf < NBuffers; buf++) > + { > + if (EvictUnpinnedBuffer(buf, &flushed)) > + buffers_evicted++; > + if (flushed) > + buffers_flushed++; I'd probably add a pre-check for the buffer not being in use. We don't need an external function call, with an unconditional LockBufHdr() inside, for that case. > +/* > + * MarkUnpinnedBufferDirty > + * > + * This function is intended for testing/development use only! > + * > + * To succeed, the buffer must not be pinned on entry, so if the caller had a > + * particular block in mind, it might already have been replaced by some other > + * block by the time this function runs. It's also unpinned on return, so the > + * buffer might be occupied and flushed by the time control is returned. This > + * inherent raciness without other interlocking makes the function unsuitable > + * for non-testing usage. > + * > + * Returns true if the buffer was not dirty and it has now been marked as > + * dirty. Returns false if it wasn't valid, if it couldn't be marked as dirty > + * due to a pin, or if the buffer was already dirty. > + */ Hm. One potentially problematic thing with this is that it can pin and dirty a buffer even while its relation is locked exclusively. Which i think some code doesn't expect. Not sure if there's a way around that :( Greetings, Andres Freund