Thread: Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Andres Freund
Date:
Hi, On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote: > 1- It is time consuming to evict all shared buffers one by one using > the pg_buffercache_evict() function. This is really useful for benchmarking. When testing IO heavy workloads it turns out to be a lot lower noise to measure with shared buffers already "touched" before taking actual measurements. The first time a buffer is used the kernel needs to actually initialize the memory for the buffer, which can take a substantial amount of time. Which obviously can hide many performance differences. And it adds a lot of noise. > 2- It would be good to have a function to mark buffers as dirty to > test different scenarios. This is important to be able to measure checkpointer performance. Otherwise one has to load data from scratch. That's bad for three reasons: 1) After re-loading the data from scratch the data can be laid-out differently on-disk, which can have substantial performance impacts. By re-dirtying data one instead can measure the performance effects of a change with the same on-disk layaout. 2) It takes a lot of time to reload data from scratch. 3) The first write to data has substantially different performance characteristics than later writes, but often isn't the most relevant factor for real-world performance test. > So, this patchset extends pg_buffercache with 3 functions: > > 1- pg_buffercache_evict_all(): This is very similar to the already > existing pg_buffercache_evict() function. The difference is > pg_buffercache_evict_all() does not take an argument. Instead it just > loops over the shared buffers and tries to evict all of them. It > returns the number of buffers evicted and flushed. I do think that'd be rather useful. Perhaps it's also worth having a version that just evicts a specific relation? > 2- pg_buffercache_mark_dirty(): This function takes a buffer id as an > argument and tries to mark this buffer as dirty. Returns true on > success. This returns false if the buffer is already dirty. Do you > think this makes sense or do you prefer it to return true if the > buffer is already dirty? I don't really have feelings about that ;) > From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001 > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > Date: Fri, 20 Dec 2024 14:06:47 +0300 > Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing > > This new function provides a mechanism to evict all shared buffers at > once. It is designed to address the inefficiency of repeatedly calling > pg_buffercache_evict() for each individual buffer, which can be > time-consuming when dealing with large shared buffer pools > (e.g., ~790ms vs. ~270ms for 16GB of shared buffers). > */ > bool > -EvictUnpinnedBuffer(Buffer buf) > +EvictUnpinnedBuffer(Buffer buf, bool *flushed) > { > BufferDesc *desc; > uint32 buf_state; > bool result; > > + *flushed = false; > + > /* Make sure we can pin the buffer. */ > ResourceOwnerEnlarge(CurrentResourceOwner); > ReservePrivateRefCountEntry(); > @@ -6134,6 +6136,7 @@ EvictUnpinnedBuffer(Buffer buf) > LWLockAcquire(BufferDescriptorGetContentLock(desc), LW_SHARED); > FlushBuffer(desc, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL); > LWLockRelease(BufferDescriptorGetContentLock(desc)); > + *flushed = true; > } > > /* This will return false if it becomes dirty or someone else pins > it. */ I don't think *flushed is necessarily accurate this way, as it might detect that it doesn't need to flush the data (via StartBufferIO() returning false). > + */ > +Datum > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > +{ > + Datum result; > + TupleDesc tupledesc; > + HeapTuple tuple; > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > + bool nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > + > + int32 buffers_evicted = 0; > + int32 buffers_flushed = 0; > + bool flushed; > + > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE) > + elog(ERROR, "return type must be a row type"); > + > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + errmsg("must be superuser to use pg_buffercache_evict_all function"))); > + > + for (int buf = 1; buf < NBuffers; buf++) > + { > + if (EvictUnpinnedBuffer(buf, &flushed)) > + buffers_evicted++; > + if (flushed) > + buffers_flushed++; I'd probably add a pre-check for the buffer not being in use. We don't need an external function call, with an unconditional LockBufHdr() inside, for that case. > +/* > + * MarkUnpinnedBufferDirty > + * > + * This function is intended for testing/development use only! > + * > + * To succeed, the buffer must not be pinned on entry, so if the caller had a > + * particular block in mind, it might already have been replaced by some other > + * block by the time this function runs. It's also unpinned on return, so the > + * buffer might be occupied and flushed by the time control is returned. This > + * inherent raciness without other interlocking makes the function unsuitable > + * for non-testing usage. > + * > + * Returns true if the buffer was not dirty and it has now been marked as > + * dirty. Returns false if it wasn't valid, if it couldn't be marked as dirty > + * due to a pin, or if the buffer was already dirty. > + */ Hm. One potentially problematic thing with this is that it can pin and dirty a buffer even while its relation is locked exclusively. Which i think some code doesn't expect. Not sure if there's a way around that :( Greetings, Andres Freund
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Mon, 17 Feb 2025 at 19:59, Andres Freund <andres@anarazel.de> wrote: > > Hi, Thanks for the review! And sorry for the late reply. > On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote: > > So, this patchset extends pg_buffercache with 3 functions: > > > > 1- pg_buffercache_evict_all(): This is very similar to the already > > existing pg_buffercache_evict() function. The difference is > > pg_buffercache_evict_all() does not take an argument. Instead it just > > loops over the shared buffers and tries to evict all of them. It > > returns the number of buffers evicted and flushed. > > I do think that'd be rather useful. Perhaps it's also worth having a version > that just evicts a specific relation? That makes sense. I will work on this. > > 2- pg_buffercache_mark_dirty(): This function takes a buffer id as an > > argument and tries to mark this buffer as dirty. Returns true on > > success. This returns false if the buffer is already dirty. Do you > > think this makes sense or do you prefer it to return true if the > > buffer is already dirty? > > I don't really have feelings about that ;) I feel the same. I just wanted to have a symmetry between dirty and evict functions. If you think this would be useless, I can remove it. > > From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > > Date: Fri, 20 Dec 2024 14:06:47 +0300 > > Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing > > > > This new function provides a mechanism to evict all shared buffers at > > once. It is designed to address the inefficiency of repeatedly calling > > pg_buffercache_evict() for each individual buffer, which can be > > time-consuming when dealing with large shared buffer pools > > (e.g., ~790ms vs. ~270ms for 16GB of shared buffers). > > > */ > > bool > > -EvictUnpinnedBuffer(Buffer buf) > > +EvictUnpinnedBuffer(Buffer buf, bool *flushed) > > { > > BufferDesc *desc; > > uint32 buf_state; > > bool result; > > > > + *flushed = false; > > + > > /* Make sure we can pin the buffer. */ > > ResourceOwnerEnlarge(CurrentResourceOwner); > > ReservePrivateRefCountEntry(); > > @@ -6134,6 +6136,7 @@ EvictUnpinnedBuffer(Buffer buf) > > LWLockAcquire(BufferDescriptorGetContentLock(desc), LW_SHARED); > > FlushBuffer(desc, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL); > > LWLockRelease(BufferDescriptorGetContentLock(desc)); > > + *flushed = true; > > } > > > > /* This will return false if it becomes dirty or someone else pins > > it. */ > > I don't think *flushed is necessarily accurate this way, as it might detect > that it doesn't need to flush the data (via StartBufferIO() returning false). You are right. It seems there is no way to get that information without editing the FlushBuffer(), right? And I think editing FlushBuffer() is not worth it. So, I can either remove it or explain in the docs that #buffers_flushed may not be completely accurate. > > + */ > > +Datum > > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > > +{ > > + Datum result; > > + TupleDesc tupledesc; > > + HeapTuple tuple; > > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > > + bool nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > > + > > + int32 buffers_evicted = 0; > > + int32 buffers_flushed = 0; > > + bool flushed; > > + > > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE) > > + elog(ERROR, "return type must be a row type"); > > + > > + if (!superuser()) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser to use pg_buffercache_evict_all function"))); > > + > > + for (int buf = 1; buf < NBuffers; buf++) > > + { > > + if (EvictUnpinnedBuffer(buf, &flushed)) > > + buffers_evicted++; > > + if (flushed) > > + buffers_flushed++; > > I'd probably add a pre-check for the buffer not being in use. We don't need an > external function call, with an unconditional LockBufHdr() inside, for that > case. I did not understand why we would need this. Does not EvictUnpinnedBuffer() already check that the buffer is not in use? > > +/* > > + * MarkUnpinnedBufferDirty > > + * > > + * This function is intended for testing/development use only! > > + * > > + * To succeed, the buffer must not be pinned on entry, so if the caller had a > > + * particular block in mind, it might already have been replaced by some other > > + * block by the time this function runs. It's also unpinned on return, so the > > + * buffer might be occupied and flushed by the time control is returned. This > > + * inherent raciness without other interlocking makes the function unsuitable > > + * for non-testing usage. > > + * > > + * Returns true if the buffer was not dirty and it has now been marked as > > + * dirty. Returns false if it wasn't valid, if it couldn't be marked as dirty > > + * due to a pin, or if the buffer was already dirty. > > + */ > > Hm. One potentially problematic thing with this is that it can pin and dirty a > buffer even while its relation is locked exclusively. Which i think some code > doesn't expect. Not sure if there's a way around that :( I see, I did not think about that. Since this function can be used only by superusers, that problem might not be a big issue. What do you think? -- Regards, Nazir Bilal Yavuz Microsoft
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, Here is the v2. Changes prior to v1 are: - pg_buffercache_evict_relation() function is introduced. Which takes a relation as an argument and evicts all shared buffers in the relation. - It is mentioned in the docs that the #buffers_flushed in the pg_buffercache_evict_relation() and pg_buffercache_evict_all() functions do not necessarily mean these buffers are flushed by these functions. Remaining questions from the v1: On Wed, 12 Mar 2025 at 19:15, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > On Mon, 17 Feb 2025 at 19:59, Andres Freund <andres@anarazel.de> wrote: > > On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote: > > > + */ > > > +Datum > > > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > > > +{ > > > + Datum result; > > > + TupleDesc tupledesc; > > > + HeapTuple tuple; > > > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > > > + bool nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > > > + > > > + int32 buffers_evicted = 0; > > > + int32 buffers_flushed = 0; > > > + bool flushed; > > > + > > > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE) > > > + elog(ERROR, "return type must be a row type"); > > > + > > > + if (!superuser()) > > > + ereport(ERROR, > > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > > + errmsg("must be superuser to use pg_buffercache_evict_all function"))); > > > + > > > + for (int buf = 1; buf < NBuffers; buf++) > > > + { > > > + if (EvictUnpinnedBuffer(buf, &flushed)) > > > + buffers_evicted++; > > > + if (flushed) > > > + buffers_flushed++; > > > > I'd probably add a pre-check for the buffer not being in use. We don't need an > > external function call, with an unconditional LockBufHdr() inside, for that > > case. > > I did not understand why we would need this. Does not > EvictUnpinnedBuffer() already check that the buffer is not in use? > > > +/* > > > + * MarkUnpinnedBufferDirty > > > + * > > > + * This function is intended for testing/development use only! > > > + * > > > + * To succeed, the buffer must not be pinned on entry, so if the caller had a > > > + * particular block in mind, it might already have been replaced by some other > > > + * block by the time this function runs. It's also unpinned on return, so the > > > + * buffer might be occupied and flushed by the time control is returned. This > > > + * inherent raciness without other interlocking makes the function unsuitable > > > + * for non-testing usage. > > > + * > > > + * Returns true if the buffer was not dirty and it has now been marked as > > > + * dirty. Returns false if it wasn't valid, if it couldn't be marked as dirty > > > + * due to a pin, or if the buffer was already dirty. > > > + */ > > > > Hm. One potentially problematic thing with this is that it can pin and dirty a > > buffer even while its relation is locked exclusively. Which i think some code > > doesn't expect. Not sure if there's a way around that :( > > I see, I did not think about that. Since this function can be used > only by superusers, that problem might not be a big issue. What do you > think? -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Aidar Imamov
Date:
Hi! This is my first time trying out as a patch reviewer, and I don't claim to be an expert. I am very interested in the topic of buffer cache and would like to learn more about it. I have reviewed the patches after they were edited and I would like to share my thoughts on some points. pg_buffercache_evict_all(): > for (int buf = 1; buf < NBuffers; buf++) Mb it would be more correct to use <= NBuffers? I also did not fully understand what A. Freund was referring to. However, we cannot avoid blocking the buffer header, as we need to ensure that the buffer is not being pinned by anyone else. Perhaps it would be beneficial to call LockBufHdr() outside and check if BUT_STATE_GET_REFCOUNT == 0 before calling EvictUnpinnedBuffer(). This would help to prevent unnecessary calls to EvictUnpinnedBuffer() itself, ResourceOwnerEnlarge() and ReservePrivateRefCountEntry(). pg_buffercache_mark_dirty_all(): > for (int buf = 1; buf < NBuffers; buf++) Mb it would be more correct to use <= NBuffers? pg_buffercache_evict_relation(): > errmsg("must be superuser to use pg_buffercache_evict function") '_relation' postfix got lost here > /* Open relation. */ > rel = relation_open(relOid, AccessShareLock); If I understand correctly, this function is meant to replicate the behavior of VACUUM FULL, but only for dirty relation pages. In this case, wouldn't it be necessary to obtain an exclusive access lock? > for (int buf = 1; buf < NBuffers; buf++) Mb it would be more correct to use <= NBuffers? > /* > * No need to call UnlockBufHdr() if BufTagMatchesRelFileLocator() > * returns true, EvictUnpinnedBuffer() will take care of it. > */ > buf_state = LockBufHdr(bufHdr); > if (BufTagMatchesRelFileLocator(&bufHdr->tag, &rel->rd_locator)) > { > if (EvictUnpinnedBuffer(buf, buf_state, &flushed)) > buffers_evicted++; > if (flushed) > buffers_flushed++; > } If you have previously acquired the exclusive access lock to the relation, you may want to consider replacing the order of the BufTagMatchesRelFileLocator() and LockBufHdr() calls for improved efficiency (as mentioned in the comment on the BufTagMatchesRelFileLocator() call in the DropRelationBuffers() function in bufmgr.c). And it maybe useful also to add BUT_STATE_GET_REFCOUNT == 0 precheck before calling EvictUnpinnedBuffer()? regards, Aidar Imamov
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Joseph Koshakow
Date:
Hi I am working with Aidar to give a review and I am also a beginner
reviewer.
> From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001
> From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> Date: Fri, 20 Dec 2024 14:06:47 +0300
> Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing
> */
> bool
> -EvictUnpinnedBuffer(Buffer buf)
> +EvictUnpinnedBuffer(Buffer buf, bool *flushed)
I think you should update the comment above this function to include
details on the new `flushed` argument.
> diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml
> index 802a5112d77..83950ca5cce 100644
> --- a/doc/src/sgml/pgbuffercache.sgml
> +++ b/doc/src/sgml/pgbuffercache.sgml
> @@ -31,8 +31,10 @@
> This module provides the <function>pg_buffercache_pages()</function>
> function (wrapped in the <structname>pg_buffercache</structname> view),
> the <function>pg_buffercache_summary()</function> function, the
> - <function>pg_buffercache_usage_counts()</function> function and
> - the <function>pg_buffercache_evict()</function> function.
> + <function>pg_buffercache_usage_counts()</function> function, the
> + <function>pg_buffercache_evict()</function>, the
> + <function>pg_buffercache_evict_relation()</function>, and the
> + <function>pg_buffercache_evict_all()</function> function.
All the other functions have indexterm tags above this, should we add
indexterms for the new functions?
> + <sect2 id="pgbuffercache-pg-buffercache-evict-relation">
> + <title>The <structname>pg_buffercache_evict_relation</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_relation()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is that
> + <function>pg_buffercache_evict_relation()</function> takes a relation
> + identifier instead of buffer identifier. Then, it tries to evict all
> + buffers in that relation. The function is intended for developer testing only.
> + </para>
> + </sect2>
> +
> + <sect2 id="pgbuffercache-pg-buffercache-evict-all">
> + <title>The <structname>pg_buffercache_evict_all</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_all()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is,
> + the <function>pg_buffercache_evict_all()</function> does not take argument;
> + instead it tries to evict all buffers in the buffer pool. The function is
> + intended for developer testing only.
> + </para>
> + </sect2>
The other difference is that these new functions have an additional
output argument to indicate if the buffer was flushed which we probably
want to mention here.
Also I think it's more gramatically correct to say "the <fn-name>
function" or just "<fn-name>". A couple of times you say "<fn-name>
function" or "the <fn-name>".
Also I think the third to last sentence should end with "... does not
take **an** argument" or "... does not take argument**s**".
> +CREATE FUNCTION pg_buffercache_evict_relation(
> + IN regclass,
> + IN fork text default 'main',
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_relation'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
> +
> +CREATE FUNCTION pg_buffercache_evict_all(
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_all'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
Does it make sense to also update pg_buffercache_evict to also return
a bool if the buffer was flushed? Or is that too much of a breaking
change?
> + /* Lock the header and check if it's valid. */
> + buf_state = LockBufHdr(desc);
> + if ((buf_state & BM_VALID) == 0)
> + {
> + UnlockBufHdr(desc, buf_state);
> + return false;
> + }
EvictUnpinnedBuffer first checks if the buffer is locked before
attempting to lock it. Do we need a similar check here?
/* Lock the header if it is not already locked. */
if (!buf_state)
buf_state = LockBufHdr(desc);
>> I don't think *flushed is necessarily accurate this way, as it might detect
>> that it doesn't need to flush the data (via StartBufferIO() returning false).
>
> You are right. It seems there is no way to get that information
> without editing the FlushBuffer(), right? And I think editing
> FlushBuffer() is not worth it. So, I can either remove it or explain
> in the docs that #buffers_flushed may not be completely accurate.
There's already a lot of caveats on EvictUnpinnedBuffer that it might
have unexpected results when run concurrently, so I think it's fine to
add one more.
reviewer.
> From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001
> From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> Date: Fri, 20 Dec 2024 14:06:47 +0300
> Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing
> */
> bool
> -EvictUnpinnedBuffer(Buffer buf)
> +EvictUnpinnedBuffer(Buffer buf, bool *flushed)
I think you should update the comment above this function to include
details on the new `flushed` argument.
> diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml
> index 802a5112d77..83950ca5cce 100644
> --- a/doc/src/sgml/pgbuffercache.sgml
> +++ b/doc/src/sgml/pgbuffercache.sgml
> @@ -31,8 +31,10 @@
> This module provides the <function>pg_buffercache_pages()</function>
> function (wrapped in the <structname>pg_buffercache</structname> view),
> the <function>pg_buffercache_summary()</function> function, the
> - <function>pg_buffercache_usage_counts()</function> function and
> - the <function>pg_buffercache_evict()</function> function.
> + <function>pg_buffercache_usage_counts()</function> function, the
> + <function>pg_buffercache_evict()</function>, the
> + <function>pg_buffercache_evict_relation()</function>, and the
> + <function>pg_buffercache_evict_all()</function> function.
All the other functions have indexterm tags above this, should we add
indexterms for the new functions?
> + <sect2 id="pgbuffercache-pg-buffercache-evict-relation">
> + <title>The <structname>pg_buffercache_evict_relation</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_relation()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is that
> + <function>pg_buffercache_evict_relation()</function> takes a relation
> + identifier instead of buffer identifier. Then, it tries to evict all
> + buffers in that relation. The function is intended for developer testing only.
> + </para>
> + </sect2>
> +
> + <sect2 id="pgbuffercache-pg-buffercache-evict-all">
> + <title>The <structname>pg_buffercache_evict_all</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_all()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is,
> + the <function>pg_buffercache_evict_all()</function> does not take argument;
> + instead it tries to evict all buffers in the buffer pool. The function is
> + intended for developer testing only.
> + </para>
> + </sect2>
The other difference is that these new functions have an additional
output argument to indicate if the buffer was flushed which we probably
want to mention here.
Also I think it's more gramatically correct to say "the <fn-name>
function" or just "<fn-name>". A couple of times you say "<fn-name>
function" or "the <fn-name>".
Also I think the third to last sentence should end with "... does not
take **an** argument" or "... does not take argument**s**".
> +CREATE FUNCTION pg_buffercache_evict_relation(
> + IN regclass,
> + IN fork text default 'main',
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_relation'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
> +
> +CREATE FUNCTION pg_buffercache_evict_all(
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_all'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
Does it make sense to also update pg_buffercache_evict to also return
a bool if the buffer was flushed? Or is that too much of a breaking
change?
> + /* Lock the header and check if it's valid. */
> + buf_state = LockBufHdr(desc);
> + if ((buf_state & BM_VALID) == 0)
> + {
> + UnlockBufHdr(desc, buf_state);
> + return false;
> + }
EvictUnpinnedBuffer first checks if the buffer is locked before
attempting to lock it. Do we need a similar check here?
/* Lock the header if it is not already locked. */
if (!buf_state)
buf_state = LockBufHdr(desc);
>> I don't think *flushed is necessarily accurate this way, as it might detect
>> that it doesn't need to flush the data (via StartBufferIO() returning false).
>
> You are right. It seems there is no way to get that information
> without editing the FlushBuffer(), right? And I think editing
> FlushBuffer() is not worth it. So, I can either remove it or explain
> in the docs that #buffers_flushed may not be completely accurate.
There's already a lot of caveats on EvictUnpinnedBuffer that it might
have unexpected results when run concurrently, so I think it's fine to
add one more.
Thanks,
Joe Koshakow
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Wed, 19 Mar 2025 at 01:02, Aidar Imamov <a.imamov@postgrespro.ru> wrote: > > Hi! > > This is my first time trying out as a patch reviewer, and I don't claim > to be > an expert. I am very interested in the topic of buffer cache and would > like to > learn more about it. Thank you so much for the review! > pg_buffercache_evict_all(): > > for (int buf = 1; buf < NBuffers; buf++) > Mb it would be more correct to use <= NBuffers? Yes, done. > I also did not fully understand what A. Freund was referring to. > However, we > cannot avoid blocking the buffer header, as we need to ensure that the > buffer > is not being pinned by anyone else. Perhaps it would be beneficial to > call > LockBufHdr() outside and check if BUT_STATE_GET_REFCOUNT == 0 before > calling > EvictUnpinnedBuffer(). This would help to prevent unnecessary calls to > EvictUnpinnedBuffer() itself, ResourceOwnerEnlarge() and > ReservePrivateRefCountEntry(). I had an off-list talk with Andres and learned that ResourceOwnerEnlarge() and ReservePrivateRefCountEntry() need to be called before acquiring the buffer header lock. I introduced the EvictUnpinnedBuffersFromSharedRelation() function for that [1]. > pg_buffercache_mark_dirty_all(): > > for (int buf = 1; buf < NBuffers; buf++) > Mb it would be more correct to use <= NBuffers? Done. > pg_buffercache_evict_relation(): > > errmsg("must be superuser to use pg_buffercache_evict function") > '_relation' postfix got lost here Done. > > /* Open relation. */ > > rel = relation_open(relOid, AccessShareLock); > If I understand correctly, this function is meant to replicate the > behavior of > VACUUM FULL, but only for dirty relation pages. In this case, wouldn't > it be > necessary to obtain an exclusive access lock? I think VACUUM FULL does more things but I agree with you, obtaining an exclusive access lock is the correct way. > > for (int buf = 1; buf < NBuffers; buf++) > Mb it would be more correct to use <= NBuffers? Done. > > > /* > > * No need to call UnlockBufHdr() if BufTagMatchesRelFileLocator() > > * returns true, EvictUnpinnedBuffer() will take care of it. > > */ > > buf_state = LockBufHdr(bufHdr); > > if (BufTagMatchesRelFileLocator(&bufHdr->tag, &rel->rd_locator)) > > { > > if (EvictUnpinnedBuffer(buf, buf_state, &flushed)) > > buffers_evicted++; > > if (flushed) > > buffers_flushed++; > > } > > If you have previously acquired the exclusive access lock to the > relation, > you may want to consider replacing the order of the > BufTagMatchesRelFileLocator() > and LockBufHdr() calls for improved efficiency (as mentioned in the > comment on > the BufTagMatchesRelFileLocator() call in the DropRelationBuffers() > function in > bufmgr.c). I think you are right, we are basically copying FlushRelationBuffers() to the contrib/pg_buffer_cache. Done. > And it maybe useful also to add BUT_STATE_GET_REFCOUNT == 0 precheck > before calling > EvictUnpinnedBuffer()? I think we do not need to do this. I see EvictUnpinnedBuffer() as a central place that checks all conditions. If we add this pre-check then we should not do the same check in the EvictUnpinnedBuffer(), creating this logic would add unnecessary complexity to EvictUnpinnedBuffer(). I addressed these reviews in v3. I will share the patches in my next reply. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Fri, 21 Mar 2025 at 01:25, Joseph Koshakow <koshy44@gmail.com> wrote: > > Hi I am working with Aidar to give a review and I am also a beginner > reviewer. Thank you so much for the review! > > From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> > > Date: Fri, 20 Dec 2024 14:06:47 +0300 > > Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing > > */ > > bool > > -EvictUnpinnedBuffer(Buffer buf) > > +EvictUnpinnedBuffer(Buffer buf, bool *flushed) > > I think you should update the comment above this function to include > details on the new `flushed` argument. Yes, done. > > diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml > > index 802a5112d77..83950ca5cce 100644 > > --- a/doc/src/sgml/pgbuffercache.sgml > > +++ b/doc/src/sgml/pgbuffercache.sgml > > @@ -31,8 +31,10 @@ > > This module provides the <function>pg_buffercache_pages()</function> > > function (wrapped in the <structname>pg_buffercache</structname> view), > > the <function>pg_buffercache_summary()</function> function, the > > - <function>pg_buffercache_usage_counts()</function> function and > > - the <function>pg_buffercache_evict()</function> function. > > + <function>pg_buffercache_usage_counts()</function> function, the > > + <function>pg_buffercache_evict()</function>, the > > + <function>pg_buffercache_evict_relation()</function>, and the > > + <function>pg_buffercache_evict_all()</function> function. > > All the other functions have indexterm tags above this, should we add > indexterms for the new functions? I think so, done. > > + <sect2 id="pgbuffercache-pg-buffercache-evict-relation"> > > + <title>The <structname>pg_buffercache_evict_relation</structname> Function</title> > > + <para> > > + The <function>pg_buffercache_evict_relation()</function> function is very similar > > + to <function>pg_buffercache_evict()</function> function. The difference is that > > + <function>pg_buffercache_evict_relation()</function> takes a relation > > + identifier instead of buffer identifier. Then, it tries to evict all > > + buffers in that relation. The function is intended for developer testing only. > > + </para> > > + </sect2> > > + > > + <sect2 id="pgbuffercache-pg-buffercache-evict-all"> > > + <title>The <structname>pg_buffercache_evict_all</structname> Function</title> > > + <para> > > + The <function>pg_buffercache_evict_all()</function> function is very similar > > + to <function>pg_buffercache_evict()</function> function. The difference is, > > + the <function>pg_buffercache_evict_all()</function> does not take argument; > > + instead it tries to evict all buffers in the buffer pool. The function is > > + intended for developer testing only. > > + </para> > > + </sect2> > > The other difference is that these new functions have an additional > output argument to indicate if the buffer was flushed which we probably > want to mention here. You are right, done. > Also I think it's more gramatically correct to say "the <fn-name> > function" or just "<fn-name>". A couple of times you say "<fn-name> > function" or "the <fn-name>". I choose "the <fn-name> function", done. > Also I think the third to last sentence should end with "... does not > take **an** argument" or "... does not take argument**s**". I agree, done. > > +CREATE FUNCTION pg_buffercache_evict_relation( > > + IN regclass, > > + IN fork text default 'main', > > + OUT buffers_evicted int4, > > + OUT buffers_flushed int4) > > +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_relation' > > +LANGUAGE C PARALLEL SAFE VOLATILE; > > + > > +CREATE FUNCTION pg_buffercache_evict_all( > > + OUT buffers_evicted int4, > > + OUT buffers_flushed int4) > > +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_all' > > +LANGUAGE C PARALLEL SAFE VOLATILE; > > Does it make sense to also update pg_buffercache_evict to also return > a bool if the buffer was flushed? Or is that too much of a breaking > change? I think it makes sense but I did that change in another patch (0002) as this may need more discussion. > > + /* Lock the header and check if it's valid. */ > > + buf_state = LockBufHdr(desc); > > + if ((buf_state & BM_VALID) == 0) > > + { > > + UnlockBufHdr(desc, buf_state); > > + return false; > > + } > > EvictUnpinnedBuffer first checks if the buffer is locked before > attempting to lock it. Do we need a similar check here? I do not think so, for now we do not call MarkUnpinnedBufferDirty() while the buffer header is locked. > /* Lock the header if it is not already locked. */ > if (!buf_state) > buf_state = LockBufHdr(desc); > > >> I don't think *flushed is necessarily accurate this way, as it might detect > >> that it doesn't need to flush the data (via StartBufferIO() returning false). > > > > You are right. It seems there is no way to get that information > > without editing the FlushBuffer(), right? And I think editing > > FlushBuffer() is not worth it. So, I can either remove it or explain > > in the docs that #buffers_flushed may not be completely accurate. > > There's already a lot of caveats on EvictUnpinnedBuffer that it might > have unexpected results when run concurrently, so I think it's fine to > add one more. I agree. v3 is attached, I addressed both you and Aidar's reviews in the v3. -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, It seems that Aidar's email ended up as another thread [1]. I am copy-pasting mail and answer here to keep the discussion in this thread. On Sun, 23 Mar 2025 at 22:16, Aidar Imamov <a.imamov@postgrespro.ru> wrote: > > I agree with most of what Joseph said. However, I would like to add some > comments. > > At the moment, the "flushed" flag essentially indicates whether the > buffer > was dirty at the time of eviction and it does not guarantee that it has > been > written to disk. Therefore, it would be better to rename the > buffers_flushed > column in the output of pg_buffer_cache_evict_all() and > pg_buffercache_evict_relation() functions to dirty_buffers mb? This > would > allow us to avoid the confusion that arises from the fact that not all > dirty > buffers could have actually been written to disk. In addition, this > would > remove the "flushed" parameter from the EvictUnpinnedBuffer() function. > Because if we explicitly call LockBufHdr() outside of > EvictUnpinnedBuffer(), > we can already know in advance whether the buffer is dirty or not. > > The same applies to the suggestion to retrieve "flushed" count from the > pg_buffercache_evict() call. We cannot say this for certain, but we can > determine whether the buffer was dirty. I think flushed means 'passing the buffer contents to the kernel' in the Postgres context (as it is explained in the FlushBuffer()). We know that flush has happened, we just do not know if the buffer is flushed by us or someone else. [1] https://postgr.es/m/flat/76a550315baef9d7424b70144f1c6a2d%40postgrespro.ru -- Regards, Nazir Bilal Yavuz Microsoft
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Aidar Imamov
Date:
Hi! I've been looking at the patches v3 and there are a few things I want to talk about. EvictUnpinnedBuffer(): I think we should change the paragraph with "flushed" of the comment to something more like this: "If the buffer was dirty at the time of the call it will be flushed by calling FlushBuffer() and if 'flushed' is not NULL '*flushed' will be set to true." I also think it'd be a good idea to add null-checks for "flushed" before dereferencing it: > *flushed = false; > *flushed = true; If the (!buf_state) clause is entered then we assume that the header is not locked. Maybe edit the comment: "Lock the header if it is not already locked." -> "Lock the header"? > if (!buf_state) > { > /* Make sure we can pin the buffer. */ > ResourceOwnerEnlarge(CurrentResourceOwner); > ReservePrivateRefCountEntry(); > > /* Lock the header if it is not already locked. */ > buf_state = LockBufHdr(desc); > } EvictUnpinnedBuffersFromSharedRelation(): Maybe it will be more accurate to name the function as EvictRelUnpinnedBuffers()? I think the comment will seem more correct in the following manner: "Try to evict all the shared buffers containing provided relation's pages. This function is intended for testing/development use only! Before calling this function, it is important to acquire AccessExclusiveLock on the specified relation to avoid replacing the current block of this relation with another one during execution. If not null, buffers_evicted and buffers_flushed are set to the total number of buffers evicted and flushed respectively." I also think it'd be a good idea to add null-checks for "buffers_evicted" and "buffers_flushed" before dereferencing them: > *buffers_evicted = *buffers_flushed = 0; I think we don't need to check this clause again if AccessExclusiveLock was acquired before function call. Don't we? > if (BufTagMatchesRelFileLocator(&bufHdr->tag, &rel->rd_locator)) > { > if (EvictUnpinnedBuffer(buf, buf_state, &flushed)) > (*buffers_evicted)++; > if (flushed) > (*buffers_flushed)++; > } MarkUnpinnedBufferDirty(): I think the comment will seem more correct in the following manner: "Try to mark provided shared buffer as dirty. This function is intended for testing/development use only! Same as EvictUnpinnedBuffer() but with MarkBufferDirty() call inside. Returns true if the buffer was already dirty or it has successfully been marked as dirty." And also I think the function should return true at the case when the buffer was already dirty. What do you think? pg_buffercache_evict_relation(): "pg_buffercache_evict_relation function is intended to" - 'be' is missed here. pg_buffercache_mark_dirty(): Maybe edit the comment to: "Try to mark a shared buffer as dirty."? Maybe edit the elog text to: "bad shared buffer ID" - just to clarify the case when provided buffer number is negative (local buffer number). pg_buffercache_mark_dirty_all(): Maybe also edit the comment to: "Try to mark all the shared buffers as dirty."? bufmgr.h: I think it might be a good idea to follow the Postgres formatting style and move the function's arguments to the next line if they exceed 80 characters. regards, Aidar Imamov
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Fri, 28 Mar 2025 at 01:53, Aidar Imamov <a.imamov@postgrespro.ru> wrote: > > Hi! > > I've been looking at the patches v3 and there are a few things I want to > talk > about. Thank you for looking into this! > EvictUnpinnedBuffer(): > I think we should change the paragraph with "flushed" of the comment to > something more like this: "If the buffer was dirty at the time of the > call it > will be flushed by calling FlushBuffer() and if 'flushed' is not NULL > '*flushed' > will be set to true." This is correct if the buffer header lock is acquired before calling the EvictUnpinnedBuffer(). Otherwise, the buffer might be flushed after calling the EvictUnpinnedBuffer() and before acquiring the buffer header lock. I slightly edited this comment and also added a comment for the buf_state variable: * buf_state is used to understand if the buffer header lock is acquired * before calling this function. If it has non-zero value, it is assumed that * buffer header lock is acquired before calling this function. This is * helpful for evicting buffers in the relation as the buffer header lock * needs to be taken before calling this function in this case. * * *flushed is set to true if the buffer was dirty and has been flushed. * However, this does not necessarily mean that we flushed the buffer, it * could have been flushed by someone else. > I also think it'd be a good idea to add null-checks for "flushed" before > dereferencing it: > > *flushed = false; > > *flushed = true; Assert check is added. > If the (!buf_state) clause is entered then we assume that the header is > not locked. > Maybe edit the comment: "Lock the header if it is not already locked." > -> "Lock the header"? > > if (!buf_state) > > { > > /* Make sure we can pin the buffer. */ > > ResourceOwnerEnlarge(CurrentResourceOwner); > > ReservePrivateRefCountEntry(); > > > > /* Lock the header if it is not already locked. */ > > buf_state = LockBufHdr(desc); > > } I think this is better, done. > EvictUnpinnedBuffersFromSharedRelation(): > Maybe it will be more accurate to name the function as > EvictRelUnpinnedBuffers()? I liked that, done. > I think the comment will seem more correct in the following manner: > "Try to evict all the shared buffers containing provided relation's > pages. > > This function is intended for testing/development use only! > > Before calling this function, it is important to acquire > AccessExclusiveLock on > the specified relation to avoid replacing the current block of this > relation with > another one during execution. > > If not null, buffers_evicted and buffers_flushed are set to the total > number of > buffers evicted and flushed respectively." I added all comments except the not null part, I also preserved the explanation of why we need this function. > I also think it'd be a good idea to add null-checks for > "buffers_evicted" and > "buffers_flushed" before dereferencing them: > > *buffers_evicted = *buffers_flushed = 0; Assert check is added. > I think we don't need to check this clause again if AccessExclusiveLock > was acquired > before function call. Don't we? > > if (BufTagMatchesRelFileLocator(&bufHdr->tag, &rel->rd_locator)) > > { > > if (EvictUnpinnedBuffer(buf, buf_state, &flushed)) > > (*buffers_evicted)++; > > if (flushed) > > (*buffers_flushed)++; > > } I think we need it. Copy-pasting a comment from the DropRelationBuffers(): * We can make this a tad faster by prechecking the buffer tag before * we attempt to lock the buffer; this saves a lot of lock * acquisitions in typical cases. It should be safe because the * caller must have AccessExclusiveLock on the relation, or some other * reason to be certain that no one is loading new pages of the rel * into the buffer pool. (Otherwise we might well miss such pages * entirely.) Therefore, while the tag might be changing while we * look at it, it can't be changing *to* a value we care about, only * *away* from such a value. So false negatives are impossible, and * false positives are safe because we'll recheck after getting the * buffer lock. > MarkUnpinnedBufferDirty(): > I think the comment will seem more correct in the following manner: > "Try to mark provided shared buffer as dirty. > > This function is intended for testing/development use only! > > Same as EvictUnpinnedBuffer() but with MarkBufferDirty() call inside. > > Returns true if the buffer was already dirty or it has successfully been > marked as > dirty." Done. > And also I think the function should return true at the case when the > buffer was > already dirty. What do you think? I asked the same question in the first email and I still could not decide but I will be applying this as it was not decided before. > pg_buffercache_evict_relation(): > "pg_buffercache_evict_relation function is intended to" - 'be' is missed > here. Done. > pg_buffercache_mark_dirty(): > Maybe edit the comment to: "Try to mark a shared buffer as dirty."? Done. > Maybe edit the elog text to: "bad shared buffer ID" - just to clarify > the case when > provided buffer number is negative (local buffer number). Although I think your suggestion makes sense, I did not change this since it is already implemented like that in the pg_buffercache_evict(). > pg_buffercache_mark_dirty_all(): > Maybe also edit the comment to: "Try to mark all the shared buffers as > dirty."? Done. > bufmgr.h: > I think it might be a good idea to follow the Postgres formatting style > and move the > function's arguments to the next line if they exceed 80 characters. I thought that pgindent already does this, done. v4 is attached. -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Aidar Imamov
Date:
Hi! I've reviewed the latest version of the patches and found a few things worth discussing. This is probably my final feedback on the patches at this point. Maybe Joseph has something to add. After this discussion, I think it would be helpful if one of the more experienced hackers could take a look at the overall picture (perhaps we should set the status "Ready for Committer"? To be honest, I'm not sure what step that status should be set at). pg_buffercache--1.5--1.6.sql: It seems that there is no need for the OR REPLACE clause after the pg_buffercache_evict() function has been dropped. Maybe we should remove the RETURNS clause from function declarations that have OUT parameters? Doc: "When there are OUT or INOUT parameters, the RETURNS clause can be omitted" pg_buffercache_evict_relation(): The function is marked as STRICT so I think we do not need for redundant check: > if (PG_ARGISNULL(0)) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > errmsg("relation cannot be null"))); Doc: "returns null whenever any of its arguments are null", "the function is not executed when there are null arguments;". EvictUnpinnedBuffer(): Maybe edit this comment a little (just not to repeat the sentences): > buf_state is used to check if the buffer header lock has been acquired > before > calling this function. If buf_state has a non-zero value, it means that > the buffer > header has already been locked. This information is useful for evicting > specific > relation's buffers, as the buffers headers need to be locked before > this function > can be called with such a intention. EvictUnpinnedBuffer() & EvictRelUnpinnedBuffers(): Why did you decide to use the assert to check for NULLs? I understand that currently, the use of these functions is limited to a specific set of calls through the pg_buffercache interface. However, this may not always be the case. Wouldn't it be better to allow users to choose whether or not they want to receive additional information? For example, you could simply ignore any arguments that are passed as NULL. Additionally, I believe it would be beneficial to include some regression tests to check at least the following cases: return type, being a superuser, bad buffer id, local buffers case. Also, there's a little thing about declaring functions as PARALLEL SAFE. To be honest, I don't really have any solid arguments against it. I just have some doubts. For example, how will it work if the plan is split up and we try to work on an object in one part, while the other part of the plan evicts the pages of that object or marks them as dirty... I can't really say for sure about that. And in that context, I'd suggest referring to that awesome statement in the documentation: "If in doubt, functions should be labeled as UNSAFE, which is the default." regards, Aidar Imamov
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Mon, 31 Mar 2025 at 16:37, Aidar Imamov <a.imamov@postgrespro.ru> wrote: > > Hi! > > I've reviewed the latest version of the patches and found a few things > worth > discussing. This is probably my final feedback on the patches at this > point. > Maybe Joseph has something to add. Thank you so much for the reviews, these were very helpful! > After this discussion, I think it would be helpful if one of the more > experienced > hackers could take a look at the overall picture (perhaps we should set > the > status "Ready for Committer"? To be honest, I'm not sure what step that > status > should be set at). I agree. I will mark this as 'ready for committer' after writing the tests. > pg_buffercache--1.5--1.6.sql: > It seems that there is no need for the OR REPLACE clause after the > pg_buffercache_evict() function has been dropped. > > Maybe we should remove the RETURNS clause from function declarations > that have > OUT parameters? > Doc: "When there are OUT or INOUT parameters, the RETURNS clause can be > omitted" You are right, both are done. > pg_buffercache_evict_relation(): > The function is marked as STRICT so I think we do not need for redundant > check: > > if (PG_ARGISNULL(0)) > > ereport(ERROR, > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > errmsg("relation cannot be null"))); > Doc: "returns null whenever any of its arguments are null", "the > function is not > executed when there are null arguments;". Done. > EvictUnpinnedBuffer(): > Maybe edit this comment a little (just not to repeat the sentences): > > buf_state is used to check if the buffer header lock has been acquired > > before > > calling this function. If buf_state has a non-zero value, it means that > > the buffer > > header has already been locked. This information is useful for evicting > > specific > > relation's buffers, as the buffers headers need to be locked before > > this function > > can be called with such a intention. This is better, done. > EvictUnpinnedBuffer() & EvictRelUnpinnedBuffers(): > Why did you decide to use the assert to check for NULLs? > I understand that currently, the use of these functions is limited to a > specific set > of calls through the pg_buffercache interface. However, this may not > always be the > case. Wouldn't it be better to allow users to choose whether or not they > want to > receive additional information? For example, you could simply ignore any > arguments > that are passed as NULL. I do not have a strong opinion on this. I agree that these functions can be used somewhere else later but it is easier to ignore these on the return instead of handling NULLs in the functions. If we want to consider the NULL possibility in the EvictUnpinnedBuffer() & EvictRelUnpinnedBuffers(), then we need to write additional if clauses. I think this increases the complexity unnecessarily. > Additionally, I believe it would be beneficial to include some > regression tests to > check at least the following cases: return type, being a superuser, bad > buffer id, > local buffers case. You are right, I will try to write the tests but I am sharing the new version without tests to speed things up. > Also, there's a little thing about declaring functions as PARALLEL SAFE. > To be honest, > I don't really have any solid arguments against it. I just have some > doubts. For > example, how will it work if the plan is split up and we try to work on > an object in > one part, while the other part of the plan evicts the pages of that > object or marks > them as dirty... I can't really say for sure about that. And in that > context, I'd > suggest referring to that awesome statement in the documentation: "If in > doubt, > functions should be labeled as UNSAFE, which is the default." You may be right. I thought they are parallel safe as we acquire the buffer header lock for each buffer but now, I have the same doubts as you. I want to hear other people's opinions on that before labeling them as UNSAFE. v5 is attached. -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Andres Freund
Date:
Hi, On 2025-03-31 19:49:25 +0300, Nazir Bilal Yavuz wrote: > > After this discussion, I think it would be helpful if one of the more > > experienced > > hackers could take a look at the overall picture (perhaps we should set > > the > > status "Ready for Committer"? To be honest, I'm not sure what step that > > status > > should be set at). > > I agree. I will mark this as 'ready for committer' after writing the tests. Are you still working on tests? > > Also, there's a little thing about declaring functions as PARALLEL SAFE. > > To be honest, > > I don't really have any solid arguments against it. I just have some > > doubts. For > > example, how will it work if the plan is split up and we try to work on > > an object in > > one part, while the other part of the plan evicts the pages of that > > object or marks > > them as dirty... I can't really say for sure about that. And in that > > context, I'd > > suggest referring to that awesome statement in the documentation: "If in > > doubt, > > functions should be labeled as UNSAFE, which is the default." > > You may be right. I thought they are parallel safe as we acquire the > buffer header lock for each buffer but now, I have the same doubts as > you. I want to hear other people's opinions on that before labeling > them as UNSAFE. I don't see a problem with them being parallel safe. > From a59be4b50d7691ba952d0abd32198e972d4a6ea0 Mon Sep 17 00:00:00 2001 > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > Date: Mon, 24 Mar 2025 13:52:04 +0300 > Subject: [PATCH v5 1/3] Add pg_buffercache_evict_[relation | all]() functions > for testing > > pg_buffercache_evict_relation(): Evicts all shared buffers in a > relation at once. > pg_buffercache_evict_all(): Evicts all shared buffers at once. > > Both functions provide mechanism to evict multiple shared buffers at > once. They are designed to address the inefficiency of repeatedly calling > pg_buffercache_evict() for each individual buffer, which can be > time-consuming when dealing with large shared buffer pools. > (e.g., ~790ms vs. ~270ms for 16GB of shared buffers). > > These functions are intended for developer testing and debugging > purposes and are available to superusers only. > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + errmsg("must be superuser to use pg_buffercache_evict_relation function"))); I think it'd be nicer if we didn't ask translators to translate 3 different versions of this message, for different functions. Why not pass in the functio name as a parameter? > + if (RelationUsesLocalBuffers(rel)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("relation uses local buffers," > + "pg_buffercache_evict_relation function is intended to" > + "be used for shared buffers only"))); We try not to break messages across multiple lines, as that makes searching for them harder. > + EvictRelUnpinnedBuffers(rel, &buffers_evicted, &buffers_flushed); > + > + /* Close relation, release lock. */ This comment imo isn't useful, it just restates the code verbatim. > + relation_close(rel, AccessExclusiveLock); Hm. Why are we dropping the lock here early? It's probably ok, but it's not clear to me why we should do so. > + /* Build and return the tuple. */ Similar to above, the comment is just a restatement of a short piece of code. > + <para> > + The <function>pg_buffercache_evict_relation()</function> function allows all > + shared buffers in the relation to be evicted from the buffer pool given a > + relation identifier. Use of this function is restricted to superusers only. > + </para> I'd say "all unpinned shared buffers". I wonder if these functions ought to also return the number of buffers that could not be evicted? Should this, or the longer comment for the function, mention that buffers for all relation forks are evicted? > + <para> > + The <function>pg_buffercache_evict_all()</function> function allows all > + shared buffers to be evicted in the buffer pool. Use of this function is > + restricted to superusers only. > + </para> Basically the same comments as above, except for the fork portion. > diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c > index f9681d09e1e..5d82e3fa297 100644 > --- a/src/backend/storage/buffer/bufmgr.c > +++ b/src/backend/storage/buffer/bufmgr.c > @@ -6512,26 +6512,47 @@ ResOwnerPrintBufferPin(Datum res) > * even by the same block. This inherent raciness without other interlocking > * makes the function unsuitable for non-testing usage. > * > + * buf_state is used to check if the buffer header lock has been acquired > + * before calling this function. If buf_state has a non-zero value, it means > + * that the buffer header has already been locked. This information is useful > + * for evicting a specific relation's buffers, as the buffers' headers need to > + * be locked before this function can be called with such an intention. I don't like this aspect of the API changes one bit. IMO something callable from outside storage/buffer should have no business knowing about buf_state. And detecting whether already hold a lock by checking whether the buf_state is 0 feels really wrong to me. > +/* > + * Try to evict all the shared buffers containing provided relation's pages. > + * > + * This function is intended for testing/development use only! > + * > + * We need this helper function because of the following reason. > + * ReservePrivateRefCountEntry() needs to be called before acquiring the > + * buffer header lock but ReservePrivateRefCountEntry() is static and it would > + * be better to have it as static. Hence, it can't be called from outside of > + * this file. This helper function is created to bypass that problem. I think there are other reasons - we should minimize the amount of code outside of storage/buffer that needs to know about BuferDescs etc, it's a layering violation to access that outside. > + * Before calling this function, it is important to acquire > + * AccessExclusiveLock on the specified relation to avoid replacing the > + * current block of this relation with another one during execution. What do you mean with "current block of this relation"? > + * buffers_evicted and buffers_flushed are set the total number of buffers > + * evicted and flushed respectively. > + */ > +void > +EvictRelUnpinnedBuffers(Relation rel, int32 *buffers_evicted, int32 *buffers_flushed) > +{ > + bool flushed; > + > + Assert(buffers_evicted && buffers_flushed); > + *buffers_evicted = *buffers_flushed = 0; I'd personally not bother with the Assert, the next line would just crash anyway... > diff --git a/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql b/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql > index 2e255f3fc10..d54bb1fd6f8 100644 > --- a/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql > +++ b/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql > @@ -1,5 +1,13 @@ > \echo Use "ALTER EXTENSION pg_buffercache UPDATE TO '1.6'" to load this file. \quit > > +DROP FUNCTION pg_buffercache_evict(integer); > +CREATE FUNCTION pg_buffercache_evict( > + IN int, > + OUT evicted boolean, > + OUT flushed boolean) > +AS 'MODULE_PATHNAME', 'pg_buffercache_evict' > +LANGUAGE C PARALLEL SAFE VOLATILE STRICT; I assume the function has to be dropped because the return type changes? Greetings, Andres Freund
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, Thanks for the review! On Wed, 2 Apr 2025 at 20:06, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2025-03-31 19:49:25 +0300, Nazir Bilal Yavuz wrote: > > > After this discussion, I think it would be helpful if one of the more > > > experienced > > > hackers could take a look at the overall picture (perhaps we should set > > > the > > > status "Ready for Committer"? To be honest, I'm not sure what step that > > > status > > > should be set at). > > > > I agree. I will mark this as 'ready for committer' after writing the tests. > > Are you still working on tests? Sorry, I forgot. They are attached as 0004. > > > Also, there's a little thing about declaring functions as PARALLEL SAFE. > > > To be honest, > > > I don't really have any solid arguments against it. I just have some > > > doubts. For > > > example, how will it work if the plan is split up and we try to work on > > > an object in > > > one part, while the other part of the plan evicts the pages of that > > > object or marks > > > them as dirty... I can't really say for sure about that. And in that > > > context, I'd > > > suggest referring to that awesome statement in the documentation: "If in > > > doubt, > > > functions should be labeled as UNSAFE, which is the default." > > > > You may be right. I thought they are parallel safe as we acquire the > > buffer header lock for each buffer but now, I have the same doubts as > > you. I want to hear other people's opinions on that before labeling > > them as UNSAFE. > > I don't see a problem with them being parallel safe. > > > > From a59be4b50d7691ba952d0abd32198e972d4a6ea0 Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > > Date: Mon, 24 Mar 2025 13:52:04 +0300 > > Subject: [PATCH v5 1/3] Add pg_buffercache_evict_[relation | all]() functions > > for testing > > > > pg_buffercache_evict_relation(): Evicts all shared buffers in a > > relation at once. > > pg_buffercache_evict_all(): Evicts all shared buffers at once. > > > > Both functions provide mechanism to evict multiple shared buffers at > > once. They are designed to address the inefficiency of repeatedly calling > > pg_buffercache_evict() for each individual buffer, which can be > > time-consuming when dealing with large shared buffer pools. > > (e.g., ~790ms vs. ~270ms for 16GB of shared buffers). > > > > These functions are intended for developer testing and debugging > > purposes and are available to superusers only. > > > > + if (!superuser()) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser to use pg_buffercache_evict_relation function"))); > > I think it'd be nicer if we didn't ask translators to translate 3 different > versions of this message, for different functions. Why not pass in the functio > name as a parameter? Ah, I didn't think of this aspect, done. > > + if (RelationUsesLocalBuffers(rel)) > > + ereport(ERROR, > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > + errmsg("relation uses local buffers," > > + "pg_buffercache_evict_relation function is intended to" > > + "be used for shared buffers only"))); > > We try not to break messages across multiple lines, as that makes searching > for them harder. Done. > > + EvictRelUnpinnedBuffers(rel, &buffers_evicted, &buffers_flushed); > > + > > + /* Close relation, release lock. */ > > This comment imo isn't useful, it just restates the code verbatim. Removed. > > > + relation_close(rel, AccessExclusiveLock); > > Hm. Why are we dropping the lock here early? It's probably ok, but it's not > clear to me why we should do so. We are dropping the lock after we processed the relation. I didn't understand what could be the problem here. Why do you think it is early? > > + /* Build and return the tuple. */ > > Similar to above, the comment is just a restatement of a short piece of code. Done. It exists in other places in the pg_buffercache_pages.c file, I only removed the ones that I added. > > + <para> > > + The <function>pg_buffercache_evict_relation()</function> function allows all > > + shared buffers in the relation to be evicted from the buffer pool given a > > + relation identifier. Use of this function is restricted to superusers only. > > + </para> > > I'd say "all unpinned shared buffers". Done. > I wonder if these functions ought to also return the number of buffers that > could not be evicted? I didn't add this as I thought this information could be gathered if wanted. > Should this, or the longer comment for the function, mention that buffers for > all relation forks are evicted? I added this to a longer comment for the function. > > + <para> > > + The <function>pg_buffercache_evict_all()</function> function allows all > > + shared buffers to be evicted in the buffer pool. Use of this function is > > + restricted to superusers only. > > + </para> > > Basically the same comments as above, except for the fork portion. Done. > > diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c > > index f9681d09e1e..5d82e3fa297 100644 > > --- a/src/backend/storage/buffer/bufmgr.c > > +++ b/src/backend/storage/buffer/bufmgr.c > > @@ -6512,26 +6512,47 @@ ResOwnerPrintBufferPin(Datum res) > > * even by the same block. This inherent raciness without other interlocking > > * makes the function unsuitable for non-testing usage. > > * > > + * buf_state is used to check if the buffer header lock has been acquired > > + * before calling this function. If buf_state has a non-zero value, it means > > + * that the buffer header has already been locked. This information is useful > > + * for evicting a specific relation's buffers, as the buffers' headers need to > > + * be locked before this function can be called with such an intention. > > I don't like this aspect of the API changes one bit. IMO something callable > from outside storage/buffer should have no business knowing about buf_state. > > And detecting whether already hold a lock by checking whether the buf_state is > 0 feels really wrong to me. I changed this. I basically copied EvictUnpinnedBuffer() to the inside of EvictRelUnpinnedBuffers(), we don't need any hacky methods now. > > +/* > > + * Try to evict all the shared buffers containing provided relation's pages. > > + * > > + * This function is intended for testing/development use only! > > + * > > + * We need this helper function because of the following reason. > > + * ReservePrivateRefCountEntry() needs to be called before acquiring the > > + * buffer header lock but ReservePrivateRefCountEntry() is static and it would > > + * be better to have it as static. Hence, it can't be called from outside of > > + * this file. This helper function is created to bypass that problem. > > I think there are other reasons - we should minimize the amount of code > outside of storage/buffer that needs to know about BuferDescs etc, it's a > layering violation to access that outside. I added that as a comment. > > + * Before calling this function, it is important to acquire > > + * AccessExclusiveLock on the specified relation to avoid replacing the > > + * current block of this relation with another one during execution. > > What do you mean with "current block of this relation"? It is used timewise, like a snapshot of the blocks when the function is called. I updated this with: 'The caller must hold at least AccessShareLock on the relation to prevent the relation from being dropped.' [1] > > + * buffers_evicted and buffers_flushed are set the total number of buffers > > + * evicted and flushed respectively. > > + */ > > +void > > +EvictRelUnpinnedBuffers(Relation rel, int32 *buffers_evicted, int32 *buffers_flushed) > > +{ > > + bool flushed; > > + > > + Assert(buffers_evicted && buffers_flushed); > > + *buffers_evicted = *buffers_flushed = 0; > > I'd personally not bother with the Assert, the next line would just crash > anyway... AIO tests already use EvictUnpinnedBuffer() without flushed information. I made '*buffers_evicted, *buffers_flushed in the EvictRelUnpinnedBuffers() and *flushed in the EvictUnpinnedBuffer()' optional as Aidar suggested at upthread. > > diff --git a/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql b/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql > > index 2e255f3fc10..d54bb1fd6f8 100644 > > --- a/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql > > +++ b/contrib/pg_buffercache/pg_buffercache--1.5--1.6.sql > > @@ -1,5 +1,13 @@ > > \echo Use "ALTER EXTENSION pg_buffercache UPDATE TO '1.6'" to load this file. \quit > > > > +DROP FUNCTION pg_buffercache_evict(integer); > > +CREATE FUNCTION pg_buffercache_evict( > > + IN int, > > + OUT evicted boolean, > > + OUT flushed boolean) > > +AS 'MODULE_PATHNAME', 'pg_buffercache_evict' > > +LANGUAGE C PARALLEL SAFE VOLATILE STRICT; > > I assume the function has to be dropped because the return type changes? Correct. v6 is attached. Additional changes prior to v5: * Tests are added. * Andres said off-list that AccessExclusiveLock is too much, all the lock needs to do is to prevent the relation from being dropped. So, pg_buffercache_evict_relation() opens a relation with AccessShareLock now. Function comment in EvictRelUnpinnedBuffers() is updated regarding that. [1] * EvictRelUnpinnedBuffers() does not call EvictUnpinnedBuffer() now. I basically copied EvictUnpinnedBuffer() to the inside of EvictRelUnpinnedBuffers() with the needed updates. -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Andres Freund
Date:
Hi, On 2025-04-04 18:36:57 +0300, Nazir Bilal Yavuz wrote: > v6 is attached. Additional changes prior to v5: > > * Tests are added. I don't think the mark-dirty stuff is quite ready to commit. Given that, could you change the split of the patches so that the tests are for the evict patches alone? I don't really see a need to add the tests separately from the code introducing the relevant functionality. > * EvictRelUnpinnedBuffers() does not call EvictUnpinnedBuffer() now. I > basically copied EvictUnpinnedBuffer() to the inside of > EvictRelUnpinnedBuffers() with the needed updates. I think I'd instead introduce a EvictBufferInternal() that is used both by EvictUnpinnedBuffer() and EvictRelUnpinnedBuffers(), that is expected to be called with the buffer header lock held. > > > + relation_close(rel, AccessExclusiveLock); > > > > Hm. Why are we dropping the lock here early? It's probably ok, but it's not > > clear to me why we should do so. > > We are dropping the lock after we processed the relation. I didn't > understand what could be the problem here. Why do you think it is > early? Most commonly we close relations without releasing the lock, instead relying on the lock being released at the end of the transaction. > > I wonder if these functions ought to also return the number of buffers that > > could not be evicted? > > I didn't add this as I thought this information could be gathered if wanted. How? A separate query will also return buffers that were since newly read in. I think it'be better to just return all that information, given that we are already dropping the function. Dropping objects in extension upgrades can be painful for users, due to other objects potentially depending on the dropped objects. So we shouldn't do that unnecessarily often. > From 88624ea0f768ebf1a015cc0c06d2c72c822577a8 Mon Sep 17 00:00:00 2001 > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > Date: Fri, 4 Apr 2025 13:22:00 +0300 > Subject: [PATCH v6 1/4] Return buffer flushed information in > pg_buffercache_evict function > > pg_buffercache_evict() function shows buffer flushed information now. > This feature will be used in the following commits. > > Author: Nazir Bilal Yavuz <byavuz81@gmail.com> > Suggested-by: Joseph Koshakow <koshy44@gmail.com> > Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> > Reviewed-by: Andres Freund <andres@anarazel.de> > Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com > --- > src/include/storage/bufmgr.h | 2 +- > src/backend/storage/buffer/bufmgr.c | 11 +++++++++- > src/test/modules/test_aio/test_aio.c | 4 ++-- > doc/src/sgml/pgbuffercache.sgml | 15 ++++++++----- > contrib/pg_buffercache/Makefile | 3 ++- > contrib/pg_buffercache/meson.build | 1 + > .../pg_buffercache--1.5--1.6.sql | 9 ++++++++ I think I'd squash this with the following changes, as it seems unnecessary to increase the version by multiple steps in immediately successive commits. > +/* > + * Try to evict all the shared buffers containing provided relation's pages. > + * > + * This function is intended for testing/development use only! > + * > + * We need this helper function because of the following reason. > + * ReservePrivateRefCountEntry() needs to be called before acquiring the > + * buffer header lock but ReservePrivateRefCountEntry() is static and it would > + * be better to have it as static. Hence, it can't be called from outside of > + * this file. This helper function is created to bypass that problem. > > + * Also, we should minimize the amount of code outside of storage/buffer that > + * needs to know about BuferDescs etc., it is a layering violation to access > + * that outside. > + * FWIW, I think this explanation is kind of true, but the real reason is that from a layering perspective this simply is bufmgr.c's responsibility, not pg_buffercache's. I'd probably just drop these two paragraphs. > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + errmsg("must be superuser to use %s()", > + "pg_buffercache_evict_relation"))); FWIW, I'd upate the existing superuser check in the same file to use the same string. And perhaps I'd just put the check into a small helper function that is shared by pg_buffercache_evict() / pg_buffercache_evict() / pg_buffercache_evict_all(). > +/* > + * Try to evict all shared buffers. > + */ > +Datum > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > +{ > + Datum result; > + TupleDesc tupledesc; > + HeapTuple tuple; > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > + bool nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > + > + int32 buffers_evicted = 0; > + int32 buffers_flushed = 0; > + bool flushed; > + > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE) > + elog(ERROR, "return type must be a row type"); > + > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + errmsg("must be superuser to use %s()", > + "pg_buffercache_evict_all"))); > + > + for (int buf = 1; buf <= NBuffers; buf++) > + { > + if (EvictUnpinnedBuffer(buf, &flushed)) > + buffers_evicted++; > + if (flushed) > + buffers_flushed++; > + } FWIW, I'd put this loop in bufmgr.c, in a similar helper function as for the other cases. > --- a/contrib/pg_buffercache/sql/pg_buffercache.sql > +++ b/contrib/pg_buffercache/sql/pg_buffercache.sql > @@ -26,3 +26,54 @@ SET ROLE pg_monitor; > SELECT count(*) > 0 FROM pg_buffercache; > SELECT buffers_used + buffers_unused > 0 FROM pg_buffercache_summary(); > SELECT count(*) > 0 FROM pg_buffercache_usage_counts(); > +RESET role; > + > +------ > +---- Test pg_buffercache_evict* and pg_buffercache_mark_dirty* functions > +------ > + > +CREATE ROLE regress_buffercache_normal; > +SET ROLE regress_buffercache_normal; > + > +-- These should fail because these are STRICT functions > +SELECT * FROM pg_buffercache_evict(); > +SELECT * FROM pg_buffercache_evict_relation(); > +SELECT * FROM pg_buffercache_mark_dirty(); > +-- These should fail because they need to be called as SUPERUSER > +SELECT * FROM pg_buffercache_evict(1); > +SELECT * FROM pg_buffercache_evict_relation(1); > +SELECT * FROM pg_buffercache_evict_all(); > +SELECT * FROM pg_buffercache_mark_dirty(1); > +SELECT * FROM pg_buffercache_mark_dirty_all(); > + > +RESET ROLE; > +CREATE ROLE regress_buffercache_superuser SUPERUSER; > +SET ROLE regress_buffercache_superuser; Not sure what we gain by creating this role? Greetings, Andres Freund
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Mon, 7 Apr 2025 at 16:38, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2025-04-04 18:36:57 +0300, Nazir Bilal Yavuz wrote: > > v6 is attached. Additional changes prior to v5: > > > > * Tests are added. > > I don't think the mark-dirty stuff is quite ready to commit. Given that, could > you change the split of the patches so that the tests are for the evict > patches alone? I don't really see a need to add the tests separately from the > code introducing the relevant functionality. Done. > > * EvictRelUnpinnedBuffers() does not call EvictUnpinnedBuffer() now. I > > basically copied EvictUnpinnedBuffer() to the inside of > > EvictRelUnpinnedBuffers() with the needed updates. > > I think I'd instead introduce a EvictBufferInternal() that is used both by > EvictUnpinnedBuffer() and EvictRelUnpinnedBuffers(), that is expected to be > called with the buffer header lock held. Makes sense, done that way. I named it as 'EvictUnpinnedBufferInternal'. > > > > + relation_close(rel, AccessExclusiveLock); > > > > > > Hm. Why are we dropping the lock here early? It's probably ok, but it's not > > > clear to me why we should do so. > > > > We are dropping the lock after we processed the relation. I didn't > > understand what could be the problem here. Why do you think it is > > early? > > Most commonly we close relations without releasing the lock, instead relying > on the lock being released at the end of the transaction. I see. I was looking at pg_prewarm as an example and copied it from there. > > > I wonder if these functions ought to also return the number of buffers that > > > could not be evicted? > > > > I didn't add this as I thought this information could be gathered if wanted. > > How? A separate query will also return buffers that were since newly read in. Yes, correct. I missed that. > I think it'be better to just return all that information, given that we are > already dropping the function. Dropping objects in extension upgrades can be > painful for users, due to other objects potentially depending on the dropped > objects. So we shouldn't do that unnecessarily often. Done that way. pg_buffercache_evict_relation() and pg_buffercache_evict_all() return the total number of buffers processed. > > From 88624ea0f768ebf1a015cc0c06d2c72c822577a8 Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > > Date: Fri, 4 Apr 2025 13:22:00 +0300 > > Subject: [PATCH v6 1/4] Return buffer flushed information in > > pg_buffercache_evict function > > > > pg_buffercache_evict() function shows buffer flushed information now. > > This feature will be used in the following commits. > > > > Author: Nazir Bilal Yavuz <byavuz81@gmail.com> > > Suggested-by: Joseph Koshakow <koshy44@gmail.com> > > Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> > > Reviewed-by: Andres Freund <andres@anarazel.de> > > Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com > > --- > > src/include/storage/bufmgr.h | 2 +- > > src/backend/storage/buffer/bufmgr.c | 11 +++++++++- > > src/test/modules/test_aio/test_aio.c | 4 ++-- > > doc/src/sgml/pgbuffercache.sgml | 15 ++++++++----- > > contrib/pg_buffercache/Makefile | 3 ++- > > contrib/pg_buffercache/meson.build | 1 + > > .../pg_buffercache--1.5--1.6.sql | 9 ++++++++ > > I think I'd squash this with the following changes, as it seems unnecessary to > increase the version by multiple steps in immediately successive commits. Done. > > +/* > > + * Try to evict all the shared buffers containing provided relation's pages. > > + * > > + * This function is intended for testing/development use only! > > + * > > + * We need this helper function because of the following reason. > > + * ReservePrivateRefCountEntry() needs to be called before acquiring the > > + * buffer header lock but ReservePrivateRefCountEntry() is static and it would > > + * be better to have it as static. Hence, it can't be called from outside of > > + * this file. This helper function is created to bypass that problem. > > > > + * Also, we should minimize the amount of code outside of storage/buffer that > > + * needs to know about BuferDescs etc., it is a layering violation to access > > + * that outside. > > + * > > FWIW, I think this explanation is kind of true, but the real reason is that > from a layering perspective this simply is bufmgr.c's responsibility, not > pg_buffercache's. I'd probably just drop these two paragraphs. Done. > > + if (!superuser()) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser to use %s()", > > + "pg_buffercache_evict_relation"))); > > FWIW, I'd upate the existing superuser check in the same file to use the same > string. And perhaps I'd just put the check into a small helper function that > is shared by pg_buffercache_evict() / pg_buffercache_evict() / > pg_buffercache_evict_all(). Done. I added the pg_buffercache_superuser_check() function. > > +/* > > + * Try to evict all shared buffers. > > + */ > > +Datum > > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > > +{ > > + Datum result; > > + TupleDesc tupledesc; > > + HeapTuple tuple; > > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > > + bool nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > > + > > + int32 buffers_evicted = 0; > > + int32 buffers_flushed = 0; > > + bool flushed; > > + > > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE) > > + elog(ERROR, "return type must be a row type"); > > + > > + if (!superuser()) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser to use %s()", > > + "pg_buffercache_evict_all"))); > > + > > + for (int buf = 1; buf <= NBuffers; buf++) > > + { > > + if (EvictUnpinnedBuffer(buf, &flushed)) > > + buffers_evicted++; > > + if (flushed) > > + buffers_flushed++; > > + } > > FWIW, I'd put this loop in bufmgr.c, in a similar helper function as for the > other cases. Done. I moved this loop to the EvictAllUnpinnedBuffers() function in bufmgr.c. > > --- a/contrib/pg_buffercache/sql/pg_buffercache.sql > > +++ b/contrib/pg_buffercache/sql/pg_buffercache.sql > > @@ -26,3 +26,54 @@ SET ROLE pg_monitor; > > SELECT count(*) > 0 FROM pg_buffercache; > > SELECT buffers_used + buffers_unused > 0 FROM pg_buffercache_summary(); > > SELECT count(*) > 0 FROM pg_buffercache_usage_counts(); > > +RESET role; > > + > > +------ > > +---- Test pg_buffercache_evict* and pg_buffercache_mark_dirty* functions > > +------ > > + > > +CREATE ROLE regress_buffercache_normal; > > +SET ROLE regress_buffercache_normal; > > + > > +-- These should fail because these are STRICT functions > > +SELECT * FROM pg_buffercache_evict(); > > +SELECT * FROM pg_buffercache_evict_relation(); > > +SELECT * FROM pg_buffercache_mark_dirty(); > > > +-- These should fail because they need to be called as SUPERUSER > > +SELECT * FROM pg_buffercache_evict(1); > > +SELECT * FROM pg_buffercache_evict_relation(1); > > +SELECT * FROM pg_buffercache_evict_all(); > > +SELECT * FROM pg_buffercache_mark_dirty(1); > > +SELECT * FROM pg_buffercache_mark_dirty_all(); > > + > > +RESET ROLE; > > +CREATE ROLE regress_buffercache_superuser SUPERUSER; > > +SET ROLE regress_buffercache_superuser; > > Not sure what we gain by creating this role? I missed that the role used for running the tests is already a superuser. Done. v7 is attached. I only attached pg_buffercache_evict* patch to run it on cfbot. -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Andres Freund
Date:
Hi, On 2025-04-07 19:37:50 +0300, Nazir Bilal Yavuz wrote: > > > > > + relation_close(rel, AccessExclusiveLock); > > > > > > > > Hm. Why are we dropping the lock here early? It's probably ok, but it's not > > > > clear to me why we should do so. > > > > > > We are dropping the lock after we processed the relation. I didn't > > > understand what could be the problem here. Why do you think it is > > > early? > > > > Most commonly we close relations without releasing the lock, instead relying > > on the lock being released at the end of the transaction. > > I see. I was looking at pg_prewarm as an example and copied it from there. I don't think we're particularly consistent about it. And I think there's some differing views about what precisely the right behaviour is... I've tried to polish the patch. Changes I made: - The number of processed buffers for EvictAllUnpinnedBuffers() was alwasy NBuffers, that didn't seem right. But that's obsoleted by the next point: - I think it'd be more useful to return the number of skipped buffers, i.e. buffers that could not be evicted, than the number of processed buffers. I'm not_evicted or such would also work. - EvictAllUnpinnedBuffers() did not check whether the buffer was valid before locking the buffer, which made it a fair bit more expensive than EvictRelUnpinnedBuffers, which kinda has such a check via the buffer tag check. That sped up EvictAllUnpinnedBuffers() 3x when using a cluster with mostly unused shared buffers. - The optional pointer arguments made the code look a bit ugly. I made them mandatory now. - I made EvictUnpinnedBufferInternal()'s argument the BufferDesc, rather than the Buffer. - The tests for STRICTness worked, they just errored out because there isn't a function of the relevant names without arguments. I called them with NULL. - The count(*) tests would have succeeded even if the call had "failed" due to STRICTness. I used <colname> IS NOT NULL instead. - rebased over the other pg_buffercache changes Other points: - I don't love the buffers_ prefix for the column names / C function arguments. Seems long. It seems particularly weird because pg_buffercache_evict() doesn't have a buffer_ prefix. I left it as-is, but I think something perhaps ought to change before commit. Otoh, pg_buffercache_summary() and pg_buffercache_usage_counts() already inconsistent in a similar way with each other. - Arguably these functions ought to check BM_TAG_VALID, not BM_VALID. But that only rather rarely happens when there are no pins. Since this is a pre-existing pattern, I left it alone. - The documentation format of the functions isn't quite what we usually do (a table documenting the columns returned by a function with multiple columns), but otoh, these are really developer oriented functions, so spending 30 lines of a <table> on each of these functions feels a bit silly. I'd be ok with it as-is. - The docs for pg_buffercache_evict() don't quite sound right to me, there's some oddity in the phrasing. Nothing too bad, but perhaps worht a small bit of additional polish. Greetings, Andres Freund
Attachment
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Tue, 8 Apr 2025 at 02:29, Andres Freund <andres@anarazel.de> wrote: > On 2025-04-07 19:37:50 +0300, Nazir Bilal Yavuz wrote: > > > > > > + relation_close(rel, AccessExclusiveLock); > > > > > > > > > > Hm. Why are we dropping the lock here early? It's probably ok, but it's not > > > > > clear to me why we should do so. > > > > > > > > We are dropping the lock after we processed the relation. I didn't > > > > understand what could be the problem here. Why do you think it is > > > > early? > > > > > > Most commonly we close relations without releasing the lock, instead relying > > > on the lock being released at the end of the transaction. > > > > I see. I was looking at pg_prewarm as an example and copied it from there. > > I don't think we're particularly consistent about it. And I think there's some > differing views about what precisely the right behaviour is... > > > I've tried to polish the patch. Changes I made: > > - The number of processed buffers for EvictAllUnpinnedBuffers() was alwasy > NBuffers, that didn't seem right. But that's obsoleted by the next point: > > - I think it'd be more useful to return the number of skipped buffers, > i.e. buffers that could not be evicted, than the number of processed > buffers. I agree. The number of buffers that we tried to evict but couldn't evict will give more information rather than NBuffers. > > I'm not_evicted or such would also work. Did you mean that 'not_evicted' can be used instead of 'skipped_buffers' as a column name? If that is the case, both are okay to me. > - EvictAllUnpinnedBuffers() did not check whether the buffer was valid before > locking the buffer, which made it a fair bit more expensive than > EvictRelUnpinnedBuffers, which kinda has such a check via the buffer tag > check. > > That sped up EvictAllUnpinnedBuffers() 3x when using a cluster with mostly > unused shared buffers. Very nice speedup, thank you. > - The optional pointer arguments made the code look a bit ugly. I made them > mandatory now. I agree, I think they look better now. > - I made EvictUnpinnedBufferInternal()'s argument the BufferDesc, rather than > the Buffer. > > - The tests for STRICTness worked, they just errored out because there isn't a > function of the relevant names without arguments. I called them with NULL. > > - The count(*) tests would have succeeded even if the call had "failed" due to > STRICTness. I used <colname> IS NOT NULL instead. These look more correct now. > > - rebased over the other pg_buffercache changes > > > Other points: > > - I don't love the buffers_ prefix for the column names / C function > arguments. Seems long. It seems particularly weird because > pg_buffercache_evict() doesn't have a buffer_ prefix. > > I left it as-is, but I think something perhaps ought to change before > commit. Both are okay to me, I still couldn't decide which I like more. > Otoh, pg_buffercache_summary() and pg_buffercache_usage_counts() already > inconsistent in a similar way with each other. > > > - Arguably these functions ought to check BM_TAG_VALID, not BM_VALID. But that > only rather rarely happens when there are no pins. Since this is a > pre-existing pattern, I left it alone. > > > - The documentation format of the functions isn't quite what we usually do (a > table documenting the columns returned by a function with multiple columns), > but otoh, these are really developer oriented functions, so spending 30 > lines of a <table> on each of these functions feels a bit silly. > > I'd be ok with it as-is. > > - The docs for pg_buffercache_evict() don't quite sound right to me, there's > some oddity in the phrasing. Nothing too bad, but perhaps worht a small bit > of additional polish. I tried to update it without changing the old doc, perhaps that was the wrong move. Thank you so much for the review and the updates! v8 applies cleanly and passes the CI. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Andres Freund
Date:
Hi, On 2025-04-08 03:15:57 +0300, Nazir Bilal Yavuz wrote: > On Tue, 8 Apr 2025 at 02:29, Andres Freund <andres@anarazel.de> wrote: > > On 2025-04-07 19:37:50 +0300, Nazir Bilal Yavuz wrote: > > > > > > > + relation_close(rel, AccessExclusiveLock); > > > > > > > > > > > > Hm. Why are we dropping the lock here early? It's probably ok, but it's not > > > > > > clear to me why we should do so. > > > > > > > > > > We are dropping the lock after we processed the relation. I didn't > > > > > understand what could be the problem here. Why do you think it is > > > > > early? > > > > > > > > Most commonly we close relations without releasing the lock, instead relying > > > > on the lock being released at the end of the transaction. > > > > > > I see. I was looking at pg_prewarm as an example and copied it from there. > > > > I don't think we're particularly consistent about it. And I think there's some > > differing views about what precisely the right behaviour is... > > > > > > I've tried to polish the patch. Changes I made: > > > > - The number of processed buffers for EvictAllUnpinnedBuffers() was alwasy > > NBuffers, that didn't seem right. But that's obsoleted by the next point: > > > > - I think it'd be more useful to return the number of skipped buffers, > > i.e. buffers that could not be evicted, than the number of processed > > buffers. > > I agree. The number of buffers that we tried to evict but couldn't > evict will give more information rather than NBuffers. Cool. > > > > I'm not_evicted or such would also work. > > Did you mean that 'not_evicted' can be used instead of > 'skipped_buffers' as a column name? If that is the case, both are okay > to me. Correct, sorry for that garbled sentence. > > - I don't love the buffers_ prefix for the column names / C function > > arguments. Seems long. It seems particularly weird because > > pg_buffercache_evict() doesn't have a buffer_ prefix. > > > > I left it as-is, but I think something perhaps ought to change before > > commit. > > Both are okay to me, I still couldn't decide which I like more. I went back and forth at least three times. In the end I added the buffer_ (singular) prefix for pg_buffercache_evict(). > > - The documentation format of the functions isn't quite what we usually do (a > > table documenting the columns returned by a function with multiple columns), > > but otoh, these are really developer oriented functions, so spending 30 > > lines of a <table> on each of these functions feels a bit silly. > > > > I'd be ok with it as-is. > > > > - The docs for pg_buffercache_evict() don't quite sound right to me, there's > > some oddity in the phrasing. Nothing too bad, but perhaps worht a small bit > > of additional polish. > > I tried to update it without changing the old doc, perhaps that was > the wrong move. I think it's ok for now. It might be worth doing a larger redesign of the pgbuffercache docs at some point... Pushed. Thanks for your patches and thanks for all the reviewers getting this ready! Greetings, Andres Freund
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Bertrand Drouvot
Date:
Hi, On Tue, Apr 08, 2025 at 02:40:52AM -0400, Andres Freund wrote: > I think it's ok for now. It might be worth doing a larger redesign of the > pgbuffercache docs at some point... > > > Pushed. > > > Thanks for your patches and thanks for all the reviewers getting this ready! Thanks for the patch! That sounds like a great addition. I was doing some tests and did not see any issues. Also while doing the tests I thouhgt that it could be useful to evict only from a subset of NUMA nodes (now that NUMA awareness is in). We'd need to figure out what to do for buffers that are spread across NUMA nodes though. Does that sound like an idea worth to spend time on? (If so, I'd be happy to work on it). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Robert Haas
Date:
On Tue, Mar 18, 2025 at 6:03 PM Aidar Imamov <a.imamov@postgrespro.ru> wrote: > > for (int buf = 1; buf < NBuffers; buf++) > Mb it would be more correct to use <= NBuffers? I agree that (int buf = 1; buf < NBuffers; buf++) isn't right because that iterates one fewer times than the number of buffers. What was ultimately committed was: + for (int buf = 1; buf <= NBuffers; buf++) + { + BufferDesc *desc = GetBufferDescriptor(buf - 1); Curiously, there is no other instance of <= NBuffers in the code. Elsewhere we instead do: for (i = 0; i < NBuffers; i++) { BufferDesc *bufHdr = GetBufferDescriptor(i); Or in BufferSync: for (buf_id = 0; buf_id < NBuffers; buf_id++) { BufferDesc *bufHdr = GetBufferDescriptor(buf_id); Nonetheless what was committed seems pretty defensible, because we have lots of other places that do GetBufferDescriptor(buffer - 1) and similar. Alternating between 0-based indexing and 1-based indexing like this seems rather error-prone somehow. :-( -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, On Thu, 10 Apr 2025 at 16:50, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > On Tue, Apr 08, 2025 at 02:40:52AM -0400, Andres Freund wrote: > > I think it's ok for now. It might be worth doing a larger redesign of the > > pgbuffercache docs at some point... > > > > > > Pushed. > > > > > > Thanks for your patches and thanks for all the reviewers getting this ready! > > Thanks for the patch! That sounds like a great addition. I was doing some > tests and did not see any issues. Thank you for looking into this! > Also while doing the tests I thouhgt that it > could be useful to evict only from a subset of NUMA nodes (now that NUMA > awareness is in). We'd need to figure out what to do for buffers that are spread > across NUMA nodes though. > > Does that sound like an idea worth to spend time on? (If so, I'd be happy to work > on it). I think it looks useful, but I’m not too familiar with NUMA, so I’m not sure I can say much with confidence. Maybe someone else can chime in with a more solid take? -- Regards, Nazir Bilal Yavuz Microsoft
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Nazir Bilal Yavuz
Date:
Hi, Thank you for looking into this! On Thu, 10 Apr 2025 at 23:06, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 6:03 PM Aidar Imamov <a.imamov@postgrespro.ru> wrote: > > > for (int buf = 1; buf < NBuffers; buf++) > > Mb it would be more correct to use <= NBuffers? > > I agree that (int buf = 1; buf < NBuffers; buf++) isn't right because > that iterates one fewer times than the number of buffers. What was > ultimately committed was: > > + for (int buf = 1; buf <= NBuffers; buf++) > + { > + BufferDesc *desc = GetBufferDescriptor(buf - 1); > > Curiously, there is no other instance of <= NBuffers in the code. > Elsewhere we instead do: > > for (i = 0; i < NBuffers; i++) > { > BufferDesc *bufHdr = GetBufferDescriptor(i); > > Or in BufferSync: > > for (buf_id = 0; buf_id < NBuffers; buf_id++) > { > BufferDesc *bufHdr = GetBufferDescriptor(buf_id); > > Nonetheless what was committed seems pretty defensible, because we > have lots of other places that do GetBufferDescriptor(buffer - 1) and > similar. Alternating between 0-based indexing and 1-based indexing > like this seems rather error-prone somehow. :-( I understand your point. I did it like that because bufferids start from 1 and go to NBuffers inclusive in the pg_buffercache view, so it seems more natural to me to implement it like that. I am okay to replace these loops with [1] to make it standart everywhere: [1] for (int buf = 0; buf < NBuffers; buf++) { BufferDesc *desc = GetBufferDescriptor(buf); -- Regards, Nazir Bilal Yavuz Microsoft
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Robert Haas
Date:
On Fri, Apr 11, 2025 at 4:02 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > I understand your point. I did it like that because bufferids start > from 1 and go to NBuffers inclusive in the pg_buffercache view, so it > seems more natural to me to implement it like that. I am okay to > replace these loops with [1] to make it standart everywhere: > > [1] > for (int buf = 0; buf < NBuffers; buf++) > { > BufferDesc *desc = GetBufferDescriptor(buf); I'm more making an observation than asking for a change. If you and others think it should be changed, that is fine, but I'm uncertain myself what we ought to be doing here. I just wanted to raise the issue. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
From
Andres Freund
Date:
Hi, On 2025-04-10 13:50:36 +0000, Bertrand Drouvot wrote: > Thanks for the patch! That sounds like a great addition. I was doing some > tests and did not see any issues. Also while doing the tests I thouhgt that it > could be useful to evict only from a subset of NUMA nodes (now that NUMA > awareness is in). We'd need to figure out what to do for buffers that are spread > across NUMA nodes though. > > Does that sound like an idea worth to spend time on? (If so, I'd be happy to work > on it). I'm not sure that's common enough to warrant its own function. You can do that with pg_buffercache_evict(), it'll be slower than pg_buffercache_evict_all(), but given that determining the numa node already is somewhat expensive, I'm not sure it's going to make that big a difference. Greetings, Andres Freund