Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions - Mailing list pgsql-hackers
From | Joseph Koshakow |
---|---|
Subject | Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions |
Date | |
Msg-id | CAAvxfHeHBL6XxypexiSu0geH3hkHiFX=ApUaeaU+hgQEfjAwxQ@mail.gmail.com Whole thread Raw |
In response to | Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions (Aidar Imamov <a.imamov@postgrespro.ru>) |
Responses |
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
|
List | pgsql-hackers |
Hi I am working with Aidar to give a review and I am also a beginner
reviewer.
> From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001
> From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> Date: Fri, 20 Dec 2024 14:06:47 +0300
> Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing
> */
> bool
> -EvictUnpinnedBuffer(Buffer buf)
> +EvictUnpinnedBuffer(Buffer buf, bool *flushed)
I think you should update the comment above this function to include
details on the new `flushed` argument.
> diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml
> index 802a5112d77..83950ca5cce 100644
> --- a/doc/src/sgml/pgbuffercache.sgml
> +++ b/doc/src/sgml/pgbuffercache.sgml
> @@ -31,8 +31,10 @@
> This module provides the <function>pg_buffercache_pages()</function>
> function (wrapped in the <structname>pg_buffercache</structname> view),
> the <function>pg_buffercache_summary()</function> function, the
> - <function>pg_buffercache_usage_counts()</function> function and
> - the <function>pg_buffercache_evict()</function> function.
> + <function>pg_buffercache_usage_counts()</function> function, the
> + <function>pg_buffercache_evict()</function>, the
> + <function>pg_buffercache_evict_relation()</function>, and the
> + <function>pg_buffercache_evict_all()</function> function.
All the other functions have indexterm tags above this, should we add
indexterms for the new functions?
> + <sect2 id="pgbuffercache-pg-buffercache-evict-relation">
> + <title>The <structname>pg_buffercache_evict_relation</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_relation()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is that
> + <function>pg_buffercache_evict_relation()</function> takes a relation
> + identifier instead of buffer identifier. Then, it tries to evict all
> + buffers in that relation. The function is intended for developer testing only.
> + </para>
> + </sect2>
> +
> + <sect2 id="pgbuffercache-pg-buffercache-evict-all">
> + <title>The <structname>pg_buffercache_evict_all</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_all()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is,
> + the <function>pg_buffercache_evict_all()</function> does not take argument;
> + instead it tries to evict all buffers in the buffer pool. The function is
> + intended for developer testing only.
> + </para>
> + </sect2>
The other difference is that these new functions have an additional
output argument to indicate if the buffer was flushed which we probably
want to mention here.
Also I think it's more gramatically correct to say "the <fn-name>
function" or just "<fn-name>". A couple of times you say "<fn-name>
function" or "the <fn-name>".
Also I think the third to last sentence should end with "... does not
take **an** argument" or "... does not take argument**s**".
> +CREATE FUNCTION pg_buffercache_evict_relation(
> + IN regclass,
> + IN fork text default 'main',
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_relation'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
> +
> +CREATE FUNCTION pg_buffercache_evict_all(
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_all'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
Does it make sense to also update pg_buffercache_evict to also return
a bool if the buffer was flushed? Or is that too much of a breaking
change?
> + /* Lock the header and check if it's valid. */
> + buf_state = LockBufHdr(desc);
> + if ((buf_state & BM_VALID) == 0)
> + {
> + UnlockBufHdr(desc, buf_state);
> + return false;
> + }
EvictUnpinnedBuffer first checks if the buffer is locked before
attempting to lock it. Do we need a similar check here?
/* Lock the header if it is not already locked. */
if (!buf_state)
buf_state = LockBufHdr(desc);
>> I don't think *flushed is necessarily accurate this way, as it might detect
>> that it doesn't need to flush the data (via StartBufferIO() returning false).
>
> You are right. It seems there is no way to get that information
> without editing the FlushBuffer(), right? And I think editing
> FlushBuffer() is not worth it. So, I can either remove it or explain
> in the docs that #buffers_flushed may not be completely accurate.
There's already a lot of caveats on EvictUnpinnedBuffer that it might
have unexpected results when run concurrently, so I think it's fine to
add one more.
reviewer.
> From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001
> From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
> Date: Fri, 20 Dec 2024 14:06:47 +0300
> Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing
> */
> bool
> -EvictUnpinnedBuffer(Buffer buf)
> +EvictUnpinnedBuffer(Buffer buf, bool *flushed)
I think you should update the comment above this function to include
details on the new `flushed` argument.
> diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml
> index 802a5112d77..83950ca5cce 100644
> --- a/doc/src/sgml/pgbuffercache.sgml
> +++ b/doc/src/sgml/pgbuffercache.sgml
> @@ -31,8 +31,10 @@
> This module provides the <function>pg_buffercache_pages()</function>
> function (wrapped in the <structname>pg_buffercache</structname> view),
> the <function>pg_buffercache_summary()</function> function, the
> - <function>pg_buffercache_usage_counts()</function> function and
> - the <function>pg_buffercache_evict()</function> function.
> + <function>pg_buffercache_usage_counts()</function> function, the
> + <function>pg_buffercache_evict()</function>, the
> + <function>pg_buffercache_evict_relation()</function>, and the
> + <function>pg_buffercache_evict_all()</function> function.
All the other functions have indexterm tags above this, should we add
indexterms for the new functions?
> + <sect2 id="pgbuffercache-pg-buffercache-evict-relation">
> + <title>The <structname>pg_buffercache_evict_relation</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_relation()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is that
> + <function>pg_buffercache_evict_relation()</function> takes a relation
> + identifier instead of buffer identifier. Then, it tries to evict all
> + buffers in that relation. The function is intended for developer testing only.
> + </para>
> + </sect2>
> +
> + <sect2 id="pgbuffercache-pg-buffercache-evict-all">
> + <title>The <structname>pg_buffercache_evict_all</structname> Function</title>
> + <para>
> + The <function>pg_buffercache_evict_all()</function> function is very similar
> + to <function>pg_buffercache_evict()</function> function. The difference is,
> + the <function>pg_buffercache_evict_all()</function> does not take argument;
> + instead it tries to evict all buffers in the buffer pool. The function is
> + intended for developer testing only.
> + </para>
> + </sect2>
The other difference is that these new functions have an additional
output argument to indicate if the buffer was flushed which we probably
want to mention here.
Also I think it's more gramatically correct to say "the <fn-name>
function" or just "<fn-name>". A couple of times you say "<fn-name>
function" or "the <fn-name>".
Also I think the third to last sentence should end with "... does not
take **an** argument" or "... does not take argument**s**".
> +CREATE FUNCTION pg_buffercache_evict_relation(
> + IN regclass,
> + IN fork text default 'main',
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_relation'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
> +
> +CREATE FUNCTION pg_buffercache_evict_all(
> + OUT buffers_evicted int4,
> + OUT buffers_flushed int4)
> +AS 'MODULE_PATHNAME', 'pg_buffercache_evict_all'
> +LANGUAGE C PARALLEL SAFE VOLATILE;
Does it make sense to also update pg_buffercache_evict to also return
a bool if the buffer was flushed? Or is that too much of a breaking
change?
> + /* Lock the header and check if it's valid. */
> + buf_state = LockBufHdr(desc);
> + if ((buf_state & BM_VALID) == 0)
> + {
> + UnlockBufHdr(desc, buf_state);
> + return false;
> + }
EvictUnpinnedBuffer first checks if the buffer is locked before
attempting to lock it. Do we need a similar check here?
/* Lock the header if it is not already locked. */
if (!buf_state)
buf_state = LockBufHdr(desc);
>> I don't think *flushed is necessarily accurate this way, as it might detect
>> that it doesn't need to flush the data (via StartBufferIO() returning false).
>
> You are right. It seems there is no way to get that information
> without editing the FlushBuffer(), right? And I think editing
> FlushBuffer() is not worth it. So, I can either remove it or explain
> in the docs that #buffers_flushed may not be completely accurate.
There's already a lot of caveats on EvictUnpinnedBuffer that it might
have unexpected results when run concurrently, so I think it's fine to
add one more.
Thanks,
Joe Koshakow
pgsql-hackers by date: