Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
Date
Msg-id bbxepmv3w2iapbxehkypzsut545valywzf62qlizsrzkbrdnfy@dzw5ju3itwgs
Whole thread Raw
List pgsql-hackers
Hi,

On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote:
> 1- It is time consuming to evict all shared buffers one by one using
> the pg_buffercache_evict() function.

This is really useful for benchmarking. When testing IO heavy workloads it
turns out to be a lot lower noise to measure with shared buffers already
"touched" before taking actual measurements. The first time a buffer is used
the kernel needs to actually initialize the memory for the buffer, which can
take a substantial amount of time. Which obviously can hide many performance
differences. And it adds a lot of noise.


> 2- It would be good to have a function to mark buffers as dirty to
> test different scenarios.

This is important to be able to measure checkpointer performance. Otherwise
one has to load data from scratch. That's bad for three reasons:

1) After re-loading the data from scratch the data can be laid-out differently
   on-disk, which can have substantial performance impacts. By re-dirtying
   data one instead can measure the performance effects of a change with the
   same on-disk layaout.
2) It takes a lot of time to reload data from scratch.
3) The first write to data has substantially different performance
   characteristics than later writes, but often isn't the most relevant factor
   for real-world performance test.



> So, this patchset extends pg_buffercache with 3 functions:
> 
> 1- pg_buffercache_evict_all(): This is very similar to the already
> existing pg_buffercache_evict() function. The difference is
> pg_buffercache_evict_all() does not take an argument. Instead it just
> loops over the shared buffers and tries to evict all of them. It
> returns the number of buffers evicted and flushed.

I do think that'd be rather useful.  Perhaps it's also worth having a version
that just evicts a specific relation?


> 2- pg_buffercache_mark_dirty(): This function takes a buffer id as an
> argument and tries to mark this buffer as dirty. Returns true on
> success. This returns false if the buffer is already dirty. Do you
> think this makes sense or do you prefer it to return true if the
> buffer is already dirty?

I don't really have feelings about that ;)


> From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001
> From: Nazir Bilal Yavuz <byavuz81@gmail.com>
> Date: Fri, 20 Dec 2024 14:06:47 +0300
> Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing
> 
> This new function provides a mechanism to evict all shared buffers at
> once. It is designed to address the inefficiency of repeatedly calling
> pg_buffercache_evict() for each individual buffer, which can be
> time-consuming when dealing with large shared buffer pools
> (e.g., ~790ms vs. ~270ms for 16GB of shared buffers).

>   */
>  bool
> -EvictUnpinnedBuffer(Buffer buf)
> +EvictUnpinnedBuffer(Buffer buf, bool *flushed)
>  {
>      BufferDesc *desc;
>      uint32        buf_state;
>      bool        result;
>  
> +    *flushed = false;
> +
>      /* Make sure we can pin the buffer. */
>      ResourceOwnerEnlarge(CurrentResourceOwner);
>      ReservePrivateRefCountEntry();
> @@ -6134,6 +6136,7 @@ EvictUnpinnedBuffer(Buffer buf)
>          LWLockAcquire(BufferDescriptorGetContentLock(desc), LW_SHARED);
>          FlushBuffer(desc, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL);
>          LWLockRelease(BufferDescriptorGetContentLock(desc));
> +        *flushed = true;
>      }
>  
>      /* This will return false if it becomes dirty or someone else pins
>      it. */

I don't think *flushed is necessarily accurate this way, as it might detect
that it doesn't need to flush the data (via StartBufferIO() returning false).


> + */
> +Datum
> +pg_buffercache_evict_all(PG_FUNCTION_ARGS)
> +{
> +    Datum        result;
> +    TupleDesc    tupledesc;
> +    HeapTuple    tuple;
> +    Datum        values[NUM_BUFFERCACHE_EVICT_ALL_ELEM];
> +    bool        nulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0};
> +
> +    int32        buffers_evicted = 0;
> +    int32        buffers_flushed = 0;
> +    bool        flushed;
> +
> +    if (get_call_result_type(fcinfo, NULL, &tupledesc) != TYPEFUNC_COMPOSITE)
> +        elog(ERROR, "return type must be a row type");
> +
> +    if (!superuser())
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                 errmsg("must be superuser to use pg_buffercache_evict_all function")));
> +
> +    for (int buf = 1; buf < NBuffers; buf++)
> +    {
> +        if (EvictUnpinnedBuffer(buf, &flushed))
> +            buffers_evicted++;
> +        if (flushed)
> +            buffers_flushed++;

I'd probably add a pre-check for the buffer not being in use. We don't need an
external function call, with an unconditional LockBufHdr() inside, for that
case.



> +/*
> + * MarkUnpinnedBufferDirty
> + *
> + * This function is intended for testing/development use only!
> + *
> + * To succeed, the buffer must not be pinned on entry, so if the caller had a
> + * particular block in mind, it might already have been replaced by some other
> + * block by the time this function runs.  It's also unpinned on return, so the
> + * buffer might be occupied and flushed by the time control is returned.  This
> + * inherent raciness without other interlocking makes the function unsuitable
> + * for non-testing usage.
> + *
> + * Returns true if the buffer was not dirty and it has now been marked as
> + * dirty.  Returns false if it wasn't valid, if it couldn't be marked as dirty
> + * due to a pin, or if the buffer was already dirty.
> + */

Hm. One potentially problematic thing with this is that it can pin and dirty a
buffer even while its relation is locked exclusively. Which i think some code
doesn't expect.  Not sure if there's a way around that :(


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Dmitry Dolgov
Date:
Subject: Re: pg_stat_statements and "IN" conditions