On Sat, 1 Feb 2025 at 06:01, Zhang Mingli <zmlpostgres@gmail.com> wrote:
>
> Zhang Mingli
> www.hashdata.xyz
> On Jan 30, 2025 at 15:49 +0800, Matthias van de Meent <boekewurm+postgres@gmail.com>, wrote:
>
> Hi,
>
> Thanks for your insights.
> Although the buffer tag consumes a relatively small amount of space in the overall shared buffer architecture (which also includes the BufferDescriptors array and the page buffers), every little bit helps, I think.
>
> Regarding the code, I've read through some of it. Here are my initial thoughts:
>
>
> ```
> int
> BufTableInsert(BufferTag *tagPtr, uint32 hashcode, int buf_id)
> [...]
> ```
>
> In the BufTableInsert function, it appears that the key changes from BufferTag to an integer surrogateid.
> Given that multiple buckets exist based on the hash code, we need to iterate through the bucket lists to find a slot by comparing the keys, and if surrogateid is set to -1, will the comparison function always return false?
Please note the use of our own compare function, which grabs the stored value of 'key' and uses that buffer ID to find the buffertag of the indicated buffer (or, in case of surrogateId, the buffertag that was supplied by the caller of BufTableInsert()).
> Additionally, I'm curious about the purpose of MyBackendBufLookup, which is set and reset around the hash_search_with_hash_value call. Is there a concurrency consideration here, even though we have a lock before the buffer insertion?
In a multi-threaded PostgreSQL this might need additional considerations, but in non-reentrant, non-threaded PostgreSQL we shouldn't have any callers to the BufTableInsert/Delete/Search() functions from inside those same functions.
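To illustrate the two points above, here is a minimal standalone sketch, not the patch itself. It assumes the table stores only a buffer ID as its key, that a surrogate ID of -1 means "compare against the tag supplied by the caller", and that a backend-local pointer (playing the role of MyBackendBufLookup) is set and reset around the search. All `demo_*` names and `DemoTag` are hypothetical, and a linear probe stands in for hash_search_with_hash_value().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NBUFFERS     4
#define DEMO_SURROGATE_ID (-1)  /* hypothetical: "use the caller's tag" */

/* Simplified stand-in for BufferTag. */
typedef struct DemoTag
{
    unsigned rel;
    unsigned block;
} DemoTag;

/* Stand-in for the buffer descriptor array: the tags live here only. */
static DemoTag demo_descs[DEMO_NBUFFERS];

/* Stand-in for MyBackendBufLookup: private to one backend, so no locking. */
static const DemoTag *demo_lookup_tag = NULL;

/* Stand-in for the lookup table: it stores buffer IDs, not tags. */
static int demo_table[DEMO_NBUFFERS];
static int demo_nentries = 0;

/* Map a stored or surrogate key to the tag it represents. */
static const DemoTag *
demo_resolve(int key)
{
    if (key == DEMO_SURROGATE_ID)
        return demo_lookup_tag;
    return &demo_descs[key];
}

/* Match function in the dynahash style: returns 0 when the keys match. */
static int
demo_key_compare(const void *a, const void *b, size_t keysize)
{
    const DemoTag *ta = demo_resolve(*(const int *) a);
    const DemoTag *tb = demo_resolve(*(const int *) b);

    (void) keysize;
    assert(ta != NULL && tb != NULL);
    return (ta->rel == tb->rel && ta->block == tb->block) ? 0 : 1;
}

/* Record that buffer buf_id now holds the page identified by tag. */
static void
demo_insert(int buf_id, DemoTag tag)
{
    assert(demo_nentries < DEMO_NBUFFERS);
    demo_descs[buf_id] = tag;
    demo_table[demo_nentries++] = buf_id;
}

/* Return the buffer ID holding tag, or -1 if it is not in the table. */
static int
demo_lookup(const DemoTag *tag)
{
    int probe = DEMO_SURROGATE_ID;
    int found = -1;

    demo_lookup_tag = tag;      /* set before the search */
    for (int i = 0; i < demo_nentries; i++)
    {
        if (demo_key_compare(&demo_table[i], &probe, sizeof(int)) == 0)
        {
            found = demo_table[i];
            break;
        }
    }
    demo_lookup_tag = NULL;     /* reset after the search */
    return found;
}
```

Because the surrogate key is resolved through the backend-local pointer, a comparison against -1 does not always fail: it compares the stored buffer's tag with the tag the caller is searching for. The set/reset pair is only safe while the search functions cannot be re-entered, which is the assumption stated above.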
> And a potential drawback: databases built on PostgreSQL might manipulate the buffer table directly (e.g., reading it for specific purposes).
> In that case, the buffer tags stored in the table would reveal that information without needing to reference the buffer descriptor array.
> I understand that Postgres makes no promises about that; it is just a consideration.
The buffer lookup table reference is stored in a variable that's private to buftable.c. Any user of the table that's not using the BufTable*-functions would need to be doing some very weird trickery w.r.t. finding out where the table base pointer is stored and how to access it (though attaching through shared memory is technically possible, I wouldn't suggest that as a valid and supported method).
Additionally, anyone who wants to scan that table would have to lock all partitions for the duration of the scan to correctly scan all entries, which would be catastrophic for the performance of a cluster that does any amount of buffer swapping.
It is probably much cheaper to scan the bufferdesc array, as that has per-entry locks, and the definitions for that array are shared in the relevant headers, and are thus usable without resorting to magic values.
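As a rough standalone sketch of that last point, the following scans an array of descriptors while holding only one per-entry lock at a time. It is an illustration under assumptions, not PostgreSQL code: `ScanBufferDesc` and the `scan_*` functions are hypothetical, and a C11 `atomic_flag` spinlock stands in for the buffer header lock. In PostgreSQL itself the corresponding pieces would be the definitions exported by the buffer manager headers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SCAN_NBUFFERS 8

/* Simplified stand-in for a buffer descriptor with its own lock. */
typedef struct ScanBufferDesc
{
    atomic_flag lock;   /* per-entry spinlock */
    bool        valid;  /* does this entry hold a page? */
    unsigned    rel;    /* simplified tag: relation */
    unsigned    block;  /* simplified tag: block number */
} ScanBufferDesc;

static ScanBufferDesc scan_buffers[SCAN_NBUFFERS];

static void
scan_lock(ScanBufferDesc *desc)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&desc->lock,
                                             memory_order_acquire))
        ;
}

static void
scan_unlock(ScanBufferDesc *desc)
{
    atomic_flag_clear_explicit(&desc->lock, memory_order_release);
}

/* Put every entry into a known unlocked, empty state. */
static void
scan_init(void)
{
    for (int i = 0; i < SCAN_NBUFFERS; i++)
    {
        atomic_flag_clear(&scan_buffers[i].lock);
        scan_buffers[i].valid = false;
    }
}

/* Assign a page to one entry under that entry's lock. */
static void
scan_set(int id, unsigned rel, unsigned block)
{
    ScanBufferDesc *desc = &scan_buffers[id];

    scan_lock(desc);
    desc->valid = true;
    desc->rel = rel;
    desc->block = block;
    scan_unlock(desc);
}

/* Count buffers belonging to rel, locking one entry at a time. */
static int
scan_count_rel(unsigned rel)
{
    int count = 0;

    for (int i = 0; i < SCAN_NBUFFERS; i++)
    {
        ScanBufferDesc *desc = &scan_buffers[i];

        scan_lock(desc);
        if (desc->valid && desc->rel == rel)
            count++;
        scan_unlock(desc);
    }
    return count;
}
```

Such a scan never blocks more than one entry at a time, at the cost of not seeing a single consistent snapshot of the whole array; a scan of the lookup table would instead have to hold every partition lock for its full duration.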
Kind regards,
Matthias van de Meent