On 4/5/25 15:23, Tomas Vondra wrote:
> On 4/5/25 11:37, Bertrand Drouvot wrote:
>> Hi,
>>
>> On Fri, Apr 04, 2025 at 09:25:57PM +0200, Tomas Vondra wrote:
>>> OK,
>>>
>>> here's v25 after going through the patches once more, fixing the issues
>>> mentioned by Bertrand, etc.
>>
>> Thanks!
>>
>>> I think 0001 and 0002 are fine,
>>
>> Agree, I just have some cosmetic nits comments: please find them in
>> nit-bertrand-0002.txt attached.
>>
>>> I have a
>>> couple minor questions about 0003.
>>>
>>> - I was wondering if maybe we should have some "global ID" of memory
>>> page, so that with large memory pages it's indicated the buffers are on
>>> the same memory page. Right now each buffer starts page_num from 0, but
>>> it should not be very hard to have a global counter. Opinions?
>>
>> I think that's a good idea. We could then add a new column (say os_page_id) that
>> would help identify which buffers are sharing the same "physical" page.
>>
>
> I was thinking we'd change the definition of the existing page_num
> column, i.e. it wouldn't be 0..N sequence for each buffer, but a global
> page ID. But I don't know if this would be useful in practice.
>
See the attached v25 with a draft of this in patch 0003.
While working on this, I realized it's probably wrong to use TYPEALIGN()
to calculate the OS page pointer. The code did this:
os_page_ptrs[idx]
= (char *) TYPEALIGN(os_page_size,
buffptr + (os_page_size * j));
but TYPEALIGN() rounds "up". Let's assume we have 1KB buffers and 4KB
memory pages, and that the first buffer is aligned to 4kB (i.e. it
starts right at the beginning of a memory page). Then we expect to get
page_num sequence:
0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ...
with 4 buffers per memory page. But we get this:
0, 1, 1, 1, 1, 2, 2, 2, 2, ...
So I've changed this to TYPEALIGN_DOWN(), which fixes the result.
The pg_shmem_allocations_numa had a variant of this issue when
calculating the number of pages, I think. I believe the shmem segment
may start half-way through a page, and the allocated_size may not be a
multiple of os_page_size (otherwise why use TYPEALIGN). In which case we
might skip the first page, I think.
The one question I have about this is whether we know the pointer
returned by TYPEALIGN_DOWN() is valid. It's before ent->location (or
before the first shared buffer) ...
regards
--
Tomas Vondra