On February 6, 2019, 08:25AM +0000, Kyotaro HORIGUCHI wrote:
>At Wed, 6 Feb 2019 06:29:15 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote:
>> Although I haven't looked deeply at Thomas's patch yet, there's currently no place to store the size per relation in
shared memory. You have to wait for the global metacache that Ideriha-san is addressing. Then, you can store the
relation size in the RelationData structure in relcache.
>Just one counter in the patch *seems* to give significant gain comparing to the complexity, given that lseek is so
complex or it brings latency, especially on workloads where file is scarcely changed. Though I didn't run it on a test
bench.
> > > (2) Is the MdSharedData temporary or permanent in shared memory?
> > Permanent in shared memory.
> I'm not sure the duration of the 'permanent' there, but it disappears when server stops. Anyway it doesn't need to be
permanent beyond a server restart.
Thank you for the insights.
In my previous email I ran a simple test using syscall tracing, and
the patch significantly reduced the number of lseek syscalls.
(That simple test may not be enough to describe the performance benefit, though.)
Regarding Tsunakawa-san's comment: Thomas' patch already sets aside a
place in shared memory that stores the relsize_change_counter, so I am
thinking of using the same place, but to cache the relsize itself.
Perhaps I should explain the intention of the design further.
The first step is to cache the file size in shared memory. As with
Thomas' patch, the purpose is to reduce the number of lseek(SEEK_END)
calls by caching the relation size without taking additional locks.
The difference is that the relation size itself is cached in shared
memory, rather than a change counter. Do you see any problems with
this approach?
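To illustrate the first step, here is a minimal sketch only. It is not
code from Thomas' patch or from any proposed patch. The names
RelSizeCacheSlot, relsize_get and relsize_set are hypothetical, an
ordinary struct stands in for the slot that would live in shared memory,
and lseek_calls exists only so the effect can be observed. It assumes
that whoever extends or truncates the file also updates the slot.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ          8192        /* block size assumed for this sketch */
#define INVALID_NBLOCKS UINT32_MAX  /* marker for "size not cached yet" */

/* Hypothetical slot; a real design would place this in shared memory. */
typedef struct RelSizeCacheSlot
{
    _Atomic uint32_t nblocks;   /* cached size in blocks, or INVALID_NBLOCKS */
} RelSizeCacheSlot;

/* Instrumentation for the sketch only: counts lseek() calls issued here. */
static long lseek_calls = 0;

static void
relsize_cache_init(RelSizeCacheSlot *slot)
{
    assert(slot != NULL);
    atomic_store(&slot->nblocks, INVALID_NBLOCKS);
}

/*
 * Return the relation size in blocks. The cached value is read with an
 * atomic load, so no lock is taken; lseek(SEEK_END) runs only on a miss.
 * Returns INVALID_NBLOCKS if lseek() fails.
 */
static uint32_t
relsize_get(RelSizeCacheSlot *slot, int fd)
{
    uint32_t cached = atomic_load(&slot->nblocks);
    off_t    end;

    if (cached != INVALID_NBLOCKS)
        return cached;

    end = lseek(fd, 0, SEEK_END);
    lseek_calls++;
    if (end < 0)
    {
        perror("lseek");
        return INVALID_NBLOCKS;
    }

    cached = (uint32_t) (end / BLCKSZ);
    atomic_store(&slot->nblocks, cached);
    return cached;
}

/* To be called by the code path that extends or truncates the file. */
static void
relsize_set(RelSizeCacheSlot *slot, uint32_t nblocks)
{
    atomic_store(&slot->nblocks, nblocks);
}
```

With this shape, repeated relsize_get() calls on an unchanged file cost
one lseek() in total. The sketch does not address how a stale value is
avoided when a size change bypasses relsize_set(), which is the part
that needs review.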
Eventually, the next step is to have a structure in shared memory
that caches file addresses along with their sizes (Andres' idea of
putting an open relations table in shared memory). With a
structure that groups buffers into per-relation-file units, we can get
the file size directly from shared memory. The assumption is that
truncating or extending relations would then be faster, because we can
track the offsets of the size changes and use ranges to manage
truncation, etc.
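As a rough picture of what such a table could look like, here is a
sketch under heavy assumptions. OpenRelTable, OpenRelEntry, BlockRange
and the open_rel_* functions are all hypothetical names, the key is a
bare integer instead of a real relation file identifier, and locking,
eviction and the buffer grouping itself are left out entirely. It only
shows sizes being read from the table and extend/truncate reporting the
affected block range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OPEN_REL_TABLE_SIZE 64      /* arbitrary capacity for the sketch */

/* One entry per open relation file (hypothetical layout). */
typedef struct OpenRelEntry
{
    bool     in_use;
    uint32_t rel_id;    /* stand-in for a relation file identifier */
    uint32_t nblocks;   /* current size in blocks */
} OpenRelEntry;

typedef struct OpenRelTable
{
    OpenRelEntry entries[OPEN_REL_TABLE_SIZE];
} OpenRelTable;

/* A contiguous run of blocks: [first, first + count). */
typedef struct BlockRange
{
    uint32_t first;
    uint32_t count;
} BlockRange;

static void
open_rel_table_init(OpenRelTable *table)
{
    memset(table, 0, sizeof(*table));
}

/*
 * Look up rel_id using linear probing. If it is absent and create is
 * true, claim the first free slot with size 0. Returns NULL when the
 * relation is absent (create == false) or the table is full.
 */
static OpenRelEntry *
open_rel_find(OpenRelTable *table, uint32_t rel_id, bool create)
{
    uint32_t start = rel_id % OPEN_REL_TABLE_SIZE;

    for (uint32_t i = 0; i < OPEN_REL_TABLE_SIZE; i++)
    {
        OpenRelEntry *e = &table->entries[(start + i) % OPEN_REL_TABLE_SIZE];

        if (e->in_use && e->rel_id == rel_id)
            return e;
        if (!e->in_use)
        {
            if (!create)
                return NULL;
            e->in_use = true;
            e->rel_id = rel_id;
            e->nblocks = 0;
            return e;
        }
    }
    return NULL;
}

/* Grow the relation; returns the range of newly added blocks. */
static BlockRange
open_rel_extend(OpenRelEntry *e, uint32_t add_blocks)
{
    BlockRange added = { e->nblocks, add_blocks };

    e->nblocks += add_blocks;
    return added;
}

/*
 * Shrink the relation; returns the range of removed blocks, which a
 * caller could use to find the buffers that must be dropped.
 */
static BlockRange
open_rel_truncate(OpenRelEntry *e, uint32_t new_nblocks)
{
    assert(new_nblocks <= e->nblocks);

    BlockRange removed = { new_nblocks, e->nblocks - new_nblocks };

    e->nblocks = new_nblocks;
    return removed;
}
```

For example, extending a new entry by 10 blocks and then truncating it
to 4 reports the removed range as starting at block 4 with 6 blocks,
without any lseek() call.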
The next step is a complex direction that needs serious discussion,
but I am wondering if we can proceed with the first step for now if
the idea and direction are valid.
Regards,
Kirk Jamison