On 09/11/2020 11:34, Kyotaro Horiguchi wrote:
> At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
>> Do you need the "ntaccess == 2" test? You could always increment the
>> counter, and in the code that uses ntaccess to decide what to evict,
>> treat all values >= 2 the same.
>>
>> Need to handle integer overflow somehow. Or maybe not: integer
>> overflow is so infrequent that even if a hot syscache entry gets
>> evicted prematurely because its ntaccess count wrapped around to 0, it
>> will happen so rarely that it won't make any difference in practice.
>
> That relaxing simplifies the code significantly, but a significant
> degradation by about 5% still exists.
>
> (SearchCatCacheInternal())
> + ct->naccess++;
> !+ ct->lastaccess = catcacheclock;
>
> If I removed the second line above, the degradation disappears
> (-0.7%).
0.7% degradation is probably acceptable.
> However, I don't find the corresponding numbers in the output
> of perf. The sum of the numbers for the removed instructions is (0.02
> + 0.28 = 0.3%). I don't think the degradation as the whole doesn't
> always reflect to the instruction level profiling, but I'm stuck here,
> anyway.
Hmm. Some kind of cache miss effect, perhaps? offsetof(CatCTup, tuple)
is exactly 64 bytes currently, so any fields that you add after 'tuple'
will go on a different cache line. Maybe it would help if you just move
the new fields before 'tuple'.
Making CatCTup smaller might help. Some ideas/observations:
- The 'ct_magic' field is only used for assertion checks. Could remove it.
- 4 Datums (32 bytes) are allocated for the keys, even though most
catcaches have fewer key columns.
- In the current syscaches, keys[2] and keys[3] are only used to store
32-bit oids or some other smaller fields. Allocating a full 64-bit Datum
for them wastes memory.
- You could move the dead flag at the end of the struct or remove it
altogether, with the change I mentioned earlier to not keep dead items
in the buckets
- You could steal a few bit for dead/negative flags from some other
field. Use special values for tuple.t_len for them or something.
With some of these tricks, you could shrink CatCTup so that the new
lastaccess and naccess fields would fit in the same cacheline.
That said, I think this is good enough performance-wise as it is. So if
we want to improve performance in general, that can be a separate patch.
- Heikki