Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Protect syscache from bloating with negative cache entries
Msg-id de62aa48-553e-5e99-60c6-1505cdc5b2e3@iki.fi
In response to Re: Protect syscache from bloating with negative cache entries  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Protect syscache from bloating with negative cache entries  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: Protect syscache from bloating with negative cache entries  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On 06/11/2020 10:24, Kyotaro Horiguchi wrote:
> Thank you for the comment!
> 
> First off, I thought that I had managed to eliminate the degradation
> observed in the previous versions, but a significant degradation (1.1%
> slower) is still seen in one case.

One thing to keep in mind with micro-benchmarks like this is that even 
completely unrelated code changes can change the layout of the code in 
memory, which in turn can affect CPU caching effects in surprising ways. 
If you're unlucky, you can see 1-5% differences just from adding a 
function that's never called, for example, if it happens to move other 
code in memory so that some hot codepath or struct gets split across 
CPU cache lines. It can be infuriating when benchmarking.
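
As an aside, if you want to check whether you're seeing that effect, 
one trick is to pin the alignment of the functions involved and see 
whether the difference moves. For example, with gcc:

    /* force the function onto its own 64-byte cache-line boundary */
    __attribute__((aligned(64))) void
    SomeHotFunction(void);

(Just an illustration with a made-up function name; I haven't tried 
that on this particular case.)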

> At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas <hlinnaka@iki.fi> wrote in
> (A) original test patch
> 
> I naively thought that the code path was too short to absorb the
> cost of a few additional instructions.  In fact, I measured
> performance again with the same patch set on the current master and
> got more or less the same result.
> 
> master 8195.58 ms, patched 8817.40 ms: +10.75%
> 
> However, I noticed that the additional call was a recursive call, and
> the jmp inserted for the recursive call seemed to take significant
> time.  After avoiding the recursive call, the difference shrank to
> +0.96% (master 8268.71 ms, patched 8348.30 ms).
> 
> Just the two instructions below are inserted in this case, which
> looks reasonable.
> 
>    8720ff <+31>:    cmpl   $0xffffffff,0x4ba942(%rip)        # 0xd2ca48 <catalog_cache_prune_min_age>
>    872106 <+38>:    jl     0x872240 <SearchCatCache1+352> (call to a function)

That's interesting. I think a 1% degradation would be acceptable.

I think we'd like to enable this feature by default though, so the 
performance when it's enabled is also very important.
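
For reference, the kind of C shape that compiles down to just that 
cmpl+jl pair is something like this. This is only a sketch with made-up 
names (CatCacheNoteAccess, CatCacheBumpAccess), not the actual patch, 
and the exact comparison the compiler emits will vary:

    #include <stdint.h>

    typedef struct CatCTup
    {
        int      naccess;       /* accesses since last prune round */
        uint64_t lastaccess;    /* catcache clock at last access */
    } CatCTup;

    /* GUC; -1 disables pruning */
    int catalog_cache_prune_min_age = -1;
    uint64_t catcacheclock;     /* advanced by a timer elsewhere */

    /* Cold path, kept out of line so that the hot path pays only
     * the compare and the conditional jump. */
    __attribute__((noinline)) static void
    CatCacheBumpAccess(CatCTup *ct)
    {
        if (++ct->naccess == 0) /* wrapped around: skip 0 and 1 */
            ct->naccess = 2;
        ct->lastaccess = catcacheclock;
    }

    /* Hot-path shape in SearchCatCache1() */
    static inline void
    CatCacheNoteAccess(CatCTup *ct)
    {
        if (catalog_cache_prune_min_age >= 0)
            CatCacheBumpAccess(ct);
    }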

> (C) inserting bare counter-update code without a branch
> 
>> Do we actually need a branch there? If I understand correctly, the
>> point is to bump up a usage counter on the catcache entry. You could
>> increment the counter unconditionally, even if the feature is not
>> used, and avoid the branch that way.
> 
> That change causes a 4.9% degradation, which is worse than having a
> branch.
> 
> master 8364.54 ms, patched 8666.86 ms (+4.9%)
> 
> The additional instructions follow.
> 
> + 8721ab <+203>:    mov    0x30(%rbx),%eax  # %eax = ct->naccess
> + 8721ae <+206>:    mov    $0x2,%edx
> + 8721b3 <+211>:    add    $0x1,%eax        # %eax++
> + 8721b6 <+214>:    cmove  %edx,%eax        # if %eax == 0 then %eax = 2
> <original code>
> + 8721bf <+223>:    mov    %eax,0x30(%rbx)  # ct->naccess = %eax
> + 8721c2 <+226>:    mov    0x4cfe9f(%rip),%rax        # 0xd42068 <catcacheclock>
> + 8721c9 <+233>:    mov    %rax,0x38(%rbx)  # ct->lastaccess = %rax
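
In C terms, that inserted code is roughly the following (using the 
field names from the comments above):

    int n = ct->naccess + 1;
    if (n == 0)                 /* the cmove: skip 0 on wraparound */
        n = 2;
    ct->naccess = n;
    ct->lastaccess = catcacheclock;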

Do you need the "naccess == 2" test? You could always increment the 
counter, and in the code that uses naccess to decide what to evict, 
treat all values >= 2 the same.
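
Something like this, reusing the CatCTup sketch above (the eviction 
side is equally made up):

    /* hot path: plain, branch-free bump */
    ct->naccess++;
    ct->lastaccess = catcacheclock;

    /* eviction scan: all values >= 2 are treated alike */
    int entry_is_hot = (ct->naccess >= 2);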

You'd need to handle integer overflow somehow. Or maybe not: integer 
overflow is so infrequent that even if a hot syscache entry gets 
evicted prematurely because its naccess count wrapped around to 0, it 
will happen so rarely that it won't make any difference in practice.
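
(If it ever did matter, a saturating bump would avoid the wraparound 
at the cost of one well-predicted compare:

    if (ct->naccess < INT_MAX)  /* needs <limits.h>; saturate */
        ct->naccess++;

but I doubt that's worth doing.)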

- Heikki


