Hi,
On 2020-04-19 15:07:22 -0700, Jeff Davis wrote:
> I brought up an issue where GCC in combination with FORTIFY_SOURCE[2]
> causes a perf regression for logical tapes after introducing
> LogicalTapeSetExtend()[3]. Unfortunately, FORTIFY_SOURCE is used by
> default on ubuntu. I have not observed the problem with clang.
>
> There is no reason why the change should trigger the regression, but it
> does. The slowdown is due to GCC switching to an inlined version of
> memcpy() for LogicalTapeWrite() at logtape.c:768. The change[3] seems
> to have little if anything to do with that.
FWIW, with gcc 10 and glibc 2.30 I don't see such a switch to an
inlined memcpy(). A profile shows:
│ nthistime = TapeBlockPayloadSize - lt->pos;
│ if (nthistime > size)
3.01 │1 b0: cmp %rdx,%r12
1.09 │ cmovbe %r12,%rdx
│ memcpy():
│
│ __fortify_function void *
│ __NTH (memcpy (void *__restrict __dest, const void *__restrict __src,
│ size_t __len))
│ {
│ return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
2.44 │ mov %r13,%rsi
│ LogicalTapeWrite():
│ nthistime = size;
│ Assert(nthistime > 0);
│
│ memcpy(lt->buffer + lt->pos, ptr, nthistime);
2.49 │ add 0x28(%rbx),%rdi
0.28 │ mov %rdx,%r15
│ memcpy():
4.65 │ → callq memcpy@plt
│ LogicalTapeWrite():
I.e. the normal, out-of-line memcpy() is getting called via the PLT,
not an inlined version. That's with -D_FORTIFY_SOURCE=2.
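
To make comparing compilers easier, here is a minimal standalone sketch
of the loop shape in question. It is not the logtape.c code: DemoTape,
demo_tape_write() and DEMO_BLOCK_PAYLOAD are made-up stand-ins, the
payload size is an arbitrary value, and the "flush" is only a counter.
I have not checked whether it reproduces the regression; it is only
meant as something small to disassemble.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for TapeBlockPayloadSize; the value is arbitrary. */
#define DEMO_BLOCK_PAYLOAD 8184

/* Hypothetical, heavily reduced stand-in for the tape struct. */
typedef struct DemoTape
{
	char	   *buffer;		/* separately allocated block buffer */
	size_t		pos;		/* next write offset within buffer */
	size_t		nflushed;	/* number of full blocks "written out" */
} DemoTape;

/* Copy size bytes into the tape, at most one block payload per memcpy(). */
void
demo_tape_write(DemoTape *lt, const void *ptr, size_t size)
{
	const char *src = ptr;

	while (size > 0)
	{
		size_t		nthistime;

		if (lt->pos >= DEMO_BLOCK_PAYLOAD)
		{
			/* Buffer full: pretend to dump it and start a new block. */
			lt->nflushed++;
			lt->pos = 0;
		}

		nthistime = DEMO_BLOCK_PAYLOAD - lt->pos;
		if (nthistime > size)
			nthistime = size;
		assert(nthistime > 0);

		/* The call whose code generation is in question. */
		memcpy(lt->buffer + lt->pos, src, nthistime);

		lt->pos += nthistime;
		src += nthistime;
		size -= nthistime;
	}
}
```

Compiling that with something like gcc -O2 -D_FORTIFY_SOURCE=2 -c and
looking at objdump -d should show whether the memcpy() stays a call or
gets expanded inline for a given compiler / libc combination.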
With which compiler / libc versions did you encounter this?
Greetings,
Andres Freund