Re: SP-GiST micro-optimizations - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: SP-GiST micro-optimizations
Date	August 28, 2012 21:27:26
Msg-id	503D0D86.6080105@enterprisedb.com
In response to	Re: SP-GiST micro-optimizations (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: SP-GiST micro-optimizations
List	pgsql-hackers

On 28.08.2012 20:30, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Drilling into the profile, I came up with three little optimizations:
>
>> 1. Within spgdoinsert, a significant portion of the CPU time is spent on
>> line 2033 in spgdoinsert.c:
>
>> memset(&out, 0, sizeof(out));
>
>> That zeroes out a small struct allocated on the stack. Replacing that
>> with MemSet() makes it faster, reducing the time spent on zeroing that
>> struct from 10% to 1.5% of the time spent in spgdoinsert(). That's not
>> very much in the big scheme of things, but it's a trivial change so
>> seems worth it.
>
> Fascinating.  I'd been of the opinion that modern compilers would inline
> memset() for themselves and MemSet was probably not better than what the
> compiler could do these days.  What platform are you testing on?

x64, gcc 4.7.1, running Debian.

The assembly generated for the MemSet is:
    .loc 1 2033 0 discriminator 3
    movq    $0, -432(%rbp)
.LVL166:
    movq    $0, -424(%rbp)
.LVL167:
    movq    $0, -416(%rbp)
.LVL168:
    movq    $0, -408(%rbp)
.LVL169:
    movq    $0, -400(%rbp)
.LVL170:
    movq    $0, -392(%rbp)

while the corresponding memset code is:
    .loc 1 2040 0 discriminator 6
    xorl    %eax, %eax
    .loc 1 2042 0 discriminator 6
    cmpb    $0, -669(%rbp)
    .loc 1 2040 0 discriminator 6
    movq    -584(%rbp), %rdi
    movl    $6, %ecx
    rep stosq
 

In fact, with -mstringop-strategy=unrolled_loop, I can coerce gcc to
produce code similar to the MemSet version:
    movq    %rax, -440(%rbp)
    .loc 1 2040 0 discriminator 6
    xorl    %eax, %eax
.L254:
    movl    %eax, %edx
    addl    $32, %eax
    cmpl    $32, %eax
    movq    $0, -432(%rbp,%rdx)
    movq    $0, -424(%rbp,%rdx)
    movq    $0, -416(%rbp,%rdx)
    movq    $0, -408(%rbp,%rdx)
    jb      .L254
    leaq    -432(%rbp), %r9
    addq    %r9, %rax
    .loc 1 2042 0 discriminator 6
    cmpb    $0, -665(%rbp)
    .loc 1 2040 0 discriminator 6
    movq    $0, (%rax)
    movq    $0, 8(%rax)
 

I'm not sure why gcc doesn't choose that by default. Perhaps which
variant is faster is CPU-specific; I was quite surprised that MemSet was
such a clear win on my laptop. Or maybe it's a speed-space tradeoff and
gcc chooses the more compact version, although using -O3 instead of -O2
made no difference.
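
For anyone who wants to look at the two variants in isolation, here is a
minimal sketch. It is not PostgreSQL's actual MemSet macro (the real one
in c.h also checks alignment, the fill value, and a length limit before
taking the word-at-a-time path). The names DemoOut, ZERO_WORDS,
zero_with_words, zero_with_memset and is_all_zero are made up for
illustration. The 48-byte size is inferred from the six 8-byte stores in
the listings above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the struct zeroed in the thread: 6 x 8 = 48 bytes. */
typedef struct DemoOut
{
    uint64_t    words[6];
} DemoOut;

/*
 * Simplified MemSet-style zeroing (NOT PostgreSQL's actual macro).
 * Assumes ptr is 8-byte aligned and len is a multiple of 8, so the
 * whole area can be cleared with word-sized stores.
 */
#define ZERO_WORDS(ptr, len) \
    do { \
        uint64_t   *zw_p = (uint64_t *) (ptr); \
        uint64_t   *zw_end = (uint64_t *) ((char *) zw_p + (len)); \
        while (zw_p < zw_end) \
            *zw_p++ = 0; \
    } while (0)

/* Variant 1: explicit word-sized stores, which the compiler can unroll. */
void
zero_with_words(DemoOut *out)
{
    assert(sizeof(*out) % sizeof(uint64_t) == 0);
    ZERO_WORDS(out, sizeof(*out));
}

/* Variant 2: plain memset; how it is expanded is up to the compiler. */
void
zero_with_memset(DemoOut *out)
{
    memset(out, 0, sizeof(*out));
}

/* Helper for checking results: returns 1 if every word is zero. */
int
is_all_zero(const DemoOut *out)
{
    size_t      i;

    for (i = 0; i < sizeof(out->words) / sizeof(out->words[0]); i++)
    {
        if (out->words[i] != 0)
            return 0;
    }
    return 1;
}
```

Compiling this with gcc -O2 -S and comparing the bodies of the two
functions should show whether a given compiler picks individual stores
or rep stosq for each; the result depends on the gcc version, target
CPU, and tuning flags, so it may not match the listings above.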

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
