Re: Some platform-specific MemSet research - Mailing list pgsql-hackers

From Rocco Altier
Subject Re: Some platform-specific MemSet research
Date
Msg-id 6E0907A94904D94B99D7F387E08C4F57C62740@FALCON.INSIGHT
Whole thread Raw
In response to Some platform-specific MemSet research  (Seneca Cunningham <scunning@ca.afilias.info>)
Responses Re: Some platform-specific MemSet research
List pgsql-hackers
I wanted to chime in that I also see this speedup from using XLC 6.0
(IBM's cc), even in 32bit mode.  I have tested on AIX 5.2 and 5.1.

I think this would be good to include in the regular release.

Not sure how many people are running older versions of AIX that would
want a new version of postgres.
-rocco



> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian
> Sent: Wednesday, February 01, 2006 12:11 PM
> To: Seneca Cunningham
> Cc: Martijn van Oosterhout; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Some platform-specific MemSet research
>
>
>
> My guess is that there is some really fast assembler for
> memory copy on
> AIX, and only libc memset() has it.  If you want, we can make
> MEMSET_LOOP_LIMIT in c.h a configure value, and allow template/aix to
> set it to zero, causing memset() to be always used.
>
> Are you prepared to make this optimization decision for all AIX users
> using gcc, or only for certain versions?
>
> --------------------------------------------------------------
> -------------
>
> Seneca Cunningham wrote:
> > Martijn van Oosterhout wrote:
> > > On Tue, Jan 24, 2006 at 05:24:28PM -0500, Seneca Cunningham wrote:
> > >
> > >>After reading the post on -patches proposing that MemSet
> be changed to
> > >>use long instead of int32 on the grounds that a pair of
> x86-64 linux
> > >>boxes took less time to execute the long code 64*10^6
> times[1], I took a
> > >>look at how the testcode performed on AIX with gcc.
> While the switch to
> > >>long did result in a minor performance improvement, dropping the
> > >>MemSetLoop in favour of the native memset resulted in the
> tests taking
> > >>~25% the time as the MemSetLoop-like int loop. The 32-bit
> linux system I
> > >>ran the expanded tests on showed that for the buffer size
> range that
> > >>postgres can use the looping MemSet instead of memset
> (size <= 1024
> > >>bytes), MemSet generally had better performance.
> > >
> > >
> > > Could you please check the asm output to see what's going
> on. We've had
> > > tests like these produce odd results in the past because
> the compiler
> > > optimised away stuff that didn't have any effect. Since
> every memset
> > > after the first is a no-op, you want to make sure it's
> still actually
> > > doing the work...
> >
> > Well, on both linux and AIX, all 30 of the 64000000 iterations loops
> > from the source exist (10 int, 10 long, 10 memset).  According to my
> > understanding of the assembler, memset itself is only
> called for values
> > >= 64 bytes on both platforms and the memset is called in
> each iteration.
> >
> > The assembler for the 64 byte loops, with prepended line
> number, first
> > loop MemSetLoop int-variant, second loop memset, third loop
> MemSetLoop
> > long-variant:
> >
> > 64-bit AIX:
> >
> >     419     addi 3,1,112
> >     420     li 4,0
> >     421     bl .gettimeofday
> >     422     nop
> >     423     lis 10,0x3d0
> >     424     cmpld 6,26,16
> >     425     li 11,0
> >     426     ori 10,10,36864
> >     427 L..41:
> >     428     bge 6,L..42
> >     429     mr 9,26
> >     430     li 0,0
> >     431 L..44:
> >     432     stw 0,0(9)
> >     433     addi 9,9,4
> >     434     cmpld 7,16,9
> >     435     bgt 7,L..44
> >     436 L..42:
> >     437     addi 0,11,1
> >     438     extsw 11,0
> >     439     cmpw 7,11,10
> >     440     bne+ 7,L..41
> >     441     li 4,0
> >     442     mr 3,22
> >     443     lis 25,0x3d0
> >     444     li 28,0
> >     445     bl .gettimeofday
> >     446     nop
> >     447     li 4,64
> >     448     addi 5,1,112
> >     449     ld 3,LC..9(2)
> >     450     mr 6,22
> >     451     ori 25,25,36864
> >     452     bl .print_time
> >     453     addi 3,1,112
> >     454     li 4,0
> >     455     bl .gettimeofday
> >     456     nop
> >     457 L..46:
> >     458     mr 3,26
> >     459     li 4,0
> >     460     li 5,64
> >     461     bl .memset
> >     462     nop
> >     463     addi 0,28,1
> >     464     extsw 28,0
> >     465     cmpw 7,28,25
> >     466     bne+ 7,L..46
> >     467     li 4,0
> >     468     mr 3,22
> >     469     bl .gettimeofday
> >     470     nop
> >     471     li 4,64
> >     472     addi 5,1,112
> >     473     ld 3,LC..11(2)
> >     474     mr 6,22
> >     475     bl .print_time
> >     476     addi 3,1,112
> >     477     li 4,0
> >     478     bl .gettimeofday
> >     479     nop
> >     480     lis 10,0x3d0
> >     481     cmpld 6,26,16
> >     482     li 11,0
> >     483     ori 10,10,36864
> >     484 L..48:
> >     485     bge 6,L..49
> >     486     mr 9,26
> >     487     li 0,0
> >     488 L..51:
> >     489     std 0,0(9)
> >     490     addi 9,9,8
> >     491     cmpld 7,9,16
> >     492     blt 7,L..51
> >     493 L..49:
> >     494     addi 0,11,1
> >     495     extsw 11,0
> >     496     cmpw 7,11,10
> >     497     bne+ 7,L..48
> >     498     li 4,0
> >     499     mr 3,22
> >     500     bl .gettimeofday
> >     501     nop
> >     502     li 4,64
> >     503     addi 5,1,112
> >     504     ld 3,LC..13(2)
> >     505     mr 6,22
> >     506     bl .print_time
> >
> >
> > 32-bit Linux:
> >
> >     387     popl    %ecx
> >     388     popl    %edi
> >     389     pushl   $0
> >     390     leal    -20(%ebp), %edx
> >     391     pushl   %edx
> >     392     call    gettimeofday
> >     393     xorl    %edx, %edx
> >     394     addl    $16, %esp
> >     395 .L41:
> >     396     movl    -4160(%ebp), %eax
> >     397     cmpl    %eax, -4144(%ebp)
> >     398     jae .L42
> >     399     movl    -4144(%ebp), %eax
> >     400 .L44:
> >     401     movl    $0, (%eax)
> >     402     addl    $4, %eax
> >     403     cmpl    %eax, -4160(%ebp)
> >     404     ja  .L44
> >     405 .L42:
> >     406     incl    %edx
> >     407     cmpl    $64000000, %edx
> >     408     jne .L41
> >     409     subl    $8, %esp
> >     410     pushl   $0
> >     411     leal    -28(%ebp), %edx
> >     412     pushl   %edx
> >     413     call    gettimeofday
> >     414     leal    -28(%ebp), %eax
> >     415     movl    %eax, (%esp)
> >     416     leal    -20(%ebp), %ecx
> >     417     movl    $64, %edx
> >     418     movl    $.LC5, %eax
> >     419     call    print_time
> >     420     popl    %eax
> >     421     popl    %edx
> >     422     pushl   $0
> >     423     leal    -20(%ebp), %edx
> >     424     pushl   %edx
> >     425     call    gettimeofday
> >     426     xorl    %edi, %edi
> >     427     addl    $16, %esp
> >     428 .L46:
> >     429     pushl   %eax
> >     430     pushl   $64
> >     431     pushl   $0
> >     432     movl    -4144(%ebp), %ecx
> >     433     pushl   %ecx
> >     434     call    memset
> >     435     incl    %edi
> >     436     addl    $16, %esp
> >     437     cmpl    $64000000, %edi
> >     438     jne .L46
> >     439     subl    $8, %esp
> >     440     pushl   $0
> >     441     leal    -28(%ebp), %eax
> >     442     pushl   %eax
> >     443     call    gettimeofday
> >     444     leal    -28(%ebp), %edx
> >     445     movl    %edx, (%esp)
> >     446     leal    -20(%ebp), %ecx
> >     447     movl    $64, %edx
> >     448     movl    $.LC6, %eax
> >     449     call    print_time
> >     450     popl    %eax
> >     451     popl    %edx
> >     452     pushl   $0
> >     453     leal    -20(%ebp), %eax
> >     454     pushl   %eax
> >     455     call    gettimeofday
> >     456     xorl    %edx, %edx
> >     457     addl    $16, %esp
> >     458 .L48:
> >     459     movl    -4160(%ebp), %eax
> >     460     cmpl    %eax, -4144(%ebp)
> >     461     jae .L49
> >     462     movl    -4144(%ebp), %eax
> >     463 .L51:
> >     464     movl    $0, (%eax)
> >     465     addl    $4, %eax
> >     466     cmpl    -4160(%ebp), %eax
> >     467     jb  .L51
> >     468 .L49:
> >     469     incl    %edx
> >     470     cmpl    $64000000, %edx
> >     471     jne .L48
> >     472     subl    $8, %esp
> >     473     pushl   $0
> >     474     leal    -28(%ebp), %edx
> >     475     pushl   %edx
> >     476     call    gettimeofday
> >     477     leal    -28(%ebp), %eax
> >     478     movl    %eax, (%esp)
> >     479     leal    -20(%ebp), %ecx
> >     480     movl    $64, %edx
> >     481     movl    $.LC7, %eax
> >     482     call    print_time
> >
> > --
> > Seneca Cunningham
> > scunning@ca.afilias.info
> >
> > ---------------------------(end of
> broadcast)---------------------------
> > TIP 5: don't forget to increase your free space map settings
> >
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square,
> Pennsylvania 19073
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>


pgsql-hackers by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: Multiple logical databases
Next
From: Bruce Momjian
Date:
Subject: Re: TODO-Item: B-tree fillfactor control