Re: use SSE2 for is_valid_ascii - Mailing list pgsql-hackers

From	Nathan Bossart
Subject	Re: use SSE2 for is_valid_ascii
Date	August 11, 2022 05:35:30
Msg-id	20220811053530.GB1610687@nathanxps13
In response to	Re: use SSE2 for is_valid_ascii (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: use SSE2 for is_valid_ascii
List	pgsql-hackers

On Thu, Aug 11, 2022 at 11:10:34AM +0700, John Naylor wrote:
>> I wonder if reusing a zero vector (instead of creating a new one every
>> time) has any noticeable effect on performance.
> 
> Creating a zeroed register is just FOO PXOR FOO, which should get
> hoisted out of the (unrolled in this case) loop, and which a recent
> CPU will just map to a hard-coded zero in the register file, in which
> case the execution latency is 0 cycles. :-)

Ah, indeed.  At -O2, my compiler seems to zero out two registers before the
loop with either approach:

    pxor    %xmm0, %xmm0    ; accumulator
    pxor    %xmm2, %xmm2    ; always zeros

And within the loop, I see the following:

    movdqu  (%rdi), %xmm1
    movdqu  (%rdi), %xmm3
    addq    $16, %rdi
    pcmpeqb %xmm2, %xmm1    ; check for zeros
    por     %xmm3, %xmm0    ; OR data into accumulator
    por     %xmm1, %xmm0    ; OR zero check results into accumulator
    cmpq    %rdi, %rsi

So the call to _mm_setzero_si128() within the loop is fine.  Apologies for
the noise.
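
For anyone following along, here is a minimal sketch of a C loop that could plausibly compile to assembly of the shape shown above. It is not the code from the patch under discussion: the function name chunked_is_valid_ascii and the requirement that len be a multiple of 16 are assumptions made for illustration. Only the SSE2 intrinsics from <emmintrin.h> are real.

```c
#include <emmintrin.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (name and contract are assumptions, not the patch):
 * returns true if all len bytes at s are nonzero 7-bit ASCII.
 * len must be a multiple of 16.
 */
bool
chunked_is_valid_ascii(const unsigned char *s, size_t len)
{
	/* Accumulates every input byte plus the results of the zero checks. */
	__m128i		acc = _mm_setzero_si128();

	for (size_t i = 0; i < len; i += 16)
	{
		/*
		 * Zero vector created inside the loop.  An optimizing compiler is
		 * expected to hoist this out, as the assembly above suggests.
		 */
		const __m128i zero = _mm_setzero_si128();

		/* Unaligned 16-byte load (movdqu). */
		__m128i		chunk = _mm_loadu_si128((const __m128i *) (s + i));

		/* OR data into accumulator: any high bit marks a non-ASCII byte. */
		acc = _mm_or_si128(acc, chunk);

		/* Zero bytes compare equal and become 0xFF, which sets a high bit. */
		acc = _mm_or_si128(acc, _mm_cmpeq_epi8(chunk, zero));
	}

	/* Gather the high bit of each byte; nonzero means invalid input. */
	return _mm_movemask_epi8(acc) == 0;
}
```

Because both the raw data and the zero-check results are ORed into one accumulator, a single high-bit test after the loop covers both failure cases.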

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com