Re: define pg_structiszero(addr, s, r) - Mailing list pgsql-hackers

From David Rowley
Subject Re: define pg_structiszero(addr, s, r)
Date
Msg-id CAApHDvopp0AUMLQE4TpH1e=qF3dhxfVjU8eJE7_tsWMjdrzi-A@mail.gmail.com
In response to Re: define pg_structiszero(addr, s, r)  (Ranier Vilela <ranier.vf@gmail.com>)
Responses Re: define pg_structiszero(addr, s, r)
Re: define pg_structiszero(addr, s, r)
List pgsql-hackers
On Tue, 5 Nov 2024 at 06:39, Ranier Vilela <ranier.vf@gmail.com> wrote:
> I think we can add a small optimization to this last patch [1].

I think if you want to make it faster, you could partially unroll the
inner-most loop, like:

// size_t * 4
for (; p < aligned_end - (sizeof(size_t) * 3); p += sizeof(size_t) * 4)
{
    if ((((size_t *) p)[0] != 0) | (((size_t *) p)[1] != 0) |
        (((size_t *) p)[2] != 0) | (((size_t *) p)[3] != 0))
        return false;
}

$ gcc allzeros.c -O2 -o allzeros && ./allzeros
char: done in 1595000 nanoseconds
size_t: done in 198300 nanoseconds (8.04337 times faster than char)
size_t * 4: done in 81500 nanoseconds (19.5706 times faster than char)
size_t * 8: done in 71000 nanoseconds (22.4648 times faster than char)

The final one above is 110GB/sec, so probably only going that fast
because the memory being checked is in L1. DDR5 is only 64GB/sec. So
it's probably overkill to unroll the loop that much.

Also, doing something like that means the final byte-at-a-time loop
might have more to do, which might make cases with a long remainder slower.
To make up for that there's some incentive to introduce yet another
loop to process single size_t's up to aligned_end. Then you end up
with even more code.
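
For illustration, here is a minimal sketch of what that three-stage
version might look like: the 4x unrolled loop, then a loop over single
size_t words, then the byte-at-a-time tail. This is not the code from the
patch. The function name all_zeros_sketch and the helper load_word are
hypothetical, and the sketch reads words with memcpy() rather than a
pointer cast, which is an assumption made here to stay clear of
strict-aliasing concerns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read one size_t from p without a pointer cast. */
static inline size_t
load_word(const unsigned char *p)
{
    size_t      w;

    memcpy(&w, p, sizeof(w));
    return w;
}

/*
 * Hypothetical sketch (not the patch's function): returns true if all
 * 'len' bytes starting at 'ptr' are zero.
 */
static bool
all_zeros_sketch(const void *ptr, size_t len)
{
    const unsigned char *p = (const unsigned char *) ptr;
    const unsigned char *end = p + len;
    size_t      nwords;

    /* Byte-at-a-time until p is size_t-aligned (or we run out of bytes). */
    while (p < end && ((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
    {
        if (*p++ != 0)
            return false;
    }

    /* Number of whole size_t words left between p and end. */
    nwords = (size_t) (end - p) / sizeof(size_t);

    /* Unrolled loop: four words per iteration, combined with bitwise OR. */
    for (; nwords >= 4; nwords -= 4, p += sizeof(size_t) * 4)
    {
        if ((load_word(p) != 0) |
            (load_word(p + sizeof(size_t)) != 0) |
            (load_word(p + sizeof(size_t) * 2) != 0) |
            (load_word(p + sizeof(size_t) * 3) != 0))
            return false;
    }

    /* The extra loop: up to three leftover single words. */
    for (; nwords > 0; nwords--, p += sizeof(size_t))
    {
        if (load_word(p) != 0)
            return false;
    }

    /* Remaining bytes, fewer than sizeof(size_t) of them. */
    while (p < end)
    {
        if (*p++ != 0)
            return false;
    }

    return true;
}
```

With the middle loop in place, the trailing byte loop never handles more
than sizeof(size_t) - 1 bytes, at the cost of one more loop to maintain.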

I was happy enough with my patch with Bertrand's comments.  I'm not
sure why unsigned chars are better than chars. It doesn't seem to have
any effect on the compiled code.

David


