Home > mailing lists

Re: Popcount optimization for the slow-path lookups - Mailing list pgsql-hackers

From	Andrew Pogrebnoi
Subject	Re: Popcount optimization for the slow-path lookups
Date	December 5 17:53:38
Msg-id	CA+aWR11v_s_3wjJRvFPHNbZTv-+dJVWaKc_i4gxH-3jYw_TmFw@mail.gmail.com Whole thread Raw
In response to	Re: Popcount optimization for the slow-path lookups (David Geier <geidav.pg@gmail.com>)
List	pgsql-hackers

Tree view

Hi David,

Thanks for looking at it!

> I would like to test if I can reproduce your results. Could you share
> your test program?

Here you go: https://gist.github.com/dAdAbird/1480ff15764f5a6301174806d8512a3a

> You also don't specify an optimization level. That means the default
> level -O0 is used. Does it make it a difference if you run e.g. with -O3?

Yes, good point. I got carried away with experiments and totally forgot that..

I've re-run tests with -O3 and it seems like the Summing of Adjacent Bits might make sense for uint64:

X86

```

% g++ pop.cc -std=c++11 -isystem benchmark/include -L../benchmark/build/src -lbenchmark -lpthread -O3 -o mybenchmark && ./mybenchmark

2025-12-05T16:22:43+02:00

Running ./mybenchmark

Run on (12 X 3200 MHz CPU s)

CPU Caches:

L1 Data 32 KiB

L1 Instruction 32 KiB

L2 Unified 256 KiB (x6)

L3 Unified 12288 KiB

Load Average: 2.67, 4.72, 2.90

--------------------------------------------------------------------

Benchmark Time CPU Iterations

--------------------------------------------------------------------

Popcnt64_BranchlessLookup 5.92 ns 5.90 ns 113871130

Popcnt64_AdjacentBits 5.30 ns 5.27 ns 115084258

Popcnt64_PG 8.34 ns 8.32 ns 80622869

Popcnt32_BranchlessLookup 3.62 ns 3.61 ns 197816110

Popcnt32_AdjacentBits 4.56 ns 4.55 ns 154932383

Popcnt32_PG 5.32 ns 5.31 ns 121845083

```

$ g++ popcnt.cc -std=c++11 -isystem benchmark/include -L../benchmark/build/src -lbenchmark -lpthread -O3 -o mybenchmark && ./mybenchmark
2025-12-05T14:29:18+00:00
Running ./mybenchmark
Run on (10 X 48 MHz CPU s)
CPU Caches:
L1 Data 128 KiB (x10)
L1 Instruction 192 KiB (x10)
L2 Unified 12288 KiB (x10)
Load Average: 0.01, 0.01, 0.00
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
Popcnt64_BranchlessLookup 2.13 ns 2.13 ns 342101937
Popcnt64_AdjacentBits 1.89 ns 1.89 ns 370316571
Popcnt64_PG 3.83 ns 3.83 ns 190990088
Popcnt32_BranchlessLookup 1.19 ns 1.19 ns 591797267
Popcnt32_AdjacentBits 1.73 ns 1.73 ns 409381266
Popcnt32_PG 2.01 ns 2.01 ns 344969625

```

---

Cheers,

Andy

pgsql-hackers by date:

From: Bertrand Drouvot
Date: 05 December, 17:51:45
Subject: Re: More const-marking cleanup

From: "Euler Taveira"
Date: 05 December, 17:58:32
Subject: Re: making tid and HOTness of UPDATE available to logical decoding plugins

Re: Popcount optimization for the slow-path lookups - Mailing list pgsql-hackers

Previous

Next