Re: [PATCH] SVE popcount support - Mailing list pgsql-hackers

From	Chiranmoy.Bhattacharya@fujitsu.com
Subject	Re: [PATCH] SVE popcount support
Date	February 6 11:44:35
Msg-id	TY2PR01MB26673A2C028501C981E84CD697F62@TY2PR01MB2667.jpnprd01.prod.outlook.com
In response to	Re: [PATCH] SVE popcount support (Nathan Bossart <nathandbossart@gmail.com>)
List	pgsql-hackers

> Hm. These results are so similar that I'm tempted to suggest we just
> remove the section of code dedicated to alignment. Is there any reason not
> to do that?

It seems the double-load overhead of unaligned memory accesses isn't very
taxing, even on larger inputs, so we can remove the alignment section to
simplify the code.
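Dropping the alignment section means the main loop simply loads from whatever address it is given. The SVE code needs an SVE-capable toolchain, so here is a portable scalar sketch of that shape instead. popcount_noalign is a hypothetical name, memcpy() stands in for an unaligned load, and __builtin_popcountll assumes GCC or Clang. This illustrates the alignment-free structure only; it is not the patch's SVE implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical portable sketch: popcount over a buffer with no alignment
 * prologue. memcpy() expresses a possibly unaligned 8-byte load in
 * standard C, so the loop starts at buf regardless of its address.
 */
static uint64_t
popcount_noalign(const char *buf, size_t bytes)
{
    uint64_t    popcnt = 0;
    size_t      i = 0;

    /* Main loop: whole 8-byte words, loaded from any address. */
    for (; i + sizeof(uint64_t) <= bytes; i += sizeof(uint64_t))
    {
        uint64_t    word;

        memcpy(&word, buf + i, sizeof(word));
        popcnt += (uint64_t) __builtin_popcountll(word);
    }

    /* Tail: the remaining 0..7 bytes, one at a time. */
    for (; i < bytes; i++)
        popcnt += (uint64_t) __builtin_popcount((unsigned char) buf[i]);

    return popcnt;
}
```

A fixed-size memcpy() into a local is the standard-C way to write an unaligned load, and optimizing compilers typically lower it to a single load instruction on targets that allow unaligned access.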

> Does this hand-rolled loop unrolling offer any particular advantage? What
> do the numbers look like if we don't do this or if we process, say, 4
> vectors at a time?

Unrolling by two performs better than not unrolling (roughly 1.9x faster
for buf sizes of 1024 and up), but processing four vectors per iteration
provides no additional benefit and is slower for the smallest inputs. The
numbers and code used are given below.

 buf  | Not Unrolled | Unrolled x2 | Unrolled x4
------+--------------+-------------+-------------
   16 |        4.774 |       4.759 |       5.634
   32 |        6.872 |       6.486 |       7.348
   64 |       11.070 |      10.249 |      10.617
  128 |       20.003 |      16.205 |      16.764
  256 |       40.234 |      28.377 |      29.108
  512 |       83.825 |      53.420 |      53.658
 1024 |      191.181 |     101.677 |     102.727
 2048 |      389.160 |     200.291 |     201.544
 4096 |      785.742 |     404.593 |     399.134
 8192 |     1587.226 |     811.314 |     810.961
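For a sense of scale, the speedup of "Unrolled x2" over "Not Unrolled" is the ratio of the two columns. The table does not state a time unit, but the ratio is unit-free:

$$
\begin{aligned}
S(n) &= \frac{T_{\text{not unrolled}}(n)}{T_{\times 2}(n)} \\
S(16) &= \frac{4.774}{4.759} \approx 1.00 \\
S(1024) &= \frac{191.181}{101.677} \approx 1.88 \\
S(8192) &= \frac{1587.226}{811.314} \approx 1.96
\end{aligned}
$$

The same comparison of x4 against x2 at 8192 gives 811.314 / 810.961, which is approximately 1.00, i.e. no measurable gain from the extra unrolling.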

    /* Process 4 vectors per iteration, each with its own accumulator */
    for (; i < loop_bytes; i += vec_len * 4)
    {
        /* Load a vector, count bits per 64-bit lane, add into accum1 */
        vec64_1 = svld1(pred, (const uint64 *) (buf + i));
        accum1 = svadd_x(pred, accum1, svcnt_x(pred, vec64_1));

        vec64_2 = svld1(pred, (const uint64 *) (buf + i + vec_len));
        accum2 = svadd_x(pred, accum2, svcnt_x(pred, vec64_2));

        vec64_3 = svld1(pred, (const uint64 *) (buf + i + 2 * vec_len));
        accum3 = svadd_x(pred, accum3, svcnt_x(pred, vec64_3));

        vec64_4 = svld1(pred, (const uint64 *) (buf + i + 3 * vec_len));
        accum4 = svadd_x(pred, accum4, svcnt_x(pred, vec64_4));
    }
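For readers without an SVE toolchain, the "Unrolled x2" shape can be sketched in portable C. popcount_unroll2 is a hypothetical name, __builtin_popcountll assumes GCC or Clang, and the scalar tail loops stand in for whatever tail handling the SVE patch uses. Each of the two words per iteration feeds its own accumulator, so consecutive additions do not wait on each other. That independence is the usual reason a second accumulator helps, although in the measurements above a third and fourth add nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical scalar analogue of the "Unrolled x2" loop: two 8-byte
 * words per iteration, each counted into a separate accumulator.
 */
static uint64_t
popcount_unroll2(const char *buf, size_t bytes)
{
    const size_t word = sizeof(uint64_t);
    uint64_t    accum1 = 0;
    uint64_t    accum2 = 0;
    size_t      i = 0;

    /* Main loop: two independent load/popcount/add chains. */
    for (; i + 2 * word <= bytes; i += 2 * word)
    {
        uint64_t    w1,
                    w2;

        memcpy(&w1, buf + i, word);
        memcpy(&w2, buf + i + word, word);
        accum1 += (uint64_t) __builtin_popcountll(w1);
        accum2 += (uint64_t) __builtin_popcountll(w2);
    }

    /* At most one whole word can remain here. */
    for (; i + word <= bytes; i += word)
    {
        uint64_t    w;

        memcpy(&w, buf + i, word);
        accum1 += (uint64_t) __builtin_popcountll(w);
    }

    /* Remaining 0..7 bytes. */
    for (; i < bytes; i++)
        accum1 += (uint64_t) __builtin_popcount((unsigned char) buf[i]);

    /* Combine the accumulators once, after the loop. */
    return accum1 + accum2;
}
```

As in the SVE version, the accumulators are only combined after the loop, so the main loop carries no dependency between the two chains.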

-Chiranmoy
