John Naylor <john.naylor@2ndquadrant.com> writes:
> v2 had an Assert that was only correct while experimenting with
> eliding right shift. Fixed in v3.
I think there must have been something wrong with your test that
said that eliminating the right shift from the non-CLZ code made
it slower. It should be an unconditional win, just as it is for
the CLZ code path. (Maybe some odd cache-line-boundary effect?)
Also, I think it's just weird to account for ALLOC_MINBITS one
way in the CLZ path and the other way in the other path.
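The argument about the shift rests on a simple identity: once size exceeds 2^ALLOC_MINBITS, the highest set bit of (size - 1) >> ALLOC_MINBITS is exactly the highest set bit of (size - 1) minus ALLOC_MINBITS. So the shift can be folded into the constant that gets subtracted anyway, in either code path. Below is a minimal standalone sketch that checks this exhaustively. ALLOC_MINBITS = 3 and ALLOC_CHUNK_LIMIT = 8192 are assumed values, and every function name is hypothetical rather than taken from aset.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 8-byte minimum chunk, 8kB chunk limit. */
#define ALLOC_MINBITS     3
#define ALLOC_CHUNK_LIMIT 8192

/* Hypothetical helper: position of the highest set bit of a nonzero v. */
static int
highest_bit_pos(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
	return 31 - __builtin_clz(v);	/* CLZ path */
#else
	int			pos = 0;			/* portable fallback */

	while (v >>= 1)
		pos++;
	return pos;
#endif
}

/* Freelist index computed with an explicit right shift. */
static int
free_index_with_shift(uint32_t size)
{
	assert(size <= ALLOC_CHUNK_LIMIT);	/* documented precondition */
	if (size <= (1u << ALLOC_MINBITS))
		return 0;
	/* size - 1 >= 8 here, so the shifted value is nonzero */
	return highest_bit_pos((size - 1) >> ALLOC_MINBITS) + 1;
}

/* Same index, with the shift folded into a constant subtraction. */
static int
free_index_without_shift(uint32_t size)
{
	assert(size <= ALLOC_CHUNK_LIMIT);
	if (size <= (1u << ALLOC_MINBITS))
		return 0;
	return highest_bit_pos(size - 1) - ALLOC_MINBITS + 1;
}

/* Returns 1 if both formulas agree for every size up to the limit. */
static int
shift_elision_is_exact(void)
{
	for (uint32_t size = 1; size <= ALLOC_CHUNK_LIMIT; size++)
		if (free_index_with_shift(size) != free_index_without_shift(size))
			return 0;
	return 1;
}
```

For example, a 9-byte request maps to index 1 under both formulas, and an 8192-byte request maps to index 10.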
I decided that it might be a good idea to do the performance testing
in-place, inside a real backend, rather than in a standalone test
program. I whipped up the attached C function, which just does a
bunch of palloc/pfree cycles.
I got the following results on a non-cassert build (medians of
a number of tests; the times are repeatable to ~0.1% for me):
HEAD:          2429.431 ms
v3 CLZ:        2131.735 ms
v3 non-CLZ:    2477.835 ms
remove shift:  2266.755 ms
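As relative changes, the medians above work out roughly as follows. This is plain arithmetic on the quoted figures. The last row reads "remove shift" as the non-CLZ path with the right shift eliminated, which is an interpretation and is not stated outright.

```latex
\begin{aligned}
\text{v3 CLZ vs.\ HEAD:}\quad & \frac{2131.735 - 2429.431}{2429.431} \approx -12.3\% \\
\text{v3 non-CLZ vs.\ HEAD:}\quad & \frac{2477.835 - 2429.431}{2429.431} \approx +2.0\% \\
\text{remove shift vs.\ HEAD:}\quad & \frac{2266.755 - 2429.431}{2429.431} \approx -6.7\% \\
\text{remove shift vs.\ v3 non-CLZ:}\quad & \frac{2266.755 - 2477.835}{2477.835} \approx -8.5\%
\end{aligned}
```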
I didn't bother to try this on non-x86_64 architectures, as
previous testing convinces me the outcome should be about the
same.
Hence, pushed that way, with a bit of additional cosmetic foolery:
the static assertion made more sense to me in relation to the
documented assumption that size <= ALLOC_CHUNK_LIMIT, and I
thought the comment could use some work.
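For the non-CLZ path, the static assertion matters because a two-byte table lookup finds the highest set bit correctly only for values below 2^16. Below is a minimal sketch of that path with the shift already removed. The byte table stands in for PostgreSQL's pg_leftmost_one_pos[] array. The constants and the names byte_leftmost, init_byte_leftmost and free_index_table are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, not copied from aset.c. */
#define ALLOC_MINBITS     3
#define ALLOC_CHUNK_LIMIT 8192

/* The lookup below inspects only two bytes, so sizes must fit in 16 bits. */
_Static_assert(ALLOC_CHUNK_LIMIT < (1 << 16),
			   "ALLOC_CHUNK_LIMIT must be less than 64kB");

/* Hypothetical table: highest set bit position for each nonzero byte. */
static uint8_t byte_leftmost[256];

static void
init_byte_leftmost(void)
{
	byte_leftmost[0] = 0;		/* never consulted for a zero byte */
	for (int i = 1; i < 256; i++)
		byte_leftmost[i] = (uint8_t) (byte_leftmost[i >> 1] + (i > 1));
}

/* Hypothetical non-CLZ freelist index; caller guarantees the size bound. */
static int
free_index_table(uint32_t size)
{
	static int	initialized = 0;
	uint32_t	tsize,
				t;
	int			idx;

	assert(size <= ALLOC_CHUNK_LIMIT);	/* documented precondition */
	if (size <= (1u << ALLOC_MINBITS))
		return 0;
	if (!initialized)
	{
		init_byte_leftmost();
		initialized = 1;
	}
	tsize = size - 1;			/* nonzero, and below 2^16 */
	t = tsize >> 8;				/* high byte */
	idx = t ? byte_leftmost[t] + 8 : byte_leftmost[tsize];
	return idx - ALLOC_MINBITS + 1; /* no right shift of size needed */
}
```

Expressing the bound in terms of ALLOC_CHUNK_LIMIT states the real precondition: sizes reaching this code never exceed that limit. Backend code would use StaticAssertStmt() from c.h. The sketch uses C11's _Static_assert only so that it compiles on its own.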
regards, tom lane
/*
create function drive_palloc(count int) returns void
strict volatile language c as '.../drive_palloc.so';
\timing
select drive_palloc(10000000);
*/
#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
#include "utils/memutils.h"
PG_MODULE_MAGIC;
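/*
 * drive_palloc(count int) returns void
 *
 * Runs "count" iterations, each of which pallocs and immediately pfrees
 * one chunk of every power-of-2 size from 1 byte to 8kB.
 */
PG_FUNCTION_INFO_V1(drive_palloc);

Datum
drive_palloc(PG_FUNCTION_ARGS)
{
	int32		count = PG_GETARG_INT32(0);

	while (count-- > 0)
	{
		/* sizes 1, 2, 4, ..., 8192: 14 palloc/pfree pairs per iteration */
		for (size_t sz = 1; sz <= 8192; sz <<= 1)
		{
			void	   *p = palloc(sz);

			pfree(p);
		}
		/* allow a long-running call to be cancelled */
		CHECK_FOR_INTERRUPTS();
	}

	PG_RETURN_VOID();
}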
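Assuming the timings came from the `select drive_palloc(10000000)` call in the header comment, the totals convert to an approximate per-pair cost as follows. The per-pair figures include the loop and CHECK_FOR_INTERRUPTS() overhead, not just palloc/pfree.

```latex
\begin{aligned}
\text{sizes per iteration:}\quad & |\{2^0, 2^1, \dots, 2^{13}\}| = 14 \\
\text{palloc/pfree pairs:}\quad & 14 \times 10^{7} = 1.4 \times 10^{8} \\
\text{HEAD:}\quad & \frac{2429.431\ \text{ms}}{1.4 \times 10^{8}} \approx 17.35\ \text{ns per pair} \\
\text{v3 CLZ:}\quad & \frac{2131.735\ \text{ms}}{1.4 \times 10^{8}} \approx 15.23\ \text{ns per pair}
\end{aligned}
```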