Re: pgbench - add pseudo-random permutation function - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench - add pseudo-random permutation function
Msg-id alpine.DEB.2.21.2002011007340.20752@pseudo
In response to Re: pgbench - add pseudo-random permutation function  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: pgbench - add pseudo-random permutation function  (David Steele <david@pgmasters.net>)
Re: pgbench - add pseudo-random permutation function  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Hello Alvaro,

>> I read the whole thread, I still don't know what this patch is supposed to
>> do.  I know what the words in the subject line mean, but I don't know how
>> this helps a pgbench user run better benchmarks.  I feel this is also the
>> sentiment expressed by others earlier in the thread.  You indicated that
>> this functionality makes sense to those who want this functionality, but so
>> far only two people, namely the patch author and the reviewer, have
>> participated in the discussion on the substance of this patch.  So either
>> the feature is extremely niche, or nobody understands it.  I think you ought
>> to take about three steps back and explain this in more basic terms, even
>> just in email at first so that we can then discuss what to put into the
>> documentation.
>
> After re-reading one more time, it dawned on me that the point of this
> is similar in spirit to this one:
> https://wiki.postgresql.org/wiki/Pseudo_encrypt

Indeed. The one in the wiki is not usable for benchmarking because it
permutes the whole integer domain, whereas in a benchmark you want a
permutation over a given size, and you want seeding. Otherwise it
addresses the same correlation-avoidance problem.
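
To make the two requirements (a given size, and seeding) concrete, here is
a minimal sketch in Python of a seeded permutation restricted to
[0, size). It is not the code from the patch: the name `permute_sketch`,
the SHA-256 based round function and the four-round Feistel network with
cycle walking are all assumptions chosen for illustration.

```python
import hashlib


def _round(value: int, round_no: int, seed: int, half_bits: int) -> int:
    """Hypothetical round function: hash (seed, round, value) down to half_bits bits."""
    digest = hashlib.sha256(f"{seed}:{round_no}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def permute_sketch(i: int, size: int, seed: int = 0, rounds: int = 4) -> int:
    """Map i in [0, size) to another value in [0, size), bijectively for a fixed seed."""
    if size <= 0 or not 0 <= i < size:
        raise ValueError("need size > 0 and 0 <= i < size")

    # Smallest even bit width whose range covers [0, size).
    bits = max(2, (size - 1).bit_length())
    if bits % 2:
        bits += 1
    half = bits // 2
    mask = (1 << half) - 1

    x = i
    while True:
        # A balanced Feistel network is a bijection on [0, 2**bits).
        left, right = x >> half, x & mask
        for r in range(rounds):
            left, right = right, left ^ _round(right, r, seed, half)
        x = (left << half) | right
        # Cycle walking: re-apply until the value falls back inside [0, size).
        if x < size:
            return x


if __name__ == "__main__":
    print([permute_sketch(i, 10, seed=42) for i in range(10)])
```

Calling it for every i in range(size) yields each value in [0, size)
exactly once, and the seed selects which ordering is produced.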

> The idea seems to be to map the int4 domain into itself, so you can use
> a sequence to generate numbers that will not look like a sequence,
> allowing the user to hide some properties (such as the generation rate)
> that might be useful to an eavesdropper/attacker.  In terms of writing
> benchmarks, it seems useful to destroy all locality of access, which
> changes the benchmark completely.

Yes.

> (I'm not sure if this is something benchmark writers really want to 
> have.)

I do not get this sentence. I'm sure that a benchmark writer should really 
want to avoid unrealistic correlations that have a performance impact.
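
As an illustration of the kind of correlation meant here, the following
Python sketch draws skewed keys and counts how many 100-key "pages" they
touch, before and after scattering them with a permutation. Everything in
it is an assumption made for the example: the exponential skew, the page
size, the seeds, and the shuffled lookup table that stands in for a real
permutation function.

```python
import random


def pages_touched(keys, page_size=100):
    """Number of distinct page_size-wide key ranges hit by the draws."""
    return len({k // page_size for k in keys})


def skewed_keys(n_draws, size, mean, rng):
    """Exponentially skewed keys in [0, size): small keys are drawn most often."""
    keys = []
    while len(keys) < n_draws:
        k = int(rng.expovariate(1.0 / mean))
        if k < size:
            keys.append(k)
    return keys


size = 10_000
draws = skewed_keys(20_000, size, mean=50.0, rng=random.Random(12345))

# Stand-in permutation: a seeded shuffle of [0, size).
table = list(range(size))
random.Random(42).shuffle(table)
scattered = [table[k] for k in draws]

before = pages_touched(draws)      # hot keys are neighbours: few pages
after = pages_touched(scattered)   # same hot keys, spread over many pages
print(before, after)
```

The access frequencies are unchanged, only the placement of the hot keys
differs, which is the locality effect under discussion.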

> If I'm right, then I agree that the documentation provided with the
> patch does a pretty bad job at explaining it, because until now I didn't
> at all realize this is what it was.

The documentation is improvable, no doubt.

Attached is an attempt at improving things. I have added an explicit note 
and hijacked an existing example to better illustrate the purpose of the 
function.

-- 
Fabien.
