Re: Random-looking primary keys in the range 100000..999999 - Mailing list pgsql-general

From Kynn Jones
Subject Re: Random-looking primary keys in the range 100000..999999
Date
Msg-id CAFvQaj5_v8DPAKeJM_8fa6WapWP78ucjH5a-qcj7rv6BP1KZHQ@mail.gmail.com
Whole thread Raw
In response to Re: Random-looking primary keys in the range 100000..999999  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Random-looking primary keys in the range 100000..999999  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
List pgsql-general
Thanks to Gavin and Martijn for their suggestions.  They're both simple good-ol' LCGs, and both avoid the need to check for collisions.

I ultimately went with a multiplicative LCG, like Martijn's, mostly because I understand better how it avoids collisions, so it was easier for me to tweak it in various ways.

In particular, I changed the prime number from 899981 to the very lucky prime 900001.  This happens to work *perfectly*, because the range of such a generator is p-1, not p.  (BTW, Martijn's choice of the "random" 2345 for the multiplier was a somewhat lucky one, since such generators are not full for arbitrary multipliers; for example, the one with modulus 899981 is not full for a multiplier of 3456, say.)

I also followed Martijn's pointer regarding the 3-argument form of python's pow function, and implemented a 3-argument pow for PL/PgSQL.  I include all the code below, including a snippet borrowed from Gavin's post, and modified here and there.  (I'm not very experienced with PL/PgSQL, so please feel free to point out ways in which my PL/PgSQL code can be improved.)

First the functions:

    CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m bigint) returns bigint AS $$
    DECLARE
    x  bigint;
    xx bigint;
    BEGIN
      IF n = 0 THEN RETURN 1; END IF;

      x := bigx % m;
      xx := (x * x) % m;

      IF n % 2 = 0 THEN
        RETURN pow_mod(xx, n/2, m);
      ELSE
        RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
      END IF;

    END;
    $$ LANGUAGE plpgsql strict immutable;


    -- "mcg" = "multiplicative congruential generator"
    CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
    BEGIN
      -- CHECK (0 < i AND i < 900001)                                                                                                                                                                             
      RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
    END;
    $$ LANGUAGE plpgsql strict immutable;


And here's a small demo:

    DROP TABLE IF EXISTS rtab;
    DROP SEQUENCE IF EXISTS rseq;

    CREATE SEQUENCE rseq;

    CREATE TABLE rtab
    (
        id       int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
        payload  int NOT NULL
    );

    \timing on \\ INSERT INTO rtab (payload) VALUES (generate_series(1, 900000)); \timing off
    -- Timing is on.
    -- INSERT 0 900000
    -- Time: 201450.781 ms
    -- Timing is off.

    SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
    --    id   | payload
    -- --------+---------
    --  539815 |  449991
    --  901731 |  449992
    --  878336 |  449993
    --  564275 |  449994
    --  863664 |  449995
    --  720159 |  449996
    --  987833 |  449997
    --  999471 |  449998
    --  999977 |  449999
    --  999999 |  450000
    --  921739 |  450001
    --  722684 |  450002
    --  596638 |  450003
    --  121592 |  450004
    --  687895 |  450005
    --  477734 |  450006
    --  585988 |  450007
    --  942869 |  450008
    --  175776 |  450009
    --  377207 |  450010
    -- (20 rows)

kj



On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:
> I'm looking for a way to implement pseudorandom primary keys in the range
> 100000..999999.
>
> The randomization scheme does not need to be cryptographically strong.  As
> long as it is not easy to figure out in a few minutes it's good enough.

Well, a trick that produces a not too easy to guess sequence is:

X(n) = p^n mod q

where q is prime. Pick the largest prime that will fit, in this case
899981 (I beleive) and some random p, say 2345.

Then 100000 + (2345^n) mod 899981

should be a sequence fitting your purpose. Unfortunatly, the pow()
function in Postgres can't be used here (too slow and it overflows),
but python has a helpful function:

In [113]: len( set( pow(2345, n, 899981) for n in range(899981)  ) )
Out[113]: 899980

You could probably write an equivalent function in Postgres if
necessary.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
dwQY8zHxuOQ=
=IJFu
-----END PGP SIGNATURE-----


pgsql-general by date:

Previous
From: Jim Mlodgenski
Date:
Subject: Re: debugging with gdb in postgres
Next
From: Merlin Moncure
Date:
Subject: Re: Re : Re : Query "top 10 and others"